Contents:
First byte = 0FFh = non-protected program.
A different byte specifies a protected program and the rest of the data is
pseudo-encrypted by XORing the data with successive bytes from two chunks of
BASIC constants (the two chunk lengths being relatively prime to each other).
I currently don't have the decoding information available so protected
programs are not covered here (but it's coming).
Subsequent bytes:
1A (a control-Z) is tacked onto the end of a saved program.
Note: all numeric values given in hexadecimal unless otherwise noted. "xx" represents one unspecified hexadecimal byte which can contain arbitrary information.
0E
xxxx.1E
.1B
1C
xxxx1D
xxxxxxxx1E
When the interpreter encounters one of the numeric tokens 0B to 0F, 1C, 1D or 1F, it processes the numeric constant, puts it in a special accumulator stack, advances the BASIC pointer past the constant, pushes that pointer on a stack and points to the first byte of a 1E 10 byte pair. When a "fetch the current item" call finds the 1E, the numeric constant's type token is returned. When an "I'm done with the current item, so increment the program pointer and fetch the next item" call is made, the program counter then points to the 10, which prompts the interpreter to discard the constant from the accumulator stack, pop the original program pointer (now pointing past the constant) back off the stack and continue past the constant.
1F xxxxxxxxxxxxxxxx
Two-byte integer constants (xxxx) are specified by 0B, 0C, 0D, 0E or 1C. The bytes are in Intel order with the least significant byte first. The range of values for signed integers (0B, 0C or 1C) is -32768 (hexadecimal 8000) to +32767 (hexadecimal 7FFF). Unsigned integers, specified by 0D or 0E, are in the range 0 to 65535.
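To make the byte order concrete, here is a minimal Python sketch that decodes the two operand bytes following one of these tokens. Python is used only for illustration, and the helper name decode_int16 is made up for this example; it is not part of GW-BASIC.

```python
import struct

def decode_int16(operand: bytes, signed: bool) -> int:
    """Decode the two operand bytes (xxxx), stored least significant byte first."""
    if len(operand) != 2:
        raise ValueError("expected exactly two bytes")
    # "<h" = little-endian signed 16-bit, "<H" = little-endian unsigned 16-bit
    return struct.unpack("<h" if signed else "<H", operand)[0]

# Bytes as they appear in the file, low byte first.
print(decode_int16(bytes([0x00, 0x80]), signed=True))    # hexadecimal 8000, signed
print(decode_int16(bytes([0xFF, 0x7F]), signed=True))    # hexadecimal 7FFF, signed
print(decode_int16(bytes([0xFF, 0xFF]), signed=False))   # hexadecimal FFFF, unsigned
```

The same two bytes give different values depending on the token in front of them, which is why the token has to be examined before the operand is interpreted.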
Floating-point constants ([xxxxxxxx]xxxxxxxx) are specified by 1D or 1F. The bytes are stored in reverse order with the least-significant byte first.
[ hh gg ff ee ] dd cc bb aa rearranged in significant order and converted to binary:
aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh
If aaaaaaaa equals zero then the numeric value of the constant is assumed to be zero and the other three or seven bytes are ignored.
If aaaaaaaa is not zero, it is the base-2 exponent of the floating-point number with 128 (80 hexadecimal) added to it. 01 hexadecimal equals 2 to the -127 power, ... , 7F hexadecimal equals 2 to the -1 power, 80 hexadecimal equals 2 to the 0 power (1), 81 hexadecimal equals 2 to the 1 power (2), ... , FF hexadecimal equals 2 to the 127 power.
For a non-zero value, the first bit of bbbbbbbb is the sign bit: 1 represents a negative number and 0 represents a positive number. The rest of the bits store the absolute value (i.e. no inversion of bit values is done for negative numbers) with the most significant bit (assumed to follow the binary point) omitted, as it is assumed to always be present:
b (sign bit)   0.1 (assumed and not stored)   bbbbbbb (the 2nd to 8th significant data bits)
cc and dd and (for double-precision) ee to hh are the rest of the data bits. The range for non-zero values is roughly ±2.9387360000E-39 to ±1.7014120000E+38 decimal.
Single-precision examples (in binary):
00000000 00000000 00000000 00000000 equals 0.0 decimal or 0.000000000000000000000000 binary.
00000000 11111111 11111111 11111111 still equals 0.0 decimal or 0.000000000000000000000000 binary (the numeric data bits are ignored when the exponent byte equals zero).
10000000 00000000 00000000 00000000 equals 0.5 decimal or 0.100000000000000000000000 binary.
10000000 10000000 00000000 00000000 equals -0.5 decimal or -0.100000000000000000000000 binary.
10000001 00000000 00000000 00000000 equals 1.0 decimal or 1.00000000000000000000000 binary.
10000010 00000000 00000000 00000000 equals 2.0 decimal or 10.0000000000000000000000 binary.
10000011 00000000 00000000 00000000 equals 4.0 decimal or 100.000000000000000000000 binary.
10000011 01000000 00000000 00000000 equals 6.0 decimal or 110.000000000000000000000 binary.
10000011 01100000 00000000 00000000 equals 7.0 decimal or 111.000000000000000000000 binary.
10000011 01110000 00000000 00000000 equals 7.5 decimal or 111.100000000000000000000 binary.
10000011 01111000 00000000 00000000 equals 7.75 decimal or 111.110000000000000000000 binary.
10000011 11111000 00000000 00000000 equals -7.75 decimal or -111.110000000000000000000 binary.
10000011 01111100 00000000 00000000 equals 7.875 decimal or 111.111000000000000000000 binary.
10000000 00000000 00000000 00000000 equals 0.5 decimal or 0.100000000000000000000000 binary (same as above, repeated here).
01111111 00000000 00000000 00000000 equals 0.25 decimal or 0.010000000000000000000000 binary.
01111110 00000000 00000000 00000000 equals 0.125 decimal or 0.001000000000000000000000 binary.
01111101 00000000 00000000 00000000 equals 0.0625 decimal or 0.000100000000000000000000 binary.
01111101 10000000 00000000 00000000 equals -0.0625 decimal or -0.000100000000000000000000 binary.
Eight-byte double-precision numbers are the same except that they have four more significant bytes to the right of the other bytes.
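The rules above can be sketched in a few lines of Python. This is only an illustration under the description given here, not code taken from GW-BASIC; the names decode_mbf and from_significant_order are invented for the example. Python floats carry 53 significant bits, so the last few bits of an eight-byte constant may be rounded.

```python
import math

def decode_mbf(stored: bytes) -> float:
    """Decode a 4-byte or 8-byte constant given in stored (file) order."""
    if len(stored) not in (4, 8):
        raise ValueError("expected 4 or 8 bytes")
    exponent = stored[-1]            # aa: the last stored byte
    if exponent == 0:
        return 0.0                   # the remaining bytes are ignored
    bb = stored[-2]
    negative = bool(bb & 0x80)       # first bit of bb is the sign
    # Put the assumed leading 1 bit where the sign bit was, then append
    # cc, dd (and ee to hh) in significant order.
    mantissa_bytes = bytes([bb | 0x80]) + stored[-3::-1]
    mantissa = int.from_bytes(mantissa_bytes, "big")
    # The mantissa is a fraction 0.1xxx..., so divide out its bit length too.
    value = math.ldexp(mantissa, exponent - 0x80 - 8 * len(mantissa_bytes))
    return -value if negative else value

def from_significant_order(bits: str) -> bytes:
    """Turn 'aaaaaaaa bbbbbbbb ...' (as written in the examples) into stored order."""
    return bytes(int(group, 2) for group in reversed(bits.split()))

print(decode_mbf(from_significant_order("10000011 01111000 00000000 00000000")))  # the 7.75 example
print(decode_mbf(from_significant_order("01111101 10000000 00000000 00000000")))  # the -0.0625 example
```

The helper from_significant_order exists only because the examples above are written most significant byte first, while the file stores the bytes in the opposite order.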
3AA1, ":ELSE", but the ":" is suppressed when the program is listed.
B1E9, "WHILE+", but the "+" is suppressed when the program is listed.
3A8FD9, ":REM'", but the ":REM" is suppressed when the program is listed.
An example of manually detokenizing a memory hexdump of a GW-BASIC program. The program being detokenized was used to dump the memory segment it resided in. The detokenizing of the hexdump of a tokenized GW-BASIC program file would be similar except for three differences:
The first byte of a program file is FF and the equivalent byte ahead of the first program line in memory is 00.
There are no 0D xxxx pointer tokens in a program file. They are only in a running program to speed up execution.
Practically, the only time you really need to manually de-tokenize a GW-BASIC program is when you have a GW-BASIC program file in tokenized form but no copy of GW-BASIC that supports that file (possible if the GW-BASIC version you have came out before some additional keywords used by the tokenized program were added to newer versions of GW-BASIC). If you do have a version of GW-BASIC that understands the file, you can load the file with the command
load "filename.ext"
and then save it in ASCII form with the command
save "newname.ext",A
GW-BASIC has an optional parameter to the SAVE statement that allows a program to be saved in "protected" form. The file is pseudo-encrypted and the initial byte of the file is changed to FE instead of FF. When the program is loaded, it is decoded and a flag is set in the GW-BASIC data work area to indicate that a protected program is loaded. While this flag is set, any GW-BASIC statement is still allowed in a program line, but editing the program is prohibited and several statements are disabled in direct statements to prevent access to the protected program. SAVE without the P option is prohibited; LIST and LLIST are prohibited; PEEK, POKE, BSAVE and BLOAD are disabled; and CHAIN cannot include the MERGE option.
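As a quick way to tell which kind of file you have, the first-byte rule can be checked outside of GW-BASIC. The following Python sketch is only an illustration; the function names and the sample bytes are made up, and anything other than FF or FE is simply reported as unknown.

```python
def classify_saved_program(data: bytes) -> str:
    """Classify a saved program image by its first byte."""
    if not data:
        return "empty"
    if data[0] == 0xFF:
        return "unprotected"
    if data[0] == 0xFE:
        return "protected"
    return "unknown"   # for example a file saved in ASCII form

def ends_with_ctrl_z(data: bytes) -> bool:
    """True if the image ends with the 1A (control-Z) byte."""
    return data.endswith(b"\x1a")

# Made-up sample bytes, only to show the calls.
sample = bytes([0xFE, 0x12, 0x34, 0x1A])
print(classify_saved_program(sample), ends_with_ctrl_z(sample))
```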
While this appears to suggest that you are out of luck if you have an old protected GW-BASIC program that needs updating (such as an accounting package that won't accept non-US addresses for employees when you just went international) there is still a way to unprotect such programs if you have a copy of GW-BASIC. It relies on one of the bugs in the VAL function.
If a way can be found to poke a zero value into the GW-BASIC interpreter's protection flag byte, a loaded protected program would suddenly become unprotected and could then be LISTed, SAVEd in unprotected form or edited. To do that you need two things: the address of the protection flag for your version of GW-BASIC, and a way to trick the GW-BASIC interpreter into thinking that a manually-entered POKE statement was a program statement and not a direct manual entry.
Protection flag addresses I have run across:
To determine the protection flag address for your version of GW-BASIC, you need to load an unprotected program into memory and then find out which address will protect the program when it is changed from a zero byte to a non-zero byte. Load the following short program into your BASIC interpreter, save it in unprotected form, LIST it, and then run the last line as a direct statement by typing over the line number and pressing your <Enter> or <Return> key. A bunch of successive addresses will be printed and then the program will stop with "Illegal function call" displayed. Write down the last address printed. That will be the address of the protection flag for your version of GW-BASIC. Here is the short program (I call it "poketest.bas" -- make sure line 104 is entered as a single line even if your browser wraps it):
100 ' type the following line 104 as a direct statement or just LIST
102 ' this and then run line 104 by deleting the line number "104"
103 ' and pressing the <Return> or <Enter> key:
104 FOR I=1000 TO 16000:PRINT I: J=PEEK(I): POKE I,((J=0)AND 255) OR J: POKE I,J:NEXT I
What the program does is change each successive zero byte to non-zero and then restore it to its original value, until it hits the protection flag. Once the POKE I,((J=0)AND 255) OR J statement turns on the protection flag, the POKE I,J statement becomes a forbidden statement in a direct statement.
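If the expression ((J=0)AND 255) OR J looks cryptic, the following Python sketch mimics it. It relies on the fact that a GW-BASIC comparison yields -1 for true and 0 for false; the function names are invented for this illustration.

```python
def basic_equals(a: int, b: int) -> int:
    """GW-BASIC comparisons yield -1 for true and 0 for false."""
    return -1 if a == b else 0

def first_poke_value(j: int) -> int:
    """Value written by POKE I,((J=0)AND 255) OR J for a peeked byte J."""
    return (basic_equals(j, 0) & 255) | j

for j in (0, 1, 0x20, 255):
    print(j, "->", first_poke_value(j))
```

A zero byte is therefore written as 255 and any non-zero byte is written back unchanged, so only zero bytes are ever disturbed before the second POKE restores them.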
Now go back to the top of this page and review what has been written about the numeric tokens, paying special attention to 0D xxxx, 0E xxxx, 10 and 1E. Then review the detokenizing example at the link above. The background there will help you understand the bug we are going to exploit.
Almost everywhere in the BASIC interpreter, one of two routines is called to get the current or next significant character to be processed. The "get next item" routine would be very confused if it encountered a byte in a numeric constant, because those bytes can have any value. Therefore, whenever a numeric token is encountered, the following things are done:
11 to 1B) or two-byte (0F xx) tokens,
1E byte of a byte pair, 1E 10, and
What happens with VAL? The regular program counter is pushed on a stack, it is changed to point to the string to be evaluated, a zero byte is inserted at the end of the string to ensure that VAL won't evaluate past the end of the string (such as when two strings adjacent in BASIC's string work area have the values "123" and "456" respectively, we don't want VAL to return 123456 as the value), and then BASIC's numeric-evaluation routine is called to evaluate the string. It uses the same call to "get current byte" and "get next byte" as the rest of the interpreter does. When the evaluation encounters what it thinks is the end of the number, it returns to VAL. VAL then replaces the zero byte, pops the original program counter from the stack and returns the evaluation result to its caller.
"Where's the bug!" you ask. What do you think happens if VAL is followed by a numeric constant and the number in the string is also followed by a numeric constant? Right. The one in the program line that follows VAL is over-written by the one in the string. On return to the VAL processing routine, the program counter is not restored to point to the statement with the VAL function. It is restored to point to the constant in the string. Now the string is being interpreted instead of the statement with VAL in it.
Normally this would cause a syntax error but there is one statement in BASIC that can have a numeric constant follow a function without intervening punctuation. That is a PRINT statement.
"How does this help?" you ask. The answer is that POKE is not allowed in a direct statement but is allowed in a program line. If we can go to a program line that has a POKE in it to turn off the protection flag then that will be allowed and the protection is turned off. Once a GOTO is executed, we are no longer in a direct statement. Obviously we cannot go to any line in the protected program. Firstly we don't know what line numbers are in it so we can't go to any one of them. Secondly, we don't know what is in the lines so we couldn't go to a suitable POKE if there was one, and thirdly, the chances of there being a suitable POKE in the protected program are slimmer than my chances of becoming richer than Bill Gates. However, using a pointer token instead of a line-number token means that the "program" line we go to can be anywhere in memory and doesn't have to really be in the program we need to unprotect. A numeric array is a convenient place for it.
What we need to do is to
Replace the 1450 below with the protection flag address of your version of GW-BASIC and "secret.bas" with the name of the program to be unprotected.
load "secret.bas"
dim a%[14]
a%[0]=0:a%[1]=&h2020:a%[2]=&h2020:a%[3]=&h2097 ' " " " " "DEF "
a%[4]=&h4553:a%[5]=&h2047:a%[6]=&H203A:a%[7]=&H2098 ' "SE" "G " ": " "POKE "
a%[8]=&H1C20 :a%[9]=1450 :a%[10]=&h112C ' " " + integer token the flag address ",0"
a%[11]=&h903A :a%[12]=0 ' ":STOP" end-of-line
b$="" ' must be defined before VARPTR is called.
b$="123"+chr$(28)+":::"+chr$(137)+chr$(13)+mki$(varptr(a%[0]))+":"
print val(b$) 456
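To see what the array assignments in the listing actually build, here is a small Python sketch that lays the same thirteen words out as bytes. It assumes, as the listing itself relies on, that integer array elements sit next to each other in memory as two bytes each with the least significant byte first. The name FLAG_ADDRESS is made up for the example, and 1450 is only the placeholder value from the listing.

```python
import struct

FLAG_ADDRESS = 1450  # placeholder from the listing; substitute your own address

words = [
    0x0000, 0x2020, 0x2020, 0x2097,        # a%[0] to a%[3]
    0x4553, 0x2047, 0x203A, 0x2098,        # a%[4] to a%[7]
    0x1C20, FLAG_ADDRESS, 0x112C,          # a%[8] to a%[10]
    0x903A, 0x0000,                        # a%[11], a%[12]
]

# Integer array elements are two bytes each, least significant byte first.
image = struct.pack("<%dH" % len(words), *words)
print(image.hex(" "))
```

Read in order, the bytes are: two zero bytes, spaces, 97 (the "DEF " of the listing's comment), "SEG", a colon, 98 (the "POKE "), the 1C integer token followed by the flag address low byte first, a comma, 11 (the constant 0), a colon, 90 (the "STOP") and the two zero bytes that end the line.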
Visitors to this page are encouraged to copy the information above and make it available to others as long as credit is given for the source. I'm not getting any younger and would hate to have the information lost forever if I should unexpectedly drop dead next week, month, year, decade or century.
I have patched a copy of GW-BASIC to get rid of some bugs and add the new features described and documented in the NewBASIC.txt text file. [Ooops! I forgot to change the permissions for the file after uploading it so it wasn't accessible for a couple of days. It should be accessible now.] The patched BASIC (NBASIC.EXE) is available for download in a zipped file, NBASIC.EXE.zip.
Note: My NBASIC.EXE is totally unrelated to another BASIC interpreter also called "NBASIC", available at: