How C64 BASIC is stored
Updated : 28/02/2023
I decided to put together a tutorial explaining how the Commodore 64 stores BASIC programs (or listings) in its memory.
The need to knows :
How any information is stored in a computer's memory :
All information is stored in binary format: a series of ones and zeros, each of which is called a bit. The Commodore 64 is an 8-bit computer, so each piece of data (or byte) consists of eight ones or zeros.
An 8-bit binary byte ranges from 00000000 to 11111111. The one or zero on the left (first digit) is known as the MSB, or Most Significant Bit, and holds a value of 128. (I will explain why shortly.) The bit on the right (last digit) is known as the LSB, or Least Significant Bit, and holds a value of 1.
The six bits in between hold, from left to right, values of 64, 32, 16, 8, 4 and 2. Each bit is worth double the bit to its right, which is why the MSB is worth 128. So when given a binary string you start at either end, and if a bit is a one you add that particular bit's value and if it's a zero you don't. By doing this you will end up with a value between 0 and 255. For example, the binary value "11111111" = 255, as you add 128, 64, 32, 16, 8, 4, 2 and 1, and the binary value "00000000" = 0, as the bits are all zeros so no bit values are added. Another example might be "01010101", which equals 85. In this case the MSB (the bit on the left) is a zero so you skip it; the next bit along is a one and has a value of 64, so that is included. Once again the next bit is a zero so it is skipped, but the following bit is a one with a value of 16, so it is added to the 64. This continues until the LSB (the bit on the right) is reached, which is also a one, with a value of 1. You end up with 64+16+4+1, which equals 85.
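The adding-up method described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything the C64 itself runs; the names `BIT_VALUES` and `binary_to_decimal` are made up for this example.

```python
# Place values of the eight bits, from the MSB (left) to the LSB (right).
BIT_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def binary_to_decimal(bits):
    """Add up the place value of every bit that is a '1'."""
    if len(bits) != 8 or any(bit not in "01" for bit in bits):
        raise ValueError("expected a string of exactly eight 0s and 1s")
    return sum(value for bit, value in zip(bits, BIT_VALUES) if bit == "1")


print(binary_to_decimal("11111111"))  # 255
print(binary_to_decimal("00000000"))  # 0
print(binary_to_decimal("01010101"))  # 85 (64 + 16 + 4 + 1)
```

Python's built-in `int("01010101", 2)` gives the same answer; the long-hand version is shown only because it mirrors the steps in the text.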
I intend to do a much more in-depth explanation of this in another tutorial in the future, but I bring it up here as it may help with how we work with data stored on a computer.
How we work with data stored on a computer :
Because working with binary strings is near impossible for most people when it comes to operations such as addition, subtraction and multiplication, we often see a string's value as a decimal (0-255), but we will see it more often represented as a hexadecimal value (00-FF). This is a base 16 numeric system where 00 to 09 equal 0 to 9 in decimal, but once we go past 09 we have 0A, 0B, 0C, 0D, 0E and 0F. 0A = 10 in decimal, 0B = 11 in decimal, through to 0F, which equals 15 in decimal. Then we have the hex value of 10, which equals 16 in decimal...
While this may seem a little unnecessary, there is a good reason for it: each hexadecimal digit represents exactly four bits. The first digit of the hex value represents the first four (more significant) bits of the binary string and the second digit represents the last four (less significant) bits, so any byte fits in exactly two hex digits. For example, 01010101 splits into 0101 and 0101, which is 55 in hex.
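To see how the two hex digits line up with the two groups of four bits, here is a small Python sketch. The function name `split_nibbles` is just an illustrative choice for this example.

```python
def split_nibbles(value):
    """Return the upper and lower four bits of a byte as two numbers (0-15)."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be between 0 and 255")
    high = value >> 4    # the four more significant bits
    low = value & 0x0F   # the four less significant bits
    return high, low


for value in (0x0A, 0x10, 0x55, 0xFF):
    high, low = split_nibbles(value)
    # Show the byte in hex, in decimal, and as two groups of four bits.
    print(f"${value:02X} = {value:3d} = {high:04b} {low:04b}")
```

Each group of four bits is a number from 0 to 15, which is exactly the range one hex digit (0 to F) can hold.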
Anyway, while explaining how C64 BASIC is stored in memory I will first show hexadecimal values but also include decimal values in brackets.
BASIC memory space :
While memory space is most definitely a topic for a separate tutorial, the Commodore 64 has a 16-bit address bus, giving memory and ROM (Read Only Memory) address values between $0000-$FFFF (0-65535). This range holds everything that is going on within the computer, from the BASIC V2 ROM at $A000-$BFFF (40960-49151) and the VIC-II graphics registers at $D000-$D3FF (53248-54271) to the SID audio registers at $D400-$D7FF (54272-55295).
By default (it can be moved) the BASIC memory space is located at $0800-$9FFF (2048-40959). This is the area of the Commodore 64's memory where BASIC programs are stored, so it is the area we will be looking at.
Note: The Commodore 128's BASIC V7 program storage starts at $1C01 (7169).
Little Endian :
Another thing to know when it comes to memory space is the way the Commodore 64 stores memory address values. Although a memory address is a 16-bit value, the Commodore stores it as two separate 8-bit values in what is called the "Little Endian" format. This basically means that the least significant byte is stored before the most significant byte. For the address $C000 (49152), for example, the Commodore will store two bytes, the first being $00 and the second being $C0.
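The byte order can be tried out with a short Python sketch. The helper names `to_little_endian` and `from_little_endian` are made up for this example.

```python
def to_little_endian(address):
    """Split a 16-bit address into (low byte, high byte), the order it is stored in."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return address & 0xFF, address >> 8


def from_little_endian(low, high):
    """Rebuild the 16-bit address from the two stored bytes."""
    return low + high * 256


low, high = to_little_endian(0xC000)
print(f"${low:02X} ${high:02X}")               # $00 $C0
print(f"${from_little_endian(0x00, 0xC0):04X}")  # $C000
```

Python's own `(0xC000).to_bytes(2, "little")` produces the same two bytes, `00` then `C0`.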
How BASIC is stored :
The line header, line number and the end of line :
OK, now we can get to how the Commodore 64 actually stores a BASIC program in its memory. For some reason I'm not entirely sure of, the first byte stored at $0800 (2048) must be $00 (0). Without this, BASIC programs won't run, so the BASIC program itself starts at memory location $0801 (2049). After this, each line of BASIC is stored using a minimum of six bytes.
The first two bytes are the memory address (in Little Endian) of the following line. So if the first line, starting at $0801 (2049), is the minimum of 6 bytes long, then this value would be $0801+$06 (2049+6), so $0807 (2055). Being in Little Endian format, this would be stored as $07 and $08.
The third and fourth bytes make up the line number. The third byte is multiplied by 1 and the fourth byte is multiplied by 256, and the two results are added to make up the actual line number. For example, a line number of 5 will have the values $05 (5) and $00 (0). $05*1 = 5 and $00*256 = 0; add these together and you get 5. For a line number of 500 the values will be $F4 (244) and $01 (1), where $F4 (244)*1 = 244 and $01 (1)*256 = 256; these added together = 500.
From the fifth byte onwards is the data that makes up the BASIC line; we will get to this in a minute.
The very last byte is always $00 (0), which signifies the end of the line.
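As a rough sketch of the layout just described, the Python below pulls one stored line apart into its three pieces. The function name `parse_line` is invented for this example, and the sample bytes are the ones for line 30 in the HEX dump further down this page.

```python
def parse_line(memory, offset=0):
    """Split one stored BASIC line into (next-line address, line number, data bytes)."""
    # Bytes 1 and 2: address of the following line, low byte first.
    next_address = memory[offset] + memory[offset + 1] * 256
    # Bytes 3 and 4: the line number, low byte first.
    line_number = memory[offset + 2] + memory[offset + 3] * 256
    # Byte 5 onwards: the line's data, up to (not including) the $00 end marker.
    # The search starts after the header, because header bytes can be $00 too.
    end = memory.index(0x00, offset + 4)
    data = memory[offset + 4:end]
    return next_address, line_number, data


line_30 = bytes.fromhex("3A 08 1E 00 89 20 32 30 00")
next_address, line_number, data = parse_line(line_30)
print(f"next line at ${next_address:04X}")  # next line at $083A
print(f"line number {line_number}")         # line number 30
print(data.hex(" "))                        # 89 20 32 30
```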
The data that makes up each line :
The data that makes up each line can be broken into two categories. The first is simple PETSCII text that is used by the commands: for example, the numbers used in mathematical calculations and GOTO statements, and the letters, numbers and graphical characters used by the PRINT statement.
The other category is the commands themselves. For example, the PRINT command isn't stored as five individual PETSCII characters; it is stored as a single byte, $99 (153). Click here for a list of the BASIC commands and their respective values. (Link to be created)
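To illustrate the two categories, here is a minimal Python sketch that turns a line's data bytes back into text. The `TOKENS` table only contains the command values that appear in this tutorial's HEX dump, not the full BASIC V2 list, and the sketch assumes the plain text bytes can be read as ASCII, which holds for the uppercase letters, digits and punctuation used here but not for PETSCII graphics characters. The name `detokenize` is made up for this example.

```python
# Partial table: only the command bytes used in this tutorial's example program.
TOKENS = {
    0x81: "FOR", 0x82: "NEXT", 0x89: "GOTO", 0x8F: "REM", 0x97: "POKE",
    0x99: "PRINT", 0xA4: "TO", 0xB2: "=", 0xC7: "CHR$",
}


def detokenize(data):
    """Expand command bytes to keywords and copy plain text bytes through."""
    text = []
    in_quotes = False
    for byte in data:
        if byte == 0x22:  # a double quote switches "inside a string" on or off
            in_quotes = not in_quotes
        if byte >= 0x80 and not in_quotes:
            # A command byte; unknown ones are shown as <$XX>.
            text.append(TOKENS.get(byte, f"<${byte:02X}>"))
        else:
            text.append(chr(byte))  # assumption: plain text treated as ASCII
    return "".join(text)


line_10 = bytes.fromhex("99 C7 28 31 34 37 29 22 48 45 4C 4C 4F 20 57 4F 52 4C 44 22")
print(detokenize(line_10))  # PRINTCHR$(147)"HELLO WORLD"
```

The quote check is there because bytes inside a quoted string are always text, never commands.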
How the program finishes :
The last line of the BASIC program starts the same as any other, with the address of what would be the next line at the beginning, but at that address the values will be $00 (0) and $00 (0). As there is no line to follow, the double $00 (0) bytes tell the computer this is the end of the listing.
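Following the next-line addresses until the double $00 is reached can be sketched in Python as below. The two-line program in `PROGRAM` (10 PRINT, 20 GOTO10) is a hypothetical example hand-built for this sketch, not the program from this page, and `walk_lines` is an invented name.

```python
LOAD_ADDRESS = 0x0801  # where the first line of the program sits in memory

# Hypothetical two-line program: 10 PRINT / 20 GOTO10, then the end marker.
PROGRAM = bytes.fromhex(
    "07 08 0A 00 99 00"        # $0801: next line at $0807, line 10, PRINT
    "0F 08 14 00 89 31 30 00"  # $0807: next line at $080F, line 20, GOTO 1 0
    "00 00"                    # $080F: next-line address of 0 = end of listing
)


def walk_lines(memory, load_address=LOAD_ADDRESS):
    """Follow the next-line addresses, returning (address, line number) pairs."""
    lines = []
    address = load_address
    while True:
        offset = address - load_address
        next_address = memory[offset] + memory[offset + 1] * 256
        if next_address == 0:  # the double $00: no more lines
            break
        line_number = memory[offset + 2] + memory[offset + 3] * 256
        lines.append((address, line_number))
        address = next_address  # jump straight to the following line
    return lines


for address, line_number in walk_lines(PROGRAM):
    print(f"${address:04X}: line {line_number}")
```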
Below is a simple C64 BASIC program along with a HEX dump of the program :
Here is a breakdown of the HEX dump starting at memory address $0801 (2049) :
$0801-$0802:1A 08 : The starting address for the following line ($081A)
$0803-$0804:0A 00 : Line number 0A (10 in decimal) + (0x256=0) = 10
$0805-$0819:99 C7 28 31 34 37 29 22 48 45 4C 4C 4F 20 57 4F 52 4C 44 22 00 : HEX data for line 10
99 : HEX code for the 'PRINT' command
C7 : HEX code for the 'CHR$' command
28 : Ascii HEX value for open bracket
31,34,37 : Ascii HEX values for '1','4' and '7'
29 : Ascii HEX value for close bracket
22 : Ascii HEX value for double quote
48.....44 : Ascii HEX values for 'HELLO WORLD'
22 : Ascii HEX value for double quote
00 : End of line
$081A-$081B:31 08 : The starting address for the following line ($0831)
$081C-$081D:14 00 : Line number 14 (20 in decimal) + (0x256=0) = 20
$081E-$0830:81 41 B2 30 A4 31 36 3A 97 35 33 32 38 30 2C 41 3A 82 00 : HEX data for line 20
81 : HEX code for the 'FOR' command
41 : Ascii HEX values for 'A'
B2 : HEX code for the '=' operator
30 : Ascii HEX values for '0'
A4 : HEX code for the 'TO' command
31,36 : Ascii HEX values for '1' and '6'
3A : Ascii HEX values for ':'
97 : HEX code for the 'POKE' command
35.....30 : Ascii HEX values for '5', '3', '2', '8' and '0'
2C : Ascii HEX values for ','
41 : Ascii HEX values for 'A'
3A : Ascii HEX values for ':'
82 : HEX code for the 'NEXT' command
00 : End of line
$0831-$0832:3A 08 : The starting address for the following line ($083A)
$0833-$0834:1E 00 : Line number 1E (30 in decimal)+(0x256=0) = 30
$0835-$0839:89 20 32 30 00 : HEX data for line 30
89 : HEX code for the 'GOTO' command
20 : Ascii HEX values for ' ' - (Please see note 1)
32 : Ascii HEX values for '2'
30 : Ascii HEX values for '0'
00 : End of line
$083A-$083B:5F 08 : The starting address for the following line ($085F)
$083C-$083D:E8 03 : Line number E8 (232 in decimal) + 03 ((3 in decimal)x256=768) 232+768 = 1000
$083E-$085E:8F 20 45 58 41 4D 50 4C 45 20 4F 46 20 41 20 4C 41
52 47 45 20 4C 49 4E 45 20 4E 55 4D 42 45 52 00 : HEX data for line 1000
8F : HEX code for the 'REM' command
20.....52 : Ascii HEX values for the remark message
00 : End of line
$085F-$0860:00 00 : The starting address for the following line = 0 so the program has finished.
Note 1 - Although having spaces in your program makes it a lot easier to read it isn't necessary and wastes program space and CPU cycles.
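The next-line addresses in the breakdown above can be cross-checked by counting the bytes in each line. The Python sketch below does this for the dump as listed, assuming it is loaded at $0801; `DUMP` is simply the bytes from the breakdown typed back in, and `check_links` is a name made up for this example.

```python
LOAD_ADDRESS = 0x0801

# The bytes from the breakdown above, one source line per string.
DUMP = bytes.fromhex(
    "1A 08 0A 00 99 C7 28 31 34 37 29 22 48 45 4C 4C 4F 20 57 4F 52 4C 44 22 00"
    "31 08 14 00 81 41 B2 30 A4 31 36 3A 97 35 33 32 38 30 2C 41 3A 82 00"
    "3A 08 1E 00 89 20 32 30 00"
    "5F 08 E8 03 8F 20 45 58 41 4D 50 4C 45 20 4F 46 20 41 20 4C 41"
    "52 47 45 20 4C 49 4E 45 20 4E 55 4D 42 45 52 00"
    "00 00"
)


def check_links(memory, load_address=LOAD_ADDRESS):
    """For each line, return (line number, stored next address, counted next address)."""
    results = []
    offset = 0
    while True:
        stored = memory[offset] + memory[offset + 1] * 256
        if stored == 0:  # the double $00 end-of-program marker
            break
        line_number = memory[offset + 2] + memory[offset + 3] * 256
        end = memory.index(0x00, offset + 4)  # this line's $00 end marker
        counted = load_address + end + 1      # the byte right after that marker
        results.append((line_number, stored, counted))
        offset = end + 1
    return results


for line_number, stored, counted in check_links(DUMP):
    status = "OK" if stored == counted else "MISMATCH"
    print(f"line {line_number:4d}: stored ${stored:04X}, counted ${counted:04X} {status}")
```

Counting bytes like this only works because none of the lines in this program contain a $00 inside their data; it is meant as a sanity check of the dump, not as how the C64 finds the next line.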
If you have any questions etc about this project, any project on this site or any ideas about any future projects feel free to pop by on my Discord server, say hi and fire away. Copy the following link text into your browser to join my Discord server : "https://discord.gg/SzFh5j3sw5"