Wednesday, October 26, 2022

Part 2f: Going further down the rabbit hole

The adrenaline had kicked in. Those Blue Pills were working well, and even the small amount of progress I had made so far encouraged me to do a bit more. At times I was up until 5am disassembling code.

I now needed to work out where most of the BASIC subroutines were located in memory.

Just after the list of keywords, I found what looked like a table of pointers. Here it is:

;********************************************************************
; pointer table for the immediate BASIC functions; comment shows token and 'Yes' if pointer is confirmed
0234 : 06 2E ENDptr: dw END ;80 Yes
0236 : 05 65 FORptr: dw FOR ;81 Yes
0238 : 09 B3 NEXTptr: dw NEXT ;82 Yes
023A : 07 06 DATAptr: dw DATA ;83 Yes
023C : 08 DA INPUTptr: dw INPUT ;84 Yes
023E : 0C 20 DIMptr: dw DIM ;85 Yes
0240 : 09 00 READptr: dw READ ;86 Yes
0242 : 07 91 LETptr: dw LET ;87 Yes
0244 : 06 C9 GOTOptr: dw GOTO ;88 Yes
0246 : 06 A3 RUNptr: dw RUN ;89 Yes
0248 : 07 27 IFptr: dw IF ;8A Yes
024A : 06 12 RESTOREptr: dw RESTORE ;8B Yes
024C : 06 AD GOSUBptr: dw GOSUB ;8C Yes
024E : 06 E0 RETURNptr: dw RETURN ;8D Yes
0250 : 07 3A REMptr: dw REM ;8E Yes
0252 : 06 2D STOPptr: dw STOP ;8F Yes
0254 : 07 47 ONptr: dw ON ;90 Yes
0256 : 06 64 NULLptr: dw NULL ;91 Yes
0258 : 11 9B WAITptr: dw WAIT ;92 Yes
025A : 0E 4F DEFptr: dw DEF ;93 Yes
025C : 11 94 POKEptr: dw POKE ;94 Yes
025E : 07 FA PRINTptr: dw PRINT ;95 Yes
0260 : 06 52 CONTptr: dw CONT ;96 Yes
0262 : 05 14 LISTptr: dw LIST ;97 Yes
0264 : 06 75 CLEARptr: dw CLEAR ;98 Yes
0266 : 04 E2 NEWptr: dw NEW ;99 Yes
;********************************************************************
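To see how a statement token would reach its subroutine through this table, here is a minimal Python sketch. It assumes the interpreter uses the usual Microsoft scheme of (token - $80) * 2 as the offset from the table base at $0234; I haven't traced the dispatch code here, so treat that as an assumption. The addresses are copied from the table above, and `TABLE_BASE`, `POINTERS` and `handler_address` are illustrative names only.

```python
# Sketch: how a statement token could index the pointer table at $0234.
# ASSUMPTION: offset = (token - $80) * 2, one 16-bit word per token.

TABLE_BASE = 0x0234  # address of ENDptr in the listing

# Handler addresses in token order ($80 = END ... $99 = NEW), from the table.
POINTERS = [
    ("END", 0x062E), ("FOR", 0x0565), ("NEXT", 0x09B3), ("DATA", 0x0706),
    ("INPUT", 0x08DA), ("DIM", 0x0C20), ("READ", 0x0900), ("LET", 0x0791),
    ("GOTO", 0x06C9), ("RUN", 0x06A3), ("IF", 0x0727), ("RESTORE", 0x0612),
    ("GOSUB", 0x06AD), ("RETURN", 0x06E0), ("REM", 0x073A), ("STOP", 0x062D),
    ("ON", 0x0747), ("NULL", 0x0664), ("WAIT", 0x119B), ("DEF", 0x0E4F),
    ("POKE", 0x1194), ("PRINT", 0x07FA), ("CONT", 0x0652), ("LIST", 0x0514),
    ("CLEAR", 0x0675), ("NEW", 0x04E2),
]

def build_memory():
    """Lay the table out as bytes, high byte first (the 6800 is big-endian)."""
    mem = bytearray(0x10000)
    for i, (_, addr) in enumerate(POINTERS):
        mem[TABLE_BASE + 2 * i] = addr >> 8
        mem[TABLE_BASE + 2 * i + 1] = addr & 0xFF
    return mem

def handler_address(mem, token):
    """Return (table entry address, handler address) for a statement token."""
    if not 0x80 <= token < 0x80 + len(POINTERS):
        raise ValueError(f"${token:02X} is not a statement token")
    entry = TABLE_BASE + (token - 0x80) * 2
    return entry, (mem[entry] << 8) | mem[entry + 1]

if __name__ == "__main__":
    mem = build_memory()
    for tok in (0x80, 0x91, 0x97, 0x99):
        entry, target = handler_address(mem, tok)
        print(f"token ${tok:02X} -> entry ${entry:04X} -> handler ${target:04X}")
```

For example, token $91 (NULL) goes through entry $0256 to handler $0664, and token $97 (LIST) through entry $0262 to $0514, matching the bytes in the listing.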

Initially I didn't know if this was the entire table, as it seemed a bit short.

I also didn't know what each pointer address did. To check whether the table was genuine, I had to look at the code at each of those addresses and judge whether it plausibly started a subroutine that did something useful. Then I had to guess the purpose of each one. My first assumption was that they were in the same order as the keywords, so I named them accordingly.

But how would I know whether the subroutines did what I assumed they did? Does LIST actually perform a LIST command, or does it do something else? And what happened to all the keywords after NEW, like TAB(, TO, FN, SPC(, THEN, NOT and STEP, that I had found in the keyword table?

After looking at the disassembly at a few of these pointer addresses, I felt they were valid subroutines, but I wasn't sure whether they matched the keywords one for one in the same order. I decided to assume they did, and to edit the list later if I found otherwise. I also put a comment section at each of those subroutines stating what I thought it was for. To mark something as a best guess I added a '?' to the comment, and a lot of those still remain in my disassembly listing.

For example, the following short subroutine was given a title banner. Later on I added some comments as I discovered what certain addresses were used for. In this example, zero-page address $000C is used to hold the terminal width.

;********************************************************************
; perform NULL:
0664 : BD 11 3B  L0664: jsr  L113B  ;LineNumberFromString ?
0667 : 26 FA            bne  L0663  ;exit
0669 : 5C               incb        ;+1 to test if too wide
066A : D1 0C            cmpb X000C  ;is it at TERMINAL_WIDTH
066C : 24 04            bcc  L0672  ;branch if B >= width (too wide)
066E : 5A               decb        ;-1, back to setting specified
066F : D7 0A            stab X000A  ;save #NULLS
0671 : 39               rts
0672 : 7E 0D 4C  L0672: jmp  L0D4C  ;?FC ERROR
;********************************************************************
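To check my reading of the range test, here is a small Python model of just the incb / cmpb / bcc / decb / stab sequence, with 8-bit wraparound. It assumes B holds the requested NULL count on return from L113B and that $000C really is the terminal width, which is my best guess above; `null_setting` and `FCError` are made-up names.

```python
class FCError(Exception):
    """Stands in for the jump to L0D4C (?FC ERROR)."""

def null_setting(b, terminal_width):
    """Model of $0669-$0671: return the value stored at $000A, or raise FCError.

    ASSUMPTION: b is the NULL count in accumulator B, and terminal_width
    is the byte at $000C.
    """
    b = (b + 1) & 0xFF                 # incb (wraps at 8 bits)
    carry = b < terminal_width         # cmpb X000C: C set when B < memory (unsigned)
    if not carry:                      # bcc L0672: taken when B >= width
        raise FCError(f"NULL value too large for width {terminal_width}")
    b = (b - 1) & 0xFF                 # decb: back to the value requested
    return b                           # stab X000A: save #NULLS
```

One thing the model shows: because incb wraps, a B value of $FF would become 0, pass the check and be stored as 255, assuming L113B can return $FF at all.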

I'm not quite at the command prompt yet, nor can I type in any commands or program lines, so that's the next and possibly final goal of this test.

I start by following the code after the 'MITS ALTAIR...' message, and this is where I see that the code is self-modifying, not only in zero page but outside of it too.

19F0 : CE 1A 52 > ldx #$1A52 ;' BYTES FREE / MITS ALTAIR...' message -1
19F3 : BD 08 79 jsr L0879 ;display message
19F6 : CE 08 79 >x ldx #$0879 ;adrs of display msg routine
19F9 : FF 01 13 stx X0113
19FC : BD 04 E4 jsr L04E4 ;
19FF : 86 BD ldaa #$BD ;jsr instruction
1A01 : 97 04 staa X0004 ;replace jmp at cold entry
1A03 : CE 05 00 ldx #$0500 ;point to subroutine ?
1A06 : DF 05 stx X0005 ;replace cold entry start location 
1A08 : 7E 03 3C jmp L033C ;BASIC warm start

At $19F6 the x-register is loaded with a patch address, which happens to be the display message subroutine, and that address is then written over the contents of $0113 and $0114. So the instruction at $0112 changes from JMP $18F9 (INIT) to JMP $0879 (DISPLAY MSG). This is not the only place the code is overwritten: a few instructions later, the JMP at the cold entry point ($0004) is turned into JSR $0500.
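Here is a small Python sketch of what those stores do to memory, remembering that the 6800 stores 16-bit values high byte first. Only the $0112 before/after bytes come from the text; the original jump target at $0005-$0006 isn't shown here, so those bytes are placeholder zeros, and `patch_word` and `decode_jump` are just illustrative helpers.

```python
JMP, JSR = 0x7E, 0xBD   # 6800 extended-mode JMP and JSR opcodes

def patch_word(mem, addr, value):
    """Mimic stx: store a 16-bit value high byte first."""
    mem[addr] = (value >> 8) & 0xFF
    mem[addr + 1] = value & 0xFF

def decode_jump(mem, addr):
    """Show a 3-byte JMP/JSR at addr in readable form."""
    names = {JMP: "JMP", JSR: "JSR"}
    return f"{names.get(mem[addr], '???')} ${(mem[addr + 1] << 8) | mem[addr + 2]:04X}"

mem = bytearray(0x10000)
mem[0x0112:0x0115] = bytes([JMP, 0x18, 0xF9])   # JMP $18F9 (INIT) before patching
mem[0x0004] = JMP                               # cold entry JMP; target bytes are placeholders

patch_word(mem, 0x0113, 0x0879)   # $19F6/$19F9: ldx #$0879 / stx X0113
mem[0x0004] = JSR                 # $19FF/$1A01: ldaa #$BD / staa X0004
patch_word(mem, 0x0005, 0x0500)   # $1A03/$1A06: ldx #$0500 / stx X0005

print(decode_jump(mem, 0x0112))   # JMP $0879
print(decode_jump(mem, 0x0004))   # JSR $0500
```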

After following the code, I finally see where it fetches and displays the OK prompt, which is hidden in a batch of seemingly random ASCII characters. The display message routine is entered at $0879 with the x-register containing the address of the actual ASCII message minus 1.
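The "address minus 1" convention makes sense if the routine increments X before each fetch. Here is a Python sketch of such a loop. It assumes the message ends with a $00 byte, which I haven't confirmed for this BASIC, and it uses a made-up message at a made-up address rather than the real bytes around $0297; `display_message` is a hypothetical name.

```python
def display_message(mem, x):
    """Model of a print-string loop entered with X = message address - 1.

    ASSUMPTION: the string is terminated by a $00 byte.
    """
    out = []
    while True:
        x = (x + 1) & 0xFFFF      # step X to the next character first
        ch = mem[x]               # fetch the character at X
        if ch == 0x00:            # assumed terminator
            return "".join(out)
        out.append(chr(ch))       # "send" it to the terminal

mem = bytearray(0x10000)
msg_addr = 0x2000                           # hypothetical location
mem[msg_addr:msg_addr + 3] = b"OK\x00"
print(display_message(mem, msg_addr - 1))   # OK
```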

;********************************************************************
; Warm start BASIC
;send BASIC startup prompt 'OK'
033C : 7F 01 11 L033C: clr X0111 ;NUL char ?
033F : CE 02 97 ldx #$0297 ;ptr to 'OK' prompt
0342 : BD 01 12 jsr L0112 ;do altered INIT routine, restart ?
;********************************************************************
;get input line from user

0345 : 86 2C L0345: ldaa #$2C ;save ',' at start of line buffer
0347 : 97 0F staa LineBuffer ;to indicate type of input ?
0349 : BD 03 EC jsr L03EC ;get input line from user

And this is where it gets the user input, either a command or a line of BASIC code. The input line is stored in zero page from address $0F upwards. I wasn't sure how long it might be, but looking at zero page initially there is a run of zeros from $0010 to $00BE, so it could be as much as 175 characters. Later I found out that the maximum input line is 72 characters, which could be a bit restrictive, but it is what it is.
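The 175 figure is simply the length of the run of zero bytes: $00BE - $0010 + 1 = 175. To find such runs in a saved zero-page dump, a few lines of Python will do. The dump below is synthetic (zeros from $0010 to $00BE, filler elsewhere), not the real zero-page contents, and `zero_runs` is an illustrative name.

```python
def zero_runs(dump, base=0x0000, min_len=16):
    """Yield (start, end, length) for runs of $00 bytes at least min_len long."""
    start = None
    for i, b in enumerate(list(dump) + [1]):   # non-zero sentinel closes a final run
        if b == 0 and start is None:
            start = i
        elif b != 0 and start is not None:
            if i - start >= min_len:
                yield base + start, base + i - 1, i - start
            start = None

page0 = bytearray([0xAA] * 256)          # synthetic filler bytes
page0[0x10:0xBF] = bytes(0xBF - 0x10)    # zeros from $0010 to $00BE inclusive
for s, e, n in zero_runs(page0):
    print(f"${s:04X}-${e:04X}: {n} bytes")   # $0010-$00BE: 175 bytes
```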

I thought it might be best to add the simplest keyword command subroutines one or two at a time, patching and testing them as I went. The shortest routines would be done first so I could get immediate feedback. Routines such as CLEAR, NEW, NULL and REM seemed to be good candidates.

Eventually, after entering enough of the hex code and patching any addresses that referred to zero-page values or to subroutines that had been moved, I was able to enter a short BASIC program and LIST it, such as the one below:

It doesn't seem like much, but now I am able to enter a program and LIST it. Running it comes later. Notice the leading zeros on the BASIC program lines.

I noticed the leading zeros in the BASIC program listing. I had also seen them when BASIC started up and displayed the number of BYTES FREE. I thought perhaps this was normal for the ALTAIR 680 version of BASIC.

Reaching out...

I had no way of knowing whether these leading zeros were supposed to be there, so I reached out to Mike Douglas, owner of deramp.com, and he confirmed that they were not. It didn't seem to cause a problem for now, so I left that bug to look into later.

It looks like I am almost there, just a bit more...

TOKENISING


Microsoft BASIC doesn't store programs exactly as they appear in a LISTing. If it did, programs would occupy significantly more space. Instead, lines are tokenised as they are entered: each keyword is replaced with a single-byte representation of that keyword. Then, when a program is listed, each token is expanded back into its full character representation.

There is an excellent description of tokenising 4K ALTAIR 8800 BASIC here: http://altairbasic.org/int_ex.htm
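As a rough illustration, here is a simplified Python tokeniser using the 8K ALTAIR 680 statement tokens in the order of the pointer table above ($80 END to $99 NEW), which I'm treating as correct. It is a sketch, not the real crunch routine: it scans left to right, tries keywords in table order, and leaves text inside quotes alone, but it ignores REM/DATA special cases and the remaining tokens (TAB(, TO, FN and so on), whose values aren't confirmed yet. `crunch` and `expand` are hypothetical names.

```python
# Statement tokens in pointer-table order: $80 = END ... $99 = NEW.
KEYWORDS = ["END", "FOR", "NEXT", "DATA", "INPUT", "DIM", "READ", "LET",
            "GOTO", "RUN", "IF", "RESTORE", "GOSUB", "RETURN", "REM", "STOP",
            "ON", "NULL", "WAIT", "DEF", "POKE", "PRINT", "CONT", "LIST",
            "CLEAR", "NEW"]

def match_keyword(text, i):
    """Return (token, length) if a keyword starts at text[i], else None."""
    for n, kw in enumerate(KEYWORDS):
        if text.startswith(kw, i):
            return 0x80 + n, len(kw)
    return None

def crunch(text):
    """Replace keywords with one-byte tokens; quoted text is copied as-is."""
    out = bytearray()
    i, in_quotes = 0, False
    while i < len(text):
        hit = None if in_quotes else match_keyword(text, i)
        if hit:
            token, length = hit
            out.append(token)          # one byte replaces the whole keyword
            i += length
        else:
            if text[i] == '"':
                in_quotes = not in_quotes
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

def expand(data):
    """The reverse, as LIST would do it: tokens back to keyword text."""
    return "".join(KEYWORDS[b - 0x80] if 0x80 <= b < 0x80 + len(KEYWORDS)
                   else chr(b) for b in data)
```

For example, `crunch('PRINT X')` gives the three bytes 95 20 58 instead of seven characters, and `expand` turns them back into `PRINT X`.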

I wrongly assumed that Microsoft used the same tokens in all of its versions of BASIC. Looking at a disassembly listing for SYM 1.1 BASIC, I found these tokens:

LAB_C088 = LAB_C089-1
.byte "EN",("D"+$80) ; $80
.byte "FO",("R"+$80) ; $81
.byte "NEX",("T"+$80) ; $82
.byte "DAT",("A"+$80) ; $83
.byte "INPU",("T"+$80) ; $84
.byte "DI",("M"+$80) ; $85
.byte "REA",("D"+$80) ; $86
.byte "LE",("T"+$80) ; $87
.byte "GOT",("O"+$80) ; $88
.byte "RU",("N"+$80) ; $89
.byte "I",("F"+$80) ; $8A
.byte "RESTOR",("E"+$80); $8B
.byte "GOSU",("B"+$80) ; $8C
.byte "RETUR",("N"+$80) ; $8D
.byte "RE",("M"+$80) ; $8E
.byte "STO",("P"+$80) ; $8F
.byte "O",("N"+$80) ; $90
.byte "NUL",("L"+$80) ; $91
.byte "WAI",("T"+$80) ; $92
.byte "LOA",("D"+$80) ; $93
.byte "SAV",("E"+$80) ; $94
.byte "DE",("F"+$80) ; $95
.byte "POK",("E"+$80) ; $96
.byte "PRIN",("T"+$80) ; $97
.byte "CON",("T"+$80) ; $98
.byte "LIS",("T"+$80) ; $99
.byte "CLEA",("R"+$80) ; $9A
.byte "GE",("T"+$80) ; $9B
.byte "NE",("W"+$80) ; $9C
.byte "TAB",("("+$80) ; $9D
.byte "T",("O"+$80) ; $9E
.byte "F",("N"+$80) ; $9F
.byte "SPC",("("+$80) ; $A0
.byte "THE",("N"+$80) ; $A1
.byte "NO",("T"+$80) ; $A2
.byte "STE",("P"+$80) ; $A3
.byte ("+"+$80) ; $A4
.byte ("-"+$80) ; $A5
.byte ("*"+$80) ; $A6
.byte ("/"+$80) ; $A7
.byte ("^"+$80) ; $A8
.byte "AN",("D"+$80) ; $A9
.byte "O",("R"+$80) ; $AA
.byte (">"+$80) ; $AB
.byte ("="+$80) ; $AC
.byte ("<"+$80) ; $AD
.byte "SG",("N"+$80) ; $AE
.byte "IN",("T"+$80) ; $AF
.byte "AB",("S"+$80) ; $B0
.byte "US",("R"+$80) ; $B1
.byte "FR",("E"+$80) ; $B2
.byte "PO",("S"+$80) ; $B3
.byte "SQ",("R"+$80) ; $B4
.byte "RN",("D"+$80) ; $B5
.byte "LO",("G"+$80) ; $B6
.byte "EX",("P"+$80) ; $B7
.byte "CO",("S"+$80) ; $B8
.byte "SI",("N"+$80) ; $B9
.byte "TA",("N"+$80) ; $BA
.byte "AT",("N"+$80) ; $BB
.byte "PEE",("K"+$80) ; $BC
.byte "LE",("N"+$80) ; $BD
.byte "STR",("$"+$80) ; $BE
.byte "VA",("L"+$80) ; $BF
.byte "AS",("C"+$80) ; $C0
.byte "CHR",("$"+$80) ; $C1
.byte "LEFT",("$"+$80) ; $C2
.byte "RIGHT",("$"+$80) ; $C3
.byte "MID",("$"+$80) ; $C4
.byte "G",("O"+$80) ; $C5
.byte $00

SYM 1.1 BASIC token list.
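In that listing the last character of each keyword has bit 7 set (the ("D"+$80) notation), which marks where one keyword ends and the next begins. Tokens are assigned from $80 in order, and a $00 byte ends the table. A short Python sketch of walking such a table, fed with the first few SYM entries encoded by hand, might look like this; `decode_keyword_table` and `entry` are illustrative names.

```python
def decode_keyword_table(table):
    """Walk a bit-7-terminated keyword table; return {token: keyword}."""
    tokens, current, token = {}, [], 0x80
    for b in table:
        if b == 0x00:                 # $00 ends the table
            break
        current.append(chr(b & 0x7F))
        if b & 0x80:                  # bit 7 set: last character of this keyword
            tokens[token] = "".join(current)
            current, token = [], token + 1
    return tokens

def entry(word):
    """Encode a keyword the way the listing does: last character + $80."""
    return word[:-1].encode("ascii") + bytes([ord(word[-1]) + 0x80])

sym_start = b"".join(entry(w) for w in ["END", "FOR", "NEXT", "DATA"]) + b"\x00"
for tok, kw in decode_keyword_table(sym_start).items():
    print(f"${tok:02X} {kw}")
```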

At first it looked the same as 8K ALTAIR 680 BASIC. Starting from the top, 'END' is #$80, 'FOR' is #$81 and so on, and ALTAIR BASIC matched, that is until you get past 'WAIT' (#$92). The SYM inserts 'LOAD' and 'SAVE' at #$93 and #$94, so everything after them shifts: 'POKE', for example, is #$96 on the SYM but #$94 on 8K ALTAIR 680 BASIC. Oh oh... bugger!!! Cheating never pays.
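Finding where two token tables part company is easy to automate. This Python sketch lists the statement keywords from both tables as they appear in this post (SYM 1.1 up to NEW, and ALTAIR 680 from its pointer table) and reports the first token that differs, plus where a few keywords ended up in each. The list and function names are illustrative.

```python
SYM = ["END", "FOR", "NEXT", "DATA", "INPUT", "DIM", "READ", "LET", "GOTO",
       "RUN", "IF", "RESTORE", "GOSUB", "RETURN", "REM", "STOP", "ON", "NULL",
       "WAIT", "LOAD", "SAVE", "DEF", "POKE", "PRINT", "CONT", "LIST", "CLEAR",
       "GET", "NEW"]
ALTAIR_680 = ["END", "FOR", "NEXT", "DATA", "INPUT", "DIM", "READ", "LET",
              "GOTO", "RUN", "IF", "RESTORE", "GOSUB", "RETURN", "REM", "STOP",
              "ON", "NULL", "WAIT", "DEF", "POKE", "PRINT", "CONT", "LIST",
              "CLEAR", "NEW"]

def token_of(table, word):
    """Token value for a keyword: its position in the table plus $80."""
    return 0x80 + table.index(word)

def first_difference(a, b):
    """Return (token, keyword in a, keyword in b) at the first mismatch."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 0x80 + i, x, y
    return None

tok, sym_kw, alt_kw = first_difference(SYM, ALTAIR_680)
print(f"first difference at ${tok:02X}: SYM {sym_kw}, ALTAIR 680 {alt_kw}")
for kw in ("DEF", "POKE", "LIST", "NEW"):
    print(f"{kw:5} SYM ${token_of(SYM, kw):02X}  ALTAIR 680 ${token_of(ALTAIR_680, kw):02X}")
```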

I now knew that I couldn't rely fully on other BASIC disassembly listings, and that I'd have to test every single token somehow. But how...

Well, it was simpler than I first thought. All I had to do was enter some program lines, then look at how they were stored in memory by reading back the hexadecimal tokens from a memory dump.
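Reading a dump by eye gets tedious, so a Python helper like this could annotate bytes copied out of a memory dump. Printable bytes are shown as characters, anything from $80 up is looked up in a token table, and unknown tokens are flagged. The table holds only the statement tokens from the pointer table; the example bytes are made up, and because I'm not assuming anything about how line links or line numbers are stored, only statement text is decoded. `annotate` is a hypothetical name.

```python
TOKENS = {0x80 + i: kw for i, kw in enumerate(
    ["END", "FOR", "NEXT", "DATA", "INPUT", "DIM", "READ", "LET", "GOTO",
     "RUN", "IF", "RESTORE", "GOSUB", "RETURN", "REM", "STOP", "ON", "NULL",
     "WAIT", "DEF", "POKE", "PRINT", "CONT", "LIST", "CLEAR", "NEW"])}

def annotate(hex_text):
    """Turn 'XX XX ...' copied from a memory dump into readable pieces."""
    pieces = []
    for byte in bytes.fromhex(hex_text):
        if byte >= 0x80:
            pieces.append(TOKENS.get(byte, f"<${byte:02X}?>"))   # unconfirmed token
        elif 0x20 <= byte < 0x7F:
            pieces.append(chr(byte))                            # printable ASCII
        else:
            pieces.append(f"<${byte:02X}>")                     # control byte
    return pieces

print(annotate("95 20 22 48 49 22 00"))
```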

But how do I see the memory dump...

BREAKING into the Fantom II monitor and beyond...

Back in Tiny BASIC there was a 'BYE' command that let the user drop into the monitor, and to return to BASIC you simply typed 'B' <Enter> and were back where you left off, with the memory contents unaltered. Unfortunately, 8K ALTAIR 680 BASIC has no equivalent command for the ALTAIR PROM monitor, so I had to create a way to drop into the monitor and return. Would I have to create a new command? It seemed a bit early to be adding commands when I didn't even have BASIC running yet.

The method I came up with was to place a breakpoint at a BASIC subroutine that I could call from the BASIC command line. This had to be done before starting ALTAIR 680 BASIC with the MON> G 0 command sequence. You can set up to four breakpoints in the Fantom II monitor, and as long as you have one, you can always alter them or add more once you drop into the monitor. I decided to place a breakpoint at $0664, the NULL command, since I didn't think I would need NULL very often.

MON> H 664
MON> 0664 FFFF FFFF FFFF

Typing H 664 adds a breakpoint at the NULL subroutine. Then typing <ctrl>H lists the (maximum) 4 breakpoints.

This lets me type 'NULL 0 <Enter>' in BASIC, at which point BASIC drops into the Fantom II monitor. I can do almost whatever I want while in the monitor, such as listing memory or modifying code. When I'm done, I return to BASIC by continuing the NULL subroutine from the breakpoint with the G 664 command. The Fantom II restores the registers to the values saved when the breakpoint was hit and then continues execution, which in this case finishes the NULL command and returns to the BASIC command line. Now I have a way to figure out the tokens used in this version of BASIC.
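A common way for a 6800 monitor to implement a breakpoint is to save the opcode at the address and replace it with SWI ($3F), restoring it when the breakpoint is cleared. I haven't checked that the Fantom II works exactly this way, so treat this Python model as an illustration only: four slots with $FFFF meaning empty (as in the <ctrl>H listing above), a hypothetical `Monitor` class, and a `listing` method that mimics that display.

```python
SWI = 0x3F        # 6800 software interrupt opcode
EMPTY = 0xFFFF    # shown for unused slots in the <ctrl>H listing

class Monitor:
    """Toy model of a 4-slot breakpoint table.

    ASSUMPTION: breakpoints are planted as SWI opcodes; not confirmed
    for the Fantom II.
    """

    def __init__(self, mem):
        self.mem = mem
        self.slots = [EMPTY] * 4
        self.saved = {}                  # address -> original opcode

    def set_breakpoint(self, addr):      # like 'H 664'
        if addr in self.slots:
            return
        slot = self.slots.index(EMPTY)   # ValueError if all four are in use
        self.slots[slot] = addr
        self.saved[addr] = self.mem[addr]
        self.mem[addr] = SWI

    def clear_breakpoint(self, addr):
        self.mem[addr] = self.saved.pop(addr)
        self.slots[self.slots.index(addr)] = EMPTY

    def listing(self):                   # like <ctrl>H
        return " ".join(f"{a:04X}" for a in self.slots)

mem = bytearray(0x10000)
mem[0x0664:0x0667] = bytes([0xBD, 0x11, 0x3B])   # jsr L113B, start of NULL
mon = Monitor(mem)
mon.set_breakpoint(0x0664)
print(mon.listing())          # 0664 FFFF FFFF FFFF
print(f"{mem[0x0664]:02X}")   # 3F while the breakpoint is set
```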

Next: I confirm most of the tokens, and my PCBs arrive...





