Quick update on progress before I take a couple of weeks' break on holiday.
The figures below are comparisons against the same Falcon with NVDI installed but the accelerator card disabled. I have to have NVDI at the moment as I've not developed Maprom to replace the blitter yet. This is at least a fair comparison of what it's doing.
I've just been tweaking and paring down firmware. I've removed my counter routine (that counted the processor 'S' states) and replaced where it was used with cascading flip flops as delay lines. It's reduced my macrocell footprint, but I'm running out of line routeing space. I've had to jumper some signals to other parts of the CPLD. I may be running up against the limits of the old Max3000A.
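For anyone unfamiliar with the idea, here is a minimal behavioural sketch in Python of a delay line built from cascaded D flip flops. It is not the actual CPLD firmware; the class name, the stage count and the input trace are all made up for illustration. It only shows the principle that each clock edge shifts the signal one stage along, so no counter or comparator is needed to time a fixed delay.

```python
class DelayLine:
    """Behavioural model of N cascaded D flip flops (hypothetical, not the real firmware)."""

    def __init__(self, stages):
        # One stored bit per flip flop, all cleared at reset.
        self.q = [0] * stages

    def clock(self, d):
        # On each clock edge, stage 0 samples the input and every
        # other stage samples the output of the stage before it.
        self.q = [d] + self.q[:-1]
        # The output of the chain is the last flip flop.
        return self.q[-1]


def delayed_trace(stages, inputs):
    """Return the chain output after each clock edge for a list of input bits."""
    line = DelayLine(stages)
    return [line.clock(bit) for bit in inputs]


# A single pulse sampled on the first edge reaches the output on the third edge.
print(delayed_trace(3, [1, 0, 0, 0, 0, 0]))  # [0, 0, 1, 0, 0, 0]
```

In this model the delay is set purely by the number of stages, which is roughly why such a chain can be cheaper in macrocells than a counter plus compare logic, at the cost of one flip flop per cycle of delay.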
Anyway, by holding off returning to full speed after a mainboard access for a couple of cycles, I've avoided the (slow) switch back into 16MHz mode in between the consecutive word accesses of a longword retrieval. This has speeded up RAM access from 89% of normal to 103% of normal (presumably the extra 3% is when it can jump back to 50MHz between longword retrievals).
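The effect of the hold-off can be illustrated with a toy model in Python. This is only a sketch under assumed conditions: the trace format ('M' for a mainboard access cycle, '.' for anything else), the function name and the example trace are hypothetical, and real bus timing is more involved. It just counts how often the clock would change speed with and without a hold-off.

```python
def count_speed_switches(bus_trace, hold_off):
    """Count fast/slow clock switches over a per-cycle trace.

    bus_trace: string, 'M' = mainboard access cycle, '.' = any other cycle.
    hold_off:  extra cycles to stay in slow mode after the last 'M'.
    """
    switches = 0
    slow = False      # start at full speed
    remaining = 0     # hold-off cycles left before returning to full speed
    for cycle in bus_trace:
        if cycle == "M":
            want_slow = True
            remaining = hold_off   # restart the hold-off on every access
        elif remaining > 0:
            want_slow = True
            remaining -= 1
        else:
            want_slow = False
        if want_slow != slow:
            switches += 1
            slow = want_slow
    return switches


# Two word accesses of one longword, separated by a single non-access cycle.
longword = "..M.M..."
print(count_speed_switches(longword, hold_off=0))  # 4
print(count_speed_switches(longword, hold_off=2))  # 2
```

With no hold-off the model drops to slow, returns to fast, then has to drop to slow again for the second word. With a two cycle hold-off it stays slow across the gap, so the expensive switch happens once per longword instead of once per word.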
Also I've tweaked the altram DSACK signal timing and got the altram test down to just under 12 seconds (speed boost of 157% over the TT reference figure). Not bad for 16 bit. When using maprom to put TOS into altram, that yields an impressive 356% figure for ROM access.
Unfortunately, with only 1MB altram available at the moment I have to choose to either use maprom to get quick ROM routines, or reserve the whole 1MB for the OS. The latter means I can run Gembench from altram.
With Maprom, running Gembench from ST-RAM:
IMG_3860.jpeg
Without Maprom, Gembench run from AltRAM:
IMG_3859.jpeg
I want to consolidate the bodges I've had to apply to this board into a new revision, but before I do that I want to decide on the next steps to experiment with, so I can build in support.
The obvious ones are:-
* Try a clock multiplier rather than a crystal to simplify the switching. I've a couple on order to experiment with. I'll probably just bodge them into the crystal socket, so that's not a biggie.
* Move away from the obsolete Max3000A CPLD. That will probably mean going to 3.3V, so I'll need some bulk buffering. Stephen uses 8 bit SN74CB3T3245s for this, which look nice, but level shifting two lots of 32 bit lines would need eight of them, plus the control signals. Does anyone have any recommendations for similar, directionally auto-sensing level translators?
* More memory. This was the main aim of this project. Memory, and lots of it. DRAM is the only real option here. This will again need 3.3V infrastructure and probably a bigger CPLD.
* 32 bit memory. I kept the 16 bit bus for simplicity in this early design. This is the second revision, but the first that's worked. It's time to get the benefit of those opened up lines.
* Onboard ROM. I'm thinking of putting a flash chip on the board for quick-access and reprogrammable ROM. Does anyone have any hints or tips on doing in-circuit external programming of a flash chip? Else I might try to fit a TSSOP48 ZIF socket on here for now.
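As a quick sanity check on the level translator count in the CPLD item above, the arithmetic can be written out in Python. The function name and the control signal count are hypothetical; only the two 32 bit buses and the 8 bit width of the SN74CB3T3245 come from the list.

```python
import math


def translators_needed(bus_widths, bits_per_chip=8, control_signals=0):
    """Number of fixed-width translator chips needed for the given signals."""
    total_bits = sum(bus_widths) + control_signals
    # Any partly used chip still counts as a whole chip.
    return math.ceil(total_bits / bits_per_chip)


# Two lots of 32 bit lines on 8 bit parts.
print(translators_needed([32, 32]))  # 8

# Same again with an assumed six control signals (illustrative number only).
print(translators_needed([32, 32], control_signals=6))  # 9
```

So the two buses alone fill eight parts exactly, and any control signals that also need translating push it to at least one more.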
Otherwise, whilst mulling those over, I can try experimenting again with the 'exxos' delayed DTACK clock domain technique on this board. I've also a *significant* issue I need to investigate: the DSP. Anything that tries to use the DSP hangs. I'd kind of overlooked it in development as I don't really use the DSP much, but it's one of the big Falcon selling points so I ought to spend a bit of time learning how it is talked to and see if there's anything to do about it. My FPU's been off the board for a while (long story going back to 1996...) and I suspect that'll be a similar situation, as I simply haven't thought about either of them. I'll have to try to lay hands on a minimalistic program for the DSP to help me debug. 'Hello World' for 56K, I suppose.
Any thoughts or recommendations on any of the above brain dump appreciated :-)
Cheers,
BW.