(Atari) ST-RAM boost WIP

68030 + SDRAM + IDE

Moderators: terriblefire, Terriblefire Moderator

Badwolf
Posts: 2228
Joined: Tue Nov 19, 2019 12:09 pm

Re: (Atari) ST-RAM boost WIP

Post by Badwolf »

Didn't we establish that the address lines aren't driven when ACSI and the Floppy are doing their thing & it's consequently impossible to snoop?

Snooping the blitter is possible, however.

BW
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
DSTB1 Open source 16Mhz 68k and AltRAM accelerator for the ST
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark
ijor
Posts: 428
Joined: Fri Nov 30, 2018 8:45 pm

Re: (Atari) ST-RAM boost WIP

Post by ijor »

Badwolf wrote: Tue Apr 19, 2022 11:14 pm Didn't we establish that the address lines aren't driven when ACSI and the Floppy are doing their thing & it's consequently impossible to snoop?
That's why I said in my previous message that you have to snoop writes to the DMA address registers.
exxos wrote: Tue Apr 19, 2022 10:57 pm Time is never on anyone's side, unfortunately.
Yep. Very true!
http://github.com/ijor/fx68k 68000 cycle exact FPGA core
FX CAST Cycle Accurate Atari ST core
http://pasti.fxatari.com
Badwolf
Posts: 2228
Joined: Tue Nov 19, 2019 12:09 pm

Re: (Atari) ST-RAM boost WIP

Post by Badwolf »

ijor wrote: Tue Apr 19, 2022 11:17 pm
Badwolf wrote: Tue Apr 19, 2022 11:14 pm Didn't we establish that the address lines aren't driven when ACSI and the Floppy are doing their thing & it's consequently impossible to snoop?
That's why I said in my previous message that you have to snoop writes to the DMA address registers.
Oh, lawks! Then try to infer where data is being written from that? You’re a better man than I to attempt that.

I think I might — if I were to attempt an ST RAM booster — which I’m definitely not — come at it from the other direction. Have an SRAM-backed DRAM emulator with an independent high-speed bus to the CPU. Serving two masters, as it were.

Probably look to work on the STE first, emulating a pair of SIMMs with a flat flex off to a CPU board.

But that’s probably because I’m better at circuit boards than glue logic :lol:

BW
agranlund
Posts: 777
Joined: Sun Aug 18, 2019 10:43 pm
Location: Sweden

Re: (Atari) ST-RAM boost WIP

Post by agranlund »

ijor wrote: Tue Apr 19, 2022 11:17 pm
Badwolf wrote: Tue Apr 19, 2022 11:14 pm Didn't we establish that the address lines aren't driven when ACSI and the Floppy are doing their thing & it's consequently impossible to snoop?
That's why I said in my previous message that you have to snoop writes to the DMA address registers.
exxos wrote: Tue Apr 19, 2022 10:57 pm Time is never on anyone's side, unfortunately.
Yep. Very true!
Yeah that experiment was abandoned back in 2020 due to lack of time.
It worked well on my setup but getting it to a point where it would work for everyone felt like much too big of a hassle.
Just enabling L1 cache for ST-RAM gives a significant speed boost so that's what I went with in the end.
Not as fast but still very nice and a lot less complicated.

Snooping register access sounds like maybe it could work to some degree.
Or snoop instructions to detect when software invalidates the cache and act on that, assuming you are able to invalidate the shadow.
That simple proof-of-concept I experimented with couldn't do any such fancy things :)
Oh and that also assumes TOS is actually cache aware and issues cache flushes... EmuTOS and PAK-TOS3 are, but I think people are still insisting on trying to have a good experience with that unholy combination of TOS206 + 68030 + TT-RAM :lol:


Here's another very different idea for the same problem.
I take it the main target is to run games fast.. games that do not work if you use the prgflags to put them into altram, because their framebuffer ends up in there too.

One could make a game launcher that traps the game's access to the shifter registers.
It would have a pool of a few pre-allocated framebuffers in st-ram.
When the game attempts to set the screen address it'll trigger an mmu exception. If that address has not already been mapped to the st-ram pool, we grab a screen from the pool and set up a logical->physical mapping towards it. The value actually written to the shifter register is the address of the st-ram buffer, and then we exit the exception handler, letting the game continue along its merry way..
It'll have to copy the contents on first access too, I suppose.

I'm sure there's a lot of edge cases that would make this not work 100% for everything.
The obvious one is not knowing the upper 8 bits of where the game thinks it has the framebuffer.
You'd have to guess, but it's probably almost always going to be 0x01 unless you've somehow used up 16MB fastram already.
To improve the success rate, you could get fancy by checking where you're at when the program launches maybe?

This thing would make anything scrolling just become slower due to constantly having to reuse (and recopy to) screens in the st-ram pool.
As luck would have it, these types of games are very rarely the ones you want to boost into unplayable speeds anyway :)

The ones using two non-moving double buffered screens are usually the ones you do want to speed up, and these are also the ones that may work quite well with a "simple" software solution like that.

I have no idea if this would work well in practice but it feels doable.
(I'm not going to, even though I've been very tempted to try exactly this for quite some time now :lol: )
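
A minimal sketch of the pool bookkeeping described above, written in Python purely as a model of the logic (a real launcher would do this in a 68030 exception handler). The class name, method name, pool addresses and the least-recently-used recycling policy are all assumptions for illustration; the post only says that screens get reused. The guessed upper byte defaults to 0x01 as suggested in the post.

```python
from collections import OrderedDict


class ScreenPool:
    """Hypothetical model of a launcher's pool of ST-RAM framebuffers."""

    def __init__(self, pool_addresses, altram_high_byte=0x01):
        self.free = list(pool_addresses)   # ST-RAM buffers not yet handed out
        self.mapped = OrderedDict()        # logical addr -> ST-RAM addr, oldest first
        self.altram_high_byte = altram_high_byte
        self.copies = 0                    # number of full-screen copies needed

    def on_screen_address_write(self, addr24):
        """Model of the trap taken when the game sets the screen address.

        addr24 is the 24-bit value the game tried to write.
        Returns the ST-RAM address that should be written instead.
        """
        # Only 24 bits are visible in the register write; the top byte is a guess.
        logical = (self.altram_high_byte << 24) | (addr24 & 0xFFFFFF)
        if logical in self.mapped:
            self.mapped.move_to_end(logical)   # mark as most recently used
            return self.mapped[logical]
        if self.free:
            physical = self.free.pop(0)
        else:
            # Pool exhausted: recycle the least recently used buffer (assumed policy).
            _, physical = self.mapped.popitem(last=False)
        self.mapped[logical] = physical
        self.copies += 1                       # first access: contents must be copied over
        return physical
```

With a two-buffer pool, a game flipping between two fixed screens pays for two copies and then runs from the existing mappings, while a scroller that keeps presenting new addresses forces a recycle and a copy every time, which matches the slowdown predicted for scrolling games.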
ijor
Posts: 428
Joined: Fri Nov 30, 2018 8:45 pm

Re: (Atari) ST-RAM boost WIP

Post by ijor »

agranlund wrote: Wed Apr 20, 2022 1:02 am Snooping register access sounds like maybe it could work to some degree.
Mostly out of curiosity, why do you think it would work "only" to some degree?
Or snoop instructions to detect when software invalidates the cache and act on that, assuming you are able to invalidate the shadow.
But if you really shadow the whole ram and snoop bus masters, then you don't care about invalidating the shadow. This is not really cache, it is just the normal ram as far as the CPU is concerned, and ram is always valid.
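
To illustrate why no invalidation is needed in that scheme, here is a small Python model (a sketch only; the class and method names are made up for this example). Every write, whether from the CPU or from a snooped bus master such as the blitter, lands in both copies, so the shadow never goes stale and CPU reads can always be served from it.

```python
class ShadowedRam:
    """Toy model: ST-RAM plus a fast shadow that is kept identical by snooping."""

    def __init__(self, size):
        self.st_ram = bytearray(size)   # the real (slow) ST-RAM
        self.shadow = bytearray(size)   # fast copy on the accelerator

    def cpu_write(self, addr, value):
        # CPU writes go to both, so video and other bus masters see them.
        self.st_ram[addr] = value & 0xFF
        self.shadow[addr] = value & 0xFF

    def bus_master_write(self, addr, value):
        # The bus master writes ST-RAM itself; the accelerator only snoops the cycle.
        self.st_ram[addr] = value & 0xFF
        self.shadow[addr] = value & 0xFF

    def cpu_read(self, addr):
        # Reads never touch the slow bus.
        return self.shadow[addr]
```

The model only holds when every write is visible to the snooper with both address and data, which is exactly what breaks down for ACSI and floppy DMA.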
I take it the main target is to run games fast..
Really ??? Most games won't benefit from any type of acceleration. They are either speed locked to the 200Hz clock, or sometimes to the video frame rate. These games would always run at constant speed. Or there are games that assume a stock 8MHz system and they will run too fast if accelerated. Games that perform 3D rendering, or some kind of slow calculation that would play better accelerated, they are a minority.

I would think that the main goal of accelerators is precisely to run non-games faster. Not just faster, but as fast as possible. But I don't know, maybe I am wrong.
Badwolf wrote: Wed Apr 20, 2022 12:03 am
ijor wrote: Tue Apr 19, 2022 11:17 pm That's why I said in my previous message that you have to snoop writes to the DMA address registers.
Oh, lawks! Then try to infer where data is being written from that? You’re a better man than I to attempt that.
Nah. Having reverse engineered the ST chipset, I am just very familiar with the internals. So I know exactly how this works (or at least I like to think that :) )
agranlund
Posts: 777
Joined: Sun Aug 18, 2019 10:43 pm
Location: Sweden

Re: (Atari) ST-RAM boost WIP

Post by agranlund »

ijor wrote: Wed Apr 20, 2022 2:20 am Mostly out of curiosity, why do you think it would work "only" to some degree?
But if you really shadow the whole ram and snoop bus masters, then you don't care about invalidating the shadow. This is not really cache, it is just the normal ram as far as the CPU is concerned, and ram is always valid.
Agreed. The shadow I did was not really a cache - it was the st-ram as far as the rest of the system knew.

But when the DMA is performing, the TF board has no idea what those writes are so they would not end up in the shadow.
Even if it could be made to detect when DMA is taking place, I'm not sure the A and D buses contain enough to actually update the shadow accordingly?

If not it'll have to either detect that a DMA has taken place and invalidate the whole thing, or snoop for cache instructions and invalidate it accordingly, or something along those lines?

Edit: Hmmn, or are you saying that you'll snoop the registers for relevant info and then copy to shadow after DMA completed.
Or better yet, during DMA assuming the data is passed on the normal data bus. I expect it would be but I don't have schematics on hand.
That does indeed seem like it would work :)

Really ??? Most games won't benefit from any type of acceleration. They are either speed locked to the 200Hz clock, or sometimes to the video frame rate. These games would always run at constant speed. Or there are games that assume a stock 8MHz system and they will run too fast if accelerated.
Games that perform 3D rendering, or some kind of slow calculation that would play better accelerated, they are a minority.
I would think that the main goal of accelerators is precisely to run non-games faster. Not just faster, but as fast as possible. But I don't know, maybe I am wrong.
But non-games already run fine from fastram?

So in practical terms the problem becomes about making things that cannot run from fastram, run faster (I suppose this means pre-TT games)
Either by making ST-RAM access faster, or by overcoming the issue that is preventing said software from being able to run from fastram in the first place.

I do agree that even among games, it's a minority you'll actually need/want to speed up and that makes the whole endeavour more interesting in a kind of "can it be done" type of challenge rather than doing it for the end result :)

This is one reason why I stopped that old experiment. It worked fine under EmuTOS and no DMA devices, but getting it to a point where it would "just work" for everyone's configuration would have been a much larger time investment I wasn't willing to make for such small practical gain.
For sure, the concept of making ST-RAM always much faster is interesting but it does feel a bit like a solution looking for a problem rather than the other way around :)


Edit:
If the main purpose is to speed up TOS's usage of ST-RAM then the most practical solution is to just run a build of EmuTOS that puts most of its internal work ram in fastram to begin with.
The Vampire build does this, but you can just as easily build the Atari version with the option to do the same thing.
(An additional benefit of this type of build is that it's actually more compatible with old stuff that overwrites the TOS ram area, or is expecting it to be no larger than what TOS1.4 used)
exxos
Site Admin
Posts: 23488
Joined: Wed Aug 16, 2017 11:19 pm
Location: UK

Re: (Atari) ST-RAM boost WIP

Post by exxos »

@agranlund how long does it actually take to copy STram to altram ? I assume by "invalidate" you mean copy STram ? Would have to be done any time there is bus activity from a bus master.

I'd assume for the blitter it would be a disaster unless its bus accesses were cloned to altram, which is probably not simple.

For DMA floppy / HDD access, I doubt anyone would care about a one-second delay (or whatever) in refreshing the shadow ram afterwards. Even if the delay gets to something like 10+ seconds, it's still probably not a problem if the game can run faster afterwards.. Games that hammer the floppy drive during play might be an issue :lol: but there are always going to be pros and cons to any upgrade. It is what it is and nothing more.
https://www.exxosforum.co.uk/atari/ All my hardware guides - mods - games - STOS
https://www.exxosforum.co.uk/atari/store2/ - All my hardware mods for sale - Please help support by making a purchase.
viewtopic.php?f=17&t=1585 Have you done the Mandatory Fixes ?
Just because a lot of people agree on something, doesn't make it a fact. ~exxos ~
People should find solutions to problems, not find problems with solutions.
agranlund
Posts: 777
Joined: Sun Aug 18, 2019 10:43 pm
Location: Sweden

Re: (Atari) ST-RAM boost WIP

Post by agranlund »

exxos wrote: Wed Apr 20, 2022 10:48 am @agranlund how long does it actually take to copy STram to altram ? I assume by "invalidate" you mean copy STram ? Would have to be done any time there is bus activity from a bus master.
I really don't know how invalidate would work for a shadow like this, and of that size. Copy from st-ram perhaps like you mentioned? It would probably end up horrible no matter what :)
Even if it was implemented more like a normal cache instead of an always-on shadow, clearing the cache-line valid flags for that size is probably not very fun?

The best is probably if it could be guaranteed to be always valid by aid of snooping as @ijor mentioned.
Badwolf
Posts: 2228
Joined: Tue Nov 19, 2019 12:09 pm

Re: (Atari) ST-RAM boost WIP

Post by Badwolf »

agranlund wrote: Wed Apr 20, 2022 9:12 am Edit: Hmmn, or are you saying that you'll snoop the registers for relevant info and then copy to shadow after DMA completed.
Or better yet, during DMA assuming the data is passed on the normal data bus. I expect it would be but I don't have schematics on hand.
That does indeed seem like it would work :)
The second.

The data lines are driven & can be snooped easily enough but the address lines aren't. Ijor's suggestion is to listen for commands sent to the DMA chip and therefore deduce which addresses are being written to as the data lines are driven.

How to even get that right on paper would drive me mad. Trying to write verilog to actually do it would make me give up computers ;)

BW
ijor
Posts: 428
Joined: Fri Nov 30, 2018 8:45 pm

Re: (Atari) ST-RAM boost WIP

Post by ijor »

agranlund wrote: Wed Apr 20, 2022 9:12 am Edit: Hmmn, or are you saying that you'll snoop the registers for relevant info and then copy to shadow after DMA completed.
Or better yet, during DMA assuming the data is passed on the normal data bus. I expect it would be but I don't have schematics on hand.
Exactly. The data is passed over the main CPU data bus (where else could it be?). The main problem is that the address bus is not driven because there is nobody to drive it. So you have to snoop writes to the DMA addr registers. Yes, you could perform your own writes in parallel, no need to wait for the DMA to complete.

The second problem is that you need to grab the mentioned DMA signal from GLUE (or MMU). I assume you don't have access to that signal in your current hardware, so you might need to solder one wire. Incidentally, this signal is still bonded out on the STE GLUE/MMU combo. This is curious because in the combo it becomes an internal signal and in fact, it is not connected to the board. But what could have been a useless pin makes this concept possible in the STE as well.
But non-games already run fine from fastram?
Not all of them. Well behaved GEM apps are not a problem. But there are many apps that perform their own screen management and can't be run from fastram. How do you know which software can run from fastram and which can not?

Anyway, games or non-games, what is IMHO very important is being transparent to the software. It is always possible to patch games, software, and even TOS. This might be interesting and very useful in itself, but that's a completely different game. What I am considering (maybe dreaming :) ) is something that works purely at the hardware level. No need to patch any software, no need to patch or use a custom TOS. You install the hardware and it just works, seamlessly. There would always be software, mostly games, that would run too fast, but those will probably require disabling the accelerator altogether anyway.
I do agree that even among games, it's a minority you'll actually need/want to speed up and that makes the whole endeavour more interesting in a kind of "can it be done" type of challenge rather than doing it for the end result :)

For sure, the concept of making ST-RAM always much faster is interesting but it does feel a bit like a solution looking for a problem rather than the other way around :)
Yeah, in a way it is like that :). As said, most well behaved apps can simply run from fastram and that's it. But oh well, the whole concept of retro computing can be considered to be without a practical purpose. In the 8-bit realm some people still implement tape based solutions. Is there anything slower, more irritating, and completely useless than loading software from tape??? But the nostalgic effect is invaluable :)

Badwolf wrote: Wed Apr 20, 2022 1:11 pm Ijor's suggestion is to listen for commands sent to the DMA chip and therefore deduce which addresses are being written to as the data lines are driven.
Not to the DMA chip, but to MMU. All the address registers live in MMU because none of the custom chips (besides Blitter) can drive the address bus. So MMU has to address DRAM directly from internal registers.
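
As a rough illustration of the snooping idea, here is a Python model (a sketch, not hardware-accurate; real logic would be Verilog or similar). It assumes the DMA base address bytes are written at $FF8609 (high), $FF860B (mid) and $FF860D (low), as commonly documented for the ST, and that each DMA transfer into RAM is one 16-bit word that advances the counter by 2. The class and method names are hypothetical, and details such as write ordering and the unused low address bit are ignored.

```python
DMA_ADDR_HIGH = 0xFF8609   # assumed register addresses, check against your docs
DMA_ADDR_MID = 0xFF860B
DMA_ADDR_LOW = 0xFF860D


class DmaAddressSnooper:
    """Mirrors the MMU's DMA address counter by watching CPU register writes."""

    def __init__(self, shadow):
        self.shadow = shadow   # bytearray standing in for the shadow RAM
        self.counter = 0       # local copy of the 24-bit DMA address

    def on_cpu_write(self, addr, value):
        """Call for every snooped CPU byte write; unrelated addresses are ignored."""
        value &= 0xFF
        if addr == DMA_ADDR_HIGH:
            self.counter = (self.counter & 0x00FFFF) | (value << 16)
        elif addr == DMA_ADDR_MID:
            self.counter = (self.counter & 0xFF00FF) | (value << 8)
        elif addr == DMA_ADDR_LOW:
            self.counter = (self.counter & 0xFFFF00) | value

    def on_dma_word_to_ram(self, data16):
        """Call for each DMA cycle that writes a word into ST-RAM."""
        self.shadow[self.counter] = (data16 >> 8) & 0xFF   # 68000 is big-endian
        self.shadow[self.counter + 1] = data16 & 0xFF
        self.counter = (self.counter + 2) & 0xFFFFFF
```

The snooper never sees an address during the transfer itself; it only keeps its own counter in step with the one inside the MMU and uses that to place each data word in the shadow, in parallel with the real write.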
How to even get that right on paper would drive me mad. Trying to write verilog to actually do it would make me give up computers ;)
This is probably simpler than what you are imagining. MiST, and MiSTer even more, perform tasks way more complicated than that. ;)


EDIT:
Replying to myself
The second problem is that you need to grab the mentioned DMA signal from GLUE (or MMU).
I would need to think about this more carefully, but it might be feasible to get along without this signal. The DMA signal works like some sort of chip select. But if the bus is granted, the address bus is wholly pulled up and there is a bus transaction, then that might be enough information to infer that it is a DMA (chip) transaction.
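
Expressed as a predicate, that inference might look like the following Python sketch. The three inputs are hypothetical names for whatever the snooping logic can actually sample, and the 23-line width simply reflects the 68000's A1..A23 address pins. As noted in the post, the idea itself still needs more careful thought.

```python
ADDR_LINES = 23                     # the 68000 exposes A1..A23
ALL_HIGH = (1 << ADDR_LINES) - 1    # every address line reads as 1 (pulled up)


def looks_like_dma_cycle(bus_granted, addr_lines, transfer_active):
    """Guess whether the current bus cycle is a DMA chip transfer.

    bus_granted     -- the CPU has given the bus away
    addr_lines      -- sampled A23..A1 packed into an integer
    transfer_active -- a bus transaction is in progress
    """
    return bool(bus_granted and transfer_active and addr_lines == ALL_HIGH)
```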