Didn't we establish that the address lines aren't driven when ACSI and the Floppy are doing their thing & it's consequently impossible to snoop?
Snooping the blitter is possible, however.
BW
(Atari) ST-RAM boost WIP
Re: (Atari) ST-RAM boost WIP
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
DSTB1 Open source 16Mhz 68k and AltRAM accelerator for the ST
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark
Re: (Atari) ST-RAM boost WIP
That's why I said in my previous message that you have to snoop writes to the DMA address registers.
Yep. Very true!
http://github.com/ijor/fx68k 68000 cycle exact FPGA core
FX CAST Cycle Accurate Atari ST core
http://pasti.fxatari.com
Re: (Atari) ST-RAM boost WIP
Oh, lawks! Then try to infer where data is being written from that? You’re a better man than I to attempt that.
I think I might — if I were to attempt an ST RAM booster — which I’m definitely not — come at it from the other direction. Have an SRAM-backed DRAM emulator with an independent high-speed bus to the CPU. Serving two masters, as it were.
Probably look to work on the STE first, emulating a pair of SIMMs with a flat flex off to a CPU board.
But that’s probably because I’m better at circuit boards than glue logic
BW
Re: (Atari) ST-RAM boost WIP
Yeah that experiment was abandoned back in 2020 due to lack of time.
It worked well on my setup but getting it to a point where it would work for everyone felt like much too big of a hassle.
Just enabling L1 cache for ST-RAM gives a significant speed boost so that's what I went with in the end.
Not as fast but still very nice and a lot less complicated.
Snooping register access sounds like maybe it could work to some degree.
Or snoop instructions to detect when software invalidates the cache and act on that, assuming you are able to invalidate the shadow.
That simple proof-of-concept I experimented with couldn't do any such fancy things
Oh, and that also assumes TOS is actually cache aware and issues cache flushes... EmuTOS and PAK-TOS3 are, but I think people are still insisting on trying to have a good experience with that unholy combination of TOS206 + 68030 + TT-RAM
Here's another very different idea for the same problem.
I take it the main target is to run games fast.. games that do not work if you use the prgflags to put them into altram, because their framebuffer ends up in there too.
One could make a game launcher that traps the games access to the shifter registers.
It would have a pool of a few pre-allocated framebuffers in st-ram.
When the game attempts to set the screen address it'll trigger an mmu exception. If it's not already been mapped to the st-ram pool, we grab a screen from the pool and set up a logical->physical mapping towards it. The value written to the shifter register is the one for the st-ram buffer, and then we exit the exception handler, letting it continue along its merry way..
It'll have to copy the contents on first access too, I suppose.
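The bookkeeping behind that launcher idea can be sketched in a few lines. This is only a model of the pool logic, not something from the thread: `ScreenPool`, `on_screen_address_write` and the example addresses are hypothetical names and values, the MMU fault handler and the actual screen copy are left out, and recycling the oldest mapping when the pool runs dry is an assumed policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScreenPool:
    """Pool of pre-allocated ST-RAM framebuffers (hypothetical model)."""
    free: List[int]                                         # unused ST-RAM buffers; must start non-empty
    mapping: Dict[int, int] = field(default_factory=dict)   # fast-RAM logical -> ST-RAM physical
    copies: int = 0                                         # how many first-access copies were needed

    def on_screen_address_write(self, logical: int) -> int:
        """Called when the game sets the screen address.

        Returns the ST-RAM address that should really go into the
        Shifter register instead of the game's own value.
        """
        if logical not in self.mapping:
            if not self.free:
                # Pool exhausted: recycle the oldest mapping (assumed policy).
                oldest = next(iter(self.mapping))
                self.free.append(self.mapping.pop(oldest))
            self.mapping[logical] = self.free.pop(0)
            # First access to this buffer: its contents would have to be
            # copied into ST-RAM here.
            self.copies += 1
        return self.mapping[logical]
```

With a two-buffer pool, a game flipping between two fixed screens pays for two copies and then runs without further ones; a third screen address forces a recycle and another copy, which is the scrolling slowdown described in this post.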
I'm sure there's a lot of edge cases that would make this not work 100% for everything.
The obvious one is not knowing the upper 8 bits of where the game thinks it has the framebuffer.
You'd have to guess, but it's probably almost always going to be 0x01 unless you've somehow used up 16MB fastram already.
To improve the success rate, you could get fancy by checking where you're at when the program launches maybe?
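A small sketch of that guess, assuming the Shifter only ever sees a 24-bit address and that the missing top byte can be borrowed from wherever the program itself was loaded. Both function names are made up for illustration, and 0x01 as the default is the guess from this post, not a guarantee.

```python
ADDR24_MASK = 0x00FFFFFF  # the ST bus only carries 24 address bits


def guess_top_byte(program_load_address: int) -> int:
    """Borrow the top byte from where the program was loaded (heuristic)."""
    return (program_load_address >> 24) & 0xFF


def widen_screen_address(addr24: int, top_byte: int = 0x01) -> int:
    """Rebuild the 32-bit logical address the game probably meant."""
    return ((top_byte & 0xFF) << 24) | (addr24 & ADDR24_MASK)
```

For example, `widen_screen_address(0x0F8000)` gives `0x010F8000`. If the guessed top byte comes out as 0, the program is in ST-RAM anyway and nothing needs remapping.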
This thing would make anything scrolling just become slower due to constantly having to reuse (and recopy to) screens in the st-ram pool.
As luck would have it, these types of games are very rarely the ones you want to boost into unplayable speeds anyway
The ones using two non-moving double buffered screens are usually the ones you do want to speed up, and these are also the ones that may work quite well with a "simple" software solution like that.
I have no idea if this would work well in practice but it feels doable.
(I'm not going to, even though I've been very tempted to try exactly this for quite some time now)
Re: (Atari) ST-RAM boost WIP
Mostly out of curiosity, why you think it would work "only" to some degree?
> Or snoop instructions to detect when software invalidates the cache and act on that, assuming you are able to invalidate the shadow.
But if you really shadow the whole ram and snoop bus masters, then you don't care about invalidating the shadow. This is not really cache, it is just the normal ram as far as the CPU is concerned, and ram is always valid.
> I take it the main target is to run games fast..
Really ??? Most games won't benefit from any type of acceleration. They are either speed locked to the 200Hz clock, or sometimes to the video frame rate. These games would always run at constant speed. Or there are games that assume a stock 8MHz system and they will run too fast if accelerated. Games that perform 3D rendering, or some kind of slow calculation that would play better accelerated, they are a minority.
I would think that the main goal of accelerators is precisely to run non-games faster. Not just faster, but as fast as possible. But don't know, may be I am wrong.
Nah. Having reverse engineered the ST chipset, I am just very familiar with the internals. So I know exactly how this works (or at least I like to think that)
Re: (Atari) ST-RAM boost WIP
ijor wrote: ↑Wed Apr 20, 2022 2:20 am
> Mostly out of curiosity, why you think it would work "only" to some degree?
> But if you really shadow the whole ram and snoop bus masters, then you don't care about invalidating the shadow. This is not really cache, it is just the normal ram as far as the CPU is concerned, and ram is always valid.
Agreed. The shadow I did was not really a cache - it was the st-ram as far as the rest of the system knew.
But when the DMA is performing, the TF board has no idea what those writes are so they would not end up in the shadow.
Even if it could be made to detect when DMA is taking place, I'm not sure the A and D buses contain enough to actually update the shadow accordingly?
If not, it'll have to either detect that a DMA has taken place and invalidate the whole thing, or snoop for cache instructions and invalidate it accordingly, or something along those lines?
Edit: Hmmn, or are you saying that you'll snoop the registers for relevant info and then copy to shadow after DMA completed.
Or better yet, during DMA assuming the data is passed on the normal data bus. I expect it would be but I don't have schematics on hand.
That does indeed seem like it would work
> Really ??? Most games won't benefit from any type of acceleration. They are either speed locked to the 200Hz clock, or sometimes to the video frame rate. These games would always run at constant speed. Or there are games that assume a stock 8MHz system and they will run too fast if accelerated.
> Games that perform 3D rendering, or some kind of slow calculation that would play better accelerated, they are a minority.
> I would think that the main goal of accelerators is precisely to run non-games faster. Not just faster, but as fast as possible. But don't know, may be I am wrong.
But non-games already run fine from fastram?
So in practical terms the problem becomes about making things that cannot run from fastram, run faster (I suppose this means pre-TT games)
Either by making ST-RAM access faster, or by overcoming the issue that is preventing said software from being able to run from fastram in the first place.
I do agree that even among games, it's a minority you'll actually need/want to speed up and that makes the whole endeavour more interesting in a kind of "can it be done" type of challenge rather than doing it for the end result
This is one reason why I stopped that old experiment. It worked fine under EmuTOS and no DMA devices, but getting it to a point where it would "just work" for everyone's configuration would have been a much larger time investment I wasn't willing to make for such small practical gain.
For sure, the concept of making ST-RAM always much faster is interesting but it does feel a bit like a solution looking for a problem rather than the other way around
Edit:
If the main purpose is to speed up TOS's usage of ST-RAM then the most practical solution is to just run a build of EmuTOS that puts most of its internal work ram in fastram to begin with.
The Vampire build does this, but you can just as easily build the Atari version with the option to do the same thing.
(An additional benefit of this type of build is that it's actually more compatible with old stuff that overwrites the TOS ram area, or is expecting it to be no larger than what TOS1.4 used)
Re: (Atari) ST-RAM boost WIP
@agranlund how long does it actually take to copy ST-RAM to altram? I assume by "invalidate" you mean copy ST-RAM? It would have to be done any time there is bus activity from a bus master.
I'd assume for the blitter it would be a disaster unless its bus accesses were cloned to altram, which is probably not simple.
For DMA floppy/HDD, a second's delay (or whatever) in refreshing the shadow RAM after the DMA access I doubt anyone would care about anyway. But even when it gets to something like 10+ seconds of delay, it's still probably not a problem if the game can run faster afterwards. Games that hammer the floppy drive during play might be an issue, but there are always going to be pros and cons to any upgrade. It is what it is and nothing more.
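A back-of-envelope way to put a number on the copy question. This is only an estimate under stated assumptions: the read side of ST-RAM is taken as the bottleneck, with one 16-bit access every 4 cycles of an 8 MHz clock, and the write into altram is assumed to cost nothing extra. The function name and defaults are illustrative, not measured figures.

```python
def copy_time_seconds(size_bytes: int,
                      bytes_per_access: int = 2,
                      cycles_per_access: int = 4,
                      clock_hz: int = 8_000_000) -> float:
    """Best-case time to read size_bytes out of ST-RAM (assumed bus timing)."""
    accesses = size_bytes / bytes_per_access
    return accesses * cycles_per_access / clock_hz


# A full 4 MiB refresh comes out at roughly one second under these assumptions.
full_refresh = copy_time_seconds(4 * 1024 * 1024)
```

So a whole-RAM recopy after every DMA burst lands in the "about a second" range in the best case, and anything slower than the assumed bus timing only makes it longer.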
https://www.exxosforum.co.uk/atari/ All my hardware guides - mods - games - STOS
https://www.exxosforum.co.uk/atari/store2/ - All my hardware mods for sale - Please help support by making a purchase.
viewtopic.php?f=17&t=1585 Have you done the Mandatory Fixes ?
Just because a lot of people agree on something, doesn't make it a fact. ~exxos ~
People should find solutions to problems, not find problems with solutions.
Re: (Atari) ST-RAM boost WIP
I really don't know how invalidate would work for a shadow like this, and of that size. Copy from st-ram perhaps like you mentioned? It would probably end up horrible no matter what
Even if it was implemented more like a normal cache instead of an always-on shadow, clearing the cache-line valid flags for that size is probably not very fun?
The best is probably if it could be guaranteed to be always valid by aid of snooping as @ijor mentioned.
Re: (Atari) ST-RAM boost WIP
agranlund wrote: ↑Wed Apr 20, 2022 9:12 am
> Edit: Hmmn, or are you saying that you'll snoop the registers for relevant info and then copy to shadow after DMA completed.
> Or better yet, during DMA assuming the data is passed on the normal data bus. I expect it would be but I don't have schematics on hand.
> That does indeed seem like it would work
The second.
The data lines are driven & can be snooped easily enough but the address lines aren't. Ijor's suggestion is to listen for commands sent to the DMA chip and therefore deduce which addresses are being written to as the data lines are driven.
How to even get that right on paper would drive me mad. Trying to write verilog to actually do it would make me give up computers
BW
Re: (Atari) ST-RAM boost WIP
agranlund wrote: ↑Wed Apr 20, 2022 9:12 am
> Edit: Hmmn, or are you saying that you'll snoop the registers for relevant info and then copy to shadow after DMA completed.
> Or better yet, during DMA assuming the data is passed on the normal data bus. I expect it would be but I don't have schematics on hand.
Exactly. The data is passed over the main CPU data bus (where else could it be?). The main problem is that the address bus is not driven because there is nobody to drive it. So you have to snoop writes to the DMA address registers. Yes, you could perform your own writes in parallel; there is no need to wait for the DMA to complete.
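A behavioural sketch of that snooping scheme, written as a Python model rather than HDL. It assumes the DMA base address registers sit at $FF8609/$FF860B/$FF860D (worth checking against a register reference), that transfers are word-wide with the address stepping by 2, and that bit 0 of the address is always zero. `DmaShadowSnooper` and its method names are hypothetical, and the transfer direction is passed in as a flag instead of being decoded from the DMA mode register.

```python
DMA_ADDR_HIGH = 0xFF8609  # assumed register addresses, see lead-in
DMA_ADDR_MID = 0xFF860B
DMA_ADDR_LOW = 0xFF860D


class DmaShadowSnooper:
    """Keeps a private copy of the DMA address and mirrors DMA data into the shadow."""

    def __init__(self, shadow: bytearray) -> None:
        self.shadow = shadow
        self.dma_addr = 0

    def cpu_write(self, addr: int, value: int) -> None:
        """Snoop a CPU byte write and track the DMA address registers."""
        value &= 0xFF
        if addr == DMA_ADDR_HIGH:
            self.dma_addr = (self.dma_addr & 0x00FFFF) | (value << 16)
        elif addr == DMA_ADDR_MID:
            self.dma_addr = (self.dma_addr & 0xFF00FF) | (value << 8)
        elif addr == DMA_ADDR_LOW:
            # Bit 0 dropped: word-aligned transfers (assumption).
            self.dma_addr = (self.dma_addr & 0xFFFF00) | (value & 0xFE)

    def dma_word(self, data: int, to_memory: bool = True) -> None:
        """One DMA bus cycle: data lines are valid, address lines are not."""
        if to_memory and self.dma_addr + 2 <= len(self.shadow):
            # Mirror the word in parallel using our own address counter.
            self.shadow[self.dma_addr:self.dma_addr + 2] = (data & 0xFFFF).to_bytes(2, "big")
        self.dma_addr = (self.dma_addr + 2) & 0xFFFFFF
```

The point of the model is that the snooper never needs the real address bus during DMA: it only has to see the register writes beforehand and count words afterwards. Detecting that a given cycle is a DMA cycle at all is a separate problem.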
The second problem is that you need to grab the mentioned DMA signal from GLUE (or MMU). I assume you don't have access to that signal in your current hardware, so you might need to solder one wire. Incidentally, this signal is still bonded out on the STE GLUE/MMU combo. This is curious because in the combo it becomes an internal signal and in fact, it is not connected to the board. But what could have been a useless pin makes this concept possible in the STE as well.
> But non-games already run fine from fastram?
Not all of them. Well behaved GEM apps are not a problem. But there are many apps that perform their own screen management and can't be run from fastram. How do you know which software can run from fastram and which cannot?
Anyway, games or non-games, what is IMHO very important is being transparent to the software. It is always possible to patch games, software, and even TOS. This might be interesting and very useful in itself, but that's a completely different game. What I am considering (maybe dreaming) is something that works purely at the hardware level. No need to patch any software, no need to patch or use a custom TOS. You install the hardware and it just works, seamlessly. There would always be software, mostly games, that would run too fast, but those will probably require disabling the accelerator altogether anyway.
> I do agree that even among games, it's a minority you'll actually need/want to speed up and that makes the whole endeavour more interesting in a kind of "can it be done" type of challenge rather than doing it for the end result
> For sure, the concept of making ST-RAM always much faster is interesting but it does feel a bit like a solution looking for a problem rather than the other way around
Yeah, in a way it is like that. As said, most well behaved apps can simply run from fastram and that's it. But oh well, the whole concept of retro computing can be considered without a practical purpose. In the 8-bit realm some people still implement tape based solutions. Is there anything slower, more irritating, and completely useless than loading software from tape??? But the nostalgic effect is invaluable.
Not to the DMA chip, but to MMU. All the address registers live in MMU because none of the custom chips (besides Blitter) can drive the address bus. So MMU has to address DRAM directly from internal registers.
> How to even get that right on paper would drive me mad. Trying to write verilog to actually do it would make me give up computers
This is probably simpler than what you are imagining. MiST, and MiSTer even more, perform tasks way more complicated than that.
EDIT:
Replying to myself
> The second problem is that you need to grab the mentioned DMA signal from GLUE (or MMU).
I would need to think about this more carefully, but it might be feasible to get along without this signal. The DMA signal works like some sort of chip select. But if the bus is granted, the address bus is wholly pulled up and there is a bus transaction, then that might be enough information to infer that it is a DMA (chip) transaction.
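The inference in that edit boils down to a three-way AND. A minimal sketch, assuming the snooper can sample the bus grant state, the 24-bit address lines (A1..A23, so bit 0 is ignored) and some strobe that marks a bus transaction; the function and parameter names are hypothetical, and whether this is sufficient on real hardware is exactly the open question above.

```python
ADDRESS_LINES_MASK = 0xFFFFFE  # A1..A23; the 68000 has no A0 pin


def looks_like_dma_cycle(bus_granted: bool, address_lines: int, strobe_active: bool) -> bool:
    """Guess whether the current bus cycle is a DMA chip transfer.

    Undriven address lines float high through the pull-ups, so an
    all-ones address during a granted bus cycle suggests nobody is
    driving the address bus.
    """
    all_pulled_up = (address_lines & ADDRESS_LINES_MASK) == ADDRESS_LINES_MASK
    return bus_granted and strobe_active and all_pulled_up
```

A Blitter cycle would fail the test because the Blitter drives real addresses, which fits the earlier point that Blitter writes can be snooped directly.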