REV 3 - REV 5 - The beginning (ST536)

All about the ST536 030 ST booster.
exxos
Site Admin
Posts: 28375
Joined: 16 Aug 2017 23:19
Location: UK

Re: REV 3 - The beginning

Post by exxos »

agranlund wrote: 13 Apr 2022 08:02 Yep exactly that, it has a lot of its work ram in the lower regions of ram, VDI variables and so on.
And there's the vector table which is accessed whenever there is an interrupt like @Elethiomel mentioned.
But as far as Gem/vdi speedup goes I think it's mostly because it's accessing its work ram a lot.
What RAM range exactly are you relocating then ?
Badwolf
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: REV 3 - The beginning

Post by Badwolf »

agranlund wrote: 13 Apr 2022 13:02
Badwolf wrote: 13 Apr 2022 12:42 BTW, idly fiddling with this: to get MAPROM building with vasm, all I needed to do was indent the SECTION commands and remove a .w from an addq. :)

Tag, you're it!
No thanks! Have tried a few times to get my head around how it does what it does! :lol:

I have just submitted a pull request, though, which allows the tools as they currently stand to be built either way. I've also added conditional declaration of the three main config variables so they can be specified at build time, thus allowing MAPROM and FASTRAM to be built simultaneously without changing the code.

I've only tested on Hatari so far, I'm afraid.

Putting this in now in the hope it's useful when you come to consider the caching options.

BW
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark
Badwolf
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: REV 3 - The beginning

Post by Badwolf »

exxos wrote: 13 Apr 2022 14:52 What RAM range exactly are you relocating then ?
From my incomplete understanding of the code it appears to be the first 32kB of memory (0x00000000-0x00008000) if it's TOS2.06, first 4kB otherwise.

BW
agranlund
Site sponsor
Posts: 1755
Joined: 18 Aug 2019 22:43
Location: Sweden

Re: REV 3 - The beginning

Post by agranlund »

Badwolf wrote: 13 Apr 2022 16:17 I have just submitted a pull request, though, which allows the tools as they currently stand to be built either way. I've also added conditional declaration of the three main config variables so they can be specified at build time, thus allowing MAPROM and FASTRAM to be built simultaneously without changing the code.
Putting this in now in the hope it's useful when you come to consider the caching options.
Yikes.. and I just submitted something, based on that help you provided earlier for vasm:
https://github.com/agranlund/tftools

There are all new binaries there as well for the ones that want them.
(maprom_c.prg is the one you want, unless you hit issues in TOS 2.06 etc. :lol: )

I don't know if what you did in terms of makefile and compiling is better than what I did, it wouldn't surprise me if it was.
The "problem" I guess is that I reverted the folder structure back to how it was before it was cleaned up by someone else a while back. The cleanup was nice but it didn't feel Atari. I suppose the pull request as-is may not be super valid anymore since the files moved around?
Badwolf
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: REV 3 - The beginning

Post by Badwolf »

agranlund wrote: 13 Apr 2022 16:34 I don't know if what you did in terms of makefile and compiling is better than what I did, it wouldn't surprise me if it was.
I can always have another look with the new files.
I suppose the pull request as-is may not be super valid anymore since the files moved around?
Screenshot_2022-04-13_16-39-04.png
It does look somewhat sub-optimal! :lol:

BW
agranlund
Site sponsor
Posts: 1755
Joined: 18 Aug 2019 22:43
Location: Sweden

Re: REV 3 - The beginning

Post by agranlund »

Badwolf wrote: 13 Apr 2022 16:40 It does look somewhat sub-optimal! :lol:
Nah, that looks fine to me.. ship it! :lol:


Badwolf wrote: 13 Apr 2022 16:31 From my incomplete understanding of the code it appears to be the first 32kB of memory (0x00000000-0x00008000) if it's TOS2.06, first 4kB otherwise.
Yep that's it. It also avoids some pages which contain stuff that absolutely must be in ST-RAM (_dskbuf).
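As a rough illustration of the range described above, here is a minimal sketch. The function name and the version-string argument are hypothetical and not taken from the MAPROM source, and the sketch ignores the pages that are deliberately left in ST-RAM (such as the one holding _dskbuf).

```python
KIB = 1024

def relocated_low_ram_range(tos_version):
    """Return the (start, end) of the relocated low RAM, end exclusive.

    Hypothetical helper: 32kB for TOS 2.06, 4kB for anything else,
    as described in the thread. Excluded pages are not modelled.
    """
    size = 32 * KIB if tos_version == "2.06" else 4 * KIB
    return (0x00000000, size)

start, end = relocated_low_ram_range("2.06")
print(f"0x{start:08X}-0x{end:08X}")
```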

EmuTOS is more of a moving target than Tos206 so I don't dare move any more stuff there, but it's nice to at least be able to move the vectors and the supervisor stack if it's still there.

Besides, EmuTOS can be compiled with options that make it put most of its internal work ram into altram already. Although you would have to build it yourself. In that case it'll just have the tiny bit of common stuff that all TOSs must have in exactly the same place in the lowest st-ram.
These builds are quite nice and fast :)
Badwolf
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: REV 3 - The beginning

Post by Badwolf »

I had a quick nosey at what you'd done versus what I'd done @agranlund, and you've provided more options more neatly than me, so all good!

So, here are the results of the DFB1 jury:-

FASTRAM 1.9 (I'm defining this as 100% in GB6):-
Frontbench score: ~4285 (100%)

Screenshot_2022-04-13_20-09-46.png

MAPROM 1.9:-
Frontbench score: ~3930 (91.7%)

Screenshot_2022-04-13_20-13-39.png

MAPROM_C 1.9:-
Frontbench score: ~4300 (100.4%)

Screenshot_2022-04-13_20-16-55.png
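The percentages above follow directly from the Frontbench scores, taking FASTRAM 1.9 as the 100% baseline. A small sketch of the arithmetic (the function name is just for illustration, and the scores are the approximate values quoted above):

```python
# Approximate Frontbench scores quoted in this post.
scores = {"FASTRAM 1.9": 4285, "MAPROM 1.9": 3930, "MAPROM_C 1.9": 4300}
baseline = scores["FASTRAM 1.9"]

def relative_percent(score, baseline):
    """Score as a percentage of the baseline, to one decimal place."""
    return round(100.0 * score / baseline, 1)

for name, score in scores.items():
    print(f"{name}: {relative_percent(score, baseline)}%")
```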


So, some interesting figures here.

First thing, I'm using TOS4 & NVDI. TOS4 does support caching ST-RAM, so vanilla MAPROM, which turns this off, is unsurprisingly the poorest performer in both GB6 and Frontbench.

The most surprising thing is that caching the OS in AltRAM with MAPROM_C only has around a 6% display boost -- I would expect a lot of ROM access here -- especially surprising given the 380% rating on ROM access.

Perhaps the blitting figure gives us a hint. This suffers a significant penalty over just FASTRAM.PRG. Now why would this be? Does the MMU tree traversal impact some of the optimisations that NVDI has honed? Is the traversal just a hit in general only offset by ROM accesses elsewhere?

Frontbench, on the other hand, does show that (small but) consistent 0.4% speed boost. I was expecting more from the low-RAM caching to be honest, but perhaps that is cancelled by those MMU tree traversals when accessing screen memory again? The vast majority of Frontbench's non TT-RAM accesses are read/modify/write to the screen.

Anyway over to someone with a TF/ST536 (the proper target). :)

BW
exxos
Site Admin
Posts: 28375
Joined: 16 Aug 2017 23:19
Location: UK

Re: REV 3 - The beginning

Post by exxos »

@Badwolf I actually documented that MAPROM seems slower than FASTROM on my website a while back. I just made the assumption that using the MMU has a performance hit, whereas FASTROM just literally dumps the ROM at a new address.

So it's probably best sticking to MAPROM so we are all testing the same setup. NVDI is adding a layer of issues, plus the blitter :lol:

EDIT:

I assume you're using PRGFLAGS for NVDI?
agranlund
Site sponsor
Posts: 1755
Joined: 18 Aug 2019 22:43
Location: Sweden

Re: REV 3 - The beginning

Post by agranlund »

Interesting results @Badwolf!

Although I always use NVDI myself I haven't actually compared any Gembench results on it yet :oops:
I'm going to do that and especially check if NVDI blitting is affected the same on my machine as on yours.

I think NVDI replaces a lot or maybe even most of the VDI drawing code that normally exists in ROM, so caching the OS in altram probably won't do that much for those kinds of things when using NVDI?
Never really reflected on it, everything is so stupidly fast anyway when using NVDI :lol:

I did get some quite interesting results before in those non-nvdi tests I did..
Generally speaking, having the ST-RAM cache enabled improved the result in all permutations except for one instance.

My best non-nvdi scoring run was actually:
"run from alt-ram, low-ram relo enabled, st-ram cache disabled"

Followed by:
"run from alt-ram, low-ram relo enabled, st-ram cache enabled"

I don't have an explanation for that at all.. nor the results you are getting :)
Nevertheless, "run from alt-ram, low-ram relo enabled, st-ram cache enabled" feels like the overall best setting on the ST even though it was only second best in Gembench specifically. It lost a bit in the high end (at least in Gembench) but gained a lot in the low end running stuff from st-ram.

That L1 is incredibly tiny though. It has only 16 lines of 16 bytes each (well ok, times two, one for instruction and one for data)

A cache miss, which I guess would be happening fairly often, would have to perform 8 memory accesses from ST-RAM to fill a line.
That can easily turn into a huge loss if it's not offset by actually accessing the rest of the line.
A cache miss from TT-RAM is obviously not great, but it's not nearly as bad because of 32-bit access, plus burst mode on top of that, and well.. 50MHz instead of 8 :lol:
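To put rough numbers on that, here is a minimal sketch of the access counts. It assumes a miss fetches a whole 16-byte line, as described above, and it deliberately ignores burst mode and the clock difference, so it only compares bus widths. The function names are made up for illustration.

```python
LINE_BYTES = 16  # one cache line, per the description above

def accesses_to_fill_line(bus_width_bits):
    """Bus accesses needed to fetch one full cache line."""
    bytes_per_access = bus_width_bits // 8
    return LINE_BYTES // bytes_per_access

def wasted_accesses(useful_accesses, bus_width_bits):
    """Accesses spent on a line fill beyond those the program then uses."""
    return max(0, accesses_to_fill_line(bus_width_bits) - useful_accesses)

print(accesses_to_fill_line(16))  # ST-RAM, 16-bit bus
print(accesses_to_fill_line(32))  # TT-RAM, 32-bit bus
print(wasted_accesses(1, 16))     # only one word of the line is ever used
```

Under those assumptions, a line that is filled from ST-RAM and then touched only once costs seven accesses that an uncached read would not have made.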

I am guessing something of that sort may have been happening in my unexplainable Gembench test?
I.e. different pieces of code competing for these lines, generating a higher ratio of refills in relation to usage..

But I'm just wildly guessing, it's super tricky to measure some kind of generic performance, what's best pretty much depends on what software is running.
A cache-conscious piece of software would freeze the cache in certain scenarios to avoid stuff being pushed out by other things etc.
For example if you're just going to be doing a ton of incremental writes (filling the screen?) then it makes no sense to have that data go into the cache, removing things that you might benefit from being there instead of having to refill again.. In that case you'd just freeze it, do all those writes, and then unfreeze it again.
For ST software though, I think we're just going to have to count on luck and the fact that it's mostly, probably, going to be an overall net gain :)
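For reference, the freeze described above is controlled on the 68030 by bits in the cache control register (CACR). The sketch below only models the bit manipulation on a plain integer. On real hardware the value would be read and written with a movec instruction in supervisor mode, which is not shown, and the helper names are hypothetical.

```python
# 68030 CACR bit masks (enable and freeze bits only).
CACR_EI = 0x0001  # enable instruction cache
CACR_FI = 0x0002  # freeze instruction cache
CACR_ED = 0x0100  # enable data cache
CACR_FD = 0x0200  # freeze data cache

def freeze_data_cache(cacr):
    """Return the CACR value with the data cache frozen."""
    return cacr | CACR_FD

def unfreeze_data_cache(cacr):
    """Return the CACR value with the data cache unfrozen."""
    return cacr & ~CACR_FD

cacr = CACR_EI | CACR_ED          # both caches enabled, nothing frozen
frozen = freeze_data_cache(cacr)  # ... do the bulk screen writes here ...
restored = unfreeze_data_cache(frozen)
print(hex(cacr), hex(frozen), hex(restored))
```

While frozen, the cache still serves hits but misses no longer replace existing entries, which is the behaviour the screen-fill example relies on.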
agranlund
Site sponsor
Posts: 1755
Joined: 18 Aug 2019 22:43
Location: Sweden

Re: REV 3 - The beginning

Post by agranlund »

exxos wrote: 13 Apr 2022 21:40 @Badwolf I actually documented that MAPROM seems slower than FASTROM on my website a while back. I just made the assumption that using the MMU has a performance hit, whereas FASTROM just literally dumps the ROM at a new address.
Yeah a larger, and/or deeper table could potentially have a negative impact on performance.
The table itself is in altram so there's a cost involved for it to look up a virtual->physical address. The table is a few levels deep as well so worst case it can be multiple memory accesses getting to where you're going.

The MMU does have its own built-in cache to offset the cost of table traversal but it only holds 22 entries so depending on stuff-and-things it may still end up having to do some additional memory accesses sometimes.
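A toy model shows how sharply a 22-entry translation cache can fall over once the working set is slightly too big. This is only a sketch: it assumes a strict least-recently-used replacement policy, which may not match what the 68030 actually does, and the function names are invented for the example.

```python
from collections import OrderedDict

ATC_ENTRIES = 22  # size of the MMU's built-in cache, as stated above

def count_table_walks(page_sequence, entries=ATC_ENTRIES):
    """Count translations that miss a small LRU cache of page numbers."""
    atc = OrderedDict()
    walks = 0
    for page in page_sequence:
        if page in atc:
            atc.move_to_end(page)        # hit: mark as most recently used
        else:
            walks += 1                   # miss: the table would be walked
            atc[page] = True
            if len(atc) > entries:
                atc.popitem(last=False)  # evict the least recently used
    return walks

def cyclic(pages, rounds):
    """Touch pages 0..pages-1 in order, repeated for a number of rounds."""
    return [p for _ in range(rounds) for p in range(pages)]

print(count_table_walks(cyclic(22, 10)))  # working set fits the cache
print(count_table_walks(cyclic(23, 10)))  # one page too many
```

With 22 pages only the first pass misses (22 walks), while cycling over 23 pages under this LRU assumption misses on every single access (230 walks), each of which would cost several altram reads for a table a few levels deep.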

I think it's a bit similar to the scenario of caches in that it's hard to find a solution that is objectively best in all possible scenarios.
Having the rom (or the low ram) remapped into altram is generally quite good for most use-cases, especially when using the OS itself, but possibly a pointless mmu cost if running something that isn't actually taking advantage of it :)
I think here too, a slightly lower high-end is probably worth it to raise the lower-mid end of the overall experience?

There is also a real possibility that maybe maprom itself is doing something stupid? :D
