Booster clock switching + DMA nerding

Discussion and support for the DSTB1 & DFB1 boosters by BadWolf.
User avatar
Badwolf
Site sponsor
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Booster clock switching + DMA nerding

Post by Badwolf »

Background: DSTB1 has problems on my STE when a) running in 16MHz clock-switching mode, b) running with AltRAM enabled and c) running the ACSI-heavy [read: DMA] Day of the Tentacle. Steve had problems that sounded a little similar with his STE536 build. I put two and two together and potentially got seven and a half...

Following on from a conversation here viewtopic.php?p=117591#p117591 which I don't want to hijack any longer, Exxos mentioned the STE is very finicky about CPU clock skew when it comes to booster clock shenanigans and specifically with an eye to DMA.

So I ran some tests.

DSTB1 uses a 'related clock' switching technique, which isn't technically the right thing to do (as they're not related -- a 16.5MHz local oscillator and a system 8MHz one). There are a lot of stages of logic from the input clock, which I call CLK8, to the output clock so a fair delay is inevitable.

But how sensitive is the STE?

Locking the CPU to 8MHz, I used a similar technique to Terriblefire in his earlier TF boards to sample the local clock and apply various offsets to how far back in time you go to allow me to vary things...

Over to old me:
Badwolf wrote: 23 Jul 2024 19:08 No harm in trying it. But you're right: don't think it's clock skew.
Well, I couldn't have calculated that much worse!

IMG_7988.jpeg
IMG_7989.jpeg

And, remarkably, that runs and passes RAM tests.

ACSI works, but Tentacle isn't stable. I'll tweak the skew a bit.

BW

PS: A delay of 15-20ns seems OK at 8MHz.


PS2: This seems a pretty good compensation -- Tentacle runs ok, but we're still at 8MHz. Keeping a note of this one for later, though.

(averaging turned on for the blue (CLKOUT) line as there's a bit of jitter)
IMG_7990.jpeg
PS3: With a lead of 15ns, the computer was stable for a few minutes but then had an interesting crash (not sure if it was DMA-related).

PS4: Actually since that 'interesting' crash (I couldn't reboot with the button), it's been running stably in that configuration.

Anyway, I'm going out on a limb and saying I don't think the STE is *super* sensitive to clock skew on the CPU, but all this is at 8MHz. If it really were critical I'd have thought taking it from +62.5 to -15ns would have picked up something a bit more dramatic than the crashes I have to foment with Tentacle.
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark

Re: Booster clock nerding

Post by Badwolf »

This is very interesting. Using my 16MHz switching logic but with the optimally aligned sampling technique above, I've been able to play Tentacle for about an hour. Which is great.

Obviously I'll have to see if that's repeatable, but AFAICS it could be four things:-

1) The clock skew has been corrected away enough -- as per Exxos' theory.

2) I'm not now using the motherboard CLK8 for anything other than that one sampling operation. Everything else is now derived from that internal register.

3) My previously incorrectly applied related clock switching logic is now actually being applied to a related clock and the subsequent better efficiency of switching is yielding dividends.

4) I was tinkering with the fitting options to see how many register stages I could get in and I've left it in density optimisation mode. Perhaps this is a good thing.


I caught a couple of switching operations here, for comparison with before:-
IMG_7991.jpeg
IMG_7992.jpeg
Now, to me, the clock switching still doesn't look all that jazzy (a minus for point 3), but in some of these you can see the skew is now right down (a plus for point 1).


But the first thing I'll test is 4). Let's optimise for speed again.

EDIT: well, annoyingly, that seems to have a significant effect. Optimising for speed causes the old crashes on DMA in Tentacle to return. Damn. Now it's down to the vagaries of CPLD design, of which I know little.

Never mind, I'll continue testing the other options to see if things get better or worse under the different options. Firstly I'll fiddle with the skew and take it a bit early again (this is important as I'm actually measuring to the output of the Schmitt trigger -- I don't know a good clippable place on the board to tap the clock)

BW

Re: Booster clock nerding

Post by Badwolf »

OK, fitting mode back to speed all around with more TIMESPEC constraints thrown about willy-nilly and it's back to behaving itself again.

Time for some skew tests. Now I'm running with clock switching (up to 16MHz) turned on. Previous skew experiments were with AltRAM, but running only at 8MHz and there appeared virtually no sensitivity during DMA then.

* Skewing the CPU clock -15ns is throwing up crashes now. Which isn't particularly surprising, as 15ns is likely much more than the Schmitt delay.

* Skewing the clock +15ns is very unreliable during DMA. Really quite frequent issues at this delay. Address errors, invalid instructions. Things are definitely not behaving in the clock switching realm at this offset.

This is significant as it's possibly the realm in which my 'normal' clock routine operated.

* Skewing the clock at +30ns is oddly more stable than at 15. There's still the occasional crash, but is actually far more reliable than my 'stock' configuration. Which is quite surprising as we're a long way out of sync now. That said, we did see the system run at 180 degrees offset earlier, albeit at 8MHz only.

Perhaps being a little off is worse than wildly off?

* +50ns (I know really it should be 45, but measures nearer 50 to me) and the machine won't boot.

* +62ns (practically antiphase), the machine boots and Tentacle loads but is not reliable under DMA -- graphics corruption.

* Wrapping around to -35ns or so, as above.
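For reference, the offsets in the list above can be converted to a phase angle of the 8MHz clock with a few lines of Python. This is only a back-of-envelope sketch: it assumes an exact 125ns CLK8 period, and the function names `skew_to_phase_deg` and `wrap_skew_ns` are invented for the example.

```python
# Back-of-envelope helpers for thinking about skew on an 8 MHz clock.
# Assumption: CLK8 is exactly 8 MHz, i.e. a 125 ns period.
CLK8_PERIOD_NS = 125.0


def skew_to_phase_deg(skew_ns: float) -> float:
    """Express a skew in ns as a phase angle in the range [0, 360)."""
    return (skew_ns / CLK8_PERIOD_NS * 360.0) % 360.0


def wrap_skew_ns(skew_ns: float) -> float:
    """Wrap a skew into [-62.5, +62.5) ns, i.e. within half a period either way."""
    half = CLK8_PERIOD_NS / 2
    return (skew_ns + half) % CLK8_PERIOD_NS - half


if __name__ == "__main__":
    for skew in (-15.0, 15.0, 30.0, 50.0, 62.5, 90.0):
        print(f"{skew:+6.1f} ns -> {skew_to_phase_deg(skew):6.1f} deg, "
              f"wrapped {wrap_skew_ns(skew):+6.1f} ns")
```

On these assumptions +62.5ns is exactly 180 degrees, and a nominal +90ns delay (six 15ns steps, which is a guess at what the last bullet refers to) wraps around to -35ns.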


I must say I find this quite fascinating. The STE *is* sensitive to clock skew when there is switching involved, but in very unexpected ways. It could be there are a couple of sweet spots under 5ns and around 30ns. Perhaps I'd have to invert the 'fast' clock beyond that. Anyway, it's not clear if these are the source of the problems (or my poorly constrained Verilog), but it really is interesting.

I think with the advent of this invert, sample and delay clock reconstruction approach I could now actually try a different switching method too to attack option 3, above.


BW

Re: Booster clock nerding

Post by Badwolf »

Badwolf wrote: 23 Jul 2024 22:19 I think with the advent of this invert, sample and delay clock reconstruction approach I could now actually try a different switching method too to attack option 3, above.
I did this really quickly last night and as such didn't have chance to take any pictures of the clock switch (this is where a four channel scope would come into its own!).

Tentacle ran the introduction to completion, but I didn't have a chance to test any further.

The way this works is based on Stephen L's invert, sample and delay trick that was used in the early days of the TF53x line. He may have a more advanced version now, but this is going from my memory of the livestreams at the time.

The idea is:

* On every 66MHz (in my case) clock (I call this CLKOSC), sample the 8 MHz system clock (I call this CLK8);
* Invert this sample and add it to a short list of the previous samples;
* As you go backward in the sample list, you're going backwards in time in 15ns (1/66MHz) increments;
* But because you inverted the signal, you only need to go back half the length of an 8MHz cycle to reconstruct the cycle (although this assumes a 50/50 duty cycle -- you could not invert and go back a full length if you choose);
* Choose an appropriate sample from your historical list to represent the system clock, compensating for your logic delays.
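As a rough illustration of the steps above, here is a small behavioural model in Python. It is not the Verilog from the board and not Stephen's original code. It assumes an ideal 50/50 CLK8 at exactly 8MHz and a CLKOSC of exactly 66MHz; the names `reconstruct` and `effective_skew_ns` and the history depth of 8 are made up for the sketch.

```python
from collections import deque

# Assumed ideal frequencies (the real oscillators will differ slightly).
OSC_HZ = 66_000_000                     # local oscillator, CLKOSC in the post
SYS_HZ = 8_000_000                      # system clock, CLK8 in the post
OSC_PERIOD_NS = 1e9 / OSC_HZ            # about 15.15 ns per sample
HALF_SYS_PERIOD_NS = 1e9 / SYS_HZ / 2   # 62.5 ns


def clk8_level(t_ns: float) -> int:
    """Ideal 50/50 square wave: high during the first half of each period."""
    return 1 if (t_ns % (2 * HALF_SYS_PERIOD_NS)) < HALF_SYS_PERIOD_NS else 0


def reconstruct(n_ticks: int, tap: int, depth: int = 8) -> list[int]:
    """Sample CLK8 on every CLKOSC tick, invert it, keep a short history,
    and output the history entry that is `tap` samples old."""
    if not 0 <= tap < depth:
        raise ValueError("tap must index into the history")
    history = deque([0] * depth, maxlen=depth)
    out = []
    for i in range(n_ticks):
        sample = clk8_level(i * OSC_PERIOD_NS)
        history.appendleft(1 - sample)   # newest inverted sample at index 0
        out.append(history[tap])         # going back `tap` samples in time
    return out


def effective_skew_ns(tap: int) -> float:
    """Nominal skew of the reconstructed clock relative to CLK8.
    An inverted signal delayed by half a period lines up with the original,
    so the residual skew is the tap delay minus half a CLK8 period."""
    return tap * OSC_PERIOD_NS - HALF_SYS_PERIOD_NS


if __name__ == "__main__":
    for tap in range(3, 7):
        print(f"tap {tap}: nominal skew {effective_skew_ns(tap):+.1f} ns")
```

With these assumed frequencies, `effective_skew_ns(4)` comes out at roughly -1.9ns, and each tap either side moves the result by about 15ns. The model ignores the sampling jitter and the logic delays you would be compensating for on real hardware.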

Now this isn't perfect -- it can only work in increments of the CLKOSC period and my CLKOSC is 66MHz versus Stephen's 100 -- so you end up with a fair chunk of jitter, but depending on circumstances this may work out to be a better option than using the clock signal directly.
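To put rough numbers on that, the sketch below compares the sampling granularity of the two oscillators. It assumes exactly 66MHz and 100MHz sampling clocks and an exact 8MHz system clock; `sample_period_ns` and `jitter_fraction` are names invented for the example.

```python
def sample_period_ns(osc_hz: float) -> float:
    """Time between samples, which is also the step size for dialling in skew."""
    return 1e9 / osc_hz


def jitter_fraction(osc_hz: float, sys_hz: float = 8_000_000) -> float:
    """Worst-case edge uncertainty (one sample period) as a fraction
    of one system clock period."""
    return sample_period_ns(osc_hz) / (1e9 / sys_hz)


if __name__ == "__main__":
    for osc in (66_000_000, 100_000_000):
        print(f"{osc / 1e6:.0f} MHz: {sample_period_ns(osc):.2f} ns per step, "
              f"{jitter_fraction(osc) * 100:.1f}% of a CLK8 period")
```

On those assumptions a 66MHz sampler places each edge to within about 15ns (around 12% of a 125ns cycle), against 10ns (8%) at 100MHz.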

This is for a number of reasons. The clock signal can be affected by load. The more things you hang off it, the more it's affected. It can also ring quite badly, as I found out using it to clock my SDRAM causing edge detection to trigger all over the place. Here you're sampling it, not triggering on it, so that problem is alleviated. Switching from 8MHz to booster working speed (16.5MHz in my case) can occur on any edge of the faster clock. But potentially the big one -- if Exxos is right about clock skew being a significant STE DMA issue -- is you can dial in your skew in those 15ns increments at least. The system clock, making its way through the switching logic I had before, could easily be delayed by 20ns or more. I could dial that down to 5ns.

Anyway, I'll play with this a bit more -- try it in the STFM again (although I think my STFM has other problems) -- and see if these positives outweigh the negative jitter effect on balance. The other thing to bear in mind is that this approach optimises only for one clock frequency. It won't work with switching systems (like the Falcon or MSTE).

I should also point out the biggest stability effect I saw was adding more TIMESPECs to the Verilog synthesis process. I'm a total n00b at those things, so I've just liberally sprinkled them around. Those synthesis settings probably have a larger effect than all this logic voodoo.

BW

Re: Booster clock nerding

Post by Badwolf »

Things I'd like to try next:

* Only allowing a clock switch when AS is deasserted
* Asserting HALT during bus arbitration -- I know this doesn't tristate the pins, but I'm thinking it may stop the CPU responding to input that wasn't for it at the end of a bus arb cycle at least
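The first idea can be modelled in a few lines of Python. This is a hypothetical behavioural sketch only, not the booster's actual logic: it treats AS as an active-low signal sampled once per tick, and `gated_select` is an invented name. It says nothing about making the clock mux itself glitch-free.

```python
def gated_select(as_n: list[int], want_fast: list[int]) -> list[int]:
    """Hold the current clock selection while a bus cycle is in progress.

    as_n      -- sampled AS line per tick (0 = asserted, 1 = deasserted)
    want_fast -- requested selection per tick (1 = fast clock, 0 = 8 MHz)
    Returns the selection actually applied on each tick.
    """
    selected = 0                 # assume we start on the 8 MHz clock
    applied = []
    for as_level, request in zip(as_n, want_fast):
        if as_level == 1:        # AS deasserted: no bus cycle, safe to switch
            selected = request
        applied.append(selected)  # otherwise keep the previous selection
    return applied


if __name__ == "__main__":
    # A switch request arrives mid-cycle and is deferred until AS goes high.
    print(gated_select([1, 0, 0, 1, 1], [0, 1, 1, 1, 0]))
```

In the example the request to go fast arrives on the second tick, while AS is asserted, and only takes effect on the fourth tick once AS is released.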

BW
User avatar
stephen_usher
Site sponsor
Site sponsor
Posts: 7376
Joined: 13 Nov 2017 19:19
Location: Oxford, UK.

Re: Booster clock switching + DMA nerding

Post by stephen_usher »

This is interesting, especially when it comes to the PiStorm project too.

Funnily enough the PiStorm firmware is more stable on the STE than the STFM, which is as stable as a jelly in an earthquake.
Into retro computers since before they were retro...
ZX81->Spectrum->Memotech MTX->Sinclair QL->520STM->BBC Micro->TT030->PCs & Sun Workstations.
Added code to the MiNT kernel (still there the last time I checked) + put together MiNTOS.
Collection now with added Macs, Amigas, Suns and Acorns.
User avatar
exxos
Site Admin
Site Admin
Posts: 28344
Joined: 16 Aug 2017 23:19
Location: UK

Re: Booster clock switching + DMA nerding

Post by exxos »

We have talked about all this before viewtopic.php?p=100673#p100673

You're not just fighting bad clocks, but even 6ns skew was enough to cause issues. Even 2 buffers' delay on the 8MHz clock can cause DMA issues. It's again why I developed the STE DMA fix board for the 32MHz booster. It had, in my case at least, nothing to do with clock switching or anything else (unless the switching logic is glitching), just bad clocks and skew.

How much skew a machine can cope with I've already explained and depends on many factors. I know you don't believe 8MHz is an issue, unless you see it yourself I guess. But I've seen it happen again and again and already documented many tests. You would have to read all my posts on booster issues to see all the factors involved.
User avatar
Badwolf
Site sponsor
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: Booster clock switching + DMA nerding

Post by Badwolf »

exxos wrote: 24 Jul 2024 14:20 How much skew a machine can cope with I've already explained and depends on many factors. I know you don't believe 8MHz is an issue, unless you see it yourself I guess. But I've seen it happen again and again and already documented many tests. You would have to read all my posts on booster issues to see all the factors involved.
It may not be 100% reliable with a bit of skew, but if it works at all at antiphase I can't point the finger at 10ns of skew and say it's to blame. No matter if I can foment it myself or not (and I could 100% foment it with AltRAM + clock switching, but not in any other combination -- now the fact it's AltRAM is not irrelevant IMO, it's likely just a mechanism to allow the switching to happen near bus arbitration).

So a skew may help demonstrate the problem, but I'm unconvinced it's the root cause.

With the exception of the ACIAs I think (which run at 1/10th the clock speed anyway), every bus cycle on the CPU is asynchronous by design. Bus arbitration even more so.

If everything works hunky dory with a big old skew bar the DMA [which is a big if, but I don't have any non-DMA mass storage on my STs so it's hard to prove it's only DMA, I admit], then it sounds like it's the DMA controller that's got the problem. So then it becomes a case of what is the problem and how can we work around it?

Like the extended DTACK assertion on PSG access in STFMs, it becomes something to accommodate rather than try to avoid.

Remember the myth of the dog always barking moments before the phone rings? Yes, the dog peeing ultimately did make the phone ring, but it's very much not the root cause! :lol:

BW


NB: https://everything2.com/title/The+circu ... by+the+dog

Re: Booster clock switching + DMA nerding

Post by exxos »

Badwolf wrote: 24 Jul 2024 15:15 So a skew may help demonstrate the problem, but I'm unconvinced it's the root cause.
If you want to investigate deeper, I suggest you look at BR and how fast the CPU reacts to it. IIRC, the NMOS CPU can take a clock cycle longer than the HC CPU. Of course that timing ultimately sets the RDY signal timing and leads to the more generally well-known DMA issues.

Have fun ;)
User avatar
Badwolf
Site sponsor
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: Booster clock switching + DMA nerding

Post by Badwolf »

exxos wrote: 24 Jul 2024 15:25 If you want to investigate deeper, I suggest you look at BR and how fast the CPU reacts to it. IIRC, the NMOS CPU can take a clock cycle longer than the HC CPU. Of course that timing ultimately sets the RDY signal timing and leads to the more generally well-known DMA issues.
In theory I should be able to delay the reaction to BR as long as I want*. After all, it may come when I'm mid cycle or when I'm held in HALT.

But yes, good idea. I'll go and find the write ups on the 'bad' DMA.

But, wait a moment! I was just re-reading the 68k manual page for bus arb timing and realised I'd completely forgotten that BG can be asserted mid cycle and the requestor waits for the end of the cycle to assert BGACK.

Gah! Now your BG delay remark makes sense to me. How does the requestor know there's a cycle ongoing when it's off-board?

Ha! Now I'm at that 'how did that ever work' phase of debugging! :lol:

BW

* well, providing it's responded to within 100us, IIRC