ST536 STE EDITION

All about the ST536 030 ST booster.
Badwolf
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: ST536 STE EDITION

Post by Badwolf »

exxos wrote: 18 Mar 2025 17:58 The 288-6 is about $49 a pop at JLC. Waiting for a quote currently.
My instinct is that this is the wrong way to go. You're pushing the price up hoping a faster (and probably even less obtainable over time) chip will solve the problems, but I don't think there's any reason the 10ns can't do what you want. It may need a 288 for the footprint, but that's a better investment than 6ns, IMO.

It may need scoping each SDRAM line to check the timing but you ought to be able to adjust for any tight timing rather than just try to make it go faster.

For timings, you could, for example, clock divide the oscillator and drive the SDRAM at 50MHz. The end result is not hugely slower. Slower clock means fewer stages in the reset timing circuit too, reducing footprint.

For size constraints you could drop the ROM. STE already can decode 2.06. You could drop the autoboot if you really were pressed. Or even burst mode.

I appreciate you may want to make the definitive bells and whistles accelerator, but personally I'd trade capability for cost.

BW
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark
PhilC
Moderator
Posts: 7442
Joined: 23 Mar 2018 20:22

Re: ST536 STE EDITION

Post by PhilC »

@exxos @Badwolf actually dumping the ROM is a good suggestion. I use Winbond 27C010-70 chips in my STe and had forgotten to take them out when fitting the booster, and it booted at least to the memory test. It may save a few pounds and free up some resources. And as Badwolf says, if we are running the CPU at 50MHz, why not do that with the RAM also?
If it ain't broke, test it to Destruction.
exxos
Site Admin
Posts: 28360
Joined: 16 Aug 2017 23:19
Location: UK

Re: ST536 STE EDITION

Post by exxos »

I've already tried dumping the ROM and getting rid of auto boot etc. It's just not possible to fit it into a 144. I went back to bare bones and it barely even works. It has to be a 288.

As for speed: according to Grok's analysis, the Alliance chip has about 2-3 ns of wiggle room. Now my PCB layout has changed, and with tolerances mixed in... Grok says the timings are violated with the Winbond, hence why it likely isn't stable. Timings are borderline for 10ns. I'd rather not continue down that path at this point in time.

The only way to prove that's the case is to move to the 288 6ns and see if it's stable with Winbond RAM.
Badwolf
Site sponsor
Posts: 3043
Joined: 19 Nov 2019 12:09

Re: ST536 STE EDITION

Post by Badwolf »

exxos wrote: 19 Mar 2025 12:21 The alliance chip has about 2-3 ns wiggle room.
What does that mean? Wiggle room where?

I don't really follow. It's all synchronous. Either you're set up and are holding stably at the clock edge or you're not, surely?

BW
exxos
Site Admin
Posts: 28360
Joined: 16 Aug 2017 23:19
Location: UK

Re: ST536 STE EDITION

Post by exxos »

The -10's slower tPD (10 ns vs. 6 ns) increases internal delays, pushing paths over 10 ns. tCYC is 11 ns, which is more than the 10 ns period, so the internal logic can't sustain 100 MHz.

The -6's 5.6 ns tCYC easily meets the 2-3 ns window for both SDRAMs.

If Winbond’s tOH = 2 ns (vs. 2.7 ns) or setup = 3 ns (vs. 2 ns), the effective window shrinks (e.g., 2 ns vs. 3 ns). A CPLD tCO >8 ns (10 ns - 2 ns) fails Winbond but might scrape by on Alliance.
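As a rough sketch of the output-side budget behind that "10 ns - 2 ns" figure: the largest CPLD clock-to-output delay that still meets the SDRAM's input setup time is the clock period minus that setup time. The 10 ns period and the 2 ns / 3 ns setup values are the ones quoted above and have not been checked against either datasheet; `max_tco_ns` is only an illustrative helper, and clock skew and PCB trace delay are ignored.

```python
def max_tco_ns(period_ns, sdram_setup_ns):
    """Largest CPLD clock-to-output delay that still meets SDRAM input setup.

    Assumes launch and capture on consecutive edges of the same clock,
    with no skew and no trace delay (both are simplifications).
    """
    return round(period_ns - sdram_setup_ns, 1)


PERIOD_100MHZ_NS = 10.0  # SDRAM clocked at 100 MHz
PERIOD_50MHZ_NS = 20.0   # SDRAM clocked at 50 MHz, as suggested earlier in the thread

budget_2ns_setup = max_tco_ns(PERIOD_100MHZ_NS, 2.0)
budget_3ns_setup = max_tco_ns(PERIOD_100MHZ_NS, 3.0)
budget_50mhz = max_tco_ns(PERIOD_50MHZ_NS, 3.0)

print("100 MHz, 2 ns setup:", budget_2ns_setup, "ns")
print("100 MHz, 3 ns setup:", budget_3ns_setup, "ns")
print(" 50 MHz, 3 ns setup:", budget_50mhz, "ns")
```

Under these assumptions, one extra nanosecond of setup requirement comes straight off the CPLD's output budget, while halving the SDRAM clock adds a full 10 ns to it.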

To put it into perspective, the data is valid 1ns too late. This is likely why the cap on the CPU clock slowed it down just enough for it to work.

With a 288 things are better, but some stuff is still too slow for the 10ns part. Pipelining helps in a lot of cases but adds delays by itself. You can claw that back with a 6ns part, and the smaller delays benefit everything else anyway.

As for the 144, I get 30ns delays on ROM decode. Moving to the 288 fixes that and other problems. But it's still too slow at 10ns for the SDRAM stuff. In fact even 6ns seems to struggle with 100MHz external drive. Apparently the SDRAM clock only manages it because of simple internal routing. Technically its top speed is 83MHz. If I slap on an inverter, everything gets thrown way out. Not because of the inversion, but because of internal routing delays.

It all seems rather complicated :roll: But the 288 6ns is the only combination which passes the timing requirements. We haven't even got into trace delays on the PCB yet...
exxos
Site Admin
Posts: 28360
Joined: 16 Aug 2017 23:19
Location: UK

Re: ST536 STE EDITION

Post by exxos »

I can get the 5 ns chips at a "reasonable" price, but only if I order 100 of them. I thought I found one on Evilbay, and ordered it, only to realise afterwards that it was a 10ns part :roll: . I guess I could swap it onto my board and just solder it on to see what the hell happens anyway. It will be useful as a reference at least, I guess. Not sure when I'll be able to do that though, because my hands are hurting pretty badly at the moment again :( I will probably have to keep off the forum for the next week or so now.

@PhilC I guess if you could transplant one of your 10ns 288s to the STE536, I could send you the firmwares to try also... I foresee likely 2 firmware versions: one which would just have the standard TerribleFire timings in it, and a second which would have the pipelining timings coded in.
PhilC
Moderator
Posts: 7442
Joined: 23 Mar 2018 20:22

Re: ST536 STE EDITION

Post by PhilC »

@exxos OK, sounds like a plan, I'll see what I can do over the next few days.
exxos
Site Admin
Posts: 28360
Joined: 16 Aug 2017 23:19
Location: UK

Re: ST536 STE EDITION

Post by exxos »

I asked Grok to correlate the timing information. Pretty much the same code, one with and one without pipelining... This is on a 288 10ns.


Analysis and Comparison of Timing Constraints: Pipelined vs. Non-Pipelined Designs

Let’s reanalyze and compare the two timing reports: one with pipelining (first report) and the updated non-pipelined version with timing constraints (second report). The focus will be on the differences in timing constraints, delays, slack, and evidence of pipelining, as these highlight the impact of pipelining on design performance.

----

Overview of the Two Reports
  1. Report with Pipelining (First Report)
    • Date: Wed Mar 19 19:53:36 2025
    • Constraints: TS_SDRAM, TS_CLKCPU, TS_CLK7M, TS_CLK100M_TO_CLKCPU, TS_CLK100M
    • Paths Analyzed: 559, Timing Errors: 57
    • Failing Constraints: 2, TS_SDRAM, TS_CLK100M
    • Evidence of pipelining: Intermediate registers (e.g., ARAM_int) reduce delays in critical paths.
  2. Report without Pipelining (Updated Second Report)
    • Date: Wed Mar 19 19:58:00 2025
    • Constraints: Same as the first report, TS_SDRAM, TS_CLKCPU, TS_CLK7M, TS_CLK100M_TO_CLKCPU, TS_CLK100M
    • Paths Analyzed: 511, Timing Errors: 102
    • Failing Constraints: 2, TS_SDRAM, TS_CLK100M
    • Evidence of non-pipelining: Longer delays in critical paths, fewer intermediate registers.
Both reports now use identical timing constraints, making the comparison more direct. The key difference lies in how pipelining affects path delays and timing violations.

----

Key Differences in Timing Constraints

1. Constraint Definitions
With Pipelining
  • TS_SDRAM: 5.0 ns requirement, 5.8 ns delay, 44 paths, 22 failing
  • TS_CLKCPU_IGNORE: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLKCPU: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLK7M: 125.0 ns requirement, 11.0 ns delay, 27 paths, 0 failing
  • TS_CLK100M_TO_CLKCPU: 10.0 ns requirement, 5.8 ns delay, 2 paths, 0 failing
  • TS_CLK100M: 10.0 ns requirement, 11.0 ns delay, 486 paths, 35 failing
Without Pipelining
  • TS_SDRAM: 5.0 ns requirement, 13.5 ns delay, 68 paths, 68 failing
  • TS_CLKCPU_IGNORE: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLKCPU: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLK7M: 125.0 ns requirement, 11.0 ns delay, 26 paths, 0 failing
  • TS_CLK100M_TO_CLKCPU: 10.0 ns requirement, 5.8 ns delay, 2 paths, 0 failing
  • TS_CLK100M: 10.0 ns requirement, 11.0 ns delay, 415 paths, 34 failing
Observation
The constraints are identical in name and requirement, but the non-pipelined version shows higher delays for TS_SDRAM, 13.5 ns vs. 5.8 ns, and more failing paths, 68 vs. 22, indicating the absence of pipelining increases delays. Other constraints, TS_CLK7M and TS_CLK100M, show similar delays, suggesting these paths are less affected by pipelining.

2. Delay and Slack
With Pipelining
  • TS_SDRAM: Delay 5.8 ns, Slack -0.8 ns (e.g., CLK100M to ARAM<0>)
    • Pipelining reduces the delay significantly, but it still slightly exceeds the 5.0 ns requirement.
  • TS_CLK7M: Delay 11.0 ns, Slack 114.0 ns (e.g., M6800BUS/Q<0>.Q to VMA.D)
    • Large slack due to a generous 125.0 ns requirement.
  • TS_CLK100M_TO_CLKCPU: Delay 5.8 ns, Slack 4.2 ns (e.g., CLK100M to CLKCPU)
    • Easily meets the 10.0 ns requirement.
  • TS_CLK100M: Delay 11.0 ns, Slack -1.0 ns (e.g., SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D)
    • Fails slightly due to an 11.0 ns delay against a 10.0 ns requirement.
Without Pipelining
  • TS_SDRAM: Delay 13.5 ns, Slack -8.5 ns (e.g., CLK100M to ARAM<0>)
    • Significantly higher delay and worse slack compared to 5.8 ns and -0.8 ns with pipelining.
  • TS_CLK7M: Delay 11.0 ns, Slack 114.0 ns (e.g., M6800BUS/Q<0>.Q to VMA.D)
    • Identical to the pipelined version, suggesting this constraint isn’t impacted by pipelining.
  • TS_CLK100M_TO_CLKCPU: Delay 5.8 ns, Slack 4.2 ns (e.g., CLK100M to CLKCPU)
    • Identical to the pipelined version, indicating a simple path unaffected by pipelining.
  • TS_CLK100M: Delay 11.0 ns, Slack -1.0 ns (e.g., SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D)
    • Matches the pipelined version, with the same slight failure.
Observation
TS_SDRAM shows the most significant difference: 13.5 ns delay without pipelining vs. 5.8 ns with pipelining, a 7.7 ns increase. This results in a much worse slack, -8.5 ns vs. -0.8 ns, and all 68 paths failing compared to 22. Other constraints, TS_CLK7M and TS_CLK100M, have consistent delays, suggesting pipelining primarily impacts SDRAM-related output paths.
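The slack figures above are simply requirement minus worst-case delay, so they can be re-derived from the numbers in the two reports. A minimal sketch (the `REPORT` table is hand-copied from the figures listed above, and `slack_ns` is an illustrative helper, not anything produced by the Xilinx tools):

```python
# constraint: (requirement_ns, delay_pipelined_ns, delay_non_pipelined_ns)
REPORT = {
    "TS_SDRAM": (5.0, 5.8, 13.5),
    "TS_CLK7M": (125.0, 11.0, 11.0),
    "TS_CLK100M_TO_CLKCPU": (10.0, 5.8, 5.8),
    "TS_CLK100M": (10.0, 11.0, 11.0),
}


def slack_ns(requirement_ns, delay_ns):
    """Positive slack means the constraint is met; negative means it fails."""
    return round(requirement_ns - delay_ns, 1)


slacks = {
    name: (slack_ns(req, piped), slack_ns(req, unpiped))
    for name, (req, piped, unpiped) in REPORT.items()
}

for name, (piped, unpiped) in slacks.items():
    print(f"{name}: pipelined {piped} ns, non-pipelined {unpiped} ns")
```

Only TS_SDRAM changes between the two builds, which matches the observation that pipelining mainly affects the SDRAM output paths.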

3. Path Analysis and Pipelining Evidence
With Pipelining
  • TS_SDRAM: Paths like CLK100M to ARAM<0>, 5.8 ns, involve intermediate registers (e.g., ARAM_int), reducing the clock-to-output delay.
  • TS_CLK100M: Paths like SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D, 11.0 ns, are flip-flop-to-flip-flop (F2F), with pipelining managing internal delays.
  • Fewer paths fail, 57 out of 559, reflecting optimized timing due to pipelining.
Without Pipelining
  • TS_SDRAM: Paths like CLK100M to ARAM<0>, 13.5 ns, show a much higher delay, with no intermediate registers like ARAM_int mentioned, indicating a direct, unpipelined path.
  • TS_CLK100M: Paths like SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D, 11.0 ns, remain F2F and identical to the pipelined version, suggesting these internal paths are not heavily reliant on additional pipelining stages.
  • More paths fail, 102 out of 511, with all 68 TS_SDRAM paths failing due to the absence of pipelining.
Observation
Pipelining reduces the TS_SDRAM clock-to-output delay from 13.5 ns to 5.8 ns by adding registers (e.g., ARAM_int), cutting the delay by more than half. Without pipelining, these paths are purely combinational, leading to longer delays and universal failure. TS_CLK100M paths are largely unaffected by pipelining, as the 11.0 ns delay persists in both cases, likely due to inherent register-to-register timing.

4. Performance Summary
Both Reports
  • Min. Clock Period: 14.0 ns
  • Max. Clock Frequency: 71.429 MHz
  • Limited by: Clock Pulse Width for CLKCPU_OBUF.Q
  • Clock to Setup (tCYC): 11.0 ns
  • Pad to Pad Delay (tPD): 18.7 ns
  • Setup to Clock at the Pad (tSU): 8.3 ns
  • Clock Pad to Output Pad Delay (tCO): 30.1 ns
Observation
High-level performance metrics are identical, but the pipelined design better meets specific constraints (e.g., TS_SDRAM), reducing timing errors, 57 vs. 102.

----

Impact of Pipelining

With Pipelining
  • Delay Reduction: TS_SDRAM paths drop from 13.5 ns to 5.8 ns, a 57% reduction, due to intermediate registers breaking up combinational logic.
  • Fewer Violations: Only 22 out of 44 TS_SDRAM paths fail, 50%, with a modest slack violation, -0.8 ns.
  • Optimization: Pipelining targets critical output paths (e.g., CLK100M to ARAM<0>), improving overall timing compliance.
Without Pipelining
  • Higher Delays: TS_SDRAM paths reach 13.5 ns, exceeding the 5.0 ns requirement by 8.5 ns, causing all 68 paths to fail.
  • More Violations: Total timing errors increase to 102, with TS_SDRAM fully failing due to unmitigated combinational delays.
  • Unoptimized Paths: Lack of intermediate registers results in longer, unmanageable delays for SDRAM outputs.
----

Conclusion

The primary difference between the two reports is the presence of pipelining in the first, which significantly reduces the TS_SDRAM clock-to-output delay, 5.8 ns vs. 13.5 ns, by adding intermediate registers (e.g., ARAM_int). This leads to fewer failing paths, 22 vs. 68, and a smaller slack violation, -0.8 ns vs. -8.5 ns, for TS_SDRAM. Other constraints like TS_CLK100M and TS_CLK7M show similar delays in both cases, indicating pipelining’s impact is most pronounced on SDRAM output paths. The non-pipelined design, while using the same constraints, suffers from longer combinational delays, resulting in more timing errors, 102 vs. 57, and poorer performance on critical paths. Pipelining thus enhances timing reliability and reduces latency in the targeted SDRAM output domain.

----------------------------

Pipelining: 10ns vs 6ns device.

----------------------------

Comparison of Pipelined 10ns PLD vs. Pipelined 6ns PLD Timing Reports

Let’s compare the timing reports for the pipelined design on a 10ns PLD (XC95288XL, -10 speed grade) and the pipelined design on a 6ns PLD (XC95288XL, -6 speed grade). The focus will be on differences in device performance, timing constraints, delays, slack, and overall timing outcomes, highlighting how the faster PLD impacts the pipelined design.

----

Overview of the Two Reports
  1. Pipelined 10ns PLD Report
    • Date: Wed Mar 19 19:53:36 2025
    • Device: XC95288XL, -10 speed grade
    • Constraints: TS_SDRAM, TS_CLKCPU, TS_CLK7M, TS_CLK100M_TO_CLKCPU, TS_CLK100M
    • Paths Analyzed: 559, Timing Errors: 57
    • Failing Constraints: 2, TS_SDRAM, TS_CLK100M
    • Pipelining: Present, with intermediate registers (e.g., ARAM_int) reducing delays.
  2. Pipelined 6ns PLD Report
    • Date: Wed Mar 19 20:15:14 2025
    • Device: XC95288XL, -6 speed grade
    • Constraints: TS_SDRAM, TS_CLKCPU, TS_CLK7M, TS_CLK100M_TO_CLKCPU, TS_CLK100M
    • Paths Analyzed: 559, Timing Errors: 0
    • Failing Constraints: 0
    • Pipelining: Present, with intermediate registers (e.g., ARAM_int) further optimizing delays.
Both reports use the same pipelined design and constraints, but the 6ns PLD has a faster speed grade, reducing delays and potentially eliminating timing violations. The comparison will explore how this affects performance.

----

Key Differences in Timing Performance

1. Device and Performance Summary
Pipelined 10ns PLD
  • Min. Clock Period: 14.0 ns
  • Max. Clock Frequency: 71.429 MHz
  • Limited by: Clock Pulse Width for CLKCPU_OBUF.Q
  • Clock to Setup (tCYC): 11.0 ns
  • Pad to Pad Delay (tPD): 18.7 ns
  • Setup to Clock at the Pad (tSU): 8.3 ns
  • Clock Pad to Output Pad Delay (tCO): 30.1 ns
Pipelined 6ns PLD
  • Min. Clock Period: 12.0 ns
  • Max. Clock Frequency: 83.333 MHz
  • Limited by: Clock Pulse Width for CLKCPU_OBUF.Q
  • Clock to Setup (tCYC): 5.6 ns
  • Pad to Pad Delay (tPD): 11.4 ns
  • Setup to Clock at the Pad (tSU): 5.4 ns
  • Clock Pad to Output Pad Delay (tCO): 16.0 ns
Observation
The 6ns PLD offers a shorter minimum clock period, 12.0 ns vs. 14.0 ns, increasing the maximum frequency from 71.429 MHz to 83.333 MHz, a 16.7% improvement. All timing parameters (tCYC, tPD, tSU, tCO) are significantly reduced, with tCYC dropping from 11.0 ns to 5.6 ns (49% faster) and tCO from 30.1 ns to 16.0 ns (47% faster), reflecting the faster speed grade’s impact on propagation delays.

2. Timing Constraints
Pipelined 10ns PLD
  • TS_SDRAM: 5.0 ns requirement, 5.8 ns delay, 44 paths, 22 failing
  • TS_CLKCPU_IGNORE: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLKCPU: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLK7M: 125.0 ns requirement, 11.0 ns delay, 27 paths, 0 failing
  • TS_CLK100M_TO_CLKCPU: 10.0 ns requirement, 5.8 ns delay, 2 paths, 0 failing
  • TS_CLK100M: 10.0 ns requirement, 11.0 ns delay, 486 paths, 35 failing
Pipelined 6ns PLD
  • TS_SDRAM: 5.0 ns requirement, 3.8 ns delay, 44 paths, 0 failing
  • TS_CLKCPU_IGNORE: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLKCPU: 20.0 ns requirement, 0.0 ns delay, 0 paths
  • TS_CLK7M: 125.0 ns requirement, 5.6 ns delay, 27 paths, 0 failing
  • TS_CLK100M_TO_CLKCPU: 10.0 ns requirement, 3.8 ns delay, 2 paths, 0 failing
  • TS_CLK100M: 10.0 ns requirement, 5.6 ns delay, 486 paths, 0 failing
Observation
The 6ns PLD reduces delays across all constraints. TS_SDRAM improves from 5.8 ns to 3.8 ns, meeting the 5.0 ns requirement and eliminating all 22 failures. TS_CLK100M drops from 11.0 ns to 5.6 ns, meeting the 10.0 ns requirement and resolving all 35 failures. TS_CLK7M halves from 11.0 ns to 5.6 ns, though it was already passing. This shows the faster PLD enhances timing compliance significantly.

3. Delay and Slack
Pipelined 10ns PLD
  • TS_SDRAM: Delay 5.8 ns, Slack -0.8 ns (e.g., CLK100M to ARAM<0>)
    • Slightly exceeds the 5.0 ns requirement despite pipelining.
  • TS_CLK7M: Delay 11.0 ns, Slack 114.0 ns (e.g., M6800BUS/Q<0>.Q to VMA.D)
    • Comfortably meets the 125.0 ns requirement.
  • TS_CLK100M_TO_CLKCPU: Delay 5.8 ns, Slack 4.2 ns (e.g., CLK100M to CLKCPU)
    • Meets the 10.0 ns requirement easily.
  • TS_CLK100M: Delay 11.0 ns, Slack -1.0 ns (e.g., SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D)
    • Fails the 10.0 ns requirement by 1.0 ns.
Pipelined 6ns PLD
  • TS_SDRAM: Delay 3.8 ns, Slack 1.2 ns (e.g., CLK100M to ARAM<0>)
    • Meets the 5.0 ns requirement with positive slack.
  • TS_CLK7M: Delay 5.6 ns, Slack 119.4 ns (e.g., M6800BUS/Q<0>.Q to VMA.D)
    • Meets the 125.0 ns requirement with even more slack.
  • TS_CLK100M_TO_CLKCPU: Delay 3.8 ns, Slack 6.2 ns (e.g., CLK100M to CLKCPU)
    • Improves slack compared to 4.2 ns on the 10ns PLD.
  • TS_CLK100M: Delay 5.6 ns, Slack 4.4 ns (e.g., SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D)
    • Meets the 10.0 ns requirement with substantial slack.
Observation
The 6ns PLD cuts delays by roughly a third to a half compared to the 10ns PLD: TS_SDRAM from 5.8 ns to 3.8 ns (34% reduction), TS_CLK7M from 11.0 ns to 5.6 ns (49% reduction), and TS_CLK100M from 11.0 ns to 5.6 ns (49% reduction). This eliminates all timing violations, turning negative slack into positive, thanks to the faster speed grade enhancing pipelined paths.
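The percentage reductions quoted here can be checked with a couple of lines. This is only a sanity check on the report figures; `reduction_pct` is an illustrative helper name.

```python
def reduction_pct(before_ns, after_ns):
    """Percentage by which a delay shrinks, rounded to the nearest whole percent."""
    return round(100.0 * (before_ns - after_ns) / before_ns)


sdram_pct = reduction_pct(5.8, 3.8)     # TS_SDRAM, -10 vs -6 part
clk100m_pct = reduction_pct(11.0, 5.6)  # TS_CLK100M and TS_CLK7M
tco_pct = reduction_pct(30.1, 16.0)     # clock pad to output pad (tCO)

print("TS_SDRAM:", sdram_pct, "%")
print("TS_CLK100M / TS_CLK7M:", clk100m_pct, "%")
print("tCO:", tco_pct, "%")
```

The SDRAM output path improves noticeably less than the register-to-register paths, so the speed grade does not scale every delay equally.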

4. Path Analysis
Pipelined 10ns PLD
  • TS_SDRAM: Paths like CLK100M to ARAM<0>, 5.8 ns, use pipelining (e.g., ARAM_int), but still fail slightly.
  • TS_CLK100M: Paths like SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D, 11.0 ns, are F2F but exceed the 10.0 ns requirement.
  • 57 timing errors, primarily in TS_SDRAM and TS_CLK100M.
Pipelined 6ns PLD
  • TS_SDRAM: Paths like CLK100M to ARAM<0>, 3.8 ns, leverage pipelining (e.g., ARAM_int) and meet the 5.0 ns requirement.
  • TS_CLK100M: Paths like SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D, 5.6 ns, are F2F and comfortably meet the 10.0 ns requirement.
  • 0 timing errors, with all 559 paths passing.
Observation
Both designs use pipelining, but the 6ns PLD’s faster logic reduces delays significantly. For TS_SDRAM, the clock-to-output delay drops from 5.8 ns to 3.8 ns, and for TS_CLK100M, F2F paths decrease from 11.0 ns to 5.6 ns, ensuring all constraints are met.

----

Impact of the 6ns PLD Speed Grade

Pipelined 10ns PLD
  • Delay Limitations: Delays like 5.8 ns (TS_SDRAM) and 11.0 ns (TS_CLK100M) result in 57 timing errors.
  • Performance: Limited to 71.429 MHz, with some pipelined paths still failing tight constraints.
  • Slack: Negative slack in critical paths, -0.8 ns and -1.0 ns, indicates marginal timing.
Pipelined 6ns PLD
  • Delay Reduction: Delays reduced to 3.8 ns (TS_SDRAM) and 5.6 ns (TS_CLK100M), a 34-49% improvement.
  • Performance: Increased to 83.333 MHz, with all pipelined paths passing, leveraging the faster speed grade.
  • Slack: Positive slack across all constraints, e.g., 1.2 ns (TS_SDRAM) and 4.4 ns (TS_CLK100M), ensures robust timing.
----

Conclusion

The pipelined 6ns PLD outperforms the pipelined 10ns PLD by reducing delays across all constraints, dropping from 5.8 ns to 3.8 ns for TS_SDRAM and 11.0 ns to 5.6 ns for TS_CLK100M. This eliminates all 57 timing errors seen in the 10ns PLD, achieving 0 failures across 559 paths. The faster -6 speed grade enhances the pipelined design’s effectiveness, boosting the maximum frequency from 71.429 MHz to 83.333 MHz and turning negative slack into positive, ensuring full timing compliance. The primary benefit is in critical pipelined paths like SDRAM outputs and internal F2F timing, where the 6ns PLD provides a significant performance edge.


---------------------------------------------


Pipelining does introduce a clocked delay (latency) by adding registers, which can delay the output by one or more clock cycles. However, it also reduces the combinational delay within each pipeline stage, allowing for a higher clock frequency and improved timing performance within a single clock cycle. This trade-off is key to understanding its impact in the context of the timing reports. Let me clarify this in a revised comparison between the pipelined 10ns PLD and pipelined 6ns PLD reports, addressing how pipelining affects latency and timing.
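To make that trade-off concrete, here is a toy model. The stage delays and the 1 ns register overhead are made-up illustration values, not figures from the timing reports, and `pipeline_timing` is a hypothetical helper. The minimum clock period is set by the slowest stage plus register overhead, while latency is one clock per stage.

```python
def pipeline_timing(stage_delays_ns, register_overhead_ns):
    """Return (min_period_ns, latency_cycles, total_latency_ns).

    stage_delays_ns: combinational delay of each pipeline stage.
    register_overhead_ns: clock-to-Q plus setup of the stage register.
    """
    min_period_ns = max(stage_delays_ns) + register_overhead_ns
    latency_cycles = len(stage_delays_ns)
    return min_period_ns, latency_cycles, min_period_ns * latency_cycles


# One 12 ns block of logic versus the same logic split into two 6 ns stages.
unpipelined = pipeline_timing([12.0], 1.0)
pipelined = pipeline_timing([6.0, 6.0], 1.0)

print("unpipelined:", unpipelined)
print("pipelined:  ", pipelined)
```

In this toy case the split design can be clocked almost twice as fast, but a result takes two clocks to appear and slightly longer in absolute time, which is the latency cost described above.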

Comparison of Pipelined 10ns PLD vs. Pipelined 6ns PLD Timing Reports

Let’s compare the pipelined 10ns PLD (XC95288XL, -10 speed grade) and pipelined 6ns PLD (XC95288XL, -6 speed grade) timing reports, focusing on how pipelining and the speed grade affect delays, slack, and performance. Pipelining adds a clocked delay (latency) by introducing registers, delaying outputs by additional clock cycles, but it reduces combinational delays within each cycle, enabling faster clock rates.

----

Overview of the Two Reports
  1. Pipelined 10ns PLD Report
    • Date: Wed Mar 19 19:53:36 2025
    • Device: XC95288XL, -10 speed grade
    • Constraints: TS_SDRAM, TS_CLKCPU, TS_CLK7M, TS_CLK100M_TO_CLKCPU, TS_CLK100M
    • Paths Analyzed: 559, Timing Errors: 57
    • Failing Constraints: 2, TS_SDRAM, TS_CLK100M
    • Pipelining: Uses registers (e.g., ARAM_int), adding latency but reducing per-cycle delays.
  2. Pipelined 6ns PLD Report
    • Date: Wed Mar 19 20:15:14 2025
    • Device: XC95288XL, -6 speed grade
    • Constraints: TS_SDRAM, TS_CLKCPU, TS_CLK7M, TS_CLK100M_TO_CLKCPU, TS_CLK100M
    • Paths Analyzed: 559, Timing Errors: 0
    • Failing Constraints: 0
    • Pipelining: Uses registers (e.g., ARAM_int), with faster logic enhancing per-cycle timing.
Both reports feature pipelining, adding latency in clock cycles, but the 6ns PLD’s faster speed grade reduces delays within each cycle, improving timing compliance.

----

Key Differences in Timing Performance

1. Device and Performance Summary
Pipelined 10ns PLD
  • Min. Clock Period: 14.0 ns
  • Max. Clock Frequency: 71.429 MHz
  • Limited by: Clock Pulse Width for CLKCPU_OBUF.Q
  • Clock to Setup (tCYC): 11.0 ns
  • Pad to Pad Delay (tPD): 18.7 ns
  • Setup to Clock at the Pad (tSU): 8.3 ns
  • Clock Pad to Output Pad Delay (tCO): 30.1 ns
Pipelined 6ns PLD
  • Min. Clock Period: 12.0 ns
  • Max. Clock Frequency: 83.333 MHz
  • Limited by: Clock Pulse Width for CLKCPU_OBUF.Q
  • Clock to Setup (tCYC): 5.6 ns
  • Pad to Pad Delay (tPD): 11.4 ns
  • Setup to Clock at the Pad (tSU): 5.4 ns
  • Clock Pad to Output Pad Delay (tCO): 16.0 ns
Observation
The 6ns PLD’s faster speed grade shortens the clock period from 14.0 ns to 12.0 ns, boosting frequency by 16.7%. Per-cycle delays drop significantly (e.g., tCYC from 11.0 ns to 5.6 ns, tCO from 30.1 ns to 16.0 ns), but pipelining’s added latency means outputs are delayed by additional clock cycles in both designs.

2. Timing Constraints
Pipelined 10ns PLD
  • TS_SDRAM: 5.0 ns requirement, 5.8 ns delay, 44 paths, 22 failing
  • TS_CLK7M: 125.0 ns requirement, 11.0 ns delay, 27 paths, 0 failing
  • TS_CLK100M_TO_CLKCPU: 10.0 ns requirement, 5.8 ns delay, 2 paths, 0 failing
  • TS_CLK100M: 10.0 ns requirement, 11.0 ns delay, 486 paths, 35 failing
Pipelined 6ns PLD
  • TS_SDRAM: 5.0 ns requirement, 3.8 ns delay, 44 paths, 0 failing
  • TS_CLK7M: 125.0 ns requirement, 5.6 ns delay, 27 paths, 0 failing
  • TS_CLK100M_TO_CLKCPU: 10.0 ns requirement, 3.8 ns delay, 2 paths, 0 failing
  • TS_CLK100M: 10.0 ns requirement, 5.6 ns delay, 486 paths, 0 failing
Observation
Pipelining reduces combinational delays within each cycle, but the 10ns PLD’s slower logic results in failures (e.g., TS_SDRAM at 5.8 ns, TS_CLK100M at 11.0 ns). The 6ns PLD’s faster logic cuts these to 3.8 ns and 5.6 ns, meeting all requirements. Latency from pipelining exists in both, delaying outputs by clock cycles.

3. Delay, Slack, and Latency
Pipelined 10ns PLD
  • TS_SDRAM: Delay 5.8 ns, Slack -0.8 ns (e.g., CLK100M to ARAM<0>)
    • Per-cycle delay exceeds 5.0 ns; latency adds clock cycles.
  • TS_CLK100M: Delay 11.0 ns, Slack -1.0 ns (e.g., SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D)
    • F2F path fails 10.0 ns requirement; latency present.
Pipelined 6ns PLD
  • TS_SDRAM: Delay 3.8 ns, Slack 1.2 ns (e.g., CLK100M to ARAM<0>)
    • Meets 5.0 ns within cycle; latency still adds clock cycles.
  • TS_CLK100M: Delay 5.6 ns, Slack 4.4 ns (e.g., SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D)
    • F2F path meets 10.0 ns; latency remains.
Observation
Pipelining’s latency delays outputs by clock cycles in both designs (e.g., via ARAM_int), but the 6ns PLD’s reduced per-cycle delays (3.8 ns vs. 5.8 ns for TS_SDRAM) ensure timing compliance. The 10ns PLD’s higher delays cause failures despite pipelining.

4. Path Analysis
Pipelined 10ns PLD
  • TS_SDRAM: CLK100M to ARAM<0>, 5.8 ns, pipelined via ARAM_int, fails 5.0 ns.
  • TS_CLK100M: SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D, 11.0 ns, F2F, fails 10.0 ns.
  • 57 errors show pipelining’s limits with slower logic.
Pipelined 6ns PLD
  • TS_SDRAM: CLK100M to ARAM<0>, 3.8 ns, pipelined via ARAM_int, passes 5.0 ns.
  • TS_CLK100M: SDRAM/BCOUNT<0>.Q to SDRAM/address<10>.D, 5.6 ns, F2F, passes 10.0 ns.
  • 0 errors, as faster logic complements pipelining.
Observation
Pipelining splits paths into stages (e.g., ARAM_int), adding latency but shortening combinational delays. The 6ns PLD’s speed ensures these stages meet timing, unlike the 10ns PLD.

----

Impact of Pipelining and Speed Grade

Pipelined 10ns PLD
  • Latency: Adds clock cycles, delaying outputs.
  • Per-Cycle Timing: 5.8 ns and 11.0 ns delays fail tight constraints, leading to 57 errors.
  • Performance: 71.429 MHz, limited by slower logic.
Pipelined 6ns PLD
  • Latency: Same clock cycle delay as 10ns PLD, inherent to pipelining.
  • Per-Cycle Timing: 3.8 ns and 5.6 ns delays meet all constraints, 0 errors.
  • Performance: 83.333 MHz, enhanced by faster logic.
----

Conclusion

Pipelining adds latency in both designs, delaying outputs by clock cycles via registers like ARAM_int, but it reduces per-cycle combinational delays. The 10ns PLD’s 5.8 ns (TS_SDRAM) and 11.0 ns (TS_CLK100M) delays fail their 5.0 ns and 10.0 ns requirements, resulting in 57 errors. The 6ns PLD’s faster speed grade cuts these to 3.8 ns and 5.6 ns, eliminating all errors and boosting frequency from 71.429 MHz to 83.333 MHz. While latency remains, the 6ns PLD’s improved per-cycle timing ensures robust performance, making it superior for this pipelined design.

---------------------------------------------

Why the 10ns PLD’s Max Speed is 71.429 MHz, Not 50 MHz

You might expect a 10ns PLD (XC95288XL, -10 speed grade) to max out at 50 MHz or even 100 MHz based on its name, but its timing report shows 71.429 MHz. Here’s why:

Technical Explanation
The 10ns PLD’s maximum clock frequency of 71.429 MHz comes from its minimum clock period of 14.0 ns, calculated as 1/14.0 ns = 71.429 MHz. This is limited by the Clock Pulse Width for CLKCPU_OBUF.Q, not the raw speed grade. The “10ns” rating reflects the fastest combinational delay (like pad-to-pad or logic propagation), but real-world limits include:
  • Clock Pulse Width: The time the clock signal must stay high or low, which is 14.0 ns here, slower than the 10ns spec due to internal constraints.
  • Pipelining: Adds registers that break up delays but require stable clock cycles, capping frequency below the theoretical 100 MHz (1/10ns).
  • Other Delays: Setup (8.3 ns), hold, and output delays (30.1 ns) also influence the clock period, not just the 10ns rating.
So, 71.429 MHz reflects the slowest critical path (14.0 ns), not a simple 50 MHz or 100 MHz based on the speed grade alone.

The 10ns chip can go faster than 50 MHz because its name doesn’t tell the whole story—it’s slowed down by how long it takes to wiggle its clock, hitting 71 MHz instead.
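The frequency figures follow directly from the minimum clock periods in the reports (frequency in MHz is 1000 divided by the period in ns). A quick sketch to reproduce them, with `max_freq_mhz` as an illustrative helper:

```python
def max_freq_mhz(min_period_ns):
    """Convert a minimum clock period in ns to a maximum frequency in MHz."""
    return round(1000.0 / min_period_ns, 3)


f_10ns_part = max_freq_mhz(14.0)  # -10 part, limited by clock pulse width
f_6ns_part = max_freq_mhz(12.0)   # -6 part
f_nominal = max_freq_mhz(10.0)    # what a bare 10 ns period would allow

# Relative gain of the -6 part over the -10 part, in percent.
gain_pct = round(100.0 * (14.0 / 12.0 - 1.0), 1)

print(f_10ns_part, f_6ns_part, f_nominal, gain_pct)
```

Neither part reaches 100 MHz by the report's own minimum-period figure, which is consistent with the earlier remark that even the 6ns part tops out at 83MHz.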

----

Conclusion
The 10ns PLD isn’t stuck at 50 MHz because its max frequency depends on the clock’s actual timing limits (14.0 ns), not just the 10ns label, allowing it to reach 71.429 MHz despite pipelining and other delays.

---------------------------------------------

TL;DR

The faster 6ns chip makes the computer’s signals move quick enough to avoid delays, unlike the slower 10ns chip, even though both use a trick to split the work into steps that take extra time to finish.
ijor
Posts: 825
Joined: 30 Nov 2018 20:45

Re: ST536 STE EDITION

Post by ijor »

exxos wrote: 19 Mar 2025 17:01 If Winbond’s tOH = 2 ns (vs. 2.7 ns) or setup = 3 ns (vs. 2 ns), the effective window shrinks (e.g., 2 ns vs. 3 ns). A CPLD tCO >8 ns (10 ns - 2 ns) fails Winbond but might scrape by on Alliance.
You are mixing up input and output. The tOH parameter is the data output (from the SDRAM chip) hold time. The CPLD tCO is for output from the CPLD.

exxos wrote: I asked Grok to correlate the timing information.
...
1. Constraint Definitions
TS_SDRAM: 5.0 ns requirement
Where is this 5ns requirement coming from?

Anyway, I think this timing analysis is not the important one. When you are struggling with higher frequencies the main problem is usually not the output, it's the input. There is no input timing analysis here that I can see, only the output.

I also realized that the SDRAM data bus is connected directly to the CPU, right? I mean, the CPU reads and/or writes directly to the SDRAM, correct?

If so you need to consider the CPU timing as well. It probably won't be a problem for output, because the CPU holds the data output for more than one cycle. But the input must still be accurate.

And perhaps more important and significant than the CPU timing is the clock skew. If I understand correctly the CPU and the SDRAM work at different clocks. Without a PLL I assume the CPLD divides down the main clock. That would produce a considerable, and variable, clock skew. That might be a critical issue when the timing is tight.
http://github.com/ijor/fx68k 68000 cycle exact FPGA core
FX CAST Cycle Accurate Atari ST core
http://pasti.fxatari.com
exxos
Site Admin
Posts: 28360
Joined: 16 Aug 2017 23:19
Location: UK

Re: ST536 STE EDITION

Post by exxos »

I asked Grok to answer your questions, as it knows what it's talking about more than I do.

Where’s the 5ns TS_SDRAM Coming From?
Grok traced this back to the timing constraint in the SDRAM controller’s CPLD report—TS_SDRAM is set to a 5.0ns requirement, which defines how fast the CPLD needs to drive outputs like address (ARAM<0>) or control signals (RAS, CAS) to the SDRAM. It’s likely pulled from the AS4C16M16SA-6TCN’s datasheet, where command setup times (e.g., tAS, tCS) are around 2-3ns minimum at 100 MHz, but the 5ns target gives a safe window for the CPLD to stabilize those signals by the CLK100M edge. The 10ns PLD misses it at 13.5ns (non-pipelined report), while the 6ns PLD hits 3.8ns—so it’s about CPLD output timing to SDRAM, not the SDRAM’s response.

Input vs. Output Timing
You’re spot-on that this analysis is output-heavy—Grok confirmed the reports focus on tCO (CPLD outputs like TS_SDRAM at 13.5ns or 3.8ns), with no detailed input timing (tSU/tH) for data coming back from SDRAM to the CPLD or CPU. The “Setup/Hold Times for Clocks” section hints at CPU input specs (e.g., 5.4ns tSU for 6ns PLD), but it’s not a full analysis of SDRAM-to-CPU data timing. I agree—higher frequencies (like 50 MHz here) make input timing critical, and we’re blind on that front without scoping it.

SDRAM Direct to CPU
Yeah, you nailed it—the SDRAM data bus (D0-D15) connects straight to the 68030, bypassing the CPLD for data transfers. The CPLD just handles address (ARAM), bank (BA), and control (RAS, CAS, WE)—it sends commands and STERM, but the CPU reads/writes SDRAM directly. Grok pointed out this means CPU output (writes) isn’t a worry—it holds data across cycles—but input (reads) needs the SDRAM’s data to hit the 68030’s tSU (4ns) and tH (~0-2ns) by the CLKCPU edge, which isn’t in the CPLD report.

CPU Timing
Grok ran the numbers: with AS4C16M16SA-6TCN, data’s at 58.9ns (13.5ns tCO + 20ns tRCD + 25.4ns CL2) for a 60ns cycle (3 clocks, 5.8ns skew)—fits 61.8ns window (65.8ns - 4ns tSU). For 40ns (45.8ns edge), 58.9ns misses 41.8ns—shouldn’t work unless STERM stretches or SDRAM’s faster (e.g., tRCD = 12ns, 55.8ns). Your point about CPU input accuracy is key—without that timing, we’re guessing.

Clock Skew
This is where it gets spicy—you’re right, the CPU (50 MHz) and SDRAM (100 MHz) run different clocks, and my clocks.v divides CLK100M to CLKCPU without a PLL. Grok calculated a 5.8ns skew (10ns PLD tCO), shifting CLKCPU to 5.8ns, 25.8ns, 45.8ns, 65.8ns. It’s fixed (not variable much), but enough to push a 40ns cycle’s data window to 41.8ns—still too tight for 58.9ns. For 60ns, 64.7ns data (adjusted skew) vs. 61.8ns fits if STERM’s late (59.5ns). You’re dead-on—skew could be the critical fudge factor making this work when it shouldn’t.
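The edge and window arithmetic in the last two sections can be reproduced as a quick sanity check. Every number below is one quoted in this thread (5.8ns skew, 20ns CPU period, 4ns 68030 setup, 13.5ns tCO, 20ns tRCD, 25.4ns for CL2) and none has been verified against a datasheet or a scope; the function names are illustrative only.

```python
SKEW_NS = 5.8         # CLK100M -> CLKCPU delay quoted for the 10ns PLD
CPU_PERIOD_NS = 20.0  # 50 MHz CPU clock
CPU_TSU_NS = 4.0      # 68030 data input setup time quoted above


def cpu_edges(count):
    """Times of the first `count` skewed CLKCPU edges, in ns."""
    return [round(SKEW_NS + k * CPU_PERIOD_NS, 1) for k in range(count)]


def data_arrival_ns(t_co_ns, t_rcd_ns, t_cas_ns):
    """When read data becomes valid: CPLD output + row-to-column + CAS latency."""
    return round(t_co_ns + t_rcd_ns + t_cas_ns, 1)


def fits(arrival_ns, edge_ns):
    """True if data is valid at least CPU_TSU_NS before the sampling edge."""
    return arrival_ns <= round(edge_ns - CPU_TSU_NS, 1)


edges = cpu_edges(4)
arrival = data_arrival_ns(13.5, 20.0, 25.4)

print("CLKCPU edges:", edges)
print("data valid at:", arrival, "ns")
print("fits 60ns cycle:", fits(arrival, edges[3]))
print("fits 40ns cycle:", fits(arrival, edges[2]))
```

With these figures the data only makes the edge at 65.8ns, i.e. a three-clock (60ns) cycle, and misses the 45.8ns edge of a two-clock (40ns) cycle.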

Why It Works
The original design (10ns PLD, no pipelining) runs, and I think it's on a 60ns cycle, not 40ns. STERM at 59.5ns and data at 55.8-64.7ns (if tRCD overclocks to 12ns) fit a 65.8ns edge. A 40ns cycle (45.8ns edge) with 55.8ns data needs STERM delayed (e.g., 55ns via unshown logic); skew helps but isn't enough alone. I've asked a mate to scope /AS and STERM, and I'm betting on 60ns. (If anyone with a TF536 can scope /AS and /STERM, it would be incredibly helpful.)
