Shifter LOAD behaviour

Badwolf
Site sponsor
Posts: 2671
Joined: Tue Nov 19, 2019 12:09 pm

Re: Shifter LOAD behaviour

Post by Badwolf »

ijor wrote: Thu May 08, 2025 2:09 pm
Badwolf wrote: Thu May 08, 2025 11:03 am Current theory: I'm missing VSYNC edges. Or at least my counter reset logic is faulty.
This should be relatively easy to confirm at the MCU. The MCU should receive 16,000 words for a standard ST screen. If it doesn't, something is wrong.
I'm not sure I understand why you need a counter at the PLD in the first place. Shifter doesn't need a counter, why would you? All you need is to record which word is the first one on screen, that's all. The framebuffer might need a counter (or not), but it can maintain the counter itself, if it needs it. Transmitting a counter from the PLD to the framebuffer seems like a waste of bandwidth.
The interface from the board to the MCU isn't specifically designed for this purpose. It's a general purpose channel for address+data which is known good (enough) that I'm reusing for this little experiment.

The MCU is totally unaware of VSYNC pulses, so I'd have to invent a way to communicate that to get the MCU to do the counting. Better if I can get the CPLD to emulate an address (in this case, starting at zero).

BW
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark
ijor
Posts: 583
Joined: Fri Nov 30, 2018 8:45 pm

Re: Shifter LOAD behaviour

Post by ijor »

Badwolf wrote: Thu May 08, 2025 7:52 pm The interface from the board to the MCU isn't specifically designed for this purpose. It's a general purpose channel for address+data which is known good (enough) that I'm reusing for this little experiment.

The MCU is totally unaware of VSYNC pulses, so I'd have to invent a way to communicate that to get the MCU to do the counting. Better if I can get the CPLD to emulate an address (in this case, starting at zero).
Ic :) Well, this shows precisely my point in my first message. Please give us the whole context. I know you mean well and have only good intentions. But not giving us the whole context makes us waste a lot of time and effort ... No big deal, back to our regular programming ... :)

I still suggest you use the MCU as a debugging tool, at least assuming you have full control of the MCU and the firmware source. I'm not suggesting you develop a whole new interface. But the MCU could perform some checking that would be much easier to debug. It might involve some work. But debugging the CPLD alone might be much more difficult.
http://github.com/ijor/fx68k 68000 cycle exact FPGA core
FX CAST Cycle Accurate Atari ST core
http://pasti.fxatari.com

Re: Shifter LOAD behaviour

Post by Badwolf »

Sorry, @ijor, I didn't mean to obfuscate.

I was trying to reduce the problem to the simplest standalone question and giving a bit of background thereafter.

To that end I think I only actually posed two questions.

* Is my assumption about address offset inference (LOAD signals since VSYNC = words offset from start of framebuffer) valid?
* Is this (the example I gave) a valid Verilog construct to implement my counter?

I think both of those have been answered so everything else was me just jabbering on about the background to things.


So, to explain the background further: I have a project consisting of a CPLD and a Microcontroller (a Pico 2 board) which monitors address lines (and a couple of other things like VSYNC) on GLUE and the data lines on the shifter.

The CPLD is intended to be a memory write sniffer which passes any writes it sees off to the MCU via a series of 8-bit parallel transfers clocked at 48MHz (it takes 12 clock cycles to transfer 24 address bits and 16 data bits -- 250ns).

It's not expected to be a perfect protocol. If the transfer is still going on when a new write is detected, the new write is dropped. If the MCU can't store the data fast enough, it's dropped. Potentially when the buffer fills a bit of data is lost too. That's all very much prototype code.
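
As a rough illustration of that 40-bit transfer, here is a minimal C sketch of turning five received bytes back into an address/data pair, plus the timing arithmetic (12 clocks at 48MHz = 250ns). The byte order (address first, most significant byte first) and the names `bus_write_t`, `decode_frame` and `transfer_ns` are assumptions made up for illustration, not the real firmware. How the five bytes are spread over the 12 clocks isn't covered; the sketch starts from an already assembled frame.

```c
#include <assert.h>
#include <stdint.h>

/* One sniffed bus cycle: 24 address bits plus 16 data bits. */
typedef struct {
    uint32_t address;   /* only the low 24 bits are meaningful */
    uint16_t data;
} bus_write_t;

/* ASSUMPTION: address bytes first, most significant byte first,
 * followed by the data word high byte then low byte. */
static bus_write_t decode_frame(const uint8_t frame[5])
{
    bus_write_t w;
    w.address = ((uint32_t)frame[0] << 16) |
                ((uint32_t)frame[1] << 8)  |
                 (uint32_t)frame[2];
    w.data = (uint16_t)(((uint16_t)frame[3] << 8) | frame[4]);
    return w;
}

/* Duration of one transfer in nanoseconds for a given clock count. */
static uint32_t transfer_ns(uint32_t clocks, uint32_t clock_hz)
{
    assert(clock_hz != 0);
    return (uint32_t)(((uint64_t)clocks * 1000000000ull) / clock_hz);
}
```

With these assumptions, `transfer_ns(12, 48000000)` gives the 250ns quoted above, so any write arriving within 250ns of the previous one is the case that gets dropped.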

So the MCU isn't a brilliant debugging option as it's unlikely to ever be a 100% accurate representation of its area of interest and all of its pins are presently allocated meaning I can't even get a printf() out of it at the moment. I've not investigated the Pico's debug headers yet, that may be an option down the line.

The design mistake I made was an obvious one in hindsight. The data bus on the shifter chip is necessarily on the DRAM side of the data bus split. This means I will only ever be able to see writes that go to the DRAM (or shifter) rather than to registers or other chips etc. (obviously latching of those needs to happen when the write buffer is enabled too).

Now that all works well enough for my initial purposes, but wasn't really what I wanted: I was more interested in the CPU-side of the bus split.

So I thought: why not try to embrace this situation and have a go at sniffing the video data instead? But I didn't know much about the shifter. I knew how *I'd* do it and had an idea about what would have to make sense given the way the video address pointers can be changed during a vblank, but because the results looked all over the place I wanted to double check my inference was in line with actuality.

It is, which has shown the problem is somewhere else.

After a bit more testing last night and trying out different ways of resetting my counter I'm now starting to suspect my data capture on the MCU. Which isn't a great surprise -- I'm pushing it hard and am a first timer on RP2350 DMA development.

I think a logic analyser on that 8 bit clocked transfer may be the next step, presuming I can't get meaningful debugging from the Pico easily.

Thanks for showing an interest in the first place & helping me out!

BW

Re: Shifter LOAD behaviour

Post by ijor »

Badwolf wrote: Fri May 09, 2025 11:11 am Sorry, @ijor, I didn't mean to obfuscate.
Yeah, I know.
So the MCU isn't a brilliant debugging option as it's unlikely to ever be a 100% accurate representation of its area of interest and all of its pins are presently allocated meaning I can't even get a printf() out of it at the moment. I've not investigated the Pico's debug headers yet, that may be an option down the line.
I see. But I would still use it for debugging purposes. Make a firmware version that only makes statistics for the data transfer. Count the number of transfers and the time between each zero address (screen start). See if there is a problem. You can perform, say, a dozen screen transfers, then display the statistics.
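
A statistics-only firmware along those lines could be sketched in C roughly as follows. This is only an outline under assumptions: the names (`xfer_stats_t`, `stats_on_word`, and so on) are hypothetical, the caller is assumed to supply a microsecond timestamp from whatever timer is available, and the 16,000 words per frame figure is the standard ST screen mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WORDS_PER_FRAME 16000u   /* standard ST screen, per the thread */

typedef struct {
    uint32_t frames;          /* completed frames */
    uint32_t words_in_frame;  /* words seen since the last zero address */
    uint32_t min_words, max_words;
    uint32_t min_period_us, max_period_us;
    uint32_t last_start_us;   /* timestamp of the last zero address */
    bool     started;         /* false until the first zero address */
} xfer_stats_t;

static void stats_init(xfer_stats_t *s)
{
    memset(s, 0, sizeof *s);
    s->min_words = UINT32_MAX;
    s->min_period_us = UINT32_MAX;
}

/* Call once per received word with its offset and a timestamp. */
static void stats_on_word(xfer_stats_t *s, uint32_t offset, uint32_t now_us)
{
    assert(s != NULL);
    if (offset == 0) {                    /* zero address = screen start */
        if (s->started) {                 /* close the previous frame */
            uint32_t period = now_us - s->last_start_us;
            if (s->words_in_frame < s->min_words) s->min_words = s->words_in_frame;
            if (s->words_in_frame > s->max_words) s->max_words = s->words_in_frame;
            if (period < s->min_period_us) s->min_period_us = period;
            if (period > s->max_period_us) s->max_period_us = period;
            s->frames++;
        }
        s->started = true;
        s->words_in_frame = 0;
        s->last_start_us = now_us;
    }
    if (s->started)
        s->words_in_frame++;              /* words before the first start are ignored */
}

/* True when every completed frame carried exactly WORDS_PER_FRAME words. */
static bool stats_frames_look_clean(const xfer_stats_t *s)
{
    return s->frames > 0 &&
           s->min_words == WORDS_PER_FRAME &&
           s->max_words == WORDS_PER_FRAME;
}
```

After a dozen frames the min/max word counts and periods can be displayed. A minimum below 16,000 points at dropped words, while a maximum near 32,000 would suggest a missed zero address.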

The Pico has two processors (actually the Pico 2 has four, but you only use two at a time). You can use the second processor in parallel if the other one is too busy.
After a bit more testing last night and trying out different ways of resetting my counter I'm now starting to suspect my data capture on the MCU. Which isn't a great surprise -- I'm pushing it hard and am a first timer on RP2350 DMA development.
The screenshots you posted suggest that this is not the main problem. You are obviously resetting the counter at the PLD at the wrong time, for some reason. Otherwise you wouldn't get the top of the ST screen displayed at the center of the snapshot.

Once again, I strongly recommend you follow good synchronous design practices. I actually recommend you read some articles on the subject because you seem to completely ignore this principle. It is possible, or maybe even likely, that this is not your actual problem at this time. But you have to start with good foundations.

For starters, try to use a single clock as much as possible. And be very careful when you transfer between different clock domains. I would do something like this:

	// Use LOAD as a clock for capturing the bus and for nothing else!
	always @( posedge LOAD)
		Data <= D[15:0];
		
	always @( posedge mainClk) begin
		syncedVsync <= VSYNC;
		vsync1 <= syncedVsync;
		vsync2 <= vsync1;
		onVsync <= !vsync1 & !vsync2;	// You can use a larger glitch filter
		
		syncLoad <= LOAD;
		load1 <= syncLoad;
		load2 <= load1;
		
		if( onVsync)
			counter <= 0;
		// Rising edge
		else if( load2 & load1) begin
			syncData <= Data;		// Depending on the rest of the code, you might not need this synchronization step
			counter <= counter + 1'b1;
		end
	end

Re: Shifter LOAD behaviour

Post by Badwolf »

ijor wrote: Fri May 09, 2025 2:01 pm
So the MCU isn't a brilliant debugging option as it's unlikely to ever be a 100% accurate representation of its area of interest and all of its pins are presently allocated meaning I can't even get a printf() out of it at the moment. I've not investigated the Pico's debug headers yet, that may be an option down the line.
I see. But I would still use it for debugging purposes. Make a firmware version that only makes statistics for the data transfer. Count the number of transfers and the time between each zero address (screen start). See if there is a problem. You can perform, say, a dozen screen transfers, then display the statistics.
Possibly doable if I can isolate a couple of pins for a serial connection out.
The Pico has two processors (actually the Pico 2 has four, but you only use two at a time). You can use the second processor in parallel if the other one is too busy.
Yes, both currently in service plus two PIOs and at least one pair of chained DMA channels!

But a custom firmware could possibly concentrate on logging rather than processing.
After a bit more testing last night and trying out different ways of resetting my counter I'm now starting to suspect my data capture on the MCU. Which isn't a great surprise -- I'm pushing it hard and am a first timer on RP2350 DMA development.
The screenshots you posted suggest that this is not the main problem. You are obviously resetting the counter at the PLD at the wrong time, for some reason. Otherwise you wouldn't get the top of the ST screen displayed at the center of the snapshot.
Do you think so? I'd come to the opposite conclusion. If I were resetting the counter at the wrong point, I'd expect to see lower parts of the screen drawing at the top and then wrapping into the menu. But that doesn't seem to occur. Instead the menu starts to draw in three rough areas of the screen: near the top, about 1/3 of the way down, or about 2/3 of the way down. And it doesn't appear to simply wrap around as I'd expect it to if the reset were simply in the wrong place.

Even resetting the counter at 16000 produces the same kind of results. I'd have expected an offset (and probably a scrambled colour scheme) but it ought to be stable. It's not.

Resetting at VSYNC and outputting the actual counter value as the data produces a lovely stable display.

So my conclusion is that I'm resetting the counter correctly, but somewhere in the chain the address (offset from screen ptr) and the data are becoming misaligned. My shonky parsing of the data transfer being the most likely culprit now.

I was intending to fit a logic analyser to the pico's pins and check if I could find any first address after VSYNC that wasn't 000000.

I suspect they will be exactly correct and I'm actually handling the input buffer parsing poorly. But that's just my current theory.
Once again, I strongly recommend you follow good synchronous design practices. I actually recommend you read some articles on the subject because you seem to completely ignore this principle. It is possible, or maybe even likely, that this is not your actual problem at this time. But you have to start with good foundations.
Yes, sorry. I'm completely self-taught from little in the way of sources (mostly Stephen's TF330) which mostly doesn't do that. Pistorm has two branches: one that does and one that doesn't, for Max V and Xilinx respectively. I've tended to follow the Xilinx way.
For starters, try to use a single clock as much as possible. And be very careful when you transfer between different clock domains. I would do something like this:

	// Use LOAD as a clock for capturing the bus and for nothing else!
Thanks. I can give it a try. I did actually deglitch VSYNC with a four stage history, but it produced exactly the same results.

Re: Shifter LOAD behaviour

Post by ijor »

Badwolf wrote: Fri May 09, 2025 2:54 pm
ijor wrote: Fri May 09, 2025 2:01 pm
I see. But I would still use it for debugging purposes. Make a firmware version that only makes statistics for the data transfer. Count the number of transfers and the time between each zero address (screen start). See if there is a problem. You can perform, say, a dozen screen transfers, then display the statistics.
Possibly doable if I can isolate a couple of pins for a serial connection out.
Can't you just disconnect the screen for this test? Or just integrate some screen library and just printf on that screen.
But a custom firmware could possibly concentrate on logging rather than processing.
Yeah, that's precisely the idea.
So my conclusion is that I'm resetting the counter correctly, but somewhere in the chain the address (offset from screen ptr) and the data are becoming misaligned. My shonky parsing of the data transfer being the most likely culprit now.
Then build some kind of test to debug the data transfer. Forget about the ST, Shifter or Vsync. Make sure your transfer works, otherwise this doesn't make any sense.

Build a test firmware on the Pico, to see if what you are receiving is correct. And then, separately, build a test on the CPLD that sends the correct data by itself, completely disconnected from the ST hardware, just for the purpose of testing the Pico side. Divide and conquer.
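
On the Pico side, such a check could look something like the C sketch below. It assumes the CPLD test pattern is the simplest one possible: the offset counts up from zero to 15,999 and wraps, and the data word equals the low 16 bits of the offset. That pattern and the names `pattern_check_t` and `pattern_check_word` are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_FRAME 16000u   /* standard ST screen, per the thread */

typedef struct {
    uint32_t expected_offset;  /* offset the next word should carry */
    uint32_t words;            /* total words checked */
    uint32_t sequence_errors;  /* offset was not the expected one */
    uint32_t data_errors;      /* data did not match the offset */
} pattern_check_t;

/* Check one received word against the assumed test pattern. */
static void pattern_check_word(pattern_check_t *c, uint32_t offset, uint16_t data)
{
    assert(c != NULL);
    c->words++;
    if (offset != c->expected_offset)
        c->sequence_errors++;          /* gap, repeat or wrong start */
    if (data != (uint16_t)offset)
        c->data_errors++;              /* address and data misaligned */
    /* Resynchronise on what actually arrived, so a single dropped
     * word is counted once rather than on every following word. */
    c->expected_offset = (offset + 1u) % WORDS_PER_FRAME;
}
```

Starting from a zero-initialised `pattern_check_t`, a clean link should leave both error counters at zero. Sequence errors without data errors would point at dropped transfers, while data errors would point at address and data getting misaligned during parsing.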
I suspect they will be exactly correct and I'm actually handling the input buffer parsing poorly. But that's just my current theory.
Your theory, or mine, doesn't matter. Just implement good testing strategies. You also have to learn to simulate, and to accept that even when it takes extra work (sometimes a lot of extra work) to set up a simulation, it is worth it.
Yes, sorry. I'm completely self-taught from little in the way of sources (mostly Stephen's TF330) which mostly doesn't do that. Pistorm has two branches: one that does and one that doesn't, for Max V and Xilinx respectively. I've tended to follow the Xilinx way.
Sorry if I sound a little bit harsh. But people who don't follow good synchronous design practices are either amateurs who don't know what they are doing, or experts who know exactly what they are doing, when they can break the rules and why. In either case it is not a very good idea to learn from them.
Thanks. I can give it a try. I did actually deglitch VSYNC with a four stage history, but it produced exactly the same results.
I implemented a small glitch filter just for completeness and because it was easy. I don't think that's your problem. Good synchronization is more important here. VSYNC glitches, at least at the edges, are completely harmless in your application. LOAD glitches, on the other hand, could be a serious problem. I don't expect you will get glitches on LOAD, but it might be a good idea to verify this, just in case.

Re: Shifter LOAD behaviour

Post by Badwolf »

ijor wrote: Fri May 09, 2025 2:01 pm

	// Use LOAD as a clock for capturing the bus and for nothing else!
	always @( posedge LOAD)
		Data <= D[15:0];
		
	always @( posedge mainClk) begin
		syncedVsync <= VSYNC;
		vsync1 <= syncedVsync;
		vsync2 <= vsync1;
		onVsync <= !vsync1 & !vsync2;	// You can use a larger glitch filter
		
		syncLoad <= LOAD;
		load1 <= syncLoad;
		load2 <= load1;
		
		if( onVsync)
			counter <= 0;
		// Rising edge
		else if( load2 & load1) begin
			syncData <= Data;		// Depending on the rest of the code, you might not need this synchronization step
			counter <= counter + 1'b1;
		end
	end
I was planning to work on my STE536 today, but since you were good enough to type all this out, I thought I'd give this a try.

I've translated it to my register names and done the ancillary declarations (plus I presume the edge detect is meant to be load2 & !load1? If not, I've misunderstood how it works) but sadly it yields almost identical behaviour to my original.

So I spent some time trying out different address-space restrictions, programmatically driven data sets, odd and even frames (to see if there is complete desynchronisation), planar and non-planar dumps and wired up the logic analyser.

I can't identify anything wrong with the addressing or the vsync reset. There is the occasional dropped word, but I can't figure out how that results in the frame starting part way down the address range. I didn't see that when sniffing writes. Perhaps I need to reconfigure to do that again to make sure I've not broken something whilst trying to debug.

I then spent a couple of hours failing to get USB debugging up and running on the MCU. I think until I can dump the output for offline parsing I've *almost* exhausted my primary test cases and can find no smoking gun as yet.
Sorry if I sound a little bit harsh. But people who don't follow good synchronous design practices are either amateurs who don't know what they are doing...
Hey! That's me! :lol:
Build a test firmware on the Pico, to see if what you are receiving is correct. And then, separately, build a test on the CPLD that sends the correct data by itself, completely disconnected from the ST hardware, just for the purpose of testing the Pico side. Divide and conquer.
Yes, I think once I've found a way to get some data out of the Pico I'll add a special transfer to indicate vsync and do the counting on the MCU as well. See how far they diverge. I'd maybe expect a minimum of 0.2% drop based on buffer sizes and perhaps an order of magnitude more than that with the delays incurred by resynchronising as there's no pipeline. But I don't think I should be getting 20 or 30% out on the address.
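
To put numbers on that: 0.2% of a 16,000 word frame is 32 words, whereas 20% would be 3,200 words. A minimal C sketch of the comparison, counting on the MCU and tracking how far the CPLD's offset drifts from that count, might look like this. The vsync marker transfer doesn't exist yet, and the names `divergence_t`, `divergence_on_vsync`, `divergence_on_word` and `divergence_permille` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_FRAME 16000u   /* standard ST screen, per the thread */

typedef struct {
    uint32_t mcu_count;       /* words counted by the MCU since vsync */
    uint32_t max_divergence;  /* largest |CPLD offset - MCU count| seen */
} divergence_t;

/* Call when the (assumed) special vsync transfer arrives. */
static void divergence_on_vsync(divergence_t *d)
{
    assert(d != NULL);
    d->mcu_count = 0;
}

/* Call for every video word, with the offset reported by the CPLD. */
static void divergence_on_word(divergence_t *d, uint32_t cpld_offset)
{
    assert(d != NULL);
    uint32_t diff = (cpld_offset >= d->mcu_count)
                  ? cpld_offset - d->mcu_count
                  : d->mcu_count - cpld_offset;
    if (diff > d->max_divergence)
        d->max_divergence = diff;
    d->mcu_count++;
}

/* Worst divergence in parts per thousand of a frame (2 = 0.2%). */
static uint32_t divergence_permille(const divergence_t *d)
{
    return (uint32_t)(((uint64_t)d->max_divergence * 1000u) / WORDS_PER_FRAME);
}
```

A result of around 2 would match the buffer-size estimate. Anything in the hundreds would mean the offsets are not merely suffering from dropped words.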

Cheers -- there'll be a little while looking into this now.

BW

Re: Shifter LOAD behaviour

Post by ijor »

Badwolf wrote: Sat May 10, 2025 12:57 am (plus I presume the edge detect is meant to be load2 & !load1?
Yes, sorry for the typo.
... but sadly it yields almost identical behaviour to my original.
That doesn't surprise me. I didn't expect that to solve your current problem. As I said, this was just for a solid foundation based on healthy synchronous practices.
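
For anyone following along, a small C model of the `syncLoad` / `load1` / `load2` chain may help show what each expression detects. It is only a sketch of the register behaviour, assuming one call per `mainClk` tick; the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool sync_load;  /* models syncLoad: first synchroniser stage */
    bool load1;      /* models load1: newer history bit */
    bool load2;      /* models load2: older history bit */
} load_sync_t;

/* One mainClk tick. Every register takes the previous value of the
 * stage before it, like the non-blocking assignments in the snippet. */
static void load_sync_tick(load_sync_t *s, bool load_pin)
{
    assert(s != NULL);
    s->load2 = s->load1;
    s->load1 = s->sync_load;
    s->sync_load = load_pin;
}

/* load1 & !load2: the newer sample is high, the older one is low. */
static bool newer_high_older_low(const load_sync_t *s)
{
    return s->load1 && !s->load2;
}

/* load2 & !load1: the older sample is high, the newer one is low. */
static bool older_high_newer_low(const load_sync_t *s)
{
    return s->load2 && !s->load1;
}
```

In this model `load1 & !load2` is true for one tick after the synchronised LOAD goes high, and `load2 & !load1` is true for one tick after it goes low. Either way there is exactly one pulse per LOAD pulse, so the counter advances once per word.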