I did a quick knock-up of just the load registers...

[Attachment: 1.JPG]
I think I realise why a copy is used between the shifter banks. The problem I see here is that the LOAD signal from the MMU has to be used as the clock for clocking the data into the shift register. This means I cannot really tie the 32 MHz clock into the same circuit.
So the output register would always run on the pixel clock, assumed here to be 32 MHz.
The real question is how to take the outputs of all these load registers and pass them to the output register without causing glitches anywhere.
There is also a binary counter in this, which is clocked by the LOAD signal as well. Its output goes to a 1-of-4 decoder (currently actually 1-of-5) that acts as a chip select, choosing which of the four shifters accepts data.
I have not figured out the active-low and active-high parts yet; I will try to work that out tomorrow.
Currently, though, I see a problem: the LOAD signal cannot really be used to select the next register, because as soon as LOAD goes low the binary count increments by one, which makes it practically impossible to select the first shift register.
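To make the problem concrete, here is a tiny Python model of the scheme as described: a counter clocked directly by LOAD (assumed here to advance as LOAD goes low) driving a 1-of-5 decoder used as chip selects. The function names are made up for illustration; it is a sketch, not a netlist.

```python
def decode_1_of_n(count: int, n: int = 5) -> list[int]:
    """One-hot decoder: output `count` is high, all others low."""
    return [1 if i == count else 0 for i in range(n)]


def naive_selects(num_loads: int, n: int = 5) -> list[int]:
    """Decoder output that is active while each LOAD pulse is low."""
    count = 0
    selected = []
    for _ in range(num_loads):
        # Falling edge of LOAD: the counter advances immediately...
        count = (count + 1) % n
        # ...so while LOAD is still low, the decoder already shows the new value.
        selected.append(decode_1_of_n(count, n).index(1))
    return selected


print(naive_selects(5))  # [1, 2, 3, 4, 0]
```

The first LOAD pulse lands on output 1, so the first shifter is skipped, which is exactly the symptom described above.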
So what I will probably do is add a flip-flop on the LOAD signal, so that the load happens first and there is a clocked delay before the next shifter is selected. I still need to think about this; of course I am open to suggestions if anyone has ideas on how to go about it.
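One way to read the flip-flop idea, sketched in Python: register LOAD on the 32 MHz clock and advance the counter only when LOAD returns high (its trailing edge), so the selection stays stable for the whole low period. The edge-detect arrangement and the `delayed_selects` name are my assumptions, not the only way to wire it.

```python
def delayed_selects(load_samples: list[int], n: int = 5) -> list[int]:
    """
    load_samples: LOAD (active low) sampled on each 32 MHz clock edge.
    Returns the selected decoder output for every clock on which LOAD is low.
    """
    count = 0
    load_q = 1  # flip-flop holding LOAD from the previous clock (idle high)
    selected = []
    for load in load_samples:
        if load == 0:
            # LOAD active: the count has not moved, so it selects this shifter.
            selected.append(count)
        if load_q == 0 and load == 1:
            # LOAD has just gone back high: move on to the next shifter.
            count = (count + 1) % n
        load_q = load  # clock the flip-flop
    return selected


pulses = [1, 0, 0, 1] * 5  # five LOAD pulses, each two clocks long
print(delayed_selects(pulses))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Now the first pulse selects output 0, and each pulse sees a single, steady select for its full duration.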
EDIT:
I found a better shifter block. This one should allow bank select to work, as it has a pin to choose between loading and shifting. It would probably be better to use the 32 MHz clock to actually clock the data into the shifter, rather than using the LOAD signal itself. LOAD would still be used, but in combination with the 32 MHz clock: basically, while LOAD is low, the 32 MHz clock clocks the data in.
The only slight side effect is that the data would be clocked in multiple times while LOAD is low, but as it all goes into the same shifter, I do not think that really matters.
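A minimal Python model of a shifter block with a load/shift pin, clocked at 32 MHz, to show why repeated loading while LOAD is low is harmless: a parallel load just rewrites the same word. `Shifter16` is a hypothetical name, and shifting out MSB first is an assumption.

```python
class Shifter16:
    """Hypothetical 16-bit shift register with a load/shift mode pin."""

    def __init__(self) -> None:
        self.value = 0

    def clock(self, load_mode: bool, data: int = 0) -> int:
        """One 32 MHz clock edge; returns the output bit (MSB) seen before the edge."""
        out = (self.value >> 15) & 1
        if load_mode:
            self.value = data & 0xFFFF  # parallel load
        else:
            self.value = (self.value << 1) & 0xFFFF  # shift left, 0 enters at LSB
        return out


s = Shifter16()
# LOAD held low for three 32 MHz clocks: the same word is captured three times.
for _ in range(3):
    s.clock(load_mode=True, data=0xA5F0)
print(hex(s.value))  # 0xa5f0
bits = [s.clock(load_mode=False) for _ in range(16)]
print(bits)  # [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

As long as the data bus is stable while LOAD is low, the extra load clocks leave the register in the same state as a single load.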
EDIT2:

[Attachment: 2.JPG]
So LOAD has to be low (high after the inverter) and the 1-of-5 output has to be high. When both are high, the shifter is enabled to clock in data.
eq4 of the 1-of-5 decoder will set all 4 shifters to LOAD in turn. Then, on the 5th clock, there is a bank swap that puts them all into shift operation.
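The enable logic and 5-pulse sequence above can be sketched in Python as follows. I am assuming the first four decoder outputs each enable one shifter and the fifth swaps the roles of two banks (ping-pong); `shifter_enable` and `run_cycle` are illustrative names only.

```python
def shifter_enable(load_n: int, dec_out: int) -> int:
    """Inverter on LOAD (active low) ANDed with one decoder output."""
    return (1 - load_n) & dec_out


def run_cycle(banks: list[list[int]], load_idx: int, words: list[int]) -> int:
    """
    Apply one 5-pulse LOAD cycle. banks holds two banks of 4 shifter words;
    banks[load_idx] is the bank being loaded. Returns load_idx after the swap.
    """
    for count in range(5):  # counter value during this LOAD pulse
        decoder = [1 if i == count else 0 for i in range(5)]
        load_n = 0  # LOAD is low for the pulse
        for i in range(4):
            if shifter_enable(load_n, decoder[i]):
                banks[load_idx][i] = words[i]  # pulse i loads shifter i
        if shifter_enable(load_n, decoder[4]):
            load_idx ^= 1  # 5th pulse: swap which bank loads and which shifts
    return load_idx


banks = [[0] * 4, [0] * 4]
idx = run_cycle(banks, 0, [0x1111, 0x2222, 0x3333, 0x4444])
shift_bank = banks[idx ^ 1]  # the bank just filled is now the shifting bank
print(idx, [hex(w) for w in shift_bank])  # 1 ['0x1111', '0x2222', '0x3333', '0x4444']
```

Note the loading is still "1 of 4" per pulse; only the swap makes all four words available to shift at the same time.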
I do half see an issue, though: we need to load in "1 of 4", but the second shifter bank has to output all four at the same time, not "1 of 4". So I need to think about that some more.
Possibly a "copy" would be easier, I think: just have a 16-bit latch "front end" and use LOAD to latch the data. Then some other clock loads the data into the shift registers in one go, and they output from there.
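A rough Python sketch of the "copy" arrangement, assuming one 16-bit front-end latch per shifter (four in total): LOAD writes one latch at a time, and a separate transfer step copies all four into the output shift registers at once. `CopyShifter`, `on_load` and `transfer` are hypothetical names.

```python
class CopyShifter:
    """Hypothetical model: four 16-bit front-end latches plus four output shifters."""

    def __init__(self) -> None:
        self.latch = [0, 0, 0, 0]  # front end, written one word per LOAD
        self.shift = [0, 0, 0, 0]  # output bank, written all at once

    def on_load(self, plane: int, word: int) -> None:
        # LOAD latches the MMU data into the selected latch only.
        self.latch[plane] = word & 0xFFFF

    def transfer(self) -> None:
        # "Some other clock": copy all four latches in one go.
        self.shift = list(self.latch)


cs = CopyShifter()
for plane, word in enumerate([0x1111, 0x2222, 0x3333, 0x4444]):
    cs.on_load(plane, word)  # four LOAD pulses, one per latch
print(cs.shift)  # [0, 0, 0, 0] (nothing copied yet)
cs.transfer()
print([hex(w) for w in cs.shift])  # ['0x1111', '0x2222', '0x3333', '0x4444']
```

This separates the "1 of 4" loading from the "all at once" output, and the latches are free to take the next four words while the shifters are busy.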
It just seems to me that this "copy" would cause a "stutter" in the drawing of each pixel line, as every 16 pixels there would be a small delay during the copy.
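Whether a stutter appears likely depends on when the copy happens. If the output shift registers can parallel-load on the same pixel-clock edge that would otherwise be a shift (as a load/shift-pin block allows), the copy costs no clock. The Python sketch below assumes exactly that, plus four bitplanes with plane 0 as bit 0 of the colour index; both are assumptions, and `pixel_stream` is a made-up name.

```python
def pixel_stream(latched_words: list[list[int]]) -> list[int]:
    """
    latched_words: one entry per 16-pixel block, each holding the 4 words
    captured by the front-end latches.
    Returns one 4-bit colour index per pixel clock.
    """
    blocks = iter(latched_words)
    shift = list(next(blocks))  # initial parallel load
    pixels = []
    count = 0  # position within the current 16-pixel block
    for _ in range(16 * len(latched_words)):
        # Colour index from the MSB of each plane: plane 0 -> bit 0, plane 3 -> bit 3.
        pixels.append(sum(((shift[p] >> 15) & 1) << p for p in range(4)))
        count += 1
        if count == 16:
            # 16th clock: the parallel load replaces the shift, so no clock is lost.
            count = 0
            shift = list(next(blocks, [0, 0, 0, 0]))
        else:
            shift = [(w << 1) & 0xFFFF for w in shift]
    return pixels


out = pixel_stream([[0xFFFF, 0, 0, 0], [0, 0, 0, 0xFFFF]])
print(len(out), out[15], out[16])  # 32 1 8
```

Here 32 pixels come out in 32 clocks with the block boundary falling directly between pixel 15 and pixel 16. A gap would only appear if the copy needed an extra clock on its own.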
Has anyone noticed a "gap" between blocks of 16 pixels on the screen? It could be really hard to see on a monitor.