The idea of this board is to ignore the system 8MHz clock altogether, generate a new clock from the master 32MHz clock, and divide it back down to 8MHz. Of course it is going to have clock skew, which will be nearly impossible to figure out without physically building the thing and trying it out.
The overall idea is to have the 8MHz "CPU CLOCK" slightly advanced relative to the motherboard clock, because by the time the clock has propagated through the buffers and the GAL it has inherently been delayed by somewhere around 13ns anyway. So the clock advance basically has to be about 13ns, though this delay is compounded by the clock division logic on the board as well.
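As a rough check on the numbers, the following Python sketch works out the divide ratio, the two clock periods, and how large a 13ns advance is relative to one 8MHz cycle. The 32MHz, 8MHz and 13ns figures come from the text; the constant and function names are made up for illustration, and the equivalent-delay line assumes a steady, free-running clock.

```python
# Figures taken from the text.
MASTER_HZ = 32_000_000   # master clock
CPU_HZ = 8_000_000       # target CPU clock
PATH_DELAY_NS = 13.0     # approximate buffer + GAL delay

def period_ns(freq_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz

divide_ratio = MASTER_HZ // CPU_HZ      # how far the master clock must be divided
master_period = period_ns(MASTER_HZ)    # one 32MHz cycle
cpu_period = period_ns(CPU_HZ)          # one 8MHz cycle

# The required advance as a fraction of one CPU clock cycle.
advance_fraction = PATH_DELAY_NS / cpu_period

# Assumption: for a steady periodic clock, advancing an edge by t
# looks the same as delaying it by (period - t).
equivalent_delay_ns = cpu_period - PATH_DELAY_NS

print(f"divide by {divide_ratio}")
print(f"32MHz period: {master_period} ns, 8MHz period: {cpu_period} ns")
print(f"13ns advance is {advance_fraction:.1%} of an 8MHz cycle")
print(f"equivalent delay: {equivalent_delay_ns} ns")
```

The point is that 13ns is only about a tenth of an 8MHz cycle and less than half of a 32MHz cycle, so the adjustment is finer than anything available by simply picking a different edge of the master clock.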
A further problem is that I simply don't know what the propagation delays through the GST MCU are. It is possible that my new logic could end up with the same propagation delay as the path through the GST MCU anyway, though I doubt it would be that simple.
So I threw together this board, which gives me some options for experimenting with a new clock division circuit. If I can synchronise the motherboard clock with the CPU clock (after all the logic delays), then theoretically it would solve the DMA issues.
Ultimately I may need a chain of several buffers to fine-tune the delay. This could be needed because of tolerances on the GALs themselves. But as things are basically borderline as they are now, I think that if I can just advance the clock by something like 15ns, it should be enough to solve the problem in the majority of cases anyway.
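To illustrate why tolerances matter with a buffer chain, here is a minimal Python sketch of the best-case and worst-case total delay through a number of identical buffers. The function name and the per-buffer limits are placeholders for illustration only, not datasheet values for any particular part.

```python
def chain_delay_range(stages, t_min_ns, t_max_ns):
    """Best-case and worst-case total delay through `stages` identical buffers."""
    if stages < 0:
        raise ValueError("stages must be zero or more")
    return stages * t_min_ns, stages * t_max_ns

# Placeholder per-buffer limits, NOT taken from any datasheet.
T_MIN_NS = 2.0
T_MAX_NS = 5.0

for n in range(1, 5):
    lo, hi = chain_delay_range(n, T_MIN_NS, T_MAX_NS)
    print(f"{n} buffers: {lo:.1f} to {hi:.1f} ns (spread {hi - lo:.1f} ns)")
```

Each extra buffer adds an adjustment step, but the gap between best and worst case grows with the chain length, so a long chain is less predictable than a short one.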
Some experimentation will likely be needed to see what the minimum and maximum delays are, and then to aim for the middle-ground timings.
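Finding the middle ground could be sketched in Python as follows: record each trial setting with a pass or fail, then take the midpoint of the passing range. The function name and the trial values are hypothetical placeholders, not measurements from the board, and the sketch assumes the passing settings form one continuous range.

```python
def working_window(results):
    """Return (min_ns, max_ns, midpoint_ns) over the passing settings.

    `results` is a list of (advance_ns, passed) pairs from bench trials.
    Returns None if nothing passed. Assumes the passing settings are
    contiguous, i.e. there is no failing setting between two passing ones.
    """
    passing = sorted(ns for ns, ok in results if ok)
    if not passing:
        return None
    lo, hi = passing[0], passing[-1]
    return lo, hi, (lo + hi) / 2

# Placeholder trial data for illustration only, NOT real measurements.
example_trials = [(0, False), (8, True), (16, True), (24, True), (32, False)]

window = working_window(example_trials)
if window is not None:
    lo, hi, mid = window
    print(f"works from {lo} to {hi} ns, aim for about {mid} ns")
```

Aiming for the midpoint leaves roughly equal margin on both sides, which is what matters if individual GALs and buffers vary from board to board.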

