I wasn't thinking of anything fancy, just a brute-force compare against the destination (because of the L2 cache) on copy, and hoping that the total overhead of testing is less than the cost of writing.
I may have misunderstood the instruction timings, and my numbers and calculations might be (probably are) completely flawed.
Plus you'd have to somehow factor in how much difference in speed building the screen in TT-RAM and then copying it makes compared to stock Frontier.
I really have no idea...
Assuming your fast->st copy goes somewhat like this:
// copy stuff
movem.l (a0)+,d0-d7 // 12 + 64:fast
movem.l d0-d7,x(a1) // 12 + 64:slow
Some kind of brute force copy-if-different:
// compare-and-copy the same amount of stuff
movem.l (a0)+,d0-d7 // 12 + 64
// ----
cmp.l 0(a1),d0 // 12
beq.s .skip0 // 10 taken, 8 not taken
move.l d0,0(a1) // 8 + 8:slow
.skip0: // ---- etc 7 more times
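best: 252 @50MHz (all eight longwords unchanged)
ref: 88 @50MHz + 64 @8MHz (the plain copy above)
worst: 300 @50MHz + 64 @8MHz (all eight longwords changed)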
And then assuming (yes, there's an awful lot of assuming going on here) that one 8 MHz cycle is "worth" about 7(?) 50 MHz cycles, you get:
best: 252
ref: ~536
worst: ~748
...until someone with a better understanding of 68k instruction timings points out all the stuff I got wrong.
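For anyone who wants to play with the assumptions, here's a rough sketch of the same arithmetic in Python. The names (`SLOW_FACTOR`, `plain_copy`, `compare_and_copy`) are made up for illustration, and every cycle cost is just the guess from the snippets above, not a measured value. It also keeps the same simplification that the `cmp.l` read of the destination only costs fast cycles.

```python
# Rough cycle model for the numbers above. All costs are the post's own
# guesses, not measurements. Results are in 50 MHz cycle equivalents.
SLOW_FACTOR = 7      # assumed: one 8 MHz cycle is "worth" ~7 cycles at 50 MHz

LOAD = 12 + 64       # movem.l (a0)+,d0-d7 from fast RAM
STORE_FAST = 12      # movem.l d0-d7,x(a1): part counted at 50 MHz
STORE_SLOW = 64      # movem.l d0-d7,x(a1): part counted at 8 MHz
CMP = 12             # cmp.l n(a1),dn
BEQ_TAKEN = 10       # branch taken: longword unchanged, write skipped
BEQ_NOT_TAKEN = 8    # branch not taken: fall through to the write
MOVE_FAST = 8        # move.l dn,n(a1): part counted at 50 MHz
MOVE_SLOW = 8        # move.l dn,n(a1): part counted at 8 MHz


def plain_copy():
    """Cost of the straight movem copy of eight longwords."""
    return LOAD + STORE_FAST + STORE_SLOW * SLOW_FACTOR


def compare_and_copy(changed, total=8):
    """Cost of compare-and-copy when `changed` of `total` longwords differ."""
    skip = CMP + BEQ_TAKEN
    write = CMP + BEQ_NOT_TAKEN + MOVE_FAST + MOVE_SLOW * SLOW_FACTOR
    return LOAD + (total - changed) * skip + changed * write


if __name__ == "__main__":
    print("plain copy:", plain_copy())
    for changed in range(9):
        print(changed, "changed:", compare_and_copy(changed))
```

With these guessed costs the break-even sits between four and five changed longwords out of eight: four changed comes to 500 against 536 for the plain copy, five changed comes to 562. Change `SLOW_FACTOR` or any of the per-instruction costs and that point moves, so treat it as a toy for exploring the assumptions rather than a prediction.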
I suspect the timings for reading from fast RAM would be shorter still due to 32-bit access, burst mode, cache and whatnot.
I was thinking the hardware would do that on ST-RAM writes. There would be an overhead of at least one fast RAM read on every ST-RAM write.
Wonder if it would be worth it to try and make the L2 cache more clever and only write-through to ST-RAM on changed data?
I think the overhead would kill you, but an interesting experiment if you could make the hardware do it.
Probably a huge benefit when a lot of writes can be early-outed, but more expensive if the opposite is true. I don't really know how to implement that though, but it would be an interesting experiment.