exxos blog - random goings on

Blogs & guides and tales of woo by forum members.
exxos
Site Admin
Posts: 25412
Joined: Wed Aug 16, 2017 11:19 pm
Location: UK

Re: exxos blog - random goings on

Post by exxos »

dml wrote: Sun Nov 13, 2022 11:09 pm But I thought it was organised as 16bit ram, not 32. (In fact it's organised as 8bit in terms of addressing but anyway....)
The CPU only has 16bit access to RAM, but the Videl has 32bit access to RAM. Crap isn't it :P
dml wrote: Sun Nov 13, 2022 11:09 pm I can try that approach next - but it won't be tonight :-p might get some time in the morning for it.
No worries, I am only at home tomorrow anyway. Tuesday and Wednesday I am "away" for a couple of days. I guess just a single bit pattern test of something like 10101010 - 01010101 would be enough.
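A single alternating-pattern pass like the one suggested here could look roughly like this. It is only a sketch in Python against a simulated memory, not the actual ramscan code; the `FaultyRam` class, the stuck bit and the addresses are all made up for illustration.

```python
PATTERNS = (0b10101010, 0b01010101)  # 0xAA then 0x55: each bit is tested as 1 and as 0


class FaultyRam:
    """Simulated byte-wide RAM with one hypothetical stuck-at-0 bit."""

    def __init__(self, size, bad_addr, bad_bit):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit

    def write(self, addr, value):
        if addr == self.bad_addr:
            value &= ~self.bad_mask & 0xFF  # the faulty bit cannot hold a 1
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def pattern_test(ram, size):
    """Fill memory with each pattern, read it back, collect mismatches."""
    errors = []
    for pattern in PATTERNS:
        for addr in range(size):
            ram.write(addr, pattern)
        for addr in range(size):
            got = ram.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


ram = FaultyRam(size=1024, bad_addr=0x2E8, bad_bit=3)
errors = pattern_test(ram, 1024)
for addr, expected, got in errors:
    print(f"addr ${addr:04x}: wrote {expected:08b}, read {got:08b}")
```

The two patterns are complements of each other, so between them a stuck-at-0 or stuck-at-1 data bit shows up in at least one pass.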
https://www.exxosforum.co.uk/atari/ All my hardware guides - mods - games - STOS
https://www.exxosforum.co.uk/atari/store2/ - All my hardware mods for sale - Please help support by making a purchase.
viewtopic.php?f=17&t=1585 Have you done the Mandatory Fixes ?
Just because a lot of people agree on something, doesn't make it a fact. ~exxos ~
People should find solutions to problems, not find problems with solutions.
dml
Posts: 320
Joined: Wed Nov 15, 2017 10:11 pm

Re: exxos blog - random goings on

Post by dml »

Here's an updated ramscan which does some extra stuff:

https://www.dropbox.com/s/1rlc2zxp720xm ... 2.zip?dl=1

(I have linked it rather than attached it in case I need to fix any bugs and update the zip. I can attach it later after you try it out.)

- scans both 32bit and 16bit accesses
- will report error state changes/transitions with the transition addresses

So this should tell you where faults begin/end or where faults change in memory, so you should be able to work out what address bits are involved.

It will stop after the first failed scan pass, but only after the pass completes and all transitions are printed.
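The transition reporting described above can be sketched like this. It is a guess at the general idea in Python, not ramscan's real implementation; `find_transitions` and the fault range are hypothetical.

```python
def find_transitions(results):
    """results: (address, ok) pairs in ascending address order.

    Returns (address, new_state) for every point where the pass/fail
    state differs from the previous location.
    """
    transitions = []
    previous = None
    for addr, ok in results:
        if previous is not None and ok != previous:
            transitions.append((addr, "ok" if ok else "FAIL"))
        previous = ok
    return transitions


# Hypothetical scan: a fault covering $4000-$5fff in a 64 KB window, 4-byte steps.
scan = [(a, not (0x4000 <= a < 0x6000)) for a in range(0, 0x10000, 4)]
transitions = find_transitions(scan)
for addr, state in transitions:
    print(f"${addr:06x}: -> {state}")
```

Printing only the edges keeps the output short for large solid faults, but an isolated single-location fault still produces two lines per hit, which matters if the faults are sparse and numerous.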

If you need an 8bit data scan added let me know.

Forgot to add - I made some changes which may allow it to run from a ROM cart while scanning lower memory. But I have no way to test that so... good luck with that.

...and it should be runnable from FastRAM now too, which should also allow scanning of lower ST-RAM - but it's only worth the trouble if you actually boot clean with minimum other stuff loaded, since extra stuff will reduce the safe scanning area.

Post by exxos »

dml wrote: Mon Nov 14, 2022 9:44 am Here's an updated ramscan which does some extra stuff:
Great. Just ran it and it scrolls errors really fast so I cannot really see what address ranges / blocks are bad. There needs to be a "summary page" of some sort.

Not sure how 32 bit access works on the thing as the CPU is only physically wired with 16 bits :shrug: but it does actually seem to indicate that bit 11 is faulty, so that seems to confirm that error at least.

IMG_0235.JPG (348.79 KiB)

Post by dml »

Maybe that's a bug, let me check it again. I only ran it in Hatari this morning and it seemed OK, but I'll look...

Post by dml »

No, it seems to be sensible from what I can tell.

Just looking at your screenshot is already interesting. It's showing bit 11 out of a 32bit read, but only on the low word. Which sort of confirms what you were thinking yesterday...

Even more interesting - the error seems to occur once every 8kb, and not at the start of the 8k page either (offset $0ae8???). Which is perhaps why we don't see columns in the ramscan test. It will be well spaced single dots, if anything.
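For anyone following along, picking the failing bit out of a 32bit compare is just an XOR of what was written against what came back. A minimal Python sketch; the test value and the flipped bit are hypothetical, chosen to match the screenshot's "bit 11":

```python
def failing_bits(expected, actual, width=32):
    """Return the bit numbers (0 = least significant) that differ."""
    diff = (expected ^ actual) & ((1 << width) - 1)
    return [bit for bit in range(width) if diff & (1 << bit)]


def describe(bit):
    """Say which 16-bit half of the longword a failing bit belongs to."""
    half = "low word" if bit < 16 else "high word"
    return f"bit {bit} of 32 ({half}, bit {bit % 16} of that word)"


expected = 0xAAAAAAAA
actual = expected ^ (1 << 11)  # hypothetical read-back with bit 11 flipped
bits = failing_bits(expected, actual)
for bit in bits:
    print(describe(bit))
```

On a big-endian 68k the low word of a longword sits at address + 2, so a fault confined to bits 0-15 points at the second 16bit half of each long.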

Does that offer some clues?

Post by dml »

exxos wrote: Mon Nov 14, 2022 10:35 am Great. Just ran it and it scrolls errors really fast so I cannot really see what address ranges / blocks are bad. There needs to be a "summary page" of some sort.
It's a very simple program - presenting all possible kinds of fault in a terse way at the end isn't so easy. I'll think about a tree breakdown of the fault regions but it's already getting more complicated...

I think in this case you get enough info from the last page to figure out the regular spacing and nature of the fault in at least one area of RAM, if there is regularity involved. And if it is not regular (noise) that should also be visible in the last 10 or 20 prints or so. It looks pretty regular though.

It's a one-shot error every 8kb, with a fixed (weird) offset, bit #11 of 32. Which is a very strange result but still some sort of clue :-P
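A crude version of the "summary" could simply count the gaps between fault addresses and their offsets within an 8kb page. This is only a Python sketch of that idea, assuming the fault addresses have been collected into a list; the addresses below are invented to match the one-hit-per-page pattern.

```python
from collections import Counter

PAGE = 0x2000  # 8 KB


def spacing_and_offsets(addresses, page=PAGE):
    """Count gaps between consecutive faults and their offsets within a page."""
    addresses = sorted(addresses)
    gaps = Counter(b - a for a, b in zip(addresses, addresses[1:]))
    offsets = Counter(a % page for a in addresses)
    return gaps, offsets


# Hypothetical fault list: one hit per 8 KB page at offset $0ae8.
faults = [base + 0x0AE8 for base in range(0x100000, 0x110000, PAGE)]
gaps, offsets = spacing_and_offsets(faults)

for gap, count in gaps.most_common(3):
    print(f"gap ${gap:x}: {count} times")
for offset, count in offsets.most_common(3):
    print(f"page offset ${offset:04x}: {count} times")
```

A regular fault collapses to one or two lines here, while noise would show up as many different gaps and offsets with low counts.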

[EDIT]

It does seem like the only way a result like this can happen is a bad chip - it's probably too weird for an external routing problem. But I suppose you're trying to figure out which chip.

Post by exxos »

dml wrote: Mon Nov 14, 2022 11:12 am It's a one-shot error every 8kb, with a fixed (weird) offset, bit #11 of 32. Which is a very strange result but still some sort of clue :-P
Absolutely. There are probably 2 address bits that play here in conjunction with a bad bit 11.
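One way to see which address bits are in play is to AND all the failing addresses together (and their complements) to find the bits that never change across the failures. A rough Python sketch with made-up fault addresses and a 24bit address width assumed for the example:

```python
def constant_address_bits(addresses, width=24):
    """Return (always_one, always_zero) masks over all failing addresses."""
    mask = (1 << width) - 1
    always_one = mask
    always_zero = mask
    for addr in addresses:
        always_one &= addr    # bit survives only if set in every address
        always_zero &= ~addr  # bit survives only if clear in every address
    return always_one & mask, always_zero & mask


# Hypothetical faults: offset $0ae8 in each 8 KB page of a 64 KB window.
faults = [base + 0x0AE8 for base in range(0x100000, 0x110000, 0x2000)]
ones, zeros = constant_address_bits(faults)
varying = ~(ones | zeros) & 0xFFFFFF

print(f"always 1: ${ones:06x}")
print(f"always 0: ${zeros:06x}")
print(f"varying : ${varying:06x}")
```

With only a 64 KB sample the upper bits look constant too, so the result is only meaningful over the range that was actually scanned. Bits that are fixed across every failure are the candidates for an address-dependent fault; bits that vary freely are not involved.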

You would think that if noise on bit 11 blew the chip, it would blow it across the entire address range. But there must be some sort of pattern going on relating to address ranges.

The other guy who had a problem had this..

IMG_5791.jpg (108.07 KiB)

Of course I don't know the actual failure mode of the DRAM either. Even though there is some sort of alignment or consistency going on, I think overall the address ranges and bit failures are seemingly just random.

The data bits could be physically picking up "noise" from certain address lines, aggravating the problem. Or maybe the Videl itself has internal glitches which are basically random.

There must be some sort of bus noise going on, because again when the DFB1 is plugged in the faults seem to be reduced. I really have no idea what the internal routing is like on these motherboards. Even though we seem to go from CPU <> Videl <> RAM, there are other traces all over the place which could couple noise onto adjacent tracks, causing the spikes on the DRAM databus.

Again it could possibly be the Videl itself causing the problems as it is literally connected to CPU and RAM. So greater activity on the CPU could create different noise in the Videl internally which then couples onto its DRAM bus. Of course it is just speculation but these spikes have to be what is killing the DRAM and something has to be generating them.

I guess I could run GB6 on various tests to see if there is any correlation between the test and spikes on the DRAM bus..

Post by exxos »

dml wrote: Mon Nov 14, 2022 11:12 am It does seem like the only way a result like this can happen is a bad chip - it's probably too weird for an external routing problem. But I suppose you're trying to figure out which chip.
I know which chip is bit 11. I was just merely trying to work out what address range in that chip had actually failed. But I don't think this is simply a bad chip because there's been a couple of other boards from people as mentioned in the above posts which have also failed on a slightly different bit. Also on a different address range.

You would think that if a chip was going to fail, the entire chip would fail, or the entire address range on a bit. But it is not just my boards which have failed, because Atari's own boards have also failed. So something has to be killing them. Which is why I posted results of the positive and negative spikes, which are all way out of specification for the chips. I think some chips are just inherently more tolerant than others.

Post by dml »

Yeah it could be some sort of tolerance issue with voltages, crosstalk or noise causing the chip to address or answer incorrectly under really weird conditions.

Problems with routing, track thickness, resistive vias? Resistive bridge between some pads? I know alcohol is conductive and it's bad to power stuff on an alcohol-cleaned board before it evaporates (or spends time in an oven). But I can't think what else could do that and still be on a board, not evaporating.

Or a groundplane/ground section accidentally floating somewhere and being an antenna?

No idea - good luck finding whatever it is :)

Post by exxos »

I think in terms of the RAM there is no proper termination of the signal at all. At least on the main address and databus of the CPU it does have some pullups to help alleviate that problem.

When the Videl switches to a logic high, it will get reflections on the signal (oscillations) which will increase the voltage. The problem is that with chips having generally capacitive inputs these days, that capacitance can charge up to the peak voltage, which is dangerous by itself (it could damage the Videl or RAM). But also when it switches low, you basically connect something like a 7 V capacitor to 0 V. That extra voltage has to go somewhere, so it swings below 0 V. It goes negative by 2 V, which again the chips do not like and can damage them.
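The overshoot on an unterminated line can be illustrated with a textbook reflection ("lattice diagram") calculation. This is only an idealised Python sketch: it assumes a lossless line, treats the DRAM input as an open circuit, and uses made-up driver and trace impedances (15 ohm and 60 ohm), none of which are measured values from this board.

```python
def load_voltage_steps(v_step, r_source, z0, arrivals=6):
    """Voltage at an open (unterminated) line end after each wave arrival."""
    gamma_load = 1.0                                   # open end reflects fully
    gamma_source = (r_source - z0) / (r_source + z0)   # mismatch at the driver
    wave = v_step * z0 / (r_source + z0)               # wave initially launched
    v_load = 0.0
    steps = []
    for _ in range(arrivals):
        v_load += wave * (1 + gamma_load)   # incident plus reflected at the load
        steps.append(round(v_load, 3))
        wave *= gamma_load * gamma_source   # back to the driver and out again
    return steps


# Assumed values only: 5 V logic swing, 15 ohm driver, 60 ohm trace.
steps = load_voltage_steps(v_step=5.0, r_source=15.0, z0=60.0)
print(steps)
```

In this simplified model the far end rings above and below the 5 V level before settling, and a falling edge is the mirror image, so the same ringing takes the line below 0 V. Real traces have losses and the chips have input capacitance and clamp diodes, so the actual peaks will differ from these numbers.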

And when this is all happening millions of times per second... Mixing crosstalk, interference, bad grounding, bad PSU etc etc, all aggravating the problem... All reasons why I absolutely hate these machines and working on them these days. Always one huge time sink / rabbit hole.