Cyprian wrote: 14 Nov 2023 23:22
wow, impressive figures. It seems it is almost impossible to clean that DB.
As the wiki wasn't updated very often by real users, I wonder whether it would be possible just to remove all users and their edits after a particular registration date.
That's why I'm referring to troed's article, as it was the last one I could find before all the spam.
The resurrection of atari-wiki.com
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
Results so far
56 users left who actually posted something.
440161 articles left to sort.
The next spam posts look simple.
I also had the idea to run a link checker on the original database, as then I would get a list of all pages linked from some index. I know there are some which aren't linked anyway, but it would help reduce the risk of posts going missing.
EDIT:
430970 after that clean up.
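A rough sketch of the link-checker idea, in Python. It assumes the page texts have already been dumped into a dict of title to wikitext (the sample titles and the helper names `normalise` and `reachable_pages` are made up for illustration). It follows [[wikilinks]] from one index page and lists every existing page it can reach. Since some genuine pages are not linked from any index, anything left over is a list to review, not a list to delete.

```python
import re
from collections import deque

# Matches [[Target]] and [[Target|label]]; anything after '#' or '|' is ignored.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def normalise(title: str) -> str:
    """Approximate MediaWiki title form: trimmed, underscores, first letter upper-case."""
    title = title.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]

def reachable_pages(pages: dict[str, str], start: str) -> set[str]:
    """Return every existing page title reachable from `start` by following wikilinks."""
    start = normalise(start)
    seen = {start} if start in pages else set()
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for match in WIKILINK.finditer(pages[current]):
            target = normalise(match.group(1))
            if target in pages and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical sample data: title -> wikitext
pages = {
    "Main_Page": "See [[Hardware]] and [[GFA Basic|GFA]].",
    "Hardware": "Back to [[Main Page]]. Also [[Missing page]].",
    "GFA_Basic": "Language notes.",
    "Cheap_pills": "buy now http://spam.example",
}

linked = reachable_pages(pages, "Main Page")
unlinked = sorted(set(pages) - linked)
print(sorted(linked))   # pages reachable from the index
print(unlinked)         # candidates for manual review
```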
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
I'm not sure what's going on here, but I assume 5 accounts were hacked, as the usernames look like spammers, yet there is Atari content linked to those accounts :shrug:
Something else is going on, as there is still loads of spam even though I deleted the spam users & accounts, plus most of the guest accounts. I'm half wondering if the spammers had direct access to the SQL database, but that would seem a bit complex for spam bots :shrug:
-
Zogging Hell
- Posts: 3
- Joined: 26 Nov 2017 12:38
Re: The resurrection of atari-wiki.com
exxos wrote: 14 Nov 2023 15:14
@Zogging Hell Your site pops up, such as http://www.zhell.co.uk/magdisks.html - is it no longer live or is the URL wrong?
Unfortunately the domain got sniped while I was moving country. Zogging Hell will be back at some point at a different domain (I think it is almost all internet archived in the meantime), but I need a little time to convert my shat-eaugh (no mistype) in France into a habitable location in the short term!
I've just realised this is my first post here after lurking for forever... so hi everyone!
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
Another pattern is links within the first 30 characters.
I'm also filtering any non-English letters. But I am also starting to manually check stuff before deleting now. It's going to get harder as the list gets smaller!
375091 to go!
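The two patterns above could be sketched like this in Python. This is only an illustration, not the actual filter used here: the function names are made up, "link" is assumed to mean an http/https URL, and "non-English letters" is assumed to mean any letter outside plain ASCII.

```python
import re

URL = re.compile(r"https?://", re.IGNORECASE)

def link_in_opening(text: str, window: int = 30) -> bool:
    """True if a URL starts within the first `window` characters of the text."""
    match = URL.search(text)
    return match is not None and match.start() < window

def has_non_english_letters(text: str) -> bool:
    """True if any alphabetic character falls outside ASCII."""
    return any(ch.isalpha() and not ch.isascii() for ch in text)

def looks_like_spam(text: str) -> bool:
    """Flag text that matches either heuristic; flagged text still needs a manual look."""
    return link_in_opening(text) or has_non_english_letters(text)

samples = [
    "http://spam.example buy cheap watches",
    "The Atari ST uses a Motorola 68000 CPU. See http://example.org",
    "Falcon030 demo by a group from Köln",
]
for sample in samples:
    print(looks_like_spam(sample), sample)
```

The last sample is genuine-looking text that still gets flagged because of the umlaut, which is exactly why checking by hand before deleting makes sense.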
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
48769 to go!
I will have to write an "un-f**k" script as well to verify all the databases and strip out the millions of pages of crap there too. The structure of the wiki is pretty weird.
https://www.mediawiki.org/wiki/Manual:Page_table
The text of the page itself is stored in the text table. To retrieve the text of an article, MediaWiki first searches for page_title in the page table. Then, page_latest is used to search the revision table for rev_id, and rev_text_id is obtained in the process. The value obtained for rev_text_id is used to search for old_id in the text table to retrieve the text.
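The lookup chain quoted above can be tried out with a small Python sketch. It uses an in-memory SQLite database as a stand-in (a real wiki would normally sit on MySQL/MariaDB), with a heavily simplified schema that keeps only the columns named in the quote plus the link from revision back to page. The sample rows and the function name `current_text` are invented. Note this is the older layout with rev_text_id; as far as I know, newer MediaWiki releases moved that link into separate slots and content tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page     (page_id INTEGER PRIMARY KEY, page_title TEXT, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER);
    CREATE TABLE "text"   (old_id INTEGER PRIMARY KEY, old_text TEXT);

    -- One page with two revisions; page_latest points at the newer one.
    INSERT INTO "text"   VALUES (10, 'first draft'),
                                (11, 'The Atari ST is a 16/32-bit computer.');
    INSERT INTO revision VALUES (100, 1, 10), (101, 1, 11);
    INSERT INTO page     VALUES (1, 'Atari_ST', 101);
""")

def current_text(conn, title):
    """Follow page_title -> page_latest -> rev_text_id -> old_text for one page."""
    row = conn.execute(
        """
        SELECT t.old_text
        FROM page AS p
        JOIN revision AS r ON r.rev_id = p.page_latest
        JOIN "text"   AS t ON t.old_id = r.rev_text_id
        WHERE p.page_title = ?
        """,
        (title,),
    ).fetchone()
    return row[0] if row else None

print(current_text(conn, "Atari_ST"))
print(current_text(conn, "No_such_page"))
```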
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
Looks like this is the next one. They got missed because they look like genuine links, but they are actually not.
EDIT:
Almost missed one!
EDIT:
40728 to go!
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
20431 to go!
I think mostly it's the "stragglers" now. I'm not sure how many actual posts there are yet, but I suspect there are only about 2,000 spam posts left now!
Getting tricky!
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
It looks like some pages got edited, *or* cloned and then a new page created with spam. Hard to tell currently, so I will just leave those as is to fix later. They may or may not be in the main index list anyway. I tried searching for the page on the wiki and it did not come up with any of the text at all, even the Atari text.
As it stands there may be a couple of spam posts here and there, but I cannot see any at a glance. Most of the problem is that it is difficult to go through every single link to see if it is supposed to be a genuine link or not. It's particularly problematic as a lot of the links don't work either way.
So as far as I'm concerned the main database is spam free now!
However the work is far from over! There are two other databases which need cleaning up as mentioned previously. I should be able to write a script to go through those tables and see if they actually match a current page on the wiki. If there is no match then it should be possible to simply delete the entries in the tables.
Basically it does this currently...
The page does not exist because those are the things I have removed. I need to remove all the references to those pages. Now I have a "working" text database, I should only need to verify the indexes match actual text pages in the database. There are literally going to be millions of those as well.
So next: back up where it's got up to. There are several backups currently and they are all 200GB each, so hosting for all this is starting to cost a fair bit. But I hope it won't be many more days before I will have it all cleared out, assuming I don't find anything else amiss somewhere.
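A minimal sketch of the "verify the indexes match actual text pages" step, again in Python with an in-memory SQLite stand-in and a simplified legacy schema (page, revision, text). The sample rows and the function name `remove_orphans` are made up; the real tables have many more columns and there will be other tables that reference pages too. The idea is to delete revision rows whose text row is gone, then page rows whose latest revision is gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page     (page_id INTEGER PRIMARY KEY, page_title TEXT, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER);
    CREATE TABLE "text"   (old_id INTEGER PRIMARY KEY, old_text TEXT);

    -- Text row 20 (the spam) has already been deleted; only row 11 survives.
    INSERT INTO "text"   VALUES (11, 'The Atari ST is a 16/32-bit computer.');
    INSERT INTO revision VALUES (101, 1, 11), (200, 2, 20);
    INSERT INTO page     VALUES (1, 'Atari_ST', 101), (2, 'Cheap_pills', 200);
""")

def remove_orphans(conn):
    """Delete revisions with no text row, then pages with no latest revision.

    Returns (revisions_deleted, pages_deleted). Both deletes run in one
    transaction, so a failure part-way leaves the tables untouched.
    """
    with conn:
        revs = conn.execute(
            'DELETE FROM revision WHERE rev_text_id NOT IN (SELECT old_id FROM "text")'
        ).rowcount
        pages = conn.execute(
            "DELETE FROM page WHERE page_latest NOT IN (SELECT rev_id FROM revision)"
        ).rowcount
    return revs, pages

removed = remove_orphans(conn)
remaining = [row[0] for row in conn.execute("SELECT page_title FROM page")]
print(removed, remaining)
```

One caveat with this approach: a genuine page whose latest edit was spam would be dropped as well. For those it would be safer to point page_latest back at a surviving revision instead of deleting the page, so running a count first and deleting second is probably wise.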
-
exxos
- Site Admin

- Posts: 28350
- Joined: 16 Aug 2017 23:19
- Location: UK
Re: The resurrection of atari-wiki.com
Bit of a drop in size in the text database :lol:
Now my first test script rebuilding the revisions table :)