
August 17, 2007

I'm In Ur Wikipedia, Tracking Their Edits

By Deborah Newell Tornello
a.k.a. litbrit

One of Wikipedia's greatest strengths--the open-editing format that (nearly always) permits ordinary citizens to add, subtract, or alter content--is also one of its largest liabilities, since persons with less-than-honorable intentions can manipulate data for any number of nefarious reasons: discrediting a competing company, say, or spreading disinformation. At least, they can and will until someone else notices and re-edits an entry. And while Wiki is undeniably useful for writers and researchers of every stripe, the very fact that its insta-data is so easily manipulable by anyone and everyone should serve as a whisper in the ear--if not a huge and wildly undulating red flag--that it might be a good idea to double-check the content against another source, that it would be prudent to ask oneself, before quoting a Wiki entry at length, I wonder whose fingerprints are on this stuff, anyway?

Cal Tech graduate student Virgil Griffith had that very thought. And the computation and neural-systems academic decided that not only was it time to figure out who was behind all the edits, but that it would also be a boon to the free and open marketplace of ideas to offer the general public a way to know, too.

And thus Wikipedia Scanner was born.  From Wired Magazine:

On November 17th, 2005, an anonymous Wikipedia user deleted 15 paragraphs from an article on e-voting machine-vendor Diebold, excising an entire section critical of the company's machines. While anonymous, such changes typically leave behind digital fingerprints offering hints about the contributor, such as the location of the computer used to make the edits.

In this case, the changes came from an IP address reserved for the corporate offices of Diebold itself. And it is far from an isolated case. A new data-mining service launched Monday traces millions of Wikipedia entries to their corporate sources, and for the first time puts comprehensive data behind longstanding suspicions of manipulation, which until now have surfaced only piecemeal in investigations of specific allegations.

Wikipedia Scanner -- the brainchild of Cal Tech computation and neural-systems graduate student Virgil Griffith -- offers users a searchable database that ties millions of anonymous Wikipedia edits to organizations where those edits apparently originated, by cross-referencing the edits with data on who owns the associated block of internet IP addresses.

Inspired by news last year that Congress members' offices had been editing their own entries, Griffith says he got curious, and wanted to know whether big companies and other organizations were doing things in a similarly self-interested vein.

"Everything's better if you do it on a huge scale, and automate it," he says with a grin.

This database is possible thanks to a combination of Wikipedia policies and (mostly) publicly available information.

The online encyclopedia allows anyone to make edits, but keeps detailed logs of all these changes. Users who are logged in are tracked only by their user name, but anonymous changes leave a public record of their IP address.
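For the curious, the cross-referencing described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Griffith's actual code: the ownership table, the edit log, and the function names `owner_of` and `edits_by_organization` are all made up for the example, and the address blocks are reserved documentation ranges rather than real corporate allocations.

```python
import ipaddress

# Hypothetical ownership table: address block -> organization.
# These are reserved documentation ranges (RFC 5737), not real allocations.
OWNERS = {
    "192.0.2.0/24": "Example Voting Machines Inc.",
    "198.51.100.0/24": "Example Broadcasting Corp.",
}

# Hypothetical log of anonymous edits: (article title, editor's IP address).
EDITS = [
    ("E-voting", "192.0.2.17"),
    ("Evening news", "198.51.100.200"),
    ("Gardening", "203.0.113.5"),
]


def owner_of(ip, owners=OWNERS):
    """Return the organization whose block contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for block, org in owners.items():
        if addr in ipaddress.ip_network(block):
            return org
    return None


def edits_by_organization(edits=EDITS, owners=OWNERS):
    """Group edited article titles by the organization that owns the editing IP."""
    grouped = {}
    for article, ip in edits:
        org = owner_of(ip, owners)
        if org is not None:
            grouped.setdefault(org, []).append(article)
    return grouped


if __name__ == "__main__":
    for org, articles in edits_by_organization().items():
        print(org, "->", articles)
```

Note that a match only says which organization's network an edit came from, not which person made it.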

Wired invites readers who've used Wikipedia Scanner and unearthed any companies or government spooks fiddling around with data or rewriting history to submit their finds--and vote on other readers' discoveries--at the magazine's blog.

What's also brilliant about Griffith's brainchild is that it injects a much-needed dose of accountability into the sprawling corpus indicium that is Wikipedia. Corporations and politicians seeking to shape (or outright change) information won't be able to hide their self-interested edits behind the cloak of anonymous editing, and, one hopes, knowing that their IP addresses now point Wikipedia Scanner to their identities will deter them from making mercenary, dishonest, and unscrupulous edits in the first place. One always hopes.

In any case, it's refreshing to learn of an all-too-infrequent case of youth and reason overcoming wealth and tyranny. Cheers to you, Mr. Griffith.

(H/T Lisa in Baltimore)

UPDATE: Wikipedia Scanner is already embarrassing some government agencies, tying numerous unethical edits to computers at the CIA and FBI:

WASHINGTON (Reuters) - People using CIA and FBI computers have edited entries in the online encyclopedia Wikipedia on topics including the Iraq war and the Guantanamo prison, according to a new tracing program.

The changes may violate Wikipedia's conflict-of-interest guidelines, a spokeswoman for the site said on Thursday.




I can't bring up the site. I'm guessing (hoping) this is because the server is overloaded with hits as opposed to something nefarious...

Posted by: Christmas | Aug 17, 2007 7:40:47 AM

Christmas, I'd guess it's because of an unanticipatedly high number of visits right now. There is the Wired story, and I posted about Wikipedia Scanner at my place on Wednesday--as I imagine some other writers did, too--plus The Huffington Post had a front page story about it a little while ago and Keith Olbermann discussed it on Countdown last night. Nothing exceeds like excess!

Posted by: litbrit | Aug 17, 2007 8:04:21 AM

It's been most amusing on our side of the pond. Who would have thought that the BBC would have been responsible for 7,000 such edits? Tax funded broadcasting...impartial, no?

Posted by: Tim Worstall | Aug 17, 2007 8:24:20 AM

So all it means is that we will be seeing more suits in Internet cafes.

Posted by: Joe S. | Aug 17, 2007 8:44:56 AM

Joe S., I think the big dark glasses and antenna-bedecked briefcases will give them away.

Posted by: litbrit | Aug 17, 2007 9:23:58 AM

This is a great idea and wonderful service, but --
Even Karl Rove understands that the way to evade this sort of monitoring is to do your dirty e-work on a different, unmonitored, account. Does Wiki Scanner have a way to detect that?

Posted by: Stuart Eugene Thiel | Aug 17, 2007 10:45:10 AM

These are big companies and institutions. Who at Diebold, the FBI and the BBC made these edits? It could've been some bored, mid-level schlub. That wouldn't justify it, but it would be misleading to say "The FBI edited entries" and so on.

Posted by: Ronnie Pudding | Aug 17, 2007 10:49:43 AM

I assume that those who wish to go undetected will start using IP anonymizers. That will pretty much end the tracking.

Posted by: Sanpete | Aug 17, 2007 11:40:23 AM

Us post-docs in the lab where I work were discussing the issue of wikipedia a while back.

Most of the content related to our (non-controversial) field is quite good on wikipedia and wikipedia's a convenient place to get it. But still, we have this nagging worry about random people changing constants or whatever just for the fun of it.

We decided that wikipedia entries should have easy to access, pseudonymous authorships. That way, we can know that, e.g., entries have been re-edited. Also, we'd love to play a game of "which scientist chose what pseudonym": it would be interesting to look up articles in our field and see authors like "SwissStud100", "PassTheDutchieOnTheLeftHandSide", "Looking4Love&NIHgrants", etc. -- and to guess who's whom (well, SwissStud100 would obviously be what our boss would call himself ;) ) ...

Posted by: DAS | Aug 17, 2007 1:29:45 PM

All of this raises -- and not for the first, second, or thousandth time -- the following question: How can you believe ANYTHING on Wikipedia? Even in a "non-controversial" subject-matter area, what's to say that some clown hasn't changed the entry just for the hell of it? And how do you know that if someone sees the altered entry and "fixes" it, he or she doesn't have his or her own axe to grind? And even if a truly knowledgeable person correctly fixes a doctored entry, how do you know you haven't referred to the entry post-doctoring and pre-fix? There is just no way you can believe any Wikipedia entry, at any given time, is accurate or authoritative -- assuming those concepts have any meaning anyway. For just this reason, my high-school-AP-English-teacher wife will not allow her students to cite Wikipedia in their research papers, telling them to find, and cite to, the original sources.

Posted by: Bob | Aug 17, 2007 2:33:16 PM

"All of this raises -- and not for the first, second, or thousandth time -- the following question: How can you believe ANYTHING on Wikipedia?"

It's not nearly as complicated as you make it sound. Most of the content of a Wikipedia article is attributed to another Web page. Check that page; if it's not reliable, look for another source or just don't consider that statement a reliable fact.

"For just this reason, my high-school-AP-English-teacher wife will not allow her students to cite Wikipedia in their research papers, telling them to find, and cite to, the original sources."

Indeed. Wikipedia should not be considered an original source, and can be a good way to find some.

Posted by: Cyrus | Aug 17, 2007 3:26:34 PM

Cyrus--exactly. It's insta-data, with signposts. (And back when I had to write research papers, we were allowed to read, but never cite, the Encyclopaedia Britannica. Same principle, I daresay.)

Posted by: litbrit | Aug 17, 2007 3:30:56 PM

They wouldn't let you cite the signed entries in the 11th edition of the EB? You worked under strict principles indeed!

Posted by: Joe S. | Aug 17, 2007 3:47:05 PM

JS, this was at a big old public high school in Miami, too (my graduating class had over 1,200 kids). The idea was that the encyclopedia was for general learning and pointing one toward further research. My English Honors teacher in particular was a stickler for making you learn how to look for stuff. I think the problem was that too many kids were doing the 1970's version of copy-pasting; which is to say, they'd copy its text outright in longhand, in pen(!)--something my boys find risibly onerous, even when it only involves a sentence. But they have a sucka for a Mum; I wind up helping them "find" all the sources online and make them a nice bookmark group, a process for which the 70's analog equivalent involved hours of (literally) climbing through four or five dark and dusty levels of "The Stacks" at the UF library.

Ah, The Stacks...such a great place for sharing a quick joint and/or a steamy moment or two with the boyfriend...sorry for the digression.

Posted by: litbrit | Aug 17, 2007 4:10:35 PM

I've had this argument about Wikipedia before. In my own specialist area (yes, I am indeed something more than just a blowhard scribbler on blogs) Wikipedia contains at least a certain amount of correct information: because I put it there. Both Britannica and Columbia contain incorrect data on the same subject (or at least did, last time I checked).
As an example, Columbia carried for at least a decade the information that "a dilute aqueous solution of scandium is used to germinate seeds". Err, no. Scandium is Sc. Selenium is Se. A dilute aqueous solution of "selenium" is used to germinate seeds. Someone copied the information wrong from an earlier generation of encyclopedias (there were other such errors, ScN for SiN for example).
Trivial, I know. But just because Wikipedia is inaccurate at times doesn't mean that the more formally edited encyclopedias are not so.

Posted by: Tim Worstall | Aug 18, 2007 5:49:32 AM
