The convergence of notability and verifiability on Wikipedia

And some awkwardness left by the gaps

Notability and verifiability are two sort-of distinct concepts on Wikipedia. Notability is, contrary to what some people assume, the one whose influence dates back further: the subjective, almost circularly defined idea that articles should be on "encyclopedic" topics, i.e. the sorts of things that one would write an encyclopedia article about. What does that mean? Well, Albert Einstein yes, a club you just made up no, things in between maybe. It worked better when the encyclopedia and number of editors were smaller, as you might guess.

There's a more recent idea of verifiability: nontrivial statements in Wikipedia articles should be cited to some source where the reader could verify them. Although "[citation needed]" is now one of Wikipedia's famous cultural exports, this is actually not as old a concept. Early Wikipedia was seen as kind of a scratch workspace associated with the "real" encyclopedia project, Nupedia, which it was assumed would do all the fact-vetting and source-citing before the final polished article was published. There was still the notability policy to some extent, in that Wikipedia articles were expected to be collecting information on subjects that could at least plausibly be useful in a future encyclopedia article. But Wikipedia's own job was just to collect the information, and sources were mostly brought up only when someone asked for them on the talk page ("I've never heard of this interpretation of quantum mechanics; where did you get it from?").

Merging the two ideas

Although these ideas are still officially distinct, my view is that they've increasingly been de facto merged. I also think this has mostly been a good idea, and has actually been mostly good for the "inclusionists" like myself who have long argued for very wide topic coverage, though there are exceptions that pose difficulties.

How did the ideas get connected? Someone who's dug through the archives should correct me if I'm wrong, but I believe it was first in the natural-science articles. Wikipedia policies grew fairly pragmatically in response to problems, rather than being some overarching theory of knowledge—if you had gotten together a committee of experts in topics like epistemology, the sociology of knowledge, the history of ideas, etc., to agree on what its policies should be, the committee would still be meeting and there would be no encyclopedia.

So what was the problem in this case? As anyone who's spent some time around internet discussion of physics can attest, there's a certain proportion of very vocal proponents of, uh, "non-mainstream" physics theories. Inevitably some prolific authors of such theories discovered Wikipedia, and the local physicists didn't really like the result. The reaction was an inclusion policy that defined notability, at least for science, to depend on the existence of sources: a notable physics theory is, according to this policy, a physics theory that has at least some mentions in the physics literature, whether textbooks, survey articles, or some other decent source. If the only source is your personal webpage or an article you just uploaded to arXiv.org, then the theory may or may not be good, but it isn't a theory that Wikipedia should cover yet.

This has mostly worked pretty well in the sciences. If you're going to write an article about a slime mold, presumably it's one that's been documented somewhere in the literature, right? Otherwise, where did you even get the information from? If you personally discovered a new slime mold, you should probably be writing that up in a paper, not on Wikipedia. There have been controversies when different communities have their own literature and sets of authority, as with various kinds of alternative medicine. But by and large most scientists like this policy, and it's not hugely controversial in that area. I think this is because it's very widely accepted by scientists that "known to science" is, if not exactly the same as "discussed in the science literature", at least really, really close, because peer-reviewed publishing is how science works.

The verifiability push

Meanwhile, circa 2004-05, the Real World discovered Wikipedia, and some of that world was not very happy about it. Previously blissfully chugging away as an obviously-impossible idealistic encyclopedia project, Wikipedia was increasingly, to the shock and horror of some, actually being used by people as a reference source. A flurry of op-eds began questioning whether this collection of stuff put together on the internet by people with no verified credentials and no real editorial process was getting too much exposure. Was Wikipedia reliable, pundits in the New York Times and Wall Street Journal asked? Was it misleading our kids with some sort of loosey-goosey, amateurs-on-the-internet project, when the kids should be reading a Proper Source like Britannica, written by Real Experts? It didn't help that one of the project's cofounders partly endorsed those criticisms, arguing that Wikipedia needed to "jettison its anti-elitism" and adopt a more rigorous, more expert-based, less anarchistic editorial and review process.

Now, some of that criticism could just be ignored; one might suspect that the criticism coming from the editor of Britannica had something of an ulterior motive, and some academics were just bitter that a bunch of upstart internet kids didn't have enough respect for academic credentials (never mind that some of us Wikipedians were also academics). But one aspect that many Wikipedians did take to heart was that Wikipedia should do a better job documenting where its information comes from. The elemental make-up of the sun isn't something we heard on the street or personally measured, but presumably something someone actually looked up somewhere. If not, someone should look it up, and we should add a footnote citing where we got it from. Now we have a partial answer to the question, "why should anybody believe what Wikipedia says?" We documented where the information came from, so you can verify it yourself if you'd like.

Verifiability steamrollers notability

So, by 2006 or so, Wikipedia is getting a strong cite-your-sources culture. It's also getting a lot bigger, both in editors and articles, than it was circa 2001-04. The original notability policy was based on a lot of subjective judgment calls: we write articles about "notable" things, which are things that are "encyclopedic", a concept people have a lot of varying ideas about. This was increasingly not what legal scholars call an "administrable" rule: even if this were a coherent philosophical standard, trying to implement it in a workable way was impossible. As a result, 2005-07 were probably the height of the inclusionist versus deletionist wars.

Although I'm an inclusionist myself (at least in that context, a fairly hardline one), I should say in fairness that the "deletionists" were fairly open-minded and ambitious by most non-Wikipedia standards. Some saw the goal of Wikipedia, derived from the goal of Nupedia, to be to write an open-access, free-content encyclopedia that was the union of all existing encyclopedias, both general ones like Britannica, and subject-specific ones like the Grove Dictionary of Music and Musicians. If you had proposed that in 2001, it would've sounded pretty ambitious, but by 2005, other people saw Wikipedia as a project to write an article about absolutely everything that we could reasonably write one on (more on that below). Some people taking the more restrictive view ended up leaving around 2005-06, either quitting entirely or joining a forked project like Sanger's Citizendium that promised a more orderly, scholarly, edited atmosphere (not exactly the same debate as inclusionism v. deletionism, but correlated somewhat).

The end result is that by 2007 or 2008 or so, verifiability had mostly steamrollered notability, and the inclusionists had won on a lot of topics that had once been active areas of debate. Instead of a philosophical standard of notability as somehow "a thing worth noting", increasingly a descriptive standard was adopted: has the thing in fact been noted by someone, in some source that also allows us to verify the information? If yes, we'll rubber-stamp it as automatically notable. That isn't quite official policy (except in some areas), but de facto is very close to how things work across large parts of the encyclopedia.

As an example, a notability debate for a historical biography prior to around 2005-06 would have centered on whether the person did anything that was somehow historically interesting. Was he or she a famous scientist, a prime minister of a country, a prominent lawyer? In either 2006 or 2007 I believe, some of us inclusionists started arguing that it wasn't even necessary to debate the notability of historical people on their merits. I translated an article on some minor Prussian official from the Allgemeine Deutsche Biographie, a prominent old German biographical encyclopedia, and verified its facts in the updated version, the Neue Deutsche Biographie. Someone nominated the article for deletion, on the grounds that this was just a minor non-notable official. Admittedly, the person in question hadn't actually done anything particularly exciting. But, several of us successfully argued that being included in a major biographical dictionary was automatically sufficient: they had in fact been noted, and we had a source to cite for the information. I personally cared mostly about the second. An article with a decent source on a minor Prussian official is, at worst, a verifiable article on a subject that nobody cares about.

Even to some people on the fence about the issue, this seemed like a radical gutting of notability, which to some extent it was. What we more or less succeeded in arguing was that any historical person who has non-trivial mentions in the historical literature is automatically notable. Basically, every person in history is notable, if there's anything written about them in the history books, encyclopedias, journals, etc. With just that rule, literally hundreds of thousands of people are rubber-stamped in, to the dismay of those who were hoping for a somewhat more curated collection of biographies. One of my personal hobbies is writing well-referenced, short biographies of minor figures that have never been biographied before in English (I cite solid but non-English sources, which Wikipedia permits).

So, while the verifiability policy sometimes seems like a weapon for deleting articles (no sources? you're out!), I think to a large extent it's also been a useful justification for keeping articles (got a cite? you're in). In addition to shepherding in almost everyone in the history books, it's helped push Wikipedia more strongly into some areas of pop-culture that it used to delete on sight, like internet memes, which now typically stay if they have well-cited articles. The fact that we could point to good citations also helped push back against the criticism Wikipedia was getting, partly from external sources, that we were some kind of "Pokemon encyclopedia" writing too much cruft on unserious topics. We did have a lot of Pokemon articles, but they were pretty good ones.

Notable but unverifiable?

That covers the gap in one direction. To a certain extent articles that previously fell into the "verifiable, but not notable" category have been upgraded to notable-enough, or at least difficult to delete.

But what about articles that, if you ask someone knowledgeable in an area, are clearly notable, but for one reason or another don't actually have good sources available where we can verify and cite the information? In a world where everyone else involved in knowledge production outside Wikipedia was doing their part perfectly, this would never happen: everything notable would in fact be noted by someone who studies that subject, and Wikipedia would be able to look up their work and cite it.

Unfortunately, that isn't always the case, which leads to some tricky cases. Wikipedia is supposed to be a collaboratively edited summary of the existing literature, serving as a comprehensive and reliable tertiary source. But what happens if the existing literature itself is deficient? Asking Wikipedia to not only summarize and organize all existing literature, but also to fix all deficiencies in the literature is a pretty tall order! Inevitably, some amount of improvisation and compromise ends up being necessary to paper over the problems. And, no matter what happens, it seems Wikipedians end up being attacked as somehow jerks on the internet who don't know how to write an encyclopedia: either they're a bunch of fascist rules-lawyers deleting perfectly good articles, or they're a bunch of anarchistic amateurs filling up their encyclopedia with uncited nonsense, depending on who you ask. So it sometimes feels like a bit of a thankless task to spend my free time editing an encyclopedia for free!

But anyway, there are two main areas I've encountered with an existing-literature deficiency:

Case 1. Recent things not covered by journalists.

There is a class of subjects that almost certainly will be covered by the usual good sources at some point (history books, sociology books, etc.), but which are new enough that they haven't yet been covered. For some topics, journalists provide the stop-gap references. For example, for an article on the U.S. Civil War, Wikipedia really should only cite published historical work, whether books or journal articles or some other such thing. Citing news articles from 1860s newspapers to construct a novel historical analysis is original archival research, and probably best left to historians.

For really recent events, though, Wikipedia simply summarizes the available news articles. This lets us write something on subjects not yet covered by historians, while still having some answer to the question: "where did you get this information from?" We got it from the newspaper; here's a citation. Wikipedia sometimes gets criticized for this, by scholars who feel that it's taking media narratives too uncritically, rather than doing the kind of critical source analysis a historian would do. But for an event that is happening right now, I think simply summarizing what the newspapers say is close to the best that can be done, and I think Wikipedia does in fact do a fairly good job of this. As better critical analyses come out, the articles can always be updated.

But, there is a category of things that aren't yet covered in the more slow-moving "permanent" literature, and yet aren't in a field that journalists seem to care about. There was a brouhaha regarding the article on a nascent philosophical movement I follow, which if you were "in the know" had clearly generated some buzz within the field, but if you were looking for published sources, had only a handful. There was a conference proceeding, a special issue of a journal, a few very active blogs, and an essay anthology mentioned on some blogs but not yet officially announced. Given a year or two everything sorted itself out well enough, and now there are good sources, and there will be more in another year. But in the meantime, how do you write anything about it that isn't cited directly to the blogs of the main authors? That's the sort of thing that philosophy newspapers would write about if they existed, but there are just a few magazines, and they cover a small part of the total philosophy universe.

Some recent acrimony over new/emerging programming languages fits into the same case, I think. Some were badly nominated for deletion to begin with, but a few of them were clearly notable to people knowledgeable about the subject, but as far as I could tell had not yet actually been noted in any source I could cite (I looked for a few). Given a few years, they'll probably be mentioned in survey articles, newer editions of books covering various areas of PLs, etc. But in the meantime, you'd want to maybe find out about emerging programming languages in the PLs newspapers, which sadly don't really exist. Instead people find out about things through more decentralized networks that don't leave behind a lot of citeable documentary evidence, like mailing list posts.

Case 2. Things that aren't that recent, but which are for some reason not written about by the people you would have hoped would do so.

It turns out that "the literature" sadly doesn't actually have perfect coverage, even given some time. Some things just get studied more than others. If you go back far enough, the things that never got studied are just lost to history, or require careful archival research to reconstruct, so Wikipedia's best bet is probably to just omit them. But there's an awkward period where lots of people know something is notable, but inexplicably nobody has written much on it.

I personally run across this most often in post-1960s popular music. For whatever reason, coverage here is extremely lumpily distributed. There are some genres and sub-genres where you can find hundreds of journal articles, a dozen PhD theses, and a half-dozen books, coming from various angles: musicology, cultural studies, popular history, or all of the above. Then there are other genres with very little coverage, even ones with quite large followings. For example, underground punk has actually gotten fairly good coverage over the past 10 years or so, with a flurry of documentaries, books, journal articles, etc. Industrial music has much larger gaps in the literature, for whatever reason, so there are even fairly prominent musicians where finding good sources is tough. And it's nearly impossible if you go into a sub-genre; if you want to talk about noise musicians less prominent than Merzbow, you really start hurting for sources (afaik there's exactly one book on the genre, and it's only partly historical/documentary in nature).

Even if you're willing to accept old newspaper clippings as decent sources, they're hard to come by and present a confusing, fragmented record (e.g. just a concert announcement, sometimes with a summary of the band that anybody familiar with them would recognize as hilariously wrong). Maybe if you could get more specialized magazine archives you'd have better luck, but the specialist music magazines are often not archived in places that one can find.

Really, a music historian should be doing that work, writing a book or article, and then we can summarize their scholarship in the Wikipedia article. If they haven't, we've got to muddle through and make the best of a bad information-availability situation. I'd like to write a good article with solid references that someone can follow up on. From my personal familiarity with a genre I know this particular artist or record label is quite prominent, not even borderline prominent, but I just can't find good sources. Do I write an unsourced article based on my personal knowledge? Do I just not write an article? If someone else writes an unsourced article based on their personal knowledge, and sources can't be found to improve it, what do we do about it? Basically, do we compromise on covering all notable things, or do we compromise on writing a tertiary-source encyclopedia that cites where it got its information from? Fortunately that isn't always the choice, but it sometimes is.

I can see why this one particularly annoys some communities as well. When a group is already feeling aggrieved that they're being unjustly ignored by academia or other writers who really should be studying them or their work (some fan communities, for example), also being ignored by Wikipedia (because Wikipedia wants to wait for the other writers to write about them first) feels like an additional insult. Perhaps Wikipedia should do more to flexibly accommodate these cases, but fundamentally the root of the problem lies elsewhere: someone is not producing the sources that Wikipedia would love to cite if they existed! Why not? And how can that be fixed?

A few suggestions

So, that's my personal recollection of some of the relevant history, and my analysis of where some of the problems between notability and verifiability lie, based on my admittedly anecdotal experience as a Wikipedian since 2003 or so.

That said, what can be done? I see three ways to address things, which should probably all be pursued:

  1. Can Wikipedia handle these problems better?
  2. Can fields leaving gaps in the published record be persuaded to do a better job studying the gaps?
  3. Can we, as people interested in collaboratively putting information on the internet, pursue other approaches in parallel to Wikipedia, with different goals?

The first gets the most discussion, and indeed is something we can discuss. When I'm looking through existing articles, I personally try to use some sort of pragmatic strategy. Minor programming languages get treated (by me) more leniently than fringe physics theories, because they just seem to be less of a problem: "fringe PLs" is at worst useless, but not usually filled with crazy misinformation. Some other topics worth giving some more scrutiny to include kooky medicine, potentially libelous articles about living people, inflammatory uncited information about ethnic conflicts, etc. If something doesn't seem to harm anything, though, perhaps people should just make more use of tagging articles as unreferenced (or in need of improved references), rather than proposing them to be deleted. This is especially the case for "Case 1" above: if there are no sources currently, but there's a high probability that there will be some soon, might as well just leave the article and wait around a bit for them to show up. Tag it in the meantime so the reader is aware that the information isn't fully referenced, but no need to axe it. Perhaps some user-interface improvements could be made to better distinguish different qualities of articles, to balance the idea of Wikipedia as a work-in-progress encyclopedia with its reliability as an encyclopedia that people are using right now.

The second one is where I think the root of a lot of the problems lies, but is understandably not talked about as much, because it seems hard to fix. Despite the complaints about Wikipedia's management/consensus/cabal structure being somewhat opaque, yelling at Wikipedians (or just wading in) actually has some possibility of changing things, while I have no idea how you convince academic fields, journalists, or popular nonfiction writers to start writing about under-studied topics.

The third one is only partly explored and very interesting, I think. I support Wikipedia's goal of being a referenced tertiary-source encyclopedia, accurately summarizing all existing literature. That's a pretty ambitious goal to begin with, and I think the world benefits from such a source existing. But, Wikipedia doesn't have to be the only collaboratively edited information source on the internet. Some of the disagreements with Wikipedia's view on sourcing boil down, I think, to a belief that if the academic fields aren't studying something, but we know it's notable, we should just study it ourselves. That's a view I actually agree with, but I don't think it's something best done within Wikipedia, which is a sort of different project.

Of all things to have taken a lead on this, Know Your Meme is a surprisingly good example. Whereas Wikipedia documents internet memes from a tertiary-source perspective (citing news articles, sociology / internet-culture writings, etc.), Know Your Meme is an experiment in directly researching the history and context of internet memes. Contributors trace their emergence by digging through forum threads, archive.org, personal recollections, interviews, etc. That's a great idea and I fully support it.

I don't see any particular reason it needs to merge with Wikipedia, though; they're different kinds of collaborative knowledge-production projects, one dedicated to summarization, while the other is dedicated to actively doing sociology/history research. Some of this may be the fault of Wikipedians, some of whom still attach value judgments to the "notability" criterion, as if internet memes are somehow not serious enough for Wikipedia. I personally think they're plenty worthy of serious study, but still would want to separate a primary-source, research project (KYM) from a tertiary-source, summarizing project (Wikipedia). Both do good stuff, just different stuff.

Can more of that stuff exist in other areas? Some of this sort of thing is as old as the internet; for example, Kill from the Heart is an excellent research-project / historical archive of classic hardcore punk, with a focus on bands from less-studied countries. In music genres that are understudied, perhaps we as people who like doing volunteer stuff on the internet (and also like music) should do more documentation of our own history, collecting zine scans, discographies, interviews, dates and membership and gig histories, and any other information that can be found (some people are already doing this, of course). Again, this isn't quite the same project as Wikipedia, and I don't think it really has to be done as part of Wikipedia.

* * *

To summarize: Wikipedia has to a large extent moved towards being more inclusive on a notability standard, but also more strict on wanting information in its articles verifiable via cited sources. This works great on topics where good sources exist, and there it largely serves the goal of including more topics. On topics where they don't, Wikipedia is sort of in a tough spot: the people who were supposed to write those sources (historians, other researchers) have dropped the ball, and so we're left trying to improvise something while still meeting competing demands. Wikipedia could improve how it handles this, especially in terms of friendliness and accommodation of temporary problems, but it would be ideal to also explore other ways of collaboratively building information in areas that are so far underserved by traditional publishing: Wikipedia could be better, but it also doesn't have to be the only wiki on the internet.