Scooping and New Publication Models

If I come up with a great new research idea, sit on it for a couple of years, and then someone else publishes it before I do, then I got scooped. This is unfortunate but as long as the ideas were developed independently, nobody has done anything wrong. Of course, in the real world, establishing independence can be hard and sometimes this causes nastiness like the Leibniz-Newton controversy. This piece is about some flavors of scooping where ideas are not independently developed.

First I’ll quickly summarize the ground rules. It is always OK for me to publish work that builds upon someone else’s ideas, provided that the earlier work has been published and that I properly acknowledge it. It is sometimes OK for me to publish ideas building upon someone else’s work without citation, for example if the work is old enough and well-known enough that it can be considered part of the common knowledge. Without this loophole we’d have to cite a pile of Knuth and Dijkstra (and Leibniz and Newton and Aristotle…) papers almost any time we wrote anything. It is never OK for me to publish ideas that build upon someone’s unpublished work that I acquired through unofficial channels such as reviewing a paper, sitting on a grant panel, finding a paper at the printer, or similar.

These rules are pretty simple and usually there’s no trouble following them. But how do they interact with newer publication possibilities such as arXiv or blogs? I recently heard about a case where a professor placed a paper draft on arXiv. Subsequently, this professor submitted a revised version of the paper to a conference, where it found itself in competition with a different paper that was also based on the arXiv draft. That second paper was written by someone completely different who had simply read the draft on arXiv, liked it, and wanted to extend it. Was this OK? How about if something similar happens based on a blog post? What if I host a paper that I’m working on in a public GitHub repository and someone reads it there, extends it, and wins the publication race?

I propose that the guideline for “proper use” of someone’s ideas should be based on a standard similar to the public disclosure rule for patents in the US. In other words, if a researcher has publicly disclosed his or her ideas (in a blog, on arXiv, in a technical report, on GitHub, or anywhere else), then I am free to build upon them, provided of course that I cite the source properly. Conversely, a confidential disclosure of research ideas (peer review of grant proposals and papers would fall into this category) would not give me the right to scoop a researcher.

The patent analogy is not perfect but I think we should borrow the general idea that any disclosure is public unless it is explicitly confidential. I’ll be interested to hear what people think. Finally, to return to the example, posting the work on arXiv constituted a public disclosure of the ideas, so the competing researcher was perfectly within his/her rights to submit a paper that would potentially scoop the original author.

7 thoughts on “Scooping and New Publication Models”

  1. I agree with your conclusion, although I wasn’t under the impression that there was ever a controversy about this. I regard everything on the arXiv as public, citable, and usable. I regularly cite such results during the time period when the authors are trying to get the paper published in a good conference/journal, which could take up to a couple of years from the time that they first write a solid paper and make it public on the arXiv.

    I think the issue is whether arXiv papers count as proof that the authors were the first to get a result. If they do (and I think they do, and I was under the impression that everyone agreed), then one could not claim that it is unethical to build on those results before conference/journal publication. Otherwise, putting something on the arXiv and not sending it somewhere for refereed review (or sending it for review much later) would be similar to the phenomenon that Russell Impagliazzo complained about here, about “taking credit without giving ideas” (regarding the specific case of Vinay Deolalikar, who claimed to have proved P is not NP in a paper that after a bit of scrutiny was shown to have many serious flaws, and who then refused to renounce the claim even after the flaws were demonstrated):

  2. Hi Dave, I agree that this shouldn’t be particularly controversial, but I know there are people who disagree and figured it was worth airing the argument.

  3. Getting confidential early access to someone’s research may put you in the position to get a head start on extending some idea/discovery. You cannot publish anything that relies on what you’ve read before it gets published, but that will take some months and you can do your own work based on those ideas. Then, when it gets publicly disclosed, you can publish your work before the competition. Is this ethical? Has it happened?

  4. Perhaps the issue here is that the arxiv is fulfilling part of the goal of a publication (publicity), but not the other half (academic currency).

    I wonder how much of the thorniness goes away if we are willing to take the arxiv seriously as a final publication venue. That is, if a paper which exists mainly on the arxiv is eventually recognized for the full extent of its intellectual contribution, then not only should there be no problem with someone citing it, *that would be the point*.

    No one would seriously point to a blog post as the final embodiment of an academic contribution, and so it would make sense to ethically consider it equivalent to something you heard while having a beer. But as long as the contribution is in the right format and has the right content (which encompasses much of the work on the arxiv), why shouldn’t it count?

  5. I also agree that arXiv “counts”. The blog post is perhaps the interesting borderline case. I don’t think I would want to read a paper that cited a blog post, but I’d also be miffed if someone didn’t cite a blog post when they should have…

  6. I can easily see that people will start pushing huge numbers of half-baked reports to arXiv just to stake their claims to ideas.
    Who then determines the minimum requirements for something to be citable?
    Does just describing an idea make a contribution that should be cited?

