Rules against citing DOI-less content are easier to understand once you understand a bit more about life within an academic publishing house.
An article with a DOI has more longevity. This is mentioned in the post, but I don't think the author fully grasps the importance of this. Plain URLs break down all the damn time; publishers/journals redesign their sites and migrate between technology partners often. It only takes a few Very Important People to flip out about a missing piece of data or a broken link before a publisher will start to put safeguards into the production workflow to prevent such complaints. Mandating DOIs alongside citations is an easy policy toward that goal. An article with a DOI will almost surely be supported by a technology back-end that can keep that DOI up-to-date, generate manifests for (C)LOCKSS, submit materials to archival services like Portico, etc.
DOI-bound citations are much more efficient throughout the publishing process. Editors can use DOIs to quickly verify and proof citations. If a publisher's workflow requires sending papers to an XML composition vendor for tagging, DOIs can make that process more automated and accurate. Some of the more sophisticated XML vendors can generate all the citations given only the DOIs. This might not sound like much, but it's notoriously easy to make mistakes on citations throughout the editorial process, even with NLP and other more enlightened software in the mix. If you ever see an article with a citation list like this...
* Suzuki, et al. 2004...
* Suzuki, et al. 2006a...
* Suzuki, et al. 2006b...
* Suzuki, et al. 2006d...
* Suzuki, et al. 2006f...
* Suzuki, et al. 2006g...
* Suzuki, et al. 2007...
* Suzuki, et al. 2008b...
...think about all the double/triple/quadruple checking bullshit that goes into making sure all the text and links in that list are accurate, and know that DOIs take most of that headache away. It's substantially more efficient at scale when DOIs are included.
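To make the "verify and proof" point a bit more concrete, here is a rough Python sketch, not any publisher's actual tooling. It pulls a DOI out of a citation string and builds (without sending) a request that asks the doi.org resolver for a formatted citation via content negotiation. The function names and the DOI `10.1000/xyz123` are made up for illustration, the regex is the commonly recommended pattern for modern DOIs and will miss some legacy ones, and whether a given DOI actually answers a `text/x-bibliography` request depends on its registration agency.

```python
import re
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Commonly recommended pattern for modern DOIs; some legacy DOIs will not match.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_doi(citation: str) -> Optional[str]:
    """Return the first DOI-like string in a citation, or None if there is none."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Trailing sentence punctuation is usually not part of the DOI itself.
    return match.group(0).rstrip(".,;")


def bibliography_request(doi: str, style: str = "apa") -> Request:
    """Build, but do not send, a request asking doi.org for a formatted citation."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": f"text/x-bibliography; style={style}"})


if __name__ == "__main__":
    # Hypothetical citation text; the DOI is a placeholder, not a real article.
    doi = extract_doi("Suzuki et al. (2006a). Some title. doi:10.1000/xyz123.")
    if doi is not None:
        request = bibliography_request(doi)
        print(request.full_url, request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would do the actual lookup. A citation with no DOI, like an archival letter, simply returns `None` from `extract_doi` and falls back to checking by hand.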
Recently I've been thinking about how some of these problems could potentially be solved via new ideas like IPFS. A Handle-based system like DOI might not be as necessary in IPFS (given IPNS), and perhaps you could start creating and citing more diverse sources of content with some confidence in preservation/longevity. It could be a great fit for open access content.
I can understand why a publishing house would find the DOI to be useful. I can understand why a publishing house would want to encourage people to use a DOI.
I do not understand why there are rules against citing DOI-less content, as that would seem to place the needs of the publishing house far above the needs for good scholarship.
> Thomas E. Kurtz, letter to Secretary, X3, December 21, 1965, Calvin N. Mooers Papers (CBI 81), Charles Babbage Institute, University of Minnesota, Minneapolis, box 20, folder 1.
That paper has dozens of DOI-less references, like meeting minutes and memos, which are stored in archives like the CBI or the Smithsonian.
How does a paper like that get published in a journal which has rules against citing DOI-less content?
> I do not understand why there are rules against citing DOI-less content, as that would seem to place the needs of the publishing house far above the needs for good scholarship.
Absolutely. Welcome to the industry!
But the publisher's concerns are not without merit. Your cited reference is useless if nobody can find it in a couple years. I'd argue that any article which does not take steps to preserve the availability and connectivity of its sources is neglecting a critical component of "good scholarship", even though this often has nothing to do with the authors or research methodology. (There are, of course, countless justifiable exceptions to this.)
There are creative solutions though. A good journal would take that PDF in your comment, get permission from the author, and package it up with the article as "supplementary data" or some other auxiliary material. A good publisher will find ways to otherwise publish critical outside materials.
We are stuck at an awkward time in digital scholarship. In the old days, you could cite any published material because everything was on paper, and everything was distributed to a thousand libraries, and you could reasonably expect the availability of everything, and there was no expectation that everything would be available instantly. You can't put any of that confidence in naked web pages; URLs are fatally unreliable for this purpose. DOIs, and confidence in the tech brokers that use them, help. And with time, as more and different types of online content become more reliable, it will become easier to cite anything like the old days.
I thought I went out of my way to say the publisher's concerns have merit, so I feel somewhat insulted by your bringing it up as you did.
> Your cited reference is useless if nobody can find it in a couple years.
Nonsense. I told you exactly where to find it. It's in a box in the archives at the Charles Babbage Institute. Even more, I just visited that archive last month, and if I didn't look through that box I looked through its neighboring boxes. You can also request a copy for yourself. The CBI charges $0.25/page.
Do you believe that the Smithsonian and other archives are not a valid part of "preserv[ing] the availability and connectivity of its sources"? Not only did I pay a copy fee, but I gave extra money to the CBI as a gift - does that not count?
> ... if nobody can find it in a couple years
You speak from the point of view of a publishing company.
Many organizations publish besides publishing companies. IBM has (or had) their own corporate journals, both in-house and public. One of the common references in my field is: Tanimoto, T. (17 Nov 1958). "An Elementary Mathematical theory of Classification and Prediction". Internal IBM Technical Report 1957. That's certainly not a peer-reviewed journal!
Google Scholar says it knows of 222 citations to this report. I have a copy that I ordered from the Niedersächsische Staats- und Universitätsbibliothek Göttingen. Up until I changed it a few weeks ago, the relevant Wikipedia entry said the report was 'unavailable' (https://en.wikipedia.org/w/index.php?title=Jaccard_index&typ... ). Yes, I wonder how many of the authors of those 222+ papers actually read the original paper.
If there were DOIs in the 1950s, do you think this document would be more findable now? Perhaps with IBM, yes. But there are many small companies which also publish their own papers as their own press. (I can provide examples if needed.)
So if my company starts its own press, publishes papers with its own DOIs, then goes under, shuts down the servers, and doesn't bother to transfer copyright to a new provider ... just how will anyone be able to resolve those DOIs and find those documents in a couple years?
> A good journal would take that PDF in your comment, get permission from the author, and package it up with the article as "supplementary data"
Please, yes! I would love all of the journals to take charge of digitizing the world's archival data and make it more accessible! Many of the documents I looked at were untitled, and in cursive, and I can't begin to estimate the cost of indexing all of those properly.
Which journals do this, so I can publish in them? ... Or are there no good journals?
To be more specific, I want to cite pages 61-63 of the Work Book I for Calvin N. Mooers, located in box 13 folder 4 of archive CBI 81. This 1947 entry is a precursor to his 1951 Zatopleg paper and shows that there's no connection to Wheland's 1949 connection matrix proposal. The sketch on p62 shows that he thought it had practical application. The double counter-signature on p63 shows that Mooers likely considered this a patentable idea. For a historian of science, this is quite useful.
Will the journal arrange the scans and the copyright licensing details for me? What happens if they and the archive can't come to an agreement on the copyright terms?
Will they scan only those pages, or the entire notebook? (Since there's more I'll want to cite in the future, it makes sense that they make a bulk request.)
If they scan the entire work book, will they make sure to remove the Social Security numbers of living people and other potentially sensitive information?
How much will this service cost me, vs. a standard fair-use quote that nearly all journals now would accept?
However, that doesn't help with the citation I gave, which is to a letter by Thomas E. Kurtz which is in the CBI archive for Mooers' papers.
The CBI archived Mooers' papers. These include works by Mooers and works by others. They control the copyright to Mooers' own works, but not those of the others. They do not have the ability to let people use Kurtz's letter beyond the research purposes allowed by fair use.
Which means no journal in the world (or at least those parts signatory to the Berne Convention) can do as you suggest. Not even good ones.
Should a paper not be published unless it's possible to do what you want, regarding DOI and some incompletely specified definition of what counts as sufficient permanence?
> We are stuck at an awkward time in digital scholarship
All interesting points to consider, but when haven't we been at an awkward time in publishing for the industry and for scholarship? I mean, I read a paper recently from the 1930s concerning the question of scientific priority when the paper is published only on microfilm. It asked questions like, does a microfilm-only publication really count as a publication?
At https://www.asis.org/Bulletin/Bulletin-50thAnniversary-Watso... you can see the founder of one such microfilm service. People could send in camera-ready auxiliary publications to be microfilmed and then reproduced on demand for anyone who wanted a copy. If someone sent in a document, which was listed on the ADI list of available documents, does that count as being published?
Replace "auxiliary publication" with "blog" and you'll see it's pretty much the same debate.
>> We are stuck at an awkward time in digital scholarship
I'll add to that. It will still be an awkward time in 20-40 years. Let's say that "Professor X" is a pioneer in a field, and her computer, which has her last 40 years of activity (email, source code, draft documents, downloads, etc.) on it is placed in an archive.
Now I, as an historian of science, go through that blob and figure out the timeline for some event, which is based on recorded IRC sessions, git commit messages, etc.
How do I cite those individual locations in the archival data?
Correct me if I'm wrong, but IPFS intends to be torrent-like in terms of redistribution, so you can no more guarantee that the article can be found there than with ordinary torrents, which I think are hardly more reliable than a URL.
The difference is that anyone can "pin" any content in IPFS. So if an article cites any IPFS resource, it can also pin those resources regardless of whether or not they're academic, created by the same publisher, etc. IPFS would make it aggressively easy for any publisher to take responsibility for archiving wayward cited materials.
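A toy model of why pinning matters, in Python. This is not the IPFS API: `ToyNode` and its methods are invented for illustration, and real IPFS addresses are multihash-based CIDs rather than bare SHA-256 hex digests. The only ideas it borrows are that content is addressed by its hash, and that a node keeps pinned blocks when it cleans up.

```python
import hashlib


class ToyNode:
    """A toy content-addressed node: blocks are keyed by the SHA-256 of their bytes."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.pinned: set[str] = set()

    def add(self, data: bytes) -> str:
        """Store a block (unpinned) and return its content address."""
        address = hashlib.sha256(data).hexdigest()
        self.blocks[address] = data
        return address

    def pin(self, address: str, source: "ToyNode") -> None:
        """Copy a block from another node and protect it from cleanup."""
        data = source.blocks[address]
        # The address doubles as an integrity check on whatever was fetched.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        self.blocks[address] = data
        self.pinned.add(address)

    def collect_garbage(self) -> None:
        """Drop every block that is not pinned."""
        self.blocks = {a: d for a, d in self.blocks.items() if a in self.pinned}


if __name__ == "__main__":
    author, publisher = ToyNode(), ToyNode()
    address = author.add(b"cited blog post")
    publisher.pin(address, author)   # the publisher takes responsibility for the source
    author.collect_garbage()         # the original host loses interest
    print(address in author.blocks, address in publisher.blocks)
```

In this sketch the cited material stays retrievable at the same address for as long as at least one party keeps it pinned, which is the property the comment above is relying on. It says nothing about whether anyone will actually keep doing so.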