(CNN) As the internet evolves, webpages that were live years ago are frequently no longer available today. In some cases, that's for the best (so long, embarrassing MySpace profiles), but it is a concern when it comes to scientific research.
A new study found that 176 open access journals from 47 countries vanished from the internet between 2000 and 2019, and nearly 900 "inactive" journals may be at risk of vanishing in the future.
The issue of insufficient preservation of the scholarly record online has been raised before, but "no one has really put out names of journals or tried to identify a quantity," said study co-author Mikael Laakso, an associate professor at Hanken School of Economics.
The study doesn't analyze why these specific journals disappeared, or their quality, but it found that over 50% of them had an academic affiliation. As for topics, over 50% of the vanished journals covered the social sciences and humanities, although health, physical science, mathematics and life sciences were also represented.
"There is usually an immense amount of time contributed by a lot of different people behind every article," from the authors to the editors to the peer reviewers, Laakso told CNN.
"For all that work to be nullified and cut off from ever making an impact on the world, for such a trivial reason as not having a backup system in place for PDF files is not something that should be accepted," Laakso added.
The study, published as a pre-print, is available on arXiv, an open access archive of scholarly articles.
With little documentation available on what content falls offline, researchers said they had to do some "detective work" to gather data, something they believe speaks to the need for better tools to capture this phenomenon.
They tracked the journals' presence over the years in major bibliographic indexes such as the Directory of Open Access Journals. They said the process was like going through old phone books to see whether, when a phone number is removed from the listings, its users are still alive.
After narrowing the list to titles that had dropped out of the indexes over the years, they sifted through thousands of webpages to figure out what had happened to each individual title, relying on tools such as the Internet Archive's Wayback Machine.
The study finds that only a small proportion of open access journals disappeared within the past two decades, but the authors warn against reading that as cause for optimism.
"We think that more journals might be at risk of vanishing in the future," Lisa Matthias, a Ph.D. candidate at the Free University of Berlin and a co-author of the study, told CNN.
The study identified nearly 900 "inactive" journals that may be at risk of vanishing, since over three-quarters of the journals that ended up falling offline did so within five years of their last publication.
In an email to CNN, the Directory of Open Access Journals said the study "reinforces our view that DOAJ must help those journals, indexed with us, to preserve their content, and we need to find a model where, depending on their economic profile, the cost of doing so is not always passed back to the journal."
Why does digital content disappear from the internet? There are plenty of reasons, ranging from technological advances that make webpages obsolete to web hosting bills going unpaid.
"The average life of a webpage is 100 days before it's either changed or deleted," Brewster Kahle told CNN. Kahle is the founder of the Internet Archive, a non-profit organization that aims to be "the library for the Internet," as Kahle puts it.
"The Web is an ever-shifting set of sands," Kahle said.
The issue affects all kinds of digital content, but when it comes to scholarly literature, there are still gaps in knowledge about what is even out there to be saved.
In 2018, the Internet Archive set out to find and archive all journal articles available online, and more recently it received funding from the Mellon Foundation to pursue this goal, Kahle explained.
"By our analysis, 18%, or over 3 million, open access articles since 1945 are not independently archived, either by us or by other preservation organizations," Kahle said. The Internet Archive and the authors of the study on vanishing open access journals have joined forces to address the problem.
Historically, the role of preserving content for future generations rested with libraries, but in the digital age, the role of libraries has become much more complex, as the rising cost of commercial scholarly literature impacts their budgets.
Judy Ruttenberg, senior director of Scholarship and Policy at the Association of Research Libraries, told CNN that libraries have focused on this work since the 1990s, but all the while, "subscription costs began to far outpace inflation and crowd out investments in other literature and, quite frankly, in programs like preservation."
According to Ruttenberg, the study on vanished open access journals is "a wake-up call for us to pay more attention."
What is needed, she said, are coordinated approaches as the scientific community moves from a commercially dominated mode of publishing to open access.
"This story is about resource allocation and coordination," Ruttenberg said.
Subscription-based digital scholarly content is not immune to vanishing from the web, but content from smaller or more independent open access publishers lacks some of the protections and resources that commercial content is more likely to enjoy.
"The publishing technologies employed to address preservation and archiving are mostly US or European initiatives where the solutions come with a price," the Directory of Open Access Journals told CNN in an email.
"For traditional commercial or society publishers, the fees to implement such a service and then deposit in them are negligible, compared to the income from subscriptions or open access publication charges. For small, scholar-led publishers or for single journals, often with no steady revenue stream, the fees can be prohibitive," DOAJ explained.
There are also technical issues to consider.
"To get the content into a service can require specialized knowledge and often involves some form of testing and sampling. The individuals running these journals may not have the time, skills or funding to be able to do this," DOAJ explained.
Kahle, the Internet Archive founder, cautioned that blind spots in how open access journals were preserved in the past shouldn't be taken to mean that commercial publishers are better equipped to handle preservation than open access publishers.
He mentioned successful initiatives in the open access space like PLOS, a multidisciplinary non-profit publisher founded in 2001, or arXiv, a Cornell University-operated open archive and free distribution service started in 1991.
"Those guys are designed to be archived, they are designed to be picked up and used for new types of research," Kahle told CNN.
The importance of archiving content goes beyond preservation, Kahle explained.
"When you can gather these materials, you can start to do studies on the whole body of knowledge. You can do what's called meta-science, or the science of science," he said.
Such studies allow for the detection of biases, or new patterns.
"That data mining is just fantastically valuable," Kahle said. Under non-open access publishing agreements, these types of analyses may raise copyright infringement concerns.
Even though the principles of open access to scientific information are supported by international organizations such as UNESCO, and shared by many in the scientific community, the transition to a fully open model is still a work in progress.
"The challenge in the transition is to make sure that we end up with the infrastructure for libraries to be able to coordinate their investments in open content, the same way that we have all kinds of tools to coordinate our investments in subscriptions or purchased content," Ruttenberg said.
At a time when the pandemic has so many people turning to online resources for their learning, the conversation on open access knowledge is all the more relevant.
"Covid and the mass transition to virtual research and learning is a huge demonstration of the need for open access," Ruttenberg said.