(CNN) Scientists investigating the origins of the coronavirus pandemic might be working with the wrong samples, because some early samples of the virus submitted by a Chinese researcher were deleted from a shared database, an expert in the evolution of viruses says.
Jesse Bloom, a researcher at the Fred Hutchinson Cancer Center in Seattle, said he found genetic sequences taken from early coronavirus cases in China that were deleted from a US National Institutes of Health database. Examination shows some of the early cases in the Chinese city of Wuhan are different, genetically, from the variants that eventually spread to cause the pandemic.
The sequences themselves do not shed any more light on the renewed debate about whether the virus spread naturally from animals to humans, or was the result of a laboratory leak, Bloom told CNN.
But he said his analysis shows the samples being used to investigate the origins of the Covid-19 pandemic may not be complete.
"I recover the deleted files from the Google Cloud, and reconstruct partial sequences of 13 early epidemic viruses," Bloom, who is helping with efforts to follow the genetic changes of the coronavirus, wrote in a pre-print paper posted on bioRxiv. It has not yet been peer-reviewed.
The NIH confirmed the sequences had been removed in June 2020 at the request of the investigator who originally submitted them in March 2020, and said it was standard practice to allow this. Geneticists have been sharing information in databases like this one since the start of the pandemic.
The World Health Organization has been leading efforts to find the origins of the coronavirus and issued a report in March saying it was most likely the virus originated in an animal and passed to people, as other coronaviruses have. Least likely, it said, is the possibility that a virus was engineered in a lab and leaked out.
Much of the investigation has focused on early cases at Wuhan's Huanan Seafood Market.
But WHO has been criticized for accepting evidence from China, and the administration of President Joe Biden has been taking another look at the origins.
"We are aware of this report and, as we repeatedly asked, we hope that all data on early cases will be made available," WHO spokesman Tarik Jašarević told CNN by email.
Bloom said the missing sequences are no smoking gun.
"This study does not provide any additional strong evidence favoring either natural zoonosis or lab accident. Rather, it shows that there are additional sequences from relatively early in the outbreak that are still unknown, and in some cases have mutations that suggest they are probably evolutionarily older than the viruses from the Huanan Seafood Market," Bloom said in an email to CNN.
Scientists not involved with Bloom's analysis were skeptical about his conclusions.
"If these sequences were removed for the purpose of obscurity, it is also worth noting that such an effort clearly flopped because ... these sequences do not immediately provide any completely new knowledge about the genetic diversity of SARS-CoV-2 in the early pandemic," said Robert Garry, a professor of immunology at Tulane University.
"The reality is that minor scientific missteps and less-than-ideal circumstances surround the sharing of scientific data all of the time," Garry told CNN.
"In general, the work is vague or remiss about extremely important context and details about the sequences in question."
"The language of the paper is unusual, it contains a significant degree of supposition and conjecture, cites blog posts and appears to be pointing towards a deliberate cover-up by Chinese authorities of early sequence data from Wuhan. However, this is an entirely subjective appraisal of the situation, which will be very difficult to confirm or disprove," Andrew Preston, a professor of microbial pathogenesis at the University of Bath in Britain, said in a statement.