Jessica Manning had no experience with coronaviruses. The infectious disease researcher had lived and worked in Cambodia off and on since 2013, studying the mosquitoes of the Mekong Delta and how their saliva helps spread disease in humans. But in January, the country flagged its first Covid-19 patient, and the lab that delivered the diagnosis wanted to send samples from the patient and his family to Manning for further testing.
Manning works at the National Institute of Allergy and Infectious Diseases’ Laboratory of Malaria and Vector Research in Phnom Penh, which is part of a decades-old collaboration between NIAID and the Cambodian National Center for Parasitology, Entomology, and Malaria Control. In September, her team had booted up a white machine, small enough to fit in an airplane’s overhead compartment and designed to read out DNA letters one by one. For the past few months they’d been using that new sequencer to figure out which microbes, other than the dengue virus, are behind so many high fevers in Cambodia. Now, they were going to ask it to piece together the coronavirus that had just arrived on their shore. And they were going to do it with the help of something called IDSeq.
IDSeq is a cloud-based, open-source bioinformatics pipeline for metagenomic sequencing. In non-scientist speak, it’s packages of computer code that comb through all the genetic material extracted from a sample—a tube of human blood, say, or a swab that’s been up someone’s nose. It matches all those mishmashed bits of DNA and RNA to massive databases of known microbes, telling you which bugs are in the mix. Running IDSeq only requires having a sequencer you know how to use and an internet connection.
IDSeq started out as a research project in the UC San Francisco lab of biochemist Joe DeRisi, where 17 years ago his team built technology that identified the coronavirus that causes SARS. More recently, DeRisi’s lab has been behind a push into clinical metagenomic sequencing, developing tests that have helped solve medical mysteries for patients being treated at nearby hospitals, including the case of a brain-invading tapeworm.
In 2016, when pediatrician Priscilla Chan and her husband, Facebook founder Mark Zuckerberg, pledged $3 billion over 10 years to fight infectious diseases, they chose DeRisi to co-helm their first investment: a new $600 million research center called the Chan Zuckerberg Biohub. Shortly after joining the Biohub, DeRisi brought on a large team of designers and engineers to turn years of cobbled-together code from his lab into an industrial-strength software package. In October 2018 they soft-launched IDSeq to a small group of test users, with the Facebook fortune footing the bill for all that computational crunching.
To get it into the hands of more scientists, especially in under-resourced places, the Biohub teamed up with the Bill and Melinda Gates Foundation. Grants from the foundation have begun to bring 10 teams of researchers from countries including South Africa, Bangladesh, and Madagascar to the Biohub to learn how to use IDSeq. In addition to training, the grants equip each international team with a small sequencer to take back to their home labs.
Manning received one of those grants to expand her work investigating undiagnosed fevers in Cambodia. At the end of last summer, just as the worst dengue epidemic in Cambodia’s history was peaking, she flew to San Francisco with two technicians from her lab for a week of training at the Biohub. By November, her team had IDSeq up and running, processing blood samples collected from fever patients at field hospitals across Cambodia. In early January, DeRisi brought a Biohub team to visit Manning’s lab and troubleshoot any issues they were having. During the trip, Manning recalls, they discussed news reports of mysterious pneumonia cases coming out of Wuhan, China. At the time, there weren’t wide reports of health care workers getting sick, so they expected the outbreak to blow over soon. DeRisi’s team flew back to California. “Then everything just hit the roof,” says Manning.
As Chinese officials began to grapple with an explosive outbreak of a novel coronavirus, global health experts worried about what would happen if it spread to China’s less technologically advanced neighbors in Southeast Asia. When a new disease emerges, it’s crucial to track its spread. That means more than just counting new cases. Collecting genetic information about a virus can help public health officials understand how it arrived in their country and take steps to slow its advance. It can also help researchers monitor the virus for mutations that might make diagnostic tests less effective.
But efforts to identify and contain such outbreaks are hindered in resource-poor settings. Take the Zika virus, which had been circulating in Brazil for two years before the country reported its first case. Researchers figured this out only much later, by piecing together viral genomes pulled from patients throughout the Americas.
So epidemiologists had reason to believe the new coronavirus could continue to spread undetected in countries like Malaysia, Indonesia, and Cambodia, which possess a weaker public health infrastructure than China and have historically lacked sequencing capacity. They feared that pockets of undiagnosed infections could silently sustain the outbreak, fueling its advance around the globe.
But with the arrival of a sequencer at Manning’s lab, Cambodia now had the ability to do metagenomic sequencing on patient samples. On January 26, technicians at the Institut Pasteur du Cambodge extracted viral RNA from nose and throat swabs taken from a 60-year-old Chinese man who had recently arrived from Wuhan and developed a fever. The next day, health ministry officials announced that these samples had tested positive for Covid-19, making the man the country’s first case.
Three days later, Manning’s lab received a few vials of RNA extracted from the patient’s swabs. Her team prepped it for sequencing, ran the sample through their new machine, and beamed the resulting data up to IDSeq. Then they waited while IDSeq’s algorithms sifted through all the chunks of genetic code, comparing each piece to GenBank, a collection of all publicly available genetic sequences. Although scientists in China had by then sequenced the coronavirus that causes Covid-19 and deposited that genomic data in GenBank, DeRisi’s team hadn’t yet updated their software to search the most recent version of the database. They wanted to let IDSeq fly blind and see what it could turn up.
Two hours later, the results rolled in. Manning stared at a heat map on her computer screen showing hits to GenBank. The darkest shade of red, indicating the most “reads,” or matches between the sample and sequences in GenBank, corresponded to the coronavirus that causes SARS. But it wasn’t an exact match. An almost equal number of reads mapped back to a coronavirus found in bats. “You could tell it’s a novel coronavirus that’s closely related to SARS but hasn’t yet been characterized,” says Manning.
About a week later, the IDSeq team updated its index to the newest version of GenBank. The bank now had nearly 85,000 new additions, including 54 sequences for the Covid-19-causing coronavirus collected from patients worldwide. This time, when the team ran their sample, the software came back with an unequivocal answer: Most reads completely matched the virus that had emerged in Wuhan just weeks before.
Together, these two tests proved that IDSeq could do what DeRisi had promised: detect the spread of known pathogens and serve as an early warning system for emerging ones. His team and Manning’s reported this proof of concept in a preprint on bioRxiv last week. The Biohub also released a project page for the Cambodian coronavirus case, giving the public a first glimpse of the software.
“I’m just giddy with excitement about this,” says DeRisi. The primary goal of IDSeq, he says, is to empower scientists to run advanced molecular diagnostics in their own countries to solve local problems. But a secondary objective is to expand metagenomic sequencing capacity in under-resourced places, so that if a pandemic does occur, scientists around the globe will be able to detect it. “This is really a tremendous validation that this technology isn’t just for super-wealthy places that have a lot of server farms,” says DeRisi. “This can be done in the field and—with a little support—can really make a difference.”
How much of a difference remains to be seen. Using IDSeq, together with a new technique for boosting the amount of viral material in a sample, Manning’s team was eventually able to assemble a whole genome of the strain of the virus that had shown up in Cambodia. Last month, when they added it to the public databases scientists are using to track how the virus is spreading and mutating, it was one of the few sequences from a low-income country close to the outbreak’s epicenter.
Manning says her group is on standby to sequence any additional cases confirmed by the country’s public health officials. But it’s only in the last week that Cambodia has begun to aggressively test for Covid-19, after a Japanese citizen who had traveled to Cambodia tested positive for the disease upon returning to Japan. He reportedly had contact with 40 Cambodian people, who are now being monitored in medical isolation. On Saturday, the Cambodian Health Ministry confirmed that one of them has been diagnosed with Covid-19.
In response, Cambodia’s Prime Minister ordered schools closed for two weeks. The move came in sharp contrast to weeks of the government downplaying the seriousness of the outbreak, in what critics called an effort to maintain positive political and economic relations with China, Cambodia’s biggest foreign investor, rather than to prevent the spread of the disease. In February, researchers at the Harvard T.H. Chan School of Public Health analyzed flight travel between Wuhan and Cambodia and determined it was statistically improbable for Cambodia to have just one case (the one that Manning’s lab had sequenced). In other words, Cambodia wasn’t finding more cases because health officials hadn’t been looking.
Now that they are, Manning hopes to contribute more sequences of the virus to GenBank as confirmed cases come in. She says her team is currently in the process of sequencing a case that was confirmed over the weekend. Not only will this digital coronavirus library help epidemiologists track the virus’s spread and evolution, but it could help ensure that any potential treatments or vaccines developed in Cambodia will be effective against the strains circulating there.
“These sequences are letting us see, in close to real time, how rapidly this virus is mutating, and serving as a roadmap to developing countermeasures,” says Manning. Her team will be limited, however, by the throughput of their new sequencer, which can run only one sample at a time. If Cambodia experiences a surge in cases, a backlog will quickly build up. Still, at least Cambodia will be on epidemiologists’ map. Any sequencing is better than nothing.