University of Sydney scientists make koala genome data available on AWS to accelerate efforts to prevent extinction.
Dr. Carolyn Hogg has never been one to stay in her lane. The senior research manager for the Australasian Wildlife Genomics Group at the University of Sydney leads breakthrough scientific work in endangered species conservation and is, by her own admission, a "Type A personality who just wants to get on with it."
"Get on with it" she does. With Sydney under strict lockdown due to the spread of the coronavirus delta variant, and with a set of koala biopsy samples stuck in Queensland, Hogg contacted a friend in the northeast state to ask an unusual favor.
"I can't get to Queensland," she told us. "But I have a colleague there who's a chemist. I gave him and one of his Ph.D. students, who is also a chemist, a lesson on how to subsample biopsies over Zoom."
"Neither of them have done any genetics in their entire lives, but they drove to the Australia Zoo Wildlife Hospital, where the samples were being held, to subsample them for me. Then, they sent everything to my house so I could drop the material off at our lab. This is scientific collaboration on a whole new level."
Collaboration comes naturally to Hogg, who has been working in conservation for 25 years and is pushing for more multidisciplinary initiatives to tackle the increasingly complex problems faced by wildlife in Australia and beyond, from climate change to human destruction of habitat to disease.
Hogg needs the koala DNA samples because she is part of a concerted effort to map the genomes of 450 koalas from across the entire species range and make the data available to researchers anywhere in the world through the Amazon Web Services (AWS) Open Data Sponsorship Program.
By releasing this genetic data through the program, which covers the cost of storage for datasets of high value to the scientific community and makes them available to be accessed and analyzed in the cloud, Hogg and her team hope to accelerate the research and development of solutions to protect the iconic, but increasingly threatened, marsupial.
"Genome sequencing is an incredibly powerful tool for decision making," said Hogg. "It provides us with so much insight into an animal, from its immune system, to its diet, to its population diversity."
"We know that the more genetic diversity a species has within itself, the greater its adaptive potential to deal with whatever might be around the corner. We want to find ways to give endangered species like the koala as much genetic variation as possible, so they can naturally adapt to the coming challenges."
"One approach might be, for example, moving koalas with certain genetic characteristics from one location to another. This does come with a suite of other issues, mainly because koalas are picky eaters, and some are pickier than others, but it is possible."
"And for researchers who are working on a vaccine for chlamydia, a bacterial infection that is a leading killer of koalas, this kind of genetic information could help to inform vaccine management programs, by identifying which koala populations are more susceptible to the disease than others."
Photo by Mathew Crowther
The project has backing from both the Australian federal government and the New South Wales government, the latter as part of its goal to double koala numbers by 2050. It builds on previous work to sequence the koala genome, which was published in 2018.
This genetic blueprint, or "reference genome," was produced from the DNA of one individual koala. What Hogg and her fellow scientists are doing now is known as "resequencing": sequencing the genomes of a wider group of animals that represent all the different types of koala found within the species.
"Koalas from Queensland tend to be a little bit smaller and browner in color, while koalas in Victoria tend to be bigger, grey, and fluffier," said Hogg. "The level of variation had led many scientists to believe that there might be two separate species of koala, until the koala reference genome was sequenced and proved this wrong."
"By generating this new, broader genome dataset, we can start to ask questions like, 'Do koalas living to the west have gene variants that we don't see on the east coast, where it's colder and wetter, so are those genes potentially important for dealing with climate change?'"
The original plan was to release batches of 96 sequenced koala genomes to the AWS Open Data Sponsorship Program every six to eight weeks, with all 450 being available by the end of 2021. While COVID-19 pushed back the timeline, the first set of genomes is now available for researchers to access.
Photo by Louise M. Cooper
Researchers can start working with the data for the koala populations in that first set immediately, without waiting for all the sequencing to be complete, although they must wait for all 450 genomes to be available before they can study how diversity compares across the full species range.
"The best way to safeguard the koala population is generally to stop cutting down their trees," said Hogg. "But as much as many of us would like to say, 'Let's just stop doing that,' we have to be realistic. We can't stop development. We have to find multiple, creative solutions to these problems, and not be wedded to tools we developed in the '90s."
For Hogg, this means harnessing technologies like the cloud to make species genome sequencing more equitable, ensuring that as many researchers as possible, regardless of their background or their institution's level of computing resources, can access genetic information to inform conservation efforts.
"I started working with AWS because I had so much data and I couldn't process it efficiently using the on-premises IT systems we had," she said. "We were able to reduce data analysis from 10 days to just five hours using AWS. When I realized the power of being able to work in the cloud, it encouraged me to go beyond my sphere of biology and find people who work in other fields of expertise."
This is not the first time Hogg and her colleagues have used big datasets in the AWS Cloud for the purposes of species conservation. Last year, the team used the AWS Cloud to generate genetic information for the management of the Tasmanian devil, an endangered Australian marsupial.
Today, Hogg leads the Threatened Species Initiative to generate genomic resources for some of Australia's most vulnerable species. The project, launched in May 2020, has already generated resources for 62 species, which will be made available through the AWS Open Data Sponsorship Program in the future.
Hogg describes her work as "thinking outside the realm of what you thought was possible" to make scientific advances happen.
"Our survival as a human race depends on the survival of the planet, and the planet depends on our ability to maintain diversity in the system," said Hogg. "It's all very well to ask, 'Why do we need to care about preserving one species?' But how many species are we willing to lose before we start to say, 'Oh, we've now lost an entire ecosystem!'"
"I am incredibly optimistic," she said. "Our ability to adapt to the changing environment is there. When we want to change something, we can."