Developed by a team of Chinese researchers at the Qingdao Institute of Bioenergy and Bioprocess Technology (QIBEBT), the new tool, called the Microbiome Search Engine (MSE), promises to be the ‘compass’ that guides further exploration within the vast universe of microbiome big-data, the team says.
“MSE to microbiome big-data is like Google or Baidu to webpage big-data,” says SU Xiaoquan, lead of the bioinformatics group at the Single-Cell Center, QIBEBT.
“By searching for the most structurally or functionally similar microbiomes in a super-fast manner, MSE offers the first opportunity to relate each microbiome ever published to the microbiome big-data known to mankind so far.”
The recent rapid expansion and development of new microbiome sequencing technologies have created many opportunities for researchers and industries. However, a key challenge remains in being able to relate new microbiome samples to existing microbiome data.
Indeed, despite the immense volume of data created by projects like the Earth Microbiome Project (EMP) and the Human Microbiome Project (HMP), very few computational approaches are available to process and integrate them, said the authors.
In particular, it is difficult to relate a new microbiome sample to the huge number of existing microbiome samples, they noted.
In databases of 100,000 to 1 million microbiomes, MSE is up to three orders of magnitude faster than existing strategies such as pairwise comparisons at finding the closest neighbours of a microbiome in terms of structure, said the authors.
“We envision that such search against the microbiome database will be an important first step for data analysis at various scales in microbiome studies, just as a BLAST search is essential and universal in sequence analysis studies today,” commented Rob Knight, director of the Center for Microbiome Innovation at the University of California, San Diego.
SU Xiaoquan added that MSE also makes it possible to compare microbiomes at a global scale, “enabling a bird's eye view of microbiome data universe.”
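To make the pairwise-comparison baseline concrete, here is a minimal sketch of a brute-force neighbour search: the query is scored against every stored sample, so the cost grows with the size of the database. This is not MSE's method, which the article does not describe. Bray-Curtis similarity is a stand-in chosen for illustration, and the names `Profile`, `bray_curtis_similarity` and `brute_force_search` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a microbiome is represented as taxon name -> relative abundance.
Profile = Dict[str, float]


def bray_curtis_similarity(a: Profile, b: Profile) -> float:
    """Return 1 minus the Bray-Curtis dissimilarity of two abundance profiles."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total > 0 else 0.0


def brute_force_search(
    query: Profile, database: Dict[str, Profile], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Compare the query with every sample and return the top_k most similar."""
    scored = [
        (sample_id, bray_curtis_similarity(query, profile))
        for sample_id, profile in database.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Every query in this sketch touches every stored sample, which is the cost that an indexed search engine would aim to avoid on databases of hundreds of thousands of microbiomes.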
Described in a paper published in mBio, the tool introduces two concepts to quantify the novelty of a microbiome. The first is the microbiome novelty score (MNS), which allows identification of microbiomes that are especially different from what has already been sequenced. The second, the microbiome attention score (MAS), allows identification of microbiomes that have many close neighbours, implying that considerable scientific attention is devoted to their study.
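The intuition behind the two scores can be illustrated with a toy calculation over a list of similarity values between one microbiome and the samples in a database. The formulas below are assumptions made for illustration only, not the definitions used in the mBio paper, and the function names, `top_k` and `threshold` are hypothetical.

```python
from typing import List


def toy_novelty_score(similarities: List[float], top_k: int = 10) -> float:
    """Hypothetical novelty: 1 minus the mean similarity of the top_k closest matches."""
    top = sorted(similarities, reverse=True)[:top_k]
    if not top:
        return 1.0  # nothing to compare against, so treat as maximally novel
    return 1.0 - sum(top) / len(top)


def toy_attention_score(similarities: List[float], threshold: float = 0.9) -> int:
    """Hypothetical attention: number of database samples at or above the threshold."""
    return sum(1 for s in similarities if s >= threshold)
```

Under these toy definitions, a sample whose best matches are all weak gets a high novelty score and a low attention score, while a sample surrounded by many near-identical neighbours gets the opposite.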
The Chinese team believe that by computing a microbiome focus index based on the MNS and MAS, they are able to objectively track and compare the novelty and attention scores of individual microbiome samples and projects over time – and predict future trends in the field.
The Microbiome Focus Index (MFI), which is derived from the MNS and MAS, can measure the impact and contribution of a microbiome sample to mankind's exploration of novel microbiomes.
“Therefore, MNS, MAS, and MFI can serve as ‘alt-metrics’ for evaluating a microbiome project or prospective developments in the microbiome field, both of which are done in the context of existing microbiome big data,” the authors said.
Using MSE, the team predicts the identification of new ‘sleeping beauty’ microbiomes, which they describe as published microbiome samples that are still very novel in structure at present, yet are destined to attract a lot of scientific attention in the next several years, based on the temporal growth of their MAS.
These ‘sleeping beauties’ are mainly from marine environments and mother-baby interactions, they said.
As such, they suggested that data mining, made possible by MSE, could help the scientific community, industry and funding agencies decide which research areas have the highest potential for generating high-novelty and high-impact microbiome data.
Published online ahead of print, doi: 10.1128/mBio.02099-18
“Identifying and Predicting Novelty in Microbiome Studies”
Authors: Xiaoquan Su, et al.