No man is an island, the old saying goes, and nothing illustrates this better than microbiome research into the bacteria, archaea and viruses that coexist with humans. Recent studies have shown that many of these co-tenants of our bodies are essential to our well-being. Their presence and relative balance are correlated with a growing number of illnesses, including many that have proven difficult to treat with conventional pharmaceuticals, perhaps because of the drugs' effects on these microbial populations. So far, the central challenge of microbiome analysis has been sequencing the DNA of thousands of microbial populations. In fact, scientists have studied less than one percent of the microbes on Earth, so this field of study is just beginning.
Jigsaw puzzles offer a useful analogy: imagine the most difficult puzzle you have seen, with thousands of pieces that take months to fit together. Now suppose that more than a hundred different puzzles, each as hard as the first, are dumped into a single pile, and you must solve them all at the same time without knowing what the completed puzzles should look like! This is similar to the complexity scientists face in deciphering human microbiomes. But suppose also that, although the puzzles are equally difficult, the pieces from different puzzles differ slightly: some are larger, some are curved, and others are angular. If we sort by these characteristics, similar pieces can be grouped together. The goal is to use as many features as possible to “cluster” the pieces into piles that each correspond to one puzzle.
Dr. Yu-Wei Wu of Taipei Medical University has developed an important algorithm to address this sorting problem: his unsupervised machine learning software, MaxBin, has been shown to be effective on DNA sequencing data. The strategy makes the daunting task of microbiome DNA sequencing manageable because DNA sequences, like puzzle pieces, carry distinguishing features, and these features exist in all species. MaxBin extracts these features from the DNA sequences and uses them to cluster the sequences into subsets that each correspond to one microbial species. The algorithm significantly reduces the complexity of microbiome analysis, allowing scientists to see ever further into this newly glimpsed microbial world. MaxBin has already been widely adopted in microbiome research, and Dr. Wu is dedicated to maintaining this publicly available software so that it can continue to foster progress in this promising area of biomedical research.
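To make the clustering idea concrete, here is a minimal Python sketch that groups DNA fragments by a sequence-derived feature. It is an illustration only and not MaxBin's implementation: the choice of 4-mer frequencies as the feature, plain k-means as the clustering method, the toy sequences, and all function names (`kmer_profile`, `sq_dist`, `cluster_profiles`) are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # k-mers with other letters are ignored
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_profiles(profiles, n_clusters, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(profiles[0])]
    while len(centers) < n_clusters:
        farthest = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = []
    for _ in range(n_iter):
        # Assign every profile to its nearest center.
        labels = [min(range(n_clusters), key=lambda j: sq_dist(p, centers[j]))
                  for p in profiles]
        # Move each center to the mean of its members.
        for j in range(n_clusters):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "metagenome": two made-up genomes with different composition,
# each cut into two overlapping fragments (hypothetical data).
genome_a = "ATATTAAATTTA" * 50  # AT-rich
genome_b = "GCGGCCGCGGCC" * 50  # GC-rich
contigs = [genome_a[0:300], genome_a[150:450], genome_b[0:300], genome_b[150:450]]

labels = cluster_profiles([kmer_profile(c) for c in contigs], n_clusters=2)
print(labels)
```

In this toy case the two fragments of each made-up genome receive the same label, which is the "one pile per puzzle" outcome described in the analogy. Real metagenomes are far noisier, and the number of species is not known in advance.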
Caption: Population genome of ‘Ca. R. cellulovorans’ recovered from metagenomics data from the 15 l cultivation. The genome was dispersed on 114 scaffolds (blue), with 2,814 predicted CDS (coding DNA sequences) shown in forward (red) and reverse (green) orientation, together with average coverage (orange). N50 is the length of the shortest contig in the set of largest contigs that together include 50% of the assembled genome, summing from the largest contig.
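The N50 definition in the caption can be written as a short calculation. The sketch below follows that definition directly; the function name `n50` and the example contig lengths are hypothetical and are not taken from the figure.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Sum from the largest contig downward until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths in base pairs (total 290).
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))  # 80 + 70 = 150 covers at least half of 290, so N50 is 70
```

A larger N50 means that half of the assembly sits in longer pieces, which is why it is commonly reported as a measure of assembly contiguity.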