Published October 27, 2025
Email contact@novogaia.bio
We compared the accuracy of our model in predicting molecular structures from mass spectra on the industry-standard MassSpecGym benchmark for de novo molecule generation. Gaia-01 outperforms all other models: MADGEN (Tufts, 2025), DiffMS (MIT, 2025), and MIST-MolForge (DSO, 2025). Top-10 accuracy measures how often the correct molecular structure appears among the model’s ten highest-ranked predictions.
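To make the metric concrete, here is a minimal sketch of how a top-k accuracy could be computed. It is an illustration under stated assumptions, not the benchmark's or Gaia-01's evaluation code: the function name `top_k_accuracy` is hypothetical, and it assumes each structure is represented by a canonical identifier string (for example a canonical SMILES), so that string equality stands in for structural identity.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of spectra whose true structure is among the first k predictions.

    Assumption: predictions and targets are canonical identifier strings,
    so exact string comparison is a valid test of structural identity.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets:
        raise ValueError("cannot score an empty evaluation set")
    if k < 1:
        raise ValueError("k must be at least 1")
    # A spectrum counts as a hit if the true structure is in the top k.
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, targets)
        if truth in preds[:k]
    )
    return hits / len(targets)


# Toy data (made up for illustration): three spectra, best prediction first.
toy_predictions = [["CCO", "CCN"], ["CCC", "CC"], ["C"]]
toy_targets = ["CCN", "CCCC", "C"]
toy_top1 = top_k_accuracy(toy_predictions, toy_targets, k=1)
toy_top2 = top_k_accuracy(toy_predictions, toy_targets, k=2)
```

With the default `k=10`, the same function gives the top-10 accuracy described above. A larger k can only keep the score the same or raise it, which is why top-1 and top-10 figures are not directly comparable.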
Small-molecule therapeutics remain the cornerstone of modern medicine. Yet most new drugs are built from a limited set of known chemical scaffolds that are easy to synthesize. This has kept discovery efforts clustered in a narrow region of chemical space, repeatedly optimizing known, low-complexity structures rather than exploring novel ones.
Meanwhile, nature operates in a far broader chemical universe. Over billions of years, it has generated complex, functional molecules far beyond the diversity of what humans can make. About half of all approved small-molecule drugs are in fact structurally inspired by natural designs, with evolution optimizing these molecules for biological function. The visualization below shows how natural molecules (from animals, bacteria, fungi, and plants) occupy distinct regions of chemical space compared to synthetic ones. Each dot represents a molecule, clustered by similarity.
While screening nature for new molecules has sharply declined since the 1980s due to slow, manual discovery workflows and limited structural identification tools, advances in machine learning and analytical chemistry now make it possible to decode nature’s chemistry at scale. This unlocks vast, bioactive regions of chemical space that were previously out of reach for drug discovery.
The key step for knowing what molecules are in a natural sample, and whether they are interesting for drug discovery, is decoding their chemical structure. Mass spectrometry is the fastest and most sensitive method for profiling molecules from natural samples, and its resolution is now orders of magnitude higher than during past large-scale screening efforts. The technology works by breaking molecules into fragments and measuring their mass-to-charge ratios, the patterns of which can be used to infer a molecule’s chemical structure. Recent advances now allow fine-grained distinction between closely related molecules and reconstruction of complex molecular structures directly from natural samples. Modern machine learning transforms this detection tool into a predictive tool for inferring molecular structures.
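As a small worked example of the quantity a mass spectrometer reports, the sketch below computes the mass-to-charge ratio of a protonated ion from a neutral molecule's mass. It assumes positive-mode ionization by protonation, which the text above does not specify, and the function name `protonated_mz` is hypothetical. Glucose is used purely as a familiar illustration.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons (rounded)


def protonated_mz(neutral_mass_da: float, charge: int = 1) -> float:
    """m/z of an [M + nH]^n+ ion: (M + n * m_proton) / n.

    Assumption: the ion forms by adding `charge` protons to the neutral
    molecule; other adducts (e.g. sodium) would need a different mass.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


# Monoisotopic mass of glucose (C6H12O6), used here only as an example.
GLUCOSE_MONOISOTOPIC_MASS_DA = 180.06339
glucose_mz = protonated_mz(GLUCOSE_MONOISOTOPIC_MASS_DA)
```

Each fragment produced inside the instrument yields a value of this kind, and the resulting set of peaks is the pattern from which a structure is inferred.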
Gaia-01
What Gaia-01 enables
Gaia-01 advances two critical capabilities:
New chemical starting points for drug discovery, directly from nature
Gaia-01 allows us to rapidly identify molecules with drug-like properties from natural samples. From these natural molecules, we can design synthetic analogues for testing against therapeutic targets, bridging nature’s molecular diversity with modern medicinal chemistry.
A vastly expanded data foundation for generative molecule design
Current generative small-molecule models largely repurpose known compounds because training data is limited. Gaia-01 can recover molecular structures hidden in millions of publicly available mass spectral data points, expanding the set of known natural molecules by up to 100-fold. This opens the door to generative models that learn not just from human-made chemistry, but from nature’s own design principles.
Next steps
Gaia-01 was built through the dedication of a small team over a few intense months. We are computational biologists and machine learning engineers from leading academic labs at Imperial College London, UCL, TU Delft, and ETH Zurich. We are backed by a former GSK and Merck executive and serial biotech entrepreneur, David Pompliano, and a pioneer in ML for metabolomics, Tomáš Pluskal.
At Novogaia, we apply these technologies to decode fungal chemistry. Fungi remain one of nature’s richest but most unexploited sources of pharmacologically active molecules. Our mission is to unlock a new era in drug discovery from fungi by using AI to systematically uncover their molecular diversity and translate it into new therapeutic breakthroughs. To make that happen, we’re building a broader AI-driven discovery pipeline that brings this technology fully to life. We’ll share more about this soon.
Next, we’re excited to test Gaia-01 in the lab and demonstrate how it can accelerate real-world discovery on our first set of drug targets related to autoimmunity. We’re making the model available to select research partners to expand its applications. If you’re interested in trying Gaia-01 or collaborating on natural product discovery, we’d love to connect: contact@novogaia.bio.
