Google Unveils DeepSomatic: New AI Tool That Pinpoints Cancer-Driving Mutations With Unprecedented Accuracy

Google has introduced DeepSomatic, a new artificial intelligence tool designed to identify the genetic mutations that drive cancer growth, with greater accuracy than existing methods. The research behind it, published in Nature Biotechnology, marks another milestone in the use of AI to accelerate cancer research and precision medicine.

The paper, “Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic” (Nature Biotechnology), describes how somatic small variants in cancer genomes can be identified in both short-read and long-read sequencing data.

Pinpointing the Genetic Triggers of Cancer

Cancer begins when cell division goes awry, often because genetic mutations disrupt the biological controls that normally keep it in check. Identifying which mutations are responsible is essential for selecting targeted treatments that stop tumour growth and prevent its spread.

Modern cancer care often involves sequencing tumour DNA from biopsies. However, separating genuine cancer-related mutations from background noise and sequencing errors remains a major challenge—especially when dealing with somatic variants, which arise during a person’s lifetime rather than being inherited.

Figure: An overview of DeepSomatic, an open-source AI model that helps identify complex genetic variants in cancer cells and speeds up genetic analysis for cancer research.

Somatic mutations can result from environmental damage like UV exposure or from natural errors during cell replication. Because they often appear in only a fraction of tumour cells, distinguishing them from technical errors requires exceptional precision—something DeepSomatic is built to deliver.
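To see why low-fraction mutations are hard to call, consider a simple binomial error model. This is an illustrative sketch under stated assumptions, not how DeepSomatic works (DeepSomatic learns its decisions from training data): it assumes 100x coverage, a 1% chance that a sequencing error produces one specific wrong base, and a heterozygous mutation carried by 10% of the cells in the sample. The function names are hypothetical.

```python
from math import comb

def prob_errors_at_least(depth: int, k: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, error_rate): the chance that sequencing
    errors alone put the same wrong base on k or more reads at one position.
    Assumes independent errors at a fixed, hypothetical per-read rate."""
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )

def expected_alt_reads(depth: int, cell_fraction: float) -> float:
    """Expected reads supporting a heterozygous mutation carried by
    `cell_fraction` of the cells. Assumes a diploid, copy-neutral region,
    so only half of the DNA copies in those cells carry the variant."""
    return depth * cell_fraction / 2

# Hypothetical scenario: 100x coverage, 1% error rate, mutation in 10% of cells.
depth, error_rate, cell_fraction = 100, 0.01, 0.10
expected = expected_alt_reads(depth, cell_fraction)
false_alarm = prob_errors_at_least(depth, round(expected), error_rate)
print(f"expected alt reads: {expected:.1f}")
print(f"P(errors alone give >= {round(expected)} alt reads): {false_alarm:.4f}")
```

Under these assumptions a real mutation is expected in only about 5 of 100 reads, yet errors alone produce 5 or more matching reads at a given position roughly 0.3% of the time. Repeated across the billions of positions in a human genome, that would yield a large number of false calls, which is why fixed read-count thresholds struggle and a learned model is attractive.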

How DeepSomatic Works

In typical clinical workflows, scientists sequence both tumour cells and normal cells from the same patient. DeepSomatic compares the two and picks out subtle differences: variants present in the tumour but absent from the normal sample are candidate somatic mutations, among them the ones fuelling the cancer.
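The tumour-versus-normal comparison can be illustrated with a hand-written rule. This is a minimal sketch, not DeepSomatic's method, which replaces such thresholds with a trained neural network; the `SiteCounts` input format, the `classify_site` function, and every threshold value are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class SiteCounts:
    """Read counts at one genomic position (hypothetical input format)."""
    tumour_ref: int  # tumour reads matching the reference base
    tumour_alt: int  # tumour reads showing the alternate base
    normal_ref: int  # normal reads matching the reference base
    normal_alt: int  # normal reads showing the alternate base

def vaf(ref: int, alt: int) -> float:
    """Variant allele fraction: share of reads supporting the alternate base."""
    total = ref + alt
    return alt / total if total else 0.0

def classify_site(site: SiteCounts,
                  min_tumour_alt: int = 4,
                  min_tumour_vaf: float = 0.03,
                  max_normal_vaf: float = 0.01,
                  germline_normal_vaf: float = 0.2) -> str:
    """Hypothetical threshold rule for a matched tumour/normal pair."""
    tumour_vaf = vaf(site.tumour_ref, site.tumour_alt)
    normal_vaf = vaf(site.normal_ref, site.normal_alt)
    if normal_vaf >= germline_normal_vaf:
        # Clearly present in healthy tissue: most likely inherited.
        return "germline"
    if (site.tumour_alt >= min_tumour_alt
            and tumour_vaf >= min_tumour_vaf
            and normal_vaf <= max_normal_vaf):
        # Well supported in the tumour, essentially absent from the normal.
        return "somatic_candidate"
    # Too little support, or ambiguous evidence in the normal sample.
    return "no_call"

print(classify_site(SiteCounts(60, 40, 55, 45)))  # -> germline
print(classify_site(SiteCounts(90, 10, 100, 0)))  # -> somatic_candidate
print(classify_site(SiteCounts(98, 2, 100, 0)))   # -> no_call
```

Rules like these are brittle: any fixed cut-off trades missed low-fraction mutations against false calls from noise, which is the trade-off a learned model aims to handle better.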

The model converts the sequencing reads aligned around each candidate variant into image-like representations, which are then analysed by a convolutional neural network, the same type of deep-learning architecture used in image recognition. This allows it to distinguish inherited (germline) variants and sequencing artifacts from genuine somatic mutations in the tumour.
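The idea of treating aligned reads as an image can be sketched in a few lines. In this toy version, an assumption for illustration rather than DeepSomatic's actual encoding (real encodings also carry information such as base quality and strand), each read becomes a row and each reference position a column, with 1 marking a mismatch. A single hand-set 3x1 filter stands in for the filters a CNN would learn; it responds strongly only when several stacked reads share a mismatch at the same position. All names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def encode_pileup(reference: str, reads: List[str]) -> Matrix:
    """One row per read, one column per reference position: 1.0 where the read's
    base differs from the reference, 0.0 where it matches or has no base ('-')."""
    return [
        [0.0 if base in (ref_base, "-") else 1.0 for base, ref_base in zip(read, reference)]
        for read in reads
    ]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation without padding, the operation a CNN layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_max_pool(feature_map: Matrix, bias: float) -> float:
    """Subtract a bias, apply ReLU, then take the global maximum."""
    return max(max(0.0, v - bias) for row in feature_map for v in row)

# Hand-set 3x1 filter (a trained CNN would learn its filters): sums the same
# position across three adjacent reads.
VERTICAL = [[1.0], [1.0], [1.0]]

reference = "ACGTACGT"
variant_reads = ["ACGAACGT"] * 4  # every read shows A instead of T at position 3 (0-based)
noisy_reads = ["ACGTACGA", "CCGTACGT", "ACGTTCGT", "ACCTACGT"]  # scattered single errors

for name, reads in [("variant", variant_reads), ("noise", noisy_reads)]:
    fmap = conv2d_valid(encode_pileup(reference, reads), VERTICAL)
    print(name, relu_max_pool(fmap, bias=2.0))
```

Scattered errors never stack three deep in one column, so after the ReLU the noisy pileup scores 0.0 while the consistent mismatch scores 1.0. A trained network learns many such filters, and far subtler patterns, from labelled training data.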

DeepSomatic can also operate in a “tumour-only” mode for cases where a matched normal sample isn’t available, such as blood cancers like leukaemia, where it is difficult to obtain healthy cells free of cancer cells. This flexibility makes it suitable for a wide range of clinical and research applications.
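Without a matched normal, a caller must separate inherited variants from somatic ones using other cues. The sketch below uses two crude heuristics as an assumption for illustration, not as DeepSomatic's tumour-only logic: a variant found in a (hypothetical) set of common population variants is probably inherited, and a variant allele fraction near 50% or 100% is what a heterozygous or homozygous inherited variant would show in a diploid genome. The function name, inputs, and the 0.1 tolerance are hypothetical.

```python
from typing import Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]

def tumour_only_label(variant: Variant, alt_reads: int, depth: int,
                      common_population_variants: Set[Variant],
                      tolerance: float = 0.1) -> str:
    """Crude tumour-only triage; every rule and threshold here is hypothetical."""
    if depth == 0:
        return "no_coverage"
    if variant in common_population_variants:
        # Frequently seen in healthy people: most likely inherited.
        return "likely_germline"
    vaf = alt_reads / depth
    # In a diploid genome an inherited variant sits near 50% (heterozygous)
    # or near 100% (homozygous) of reads.
    if abs(vaf - 0.5) <= tolerance or vaf >= 1.0 - tolerance:
        return "possible_germline"
    return "somatic_candidate"

common = {("chr1", 1000, "A", "G")}  # placeholder population set
print(tumour_only_label(("chr1", 1000, "A", "G"), 50, 100, common))  # -> likely_germline
print(tumour_only_label(("chr2", 2000, "C", "T"), 48, 100, common))  # -> possible_germline
print(tumour_only_label(("chr2", 2000, "C", "T"), 12, 100, common))  # -> somatic_candidate
```

These rules break down in practice: a clonal somatic mutation in a high-purity tumour can also sit near 50%, and copy-number changes shift allele fractions. That ambiguity is one reason tumour-only calling is harder, and why a model trained on real examples can help.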

Built on a Robust Foundation

To train DeepSomatic, Google partnered with the UC Santa Cruz Genomics Institute and the U.S. National Cancer Institute to create a gold-standard reference dataset called CASTLE. The team sequenced tumour and normal cells from breast and lung cancer samples using three major sequencing technologies: Illumina short-read sequencing, and Pacific Biosciences (PacBio) and Oxford Nanopore Technologies long-read sequencing. Because each platform has its own characteristic error profile, combining the three datasets produced a high-accuracy reference in which platform-specific errors are filtered out.
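Why combining platforms helps can be shown with a simple majority vote. This is a hypothetical rule for illustration; the exact procedure used to build CASTLE is described in the paper and is not reproduced here. Because each platform's errors tend to fall in different places, requiring support from at least two of three platforms discards calls that only one technology reports. The variant tuples and platform keys are placeholders.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)

def consensus_calls(calls_by_platform: Dict[str, Set[Variant]],
                    min_support: int = 2) -> Set[Variant]:
    """Keep variants reported by at least `min_support` platforms.
    Each platform's calls are a set, so a platform counts at most once per variant."""
    support = Counter(v for calls in calls_by_platform.values() for v in calls)
    return {v for v, n in support.items() if n >= min_support}

a = ("chr1", 100, "C", "T")
b = ("chr3", 200, "G", "A")
c = ("chr5", 300, "T", "TA")  # reported by one platform only
d = ("chr7", 400, "A", "G")   # reported by one platform only

calls = {
    "illumina": {a, b},
    "pacbio": {a, c},
    "ont": {a, b, d},
}
print(sorted(consensus_calls(calls)))     # a and b survive
print(sorted(consensus_calls(calls, 3)))  # only a is in all three
```

Raising `min_support` trades recall for precision: a stricter vote removes more platform-specific errors but can also drop real variants in regions one technology covers poorly.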

This comprehensive training data helped DeepSomatic outperform existing tools across sequencing methods. On Illumina data, it reached a 90% F1-score for detecting insertions and deletions (indels), well ahead of the next-best model. The advantage was even greater on Pacific Biosciences data, where DeepSomatic’s F1-score exceeded 80%, compared with less than 50% for competing tools.
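The F1-score is the harmonic mean of precision (the share of reported variants that are real) and recall (the share of real variants that are found). A minimal helper, using made-up counts rather than figures from the paper:

```python
from typing import Tuple

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts for two variant callers evaluated on 100 true variants.
print(precision_recall_f1(tp=90, fp=10, fn=10))
print(precision_recall_f1(tp=45, fp=5, fn=55))
```

With 90 true positives, 10 false positives, and 10 false negatives, precision, recall, and F1 are all 0.9. A caller that is just as precise but finds only 45 of 100 real variants scores an F1 of 0.6, which is why F1 penalises tools that miss many true mutations.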

The system also proved effective on more challenging data, including formalin-fixed, paraffin-embedded (FFPE) samples and whole exome sequencing (WES) datasets. These results suggest DeepSomatic can handle the lower-quality or older samples often used in retrospective studies.

Beyond the Training Data

One of the most promising aspects of DeepSomatic is its ability to generalise. When tested on glioblastoma, an aggressive form of brain cancer not included in its training data, the tool successfully identified known driver mutations. In collaboration with Children’s Mercy Hospital in Kansas City, DeepSomatic also analysed paediatric leukaemia samples, confirming known variants and uncovering ten previously unreported ones, all from tumour-only data.

A Step Toward Precision Oncology

By making both the DeepSomatic model and the CASTLE dataset openly available, Google aims to accelerate global cancer research and improve clinical decision-making. The tool’s ability to identify both known and novel mutations could help doctors tailor treatments more precisely and researchers uncover new therapeutic targets.

As AI continues to reshape genomics, DeepSomatic represents a significant step toward understanding what drives each individual tumour—and ultimately, how to stop it.
