
AI model learns generalized 'language' of regulatory genomics, predicts cellular stories

Credit: Cell Genomics (2025). DOI: 10.1016/j.xgen.2025.100762

A team of investigators from Dana-Farber Cancer Institute, The Broad Institute of MIT and Harvard, Google, and Columbia University has created an artificial intelligence model that can predict which genes are expressed in any type of human cell. The model, called EpiBERT, was inspired by BERT, a deep learning model designed to understand and generate human-like language.

The work appears in Cell Genomics.

Every cell in the body has the same genome, so the difference between two types of cells is not the genes in the genome, but which genes are turned on, when, and by how much. Approximately 20% of the genome encodes the regulatory instructions that determine which genes are turned on, but very little is known about where those regulatory codes sit in the genome, what their instructions look like, or how mutations affect their function in a cell.

EpiBERT was trained on data from hundreds of human cell types in multiple phases. It was fed the genomic sequence, which is 3 billion base pairs long, along with maps of chromatin accessibility that indicate which of those sequences are unwound from the chromosome and read by the cell.

The model was first trained to learn the relationship between DNA sequence and chromatin accessibility across large chunks of the genome in a specific cell type. It then used these learned relationships to predict which genes were active in the corresponding cell type. It accurately identified regulatory elements, the parts of the genome recognized by transcription factors, and their influence on gene expression across many cell types, building a "grammar" that is generalizable and predictive.
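As a rough illustration of this two-phase setup, consider the toy sketch below in Python with PyTorch. This is not the authors' architecture or training code; the model name, layer sizes, losses, and masking scheme are all invented for illustration. It shows a transformer that reads one-hot DNA sequence alongside an accessibility track, is first trained to reconstruct masked accessibility (phase one), and is then trained to predict gene expression from the same representation (phase two).

import torch
import torch.nn as nn

class RegulatoryTransformer(nn.Module):
    """Toy multi-modal transformer: DNA sequence plus accessibility in, accessibility and expression out."""
    def __init__(self, d_model=256, n_layers=4, n_heads=8, n_genes=2000):
        super().__init__()
        # DNA is one-hot encoded (A, C, G, T = 4 channels); accessibility adds one more channel.
        self.embed = nn.Linear(4 + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.accessibility_head = nn.Linear(d_model, 1)      # per-position accessibility (phase one)
        self.expression_head = nn.Linear(d_model, n_genes)   # expression from pooled representation (phase two)

    def forward(self, dna_onehot, accessibility):
        # dna_onehot: (batch, length, 4); accessibility: (batch, length, 1)
        x = self.embed(torch.cat([dna_onehot, accessibility], dim=-1))
        h = self.encoder(x)
        return self.accessibility_head(h), self.expression_head(h.mean(dim=1))

model = RegulatoryTransformer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy inputs standing in for a genomic window and its measured accessibility track.
dna = nn.functional.one_hot(torch.randint(0, 4, (2, 1024)), num_classes=4).float()
accessibility = torch.rand(2, 1024, 1)

# Phase one: hide part of the accessibility track and train the model to reconstruct it.
masked = accessibility.clone()
masked[:, 512:, :] = 0.0
pred_acc, _ = model(dna, masked)
phase_one_loss = nn.functional.mse_loss(pred_acc[:, 512:, :], accessibility[:, 512:, :])
optimizer.zero_grad()
phase_one_loss.backward()
optimizer.step()

# Phase two: with the full accessibility track, train the model to predict gene expression
# (random targets here stand in for measured RNA expression).
expression_targets = torch.rand(2, 2000)
_, pred_expr = model(dna, accessibility)
phase_two_loss = nn.functional.mse_loss(pred_expr, expression_targets)
optimizer.zero_grad()
phase_two_loss.backward()
optimizer.step()

In the actual study, the accessibility and expression data come from hundreds of real cell types rather than random tensors, and the model operates over far longer genomic windows than this sketch.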

This grammar-building process can be likened to the way a large language model, such as ChatGPT, learns to build meaningful sentences and paragraphs from many examples of text. Given a chromatin accessibility profile, EpiBERT can predict which bases are functional, as well as RNA expression, for a never-before-seen cell type.
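A hypothetical helper makes that last point concrete: for the toy model sketched above (or any model with the same interface), prediction for a new cell type requires only that cell type's accessibility map and the genomic sequence, not any expression measurements from it.

import torch

def predict_for_new_cell_type(model, dna_onehot, accessibility):
    # Run a trained model on an accessibility profile from a cell type that was
    # absent from training and return its predicted gene expression vector.
    model.eval()
    with torch.no_grad():
        _, expression = model(dna_onehot, accessibility)
    return expression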

EpiBERT will shed light on how genes are regulated in cells and, potentially, how the regulatory systems of those cells can be mutated in ways that lead to diseases such as cancer.

More information: Nauman Javed et al, A multi-modal transformer for cell type agnostic regulatory predictions, Cell Genomics (2025). DOI: 10.1016/j.xgen.2025.100762

Journal information: Cell Genomics

