KaggleX Workshop - Machine Learning For Genomics

This document discusses machine learning applications in genomics. It begins with an introduction to genomics and genomic data. It then discusses how machine learning can help analyze vast amounts of genomic data by cutting through technical noise. It provides examples of machine learning applications for genomic data analysis, personalized medicine, drug discovery, clinical genomics, and agriculture. Common machine learning techniques for genomics are also listed, along with challenges and best practices. The document emphasizes the importance of collaboration between biologists and data scientists for producing more accurate and interpretable machine learning models in genomics.
Machine Learning For Genomics: Unveiling the Power of Data-driven Insights

Prepared By - Salman Ibne Eunus


Introduction To Genomics
❖ Genomics is the study of all the genes of an organism, including the
interactions of these genes with each other and with their environment.

❖ All living organisms, from single-celled bacteria to multicellular
plants, animals and humans, have a genome.

❖ The human genome is made up of DNA. DNA, or deoxyribonucleic acid, is a
molecule that carries the biological instructions that make every
species unique.

Understanding Genomic Data
★ Data related to the structure and function of an organism's genome.

★ Cellular information an organism needs to grow and function.

★ Includes information such as the sequence of molecules in an organism's
genes, the function of each gene, and the interactions between different
genes, RNA and proteins.

★ Genomic data scientists use data from DNA sequences to research diseases
and discover novel treatments.
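
As a rough illustration of how a raw DNA sequence becomes data an analysis can work with, the sketch below computes two simple summaries: GC content and overlapping k-mer counts. The function names and the toy sequence are made up for this example and are not part of the workshop material.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up sequence for illustration only, not real genomic data.
toy_seq = "ATGCGCGATATGC"
print(gc_content(toy_seq))
print(kmer_counts(toy_seq, k=3).most_common(3))
```

Summaries like these are one common way to turn variable-length sequences into fixed-length numeric features, although real pipelines usually work from standard file formats and dedicated libraries.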

Intersection of Machine Learning & Genomics
● Rapid advances in genome-sequencing technologies produce vast amounts
of data.

● These datasets are often too large for traditional applied statistical
techniques to handle.

● The most valuable signals in genomic datasets are often tiny and masked
by technical noise.

● Sophisticated ML techniques can cut through the noise and help
researchers draw clinically useful information from cross-disciplinary
genomic datasets.

● This is why applying machine learning to datasets generated from genome
sequencing has been so successful.

Applications of Machine Learning in Genomics
★ Genomic Data Analysis - ML can often interpret genomic data more
accurately than conventional methods. Examples: data preprocessing,
quality control, alignment, variant calling and functional annotation.

★ Personalized Medicine - Identifying genetic mutations associated with
specific diseases; AI-powered diagnostics.

★ Drug Discovery and Development - Researchers can design new drug-like
molecules and predict drug efficacy, solubility and toxicity, which helps
develop new drugs more quickly.

★ Clinical Genomics - AI-powered solutions for clinical genomics,
including genomic diagnostics, prognostics and therapeutics, are changing
the way healthcare is provided.

★ Agriculture & Livestock - Researchers can identify desirable traits and
develop efficient breeding programs that improve crop yields.

Common Machine Learning Techniques Used in Genomics
➢ Supervised & Unsupervised Learning Algorithms

➢ Deep Learning and 1D CNNs

➢ Recurrent Neural Networks

➢ Generative Adversarial Networks (GANs)

➢ Graph Neural Networks

➢ Transformers & Attention Mechanisms
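
To make the 1D CNN entry above more concrete, here is a minimal sketch of the two ideas such models build on: one-hot encoding a DNA sequence and sliding a filter along it. The filter here is hand-written to match the short pattern "TATA"; in a real network the filter weights would be learned from data. All names and sequences are illustrative assumptions, not taken from the workshop.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T.

    Any other character (for example N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence, one score per start position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores


# Hand-written filter that scores matches to "TATA" (hypothetical example motif).
motif_filter = one_hot("TATA")
scores = conv1d(one_hot("GGTATACC"), motif_filter)
best_position = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_position)
```

The highest score marks the window that best matches the filter, which is the sense in which convolutional filters act as motif detectors on sequence data.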


Machine Learning Challenges For Genomics

➔ Lack of Flexible Tools

➔ Fewer Biological Samples

➔ Computational Resource Requirements

➔ Lack of High-Quality Labelled Data

➔ Lack of Model Interpretability


Best Practices For Applying ML To Genomics

❏ Understand the problem and know your data

❏ Use simple models for simple problems

❏ Establish a baseline for your model

❏ Ensure reproducibility

❏ Use pre-existing models for genomics

❏ Focus on feature engineering

❏ Tune hyperparameters automatically

❏ Normalize the data

❏ Avoid overfitting
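
Several of the practices above can be sketched in a few lines. The example below assumes a tiny made-up table of two numeric features per sample. It fixes a random seed so the split is reproducible, computes normalization statistics on the training rows only so no information leaks from the held-out rows, and reports a majority-class baseline that any real model should beat. Function names and data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed so the split is reproducible."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def fit_zscore(train_rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant feature: divide by 1
    return means, stds


def apply_zscore(rows, means, stds):
    """Normalize rows with statistics that were fitted elsewhere."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(label == majority for label in test_labels) / len(test_labels)


# Made-up measurements for illustration only.
samples = [[5.1, 200.0], [4.9, 180.0], [6.2, 260.0], [5.8, 240.0]]
labels = ["control", "control", "case", "case"]
x_train, x_test, y_train, y_test = train_test_split(samples, labels, seed=42)
means, stds = fit_zscore(x_train)
x_train_n = apply_zscore(x_train, means, stds)
x_test_n = apply_zscore(x_test, means, stds)
print(majority_baseline_accuracy(y_train, y_test))
```

Evaluating only on the held-out rows is also the simplest guard against overfitting: a model that scores well on training data but no better than the baseline on held-out data has not learned anything useful.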

Collaboration between Biologists and Data Scientists
● Brings together domain expertise and computational skills, producing
more accurate, impactful and interpretable results.
● Biologists can guide data scientists in selecting relevant genomic
features and designing experiments that capture biologically meaningful
variation.
● Data scientists bring expertise in statistics, ML and data
manipulation, which leads to a better experimental design.
● Biologists understand the noise and biases in genomic data, which helps
ensure effective preprocessing steps such as data normalization and
batch-effect removal.
● Ethical considerations include handling sensitive patient information.
Collaboration helps ensure that data privacy and legal guidelines are
followed.
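
Batch-effect removal, mentioned above, can be illustrated with a deliberately simple stand-in: centering each batch on its own mean and then restoring the overall mean. This is only a sketch with made-up values and hypothetical names. Dedicated methods such as ComBat do considerably more, and this kind of correction is only safe when batch is not confounded with the biology of interest, which is exactly the sort of judgment that needs a biologist's input.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples, then add back the overall mean."""
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in grouped.items()}
    overall = mean(values)
    return [value - batch_means[batch] + overall
            for value, batch in zip(values, batches)]


# Made-up expression values for one gene measured in two sequencing runs.
expression = [10.0, 12.0, 20.0, 22.0]
batch_labels = ["run1", "run1", "run2", "run2"]
corrected = center_by_batch(expression, batch_labels)
print(corrected)
```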

Any Questions?
