Immune Cell Datamining

This pipeline currently supports only scRNA-Seq datasets in the 10X Genomics format.

All pre-processing steps are fully automated. The hands-on analysis steps are in the DataVisualization.Rmd file.

Additional Notes:

Supports automatic conversion of ENSEMBL IDs to gene symbols.
Filtering metrics can be configured in the DataVisualization.Rmd file.
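
To illustrate the ID-to-symbol conversion, here is a minimal sketch in Python (the language of filter_gtf.py). It is not the pipeline's own R implementation. It assumes a 10X-style features.tsv whose first three columns are feature ID, gene symbol, and feature type, and it keeps the original ID whenever no symbol is available. The function names and the in-memory sample rows are hypothetical.

```python
import csv
import io


def build_symbol_map(features_file):
    """Map feature IDs (column 1) to gene symbols (column 2) from a features.tsv handle."""
    symbol_map = {}
    for row in csv.reader(features_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        feature_id, symbol = row[0], row[1]
        # Keep the ID itself when the symbol column is empty.
        symbol_map[feature_id] = symbol or feature_id
    return symbol_map


def to_symbols(feature_ids, symbol_map):
    """Convert IDs to symbols, leaving unknown IDs unchanged."""
    return [symbol_map.get(fid, fid) for fid in feature_ids]


# Hypothetical sample rows standing in for a real features.tsv file.
sample = io.StringIO(
    "ENSG00000081237\tPTPRC\tGene Expression\n"
    "ENSG00000198851\tCD3E\tGene Expression\n"
)
symbol_map = build_symbol_map(sample)
converted = to_symbols(
    ["ENSG00000081237", "ENSG00000198851", "ENSG_UNKNOWN"], symbol_map
)
print(converted)
```

Falling back to the original ID keeps every feature in the matrix, so unmapped genes remain visible rather than being silently dropped.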

Instructions

1. Source all R script files and load packages/config variables through the DataVisualization.Rmd file.
2. Place 10X Genomics formatted scRNA-Seq data files into the datasets/ directory.
3. Read the data and store each sample as an individual Seurat object:
   - samples <- generateSampleList(dataID)
4. Perform batch correction with the integrateData() function, or simply merge the samples with mergeData().
   - Not necessary if only one sample is present in the dataset.
   - TODO: Create metrics for determining if batch correction is necessary
5. Use the runDimReduction() function to perform dimension reduction on the processed dataset.
6. Use the code blocks in the DataVisualization.Rmd file to generate figures and visualize clustering/markers.
   - TODO: Better support image file generation for results

(TODO) Usage instructions for all custom functions created for the pipeline.

Additional Notes

The "Immune-Cell-Datamining" folder should always be your current working directory (cwd).

Advice

When in doubt, use help(foo) in R to get quick documentation on a function, where foo is the function's name.

Pipeline Planning

Implementation/Design

When a SeuratObject is generated, most sample-specific information is stored in its @misc slot.

Planned Features

Analysis Techniques:

Weighted Gene Co-Expression Network Analysis (WGCNA)
Clonality Trees
Trajectory Analysis/Pseudotime
Copy-Number Variations (CNVs)

Pipeline Improvements:

Bring expression level violin plots more in line with past lab papers
Change the symbol conversion function to update the dataset files directly rather than running during the pipeline
Support integration of separate datasets while retaining sample characteristics/identifiers
Automatically classify each dataset as a mouse or human model (with manual override) and update gene references if necessary

ATAC-Seq

filter_gtf.py - Generates a subset GTF file containing only the annotations for the genes of interest. This prevents non-target genes from being included in the tracks.
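
The idea behind this kind of filtering can be sketched as follows. This is not the repository's filter_gtf.py; it is a minimal illustration that assumes standard 9-column, tab-separated GTF records with a gene_name attribute in the last column. The function name and the example records (including the gene names GENE_A and GENE_B) are hypothetical.

```python
import re

# Matches the gene_name attribute in the 9th GTF column, e.g. gene_name "GENE_A";
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def filter_gtf_lines(lines, genes_of_interest):
    """Yield comment lines plus records whose gene_name is in genes_of_interest."""
    wanted = set(genes_of_interest)
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines as-is
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid 9-column GTF record
        match = GENE_NAME_RE.search(fields[8])
        if match and match.group(1) in wanted:
            yield line


# Hypothetical records for illustration only.
example = [
    "#!example header\n",
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
    'chr1\tsrc\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n',
]
kept = list(filter_gtf_lines(example, ["GENE_A"]))
print("".join(kept), end="")
```

Because the function takes any iterable of lines, it can be pointed at an open file handle and the kept lines written to a new GTF file.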

Additional Information

Quick Notes

Different databases are unlikely to store the same information for a given entry.

We can use marker genes (those that are highly associated with specific cell types) to differentiate between cell types.
Papers involving sequencing data analysis often require you to report the steps you took along the way (filtering process, etc.)
- Seurat does a lot of the heavy lifting by storing data transformations
- We could still make this part of the script's job by saving the most relevant parameters and results (e.g., number of clusters) and generating an associated output file
Global assignment in R (within functions): variable <<- data
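
The idea of saving the most relevant parameters and results to an output file could look roughly like the sketch below. It is shown in Python for illustration, and the same approach carries over to R. The function name, file name, parameter names, and values are all hypothetical and not taken from the pipeline.

```python
import json
import os
import tempfile


def write_run_summary(path, parameters, results):
    """Write filtering parameters and headline results to a JSON file."""
    summary = {"parameters": parameters, "results": results}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(summary, handle, indent=2, sort_keys=True)
    return summary


# Hypothetical values for illustration only.
params = {"min_features": 200, "max_percent_mt": 10}
results = {"n_clusters": 12}
out_path = os.path.join(tempfile.mkdtemp(), "run_summary.json")
write_run_summary(out_path, params, results)

# Read the file back to confirm it round-trips.
with open(out_path, encoding="utf-8") as handle:
    reloaded = json.load(handle)
print(reloaded)
```

A plain-text summary like this can be stored alongside the figures so the filtering steps are easy to report later.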

Future Plans

Sub Cluster Discovery Pipeline
- Support "zooming in" on previously generated clusters to investigate them further

Questions:

Most documentation and papers suggest using a high percentage of mitochondrial gene expression as a filtering metric. How does this impact the study of genes such as TFAM (Mitochondrial Transcription Factor A)?
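
One way to start reasoning about this question is to look at how the metric is conventionally computed: as the share of a cell's counts that come from genes encoded on the mitochondrial genome, usually identified by an "MT-" symbol prefix in human ("mt-" in mouse). TFAM is encoded in the nuclear genome, so its counts would not enter the numerator. The filter removes whole cells, though, which may still change which cells TFAM expression is measured in. The sketch below assumes the prefix convention. The function name and the counts are hypothetical.

```python
def percent_mito(counts_by_gene, prefix="MT-"):
    """Return the percentage of a cell's counts from genes whose symbol starts with prefix."""
    total = sum(counts_by_gene.values())
    if total == 0:
        return 0.0  # avoid division by zero for empty cells
    # Upper-casing the symbol lets the same prefix match mouse-style "mt-" names.
    mito = sum(
        count
        for gene, count in counts_by_gene.items()
        if gene.upper().startswith(prefix)
    )
    return 100.0 * mito / total


# Hypothetical counts for a single cell.
cell = {"MT-CO1": 30, "MT-ND1": 20, "TFAM": 5, "PTPRC": 45}
pct = percent_mito(cell)
print(pct)
```

Here TFAM contributes to the cell's total counts but not to the mitochondrial share, since its symbol does not carry the prefix.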
