0% found this document useful (0 votes)

17 views

Tigger

Uploaded by

timmydocker19

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

17 views

Tigger

Uploaded by

timmydocker19

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 33

Package ‘tigger’

October 10, 2023

Type Package
Version 1.1.0
Date 2023-10-10
Title Infers Novel Immunoglobulin Alleles from Sequencing Data
Description Infers the V genotype of an individual from immunoglobulin (Ig)
repertoire sequencing data (AIRR-Seq, Rep-Seq). Includes detection of
any novel alleles. This information is then used to correct existing V
allele calls from among the sample sequences.
Citations:
Gadala-Maria, et al (2015) <doi:10.1073/pnas.1417683112>,
Gadala-Maria, et al (2019) <doi:10.3389/fimmu.2019.00129>.
License AGPL-3
URL http://tigger.readthedocs.io
BugReports https://bitbucket.org/kleinstein/tigger/issues
LazyData true
BuildVignettes true
VignetteBuilder knitr
Encoding UTF-8
Depends R (>= 4.0), ggplot2 (>= 3.4.0)
Imports alakazam (>= 1.3.0), dplyr (>= 1.0.0), doParallel, foreach,
graphics, gridExtra, gtools, iterators, lazyeval, parallel,
rlang, stats, stringi, tidyr (>= 1.1.0), utils
Suggests knitr, rmarkdown, testthat
RoxygenNote 7.2.3
NeedsCompilation no
Author Daniel Gadala-Maria [aut],
Susanna Marquez [aut, cre],
Moriah Cohen [aut],
Jason Vander Heiden [aut],
Gur Yaari [aut],
Steven Kleinstein [aut, cph]

1
2 AIRRDb

Maintainer Susanna Marquez <[email protected]>

Repository CRAN
Date/Publication 2023-10-10 13:40:02 UTC

R topics documented:
AIRRDb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
cleanSeqs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
findNovelAlleles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
findUnmutatedCalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
generateEvidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
genotypeFasta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
GermlineIGHV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
getMutatedPositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
getMutCount . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
getPopularMutationCount . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
inferGenotype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
inferGenotypeBayesian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
insertPolymorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
plotGenotype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
plotNovel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
readIgFasta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
reassignAlleles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
SampleDb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
SampleGenotype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
SampleGermlineIGHV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
SampleNovel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
selectNovel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
sortAlleles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
subsampleDb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
tigger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
updateAlleleNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
writeFasta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Index 33

AIRRDb Example human immune repertoire data

Description
A data.frame of example V(D)J immunoglobulin sequences derived from a single individual
(PGP1), sequenced on the Roche 454 platform, and assigned by IMGT/HighV-QUEST to IGHV1
family alleles.
cleanSeqs 3

Format
A data.frame where rows correspond to unique V(D)J sequences and columns include:
• "sequence_alignment": IMGT-gapped V(D)J nucleotide sequence.
• "v_call": IMGT/HighV-QUEST V segment allele calls.
• "d_call": IMGT/HighV-QUEST D segment allele calls.
• "j_call": IMGT/HighV-QUEST J segment allele calls.
• "junction_length": Junction region length.

References
1. Gadala-Maria, et al. (2015) Automated analysis of high-throughput B cell sequencing data re-
veals a high frequency of novel immunoglobulin V gene segment alleles. PNAS. 112(8):E862-
70.

See Also
See SampleDb for Change-O formatted version of AIRRDb.

cleanSeqs Clean up nucleotide sequences

Description
cleanSeqs capitalizes nucleotides and replaces all characters besides c("A", "C", "G", "T", "-",
".") with "N".

Usage
cleanSeqs(seqs)

Arguments
seqs vector of nucleotide sequences.

Value
A modified vector of nucleotide sequences.

See Also
sortAlleles and updateAlleleNames can help format a list of allele names.

Examples
# Clean messy nucleotide sequences
seqs <- c("AGAT.taa-GAG...ATA", "GATACAGTXXZZAGNNPPACA")
cleanSeqs(seqs)
4 findNovelAlleles

findNovelAlleles Find novel alleles from repertoire sequencing data

Description
findNovelAlleles analyzes mutation patterns in sequences thought to align to each germline allele
in order to determine which positions might be polymorphic.

Usage
findNovelAlleles(
data,
germline_db,
v_call = "v_call",
j_call = "j_call",
seq = "sequence_alignment",
junction = "junction",
junction_length = "junction_length",
germline_min = 200,
min_seqs = 50,
auto_mutrange = TRUE,
mut_range = 1:10,
pos_range = 1:312,
pos_range_max = NULL,
y_intercept = 0.125,
alpha = 0.05,
j_max = 0.15,
min_frac = 0.75,
nproc = 1
)

Arguments
data data.frame containing repertoire data. See details.
germline_db vector of named nucleotide germline sequences matching the V calls in data.
These should be the gapped reference germlines used to make the V calls.
v_call name of the column in data with V allele calls. Default is v_call.
j_call name of the column in data with J allele calls. Default is j_call.
seq name of the column in data with the aligned, IMGT-numbered, V(D)J nu-
cleotide sequence. Default is sequence_alignment.
junction Junction region nucleotide sequence, which includes the CDR3 and the two
flanking conserved codons. Default is junction.
junction_length
Number of junction nucleotides in the junction sequence. Default is junction_length.
findNovelAlleles 5

germline_min the minimum number of sequences that must have a particular germline allele
call for the allele to be analyzed.
min_seqs minimum number of total sequences (within the desired mutational range and
nucleotide range) required for the samples to be considered.
auto_mutrange if TRUE, the algorithm will attempt to determine the appropriate mutation range
automatically using the mutation count of the most common sequence assigned
to each allele analyzed.
mut_range range of mutations that samples may carry and be considered by the algorithm.
pos_range range of IMGT-numbered positions that should be considered by the algorithm.
pos_range_max Name of the column in data with the ending positions of the V alignment in the
germline (usually v_germline_end). The end of the alignment will be used to
limit the range of positions to be considered to count mutations. With NULL
all positions in the IMGT V region will be considered. In this case, in se-
quences where the V was trimmed on the 3’, mutated nucleotides could include
nucleotides from the CDR3.
y_intercept y-intercept threshold above which positions should be considered potentially
polymorphic.
alpha alpha value used for determining whether the fit y-intercept is greater than the
y_intercept threshold.
j_max maximum fraction of sequences perfectly aligning to a potential novel allele that
are allowed to utilize to a particular combination of junction length and J gene.
The closer to 1, the less strict the filter for junction length and J gene diversity
will be.
min_frac minimum fraction of sequences that must have usable nucleotides in a given
position for that position to considered.
nproc number of processors to use.

Details
The TIgGER allele-finding algorithm, briefly, works as follows: Mutations are determined through
comparison to the provided germline. Mutation frequency at each *position* is determined as
a function of *sequence-wide* mutation counts. Polymorphic positions exhibit a high mutation
frequency despite sequence-wide mutation count. False positive of potential novel alleles resulting
from clonally-related sequences are guarded against by ensuring that sequences perfectly matching
the potential novel allele utilize a wide range of combinations of J gene and junction length.

Value
A data.frame with a row for each known allele analyzed. Besides metadata on the the parameters
used in the search, each row will have either a note as to where the polymorphism-finding algo-
rithm exited or a nucleotide sequence for the predicted novel allele, along with columns providing
additional evidence.
The output contains the following columns:
• germline_call: The input (uncorrected) V call.
• note: Comments regarding the inferrence.
6 findNovelAlleles

• polymorphism_call: The novel allele call.

• nt_substitutions: Mutations identified in the novel allele, relative to the reference germline
(germline_call)
• novel_imgt: The novel allele sequence.
• novel_imgt_count: The number of times the sequence novel_imgt is found in the input
data. Considers the subsequence of novel_imgt in the pos_range.
• novel_imgt_unique_j: Number of distinct J calls associated to novel_imgt in the input
data. Considers the subsequence of novel_imgt in the pos_range.
• novel_imgt_unique_cdr3: Number of distinct CDR3 sequences associated with novel_imgt
in the input data. Considers the subsequence of novel_imgt in the pos_range.
• perfect_match_count: Final number of sequences retained to call the new allele. These
are unique sequences that have V segments that perfectly match the predicted germline in the
pos_range.
• perfect_match_freq: perfect_match_count / germline_call_count
• germline_call_count: The number of sequences with the germline_call in the input data
that were initially considered for the analysis.
• germline_call_freq: The fraction of sequences with the germline_call in the input data
initially considered for the analysis.
• germline_imgt: Germline sequence for germline_call.
• germline_imgt_count: The number of times the germline_imgt sequence is found in the
input data.
• mut_min: Minimum mutation considered by the algorithm.
• mut_max: Maximum mutation considered by the algorithm.
• mut_pass_count: Number of sequences in the mutation range.
• pos_min: First position of the sequence considered by the algorithm (IMGT numbering).
• pos_max: Last position of the sequence considered by the algorithm (IMGT numbering).
• y_intercept: The y-intercept above which positions were considered potentially polymor-
phic.
• y_intercept_pass: Number of positions that pass the y_intercept threshold.
• snp_pass: Number of sequences that pass the y_intercept threshold and are within the
desired nucleotide range (min_seqs).
• unmutated_count: Number of unmutated sequences.
• unmutated_freq: Number of unmutated sequences over germline_imgt_count.
• unmutated_snp_j_gene_length_count: Number of distinct combinations of SNP, J gene,
and junction length.
• snp_min_seqs_j_max_pass: Number of SNPs that pass both the min_seqs and j_max thresh-
olds.
• alpha: Significance threshold to be used when constructing the confidence interval for the
y-intercept.
• min_seqs: Input min_seqs. The minimum number of total sequences (within the desired
mutational range and nucleotide range) required for the samples to be considered.
findNovelAlleles 7

• j_max: Input j_max. The maximum fraction of sequences perfectly aligning to a potential
novel allele that are allowed to utilize to a particular combination of junction length and J
gene.
• min_frac: Input min_frac. The minimum fraction of sequences that must have usable nu-
cleotides in a given position for that position to be considered.

The following comments can appear in the note column:

• Novel allele found: A novel allele was detected.

• Same as:: The same novel allele sequence has been identified multiple times. If this happens,
the function will also throw the message ’Duplicated polymorphism(s) found’.
• Plurality sequence too rare: No sequence is frequent enough to pass the J test (j_max).
• A J-junction combination is too prevalent: Not enough J diversity (j_max).
• No positions pass y-intercept test: No positions above y_intercept.
• Insufficient sequences in desired mutational range: mut_range and pos_range.
• Not enough sequences: Not enough sequences in the desired mutational range and nucleotide
range (min_seqs).
• No unmutated versions of novel allele found: All observed variants of the allele are mutated.

See Also

selectNovel to filter the results to show only novel alleles. plotNovel to visualize the data sup-
porting any novel alleles hypothesized to be present in the data and inferGenotype and inferGeno-
typeBayesian to determine if the novel alleles are frequent enought to be included in the subject’s
genotype.

Examples

# Note: In this example, with SampleGermlineIGHV,

# which contains reference germlines retrieved on August 2014,
# TIgGER finds the allele IGHV1-8*02_G234T. This allele
# was added to IMGT as IGHV1-8*03 on March 28, 2018.

# Find novel alleles and return relevant data

novel <- findNovelAlleles(AIRRDb, SampleGermlineIGHV)
selectNovel(novel)
8 generateEvidence

findUnmutatedCalls Determine which calls represent an unmutated allele

Description

findUnmutatedCalls determines which allele calls would represent a perfect match with the germline
sequence, given a vector of allele calls and mutation counts. In the case of multiple alleles being
assigned to a sequence, only the subset that would represent a perfect match is returned.

Usage

findUnmutatedCalls(allele_calls, sample_seqs, germline_db)

Arguments

allele_calls vector of strings respresenting Ig allele calls, where multiple calls are separated
by a comma.
sample_seqs V(D)J-rearranged sample sequences matching the order of the given allele_calls.
germline_db vector of named nucleotide germline sequences

Value

A vector of strings containing the members of allele_calls that represent unmutated sequences.

Examples
# Find which of the sample alleles are unmutated
calls <- findUnmutatedCalls(AIRRDb$v_call, AIRRDb$sequence_alignment,
germline_db=SampleGermlineIGHV)

generateEvidence Generate evidence

Description

generateEvidence builds a table of evidence metrics for the final novel V allele detection and
genotyping inferrences.
generateEvidence 9

Usage
generateEvidence(
data,
novel,
genotype,
genotype_db,
germline_db,
j_call = "j_call",
junction = "junction",
fields = NULL
)

Arguments
data a data.frame containing sequence data that has been passed through reassig-
nAlleles to correct the allele assignments.
novel the data.frame returned by findNovelAlleles.
genotype the data.frame of alleles generated with inferGenotype denoting the genotype
of the subject.
genotype_db a vector of named nucleotide germline sequences in the genotype. Returned by
genotypeFasta.
germline_db the original uncorrected germline database used to by findNovelAlleles to iden-
tify novel alleles.
j_call name of the column in data with J allele calls. Default is j_call.
junction Junction region nucleotide sequence, which includes the CDR3 and the two
flanking conserved codons. Default is junction.
fields character vector of column names used to split the data to identify novel alleles,
if any. If NULL then the data is not divided by grouping variables.

Value
Returns the genotype input data.frame with the following additional columns providing support-
ing evidence for each inferred allele:

• field_id: Data subset identifier, defined with the input paramter fields.
• A variable number of columns, specified with the input parameter fields.
• polymorphism_call: The novel allele call.
• novel_imgt: The novel allele sequence.
• closest_reference: The closest reference gene and allele in the germline_db database.
• closest_reference_imgt: Sequence of the closest reference gene and allele in the germline_db
database.
• germline_call: The input (uncorrected) V call.
• germline_imgt: Germline sequence for germline_call.
10 generateEvidence

• nt_diff: Number of nucleotides that differ between the new allele and the closest reference
(closest_reference) in the germline_db database.
• nt_substitutions: A comma separated list of specific nucleotide differences (e.g. 112G>A)
in the novel allele.
• aa_diff: Number of amino acids that differ between the new allele and the closest reference
(closest_reference) in the germline_db database.
• aa_substitutions: A comma separated list with specific amino acid differences (e.g. 96A>N)
in the novel allele.
• sequences: Number of sequences unambiguosly assigned to this allele.
• unmutated_sequences: Number of records with the unmutated novel allele sequence.
• unmutated_frequency: Proportion of records with the unmutated novel allele sequence (unmutated_sequences
/ sequences).
• allelic_percentage: Percentage at which the (unmutated) allele is observed in the sequence
dataset compared to other (unmutated) alleles.
• unique_js: Number of unique J sequences found associated with the novel allele. The se-
quences are those who have been unambiguously assigned to the novel allelle (polymorphism_call).
• unique_cdr3s: Number of unique CDR3s associated with the inferred allele. The sequences
are those who have been unambiguously assigned to the novel allelle (polymorphism_call).
• mut_min: Minimum mutation considered by the algorithm.
• mut_max: Maximum mutation considered by the algorithm.
• pos_min: First position of the sequence considered by the algorithm (IMGT numbering).
• pos_max: Last position of the sequence considered by the algorithm (IMGT numbering).
• y_intercept: The y-intercept above which positions were considered potentially polymor-
phic.
• alpha: Significance threshold to be used when constructing the confidence interval for the
y-intercept.
• min_seqs: Input min_seqs. The minimum number of total sequences (within the desired
mutational range and nucleotide range) required for the samples to be considered.
• j_max: Input j_max. The maximum fraction of sequences perfectly aligning to a potential
novel allele that are allowed to utilize to a particular combination of junction length and J
gene.
• min_frac: Input min_frac. The minimum fraction of sequences that must have usable nu-
cleotides in a given position for that position to be considered.
• note: Comments regarding the novel allele inferrence.

See Also

See findNovelAlleles, inferGenotype and genotypeFasta for generating the required input.
genotypeFasta 11

Examples

# Generate input data

novel <- findNovelAlleles(AIRRDb, SampleGermlineIGHV,
v_call="v_call", j_call="j_call", junction="junction",
junction_length="junction_length", seq="sequence_alignment")
genotype <- inferGenotype(AIRRDb, find_unmutated=TRUE,
germline_db=SampleGermlineIGHV,
novel=novel,
v_call="v_call", seq="sequence_alignment")
genotype_db <- genotypeFasta(genotype, SampleGermlineIGHV, novel)
data_db <- reassignAlleles(AIRRDb, genotype_db,
v_call="v_call", seq="sequence_alignment")

# Assemble evidence table

evidence <- generateEvidence(data_db, novel, genotype,
genotype_db, SampleGermlineIGHV,
j_call = "j_call",
junction = "junction")

genotypeFasta Return the nucleotide sequences of a genotype

Description
genotypeFasta converts a genotype table into a vector of nucleotide sequences.

Usage
genotypeFasta(genotype, germline_db, novel = NA)

Arguments
genotype data.frame of alleles denoting a genotype, as returned by inferGenotype.
germline_db vector of named nucleotide germline sequences matching the alleles detailed in
genotype.
novel an optional data.frame containing putative novel alleeles of the type returned
by findNovelAlleles.

Value
A named vector of strings containing the germline nucleotide sequences of the alleles in the pro-
vided genotype.

See Also
inferGenotype
12 getMutatedPositions

Examples
# Find the sequences that correspond to the genotype
genotype_db <- genotypeFasta(SampleGenotype, SampleGermlineIGHV, SampleNovel)

GermlineIGHV Human IGHV germlines

Description
A character vector of all human IGHV germline gene segment alleles in IMGT/GENE-DB (2019-
06-01, 372 alleles). See IMGT data updates: https://www.imgt.org/IMGTgenedbdoc/dataupdates.html.

Format
Values correspond to IMGT-gaped nuceltoide sequences (with nucleotides capitalized and gaps
represented by ".") while names correspond to stripped-down IMGT allele names (e.g. "IGHV1-
18*01").

References
1. Xochelli, et al. (2014) Immunoglobulin heavy variable (IGHV) genes and alleles: new en-
tities, new names and implications for research and prognostication in chronic lymphocytic
leukaemia. Immunogenetics. 67(1):61-6.

getMutatedPositions Find the location of mutations in a sequence

Description
getMutatedPositions takes two vectors of aligned sequences and compares pairs of sequences. It
returns a list of the nucleotide positions of any differences.

Usage
getMutatedPositions(
samples,
germlines,
ignored_regex = "[\\.N-]",
match_instead = FALSE
)
getMutCount 13

Arguments
samples vector of strings respresenting aligned sequences
germlines vector of strings respresenting aligned sequences to which samples will be com-
pared. If only one string is submitted, it will be used for all samples.
ignored_regex regular expression indicating what characters should be ignored (such as gaps
and N nucleotides).
match_instead if TRUE, the function returns the positions that are the same instead of those that
are different.

Value
A list of the nucleotide positions of any differences between the input vectors.

Examples
# Create strings to act as a sample sequences and a reference sequence
seqs <- c("----GATA", "GAGAGAGA", "TANA")
ref <- "GATAGATA"

# Find the differences between the two

getMutatedPositions(seqs, ref)

getMutCount Determine the mutation counts from allele calls

Description
getMutCount takes a set of nucleotide sequences and their allele calls and determines the distance
between that seqeunce and any germline alleles contained within the call

Usage
getMutCount(samples, allele_calls, germline_db)

Arguments
samples vector of IMGT-gapped sample V sequences
allele_calls vector of strings respresenting Ig allele calls for the sequences in samples,
where multiple calls are separated by a comma
germline_db vector of named nucleotide germline sequences matching the calls detailed in
allele_calls

Value
A list equal in length to samples, containing the Hamming distance to each germline allele con-
tained within each call within each element of samples
14 getPopularMutationCount

Examples
# Insert a mutation into a germline sequence
s2 <- s3 <- SampleGermlineIGHV[1]
stringi::stri_sub(s2, 103, 103) <- "G"
stringi::stri_sub(s3, 107, 107) <- "C"

sample_seqs <- c(SampleGermlineIGHV[2], s2, s3)

# Pretend that one sample sequence has received an ambiguous allele call
sample_alleles <- c(paste(names(SampleGermlineIGHV[1:2]), collapse=","),
names(SampleGermlineIGHV[2]),
names(SampleGermlineIGHV[1]))

# Compare each sequence to its assigned germline(s) to determine the distance

getMutCount(sample_seqs, sample_alleles, SampleGermlineIGHV)

getPopularMutationCount
Find mutation counts for frequency sequences

Description
getPopularMutationCount determines which sequences occur frequently for each V gene and
returns the mutation count of those sequences.

Usage
getPopularMutationCount(
data,
germline_db,
v_call = "v_call",
seq = "sequence_alignment",
gene_min = 0.001,
seq_min = 50,
seq_p_of_max = 1/8,
full_return = FALSE
)

Arguments
data data.frame in the Change-O format. See findNovelAlleles for a list of required
columns.
germline_db named list of IMGT-gapped germline sequences.
v_call name of the column in data with V allele calls. Default is v_call.
seq name of the column in data with the aligned, IMGT-numbered, V(D)J nu-
cleotide sequence. Default is sequence_alignment.
inferGenotype 15

gene_min portion of all unique sequences a gene must constitute to avoid exclusion.
seq_min number of copies of the V that must be present for to avoid exclusion.
seq_p_of_max ror each gene, the fraction of the most common V sequence count that a se-
quence must meet to avoid exclusion.
full_return if TRUE, will return all data columns and will include sequences with mutation
count < 1.

Value
A data frame of genes that have a frequent sequence mutation count above 1.

See Also
getMutatedPositions can be used to find which positions of a set of sequences are mutated.

Examples
getPopularMutationCount(AIRRDb, SampleGermlineIGHV)

inferGenotype Infer a subject-specific genotype using a frequency method

Description
inferGenotype infers an subject’s genotype using a frequency method. The genotype is inferred
by finding the minimum number set of alleles that can explain the majority of each gene’s calls.
The most common allele of each gene is included in the genotype first, and the next most common
allele is added until the desired fraction of alleles can be explained. In this way, mistaken allele
calls (resulting from sequences which by chance have been mutated to look like another allele) can
be removed.

Usage
inferGenotype(
data,
germline_db = NA,
novel = NA,
v_call = "v_call",
seq = "sequence_alignment",
fraction_to_explain = 0.875,
gene_cutoff = 1e-04,
find_unmutated = TRUE
)
16 inferGenotype

Arguments

data data.frame containing V allele calls from a single subject.

germline_db named vector of sequences containing the germline sequences named in allele_calls.
Only required if find_unmutated is TRUE.
novel optional data.frame of the type novel returned by findNovelAlleles containing
germline sequences that will be utilized if find_unmutated is TRUE. See Details.
v_call column in data with V allele calls. Default is "v_call".
seq name of the column in data with the aligned, IMGT-numbered, V(D)J nu-
cleotide sequence. Default is sequence_alignment.
fraction_to_explain
the portion of each gene that must be explained by the alleles that will be in-
cluded in the genotype.
gene_cutoff either a number of sequences or a fraction of the length of allele_calls denot-
ing the minimum number of times a gene must be observed in allele_calls to
be included in the genotype.
find_unmutated if TRUE, use germline_db to find which samples are unmutated. Not needed if
allele_calls only represent unmutated samples.

Details

Allele calls representing cases where multiple alleles have been assigned to a single sample se-
quence are rare among unmutated sequences but may result if nucleotides for certain positions are
not available. Calls containing multiple alleles are treated as belonging to all groups. If novel is
provided, all sequences that are assigned to the same starting allele as any novel germline allele
will have the novel germline allele appended to their assignent prior to searching for unmutated
sequences.

Value

A data.frame of alleles denoting the genotype of the subject containing the following columns:

• gene: The gene name without allele.

Note

This method works best with data derived from blood, where a large portion of sequences are
expected to be unmutated. Ideally, there should be hundreds of allele calls per gene in the input.
inferGenotypeBayesian 17

See Also
plotGenotype for a colorful visualization and genotypeFasta to convert the genotype to nucleotide
sequences. See inferGenotypeBayesian to infer a subject-specific genotype using a Bayesian ap-
proach.

Examples
# Infer IGHV genotype, using only unmutated sequences, including novel alleles
inferGenotype(AIRRDb, germline_db=SampleGermlineIGHV, novel=SampleNovel,
find_unmutated=TRUE)

inferGenotypeBayesian Infer a subject-specific genotype using a Bayesian approach

Description
inferGenotypeBayesian infers an subject’s genotype by applying a Bayesian framework with
a Dirichlet prior for the multinomial distribution. Up to four distinct alleles are allowed in an
individual’s genotype. Four likelihood distributions were generated by empirically fitting three
high coverage genotypes from three individuals (Laserson and Vigneault et al, 2014). A posterior
probability is calculated for the four most common alleles. The certainty of the highest probability
model was calculated using a Bayes factor (the most likely model divided by second-most likely
model). The larger the Bayes factor (K), the greater the certainty in the model.

Usage
inferGenotypeBayesian(
data,
germline_db = NA,
novel = NA,
v_call = "v_call",
seq = "sequence_alignment",
find_unmutated = TRUE,
priors = c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)
)

Arguments
data a data.frame containing V allele calls from a single subject. If find_unmutated
is TRUE, then the sample IMGT-gapped V(D)J sequence should be provided in
column sequence_alignment
germline_db named vector of sequences containing the germline sequences named in allele_calls.
Only required if find_unmutated is TRUE.
novel an optional data.frame of the type novel returned by findNovelAlleles con-
taining germline sequences that will be utilized if find_unmutated is TRUE. See
Details.
18 inferGenotypeBayesian

v_call column in data with V allele calls. Default is "v_call".

seq name of the column in data with the aligned, IMGT-numbered, V(D)J nu-
cleotide sequence. Default is "sequence_alignment".
find_unmutated if TRUE, use germline_db to find which samples are unmutated. Not needed if
allele_calls only represent unmutated samples.
priors a numeric vector of priors for the multinomial distribution. The priors vector
must be nine values that defined the priors for the heterozygous (two allele),
trizygous (three allele), and quadrozygous (four allele) distributions. The first
two values of priors define the prior for the heterozygous case, the next three
values are for the trizygous case, and the final four values are for the quadrozy-
gous case. Each set of priors should sum to one. Note, each distribution prior
is actually defined internally by set of four numbers, with the unspecified final
values assigned to 0; e.g., the heterozygous case is c(priors[1], priors[2],
0, 0). The prior for the homozygous distribution is fixed at c(1, 0, 0, 0).

Details
Allele calls representing cases where multiple alleles have been assigned to a single sample se-
quence are rare among unmutated sequences but may result if nucleotides for certain positions are
not available. Calls containing multiple alleles are treated as belonging to all groups. If novel is
provided, all sequences that are assigned to the same starting allele as any novel germline allele
will have the novel germline allele appended to their assignent prior to searching for unmutated
sequences.

Value
A data.frame of alleles denoting the genotype of the subject with the log10 of the likelihood of
each model and the log10 of the Bayes factor. The output contains the following columns:

• gene: The gene name without allele.

• alleles: Comma separated list of alleles for the given gene.
• counts: Comma separated list of observed sequences for each corresponding allele in the
alleles list.
• total: The total count of observed sequences for the given gene.
• note: Any comments on the inferrence.
• kh: log10 likelihood that the gene is homozygous.
• kd: log10 likelihood that the gene is heterozygous.
• kt: log10 likelihood that the gene is trizygous
• kq: log10 likelihood that the gene is quadrozygous.
• k_diff: log10 ratio of the highest to second-highest zygosity likelihoods.

Note
This method works best with data derived from blood, where a large portion of sequences are
expected to be unmutated. Ideally, there should be hundreds of allele calls per gene in the input.
insertPolymorphisms 19

References

1. Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced

immune responses. PNAS. 2014 111(13):4928-33.

See Also

plotGenotype for a colorful visualization and genotypeFasta to convert the genotype to nucleotide
sequences. See inferGenotype to infer a subject-specific genotype using a frequency method

Examples
# Infer IGHV genotype, using only unmutated sequences, including novel alleles
inferGenotypeBayesian(AIRRDb, germline_db=SampleGermlineIGHV, novel=SampleNovel,
find_unmutated=TRUE, v_call="v_call", seq="sequence_alignment")

insertPolymorphisms Insert polymorphisms into a nucleotide sequence

Description

insertPolymorphisms replaces nucleotides in the desired locations of a provided sequence.

Usage

insertPolymorphisms(sequence, positions, nucleotides)

Arguments

sequence starting nucletide sequence.

positions numeric vector of positions which to be changed.
nucleotides character vector of nucletides to which to change the positions.

Value

A sequence with the desired nucleotides in the provided locations.

Examples
insertPolymorphisms("HUGGED", c(1, 6, 2), c("T", "R", "I"))
20 plotGenotype

plotGenotype Show a colorful representation of a genotype

Description
plotGenotype plots a genotype table.

Usage
plotGenotype(
genotype,
facet_by = NULL,
gene_sort = c("name", "position"),
text_size = 12,
silent = FALSE,
...
)

Arguments
genotype data.frame of alleles denoting a genotype, as returned by inferGenotype.
facet_by column name in genotype to facet the plot by. if NULL, then do not facet the
plot.
gene_sort string defining the method to use when sorting alleles. if "name" then sort in
lexicographic order. If "position" then sort by position in the locus, as deter-
mined by the final two numbers in the gene name.
text_size point size of the plotted text.
silent if TRUE do not draw the plot and just return the ggplot object; if FALSE draw the
plot.
... additional arguments to pass to ggplot2::theme.

Value
A ggplot object defining the plot.

genotype_a$SUBJECT <- "A"

genotype_b$SUBJECT <- "B"
geno_sub <- rbind(genotype_a, genotype_b)
plotGenotype(geno_sub, facet_by="SUBJECT", gene_sort="pos")

plotNovel Visualize evidence of novel V alleles

Description
plotNovel is be used to visualize the evidence of any novel V alleles found using findNovelAlleles.
It can also be used to visualize the results for alleles that did

Usage
plotNovel(
data,
novel_row,
v_call = "v_call",
j_call = "j_call",
seq = "sequence_alignment",
junction = "junction",
junction_length = "junction_length",
pos_range_max = NULL,
ncol = 1,
multiplot = TRUE
)

Arguments
data data.frame containing repertoire data. See findNovelAlleles for details.
novel_row single row from a data frame as output by findNovelAlleles that contains a
polymorphism-containing germline allele.
v_call name of the column in data with V allele calls. Default is v_call.
j_call name of the column in data with J allele calls. Default is j_call.
seq name of the column in data with the aligned, IMGT-numbered, V(D)J nu-
cleotide sequence. Default is sequence_alignment.
junction Junction region nucleotide sequence, which includes the CDR3 and the two
flanking conserved codons. Default is junction.
junction_length
number of junction nucleotides in the junction sequence. Default is junction_length.
pos_range_max Name of the column in data with the ending positions of the V alignment in the
germline (usually v_germline_end).
ncol number of columns to use when laying out the plots.
multiplot whether to return one single plot (TRUE) or a list with the three individual plots
(FALSE).
22 readIgFasta

Details
The first panel in the plot shows, for all sequences which align to a particular germline allele,
the mutation frequency at each postion along the aligned sequence as a function of the sequence-
wide mutation count. Each line is a position. Positions that contain polymorphisms (rather than
somatic hypermutations) will exhibit a high apparent mutation frequency for a range of sequence-
wide mutation counts. The positions are color coded as follows:

• red: the position(s) pass(ess) the novel allele test

• yellow: the position(s) pass(ess) the y-intercept test but not other tests
• blue: the position(s) didn’t pass the y-intercept test and was(were) not further considered

The second panel shows the nucleotide usage at each of the polymorphic positions as a function
of sequence-wide mutation count. If no polymorphisms were identified, the panel will show the
mutation count.
To avoid cases where a clonal expansion might lead to a false positive, TIgGER examines the
combinations of J gene and junction length among sequences which perfectly match the proposed
germline allele. Clonally related sequences usually share the same V gene, J gene and junction
length. Requiring the novel allele to be found in different combinations of J gene and junction
lengths is a proxy for requiring it to be found in different clonal lineages.

Examples
# Plot the evidence for the first (and only) novel allele in the example data
novel <- selectNovel(SampleNovel)
plotNovel(AIRRDb, novel[1, ], v_call="v_call", j_call="j_call",
seq="sequence_alignment", junction="junction", junction_length="junction_length",
multiplot=TRUE)

readIgFasta Read immunoglobulin sequences

Description
readIgFasta reads a fasta-formatted file of immunoglobulin (Ig) sequences and returns a named
vector of those sequences.

Usage
readIgFasta(fasta_file, strip_down_name = TRUE, force_caps = TRUE)

Arguments
fasta_file fasta-formatted file of immunoglobuling sequences.
strip_down_name
if TRUE, will extract only the allele name from the strings fasta file’s sequence
names.
force_caps if TRUE, will force nucleotides to uppercase.
reassignAlleles 23

Value
Named vector of strings respresenting Ig alleles.

reassignAlleles Correct allele calls based on a personalized genotype

Description
reassignAlleles uses a subject-specific genotype to correct correct preliminary allele assignments
of a set of sequences derived from a single subject.

Usage
reassignAlleles(
data,
genotype_db,
v_call = "v_call",
seq = "sequence_alignment",
method = "hamming",
path = NA,
keep_gene = c("gene", "family", "repertoire")
)

Arguments
data data.frame containing V allele calls from a single subject and the sample
IMGT-gapped V(D)J sequences under seq.
genotype_db vector of named nucleotide germline sequences matching the calls detailed in
allele_calls and personalized to the subject
v_call name of the column in data with V allele calls. Default is v_call.
seq name of the column in data with the aligned, IMGT-numbered, V(D)J nu-
cleotide sequence. Default is SEQUENCE_IMGT
method method to use when realigning sequences to the genotype_db sequences. Cur-
rently, only "hammming" (for Hamming distance) is implemented.
24 SampleDb

path directory containing the tool used in the realignment method, if needed. Ham-
ming distance does not require a path to a tool.
keep_gene string indicating if the gene ("gene"), family ("family") or complete repertoire
("repertoire") assignments should be performed. Use of "gene" increases
speed by minimizing required number of alignments, as gene level assignments
will be maintained when possible.

Details
In order to save time, initial gene assignments are preserved and the allele calls are chosen from
among those provided in genotype_db, based on a simple alignment to the sample sequence.

Value
A modifed input data.frame containing the best allele call from among the sequences listed in
genotype_db in the v_call_genotyped column.

Examples
# Extract the database sequences that correspond to the genotype
genotype_db <- genotypeFasta(SampleGenotype, SampleGermlineIGHV, novel=SampleNovel)

# Use the personlized genotype to determine corrected allele assignments

output_db <- reassignAlleles(AIRRDb, genotype_db, v_call="v_call",
seq="sequence_alignment")

SampleDb Example human immune repertoire data

Format
A data.frame where rows correspond to unique V(D)J sequences and columns include:

• "SEQUENCE_IMGT": IMGT-gapped V(D)J nucleotide sequence.

• "V_CALL": IMGT/HighV-QUEST V segment allele calls.
• "D_CALL": IMGT/HighV-QUEST D segment allele calls.
• "J_CALL": IMGT/HighV-QUEST J segment allele calls.
• "JUNCTION_LENGTH": Junction region length.
SampleGenotype 25

See Also
See AIRRDb for an AIRR formatted version of SampleDb.

SampleGenotype Example genotype inferrence results

Description
A data.frame of genotype inference results from inferGenotype after novel allele detection via
findNovelAlleles. Source data was a collection of V(D)J immunoglobulin sequences derived from
a single individual (PGP1), sequenced on the Roche 454 platform, and assigned by IMGT/HighV-
QUEST to IGHV1 family alleles.

Format
A data.frame where rows correspond to genes carried by an individual and columns lists the alleles
of those genes and their counts.

See Also
See inferGenotype for detailed column descriptions.

SampleGermlineIGHV Example Human IGHV germlines

Description
A character vector of all 344 human IGHV germline gene segment alleles in IMGT/GENE-DB
release 2014-08-4.

Format
Values correspond to IMGT-gaped nuceltoide sequences (with nucleotides capitalized and gaps
represented by ".") while names correspond to stripped-down IMGT allele names (e.g. "IGHV1-
18*01").
26 selectNovel

SampleNovel Example novel allele detection results

Description
A data.frame of novel allele detection results from findNovelAlleles. Source data was a collection
of V(D)J immunoglobulin sequences derived from a single individual (PGP1), sequenced on the
Roche 454 platform, and assigned by IMGT/HighV-QUEST to IGHV1 family alleles.

Format
A data.frame where rows correspond to alleles checked for polymorphisms and columns give
results as well as paramaters used to run the test.

See Also
See findNovelAlleles for detailed column descriptions.

selectNovel Select rows containing novel alleles

Description
selectNovel takes the result from findNovelAlleles and selects only the rows containing unique,
novel alleles.

Usage
selectNovel(novel, keep_alleles = FALSE)

Arguments
novel data.frame of the type returned by findNovelAlleles.
keep_alleles logical indicating if different alleles leading to the same novel sequence should
be kept. See Details.
sortAlleles 27

Details
If, for instance, subject has in his genome IGHV1-2*02 and a novel allele equally close to IGHV1-2*02
and IGHV1-2*05, the novel allele may be detected by analyzing sequences that best align to either
of these alleles. If keep_alleles is TRUE, both polymorphic allele calls will be retained. In the
case that multiple mutation ranges are checked for the same allele, only one mutation range will be
kept in the output.

Value
A data.frame containing only unique, novel alleles (if any) that were in the input.

Examples
novel <- selectNovel(SampleNovel)

sortAlleles Sort allele names

Description
sortAlleles returns a sorted vector of strings respresenting Ig allele names. Names are first sorted
by gene family, then by gene, then by allele. Duplicated genes have their alleles are sorted as if they
were part of their non-duplicated counterparts (e.g. IGHV1-69D*01 comes after IGHV1-69*01 but
before IGHV1-69*02), and non-localized genes (e.g. IGHV1-NL1*01) come last within their gene
family.

Usage
sortAlleles(allele_calls, method = c("name", "position"))

Arguments
allele_calls vector of strings respresenting Ig allele names.
method string defining the method to use when sorting alleles. If "name" then sort in
lexicographic order. If "position" then sort by position in the locus, as deter-
mined by the final two numbers in the gene name.

Value
A sorted vector of strings respresenting Ig allele names.

See Also
Like sortAlleles, updateAlleleNames can help format a list of allele names.
28 subsampleDb

Examples
# Create a list of allele names
alleles <- c("IGHV1-69D*01","IGHV1-69*01","IGHV1-2*01","IGHV1-69-2*01",
"IGHV2-5*01","IGHV1-NL1*01", "IGHV1-2*01,IGHV1-2*05",
"IGHV1-2", "IGHV1-2*02", "IGHV1-69*02")

# Sort the alleles by name

sortAlleles(alleles)

# Sort the alleles by position in the locus

sortAlleles(alleles, method="pos")

subsampleDb Subsample repertoire

Description
subsampleDb will sample the same number of sequences for each gene, family or allele (specified
with mode) in data. Samples or subjects can be subsampled indepently by setting group.

Usage
subsampleDb(
data,
gene = "v_call",
mode = c("gene", "allele", "family"),
min_n = 1,
max_n = NULL,
group = NULL
)

Arguments
data data.frame containing repertoire data.
gene name of the column in data with allele calls. Default is v_call.
mode one of c("gene", "family", "allele") defining the degree of specificity re-
garding allele calls when subsetting sequences. Determines how data will be
split into subsets from which the same number of sequences will be subsampled.
See also group.
min_n minimum number of observations to sample from each groupe. A group with
less observations than the minimum is excluded.
max_n maximum number of observations to sample for all mode groups. If NULL, it will
be set automatically to the size of the smallest group. If max_n is larger than
the availabe number of sequences for any mode group, if will be automatically
adjusted and the efective max_n used will be the size of the smallest mode group.
tigger 29

group columns containing additional grouping variables, e.g. sample_id. These groups
will be subsampled independently. If max_n is NULL, a max_n will be automati-
cally set for each group.

Details

data will be split into gene, allele or family subsets (mode) from which the same number of se-
quences will be subsampled. If mode=gene, for each gene in the field gene from data, a maximum
of max_n sequences will be subsampled. Input sequences that have multiple gene calls (ties), can be
subsampled from any of their calls, but these duplicated samplings will be removed, and the final
subsampled data will contain unique rows.

Value

Subsampled version of the input data.

See Also

selectNovel

Examples
subsampleDb(AIRRDb)

tigger tigger

Description

Here we provide a Tool for Immunoglobulin Genotype Elucidation via Rep-Seq (TIgGER). TIg-
GER inferrs the set of Ig alleles carried by an individual (including any novel alleles) and then uses
this set of alleles to correct the initial assignments given to sample sequences by existing tools.

Details

Immunoglobulin repertoire sequencing (AIRR-Seq, Rep-Seq) data is currently the subject of much
study. A key step in analyzing these data involves assigning the closest known V(D)J germline
alleles to the (often somatically mutated) sample sequences using a tool such as IMGT/HighV-
QUEST. However, if the sample utilizes alleles not in the germline database used for alignment,
this step will fail. Additionally, this alignment has an associated error rate of ~5 mutations. The
purpose of TIgGER is to address these issues.
30 updateAlleleNames

Allele detection and genotyping

• findNovelAlleles: Detect novel alleles.
• plotNovel: Plot evidence of novel alleles.
• inferGenotype: Infer an Ig genotype using a frequency approach.
• inferGenotypeBayesian: Infer an Ig genotype using a Bayesian approach.
• plotGenotype: A colorful genotype visualization.
• genotypeFasta: Convert a genotype to sequences.
• reassignAlleles: Correct allele calls.
• generateEvidence: Generate evidence for the genotype and allele detection inferrence.

Mutation handling
• getMutatedPositions: Find mutation locations.
• getMutCount: Find distance from germline.
• findUnmutatedCalls: Subset unmutated sequences.
• getPopularMutationCount: Find most common sequence’s mutation count.
• insertPolymorphisms: Insert SNPs into a sequence.

Input, output and formatting

• readIgFasta: Read a fasta file of Ig sequences.
• updateAlleleNames: Correct outdated allele names.
• sortAlleles: Sort allele names intelligently.
• cleanSeqs: Standardize sequence format.

updateAlleleNames Update IGHV allele names

Description
updateAlleleNames takes a set of IGHV allele calls and replaces any outdated names (e.g. IGHV1-
f) with the new IMGT names.

Usage
updateAlleleNames(allele_calls)
writeFasta 31

Arguments

allele_calls vector of strings respresenting IGHV allele names.

Value

Vector of strings respresenting updated IGHV allele names.

Note

IGMT has removed IGHV2-5*10 and IGHV2-5*07 as it has determined they are actually alleles 02
and 04, respectively. The updated allele names are based on IMGT release 2014-08-4.

References

1. Xochelli et al. (2014) Immunoglobulin heavy variable (IGHV) genes and alleles: new en-
tities, new names and implications for research and prognostication in chronic lymphocytic
leukaemia. Immunogenetics. 67(1):61-6

See Also

Like updateAlleleNames, sortAlleles can help format a list of allele names.

Examples
# Create a vector that uses old gene/allele names.
alleles <- c("IGHV1-c*01", "IGHV1-f*02", "IGHV2-5*07")

# Update the alleles to the new names

updateAlleleNames(alleles)

writeFasta Write to a fasta file

Description

writeFasta writes a named vector of sequences to a file in fasta format.

Usage

writeFasta(named_sequences, file, width = 60, append = FALSE)

32 writeFasta

Arguments
named_sequences
vector of named string representing sequences
file the name of the output file.
width the number of characters to be printed per line. if not between 1 and 255, width
with be infinite.
append logical indicating if the output should be appended to file instead of over-
writing it

Value
A named vector of strings respresenting Ig alleles.

∗ data updateAlleleNames, 3, 27, 30, 30

AIRRDb, 2
GermlineIGHV, 12 writeFasta, 23, 31
SampleDb, 24
SampleGenotype, 25
SampleGermlineIGHV, 25
SampleNovel, 26

AIRRDb, 2, 25

cleanSeqs, 3, 30

findNovelAlleles, 4, 9–11, 14, 16, 17, 21,

25, 26, 30
findUnmutatedCalls, 8, 30

generateEvidence, 8, 30
genotypeFasta, 9, 10, 11, 17, 19, 30
GermlineIGHV, 12
getMutatedPositions, 12, 15, 30
getMutCount, 13, 30
getPopularMutationCount, 14, 30

inferGenotype, 7, 9–11, 15, 19, 20, 25, 30

inferGenotypeBayesian, 7, 17, 17, 30
insertPolymorphisms, 19, 30

plotGenotype, 17, 19, 20, 30

plotNovel, 7, 21, 30

readIgFasta, 22, 30, 32

reassignAlleles, 9, 23, 30

SampleDb, 3, 24
SampleGenotype, 25
SampleGermlineIGHV, 25
SampleNovel, 26
selectNovel, 7, 26, 29
sortAlleles, 3, 27, 30, 31
subsampleDb, 28

tigger, 29

Intrusion Detection Honeypots
From Everand
Intrusion Detection Honeypots
Chris Sanders
3/5 (2)
Global Performance Requirements For Labels: Worldwide Engineering Standards
100% (1)
Global Performance Requirements For Labels: Worldwide Engineering Standards
17 pages
Audio, Video, and Media in the Ministry
From Everand
Audio, Video, and Media in the Ministry
Clarence Floyd Richmond
No ratings yet
The Encyclopedia of Infant and Toddler Activities, revised
From Everand
The Encyclopedia of Infant and Toddler Activities, revised
Donna Wittmer
4.5/5 (3)
polyRAD
No ratings yet
polyRAD
90 pages
Knit Soxx for Everyone: 25 Colorful Sock Patterns for the Whole Family
From Everand
Knit Soxx for Everyone: 25 Colorful Sock Patterns for the Whole Family
Kerstin Balke
4.5/5 (2)
Gray Hat Hacking the Ethical Hacker's
From Everand
Gray Hat Hacking the Ethical Hacker's
Çağatay Şanlı
5/5 (1)
NIHMS753481 Supplement Supplemental Data
No ratings yet
NIHMS753481 Supplement Supplemental Data
124 pages
Pegas
No ratings yet
Pegas
82 pages
Genepop
No ratings yet
Genepop
19 pages
Package Haplo - Stats'
No ratings yet
Package Haplo - Stats'
72 pages
Life Ain't A Dress Rehearsal: Lives in Poetry
From Everand
Life Ain't A Dress Rehearsal: Lives in Poetry
Cecil D. Haas
No ratings yet
Package Genetics': R Topics Documented
No ratings yet
Package Genetics': R Topics Documented
43 pages
ChatGPT CheatSheet: 400 Powerful Examples That Turn You Into a ChatGPT Expert
From Everand
ChatGPT CheatSheet: 400 Powerful Examples That Turn You Into a ChatGPT Expert
Igor Pogany
No ratings yet
Deadline Yemen (The Elizabeth Darcy Series)
From Everand
Deadline Yemen (The Elizabeth Darcy Series)
Peggy Hanson
5/5 (1)
The Future Is Ours: The Collected Science Fiction of Edward D. Hoch
From Everand
The Future Is Ours: The Collected Science Fiction of Edward D. Hoch
Edward D. Hoch
No ratings yet
Deadline Istanbul (The Elizabeth Darcy Series)
From Everand
Deadline Istanbul (The Elizabeth Darcy Series)
Peggy Hanson
5/5 (1)
The Unstuck Book
From Everand
The Unstuck Book
Conor McCarthy
No ratings yet
The Attention Fix: How to Focus in a World That Wants to Distract You
From Everand
The Attention Fix: How to Focus in a World That Wants to Distract You
Anders Hansen
No ratings yet
Time Management Tracker (Printable Version)
From Everand
Time Management Tracker (Printable Version)
Sheba Blake
No ratings yet
Plink Doc 1.07
No ratings yet
Plink Doc 1.07
293 pages
Bimbo Heaven: Stone Angel #7
From Everand
Bimbo Heaven: Stone Angel #7
Marvin H. Albert
No ratings yet
Monthly Productivity Planner (Printable Version)
From Everand
Monthly Productivity Planner (Printable Version)
Sheba Blake
No ratings yet
Hamlet Had an Uncle: A Comedy of Honor
From Everand
Hamlet Had an Uncle: A Comedy of Honor
James Branch Cabell
4.5/5 (7)
Mendel
No ratings yet
Mendel
310 pages
The Gracious Lily Affair
From Everand
The Gracious Lily Affair
Van Wyck Mason
5/5 (1)
Operation Exile
From Everand
Operation Exile
E. Hoffmann Price
3.5/5 (1)
Design, Analysis, and Interpretation of Genome Wide Association Scans Full Digital Edition
100% (5)
Design, Analysis, and Interpretation of Genome Wide Association Scans Full Digital Edition
15 pages
Duenna to a Murder
From Everand
Duenna to a Murder
Rufus King
No ratings yet
Gene Prediction Using Statistical Methods
No ratings yet
Gene Prediction Using Statistical Methods
47 pages
STAR Manual 2.7.3a: Alexander Dobin Dobin@cshl - Edu October 8, 2019
No ratings yet
STAR Manual 2.7.3a: Alexander Dobin Dobin@cshl - Edu October 8, 2019
54 pages
Operation Longlife
From Everand
Operation Longlife
E. Hoffmann Price
3.5/5 (3)
The Last Smile: Stone Angel #5
From Everand
The Last Smile: Stone Angel #5
Marvin H. Albert
No ratings yet
Back in the Real World (Stone Angel #2)
From Everand
Back in the Real World (Stone Angel #2)
Marvin H. Albert
5/5 (1)
Osama the Gun
From Everand
Osama the Gun
Norman Spinrad
5/5 (1)
Genepop
No ratings yet
Genepop
51 pages
A Home in Bloom: Four Enchanted Seasons with Flowers
From Everand
A Home in Bloom: Four Enchanted Seasons with Flowers
Christie Purifoy
No ratings yet
Never Walk Alone
From Everand
Never Walk Alone
Rufus King
No ratings yet
ChatGPT for Business: Strategies for Success
From Everand
ChatGPT for Business: Strategies for Success
Matthew C. Smith
1/5 (1)
edgeRUsersGuide PDF
No ratings yet
edgeRUsersGuide PDF
110 pages
Edger Users Guide
No ratings yet
Edger Users Guide
105 pages
Keys to Better Reading
From Everand
Keys to Better Reading
Judy McFall
No ratings yet
Murder in the Willett Family: A Lt. Valcour Mystery
From Everand
Murder in the Willett Family: A Lt. Valcour Mystery
Rufus King
No ratings yet
Edger: Differential Analysis of Sequence Read Count Data User'S Guide
No ratings yet
Edger: Differential Analysis of Sequence Read Count Data User'S Guide
119 pages
Trouble in Tahiti: Blood on the Hibiscus
From Everand
Trouble in Tahiti: Blood on the Hibiscus
Hayford Peirce
No ratings yet
Worked Example BioMart Ensembl Tutorial
No ratings yet
Worked Example BioMart Ensembl Tutorial
11 pages
Edge RUsers Guide
No ratings yet
Edge RUsers Guide
138 pages
Edger: Differential Analysis of Sequence Read Count Data User'S Guide
No ratings yet
Edger: Differential Analysis of Sequence Read Count Data User'S Guide
122 pages
Design, Analysis, and Interpretation of Genome Wide Association Scans ISBN 1461494427, 9781461494423 scribd download
No ratings yet
Design, Analysis, and Interpretation of Genome Wide Association Scans ISBN 1461494427, 9781461494423 scribd download
17 pages
Oneness
From Everand
Oneness
Carmeline Pusateri
No ratings yet
Dada 2
No ratings yet
Dada 2
45 pages
Adegenet Tutorial
No ratings yet
Adegenet Tutorial
63 pages
Ape
No ratings yet
Ape
290 pages
Topic_5_complex_diseases_20121302
No ratings yet
Topic_5_complex_diseases_20121302
67 pages
The Essential Robert Frost: Collected Poems 1913-1923 (Warbler Classics Annotated Edition)
From Everand
The Essential Robert Frost: Collected Poems 1913-1923 (Warbler Classics Annotated Edition)
Robert Frost
No ratings yet
Package Ape': March 2, 2022
No ratings yet
Package Ape': March 2, 2022
296 pages
Package Ape': October 12, 2022
No ratings yet
Package Ape': October 12, 2022
296 pages
Hideous Faces, Beautiful Skulls: Tales of Horror and the Bizarre
From Everand
Hideous Faces, Beautiful Skulls: Tales of Horror and the Bizarre
Mark McLaughlin
5/5 (1)
Gene Identification - I: Shivani Chandra Birla Institute of Scientific Research
No ratings yet
Gene Identification - I: Shivani Chandra Birla Institute of Scientific Research
35 pages
The Deathguards
From Everand
The Deathguards
Phyllis Ann Karr
No ratings yet
Genetic Variation in Mice: Modeling Disease, Pharmacogenetics, and Basic Biology
No ratings yet
Genetic Variation in Mice: Modeling Disease, Pharmacogenetics, and Basic Biology
44 pages
BS en 50363-2-1-2005+a1-2011 - (2023-06-01 - 10-22-38 Am)
100% (2)
BS en 50363-2-1-2005+a1-2011 - (2023-06-01 - 10-22-38 Am)
12 pages
Sikap Positif
No ratings yet
Sikap Positif
23 pages
M C S E P A S S: Aterials For Ryogenic Ervice: Ngineering Roperties of Ustenitic Tainless Teels
No ratings yet
M C S E P A S S: Aterials For Ryogenic Ervice: Ngineering Roperties of Ustenitic Tainless Teels
50 pages
Magnetic Method
No ratings yet
Magnetic Method
16 pages
ATOS Comoso DKE Solenoid Valve
No ratings yet
ATOS Comoso DKE Solenoid Valve
4 pages
2022 Experiment With Google Ads Playbook
No ratings yet
2022 Experiment With Google Ads Playbook
47 pages
Guest Speaker-Mrs. Eman Saleh, Lecturer - The University of Georgia Department of Computer Science-Employability Skills
No ratings yet
Guest Speaker-Mrs. Eman Saleh, Lecturer - The University of Georgia Department of Computer Science-Employability Skills
6 pages
Specification Sheet Solar Module - KM (P) 30
No ratings yet
Specification Sheet Solar Module - KM (P) 30
1 page
Sanyam Modi Report Seminar Final
No ratings yet
Sanyam Modi Report Seminar Final
39 pages
13.PICS & TYPS - EQRisk-report - 13-012 - BuildingtypologiesinGuwahati
No ratings yet
13.PICS & TYPS - EQRisk-report - 13-012 - BuildingtypologiesinGuwahati
21 pages
02 Operating BCS RH340 en 3668170.00
100% (1)
02 Operating BCS RH340 en 3668170.00
22 pages
Lecture Statistics
No ratings yet
Lecture Statistics
5 pages
Electric Field and Ultrasonic Sensor Based Security System PDF
No ratings yet
Electric Field and Ultrasonic Sensor Based Security System PDF
4 pages
Lesson Plan Guide: Unit/Chapter Objective/Generalization/Big Idea
No ratings yet
Lesson Plan Guide: Unit/Chapter Objective/Generalization/Big Idea
3 pages
Examples For Chemical Engineering Thermodynamics Using Matlab
No ratings yet
Examples For Chemical Engineering Thermodynamics Using Matlab
7 pages
Reimage Guide Precision RG En-Us
No ratings yet
Reimage Guide Precision RG En-Us
47 pages
Cattle Record
No ratings yet
Cattle Record
2 pages
Nynas - Nytro Izar Ii
No ratings yet
Nynas - Nytro Izar Ii
42 pages
CSE20240207_Part16
No ratings yet
CSE20240207_Part16
1 page
medtronic 2090
No ratings yet
medtronic 2090
112 pages
10 Plumbing Details
No ratings yet
10 Plumbing Details
1 page
Basketball Documentation Format
No ratings yet
Basketball Documentation Format
7 pages
Modeling Estuaries Using WASP
No ratings yet
Modeling Estuaries Using WASP
55 pages
Goegraphy Grade 8
No ratings yet
Goegraphy Grade 8
6 pages
InfoPLC Net 77430539 Wincc Migrationguide DOC V21 en
No ratings yet
InfoPLC Net 77430539 Wincc Migrationguide DOC V21 en
24 pages
Bolex Projector m8 m8r User Manual
100% (1)
Bolex Projector m8 m8r User Manual
20 pages
Geomechanics Lab Report 2024
No ratings yet
Geomechanics Lab Report 2024
10 pages
DataBook CD PDF
No ratings yet
DataBook CD PDF
15 pages
Unit-2 C-Language
No ratings yet
Unit-2 C-Language
33 pages

Tigger

Uploaded by

Tigger

Uploaded by

Package ‘tigger’

October 10, 2023

Maintainer Susanna Marquez <[email protected]>

AIRRDb Example human immune repertoire data

cleanSeqs Clean up nucleotide sequences

findNovelAlleles Find novel alleles from repertoire sequencing data

• polymorphism_call: The novel allele call.

The following comments can appear in the note column:

• Novel allele found: A novel allele was detected.

# Note: In this example, with SampleGermlineIGHV,

# Find novel alleles and return relevant data

findUnmutatedCalls Determine which calls represent an unmutated allele

findUnmutatedCalls(allele_calls, sample_seqs, germline_db)

generateEvidence Generate evidence

# Generate input data

# Assemble evidence table

genotypeFasta Return the nucleotide sequences of a genotype

GermlineIGHV Human IGHV germlines

getMutatedPositions Find the location of mutations in a sequence

# Find the differences between the two

getMutCount Determine the mutation counts from allele calls

sample_seqs <- c(SampleGermlineIGHV[2], s2, s3)

# Compare each sequence to its assigned germline(s) to determine the distance

inferGenotype Infer a subject-specific genotype using a frequency method

data data.frame containing V allele calls from a single subject.

• gene: The gene name without allele.

inferGenotypeBayesian Infer a subject-specific genotype using a Bayesian approach

v_call column in data with V allele calls. Default is "v_call".

• gene: The gene name without allele.

1. Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced

insertPolymorphisms Insert polymorphisms into a nucleotide sequence

insertPolymorphisms replaces nucleotides in the desired locations of a provided sequence.

insertPolymorphisms(sequence, positions, nucleotides)

sequence starting nucletide sequence.

A sequence with the desired nucleotides in the provided locations.

plotGenotype Show a colorful representation of a genotype

genotype_a$SUBJECT <- "A"

plotNovel Visualize evidence of novel V alleles

• red: the position(s) pass(ess) the novel allele test

readIgFasta Read immunoglobulin sequences

reassignAlleles Correct allele calls based on a personalized genotype

# Use the personlized genotype to determine corrected allele assignments

SampleDb Example human immune repertoire data

• "SEQUENCE_IMGT": IMGT-gapped V(D)J nucleotide sequence.

SampleGenotype Example genotype inferrence results

SampleGermlineIGHV Example Human IGHV germlines

SampleNovel Example novel allele detection results

selectNovel Select rows containing novel alleles

sortAlleles Sort allele names

# Sort the alleles by name

# Sort the alleles by position in the locus

subsampleDb Subsample repertoire

Subsampled version of the input data.

Allele detection and genotyping

Input, output and formatting

updateAlleleNames Update IGHV allele names

allele_calls vector of strings respresenting IGHV allele names.

Vector of strings respresenting updated IGHV allele names.

Like updateAlleleNames, sortAlleles can help format a list of allele names.

# Update the alleles to the new names

writeFasta Write to a fasta file

writeFasta writes a named vector of sequences to a file in fasta format.

writeFasta(named_sequences, file, width = 60, append = FALSE)

∗ data updateAlleleNames, 3, 27, 30, 30

findNovelAlleles, 4, 9–11, 14, 16, 17, 21,

inferGenotype, 7, 9–11, 15, 19, 20, 25, 30

plotGenotype, 17, 19, 20, 30

readIgFasta, 22, 30, 32

You might also like