Dan Gatti and Belinda Cornes have updated the annotation files for mouse genome build GRCm39. These also include sex-averaged genetic map positions for an updated version of the Cox et al. (2009) map; see CoxMapV3. (Note that the CoxMapV3 maps were corrected on 2023-03-17 using the original crimap software rather than the "improved" version of crimap, and that we've further "smoothed" the maps slightly to avoid segments with 0 recombination.)
- GigaMUGA: gm_uwisc_v4.csv, gm_uwisc_dict_v4.csv
- MegaMUGA: mm_uwisc_v4.csv, mm_uwisc_dict_v4.csv
- MiniMUGA: mini_uwisc_v5.csv, mini_uwisc_dict_v5.csv
- Original MUGA: muga_uwisc_v4.csv, muga_uwisc_dict_v4.csv
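The note above about "smoothing" the maps concerns intervals between adjacent markers with 0 cM of recombination. The exact smoothing used for CoxMapV3 isn't described here, so the following is only a hypothetical illustration of one simple way to remove such zero-length intervals: impose a tiny minimum gap between consecutive map positions. The function name and the default gap are assumptions.

```python
def enforce_min_gap(positions_cM, min_gap=1e-4):
    """Hypothetical helper: return a copy of the cM positions (ordered by
    physical position) in which each position exceeds the previous one by at
    least min_gap, so that no interval has 0 recombination.

    Illustrative sketch only; not the smoothing actually used for CoxMapV3.
    """
    if any(b < a for a, b in zip(positions_cM, positions_cM[1:])):
        raise ValueError("positions must be non-decreasing")
    smoothed = []
    for pos in positions_cM:
        # Push a position forward only when it is too close to its predecessor.
        if smoothed and pos < smoothed[-1] + min_gap:
            pos = smoothed[-1] + min_gap
        smoothed.append(pos)
    return smoothed


# Three markers tied at 0.5 cM get spread out by 0.01 cM each.
print(enforce_min_gap([0.0, 0.5, 0.5, 0.5, 1.2], min_gap=0.01))
```

Because shifts accumulate along a long run of tied markers, a gap like this only makes sense when it is tiny relative to the typical spacing between markers.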
I had identified a number of potential problems in the GigaMUGA annotation file, as well as discrepancies between the GigaMUGA and MegaMUGA files. I suspect that some of the columns in the GigaMUGA annotation file from UNC have been at least partially scrambled.
I emailed GeneSeek to get files with the probe sequences on the
arrays, and on 2018-11-02 I received a .xlsx file by email from Ben
Pejsar, Genomic Market Development Manager, Neogen GeneSeek
Operations.
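Before the probe sequences can be searched against the genome, they need to be pulled out of the spreadsheet (for example, exported to CSV) and written as FASTA. A minimal sketch, assuming a CSV export with one row per marker; the column names "marker" and "probe" and both function names are hypothetical placeholders, not the code in this repository.

```python
import csv
import io


def read_probes(csv_text, name_col="marker", seq_col="probe"):
    """Read (marker, probe sequence) pairs from CSV text.

    The column names are hypothetical; use whatever headers the exported
    worksheet actually has.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[name_col], row[seq_col]) for row in reader]


def to_fasta(probes, line_width=60):
    """Format (marker, sequence) pairs as FASTA text, wrapping long sequences."""
    lines = []
    for marker, seq in probes:
        seq = seq.strip().upper()
        lines.append(f">{marker}")
        for i in range(0, len(seq), line_width):
            lines.append(seq[i:i + line_width])
    return "\n".join(lines) + "\n"


csv_text = "marker,probe\nsnpA,acgtacgt\nsnpB,GGCCTTAA\n"
print(to_fasta(read_probes(csv_text), line_width=4), end="")
```

The resulting file could then be searched with the command-line BLAST+ tools, e.g. makeblastdb -in genome.fa -dbtype nucl -out mouse followed by blastn -query probes.fa -db mouse -outfmt 6 -out hits.tsv (file names are placeholders; the options actually used are in the R code in the Blast directory).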
My goals were to:

- blast the sequences for all markers in each of the arrays against the mouse genome
- figure out which SNPs have a single hit in the mouse genome, and where that hit is
- compare the sequences and probe locations, and the markers with multiple hits, to the UNC annotation file
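The first two goals amount to counting, for each probe, the places where it matches the genome perfectly along its full length. A minimal sketch, assuming the hits come from blastn's tabular output (-outfmt 6, default columns) and that each probe's length is known; the function names and the unique/multi/none labels are hypothetical, not the code in this repository. Markers without exactly one perfect hit get None for their location, playing the role of NA.

```python
import csv
from collections import defaultdict


def perfect_hits(blast_lines, probe_lengths):
    """Collect full-length, exact matches from blastn tabular output.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = defaultdict(list)
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid = row[0], row[1]
        aln_len, mismatch, gapopen = int(row[3]), int(row[4]), int(row[5])
        qstart, qend, sstart, send = (int(x) for x in row[6:10])
        n = probe_lengths[qseqid]
        # A perfect match covers the whole probe with no mismatches or gaps.
        if aln_len == n and qstart == 1 and qend == n and mismatch == 0 and gapopen == 0:
            hits[qseqid].append((sseqid, sstart, send))
    return hits


def classify(probe_lengths, hits):
    """Label each marker unique/multi/none; keep a location only when unique."""
    result = {}
    for marker in probe_lengths:
        marker_hits = hits.get(marker, [])
        if len(marker_hits) == 1:
            chrom, sstart, send = marker_hits[0]
            result[marker] = ("unique", chrom, sstart, send)
        else:
            status = "multi" if marker_hits else "none"
            result[marker] = (status, None, None, None)  # location is NA
    return result
```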
Summary of findings:

- the unique column in the UNC annotation file for the GigaMUGA array was messed up.
- we should use NA for the chromosome and position of markers whose probe does not have a single perfect match in the mouse genome assembly.
- for a small number of markers (the transversions, with two-bead Illumina probes), the probe sequence in the GeneSeek file includes the SNP, and the SNP basepair positions in the UNC GigaMUGA file were off by 1.
- For the markers with unique probes, the GigaMUGA annotation file has the correct chromosome and position (except for the off-by-1 cases), while the MegaMUGA annotation file has six markers with incorrect chromosome assignments.
- There are a bunch of markers with different names but the same probe sequence. More troubling, there are 29 markers that are on both the MegaMUGA and GigaMUGA arrays but with different probes on the two arrays. These probes switch from the plus to the minus strand without any change in the marker name, and for 8 of them, the sequence on one array is either not unique or has no perfect match in the genome.
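The off-by-1 issue comes from where the SNP sits relative to the probe. As a simplified sketch (an assumption, not the repository's R code): for a uniquely mapped probe, BLAST's send is the genomic position of the probe's 3' end, and minus-strand hits are reported with sstart > send. A probe that stops just before the SNP puts the SNP one base beyond send; a probe whose sequence already includes the SNP puts it at send itself. The function name and the probe_includes_snp flag are hypothetical.

```python
def snp_position(sstart, send, probe_includes_snp):
    """Hypothetical helper: genomic position of the SNP assayed by a probe.

    sstart/send are BLAST subject coordinates of a full-length hit; send
    corresponds to the probe's 3' end. Minus-strand hits have sstart > send.
    """
    if probe_includes_snp:
        return send  # the probe's last base is the SNP itself
    # Otherwise the SNP is the next base past the probe's 3' end.
    return send + 1 if send > sstart else send - 1


print(snp_position(1001, 1050, probe_includes_snp=False))  # plus strand: 1051
print(snp_position(500, 451, probe_includes_snp=False))    # minus strand: 450
```

Treating a probe that includes the SNP as if it stopped before the SNP gives send + 1 (or send - 1), which is exactly one base off.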
The following document describes what I've found:
The new annotation files are in the UWisc directory of this repository.
This includes a file, mm_gm_commonmark_uwisc_v1.csv,
indicating which markers assay the same SNP, both within and between
the two arrays.
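One way to find markers that assay the same SNP, within or between arrays, is to group the uniquely mapped markers by SNP location. This is a hypothetical sketch, not necessarily how mm_gm_commonmark_uwisc_v1.csv was built; the (array, marker, chromosome, position) input tuples and the function name are assumptions, and markers without a unique location are skipped.

```python
from collections import defaultdict


def markers_by_snp(annotations):
    """Group markers that sit at the same SNP.

    annotations: iterable of (array, marker, chrom, pos) tuples, with chrom
    and pos set to None when the marker is not uniquely mapped.
    Returns {(chrom, pos): sorted [(array, marker), ...]} for SNPs assayed
    by more than one marker.
    """
    groups = defaultdict(list)
    for array, marker, chrom, pos in annotations:
        if chrom is None or pos is None:
            continue  # no unique location, so it can't be placed
        groups[(chrom, pos)].append((array, marker))
    return {snp: sorted(members) for snp, members in groups.items() if len(members) > 1}


example = [
    ("mm", "a", "1", 100),
    ("gm", "a", "1", 100),
    ("gm", "b", "1", 100),
    ("gm", "c", "2", 5),
    ("mm", "d", None, None),
]
print(markers_by_snp(example))
```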
- UWisc - the new annotation files
- Blast - includes R code for constructing fasta files with the array sequences, and for using blastn to map them to the mouse genome. The ReadMe file explains the sources of the mouse genome files and of the command-line blastn program. (I installed blastn on Linux with sudo apt install ncbi-blast+.)
- GeneSeek - includes the .xlsx file with probe sequences, from Ben Pejsar at GeneSeek.
- Python - xlsx2csv.py, a script for pulling a worksheet from a .xlsx file as a CSV file.
- R - R code and R Markdown files with the analyses. new_annotations.Rmd is the key document.
- UNC - the ReadMe file has URLs for the UNC annotation files.
- GenMaps - raw genetic map files derived using the Mouse Map Converter.
- docs - compiled R Markdown files, available on the web:
Vivek Kumar asked me to take a look at the miniMUGA array, using an annotation file he got from Fernando Pardo Manuel de Villena.
The miniMUGA paper has now been published, with some additions to the array. It was initially posted on bioRxiv on 2020-03-14 and provides official annotations in its Supplemental material, as Table S2.
My original analysis is at https://kbroman.org/MUGAarrays/mini_annotations.html
But I've now added a comparison to the new annotations: https://kbroman.org/MUGAarrays/mini_revisited.html
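A comparison like this one boils down to matching two annotation tables by marker name. A minimal sketch, assuming both sets are dicts from marker name to (chromosome, position) in the same genome build; the function name and the report keys are hypothetical.

```python
def compare_annotations(mine, theirs):
    """Compare two annotation sets, each a dict: marker -> (chrom, pos).

    Assumes both use the same genome build. Returns sorted marker lists for
    markers found in only one set, markers on different chromosomes, and
    markers on the same chromosome but at different positions.
    """
    shared = mine.keys() & theirs.keys()
    return {
        "only_mine": sorted(mine.keys() - theirs.keys()),
        "only_theirs": sorted(theirs.keys() - mine.keys()),
        "chr_differs": sorted(m for m in shared if mine[m][0] != theirs[m][0]),
        "pos_differs": sorted(
            m for m in shared
            if mine[m][0] == theirs[m][0] and mine[m][1] != theirs[m][1]
        ),
    }


mine = {"a": ("1", 100), "b": ("2", 200), "c": ("3", 300)}
theirs = {"a": ("1", 100), "b": ("5", 200), "c": ("3", 301), "d": ("X", 5)}
print(compare_annotations(mine, theirs))
```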
My annotation files are in the UWisc directory, with the
original ones labeled v1 and the ones based on the new array v2.
New miniMUGA annotation information was also provided with Blanchard et al. (2024). (For the uniquely mapped markers, these new annotations match the positions that I provide.) See Supplementary Table 2, whose columns are defined in Supplementary Table 3. You can download the Supplementary Table 2 CSV file directly with the link https://gsajournals.figshare.com/ndownloader/files/47717242 Note that these positions are in build GRCm38 (mm10).
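To load Supplementary Table 2 in Python, the standard library is enough. A minimal sketch, assuming the file is UTF-8 CSV with a header row; since the column meanings are given in Supplementary Table 3, no column names are assumed here, and the download link above is passed in as url. The function names are hypothetical. Keep in mind that the positions are GRCm38 (mm10), so they are not directly comparable to GRCm39 positions.

```python
import csv
import io
import urllib.request


def parse_csv_text(text):
    """Parse CSV text (with a header row) into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def load_supp_table(url, timeout=60):
    """Download a CSV file and parse it; assumes UTF-8 (a leading BOM is dropped)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8-sig")
    return parse_csv_text(text)


# rows = load_supp_table(url)  # url: the Supplementary Table 2 link above
# Positions in this table are in GRCm38 (mm10), not GRCm39.
```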
Mandy Chen asked me to take a look at the original MUGA array, using the annotations at UNC, http://csbio.unc.edu/MUGA/snps.muga.Rdata.
My analysis is at https://kbroman.org/MUGAarrays/muga_annotations.html
My annotation files are in the UWisc directory.
The code in this repository is released under the MIT License.