SNPMeta

SNPMeta is a Python and BioPython-based tool to generate "metadata" for single nucleotide polymorphisms (SNPs) for easy filtering, or submission to SNP databases. Information reported includes gene name, whether the SNP is coding or noncoding, and whether the SNP is synonymous or nonsynonymous. SNPMeta outputs in either a dbSNP submission report format, or a tab-delimited format.

Run SNPMeta Online

There is a Web-based version of SNPMeta available here. It only annotates with default settings, and only annotates a maximum of 20 SNPs at one time. Download the script for full functionality.

Download SNPMeta

SNPMeta is now available on GitHub! Please download SNPMeta from GitHub, as it will always have the most current version. Changes made to the script are also documented there.

User Manual

The user manual (pdf) is available here.

Publication

The SNPMeta paper is published in Molecular Ecology Resources, and is available here.

Example Files

Example dataset - a collection of 20 SNPs called in Drosophila melanogaster, from King et al. (2012).

Default output - dbSNP-formatted report, for the 20 SNPs in the example dataset.

Verbose output - a tab-delimited text file with much more information, for the 20 SNPs in the example dataset.

All annotations were generated using a copy of NCBI's 'nt' database, current as of October 16, 2012.

Citing SNPMeta

If you use any of SNPMeta's annotations in a publication, please cite Kono et al. (2013).

Companion Scripts

These are various helper scripts provided to help with running SNPMeta. They might have uses outside of that context, though.

Blast_SNPs.sh - A shell script to run BLAST on SNPs, and save the reports as XML. Requires an installation of NCBI's BLAST executables, and a Bash shell. Edit the script in a text editor so the variables match your system. Requires a directory with FASTA files, with one sequence per file. This script will create a new file for each FASTA in the directory, ending in '.xml', containing the BLAST report.
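As a rough illustration of what such a batch run involves, here is a minimal Python sketch. It is not the contents of Blast_SNPs.sh. It assumes the BLAST+ `blastn` program is on your PATH, that the database is called `nt`, and that each '.xml' file name is formed by replacing the FASTA extension. The function names `blast_command` and `blast_directory` are hypothetical.

```python
import os
import subprocess


def blast_command(fasta_path, database="nt", program="blastn"):
    """Build the argument list for one BLAST search with XML output."""
    # Assumption: the report name replaces the FASTA extension with '.xml'.
    xml_path = os.path.splitext(fasta_path)[0] + ".xml"
    return [
        program,
        "-query", fasta_path,
        "-db", database,
        "-outfmt", "5",  # 5 selects XML output in BLAST+
        "-out", xml_path,
    ]


def blast_directory(directory, database="nt"):
    """Run BLAST once for every FASTA file found in the directory."""
    for name in sorted(os.listdir(directory)):
        if name.endswith((".fasta", ".fa")):
            path = os.path.join(directory, name)
            # Raises CalledProcessError if BLAST exits with an error.
            subprocess.check_call(blast_command(path, database))
```

Calling `blast_directory("snps")` would then write one XML report next to each FASTA file in a directory named `snps`.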

Convert_Illumina.py - A Python script to convert from the Illumina contextual sequence format to FASTA, for input to SNPMeta. Accepts a text file with two fields, separated by a tab: the SNP Name, and the SNP contextual sequence. Outputs a FASTA file with IUPAC ambiguities to stdout.
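The conversion can be sketched in a few lines of Python. This is an illustration only, not the code in Convert_Illumina.py. It assumes the SNP is written as a bracketed pair of alleles such as [A/G] inside the contextual sequence, and that every SNP has exactly two different alleles. The function name `illumina_to_fasta` is hypothetical.

```python
import re

# IUPAC ambiguity codes for two-allele SNPs, keyed by the sorted allele pair.
IUPAC = {
    "AC": "M", "AG": "R", "AT": "W",
    "CG": "S", "CT": "Y", "GT": "K",
}

# Assumption: the SNP appears as a bracketed pair of alleles, e.g. [A/G].
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def illumina_to_fasta(line):
    """Convert one 'name<TAB>sequence' line into a two-line FASTA record."""
    name, seq = line.rstrip("\n").split("\t")

    def to_ambiguity(match):
        # Sort the alleles so that [G/A] and [A/G] give the same code.
        alleles = "".join(sorted(a.upper() for a in match.groups()))
        return IUPAC[alleles]

    return ">{0}\n{1}".format(name, SNP_PATTERN.sub(to_ambiguity, seq))


print(illumina_to_fasta("snp1\tACGT[A/G]TTGA"))
```

In this sketch, a bracketed pair with two identical alleles raises a KeyError rather than being silently passed through.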

GBSContextualSeq.py - A Python script to build SNP contextual sequences from a reference sequence and a VCF file. Generates a separate FASTA file for each sample listed in the VCF file. This is useful for generating contextual sequences from genotyping-by-sequencing (GBS) data, as the SNPs will be stored in a VCF file. Requires BioPython. Also requires Argparse if using Python < 2.7.
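The core step, cutting flanking sequence out of the reference around a VCF position, might look like the sketch below. It is a simplified illustration and not the code in GBSContextualSeq.py. It handles a single biallelic SNP, ignores per-sample genotypes, and assumes the SNP is written as an IUPAC ambiguity code. The function name `contextual_sequence` and the default flank length of 60 bases are hypothetical.

```python
# IUPAC ambiguity codes for two-allele SNPs, keyed by the sorted allele pair.
IUPAC = {
    "AC": "M", "AG": "R", "AT": "W",
    "CG": "S", "CT": "Y", "GT": "K",
}


def contextual_sequence(reference, pos, ref, alt, flank=60):
    """Return the flanking sequence around one SNP, with the SNP as an IUPAC code.

    reference is the reference sequence as a string, pos is the VCF POS
    value, and ref and alt are the two single-base alleles.
    """
    index = pos - 1  # VCF positions are 1-based; Python strings are 0-based
    if reference[index].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    start = max(0, index - flank)  # do not run off the start of the sequence
    upstream = reference[start:index]
    downstream = reference[index + 1:index + 1 + flank]
    code = IUPAC["".join(sorted((ref.upper(), alt.upper())))]
    return upstream + code + downstream


print(contextual_sequence("AACCGGTT", pos=4, ref="C", alt="T", flank=2))
```

The check against the reference base catches the common mistake of pairing a VCF file with the wrong reference sequence.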

Split_FASTA.py - A Python script to split a large FASTA file into smaller files. Takes a FASTA file and a positive integer as arguments. Requires BioPython.
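Splitting a FASTA file comes down to reading records and grouping them into batches. The sketch below is not the code in Split_FASTA.py: it uses plain Python instead of BioPython to stay self-contained, and it assumes the integer argument means the number of sequences per output file, which may not match the real script. The function names `read_fasta` and `split_records` are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, per_file):
    """Group records into lists of at most per_file records each."""
    if per_file < 1:
        raise ValueError("per_file must be a positive integer")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == per_file:
            yield batch
            batch = []
    if batch:  # the last batch may be smaller than per_file
        yield batch
```

With `per_file` set to 1, each batch holds a single sequence, which is the layout Blast_SNPs.sh expects.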

Frequently Asked Questions

How do I cite SNPMeta?

If you use SNPMeta's annotations in your research, please cite Kono et al. (2013).

Can I run SNPMeta on Windows?

Yes! Running SNPMeta on Windows is a little different from running it on UNIX-like operating systems, however. More detailed information is provided in the user manual.

Which Python version should I use?

SNPMeta is currently written to run on Python 2.7.3. The argparse library is part of the Python 2.7 standard library, so it does not need to be installed separately. Python 2.6.* will also work, provided that argparse is installed. SNPMeta uses syntax that is not implemented in Python 2.5 and earlier, so those versions will not work. SNPMeta is untested with Python 3, but this may change in the future.
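If you are unsure which case applies to your installation, a short check along these lines may help. It is only a sketch of the rules stated above and is not part of SNPMeta. The function name `python_support` is hypothetical.

```python
import sys


def python_support(version=None):
    """Classify a (major, minor) Python version against the rules above."""
    # Default to the interpreter that is running this code.
    major, minor = (version or sys.version_info)[:2]
    if major >= 3:
        return "untested"
    if (major, minor) == (2, 7):
        return "supported"
    if (major, minor) == (2, 6):
        return "supported if argparse is installed"
    return "unsupported"


print(python_support())
```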

I need a C compiler to install BioPython. Where can I get one?

If you are working on GNU/Linux, then the GNU C Compiler (part of the GNU Compiler Collection) should be available. If your distribution does not provide gcc by default, then you can use its package manager to install it. If you are working on MacOS, then Apple provides a C compiler free with their Developer Tools package. The "Command Line Tools for Xcode" can be used if downloading the full Xcode package takes too much space. A C compiler should not be necessary for installing SNPMeta's dependencies on Windows.

How long does SNPMeta take to run?

If you are running BLAST through SNPMeta, then each SNP takes a little over one minute to process. Most of that time is spent waiting for BLAST results. If you have run BLAST beforehand, and are annotating from XML reports, then SNPMeta is significantly faster, and can annotate over 1,000 SNPs in about 20 minutes, or roughly one second per SNP.

Last updated: 2013-08-21


The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.