SNPMeta is a Python and BioPython-based tool to generate "metadata" for single nucleotide polymorphisms (SNPs) for easy filtering, or submission to SNP databases. Information reported includes gene name, whether the SNP is coding or noncoding, and whether the SNP is synonymous or nonsynonymous. SNPMeta outputs in either a dbSNP submission report format, or a tab-delimited format.
There is a Web-based version of SNPMeta available here. It only annotates with default settings, and only annotates a maximum of 20 SNPs at one time. Download the script for full functionality.
SNPMeta is now available on GitHub! Please download SNPMeta from GitHub, as it will have the most current version. Changes made to the script will also be documented there.
The user manual (pdf) is available here.
Default output - dbSNP-formatted report, for the 20 SNPs in the example dataset.
Verbose output - a tab-delimited text file with much more information, for the 20 SNPs in the example dataset.
All annotations were generated using a copy of NCBI's 'nt' database, current as of October 16, 2012.
If you use any of SNPMeta's annotations in a publication, please cite Kono et al. (2013).
The following helper scripts are provided to help with running SNPMeta, though they may also be useful outside of that context.
Blast_SNPs.sh - A shell script to run BLAST on SNPs, and save the reports as XML. Requires an installation of NCBI's BLAST executables, and a Bash shell. Edit the script in a text editor so the variables match your system. Requires a directory with FASTA files, with one sequence per file. This script will create a new file for each FASTA in the directory, ending in '.xml', containing the BLAST report.
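As a rough illustration of what such a per-file BLAST run involves, the sketch below builds one command line per FASTA file. It is not the contents of Blast_SNPs.sh. It assumes the BLAST+ `blastn` executable, the 'nt' database name, and that the report name is formed by replacing the FASTA extension with '.xml'; the function name `blast_command` is hypothetical.

```python
import os

def blast_command(fasta_path, db="nt", program="blastn"):
    """Build a BLAST+ command that writes an XML report for one FASTA file.

    Assumptions (not taken from Blast_SNPs.sh): the BLAST+ 'blastn' program
    is used, and the report replaces the FASTA extension with '.xml'.
    """
    xml_path = os.path.splitext(fasta_path)[0] + ".xml"
    return [
        program,
        "-query", fasta_path,   # one sequence per file
        "-db", db,              # database to search
        "-outfmt", "5",         # 5 = BLAST XML output
        "-out", xml_path,       # where the report is saved
    ]

# To actually run a command, pass the list to subprocess.call(), e.g.:
#   import subprocess
#   subprocess.call(blast_command("snps/snp1.fasta"))
```

Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.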
Convert_Illumina.py - A Python script to convert from the Illumina contextual sequence format to FASTA, for input to SNPMeta. Accepts a text file with two fields, separated by a tab: the SNP Name, and the SNP contextual sequence. Outputs a FASTA file with IUPAC ambiguities to stdout.
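The conversion Convert_Illumina.py performs can be sketched as follows. This is an illustration rather than the script itself: it assumes the contextual sequence marks a biallelic SNP as `[A/G]`, and the names `illumina_to_iupac` and `to_fasta` are hypothetical.

```python
import re

# IUPAC ambiguity codes for each unordered pair of bases.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}

# Assumed Illumina notation for a biallelic SNP, e.g. "[A/G]".
BRACKET = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)

def illumina_to_iupac(seq):
    """Replace each [X/Y] in seq with the matching IUPAC ambiguity code."""
    def repl(match):
        a, b = match.group(1).upper(), match.group(2).upper()
        if a == b:
            raise ValueError("SNP alleles are identical: %s" % match.group(0))
        return IUPAC[frozenset((a, b))]
    return BRACKET.sub(repl, seq)

def to_fasta(lines):
    """Convert tab-separated 'name<TAB>sequence' lines into FASTA text."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, seq = line.split("\t")
        records.append(">%s\n%s" % (name, illumina_to_iupac(seq)))
    return "\n".join(records)
```

For example, `illumina_to_iupac("ACGT[A/G]TTGA")` gives `"ACGTRTTGA"`, since R stands for A or G.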
GBSContextualSeq.py - A Python script to build SNP contextual sequences from a reference sequence and a VCF file. Generates a separate FASTA file for each sample listed in the VCF file. This is useful for generating contextual sequences from genotyping-by-sequencing (GBS) data, where the SNPs are stored in a VCF file. Requires BioPython. Also requires Argparse if using Python < 2.7.
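The core idea of building a contextual sequence around a VCF position can be sketched in a few lines. This is a simplified illustration, not GBSContextualSeq.py: it handles a single biallelic SNP on one reference string, ignores per-sample genotypes, and assumes the SNP is written as an IUPAC ambiguity code. The function names and the `flank` parameter are hypothetical.

```python
# IUPAC ambiguity codes for each unordered pair of bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def parse_vcf_line(line):
    """Return (chrom, pos, id, ref, alt) from one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt = fields[:5]
    return chrom, int(pos), snp_id, ref, alt

def contextual_sequence(reference, pos, ref, alt, flank=3):
    """Return the flanking bases around a SNP, with the SNP as an IUPAC code.

    pos is 1-based, as in VCF. flank is the number of bases kept on each
    side (an illustrative parameter, not an option of the real script).
    """
    idx = pos - 1  # convert the 1-based VCF position to a 0-based index
    if reference[idx].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    left = reference[max(0, idx - flank):idx]
    right = reference[idx + 1:idx + 1 + flank]
    return left + IUPAC[frozenset((ref.upper(), alt.upper()))] + right
```

Checking the REF allele against the reference base catches off-by-one errors and mismatched reference files early.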
Split_FASTA.py - A Python script to split a large FASTA file into smaller files. Takes a FASTA file and a positive integer as arguments. Requires BioPython.
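A minimal sketch of FASTA splitting is shown below. It is not Split_FASTA.py: it parses FASTA with plain Python rather than BioPython so that it is self-contained, and it assumes the integer argument means the maximum number of sequences per output file, which the description above does not state. The function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def split_records(records, per_file):
    """Group records into consecutive batches of at most per_file records.

    Assumption: the integer is a per-file sequence count.
    """
    if per_file < 1:
        raise ValueError("per_file must be a positive integer")
    return [records[i:i + per_file] for i in range(0, len(records), per_file)]
```

Each batch returned by `split_records` could then be written to its own numbered output file.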
If you use SNPMeta's annotations in your research, please cite Kono et al. (2013).
Yes! Running SNPMeta on Windows is a little different from running it on UNIX-like operating systems, however. More detailed information is provided in the user manual.
SNPMeta is currently written to run on Python version 2.7.3. Python 2.6.* will also work, provided that the argparse library is installed. Python 2.7 includes argparse in its standard library, so no separate installation is needed there. SNPMeta is untested with Python 3, but this may change in the future. SNPMeta uses syntax that is not implemented in Python 2.5 and earlier, so those versions will not work.
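The version requirements above can be summarized in a small check. This is only an illustrative sketch and not part of SNPMeta; the function name `python_support` and its return strings are hypothetical.

```python
import sys

def python_support(version_info=sys.version_info):
    """Classify a Python version against the requirements described above."""
    major, minor = version_info[0], version_info[1]
    if major == 2 and minor >= 7:
        return "supported"        # argparse ships with the standard library
    if major == 2 and minor == 6:
        return "needs argparse"   # works once argparse is installed separately
    if major == 2:
        return "unsupported"      # 2.5 and earlier lack required syntax
    return "untested"             # Python 3 has not been tested

# Calling python_support() with no argument checks the running interpreter.
```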
If you are working on GNU/Linux, then the GNU C Compiler (part of the GNU Compiler Collection) should be available. If your distribution does not provide gcc by default, then you can use its package manager to install it. If you are working on MacOS, then Apple provides a C compiler free with their Developer Tools package. The "Command Line Tools for Xcode" can be used if downloading the full Xcode package takes too much space. A C compiler should not be necessary for installing SNPMeta's dependencies on Windows.
If you are running BLAST through SNPMeta, then each SNP takes a little over one minute to process, and most of that time is spent waiting for BLAST results. If you have run BLAST beforehand and are annotating from the XML reports, then SNPMeta is significantly faster: it can annotate over 1,000 SNPs in about 20 minutes, or roughly one second per SNP.
Last updated: 2013-08-21