Code

These are small pieces of code I have written. I have never taken classes in programming or computer science, so these are probably messy or inefficient. Most of them are written in Bash or Python, but R scripts might start appearing here as I learn the language. See the Morrell Lab page for other useful pieces of code.

SNPMeta (Python)

Strip_BAM.sh (Bash)

This Bash script trims a BAM file and its header down to a handful of user-specified chromosomes (or contigs). It is especially useful for reference sequences that contain many small contigs. It depends on SAMtools and writes SAM-format output. Please note that the output is not guaranteed to be sorted.

Usage:
Strip_BAM.sh -b <file> -c <seq1>,<seq2>,...,<seqK>
Will remove all but the sequences listed from the given BAM file.
Output will be written in SAM format to stdout. Requires samtools
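
As a rough illustration of the filtering idea only (this is not the script itself, which relies on SAMtools), the Python sketch below applies the same logic to SAM text lines: it keeps `@SQ` header lines and alignment records only for the requested sequence names. The function name `filter_sam_lines` is hypothetical. The sketch assumes tab-delimited SAM input in which `@SQ` lines carry an `SN:` tag and the reference name is the third column of each alignment record. It leaves mate fields untouched.

```python
def filter_sam_lines(lines, keep):
    """Yield SAM lines that belong to the reference sequences in `keep`.

    Hypothetical helper for illustration; assumes tab-delimited SAM text.
    """
    keep = set(keep)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                # The SN: tag on an @SQ line names the reference sequence.
                names = [f[3:] for f in fields[1:] if f.startswith("SN:")]
                if not names or names[0] not in keep:
                    continue
            # Other header lines (@HD, @RG, @PG, ...) pass through unchanged.
            yield line
        elif len(fields) > 2 and fields[2] in keep:
            # Column 3 of an alignment record is the reference name.
            yield line


if __name__ == "__main__":
    import sys

    # Example: samtools view -h in.bam | python this_sketch.py chr1,chr2
    wanted = sys.argv[1].split(",") if len(sys.argv) > 1 else []
    sys.stdout.writelines(filter_sam_lines(sys.stdin, wanted))
```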

[Download] 2013-07-03

codons.py (Python)

This Python script interactively takes three-letter strings of nucleotide sequence (either DNA or RNA, in upper or lower case) and prints out the amino acid translation(s). It recognizes IUPAC ambiguity codes and prints every codon that an ambiguous input could represent, each with its amino acid. There are probably more elegant ways to handle this problem.

Usage:
$ ./codons.py
Enter codon (type 'q' to quit): TTT
Codon Amino Acid
TTT F
Enter codon (type 'q' to quit): KTT
Codon Amino Acid
GTT V
TTT F
Enter codon (type 'q' to quit): arg
Codon Amino Acid
AGG R
AAG K
Enter codon (type 'q' to quit): ugu
Codon Amino Acid
TGT C
Enter codon (type 'q' to quit): q
$
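
For readers curious how the ambiguity expansion can work, here is a minimal sketch. It is not the code in codons.py. It assumes the standard genetic code, treats U as T, and uses a hypothetical function name, `translate_ambiguous`. Results are sorted alphabetically by codon, so the order may differ from the session shown above.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(codon):
    """Return sorted (codon, amino acid) pairs for a possibly ambiguous codon."""
    codon = codon.upper().replace("U", "T")  # accept RNA and lower case
    if len(codon) != 3 or any(base not in IUPAC for base in codon):
        raise ValueError("expected three IUPAC nucleotide codes: %r" % codon)
    # Cartesian product of the possible bases at each of the three positions.
    expanded = ("".join(p) for p in product(*(IUPAC[b] for b in codon)))
    return sorted((c, CODON_TABLE[c]) for c in expanded)


if __name__ == "__main__":
    for query in ("TTT", "KTT", "arg", "ugu"):
        for codon, amino_acid in translate_ambiguous(query):
            print(codon, amino_acid)
```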

[Download] 2011-12-12

sfs_extraction.py (Python)

This Python script takes raw output from SFS.pl (Cartwright and Ross-Ibarra 2011) and extracts the site frequency spectra. It prints these to stdout for easy redirection or piping. Included is a companion Bash script, count-empty.sh, that looks for files that have no replacement sites, as SFS.pl omits these from its output.

Usage:
$ ./sfs_extraction.py [SFS.pl output] > sfs_extraction_output
$ ./count-empty.sh
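
For anyone unfamiliar with the term, a site frequency spectrum counts how many segregating sites have the derived allele in exactly 1, 2, ..., n-1 of n sampled sequences. The sketch below computes an unfolded spectrum from a plain list of derived-allele counts. It does not parse SFS.pl output, and the function name and input layout are assumptions made for illustration.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_samples):
    """Return [sites with 1 derived copy, sites with 2, ..., sites with n-1].

    `derived_counts` holds one derived-allele count per site (hypothetical
    input layout). Sites with 0 or n_samples derived copies are not
    segregating and are left out of the spectrum.
    """
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n_samples)]


if __name__ == "__main__":
    # Five sites observed in a sample of five sequences.
    print(site_frequency_spectrum([1, 1, 2, 4, 1], 5))
```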

[Download] 2011-10-17

polydNdS_Multiple.sh (Bash)

This shell script (with assistance from Ana Gonzales) is useful for examining the site frequency spectrum at silent and replacement sites from multiple loci. The script runs the libsequence program polydNdS and splits loci into silent and replacement sites. See the readme for details. Sample fasta files and options file are provided.

Usage:
$ ./polydNdS_Multiple.sh

[Download] 2011-10-03

SNPs.py (Python)

This Python script builds a FASTA alignment from a polytable, a reference sequence in FASTA format, and an integer offset. The offset is the coordinate of the first base of the reference sequence on the reference genome, so that SNP positions given in genome coordinates can be placed on the reference sequence. The sequences print to stdout.

Usage:
$ ./SNPs.py [polytable file] [reference sequence] [integer offset]
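
The role of the offset can be sketched as follows. This is not the code in SNPs.py and it does not read polytable files. It assumes the SNP data are already in memory as a list of genome coordinates plus one allele string per sample, and that the offset and the SNP positions use the same coordinate system. The name `build_alignment` and its arguments are hypothetical.

```python
def build_alignment(reference, offset, positions, samples):
    """Return FASTA text with each sample's alleles placed on the reference.

    reference: reference sequence as a string
    offset:    genome coordinate of the first base of `reference`
    positions: genome coordinates of the SNPs
    samples:   dict mapping sample name to a string of alleles,
               one character per entry in `positions`
    """
    records = []
    for name, alleles in samples.items():
        if len(alleles) != len(positions):
            raise ValueError("allele count mismatch for %s" % name)
        seq = list(reference)
        for pos, allele in zip(positions, alleles):
            index = pos - offset  # genome coordinate -> index in reference
            if not 0 <= index < len(seq):
                raise ValueError("position %d is outside the reference" % pos)
            seq[index] = allele
        records.append(">%s\n%s" % (name, "".join(seq)))
    return "\n".join(records)


if __name__ == "__main__":
    print(build_alignment("ACGTACGT", 100, [101, 105],
                          {"s1": "TG", "s2": "CC"}))
```

Missing-data characters and indels are not handled in this sketch.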

[Download] 2011-10-17


The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.