Contact Steve Cannon (cann0010@tc.umn.edu) regarding this document,
or bioperl-l@bioperl.org regarding Bioperl questions.
July, 2002
OLD STUFF!
I wrote the instructions below in 2001 and early 2002, when Bioperl
and OS X and Fink and CPAN/OS X were all less mature than they are now.
Many distributions and patches and other details have probably changed, so
most of these instructions will be of little use to anyone.
| Section | Relative effort (T = one "trick") |
| Overview and Introduction | |
| Bioperl | easy |
| NCBI Blast | TT |
| Clustalw | T |
| t-coffee | T |
| Bioperl XS extensions | T |
| AcePerl Modules | not ported for OS X |
| File::Temp | easy |
| IO::Scalar & IO::String | easy |
| LWP:: Modules | TTT |
| XML Enabling Modules | TT |
| Storable and Text::Shellwords modules | easy |
| GD.pm graphics library and related stuff | TTT |
These instructions are for people who would like to use BioPerl on Mac OS X. Most of those people -- myself included -- are relatively new to a Unix environment, and so have to contend with a new OS, a new and rather complex scripting interface to a large number of modules and external bioinformatics packages, and some often tricky installations of BioPerl dependencies. Having gone through this experience a few times, I thought I would make my installation notes available to others. (By the way, what I do with Bioperl is "genome archaeology" and gene family evolutionary analysis.) Although I have tried to be thorough in these instructions, I haven't tried to duplicate information in the README files, so be sure to check those for everything you install. I am interested in improving these instructions, so please email me if you run into problems or have suggestions.
There are at least three distinct ways to install the BioPerl dependencies on OS X: 1) do it "from scratch," following the instructions on this page; 2) use the CPAN.pm module and Bundle::BioPerl, which should install almost all of the dependencies nearly automatically; or 3) try the new Fink bioperl-pm package, which should also install almost all of the dependencies nearly automatically. Unfortunately, none of these options is entirely free of hassle and risk.
One more piece of advice before you start: much of this software is under active development, there are lots of packages, and you will be installing them with "superuser" privileges, so you can do some damage to your system. Back up your important data and settings before you start. Some related general OS X advice: I strongly recommend running Disk First Aid (from the OS 9 CD, starting up from that CD by holding down the C key) IMMEDIATELY after you spot any serious problem on your machine. One example of a "problem" is any software crash that requires a hard restart of the machine, particularly if it doesn't restart cleanly (hangs partway through the restart, etc.). Another example is an application that has always worked in the past suddenly behaving very differently, failing to locate files, etc.
A couple of FAQs:
Will BioPerl and its dependencies install under OS X 10.2 (Jaguar)?
Yes. I haven't tried myself, but I've received several positive reports, and there should be no problems.
I type "make blablabla" and get the message "make: Command not found."
You need to install the Developer Tools from the OS X Developer CD. This CD should be included in your OS X 10.2 distribution, and the tools are also available for free download from the Apple Developer Connection (after registering at that site).
The basic installation of Bioperl on OS X goes without a hitch. Installation of the dependent packages and associated programs sometimes takes a little thought or a small trick (maybe just setting an environment variable or updating a symbol table), but enough to cause someone new to the Unix installation process some grief. The T's in the table at the top of this page indicate the relative effort (or number of "Tricks") involved in each installation.
Setting up .cshrc
First, check whether .cshrc exists:
cd ~your_user_name
ls -la      * or the alias to this command, ll
If you don't see .cshrc, create it. To see what is currently in your default path, go
env
and read the PATH line. You want to keep these paths, so Darwin can find the Unix shell commands! So that's what we'll add in .cshrc first.
For example, using pico:
pico .cshrc
Then type or cut and paste the following. While you're at it, also add an alias and an environment variable. The BLASTDB and BLASTDIR variables won't be useful until you've added Blast, but at least this gives examples of the content of .cshrc. The alias will save you some keystrokes (once you've created that directory). Note that if you mess up this initial path information, Unix commands such as ls and rm will no longer work, and you'll have to find some other way to delete or fix this hidden file (for example, overwrite using BBEdit with a corrected or blank .cshrc file, or delete the file while in OS 9).
set path=(. \
    ~/bin/powerpc-apple-darwin \
    /usr/local/bin \
    /usr/bin \
    /bin \
    /usr/local/sbin \
    /usr/sbin \
    /sbin \
    )
alias bioinf 'cd /Applications/bioinf'
setenv BLASTDB "/Applications/bioinf/ncbi/build/data"
setenv BLASTDIR "/Applications/bioinf/ncbi/build"
Then save and exit pico. Now make Darwin see your changes, either by quitting Terminal and starting it up again, or by going:
source .cshrc
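To confirm that the new path took effect, a small sketch can report which directories are (or are not) on your PATH. It is written for the Bourne shell (sh) rather than tcsh, and the function names path_has and check_paths are hypothetical helpers, not part of OS X or Bioperl:

```shell
#!/bin/sh
# Hypothetical helpers (not part of OS X or Bioperl): check that the
# directories from the .cshrc path list made it into $PATH.

# path_has DIR: status 0 if DIR is one of the colon-separated
# entries of $PATH, status 1 otherwise.
path_has() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# check_paths DIR...: print "ok: DIR" or "MISSING: DIR" for each
# argument; status 1 if any directory is missing.
check_paths() {
    failed=0
    for dir in "$@"; do
        if path_has "$dir"; then
            echo "ok: $dir"
        else
            echo "MISSING: $dir"
            failed=1
        fi
    done
    return $failed
}
```

For example, save it as checkpath.sh and run sh -c '. ./checkpath.sh; check_paths /usr/local/bin /usr/bin /bin' from Terminal. tcsh keeps its path list and the PATH environment variable in sync, so the sh process sees the same directories.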
Download and unzip a BioPerl distribution from Bioperl.org. In these instructions I use bioperl-1.0, but you should use either the most current developer release (if different from 1.0) or the latest stable release. We'll retrieve the distribution from the Bioperl ftp site using 'curl' (which replaces 'wget' from OS X versions predating 10.1), placing it in a new /Applications/bioinf directory:
cd /Applications
mkdir bioinf
cd bioinf
curl -O ftp://bioperl.org/pub/DIST/bioperl-1.0.tar.gz
gunzip bioperl-1.0.tar.gz
tar -xvf bioperl-1.0.tar
rm bioperl-1.0.tar
Or, to save some key strokes, combine the last three commands like so:
gunzip -dc bioperl-1.0.tar.gz | tar xvf -
Now, install bioperl (to get some instant gratification), and then do the
tedious work of installing some or all of the other programs that parts of
Bioperl use or depend on, such as Blast.
The bioperl installation requires root access, which you will get using the
'sudo' command and your password. After 'make test', don't worry if a number
of the tests fail, but note which additional packages are recommended --
and why. You may not need all of the "extras".
cd bioperl-1.0
more README
perl Makefile.PL
make
make test
sudo make install
Before running bptutorial.pl, you will also need to install IO::Scalar & IO::String, and LWP and the supporting libraries. After that installation, read bptutorial.pl, and try running some of the sample programs:
perl bptutorial.pl
perl bptutorial.pl 4
perl bptutorial.pl 0
With the last command, you will get a number of errors, unless you have installed the dependencies. That is the next step -- depending on what functionality you actually need from Bioperl.
As the "Bioperl external package dependencies" page explains (http://www.bioperl.org/Core/external.shtml),
"Bioperl contains wrappers, parsers and modules that can make use of several third party applications. Parts of our pre-install test suite may try to check for the presence and behaviour of these applications so you may see mention of them during the bioperl 'make test' installation step. Don't worry about them if you don't need or use them."
Nevertheless, many of the third-party applications, extensions, and modules are important, so here are some detailed OS X installation notes. These instructions follow the order in which the dependencies are discussed on the "Bioperl external package dependencies" page. All of the extensions and modules (but not the third-party applications) are conveniently available from ftp://bioperl.org/pub/external/ (though sometimes in less-than-current versions).
Download Blast from either the NCBI or from the Apple R&D site (see below). For the NCBI distribution, use ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/:
cd /Applications/bioinf
curl -O ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/ncbi.tar.gz
gunzip -dc ncbi.tar.gz | tar xvf -
In the aforementioned Apple R&D version, BlastN (accessed via the blastall program) has been optimized for the Mac G4 processor and OS X (I believe the optimized version doesn't run on G3s). See a description at http://developer.apple.com/hardware/ve/acgresearch.html. You can either install the standard NCBI distribution (address above) and then replace one file (blastall, at ftp://ftp.apple.com/developer/Tool_Chest/AGBLAST/blastall.gz) in the build directory, or grab and install all of the source code from the Apple ftp site (at ftp://ftp.apple.com/developer/Tool_Chest/AGBLAST/AGBLAST.tar.gz). Either approach SHOULD give the same result, with about the same amount of effort. However, as of April 2002, the NCBI version (with the Apple blastall executable swapped in) may be the safer way to go. See this note from David Adelson about missing distance matrices in the Apple version.
So, install either distribution. Next, run a shell script in the 'make' directory that disables the 'Vibrant' graphics libraries (which won't work in the OS X graphical environment) and configures the make files for Darwin. The script then builds the executables; this last step takes several minutes, even on a fast machine. (Note: there is a new distribution of these NCBI tools from July 2002, which I haven't tested. You might have to tweak makedis.csh to make sure it does not try to build with the Motif or Vibrant graphics libraries.)
cd ncbi
cd make
chmod 775 makedis.csh
cd ..
cd ..
sudo ./ncbi/make/makedis.csh
The executables should now be in the 'build' directory, along with a lot of files used during the build process. Those files can be deleted:
cd ncbi/build
sudo /bin/tcsh      * gives you root access
rm *.a
rm *.o
rm *.c
rm *.h
exit      * returns you to your standard login access
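If you would rather not open a root shell, the same cleanup can be done by a small function run through sudo. This is a Bourne-shell sketch with a hypothetical function name; it deletes only the four file patterns listed above, and only from the one directory you name:

```shell
#!/bin/sh
# Hypothetical helper: clean_build_dir DIR
# Deletes the intermediate build products (*.a, *.o, *.c, *.h) from DIR,
# leaving the compiled executables in place.
clean_build_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "clean_build_dir: $dir is not a directory" >&2
        return 1
    fi
    for f in "$dir"/*.a "$dir"/*.o "$dir"/*.c "$dir"/*.h; do
        # A pattern with no matches stays literal, so test before removing.
        [ -f "$f" ] && rm -f "$f"
    done
    return 0
}
```

Usage, for example: sudo sh -c '. ./cleanbuild.sh; clean_build_dir /Applications/bioinf/ncbi/build'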
(If you installed the NCBI distribution and want to use the G4-optimized blastall from the Apple ftp site instead, now is the time to swap it in. Make sure the new blastall file is executable.)
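The swap can be sketched as one small function. This is a Bourne-shell sketch with a hypothetical name; it assumes you have already downloaded blastall.gz, and it keeps the NCBI-built executable as blastall.ncbi so you can switch back:

```shell
#!/bin/sh
# Hypothetical helper: swap_blastall BUILD_DIR GZ_FILE
# Replaces BUILD_DIR/blastall with the program compressed in GZ_FILE
# (for example the Apple blastall.gz), keeps the first NCBI-built copy
# as blastall.ncbi, and makes the new file executable.
swap_blastall() {
    build=$1
    gz=$2
    # Back up the NCBI executable only once, so re-running the swap
    # never overwrites the original with the Apple version.
    if [ -f "$build/blastall" ] && [ ! -e "$build/blastall.ncbi" ]; then
        mv "$build/blastall" "$build/blastall.ncbi" || return 1
    fi
    gunzip -c "$gz" > "$build/blastall" && chmod 755 "$build/blastall"
}
```

Usage, for example: sudo sh -c '. ./swap.sh; swap_blastall /Applications/bioinf/ncbi/build /path/to/blastall.gz' (use a full path, since sudo changes what ~ means).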
The Blast programs need to know where your blast target databases will be placed. We'll tell the programs to look in ncbi/build/data, and if you wish, you can later redirect the program to look wherever you choose. Check first to see whether build/data exists, and create it if it doesn't. From ncbi/build:
ls
mkdir data
We will set several environment variables pointing to the locations of the Blast executables (setenv BLASTDIR "/Applications/bioinf/ncbi/build"), your formatted target databases (setenv BLASTDB "/Applications/bioinf/ncbi/build/data"), and the Blast distance matrices (Data=/Applications/bioinf/ncbi/data in .ncbirc, and setenv BLASTMAT "/Applications/bioinf/ncbi/data").
The information about the distance matrices goes into a new configuration file, .ncbirc, in your home directory:
cd ~yourusername
pico .ncbirc
In pico, type or paste these two lines:
[ncbi]
Data=/Applications/bioinf/ncbi/data
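The same two-line file can also be written from the shell. This Bourne-shell sketch (hypothetical function name) refuses to overwrite an existing .ncbirc, so it cannot wipe out settings you already have:

```shell
#!/bin/sh
# Hypothetical helper: write_ncbirc HOME_DIR DATA_DIR
# Creates HOME_DIR/.ncbirc containing the [ncbi] section and the Data=
# line that tells the Blast programs where the distance matrices live.
write_ncbirc() {
    rcfile="$1/.ncbirc"
    if [ -e "$rcfile" ]; then
        echo "write_ncbirc: $rcfile already exists; edit it by hand" >&2
        return 1
    fi
    printf '[ncbi]\nData=%s\n' "$2" > "$rcfile"
}
```

Usage, for example: sh -c '. ./ncbirc.sh; write_ncbirc "$HOME" /Applications/bioinf/ncbi/data'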
Now, edit your .cshrc file, adding the following line to your path list:
/Applications/bioinf/ncbi/build \
... and these lines after your path list (if you don't already have them):
setenv BLASTDB "/Applications/bioinf/ncbi/build/data"
setenv BLASTDIR "/Applications/bioinf/ncbi/build"
setenv BLASTMAT "/Applications/bioinf/ncbi/data"
Then save and exit pico, and go "source .cshrc". Now you need a set of sequences to blast against. How about the E. coli nucleotide sequences? This database also happens to be the one that the "standaloneblast" test in bptutorial.pl uses by default.
cd /Applications/bioinf/ncbi/build/data
curl -O ftp://ftp.ncbi.nlm.nih.gov/blast/db/ecoli.nt.Z
gunzip ecoli.nt.Z
Test the "formatdb" Blast program, and create some blastable files. Make sure you are in the 'data' directory, then format the database. Because you are in this directory, the formatted files will also be placed there.
formatdb -i ecoli.nt -p F
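To confirm that formatdb did its job, you can check for the files it writes. For a nucleotide database (-p F) I expect these to be the .nhr, .nin and .nsq files; that list is my assumption for the formatdb of this era, so adjust it if your version writes something different. The function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: check_formatdb_output DB
# Assumes a nucleotide database formatted with "formatdb -p F", which
# should leave non-empty DB.nhr (headers), DB.nin (index) and
# DB.nsq (sequences) files next to DB.
check_formatdb_output() {
    missing=0
    for ext in nhr nin nsq; do
        if [ ! -s "$1.$ext" ]; then
            echo "missing or empty: $1.$ext"
            missing=1
        fi
    done
    return $missing
}
```

Usage, from the data directory: sh -c '. ./checkdb.sh; check_formatdb_output ecoli.nt'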
There should now be several files in this directory formatted for blasting. To create a file containing a query sequence, open ecoli.nt, copy part of one of the FASTA-format sequences (the definition line plus a few lines of sequence), and paste it into a new file called sample.fasta. Save sample.fasta in 'data'. We'll blast this file against ecoli.nt:
blastall -p blastn -d ecoli.nt -i sample.fasta
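The hand copy-and-paste used to build sample.fasta can also be scripted. This is a sketch using awk (the function name is hypothetical) that writes the first definition line of a FASTA file plus the next few sequence lines, stopping early if the next record begins:

```shell
#!/bin/sh
# Hypothetical helper: make_sample_query FASTA N_LINES OUT
# Writes the first ">" definition line of FASTA, followed by at most
# N_LINES sequence lines from that same record, to OUT.
make_sample_query() {
    awk -v n="$2" '
        /^>/ { if (seen) exit; seen = 1; print; next }  # first header only
        seen && count < n { print; count++ }            # copy sequence lines
        seen && count >= n { exit }                     # enough lines
    ' "$1" > "$3"
}
```

Usage, from the data directory: sh -c '. ./sample.sh; make_sample_query ecoli.nt 5 sample.fasta'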
This should give you some blast output. If it does, you've got Blast installed locally, and you can test if bioperl sees it:
cd /Applications/bioinf/bioperl-1.0
perl bptutorial.pl 8
If you get the following output, you're set:
Beginning run_standaloneblast example...
Hit name is gi|1786181|gb|AE000111.1|AE000111 Escherichia coli K-12 MG1655 section 1 of 400 of the complete genome
For troubleshooting, see the Blast documentation at the NCBI, and the following web site:
http://genome.nhgri.nih.gov/blastall/blast_install/
Also, see these comments from Tim Myers on the remoteblast demo (#26) in bptutorial.pl.
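When troubleshooting, it also helps to confirm that the three Blast variables set in .cshrc are actually visible and point at existing directories. A Bourne-shell sketch with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: check_blast_env
# Reports whether BLASTDIR, BLASTDB and BLASTMAT are set and name
# existing directories. Status 1 if any of them is unset or wrong.
check_blast_env() {
    failed=0
    for var in BLASTDIR BLASTDB BLASTMAT; do
        # Indirect lookup: read the variable whose name is in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "$var is not set"
            failed=1
        elif [ ! -d "$val" ]; then
            echo "$var=$val is not a directory"
            failed=1
        else
            echo "$var=$val ok"
        fi
    done
    return $failed
}
```

Because tcsh's setenv exports these variables, running sh -c '. ./blastenv.sh; check_blast_env' from Terminal shows what the Blast programs will see.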
Download the software, uncompress, make, and clean. I downloaded Clustal from ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/clustalw1.82.UNIX.tar.gz; you can also try ftp://bioperl.org/pub/external/. I placed the archive in /Applications/bioinf, then gunzipped and untarred it. Note that in OS X versions earlier than 10.1, 'make' doesn't go smoothly unless you remove the -lm flags from the makefile.
cd /Applications/bioinf
gunzip -dc clustalw1.82.UNIX.tar.gz | tar xvf -
cd clustalw1.82
make
rm *.o
Then add clustal to your path (in .cshrc) ...
/Applications/bioinf/clustalw1.82 \
... and set environment variables for the benefit of T-Coffee and Bioperl:
setenv CLUSTALDIR "/Applications/bioinf/clustalw1.82"
setenv CLUSTALW_4_TCOFFEE "/Applications/bioinf/clustalw1.82"
... then save and go 'source .cshrc'. Now, you can test using Bioperl:
cd /Applications/bioinf/bioperl-1.0
perl bptutorial.pl 12
Download T-COFFEE from
http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/t_coffee_home_page.html
Then place the archive in bioinf, gunzip and untar it, rename the resulting directory to 't_coffee', and ...
cd t_coffee
./install
Note that in OS X versions earlier than 10.1, 'make' doesn't go smoothly without removal of -lm flags from several make files. Then add T-COFFEE to your path (in .cshrc) ...
/Applications/bioinf/t_coffee/bin \
... and set an environment variable for the benefit of T-Coffee and Bioperl:
setenv TCOFFEEDIR "/Applications/bioinf/t_coffee/bin"
... then save and go 'source .cshrc'. Now, you can test using bioperl:
cd /Applications/bioinf/bioperl-1.0
perl bptutorial.pl 12
Compile this package if you want to make protein Smith-Waterman comparisons. The installation instructions in the Readme files for Bioperl-0.7.2 and 1.0 are quite confusing, I think, because they describe a Compile/SW/libs directory which is not in these distributions. You need to download this material from ftp://bioperl.org/pub/DIST or ftp://bioperl.org/pub/external/
cd /Applications/bioinf/bioperl-1.0
curl -O ftp://bioperl.org/pub/DIST/bioperl-ext-0.6.tar.gz
gunzip -dc bioperl-ext-0.6.tar.gz | tar xvf -
cd bioperl-ext-0.6/Bio/Ext/Align
perl Makefile.PL
Now you will need to edit a line in 'Makefile'. Note: I believe that Pico will mess this file up (will insert an extra return in one of the long lines), so use BBEdit or vi. Find the line that begins with CCFLAGS, and add -fPIC to the end:
CCFLAGS = -g -pipe -pipe -fno-common -no-cpp-precomp -flat_namespace -DHAS_TELLDIR_PROTOTYPE -fno-strict-aliasing -fPIC
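If you would rather not open Makefile in an editor at all, a scripted edit sidesteps the line-wrapping problem. This sketch (hypothetical function name) writes to a temporary file and renames it, because sed's in-place -i option differs between the BSD sed shipped with OS X and GNU sed. It does nothing if -fPIC is already present:

```shell
#!/bin/sh
# Hypothetical helper: add_fpic MAKEFILE
# Appends " -fPIC" to the line that begins with CCFLAGS, unless that
# line already contains -fPIC.
add_fpic() {
    mf=$1
    if grep -q '^CCFLAGS.*-fPIC' "$mf"; then
        return 0
    fi
    sed '/^CCFLAGS/s/$/ -fPIC/' "$mf" > "$mf.tmp" && mv "$mf.tmp" "$mf"
}
```

Usage, from bioperl-ext-0.6/Bio/Ext/Align after 'perl Makefile.PL': sh -c '. ./addfpic.sh; add_fpic Makefile'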
Then save, 'make', then update a symbol table for the compiler using ranlib, then install:
make
ranlib libs/libsw.a
make test
sudo make install
Then test the installation using bptutorial.pl:
cd /Applications/bioinf/bioperl-1.0
perl bptutorial.pl 13
AcePerl is used by Bioperl to access AceDB databases. The module is available from http://stein.cshl.org or ftp://bioperl.org/pub/external/. I have not been able to get AcePerl installed. It appears not to have been ported for OS X.
File::Temp is used by network-access modules such as Bio::DB::WebDBSeqI. Download it from CPAN or ftp://bioperl.org/pub/external/, unzip and untar it, then go
perl Makefile.PL
make
make test
sudo make install
IO::Scalar (contained in the "IO-Stringy" CPAN perl module) is used in
Bio::Tools::Blast::Run::Webblast.pm. IO::String (contained in the "IO-String"
CPAN perl module) is used in Bio::DB::GenBank and Bio::DB::SwissProt.
The installation of these two modules is straightforward: download them from
CPAN or ftp://bioperl.org/pub/external/,
unzip and untar them, then cd into each directory and go
perl Makefile.PL
make
make test
sudo make install
In order to perform remote blast searches via a network the following
modules are required: HTTP::Request::Common and LWP::UserAgent. These
modules are both contained in the libwww-perl distribution at CPAN. The
libwww-perl module also has a number of dependencies. Installation of
the whole package would be a breeze with CPAN.pm, but as I mentioned in
the "OS X Preliminaries" section, I have not gotten CPAN.pm working smoothly
on OS X. Also, it is worth noting that there are warnings in several newsgroups
about an unpleasant side effect of default installation of LWP.
Because the default OS X file system does not distinguish file names by case, 'HEAD'
from LWP (in libwww-perl) clobbers the Unix utility 'head' in /usr/bin/. This is
not a good thing! Please see this site for nice instructions on how to back
up and then recover 'head':
http://www.scriptdigital.com/divers/frontiermonitor.html
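As a precaution, here is a Bourne-shell sketch (hypothetical function names) that makes a one-time backup of a file and gives a rough test for clobbering. The test rests on my assumption that the HEAD installed by LWP is a Perl script beginning with "#!", whereas the system head is a compiled program; it reads the first bytes with dd rather than head, since head may be the very thing that was overwritten:

```shell
#!/bin/sh
# Hypothetical helpers for protecting /usr/bin/head before installing LWP.

# backup_once FILE: copy FILE to FILE.orig (cp -p keeps mode and times),
# but never overwrite an existing backup with a possibly clobbered copy.
backup_once() {
    if [ -e "$1.orig" ]; then
        echo "backup already exists: $1.orig"
        return 0
    fi
    cp -p "$1" "$1.orig"
}

# is_script FILE: status 0 if FILE starts with "#!" (an interpreted
# script). Assumption: a script at /usr/bin/head means LWP replaced it.
is_script() {
    [ "$(dd if="$1" bs=2 count=1 2>/dev/null)" = '#!' ]
}
```

Usage, for example: sudo sh -c '. ./headcheck.sh; backup_once /usr/bin/head' before installing libwww-perl, and sh -c '. ./headcheck.sh; is_script /usr/bin/head && echo "head has been replaced"' afterwards.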
You may also want to check these other references in case you run into problems. The last
two references also describe how to recover 'head'.
http://developer.apple.com/internet/macosx/perl.html
http://sial.org/code/perl/docs/life-with-cpan.txt
http://www.dur.ac.uk/p.j.heslin/diogenes/mac_install.html
http://www.macosxhints.com/article.php?story=20010603142727786
http://archive.develooper.com/macosx%40perl.org/msg00353.html
The dependent packages for libwww-perl are:
HTML-Tagset - Needed by HTML-Parser
Digest-MD5 - Needed to do Digest authentication
MIME-Base64 - Used in authentication headers
libnet-1.0901
URI-1.10 - There are URIs everywhere
HTML-Parser-3.25 - Needed by HTML-HeadParser
libwww-perl-5.63 - provides access to WWW clients
After downloading these packages from ftp://bioperl.org/pub/external/
or from http://www.cpan.org/ into the same directory,
I went through the following standard installation procedures. Of course the version numbers will change over time.
cd ../HTML-Tagset-3.03
perl Makefile.PL
make
make test
sudo make install
cd ../Digest-MD5-2.16
perl Makefile.PL
make
make test
sudo make install
cd ../MIME-Base64-2.12
perl Makefile.PL
make
make test
sudo make install
cd ../libnet-1.0901
configure
perl Makefile.PL
make
make test
sudo make install
cd ../URI-1.10
perl Makefile.PL
make
make test
sudo make install
cd ../HTML-Parser-3.25
perl Makefile.PL
make
make test
sudo make install
cd ../libwww-perl-5.63
perl Makefile.PL
make
make test
sudo make install
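The repetitive sequence above can be driven by a loop. This Bourne-shell sketch (hypothetical function names) runs the standard four steps in each directory, in the order you list them, and stops at the first failing step; with DRY_RUN=1 it only prints the commands so you can check the order first. Stopping on a failed 'make test' is a choice of this sketch, and the separate libnet 'configure' step is not included:

```shell
#!/bin/sh
# Hypothetical helpers: install_dists DIR...
# Runs "perl Makefile.PL; make; make test; sudo make install" in each
# unpacked distribution directory and stops at the first failure.

# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_one() {
    run perl Makefile.PL && run make && run make test && run sudo make install
}

install_dists() {
    for d in "$@"; do
        echo "== $d"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            build_one
        else
            # Subshell, so the cd does not leak into the next iteration.
            ( cd "$d" && build_one ) || { echo "failed in $d" >&2; return 1; }
        fi
    done
}
```

Usage, from the directory holding the unpacked distributions: sh -c '. ./install_dists.sh; install_dists HTML-Tagset-3.03 Digest-MD5-2.16 MIME-Base64-2.12 libnet-1.0901 URI-1.10 HTML-Parser-3.25 libwww-perl-5.63'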
You should now be set with libwww and LWP. To test LWP using a simple command-line statement (see the LWP POD), try:
perl -MLWP::Simple -e 'getprint "http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.html"'
Expat
The first step is to install an XML parser called Expat. The Expat version at
Sourceforge.net at the time of writing, 1.95.2, does not compile on OS X (judging by the
bug reports at http://sourceforge.net/projects/expat/,
the software's home). The previous version, 1.95.1, apparently compiles easily after
using ./configure (discussion at
http://archive.develooper.com/macosx@perl.org/msg00708.html).
Use the 1.95.1 version at ftp://bioperl.org/pub/external/,
or there is a "ready-to-make-on-OS X" version at
http://www.caos.aamu.edu/pub/MacOS_X/BSD/Applications/Publishing/XML/expat/
The installation using the latter version is a simple one-liner (after unzipping and untarring):
sudo make install
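Before building XML-Parser, you may want to confirm that Expat's header and library are where the XML-Parser command below expects them (/usr/local/include and /usr/local/lib). I am assuming that is where the Expat install puts them; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: have_expat [PREFIX]
# Succeeds if PREFIX/include/expat.h exists and PREFIX/lib holds some
# libexpat.* library (for example .a or .dylib). PREFIX defaults to
# /usr/local.
have_expat() {
    prefix=${1:-/usr/local}
    [ -f "$prefix/include/expat.h" ] || { echo "no $prefix/include/expat.h"; return 1; }
    for lib in "$prefix"/lib/libexpat.*; do
        [ -e "$lib" ] && { echo "found $lib"; return 0; }
    done
    echo "no libexpat in $prefix/lib"
    return 1
}
```

Usage: sh -c '. ./expat.sh; have_expat /usr/local'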
Most of the other dependencies are easy. I simply downloaded from ftp://bioperl.org/pub/external/ (and XML-Twig from CPAN), unzipped and untarred, and went
cd XML-Parser-2.30
perl Makefile.PL EXPATLIBPATH=/usr/local/lib/ \
    EXPATINCPATH=/usr/local/include/
make      * I see a few warnings, but all tests pass:
make test
sudo make install
cd ../libxml-perl-0.07
perl Makefile.PL
make
make test
sudo make install
XML-Writer generates an error, which is discussed by John Escott at http://aspn.activestate.com/ASPN/Mail/Message/perl-xml/282609
The fix requires editing two lines in Writer.pm. Replace both instances of
_checkNSNames(\@_);
with
my @a = @_;
_checkNSNames(\@a);
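The two replacements can also be made with sed. This sketch (hypothetical function name) keeps the untouched file as Writer.pm.orig and puts the two statements on one line, which Perl treats the same as the two-line version above:

```shell
#!/bin/sh
# Hypothetical helper: patch_writer FILE
# Replaces every
#     _checkNSNames(\@_);
# with
#     my @a = @_; _checkNSNames(\@a);
# keeping the original as FILE.orig.
patch_writer() {
    cp -p "$1" "$1.orig" &&
    sed 's/_checkNSNames(\\@_);/my @a = @_; _checkNSNames(\\@a);/g' "$1.orig" > "$1"
}
```

Usage, from the XML-Writer directory (assuming Writer.pm sits at its top level): sh -c '. ./patchwriter.sh; patch_writer Writer.pm'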
Then, continue as usual:
perl Makefile.PL
make
make test
sudo make install
cd ../XML-Node-0.10
perl Makefile.PL      * I get a warning, but all tests pass:
make test
sudo make install
The next module is not mentioned at the Bioperl dependency page, but is called for in the Bioperl-1.0 test suite:
cd ../XML-Twig-2.02
perl Makefile.PL
make
make test
sudo make install
Storable
- Recommended for all releases after bioperl-0.7.2. This module is used for persistent object
storage and local file caching.
Text::Shellwords
- Used only within the bioperl graphics package.
I don't have instructions for these, but they are standard CPAN modules.
See here
for partial notes for installing GD.pm (GD-2.11) on OS X 10.3.1
After doing these installations, you can see whether any failures remain in the Bioperl tests by running 'make test' in the bioperl-1.0 directory and 'perl bptutorial.pl 0'. I have not gone through this process systematically with the Bioperl-1.0.2 release, so I can't yet share my experiences, but the core modules and dependencies are working for me.
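To see at a glance which of the modules discussed on this page your perl can load, a short loop will do. This is a Bourne-shell sketch with a hypothetical function name; it simply tries "perl -MModule -e 1" for each name:

```shell
#!/bin/sh
# Hypothetical helper: check_modules MODULE...
# Reports which Perl modules the first perl on $PATH can load.
# Status 1 if any module is missing.
check_modules() {
    missing=0
    for m in "$@"; do
        if perl -M"$m" -e 1 2>/dev/null; then
            echo "ok: $m"
        else
            echo "MISSING: $m"
            missing=1
        fi
    done
    return $missing
}
```

Usage, with module names taken from the sections above (GD matters only if you use the graphics package): sh -c '. ./checkmods.sh; check_modules File::Temp IO::Scalar IO::String LWP::UserAgent HTTP::Request::Common XML::Parser XML::Writer XML::Node XML::Twig Storable Text::Shellwords GD'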