Installing BioPerl on OS X



Contact: Steve Cannon regarding this document (cann0010@tc.umn.edu) OR bioperl-l@bioperl.org regarding Bioperl questions.
July, 2002



OLD STUFF!
I wrote the instructions below in 2001 and early 2002, when Bioperl and OS X and fink and CPAN/OS X were all less mature than they are now. Many distributions and patches and other details have probably changed, so most of these instructions will be of little use to anyone.



Overview and Introduction
Bioperl -- easy
NCBI Blast -- TT
Clustalw -- T
t-coffee -- T
Bioperl XS extensions -- T
AcePerl Modules -- not ported for OS X
File::Temp -- easy
IO::Scalar & IO::String -- easy
LWP:: Modules -- TTT
XML Enabling Modules -- TT
Storable and Text::Shellwords modules -- easy
GD.pm graphics library and related stuff -- TTT


Overview and Introduction

These instructions are for people who would like to use BioPerl on Mac OS X. Most of those people -- myself included -- are relatively new to a Unix environment, and so have to contend with a new OS, a new and rather complex scripting interface to a large number of modules and external bioinformatics packages, and some often tricky installations of BioPerl dependencies. Having gone through this experience a few times, I thought I would make my installation notes available to others. (By the way, what I do with Bioperl is "genome archaeology" and gene family evolutionary analysis.) Although I have tried to be thorough in these instructions, I haven't tried to duplicate information in the README files, so be sure to check those for everything you install. I am interested in improving these instructions, so please email me if you run into problems or have suggestions.

There are at least three quite distinct options for installing the BioPerl dependencies on OS X. You can 1) do it "from scratch," following the instructions on this page; 2) use the CPAN.pm module and Bundle::BioPerl, which should install almost all of the dependencies nearly automatically; or 3) try the new Fink bioperl-pm package, which also should install almost all of the dependencies nearly automatically. Unfortunately, none of these options is entirely free of hassle and risk. In order, here are the options in more detail:

  1. Doing the installations manually gives you the most control, but there are a few pitfalls here and there (particularly, see the note about "HEAD" below, under "LWP.")
  2. CPAN.pm is a wonderful creation, but I have been frustrated by it on OS X. In particular, the last time I tried to install CPAN.pm itself (in July, 2002), it tried to update my version of Perl to 5.8.0 -- which would seem like a virtuous thing, but the Perl reinstallation process itself was long and arduous, and didn't work for me (it caused no problems with the default installation, but 5.8.0 didn't "take"). Then, when I tried to use it to install Bundle::BioPerl, it tried AGAIN (without asking or giving me the option to refuse) to install 5.8.0. That said, the dependencies did install correctly. I did get a report in September that CPAN and Bundle::BioPerl worked fine for someone on OS X, so maybe the problem has been fixed (or I am doing something wrong). I'd like to hear other reports. (Note January 2003: please see these comments/instructions from Gregory Jefferis.)
  3. The Fink option should be the best solution, because the Fink package manager is explicitly designed for OS X. It also installs everything in its own private directory (/sw), so there is essentially no danger of overwriting important Unix utilities and libraries. The bioperl-pm package is new, and I haven't tested it, so I (and the package maintainer, Christopher Dithi) would be interested to hear how it goes. As of August, 2002, the Fink bioperl-pm is in the "unstable" branch, meaning that it has not been well tested. You can help! (Note January 2003: please see these comments/instructions from Gregory Jefferis.)

One more piece of advice before you commence installations: much of this software is under active development, there are lots of packages, and you will be installing lots of stuff with "superuser" privileges, so you can do some damage to your system. Back up your important data and settings before you start. And some related general OS X advice: I strongly recommend using Disk First Aid (run it from the OS 9 CD, starting up from that CD by holding down the C key) IMMEDIATELY after you spot any serious problems on your machine. An example of a "problem" is any software crash that requires a hard restart of the machine, particularly if it doesn't restart cleanly (hangs partway through the restart, etc). Another example is if an application that has always worked in the past starts behaving very differently and is unable to locate files, etc.

A couple of FAQs:
Will BioPerl and its dependencies install under OS X 10.2 (Jaguar)?
Yes. I haven't tried myself, but I've received several positive reports, and there should be no problems.
I type "make blablabla" and get the message "make: Command not found.".
You need to install the OS X Developer CD. This should be included in your OS X 10.2 distribution, and is also available for free download from the Apple Developer Connection (after registering at that site).

The basic installation of Bioperl on OS X goes without a hitch. Installation of the dependent packages and associated programs sometimes takes a little thought, or a small trick -- maybe just setting an environment variable or updating a symbol table, but enough to cause someone new to the Unix installation process some grief. The T's in the list of Bioperl dependencies at the top of this page indicate relative effort (or number of "Tricks") involved in each installation.

Darwin / OS X Preliminaries

Setting up .cshrc

First, check whether .cshrc exists:

	cd ~your_user_name
	ls -la				* or the alias to this command,   ll
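If you prefer not to read the listing by eye, here is a minimal sketch that creates an empty .cshrc if none exists and copies the current file to .cshrc.bak before you edit it. It is written for plain sh, since tcsh has no shell functions, and `prepare_cshrc` is a hypothetical helper name, not something that ships with OS X. The backup is worth having because of the warning below about a broken path: the copy can be restored with `/bin/cp ~/.cshrc.bak ~/.cshrc`, which works even with a broken path because cp is given by its absolute location.

```shell
#!/bin/sh
# prepare_cshrc: hypothetical helper. Makes sure DIR/.cshrc exists (DIR
# defaults to $HOME) and saves a copy as .cshrc.bak before you edit it.
prepare_cshrc() {
    home_dir=${1:-$HOME}
    rc_file="$home_dir/.cshrc"
    if [ ! -f "$rc_file" ]; then
        # No .cshrc yet: create an empty one.
        : > "$rc_file" || return 1
        echo "created $rc_file"
    fi
    # /bin/cp by absolute path, so the same command works with a broken PATH.
    /bin/cp "$rc_file" "$rc_file.bak" && echo "saved a copy as $rc_file.bak"
}
```

To use it, type `sh` at the Terminal prompt, paste the function, run `prepare_cshrc`, and then type `exit` to return to tcsh.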

If you don't see .cshrc, create it. To see what is currently in your default path, go

	env

and read the PATH line. You want to keep these paths, so Darwin can find the Unix shell commands! So that's what we'll add in .cshrc first.
For example, using pico:

	pico .cshrc

Then type or cut and paste the following. While you're at it, also add an alias and an environment variable. The BLASTDB and BLASTDIR variables won't be useful until you've added Blast, but at least this gives examples of the content of .cshrc. The alias will save you some keystrokes (once you've created that directory). Note that if you mess up this initial path information, Unix commands such as ls and rm will no longer work, and you'll have to find some other way to delete or fix this hidden file (for example, overwrite using BBEdit with a corrected or blank .cshrc file, or delete the file while in OS 9).

	set path=(. \
	~/bin/powerpc-apple-darwin \
	/usr/local/bin \
	/usr/bin \
	/bin \
	/usr/local/sbin \
	/usr/sbin \
	/sbin  \
	)

	alias bioinf		'cd /Applications/bioinf'

	setenv BLASTDB 		"/Applications/bioinf/ncbi/build/data"
	setenv BLASTDIR		"/Applications/bioinf/ncbi/build"

then exit pico and save. Now, make Darwin see your changes, either by quitting Terminal and starting it up again, or by going:

	source .cshrc
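Several later steps add setenv lines to .cshrc. As a sketch (plain sh; `append_once` is a hypothetical helper), such lines can be appended from the command line, and only once, so repeating a step does not duplicate them. It appends at the end of the file, after the path list, so it is not suitable for entries that must go inside the `set path=( ... )` list; edit those by hand.

```shell
#!/bin/sh
# append_once: hypothetical helper. Appends LINE to FILE unless FILE already
# contains exactly that line.
append_once() {
    file=$1
    line=$2
    # -x: whole-line match, -F: literal text, -q: quiet. A missing file
    # counts as "not present", and printf then creates it.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, from sh: `append_once "$HOME/.cshrc" 'setenv BLASTMAT "/Applications/bioinf/ncbi/data"'`, then `exit` back to tcsh and go `source .cshrc` as usual.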


Install Bioperl

Download and unzip a BioPerl distribution from Bioperl.org. In these instructions I will use bioperl-1.0, but you should use either the most current developer release (if different from 1.0) or the latest stable release. We'll retrieve the distribution from the Bioperl web site, using 'curl' (which replaces 'wget' from OS X versions predating 10.1). This will go into a new /Applications/bioinf directory:

	cd /Applications
	mkdir bioinf
	cd bioinf
	curl -O ftp://bioperl.org/pub/DIST/bioperl-1.0.tar.gz
	gunzip bioperl-1.0.tar.gz
	tar -xvf bioperl-1.0.tar
	rm bioperl-1.0.tar

Or, to save some key strokes, combine the last three commands like so:

	gunzip -dc bioperl-1.0.tar.gz | tar xvf -

Now, install Bioperl (to get some instant gratification), and then do the tedious work of installing some or all of the other programs that parts of Bioperl use or depend on, such as Blast.
The final 'make install' step requires root access, which you will get using the 'sudo' command and your password. After 'make test', don't worry if a number of the tests fail, but note which additional packages are recommended -- and why. You may not need all of the "extras".

	cd bioperl-1.0
	more README
	perl Makefile.PL
	make
	make test
	sudo make install

Before running bptutorial.pl, you will also need to install IO::Scalar & IO::String, and LWP and the supporting libraries. After that installation, read bptutorial.pl, and try running some of the sample programs:

	perl bptutorial.pl
	perl bptutorial.pl 4
	perl bptutorial.pl 0

With the last command, you will get a number of errors, unless you have installed the dependencies. That is the next step -- depending on what functionality you actually need from Bioperl.

Install the External Packages

As the "Bioperl external package dependencies" page explains (http://www.bioperl.org/Core/external.shtml),

"Bioperl contains wrappers, parsers and modules that can make use of several third party applications. Parts of our pre-install test suite may try to check for the presence and behaviour of these applications so you may see mention of them during the bioperl 'make test' installation step. Don't worry about them if you don't need or use them."
Nevertheless, many of the third party applications, extensions, and modules are important, so here are some detailed OS X installation notes. These instructions follow the order in which dependencies are discussed on the "Bioperl external package dependencies" page. All of the extensions and modules (but not third party applications) are conveniently available from ftp://bioperl.org/pub/external/ (though sometimes in less-than-current versions).


NCBI Blast

Download Blast from either the NCBI or from the Apple R&D site (see below). For the NCBI distribution, use ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/:

	cd /Applications/bioinf
	curl -O ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/ncbi.tar.gz
	gunzip -dc ncbi.tar.gz | tar xvf -

In the aforementioned Apple R&D version, BlastN (accessed via the blastall program) has been optimized for the Mac G4 processor and OS X (I believe the optimized version doesn't run on G3s). See a description at http://developer.apple.com/hardware/ve/acgresearch.html. You can either install the standard NCBI distribution (address above) and then replace one file (blastall, at ftp://ftp.apple.com/developer/Tool_Chest/AGBLAST/blastall.gz) in the build directory, or grab and install all of the source code from the Apple ftp site (at ftp://ftp.apple.com/developer/Tool_Chest/AGBLAST/AGBLAST.tar.gz). The result of either approach SHOULD be the same, with about the same amount of effort. However, as of April 2002, the NCBI version may be the safer way to go (and then swap in the blastall executable). See this note from David Adelson about missing distance matrices in the Apple version.

So, install either distribution. We need to run a shell script in the 'make' directory that will disable the 'vibrant' graphics libraries (which won't work in the OS X graphical environment), and configure the make file to use Darwin. The script then builds the executables. The last step takes several minutes, even on a fast machine. (Note: I notice that there is a new distribution of these ncbi tools (from July 2002). I haven't tested this distribution. You might have to tweak the makedis.csh in order to make sure that it is not trying to build using the Motif or Vibrant graphics libraries.)

	cd ncbi
	cd make
	chmod 775 makedis.csh
	cd ..
	cd ..
	sudo ./ncbi/make/makedis.csh

The executables should now be in the 'build' directory, along with a lot of files used during the build process. Those files can be deleted:

	cd ncbi/build
	sudo /bin/tcsh		* gives you root access
	rm *.a
	rm *.o
	rm *.c
	rm *.h
	exit			* returns you to your standard login access

(If you installed the NCBI distribution and want to use the G4-optimized blastall from the Apple ftp site instead, now is the time to swap it in: replace blastall in the build directory with the Apple version, and make sure the new blastall file is executable.)

The Blast programs need to know where your blast target databases will be placed. We'll tell the programs to look in ncbi/build/data, and if you wish, you can later redirect the program to look wherever you choose. Check first to see whether build/data exists, and create it if it doesn't. From ncbi/build:

	ls
	mkdir data

We will tell Blast where to find three things: the Blast executables (setenv BLASTDIR "/Applications/bioinf/ncbi/build"), your formatted target databases (setenv BLASTDB "/Applications/bioinf/ncbi/build/data"), and the Blast distance (scoring) matrices (Data=/Applications/bioinf/ncbi/data in .ncbirc, plus setenv BLASTMAT "/Applications/bioinf/ncbi/data").

The information about the distance matrices goes into a new configuration file, .ncbirc, in your home directory:

	cd ~yourusername
	pico .ncbirc

In pico, type or paste these two lines:

	[ncbi]
	Data=/Applications/bioinf/ncbi/data 
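The same two lines can also be written without pico. Here is a minimal sketch in plain sh, assuming the data directory used on this page (`write_ncbirc` is a hypothetical helper name). Note that it overwrites any existing .ncbirc.

```shell
#!/bin/sh
# write_ncbirc: hypothetical helper. Writes an [ncbi] section with a Data= line.
# Arguments (both optional): target file, NCBI data directory.
write_ncbirc() {
    target=${1:-$HOME/.ncbirc}
    data_dir=${2:-/Applications/bioinf/ncbi/data}
    # The here-document lines must start in column 1.
    cat > "$target" <<EOF
[ncbi]
Data=$data_dir
EOF
}
```

Running `write_ncbirc` with no arguments writes ~/.ncbirc pointing at /Applications/bioinf/ncbi/data.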

Now, edit your .cshrc file, adding the following line to your path list:

	/Applications/bioinf/ncbi/build \

... and these lines after your path list (if you don't already have them):

	setenv BLASTDB    "/Applications/bioinf/ncbi/build/data"
	setenv BLASTDIR   "/Applications/bioinf/ncbi/build"
	setenv BLASTMAT   "/Applications/bioinf/ncbi/data"

then save and exit pico, and go "source .cshrc". Now you need a set of sequences to blast against. The E. coli nucleotide sequences are a good choice; this database also happens to be the one that the "standaloneblast" test in bptutorial.pl uses by default.

	cd /Applications/bioinf/ncbi/build/data
	curl -O ftp://ftp.ncbi.nlm.nih.gov/blast/db/ecoli.nt.Z
	gunzip ecoli.nt.Z
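Instead of copying a query out of ecoli.nt by hand (as described below), you can cut a short one from the file with awk. This sketch (plain sh and awk; `make_query` is a hypothetical helper) keeps the first record's definition line plus up to N sequence lines, 5 by default.

```shell
#!/bin/sh
# make_query: hypothetical helper. Prints the first FASTA record's ">" line
# and at most N of its sequence lines (default 5).
make_query() {
    fasta=$1
    nlines=${2:-5}
    awk -v max="$nlines" '
        /^>/ { records++; if (records > 1) exit; print; next }
        records == 1 && seen < max { print; seen++; if (seen >= max) exit }
    ' "$fasta"
}
```

From the data directory, `make_query ecoli.nt 5 > sample.fasta` produces a query file for the blastall test.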

Test the "formatdb" Blast program, and create some blastable files. Make sure you are in the 'data' directory, then format the database. Because you are in this directory, the formatted files will also be placed there.

	formatdb -i ecoli.nt -p F

There should now be several files in this directory formatted for blasting. To create a file containing a query sequence, open ecoli.nt, copy part of one of the fasta-format sequences (its definition line plus a few lines of sequence), and paste it into a new file called sample.fasta. Save sample.fasta in 'data'. We'll blast this file against ecoli.nt:

	blastall -p blastn -d ecoli.nt -i sample.fasta

This should give you some blast output. If it does, you've got Blast installed locally, and you can test if bioperl sees it:

	cd /Applications/bioinf/bioperl-1.0
	perl bptutorial.pl 8

If you get the following output, you're set:

	Beginning run_standaloneblast example... 
	 Hit name is gi|1786181|gb|AE000111.1|AE000111 Escherichia coli K-12 MG1655 section 1 of 400 of the complete genome

For troubleshooting, see the Blast documentation at the NCBI, and the following web site:
http://genome.nhgri.nih.gov/blastall/blast_install/
Also, see these comments from Tim Myers on the remoteblast demo (#26) in bptutorial.pl.
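If bptutorial.pl can't find Blast, a quick first check is whether the variables set in .cshrc point at real directories. Here is a sketch in plain sh (`check_dir_vars` is a hypothetical helper). Variables set with setenv in tcsh are exported, so an sh started from the same Terminal window sees them.

```shell
#!/bin/sh
# check_dir_vars: hypothetical helper. For each variable NAME given, reports
# whether it is set and names an existing directory; returns 1 if any fail.
check_dir_vars() {
    failed=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            failed=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"
            failed=1
        else
            echo "$name=$value ok"
        fi
    done
    return $failed
}
```

For example, type `sh`, paste the function, and run `check_dir_vars BLASTDIR BLASTDB BLASTMAT`.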


Clustalw

Download the software, uncompress, make, and clean. I downloaded Clustal from ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/clustalw1.82.UNIX.tar.gz (you can also try ftp://bioperl.org/pub/external/) and placed the archive in /Applications/bioinf. Note that in OS X versions earlier than 10.1, 'make' doesn't go smoothly unless you remove the -lm flags from the make file.

	cd /Applications/bioinf
	gunzip -dc clustalw1.82.UNIX.tar.gz | tar xvf -
	cd clustalw1.82
	make
	rm *.o

Then add clustal to your path (in .cshrc) ...

		/Applications/bioinf/clustalw1.82 \

... and set environment variables for the benefit of T-Coffee and Bioperl:

	setenv CLUSTALDIR		"/Applications/bioinf/clustalw1.82"
	setenv CLUSTALW_4_TCOFFEE	"/Applications/bioinf/clustalw1.82"

... then save and go 'source .cshrc'. Now, you can test using Bioperl:

	cd /Applications/bioinf/bioperl-1.0
	perl bptutorial.pl 12

T-COFFEE

Download T-COFFEE from http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/t_coffee_home_page.html
Place the archive in /Applications/bioinf, gunzip and untar it, rename the resulting directory to 't_coffee', and then:

	cd t_coffee
	./install

Note that in OS X versions earlier than 10.1, 'make' doesn't go smoothly without removal of -lm flags from several make files. Then add T-COFFEE to your path (in .cshrc) ...

		/Applications/bioinf/t_coffee/bin \

... and set environment variables for the benefit of T-Coffee and Bioperl:

	setenv TCOFFEEDIR		"/Applications/bioinf/t_coffee/bin"

... then save and go 'source .cshrc'. Now, you can test using bioperl:

	cd /Applications/bioinf/bioperl-1.0
	perl bptutorial.pl 12

Bioperl XS extensions

Compile this package if you want to make protein Smith-Waterman comparisons. The installation instructions in the Readme files for Bioperl-0.7.2 and 1.0 are quite confusing, I think, because they describe a Compile/SW/libs directory which is not in these distributions. You need to download this material from ftp://bioperl.org/pub/DIST or ftp://bioperl.org/pub/external/

	cd /Applications/bioinf/bioperl-1.0
	curl -O ftp://bioperl.org/pub/DIST/bioperl-ext-0.6.tar.gz
	gunzip -dc bioperl-ext-0.6.tar.gz | tar xvf -
	cd bioperl-ext-0.6/Bio/Ext/Align
	perl Makefile.PL

Now you will need to edit a line in 'Makefile'. Note: I believe that Pico will mess this file up (by inserting an extra return in one of the long lines) unless it is started as 'pico -w', which turns off word wrap, so BBEdit or vi is the safer choice. Find the line that begins with CCFLAGS, and add -fPIC to the end:

	CCFLAGS = -g -pipe -pipe -fno-common -no-cpp-precomp -flat_namespace -DHAS_TELLDIR_PROTOTYPE -fno-strict-aliasing -fPIC
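If you would rather not open the Makefile in an editor at all, sed can append the flag. Here is a sketch in plain sh (`add_fpic` is a hypothetical helper). It leaves the file alone if CCFLAGS already contains -fPIC, keeps the original as Makefile.orig, and writes through a temporary file because BSD and GNU sed use different syntax for in-place editing.

```shell
#!/bin/sh
# add_fpic: hypothetical helper. Appends " -fPIC" to the line starting with
# "CCFLAGS =" in the given Makefile (default ./Makefile).
add_fpic() {
    mk=${1:-Makefile}
    # Already patched? Then there is nothing to do.
    if grep -q '^CCFLAGS.*-fPIC' "$mk"; then
        return 0
    fi
    cp "$mk" "$mk.orig" || return 1
    sed '/^CCFLAGS *=/s/$/ -fPIC/' "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
}
```

Run `add_fpic` in the Bio/Ext/Align directory. With this route there is nothing to save, so go straight to 'make'.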

Then save, run 'make', update the library's symbol table with ranlib (the linker needs this index to use libsw.a), and then test and install:

	make
	ranlib libs/libsw.a
	make test
	sudo make install

Then go back to the Bioperl directory and test the installation using bptutorial.pl:

	cd /Applications/bioinf/bioperl-1.0
	perl bptutorial.pl 13

AcePerl Modules

AcePerl is used by Bioperl to access AceDB databases. The module is available from http://stein.cshl.org or ftp://bioperl.org/pub/external/. I have not been able to get AcePerl installed. It appears not to have been ported for OS X.


File::Temp Module

File::Temp is used by network-accessing modules such as Bio::DB::WebDBSeqI. Download it from CPAN or ftp://bioperl.org/pub/external/, unzip and untar, then go

	perl Makefile.PL
	make
	make test
	sudo make install

IO::Scalar & IO::String Modules

IO::Scalar (contained in the "IO-Stringy" CPAN perl module) is used in Bio::Tools::Blast::Run::Webblast.pm. IO::String (contained in the "IO-String" CPAN perl module) is used in Bio::DB::GenBank and Bio::DB::SwissProt.
The installation of these two modules is straightforward: download them from CPAN or ftp://bioperl.org/pub/external/, unzip and untar, then cd into each directory and go

	perl Makefile.PL
	make
	make test
	sudo make install

LWP:: Modules

In order to perform remote Blast searches over a network, the following modules are required: HTTP::Request::Common and LWP::UserAgent. Both are contained in the libwww-perl distribution at CPAN, which in turn has a number of dependencies. Installing the whole package would be a breeze with CPAN.pm, but as I mentioned in the Overview, I have not gotten CPAN.pm working smoothly on OS X. Also, there are warnings in several newsgroups about an unpleasant side effect of a default LWP installation: because the default OS X file system does not distinguish file names by case, 'HEAD' from LWP (in libwww-perl) clobbers the Unix utility 'head' in /usr/bin/. This is not a good thing! Please see this site for nice instructions on how to back up and then recover 'head' (http://www.scriptdigital.com/divers/frontiermonitor.html). You may also want to check these other references in case you run into problems. The last two references also describe how to recover 'head'.
http://developer.apple.com/internet/macosx/perl.html
http://sial.org/code/perl/docs/life-with-cpan.txt
http://www.dur.ac.uk/p.j.heslin/diogenes/mac_install.html
http://www.macosxhints.com/article.php?story=20010603142727786
http://archive.develooper.com/macosx%40perl.org/msg00353.html


The dependent packages for libwww-perl are:
HTML-Tagset - Needed by HTML-Parser
Digest-MD5 - Needed to do Digest authentication
MIME-Base64 - Used in authentication headers
libnet-1.0901
URI-1.10 - There are URIs everywhere
HTML-Parser-3.25 - Needed by HTML-HeadParser
libwww-perl-5.63 - provides access to WWW clients


After downloading these packages from ftp://bioperl.org/pub/external/ or from http://www.cpan.org/ into the same directory, I went through the following standard installation procedures. Of course the version numbers will change over time.

	cd ../HTML-Tagset-3.03
	perl Makefile.PL
	make
	make test
	sudo make install

	cd ../Digest-MD5-2.16
	perl Makefile.PL
	make
	make test
	sudo make install

	cd ../MIME-Base64-2.12
	perl Makefile.PL
	make
	make test
	sudo make install

	cd ../libnet-1.0901
	configure
	perl Makefile.PL
	make
	make test
	sudo make install

	cd ../URI-1.10
	perl Makefile.PL
	make
	make test
	sudo make install

	cd ../HTML-Parser-3.25
	perl Makefile.PL
	make
	make test
	sudo make install

	cd ../libwww-perl-5.63
	perl Makefile.PL
	make
	make test
	sudo make install
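The seven builds above repeat the same four commands, so they can also be run in a loop. Here is a sketch in plain sh (`build_dists` is a hypothetical helper) that visits each directory in the order given, which matters because later packages need the earlier ones, and stops at the first failure. It does not run libnet's extra configure step, so do that by hand first.

```shell
#!/bin/sh
# build_dists: hypothetical helper. For each unpacked distribution directory,
# runs perl Makefile.PL, make, make test and sudo make install, in order.
# Stops and returns 1 at the first directory where any step fails.
build_dists() {
    for dir in "$@"; do
        echo "=== $dir"
        # The subshell keeps the cd from changing the caller's directory.
        if ! (cd "$dir" && perl Makefile.PL && make && make test && sudo make install); then
            echo "build failed in $dir" >&2
            return 1
        fi
    done
}
```

From the directory holding the unpacked packages: `build_dists HTML-Tagset-3.03 Digest-MD5-2.16 MIME-Base64-2.12 libnet-1.0901 URI-1.10 HTML-Parser-3.25 libwww-perl-5.63` (the version numbers will change over time).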

You should now be set with libwww and LWP. To test LWP using a simple command-line statement (see the LWP POD), try:

	perl -MLWP::Simple -e 'getprint "http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.html"'
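Because of the 'HEAD'/'head' clash described at the start of this section, it is worth checking afterwards that head still behaves as the Unix utility. Here is a sketch in plain sh (`check_head` is a hypothetical helper). It feeds two lines to the command and expects only the first one back.

```shell
#!/bin/sh
# check_head: hypothetical helper. Tests whether CMD (default: head) prints
# just the first line of its input, as the Unix head utility does.
check_head() {
    cmd=${1:-head}
    first=$(printf 'line one\nline two\n' | "$cmd" -n 1 2>/dev/null)
    if [ "$first" = "line one" ]; then
        echo "$cmd looks like the Unix head utility"
    else
        echo "$cmd does not behave like head" >&2
        return 1
    fi
}
```

Running `check_head /usr/bin/head` after installing libwww-perl tests the installed file directly.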

XML Enabling Modules

Expat
The first step is to install an XML parser called Expat. The Expat version at Sourceforge.net at the time of writing, 1.95.2, does not compile on OS X (judging by the bug reports at http://sourceforge.net/projects/expat/, the software's home). The previous version, 1.95.1, apparently does compile easily after running ./configure (discussion at http://archive.develooper.com/macosx@perl.org/msg00708.html). Use the 1.95.1 version at ftp://bioperl.org/pub/external/, or the "ready-to-make-on-OS X" version at http://www.caos.aamu.edu/pub/MacOS_X/BSD/Applications/Publishing/XML/expat/. Installing the latter version is a simple one-liner (after unzipping and untarring):

	sudo make install

Most of the other dependencies are easy. I simply downloaded from ftp://bioperl.org/pub/external/ (and XML-Twig from CPAN), unzipped and untarred, and went

	cd XML-Parser-2.30
	perl Makefile.PL EXPATLIBPATH=/usr/local/lib/ \
		EXPATINCPATH=/usr/local/include/
	make		* I see a few warnings, but all tests pass:
	make test
	sudo make install

	cd ../libxml-perl-0.07
	perl Makefile.PL
	make
	make test
	sudo make install

XML-Writer generates an error, which is discussed by John Escott at http://aspn.activestate.com/ASPN/Mail/Message/perl-xml/282609


The fix requires editing two lines in Writer.pm. Replace both instances of

      _checkNSNames(\@_);

with

      my @a = @_;
      _checkNSNames(\@a);
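You can also make the two replacements with a Perl one-liner rather than an editor. Here is a sketch in plain sh (`patch_writer` is a hypothetical helper). It writes the fix on a single line, which Perl treats the same as the two-line version, and keeps the original as Writer.pm.bak. It assumes Writer.pm sits at the top of the unpacked XML-Writer directory; if it doesn't, pass its path.

```shell
#!/bin/sh
# patch_writer: hypothetical helper. Replaces every "_checkNSNames(\@_);"
# with "my @a = @_; _checkNSNames(\@a);" in place, keeping a .bak copy.
patch_writer() {
    file=${1:-Writer.pm}
    # Inside single quotes the backslashes reach Perl unchanged:
    # \( \) match literal parens, \\ a literal backslash, \@ a literal @.
    perl -pi.bak -e 's/_checkNSNames\(\\\@_\);/my \@a = \@_; _checkNSNames(\\\@a);/g' "$file"
}
```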

Then, continue as usual:

	perl Makefile.PL
	make
	make test
	sudo make install

	cd ../XML-Node-0.10
	perl Makefile.PL		* I get a warning, but all tests pass:
	make
	make test
	sudo make install

The next module is not mentioned at the Bioperl dependency page, but is called for in the Bioperl-1.0 test suite:

	cd ../XML-Twig-2.02
	perl Makefile.PL
	make
	make test
	sudo make install

Storable and Text::Shellwords modules

Storable
- Recommended for all releases after bioperl-0.7.2. This module is used for persistent object storage and local file caching.
Text::Shellwords
- Used only within the bioperl graphics package.

I don't have instructions for these, but they are standard CPAN modules.


GD.pm, gd, libjpeg, libpng (optional but excellent graphics tools)

Partial notes for installing GD.pm (GD-2.11) on OS X 10.3.1 are on a separate page.

Testing Bioperl after Dependency Installations

After doing these installations, you can check whether any failures remain in the Bioperl tests by running 'make test' and 'perl bptutorial.pl 0'. I have not gone through this process systematically with the Bioperl-1.0.2 release, so I can't yet share my experiences -- but the core modules and dependencies are working for me.
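To see which dependencies perl can actually load, here is a sketch in plain sh (`check_modules` is a hypothetical helper). It tries `perl -MModule -e 1` for each name and reports ok or MISSING. The module names in the usage line are the ones discussed on this page; trim the list to the parts of Bioperl you use.

```shell
#!/bin/sh
# check_modules: hypothetical helper. Reports whether perl can load each
# module named on the command line; returns 1 if any are missing.
check_modules() {
    missing=0
    for mod in "$@"; do
        # perl -MName -e 1 exits non-zero if the module cannot be loaded.
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return $missing
}
```

For example: `check_modules Bio::Seq IO::String IO::Scalar File::Temp LWP::UserAgent HTTP::Request::Common XML::Parser XML::Writer XML::Node XML::Twig Storable Text::Shellwords GD`.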




