Gbrowse
From Biocourse
GBrowse: The Generic Genome Browser
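GBrowse, well known as a genome browser, is an effective tool that links a database to the web and visualizes annotation information on a genome.
Features
Shows an overall view of the genome through functions such as scroll, zoom, and center
Visualizes information such as DNA, repeats, and expression through a variety of tracks
Searchable by ID, name, and comment
Can also be used with files in GFF format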
Downloading the latest version via CVS
% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod login
CVS password: <hit return>
% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co -r gbrowse-session Generic-Genome-Browser
Installing GBrowse
GBrowse can be installed on Windows, Mac, and Linux.
1. WINDOWS INSTALL
Before installing on Windows systems, you will need to install ActiveState Perl and the Apache web server. You may also wish to install
a database management system such as MySQL.
Install ActiveState Perl
Go to http://www.activestate.com, and download the product
"ActivePerl." This is a little confusing because the web site tries to
point you to the commercial product, ASPN Perl. At the current time,
the full download URL for ActivePerl is:
http://www.activestate.com/Products/Download/Download.plex?id=ActivePerl
Choose the "MSI" package for Windows for Perl version 5.8. Once
downloaded, launch the package, and it will install automatically.
Note that due to differences in how ActiveState implemented Perl
between Perl 5.6.1 and 5.8, we can only support Perl 5.8.
Install the Apache web server
Go to http://httpd.apache.org/download.cgi . Select the most recent
version of Apache, and choose the download marked "Win32 Binary (MSI
Installer)." Once downloaded, launch the package and it will install
automatically.
Install the MySQL database (optional)
Go to http://dev.mysql.com/downloads/mysql . Select and download the
most recent version of the Windows package. Once the package is
downloaded, you will need to unpack it with the WinZip program. Then
launch the installer.
After installing MySQL, install DBD::mysql; the easiest way to do
that is via ActiveState's ppm utility.
Install GBrowse
The easiest way to install GBrowse on Windows is to use an installer
script that can be obtained from the GMOD website:
http://www.gmod.org/gbrowse/windows .
After downloading the windows_install.pl script, open a DOS command
shell, change directories to where the script was downloaded to, and
execute the command:
perl windows_install.pl
This script will:
Install Microsoft's nmake utility
Add a third party ppm repository to install GD.pm
Install GD.pm
Use CPAN tools to install other prerequisites
Download and install BioPerl
Download and install the Generic Genome Browser
During the course of the install, you will be prompted to answer a few
questions; accepting the defaults is almost always the right thing
to do.
When this is done, go to step (5) below.
2. MACINTOSH OS X INSTALL
NOTE: The MacOSX installer cited below is quite out of date. Until it is
brought up to date, please use the SOURCE CODE INSTALL section below for
Macs. The thing that generally trips up installs on Macs is getting
libgd (a prerequisite for GD.pm) installed. I have usually had success
with fink getting it installed:
fink install gd2
will do the trick.
Go to the following URL:
ftp://dev.wormbase.org/pub/people/tharris/macosx/packages
Find the most recent version of the GBrowse package. These files have
the .dmg extension.
Once the package is downloaded, double click on it. The installer will
handle everything else.
3. SOURCE CODE INSTALL
To run GBrowse, MySQL, Apache, Perl, a set of Perl modules, and BioPerl are required, and some additional software is optional. On Linux, the required MySQL, Apache, Perl, and Perl modules are typically already installed, so you can concentrate on configuration.
Required software
A) MySQL
The MySQL database is a fast open source relational database that is
widely used for web applications. It is required for most real-life
genome annotation projects. For small projects (a few thousand
annotated features), you can skip installing MySQL and use an
in-memory database instead.
B) Apache Web Server
The Apache web server is the industry standard open source web
server for Unix and Windows systems.
C) Perl 5.005
The Perl language is widely used for web applications. Version 5.6
is preferred, but 5.00503 or higher will work.
D) Standard Perl modules
The following Perl modules must be installed for GBrowse to work.
They can be found on the Comprehensive Perl Archive Network (CPAN):
CGI (2.56 or higher)
GD (2.07 or higher)
CGI::Session (4.03 or higher)
DBI (any version)
DBD::mysql (any version)
Digest::MD5 (any version)
Text::Shellwords (any version)
Class::Base (any version)
To install a module from CPAN:
1. Start the cpan shell
2. get Module::Name
3. perl Makefile.PL
4. make
5. make install
E) BioPerl version 1.5.2 or higher
Get 'current_core_unstable'.
Related page: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
Optional modules:
F) XML::Parser, XML::Writer, XML::Twig, XML::DOM
If these modules are present, the "Sequence Dumper" plugin will be
able to produce GAME and BSML output. They can be downloaded from
CPAN.
G) LWP
To load remote third-party annotations. Available from CPAN.
H) Bio::Das
To display remote annotations using the Distributed Annotation
System. The current version is available at
http://www.biodas.org/download/Bio::Das/Bio-Das-0.92.tar.gz
I) MOBY
Needed by gbrowse_moby to fetch and display data from MOBY
providers. Available from biomoby.org; obtain via anonymous cvs
until it is released. Directions are at
http://www.biomoby.org/GettingTheCode.html.
J) GD::SVG
To save images as publication-quality editable images in Scalable
Vector Graphics format. Available from CPAN.
K) Bio::SCF File::Temp io-lib(v1.7+)
Needed by the trace glyph which can parse SCF files and display the
trace graph. The io-lib library can be downloaded from
https://sourceforge.net/project/showfiles.php?group_id=100316&package_id=108243
which is part of the Staden Package
https://sourceforge.net/projects/staden/.
After installing the required items above, download the latest version of Generic-Genome-Browser.
Current version as of February 12, 2007: Generic-Genome-Browser-1.66.tar.gz
http://prdownloads.sourceforge.net/gmod
tar -zxvf Generic-Genome-Browser-1.66.tar.gz   # unpack the archive
cd Generic-Genome-Browser-1.66                 # change into the source directory
After unpacking, run the following steps.
perl Makefile.PL        # answer the path prompts accurately
make
make test               # optional
make install UNINST=1   # run as root
Setting permissions
# su
Password: *********
# chown my_user_name /var/www/html/gbrowse/databases   # my_user_name: your Linux user account
# chown my_user_name /etc/httpd/conf/gbrowse.conf
# exit
This will install the software in the default location under
/usr/local/apache. See "Details" to change this, or to install gbrowse
into your home directory. The UNINST=1 option ensures that older versions
of the Perl modules being installed are removed, which helps prevent
conflicts.
To further configure GBrowse, see CONFIGURE_HOWTO. To run GBrowse on top
of Oracle and PostgreSQL databases see ORACLE_AND_POSTGRESQL. To run on
top of a BioSQL database, see BIOSQL_ADAPTER_HOWTO. To run GBrowse on
top of Gadfly, see README-berkeley-gadfly.
Details:
The browser consists of a CGI script named "gbrowse", a Perl module that
handles some of the gory details, a small number of static image files,
and a configuration directory that contains configuration files for each
data source. By default, these will be installed in the following
locations:
CGI script: /usr/local/apache/cgi-bin/gbrowse
Static images: /usr/local/apache/htdocs/gbrowse
Config files: /usr/local/apache/conf/gbrowse.conf
The module: -standard site-specific Perl library location-
You can change the location of the installation by passing
Makefile.PL one or more NAME=VALUE pairs, like so:
perl Makefile.PL CONF=/etc HTDOCS=/home/html
This will cause the configuration files to be installed in
/etc/gbrowse.conf and the static files to be installed in
/home/html/gbrowse.
The following arguments are recognized:
CONF      Configuration file directory
HTDOCS    Static files directory
CGIBIN    CGI script directory
APACHE    Base directory for Apache's conf, htdocs and cgi-bin directories
LIB       Perl site-specific modules directory
BIN       Perl executable scripts directory
NONROOT   If set to a non-zero value (e.g. NONROOT=1) then install
          gbrowse in a way that does not require root access.
DO_XS     Compile fast alignment algorithm (XS C extension)
For example, if you are on a RedHat system, where the default Apache
installation uses /var/www/html for HTML files, /var/www/cgi-bin for CGI
scripts, and /etc/httpd/conf for the configuration files, you should
specify the following configuration:
perl Makefile.PL HTDOCS=/var/www/html \
CONF=/etc/httpd/conf \
CGIBIN=/var/www/cgi-bin
(The backslashes are there to split the command across multiple lines
only). To make it easier when upgrading to new versions of the software,
you can put this command into a shell script.
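A wrapper script of that kind might look like the sketch below. The script name and the helper `gbrowse_configure_cmd` are made up for this example, and the three paths are simply the RedHat defaults quoted above, so adjust them for your system.

```shell
#!/bin/sh
# configure_gbrowse.sh (hypothetical name): re-run Makefile.PL with fixed paths.
# The three paths are the RedHat defaults quoted above; edit them as needed.
HTDOCS=/var/www/html
CONF=/etc/httpd/conf
CGIBIN=/var/www/cgi-bin

# Print the command so it can be reviewed (or logged) before it runs.
gbrowse_configure_cmd() {
    echo "perl Makefile.PL HTDOCS=$HTDOCS CONF=$CONF CGIBIN=$CGIBIN"
}

gbrowse_configure_cmd

# Run it only from inside an unpacked Generic-Genome-Browser source directory.
if [ -f Makefile.PL ]; then
    perl Makefile.PL HTDOCS="$HTDOCS" CONF="$CONF" CGIBIN="$CGIBIN"
fi
```

Keeping the paths in variables at the top means an upgrade only requires unpacking the new release and running the script from its directory.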
As a convenience, you can use the configuration option APACHE, in which
case the configuration, static and CGI files will be placed into
APACHE/conf, APACHE/htdocs and APACHE/cgi-bin respectively, where APACHE
is the location you specified on the command line:
perl Makefile.PL APACHE=/home/www
Note that the configuration files are always placed in a subdirectory
named gbrowse.conf. You cannot change this. Similarly, the static files
are placed in a directory named gbrowse. The install script will detect
if there are already configuration files in the selected directory and
not overwrite them if so. The same applies to the cascading stylesheet
file (gbrowse.css) located in the gbrowse subdirectory. However, neither
the GIF files in the "buttons" subdirectory nor the plugin modules in
the gbrowse.conf/plugins directory are checked before being overwritten,
so if you have modified any of them, back up your copies somewhere safe
before installing.
The DO_XS flag, if true (perl Makefile.PL DO_XS=1), will compile a small
C subroutine for nucleotide alignments. This will vastly improve the
performance of the gbrowse_details script when displaying alignments. To
use this feature, you will need a C compiler.
You can always manually move the files around after install. See
CONFIGURE_HOWTO for details.
When installing the static files, the install script also creates an
empty directory named "tmp". This directory is set to be world writable
so that the GBrowse server can use it to manage temporary image files
that it creates on the fly. If you would prefer not to have a world
writable directory on your system, simply change the ownership and
permissions to allow the web server account to write into it. The
directory is located in /usr/local/apache/htdocs/gbrowse/tmp by default.
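The ownership change described above can be sketched as follows. The function name `lock_down_tmp` is made up for this example, and the web server account name is an assumption you must check for your system (often "nobody", or "apache" on RedHat). Changing the owner normally requires root.

```shell
#!/bin/sh
# Replace world-writable access on the GBrowse tmp directory with
# write access for the web server account only.
# Usage (as root): lock_down_tmp /usr/local/apache/htdocs/gbrowse/tmp nobody
lock_down_tmp() {
    dir=$1
    user=$2
    # Make the web server account the owner, then drop group/other write bits.
    chown "$user" "$dir" && chmod 755 "$dir"
}
```

With mode 755 the owner can still create and delete temporary images, while other local accounts can only read them.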
The first time you run Makefile.PL, a file named GGB.def will be created
containing your file path settings. When Makefile.PL is run again, it will
ask you whether you wish to reuse the settings stored in the file.
gbrowse Quick-Guide
4. INSTALLING INTO YOUR HOME DIRECTORY
Read this section only if you are on a Unix system and do not have root
privileges. You will need to configure Apache to run out of your home
directory. One way to do this is to install Apache from source code and
to specify your home directory when you first configure it:
% cd apache_x.xx.xx
% ./configure --prefix=$HOME/apache
% make
% make install
This will place Apache into your home directory under ~/apache. You
should then edit ~/apache/conf/httpd.conf and replace the directive:
Listen 80
with
Listen 8000
so that Apache will listen for connections to the unprivileged port 8000
rather than the usual port 80. If you also see a "Port 80" directive,
change it to read "Port 8000." You'll now be able to talk to Apache
using URLs like http://your.host.edu:8000/.
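If you prefer to script the port change rather than edit the file by hand, something along these lines should work. It is a sketch only: the function name `set_port_8000` is made up, and it assumes the directives appear on lines of their own exactly as "Listen 80" or "Port 80".

```shell
#!/bin/sh
# Rewrite "Listen 80" (and a legacy "Port 80", if present) to port 8000.
# Usage: set_port_8000 ~/apache/conf/httpd.conf
set_port_8000() {
    conf=$1
    # Write to a temporary file first so a failed sed never truncates the config.
    sed -e 's/^\([[:space:]]*Listen[[:space:]][[:space:]]*\)80[[:space:]]*$/\18000/' \
        -e 's/^\([[:space:]]*Port[[:space:]][[:space:]]*\)80[[:space:]]*$/\18000/' \
        "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}
```

Lines that mention other ports, such as "Listen 8080", are left untouched because the pattern only matches the exact value 80.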
You may not need to install Apache from scratch if your Unix
distribution already has Apache installed. What you will do is to create
an Apache directory tree in your home directory and then start Apache
using command-line arguments that tell it to start up from the home
directory rather than its default system-wide directory.
Create an Apache directory and its subdirectories using the following
series of commands:
% cd ~
% mkdir apache
% mkdir apache/conf
% mkdir apache/logs
% mkdir apache/htdocs
% mkdir apache/cgi-bin
Now copy the system-wide httpd.conf into ~/apache/conf. You may need to
search around a bit to find out where the system-wide httpd.conf lives
(try running the command "locate httpd.conf"):
% cp /etc/httpd/conf/httpd.conf ~/apache/conf
Now open up ~/apache/conf/httpd.conf with a text editor and add the
following four directives, replacing $HOME with the full path to your
home directory (for example "/home/fred"):
Listen 8000
ServerRoot $HOME/apache
DocumentRoot $HOME/apache/htdocs
SetEnv PERL5LIB $HOME/lib
You should search the httpd.conf file for older versions of these
directives, and delete them if they're there. If you see a Port
directive, change it to read "Port 8000".
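The four directives can also be appended from the shell, which takes care of expanding $HOME for you. This is a minimal sketch with a made-up function name, `add_home_directives`; it does not remove older versions of the same directives, so that cleanup still has to be done by hand.

```shell
#!/bin/sh
# Append the four directives to a copied httpd.conf, expanding $HOME.
# Usage: add_home_directives ~/apache/conf/httpd.conf
add_home_directives() {
    cat >> "$1" <<EOF
Listen 8000
ServerRoot $HOME/apache
DocumentRoot $HOME/apache/htdocs
SetEnv PERL5LIB $HOME/lib
EOF
}
```

Because the here-document is unquoted, the shell substitutes the full home directory path (for example /home/fred) at the moment the function runs.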
Somewhere in httpd.conf there will be a ScriptAlias directive, as well
as a <Directory> section that refers to "cgi-bin". Delete the
ScriptAlias directive and the entire <Directory> section through to the
closing </Directory> line. Replace both with the following, again
replacing $HOME with the full path to your home directory:
ScriptAlias /cgi-bin/ "cgi-bin/"
<Directory "$HOME/apache/cgi-bin">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>
You can now start Apache from the command line using the "apachectl"
script:
% /usr/sbin/apachectl -d ~/apache -k start
If Apache starts successfully, then this command will return silently.
Otherwise, it will print an error message. More error messages may be
found in ~/apache/logs/error_log.
To confirm that Apache is running from your home directory, create a
file named index.html and copy it into ~/apache/htdocs. You should then
be able to open a browser, connect to http://localhost:8000/, and see
the index.html file that you just created.
Now you can build and install gbrowse with the following incantation:
% cd Generic-Genome-Browser-X.XX
% perl Makefile.PL APACHE=~/apache LIB=~/lib BIN=~/bin NONROOT=1
% make
% make install
When you are prompted to load gbrowse using http://localhost/gbrowse,
use http://localhost:8000/gbrowse instead.
5. TRY THE BROWSER OUT
The installation procedure will create a small in-memory database of
yeast chromosome 1 for you to play with. To try the browser out, use
your favorite browser to open:
http://localhost/cgi-bin/gbrowse
Try searching for "I" (the name of the first chromosome of yeast), or a
gene such as NUT21 or TCF3. Then try searching for "membrane
trafficking."
For your interest, the feature and DNA files for this database are
located in the web server's document root at
gbrowse/databases/yeast_chr1. The configuration file is in the web
server's configuration directory under gbrowse.conf/yeast1.conf.
More configuration information and a short tutorial are located at:
http://localhost/gbrowse
6. POPULATING THE DATABASE (MySQL)
This step takes you through populating the database with the full yeast
genome. You can skip this step if you use the in-memory database for
small projects (see section 6).
Synopsis:
mysql -uroot -ppassword -e 'create database yeast'
mysql -uroot -ppassword -e 'grant all privileges on yeast.* to me@localhost'
mysql -uroot -ppassword -e 'grant file on *.* to me@localhost'
mysql -uroot -ppassword -e 'grant select on yeast.* to nobody@localhost'
bp_bulk_load_gff.pl -d yeast sample_data/yeast_data.gff
Details:
Note for RedHat Linux users: note that if you are using the default
installed Apache, the user that apache runs as is 'apache' as opposed to
the otherwise standard 'nobody'. Therefore, everywhere 'nobody' occurs
in these directions, replace it with 'apache'.
In Bioperl versions 1.3 or later (not released as of August 2003), this
script is named bp_bulk_load_gff.pl.
You will need a MySQL database in order to start using GBrowse. Using
the mysql command line, create a database (called "yeast" in the
synopsis above), and ensure that you have update and file privileges on
it. The example above assumes that you have a username of "me" and that
you will allow updates from the local machine only. It also gives all
privileges to "me". You may be comfortable with a more restricted set of
privileges, but be sure to provide at least SELECT, UPDATE and INSERT
privileges. You will need to provide the administrator's name and
correct password for these commands to succeed.
In addition, grant the "nobody" user the SELECT privilege. The web
server usually runs as nobody, and must be able to make queries on the
database. Modify this as needed if the web server runs under a different
account.
The next step is to load the database with data. This is accomplished by
loading the database from a tab-delimited file containing the genomic
annotations in GFF format. The Bioperl distribution comes with three
tools for loading Bio::DB::GFF databases:
1 bp_load_gff.pl
This will incrementally load a database, optionally initializing it
if it does not already exist. This script will work correctly even
if the MySQL server is located on another host.
2 bp_bulk_load_gff.pl
This Perl script will initialize a new Bio::DB::GFF database with a
fresh schema, deleting anything that was there before. It will then
load the file. Only suitable for use the very first time you create
a database, or when you want to start from scratch! The bulk loader
is as much as 10x faster than bp_load_gff.pl, but does not work in
the situation in which the MySQL database is running on a remote
host.
3 bp_fast_load_gff.pl
This will incrementally load a database. On UNIX systems, it will
activate a fast loader that makes the speed almost the same as the
bulk loader. Be careful, though, because this is an experimental
piece of software.
You will find these scripts in the Bioperl distribution, in the
subdirectory scripts/Bio-DB-GFF. Earlier versions of the distribution
will have these files directly in the scripts/ subdirectory.
For testing purposes, this distribution includes a GFF file with yeast
genome annotations. The file can be found in the test_data subdirectory.
If the load is successful, you should see a message indicating that
13298 features were successfully loaded.
Provided that the yeast load was successful, you may now run "make
test". This invokes a small test script that tests that the database is
accessible by the "nobody" user and that the basic feature retrieval
functions are working.
You may also wish to load the yeast DNA, so that you can test the
three-frame translation and GC content features of the browser. Because
of its size, the file containing the complete yeast genome is
distributed separately and can be downloaded from:
http://prdownloads.sourceforge.net/gmod/yeast.fasta.gz?download
Load the file with this command:
bp_load_gff.pl -d yeast -fasta yeast.fasta.gz
[...] By configuring an Apache::Registry
directory and placing gbrowse inside it (rather than in the default
cgi-bin directory), the overhead of loading Perl and its libraries is
eliminated, thereby increasing the performance of the script noticeably.
Be aware that there is a bad interaction between the Apache::DBI module
(often used to speed up database accesses) and Bio::DB::GFF. This will
cause the GFF dumper plugin to fail intermittently. GBrowse does not
need Apache::DBI to achieve performance increases under mod_perl and it
is suggested that you disable Apache::DBI. If you cannot do this, then
you should remove the file GFFDumper.pm from the gbrowse.conf/plugins
directory.
Database query performance (2) is also a major factor. If you are using
MySQL as the backend, you will see dramatic performance increases by
increasing the amount of memory available to the key buffer, sort
buffer, table cache and other in-memory data structures. I suggest that
you replace the default MySQL configuration file (usually stored in
/etc/my.cnf) with one of the large-memory sample configuration files
provided in the support-files subdirectory of the MySQL distribution. Of
course, if you tell MySQL to use more memory than you have, then
performance will degrade again.
Finally, there is a slowdown when gbrowse converts the results of
database SQL queries into renderable biological objects. This becomes
particularly noticeable when there are lots of multi-segment objects to
be displayed. You can work around this slowdown by using semantic
zooming (see CONFIGURE_HOWTO). Otherwise, there's not much that can be
done about this short of buying a faster machine. The GMOD team is
working hard to reduce this performance hit.
11. MAKING THE SERVER RUN SAFER
Whenever you are running a server-side Web script using information
provided by a web client, there is a risk that maliciously-formatted
data provided by the user will trick the server-side script into
performing some unintentional action, such as modifying a file on the
server. Perl's "taint" checks are designed to catch places in the code
where such malicious data could cause harm, and GBrowse has been tested
extensively with these taint checks activated.
Because of taint checks' noticeable impact on performance, they have
been turned off in the distributed version of gbrowse. If you wish to
reactivate the extra checking (at the expense of a performance hit), go
to the file "gbrowse" located in the Web scripts directory and edit the
top line of the file to read:
#!/usr/bin/perl -w -T
The -T switch turns on taint checks.
If you are running GBrowse under mod_perl, add the following line to the
httpd.conf configuration file:
PerlTaintCheck On
This will affect all mod_perl scripts globally.
12. BIOPERL VERSIONS
GBrowse is evolving quickly, and some of its features are dependent on
new features in Bioperl 1.4.0. If you are having trouble making GBrowse
run, make sure you are using Bioperl 1.4.0!
13. THE GBROWSE_IMG SCRIPT
The gbrowse_img CGI script (a new feature as of version 1.41) is a
stripped-down version of gbrowse which just generates images. It is
suitable for incorporating into <img> tags in order to make a thumbnail
of a region of interest. The thumbnail can then be linked to the
full-featured gbrowse. Here is an example of how this works using the
WormBase site:
This will generate a 200-pixel inline image of the region. Clicking on
the image will link to the fully-navigable gbrowse script.
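As an illustration of the pattern, the sketch below prints an HTML fragment that wraps a gbrowse_img thumbnail in a link to gbrowse. The host, the data source name "my_source" and the function name `thumbnail_html` are hypothetical, and the `name` and `width` parameters and the URL layout are assumptions to verify against the gbrowse_img documentation for your version.

```shell
#!/bin/sh
# Print an HTML thumbnail link for a region.
# $1 = base URL of the CGI directory (hypothetical in the call below)
# $2 = data source name (hypothetical), $3 = landmark or region name
thumbnail_html() {
    printf '<a href="%s/gbrowse/%s?name=%s"><img src="%s/gbrowse_img/%s?name=%s;width=200"></a>\n' \
        "$1" "$2" "$3" "$1" "$2" "$3"
}

thumbnail_html http://your.host/cgi-bin my_source NUT21
```

Region names containing spaces or other special characters would need to be URL-encoded before being passed in.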
You can also use gbrowse_img to superimpose temporary features (like
BLAST hits) on the existing genome features.
Read docs/gbrowse_img.txt for the CGI parameters and other
instructions. A copy of these instructions in HTML form will be
generated when gbrowse_img is called without any arguments. Type
http://your.host/cgi-bin/gbrowse_img into your favorite web browser.
14. PLUGINS
GBrowse has a plugin architecture which makes it easy for third-party
developers to expand its functionality. The plugins are Perl .pm files
located in the directory gbrowse.conf/plugins/. To install plugins,
simply copy them into this directory. To uninstall, remove them.
If you wish to install your own or third party plugins, it is suggested
that you create a separate directory outside the gbrowse.conf/ hierarchy
in which to store them and then to indicate the location of these
plugins using the plugin_path setting:
plugin_path = /usr/local/gbrowse_plugins
This setting should be somewhere in the [GENERAL] section of the
relevant gbrowse configuration file.
15. THE GENBANK/EMBL PROXY
Sample configuration number 5 ("05.embl.conf") corresponds to an
experimental pass-through proxy for Genbank. At least in theory, if you
enter a landmark that isn't recognized, gbrowse will go to EMBL using
the bioperl BioFetch facility, parse the record, and enter it into the
local database. This allows you to browse arbitrary Genbank/EMBL/Refseq
entries.
You are free to experiment with this, but don't expect it to be entirely
reliable. To get it to work, you must:
1 Make sure you are using Bioperl 1.02 (or a patched version of 1.01)
2 Create a local database named "embl" and initialize it this way:
3 Set up permissions for this database so that "nobody@localhost" has
SELECT, INSERT, UPDATE and DELETE privileges
4 Initialize the database for use with this command:
% bp_load_gff.pl -c -d embl
Have fun!
Lincoln Stein & the GMOD team lstein@cshl.org
