Bioinformatics and Interaction Design: June 2012

Beautiful Soup is an HTML/XML parser written in Python. It can handle non-standard tags and still generate a parse tree, and it provides simple, commonly used operations for navigating, searching, and modifying that tree.
Recently I tried the BeautifulSoup module to parse HTML elements by class. Here is a simple scraper script for extracting information from the Apple website.

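 #!/usr/bin/python
 # Python 2 script; uses the BeautifulSoup 3 module.
 from BeautifulSoup import BeautifulSoup
 import urllib
 # CGI header, followed by the blank line that ends the headers
 print 'Content-type: text/plain\r\n'
 # Download the product page and build the parse tree
 webpage = urllib.urlopen("http://store.apple.com/us/browse/home/shop_iphone/family/iphone/iphone4s")
 soup = BeautifulSoup(webpage.read())
 # Find the list that holds all models
 tags = soup('ul', {'class': 'selection-options all-models'})
 # In the first match, keep <span> tags whose only attribute is one of the wanted classes
 tags = tags[0](lambda tag: len(tag.attrs) == 1 and tag.name in ['span'] and
                tag['class'] in ['shipping', 'price', 'color', 'title'])
 # Print the text of each matching tag, followed by a separator line
 for tag in tags:
     print tag.text
     print '-' * 30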

Results:

16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
------------------------------
16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
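
For readers who do not have the old BeautifulSoup 3 module, the same filtering idea can be sketched with the html.parser module from the Python 3 standard library. This is only an illustration under assumptions: SAMPLE_HTML is made-up markup that resembles the structure the script expects (it is not copied from the Apple page), SpanCollector is a hypothetical helper name, and the sketch does not restrict the search to the first matching list as the script does.

```python
from html.parser import HTMLParser

# Hypothetical markup that mimics the structure the script looks for.
SAMPLE_HTML = """
<ul class="selection-options all-models">
  <li>
    <span class="title">16GB</span>
    <span class="color">black</span>
    <span class="price">From $199</span>
    <span class="shipping">In Stock</span>
    <span class="note" id="x">ignored</span>
  </li>
</ul>
"""

WANTED = {"shipping", "price", "color", "title"}


class SpanCollector(HTMLParser):
    """Collect the text of <span> tags whose only attribute is a wanted class."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._capture = False  # True while inside a matching <span>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if (tag == "span" and len(attrs) == 1
                and attrs[0][0] == "class" and attrs[0][1] in WANTED):
            self._capture = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._capture = False

    def handle_data(self, data):
        if self._capture:
            self.results[-1] += data


collector = SpanCollector()
collector.feed(SAMPLE_HTML)
for text in collector.results:
    print(text)
    print("-" * 30)
```

The last span in the sample carries two attributes, so it is skipped, just as the `len(tag.attrs) == 1` test in the script would skip it.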

1.) Create a user called galaxy with the password galaxy

 admin@myserver:~# adduser galaxy  
 Adding user `galaxy' ...  
 Adding new group `galaxy' (1007) ...  
 Adding new user `galaxy' (1008) with group `galaxy' ...  
 Creating home directory `/home/galaxy' ...  
 Copying files from `/etc/skel' ...  
 Enter new UNIX password:  
 Retype new UNIX password:  
 passwd: password updated successfully  
 Changing the user information for galaxy  
 Enter the new value, or press ENTER for the default  
     Full Name []:  
     Room Number []:  
     Work Phone []:  
     Home Phone []:  
     Other []:  
 Is the information correct? [Y/n] Y  
 admin@myserver:~#

2.) Switch to the galaxy user and clone the Galaxy production version

 admin@myserver: su galaxy  
 galaxy@myserver:/home/admin$ cd   
 galaxy@myserver: hg clone https://bitbucket.org/galaxy/galaxy-dist

3.) If you don't have the Mercurial client needed to clone Galaxy, install it with the following command

 admin@myserver:sudo apt-get install mercurial

4.) Set the $TEMP environment variable to Galaxy's new_files_path directory

 galaxy@myserver:~$ export TEMP=/home/galaxy/galaxy-dist/database/tmp

5.) We need a clean Python interpreter with the correct Python path

 galaxy@myserver:wget http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py  
 galaxy@myserver:/usr/bin/python2.6 virtualenv.py --no-site-packages galaxy_env

6.) Now we need to set up a new database for Galaxy. I am going to create a PostgreSQL database

 galaxy@myserver:~/galaxy-dist$ psql -h localhost -d postgres -U postgres  
 postgres=# CREATE DATABASE galaxy_prod;  
 postgres=# CREATE USER galaxy_prod_user WITH PASSWORD 'galaxy';  
 postgres=# GRANT ALL PRIVILEGES ON DATABASE galaxy_prod to galaxy_prod_user;  
 postgres=# \q

7.) Then we need to change Galaxy's default server settings to match our server details

 galaxy@myserver:cd galaxy-dist/  
 galaxy@myserver:~/galaxy-dist$ chmod -R 777 universe_wsgi.ini  
 galaxy@myserver:~/galaxy-dist$ vi universe_wsgi.ini

8.) Here are the basic changes for the universe_wsgi.ini file.

 host = xxx.xxx.23.123 [IP ADDRESS]  
 debug = False  
 use_interactive = False  
 database_connection = postgres://galaxy_prod_user:galaxy@localhost:5432/galaxy_prod
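
The database_connection value packs the user, password, host, port, and database name from step 6 into one URL, so a typo in any part is easy to miss. As a rough sanity check, the pieces can be pulled apart with the Python 3 standard library. This is a sketch, not part of Galaxy: describe_connection is a hypothetical helper name, and the URL is simply the one from the settings shown here.

```python
from urllib.parse import urlsplit

# Connection string copied from the settings in this step.
DATABASE_CONNECTION = "postgres://galaxy_prod_user:galaxy@localhost:5432/galaxy_prod"


def describe_connection(url):
    """Split a database URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_connection(DATABASE_CONNECTION)
for key, value in info.items():
    print(key, "=", value)
```

Each printed value should match what was entered in psql: the user and password from CREATE USER and the database name from CREATE DATABASE.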

9.) There are many more changes we can make to Galaxy by customizing universe_wsgi.ini, for instance adding tracks, user privileges, FTP upload, etc. Galaxy has its own built-in server, but some pages consist of static content, so we can set up a proxy to improve efficiency

 admin@myserver:vi /etc/httpd/conf/httpd.conf

10.) Add the following lines to httpd.conf

 <VirtualHost *:80>  
 ServerName xxx.xxx.23.123 [IP ADDRESS]  
 RewriteEngine on  
 #RewriteLog "/etc/httpd/logs/rewrite_log"  
 #RewriteLogLevel 9  
 RewriteRule ^/galaxy$ /galaxy/ [R]  
 #RewriteRule ^/galaxy/static/style/(.*) /home/galaxy/galaxy-dist/static/june_2007_style/blue/$1 [L]  
 #RewriteRule ^/galaxy/static/scripts/(.*) /home/galaxy/galaxy-dist/static/scripts/packed/$1 [L]  
 #RewriteRule ^/galaxy/static/(.*) /home/galaxy/galaxy-dist/static/$1 [L]  
 #RewriteRule ^/galaxy/favicon.ico /home/galaxy/galaxy-dist/static/favicon.ico [L]  
 #RewriteRule ^/galaxy/robots.txt /home/galaxy/galaxy-dist/static/robots.txt [L]  
 RewriteRule ^/galaxy(.*) http://localhost:8080$1 [P]  
 </VirtualHost>
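
To see what the two active (uncommented) rules do to an incoming request path, here is a small Python model of them. It is a deliberately simplified sketch: real mod_rewrite processing has more nuances (flags, rule chaining, per-directory context), route is a hypothetical function name, and the backend address is just the one used in the proxy rule.

```python
import re

# Backend address taken from the proxy rule in the configuration.
BACKEND = "http://localhost:8080"


def route(path):
    """Return (action, target) for a request path, mimicking the two active rules."""
    # RewriteRule ^/galaxy$ /galaxy/ [R]   -> external redirect
    if re.search(r"^/galaxy$", path):
        return ("redirect", "/galaxy/")
    # RewriteRule ^/galaxy(.*) http://localhost:8080$1 [P]   -> proxy to Galaxy
    match = re.search(r"^/galaxy(.*)", path)
    if match:
        return ("proxy", BACKEND + match.group(1))
    # No rule matched: the request is handled by Apache as usual
    return ("pass", path)


for example in ["/galaxy", "/galaxy/", "/galaxy/root/index", "/other"]:
    print(example, "->", route(example))
```

In this model a request for /galaxy is first redirected to /galaxy/, and everything under /galaxy/ is forwarded to the Galaxy server with the /galaxy prefix removed.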

11.) Now we need to restart the proxy server.

 admin@myserver:/etc/init.d/httpd restart

12.) Finally, we can run Galaxy

 galaxy@myserver:~/galaxy-dist$ sh ./run.sh --daemon

13.) We can stop Galaxy or check its status using the following commands

 galaxy@myserver:~/galaxy-dist$ sh ./run.sh --stop-daemon  
 galaxy@myserver:~/galaxy-dist$ sh ./run.sh --status
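
Besides the status command, one quick way to confirm that the daemon is actually accepting connections is to try opening a TCP connection to it. The sketch below uses only the Python 3 standard library. It assumes Galaxy listens on localhost port 8080, as the proxy rule in step 10 implies, and is_listening is a hypothetical helper name.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable
        return False


if __name__ == "__main__":
    # Port 8080 is an assumption based on the proxy rule in step 10.
    print("Galaxy reachable:", is_listening("localhost", 8080))
```

A True result only shows that some process is listening on that port; it does not prove that the process is Galaxy or that the application started without errors.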

14.) Done!

Saturday, 30 June 2012

Python BeautifulSoup scraper script

Tuesday, 26 June 2012

Setup Galaxy production server on Ubuntu and Apache environment

Saturday, 23 June 2012

Ubuntu folder/file permission