
Saturday, 30 June 2012

Python BeautifulSoup scraper script

Beautiful Soup is an HTML/XML parser written in Python. It can handle non-standard (badly formed) tags and still build a parse tree from them, and it provides simple, commonly used operations for navigating, searching, and modifying that tree.
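To make those three operations concrete, here is a minimal sketch. It assumes Python 3 and the newer `bs4` package (Beautiful Soup 4) rather than the BeautifulSoup 3 module used in this post, and the markup is made up for the example.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Made-up sample markup, only for illustration.
html = '<div id="phone"><span class="title">16GB</span><span class="price">$199</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Navigation: attribute access walks down to the first matching tag.
title = soup.div.span

# Search: find() returns the first tag matching a name and class.
price = soup.find("span", class_="price")

# Modification: replace the text inside the tag in place.
price.string = "$299"

print(title.get_text())
print(price.get_text())
```

The same three ideas (walk the tree, search it, change it) are what the scraper script relies on, only with the older BeautifulSoup 3 spelling of the calls.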
Recently I tried the BeautifulSoup module to parse HTML by tag and class name. Here is a simple scraping script that extracts information from the Apple website.

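 #!/usr/bin/python
 # Python 2 CGI script using BeautifulSoup 3.
 import urllib
 from BeautifulSoup import BeautifulSoup

 # CGI header; print appends a newline, which supplies the blank line ending the headers.
 print 'Content-type: text/plain\r\n'

 webpage = urllib.urlopen("http://store.apple.com/us/browse/home/shop_iphone/family/iphone/iphone4s")
 soup = BeautifulSoup(webpage.read())
 # Calling a soup or tag object is shorthand for findAll().
 tags = soup('ul', {'class': 'selection-options all-models'})
 # Keep <span> tags whose only attribute is one of the wanted classes.
 # tag.get() avoids a KeyError when that single attribute is not 'class'.
 tags = tags[0](lambda tag: len(tag.attrs) == 1 and tag.name == 'span' and
                tag.get('class') in ['shipping', 'price', 'color', 'title'])
 for tag in tags:
     print tag.text
     print '-' * 30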


Results:
16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
------------------------------
16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
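
The script above targets Python 2 and BeautifulSoup 3. For anyone on Python 3, a rough equivalent might look like the sketch below. It assumes the `bs4` package (Beautiful Soup 4) is installed, and it runs against a made-up HTML snippet instead of fetching the live page, so `SAMPLE_HTML`, `WANTED`, `is_wanted_span`, and `extract_fields` are all names invented for this illustration, not part of the original script or of Apple's markup.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Made-up sample markup standing in for the product page; not Apple's real HTML.
SAMPLE_HTML = """
<ul class="selection-options all-models">
  <li>
    <span class="title">16GB</span>
    <span class="color">black</span>
    <span class="price">From $199</span>
    <span class="shipping">In Stock</span>
    <span class="note" id="n1">skipped: has two attributes</span>
  </li>
</ul>
"""

WANTED = {"shipping", "price", "color", "title"}


def is_wanted_span(tag):
    """True for <span> tags whose only attribute is a class listed in WANTED."""
    return (
        tag.name == "span"
        and len(tag.attrs) == 1
        # In bs4 the class attribute is a list of class names, not a string.
        and bool(WANTED & set(tag.get("class", [])))
    )


def extract_fields(html):
    """Return the text of every wanted <span> inside the first matching <ul>."""
    soup = BeautifulSoup(html, "html.parser")
    lists = soup.find_all("ul", class_="selection-options all-models")
    if not lists:
        return []
    return [span.get_text(strip=True) for span in lists[0].find_all(is_wanted_span)]


if __name__ == "__main__":
    for text in extract_fields(SAMPLE_HTML):
        print(text)
        print("-" * 30)
```

The main differences from the original are the import path, `find_all` instead of calling the soup object directly, and the fact that `class` comes back as a list, which is why the check uses a set intersection.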