Saturday, 30 June 2012

Python BeautifulSoup scraper script

Beautiful Soup is an HTML/XML parser written in Python. It can handle non-standard markup and builds a parse tree from it, and it provides simple, commonly used operations for navigating, searching, and modifying that tree.
Recently I tried the BeautifulSoup module to pick out HTML elements by their class. Here is a simple scraping script that extracts product information from the Apple website.

 #!/usr/bin/python
 # Python 2 script using the BeautifulSoup 3 module.
 import urllib
 from BeautifulSoup import BeautifulSoup

 # HTTP header for plain-text output, followed by a blank line.
 print 'Content-type: text/plain\r\n'

 webpage = urllib.urlopen("http://store.apple.com/us/browse/home/shop_iphone/family/iphone/iphone4s")
 soup = BeautifulSoup(webpage.read())

 # Calling soup(...) is shorthand for soup.findAll(...).
 lists = soup('ul', {'class': 'selection-options all-models'})

 # Keep <span> tags whose only attribute is one of the wanted classes.
 tags = lists[0](lambda tag: tag.name == 'span' and len(tag.attrs) == 1 and
                 tag['class'] in ['shipping', 'price', 'color', 'title'])
 for tag in tags:
     print tag.text
     print '-' * 30


Results:
16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
------------------------------
16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
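
The script depends on the live Apple page, so it may be handy to try the same filter offline. Here is a rough sketch that uses only the Python 3 standard library (html.parser, not BeautifulSoup) on a made-up HTML fragment. The fragment, the SpanCollector class, and the names SAMPLE_HTML, WANTED and LIST_CLASS are assumptions for illustration; this is not the real Apple markup, and the sketch does not handle nested lists or nested spans.

```python
from html.parser import HTMLParser

# Made-up fragment imitating the structure the script expects
# (an assumption; the real Apple markup is not reproduced here).
SAMPLE_HTML = """
<ul class="selection-options all-models">
  <li>
    <span class="title">16GB</span>
    <span class="color">black</span>
    <span class="price">From $199</span>
    <span class="shipping">In Stock</span>
    <span class="note" id="extra">ignored: two attributes</span>
    <span class="footnote">ignored: class not wanted</span>
  </li>
</ul>
<span class="price">ignored: outside the list</span>
"""

WANTED = {"shipping", "price", "color", "title"}
LIST_CLASS = "selection-options all-models"


class SpanCollector(HTMLParser):
    """Collect the text of wanted <span> tags inside the target <ul>."""

    def __init__(self):
        super().__init__()
        self.in_list = False    # currently inside the target <ul>?
        self.capturing = False  # currently inside a wanted <span>?
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs_dict = dict(attrs)
        if tag == "ul" and attrs_dict.get("class") == LIST_CLASS:
            self.in_list = True
        elif (self.in_list and tag == "span" and len(attrs) == 1
              and attrs_dict.get("class") in WANTED):
            # Same test as the lambda: a span whose only attribute
            # is one of the wanted classes.
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.capturing = False
        elif tag == "ul":
            self.in_list = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data


collector = SpanCollector()
collector.feed(SAMPLE_HTML)
for text in collector.results:
    print(text)
    print("-" * 30)
```

The two spans marked "ignored" inside the list show why the filter checks both the attribute count and the class name, and the last span shows that the search is limited to the selected list.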
