Recently I tried the BeautifulSoup module to parse HTML. Here is a simple scraping script that extracts information from the Apple website.
#!/usr/bin/python
# Python 2 script using BeautifulSoup 3
print 'Content-type: text/plain\r\n'  # CGI header, followed by a blank line
from BeautifulSoup import BeautifulSoup
import urllib

webpage = urllib.urlopen(r"http://store.apple.com/us/browse/home/shop_iphone/family/iphone/iphone4s")
soup = BeautifulSoup(webpage.read())
# Find the <ul> that lists the available models
tags = soup('ul', {'class': 'selection-options all-models'})
# Within the first match, keep <span> tags whose only attribute is one of these classes
tags = tags[0](lambda tag: len(tag.attrs) == 1 and tag.name in ['span'] and
               tag['class'] in ['shipping', 'price', 'color', 'title'])
for tag in tags:
    print tag.text
    print '-' * 30
Results:
16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
------------------------------
16GB2
------------------------------
black
------------------------------
From$199
------------------------------
In Stock
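The filter in the script (a span whose single attribute is a class from a short list, inside one particular ul) can be tried without network access. The sketch below is not the BeautifulSoup code above: it swaps in Python 3's standard-library html.parser so that it runs with no third-party packages. SAMPLE_HTML is made-up markup that only mimics the structure the script expects, and SpanCollector and extract_fields are hypothetical names chosen for illustration. It also assumes the wanted spans are not nested inside each other.

```python
from html.parser import HTMLParser

# Made-up markup that mimics the structure the script above expects;
# the real Apple page may look different.
SAMPLE_HTML = """
<ul class="selection-options all-models">
  <li>
    <span class="title">16GB</span>
    <span class="color">black</span>
    <span class="price">From $199</span>
    <span class="shipping">In Stock</span>
    <span class="note" id="extra">ignored</span>
  </li>
</ul>
"""

WANTED = {"shipping", "price", "color", "title"}
LIST_CLASS = "selection-options all-models"


class SpanCollector(HTMLParser):
    """Collect the text of <span> tags whose only attribute is a wanted class."""

    def __init__(self):
        super().__init__()
        self.in_list = False    # True while inside the target <ul>
        self.capturing = False  # True while inside a matching <span>
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs_dict = dict(attrs)
        if tag == "ul" and attrs_dict.get("class") == LIST_CLASS:
            self.in_list = True
        elif (self.in_list and tag == "span" and len(attrs) == 1
              and attrs_dict.get("class") in WANTED):
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.capturing = False
        elif tag == "ul":
            self.in_list = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data


def extract_fields(html):
    """Return the stripped text of every matching span, in document order."""
    parser = SpanCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


if __name__ == "__main__":
    for field in extract_fields(SAMPLE_HTML):
        print(field)
        print("-" * 30)
```

On the sample markup this prints the four wanted fields, each followed by a dashed line, and skips the span that carries a second attribute.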