I have a very simple piece of code, shown below. Scraping works fine: I can see all the print statements producing the correct data. In the pipeline, the initialization runs correctly, but process_item is never called, since the print statement at the start of that function never executes.
Spider: comosham.py
import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re
class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'

    def parse_location(self, response):
        print 'in parse_location'
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2] + loc[3] + loc[4] + (loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'], item['pin'], item['phone'], item['fax'], item['email']
        return item
Items file: (code omitted in the post)
Pipeline file:

    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv', 'wb'))
        self.locationdump.writerow(['Address', 'Pin', 'Phone', 'Fax', 'Email'])

    def process_item(self, item, spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'], item['pin'], item['phone'], item['fax'], item['email']
            self.locationdump.writerow([item['address'], item['pin'], item['phone'], item['fax'], item['email']])
        else:
            pass
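One thing to check with any pipeline: Scrapy only invokes it if it is registered in settings.py. A minimal sketch of that registration, assuming the project name `activityadvisor` from the imports above and a guessed pipeline class name (the post never shows the class declaration):

```python
# settings.py (sketch): 'activityadvisor' is the project name used in the
# spider's imports; 'ComoShamPipeline' is a hypothetical class name, since
# the post omits the pipeline's class declaration. The number (300) is the
# pipeline's run order, conventionally 0-1000.
ITEM_PIPELINES = {
    'activityadvisor.pipelines.ComoShamPipeline': 300,
}
```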
It finally turned out to come down to two main causes:

1. Add this line to settings.py: ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}
2. Yield the item while your spider runs: yield my_item

Mine was the second cause; after fixing it, everything worked right away.
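The second cause is visible in the spider above: parse calls self.parse_location(response) and discards the returned item, so it never reaches the pipeline. A minimal plain-Python sketch of why (no Scrapy required; function names echo the spider, the dict stands in for the item):

```python
# Scrapy iterates over whatever parse() yields (or returns). An item that is
# built inside a helper but never yielded from parse() is silently dropped,
# which is exactly why process_item never fires.

def parse_location(response):
    # Helper that builds and returns an item (here just a dict stand-in).
    return {'category': 'location', 'address': response}

def parse_broken(response):
    parse_location(response)  # return value discarded -> no items emitted

def parse_fixed(response):
    yield parse_location(response)  # item is handed to the engine/pipeline

print(list(parse_fixed('some url')))   # the item comes out
print(parse_broken('some url'))        # None: nothing for Scrapy to process
```

In the original spider, the one-line fix is `yield self.parse_location(response)` inside parse (and likewise for the other branches once they are implemented).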