Writing items to a MySQL database in Scrapy

Question

I am new to Scrapy, and I have the following spider code:

    class Example_spider(BaseSpider):
        name = "example"
        allowed_domains = ["www.example.com"]

        def start_requests(self):
            yield self.make_requests_from_url("http://www.example.com/bookstore/new")

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            urls = hxs.select('//div[@class="bookListingBookTitle"]/a/@href').extract()
            for i in urls:
                yield Request(urljoin("http://www.example.com/", i[1:]), callback=self.parse_url)

        def parse_url(self, response):
            hxs = HtmlXPathSelector(response)
            main = hxs.select('//div[@id="bookshelf-bg"]')
            items = []
            for i in main:
                item = Exampleitem()
                item['book_name'] = i.select('div[@class="slickwrap full"]/div[@id="bookstore_detail"]/div[@class="book_listing clearfix"]/div[@class="bookstore_right"]/div[@class="title_and_byline"]/p[@class="book_title"]/text()')[0].extract()
                item['price'] = i.select('div[@id="book-sidebar-modules"]/div[@class="add_to_cart_wrapper slickshadow"]/div[@class="panes"]/div[@class="pane clearfix"]/div[@class="inner"]/div[@class="add_to_cart 0"]/form/div[@class="line-item"]/div[@class="line-item-price"]/text()').extract()
                items.append(item)
            return items

And the pipeline code is:

    class examplePipeline(object):

        def __init__(self):
            self.dbpool = adbapi.ConnectionPool('MySQLdb',
                db='blurb',
                user='root',
                passwd='redhat',
                cursorclass=MySQLdb.cursors.DictCursor,
                charset='utf8',
                use_unicode=True
            )

        def process_item(self, spider, item):
            # run db query in thread pool
            assert isinstance(item, Exampleitem)
            query = self.dbpool.runInteraction(self._conditional_insert, item)
            query.addErrback(self.handle_error)
            return item

        def _conditional_insert(self, tx, item):
            print "db connected-=========>"
            # create record if doesn't exist.
            tx.execute("select * from example_book_store where book_name = %s", (item['book_name']) )
            result = tx.fetchone()
            if result:
                log.msg("Item already stored in db: %s" % item, level=log.DEBUG)
            else:
                tx.execute("""INSERT INTO example_book_store (book_name,price) VALUES (%s,%s)""", (item['book_name'],item['price']) )
                log.msg("Item stored in db: %s" % item, level=log.DEBUG)

        def handle_error(self, e):
            log.err(e)

After running this, I get the following error:

exceptions.NameError: global name 'Exampleitem' is not defined

The error above occurs when I add the following line to the process_item method:

assert isinstance(item, Exampleitem)

Without that line, I get:

exceptions.TypeError: 'Example_spider' object is not subscriptable

Can anyone help me get this code running so that all the items are saved to the database?

Mahmoud M. Abdel-Fattah · Accepted Answer

Try the following code in your pipeline:

    import sys
    import MySQLdb
    import hashlib
    from scrapy.exceptions import DropItem
    from scrapy.http import Request

    class MySQLStorePipeline(object):
        def __init__(self):
            self.conn = MySQLdb.connect('Host', 'user', 'passwd', 'dbname', charset="utf8", use_unicode=True)
            self.cursor = self.conn.cursor()

        def process_item(self, item, spider):
            try:
                self.cursor.execute("""INSERT INTO example_book_store (book_name, price) VALUES (%s, %s)""", (item['book_name'].encode('utf-8'), item['price'].encode('utf-8')))
                self.conn.commit()
            except MySQLdb.Error, e:
                print "Error %d: %s" % (e.args[0], e.args[1])
            return item
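
The code above uses Python 2 syntax. For readers on Python 3, here is a rough sketch of the same insert-and-commit pattern. It deliberately swaps MySQLdb for the standard-library sqlite3 module so it runs without a database server, which means the placeholders are `?` rather than `%s`. The class name SQLiteStorePipeline and the sample item are made up for illustration; open_spider and close_spider are the optional hooks Scrapy calls on a pipeline when a spider starts and finishes.

```python
import sqlite3


class SQLiteStorePipeline:
    """Stand-in for the MySQL pipeline (sqlite3 is an assumption, used only so this runs anywhere)."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None

    def open_spider(self, spider):
        # Open the connection once per crawl and make sure the table exists
        self.conn = sqlite3.connect(self.path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS example_book_store (book_name TEXT, price TEXT)"
        )

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        # Item first, spider second; values are passed as parameters, not formatted in
        self.conn.execute(
            "INSERT INTO example_book_store (book_name, price) VALUES (?, ?)",
            (item["book_name"], item["price"]),
        )
        self.conn.commit()
        return item


# Drive the pipeline by hand with a hypothetical item (no Scrapy needed)
pipeline = SQLiteStorePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"book_name": "Example Book", "price": "$10.00"}, spider=None)
rows = pipeline.conn.execute(
    "SELECT book_name, price FROM example_book_store"
).fetchall()
print(rows)
```

With a real MySQL driver the structure should stay the same; only the connect call and the placeholder style would change.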

FavorMylikes · Answer

I think this approach is better and more concise:

    #Item
    class pictureItem(scrapy.Item):
        topic_id=scrapy.Field()
        url=scrapy.Field()

    #SQL
    self.save_picture="insert into picture(`url`,`id`) values(%(url)s,%(id)s);"

    #usage
    cur.execute(self.save_picture,dict(item))

It works much like:

cur.execute("insert into picture(`url`,`id`) values(%(url)s,%(id)s)" % {"url":someurl,"id":1})

The reason (see the Scrapy documentation for more about Items):

The Field class is just an alias for the built-in dict class and does not provide any extra functionality or attributes. In other words, Field objects are plain old Python dicts.
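
One caveat, shown as a small sketch in plain Python with no database involved: formatting the statement with `%` pastes the raw values into the SQL text, whereas passing the dict as the second argument to cur.execute leaves quoting and escaping to the driver. The placeholder names also have to match the keys of the dict. The values in params below are hypothetical stand-ins for dict(item).

```python
# Hypothetical values standing in for dict(item); not taken from the original post
params = {"url": "http://www.example.com/a.png", "id": 1}
sql = "insert into picture(`url`,`id`) values(%(url)s,%(id)s)"

# Formatting with % inserts the values as-is, so the URL ends up unquoted
formatted = sql % params
print(formatted)

# A dict keyed by topic_id has no "id" entry for the %(id)s placeholder
missing = None
try:
    sql % {"url": params["url"], "topic_id": 1}
except KeyError as exc:
    missing = exc.args[0]
    print("missing key:", missing)
```

So the form with two arguments, cur.execute(sql, params), is generally the safer one, and a field named topic_id would need a matching %(topic_id)s placeholder.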

Sjaak Trekhaak · Answer

Your process_item method should be declared as def process_item(self, item, spider): instead of def process_item(self, spider, item): because you switched the arguments around.
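
Scrapy passes the item first and the spider second. The sketch below uses stand-in classes rather than Scrapy itself (ExampleSpider, both pipeline classes, and call_like_scrapy are hypothetical names) to show why swapped parameter names produce the "not subscriptable" error: the variable called item ends up holding the spider.

```python
class ExampleSpider:
    """Stand-in for a Scrapy spider; it does not support item['key'] access."""
    name = "example"


class SwappedPipeline:
    def process_item(self, spider, item):  # wrong order: 'item' receives the spider
        return item["book_name"]


class CorrectPipeline:
    def process_item(self, item, spider):  # item first, spider second
        return item["book_name"]


def call_like_scrapy(pipeline, item, spider):
    # Mimics a positional call with the item first and the spider second
    return pipeline.process_item(item, spider)


item = {"book_name": "Example Book"}
spider = ExampleSpider()

print(call_like_scrapy(CorrectPipeline(), item, spider))

try:
    call_like_scrapy(SwappedPipeline(), item, spider)
except TypeError as exc:
    print("TypeError:", exc)
```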

This exception: exceptions.NameError: global name 'Exampleitem' is not defined indicates that you did not import Exampleitem in your pipeline. Try adding: from myspiders.myitems import Exampleitem (with the correct names/path, of course).