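I am learning how to save site data to MySQL with Ruby.
I am working from the book 「Rubyによるクローラー開発技法 巡回・解析機能の実装と21の運用例」.
The book uses Yahoo! as its example, and I got the Yahoo! case to work,
but when I try other sites, the same kind of error occurs each time and nothing is saved to the database.
The error seems to occur when the fetched site data is saved to MySQL,
but I have no idea how to fix it.
With "anemone-mysql.rb" the Yahoo! URL works. I also tried Amazon and Google,
but could not resolve the error.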
<Error>
/Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/storage/base.rb:28:in `rescue in []=': You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'text/javascript'>var ue_t0=ue_t0||+new Date();</script><!-- sp:feature:cs-optimi' at line 1 (Anemone::Storage::InsertionError)
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/storage/base.rb:26:in `[]='
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/page_store.rb:20:in `[]='
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:176:in `block in run'
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:163:in `loop'
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:163:in `run'
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:92:in `block in crawl'
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:83:in `initialize'
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `new'
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `crawl'
    from /Users/hiroyuki/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/gems/anemone-0.7.2/lib/anemone/core.rb:18:in `crawl'
    from anemone-mysql.rb:14:in `<main>'
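A guess at what the message is pointing to: the text quoted after "near" starts right after a single quote in the page's HTML, which suggests that the quote ended the SQL string literal early. The sketch below reproduces that effect in plain Ruby, with no database involved. `build_insert_sql` and `SAMPLE_BODY` are hypothetical names made up for this illustration; the statement text follows the INSERT in mysql.rb, and the sample body is modelled on the fragment in the error message.

```ruby
# Builds the statement by string interpolation, the same pattern as the
# INSERT in mysql.rb. (Hypothetical helper, for illustration only.)
def build_insert_sql(key, data)
  "INSERT INTO anemone_storage (page_key, page_data) VALUES('#{key}', '#{data}')"
end

# Hypothetical page fragment, modelled on the text quoted in the error message.
SAMPLE_BODY = "<script type='text/javascript'>var ue_t0=ue_t0||+new Date();</script>"

data = Marshal.dump(SAMPLE_BODY)
sql  = build_insert_sql("dummy_key", data)

# A well-formed statement would contain exactly four single quotes
# (one pair per value). Every extra quote comes from the page itself
# and closes the page_data literal too early.
puts sql.count("'")  # prints 6
```

If this reading is right, any page whose HTML contains a single quote would fail in the same way, regardless of which site it comes from.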
anemone-mysql.rb
# -*- coding: utf-8 -*-
require 'anemone'
require 'nokogiri'
require 'kconv'

urls = []
urls.push("https://www.amazon.co.jp/")

opts = {
  :storage => Anemone::Storage::MySQL(),
  :depth_limit => 0
}

Anemone.crawl(urls, opts) do |anemone|
  anemone.on_every_page do |page|
    #
    # Convert the character encoding to UTF-8, then parse with Nokogiri
    #
    doc = Nokogiri::HTML.parse(page.body.toutf8)
    puts page.url
    puts page.doc.xpath("//title/text()").to_s if page.doc
  end
end
mysql.rb
# coding: utf-8
begin
  require 'mysql2'
rescue LoadError
  puts "You need the mysql2 gem to use Anemone::Storage::MySQL"
  exit
end

module Anemone
  module Storage
    class MySQL

      # Initialization
      def initialize(opts = {})
        host = opts[:host] || 'localhost'
        username = opts[:username] || 'crawler'
        password = opts[:password] || 'anemone_pass'
        database = opts[:database] || 'anemone'
        @db = Mysql2::Client.new(:host => host, :username => username, :password => password, :database => database)
        create_schema
      end

      # Fetch data
      def [](url)
        value = @db.query("SELECT data FROM anemone_storage WHERE page_key = '#{get_hash_value(url)}'").first['data']
        if value
          Marshal.load(value)
        end
      end

      # Update or insert data
      def []=(url, value)
        key = get_hash_value(url)
        data = Marshal.dump(value)
        if has_key?(url)
          @db.query("UPDATE anemone_storage SET page_data = '#{data}' WHERE page_key = '#{key}'")
        else
          @db.query("INSERT INTO anemone_storage (page_key, page_data) VALUES('#{key}', '#{data}')")
        end
      end

      # Delete data
      def delete(url)
        page = self[url]
        @db.query("DELETE FROM anemone_storage WHERE page_key = '#{get_hash_value(url)}' ")
        page
      end

      # Iterate over all data
      def each
        @db.execute("SELECT page_key, page_data FROM anemone_storage ORDER BY id") do |row|
          value = Marshal.load(row[1])
          yield row[0], value
        end
      end

      # Merge
      def merge!(hash)
        hash.each { |key, value| self[key] = value }
        self
      end

      # Number of stored records
      def size
        @db.query("SELECT COUNT(*) FROM anemone_storage")
      end

      # List of keys
      def keys
        @db.query("SELECT page_key FROM anemone_storage ORDER BY id").map{|t| t[0]}
      end

      # Check whether a key exists
      def has_key?(url)
        key = get_hash_value(url)
        result = @db.query("SELECT count(id) FROM anemone_storage WHERE page_key = '#{key}'")
        if result.first['count(id)'] > 0
          return true
        else
          return false
        end
      end

      # Close
      def close
        @db.close
      end

      def create_schema
        @db.query <<SQL
          create table if not exists anemone_storage (
            id INT(11) NOT NULL auto_increment,
            page_key varchar(255),
            page_data BLOB,
            PRIMARY KEY (id),
            key (page_key)
          ) DEFAULT CHARSET=utf8;
SQL
      end

      def load_page(hash)
        BINARY_FIELDS.each do |field|
          hash[field] = hash[field].to_s
        end
        Page.from_hash(hash)
      end

      def get_hash_value(key)
        Digest::SHA1.hexdigest(key)
      end
    end
  end
end
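One possible direction, not verified against the book's setup: `[]=` in mysql.rb places the output of `Marshal.dump` between single quotes as-is, so the marshalled bytes would need to be encoded or escaped before they reach the statement. The sketch below is only an assumption about how that could look. It writes the bytes as a MySQL hexadecimal literal (`X'...'`), so that only the characters 0-9 and a-f appear inside the quotes. `hex_literal`, `decode_hex_literal` and `build_safe_insert_sql` are hypothetical helpers and are not part of Anemone or mysql2. The mysql2 gem also provides `Mysql2::Client#escape` and, in newer versions, prepared statements; this sketch avoids both only so that it runs without a database.

```ruby
require 'digest/sha1'

# Encodes arbitrary bytes as a MySQL hexadecimal literal such as X'0408...'.
# (Hypothetical helper, for illustration only.)
def hex_literal(bytes)
  "X'#{bytes.unpack('H*').first}'"
end

# Reverses hex_literal in plain Ruby, to check that nothing is lost.
def decode_hex_literal(literal)
  [literal[2..-2]].pack('H*')
end

# Hypothetical replacement for the INSERT string built in []=.
# page_key is a SHA1 hex digest, so it contains no quotes by construction.
def build_safe_insert_sql(url, value)
  key  = Digest::SHA1.hexdigest(url)
  data = Marshal.dump(value)
  "INSERT INTO anemone_storage (page_key, page_data) VALUES('#{key}', #{hex_literal(data)})"
end

# Assumed sample value; the real value is whatever Anemone passes to []=.
page_hash = { "body" => "<script type='text/javascript'>it's</script>" }
sql = build_safe_insert_sql("https://www.amazon.co.jp/", page_hash)

# Only the four delimiting quotes remain, whatever the page contains.
puts sql.count("'")  # prints 4
```

The UPDATE branch of `[]=` interpolates the same data and would presumably need the same treatment.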
2017/03/26 14:24