Hello! I'm trying out web scraping in Python, but `from bs3 import BeautifulSoup3` fails with `ImportError: No module named bs3`, and I'm stuck.
How can I resolve this error? Thank you in advance m(_)m

What I tried to fix it (without success)

At first `from bs4 import BeautifulSoup` raised an error, so I ran `$ python2.7 -m pip list`, which showed `BeautifulSoup (3.2.1)`. That gave me the idea of adding a 3 to both `bs` and `BeautifulSoup`, but that doesn't work either.

Development environment

Cloud9
Python 2.7
BeautifulSoup 3.2.1

Full output of `$ python2.7 -m pip list`

argparse (1.2.1)
BeautifulSoup (3.2.1)
bzr (2.7.0dev1)
BzrTools (2.6.0)
chardet (2.0.1)
colorama (0.2.5)
configobj (4.7.2)
decorator (3.4.0)
duplicity (0.6.23)
html5lib (0.999)
httplib2 (0.8)
ipython (1.2.1)
keyring (3.5)
launchpadlib (1.10.2)
lazr.restfulclient (0.13.3)
lazr.uri (1.0.3)
lockfile (0.8)
lpthw.web (1.1)
matplotlib (1.3.1)
mercurial (2.8.2)
numpy (1.8.2)
oauth (1.0.1)
paramiko (1.10.1)
pexpect (3.1)
Pillow (2.3.0)
pip (1.5.4)
pycrypto (2.6.1)
pygobject (3.12.0)
pygpgme (0.3)
pyparsing (2.0.1)
python-apt (0.9.3.5ubuntu2)
python-dateutil (1.5)
pytz (2012c)
requests (2.2.1)
scipy (0.13.3)
SecretStorage (2.0.0)
setuptools (3.3)
simplegeneric (0.8.1)
simplejson (3.3.1)
six (1.5.2)
stevedore (0.14.1)
urllib3 (1.7.1)
virtualenv (1.11.4)
virtualenv-clone (0.2.4)
virtualenvwrapper (4.1.1)
wadllib (1.3.2)
wheel (0.24.0)
wsgiref (0.1.2)
zope.interface (4.0.5)

Full code that raises the error

I'm trying to reproduce the code from the chapter "Scraping Google News" in "The Self-Taught Programmer" (独学プログラマー) by Cory Althoff, using Python 2.7 on Cloud9.

# -*-coding:utf-8-*-
# File name: python_ex289_ex293.py
# Reference code: https://github.com/calthoff/self_taught/blob/master/python_ex293.py/
# The reference code uses Python 3, pip3, and beautifulsoup4
import urllib3
from bs3 import BeautifulSoup3
class Scraper:
    def __init__(self, site):  # __init__ takes the URL to scrape.
        self.site = site
        
    def scrape(self):
        r = urllib3.urlopen(self.site)
        html = r.read()
        parser = "html.parser"
        sp = BeautifulSoup3(html, parser)
        
        for tag in sp.find_all("a"):
            url = tag.get("href")
            if url is None:
                continue
            if "html" in url:
                print("\n" + url)
                
news = "https://news.google.com"
Scraper(news).scrape()
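
A side note on `urllib3.urlopen` in the code above, separate from the import error: in Python 3 `urlopen` lives in `urllib.request`, and in Python 2.7 it lives in `urllib2`. `urllib3` is an unrelated third-party package and, as far as I know, does not offer a module-level `urlopen`. A minimal sketch of an import that works on either version (it only looks the function up and sends no request):

```python
# Pick urlopen from whichever standard-library module this interpreter has.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7

# Show where the function came from; no network request is made here.
print(urlopen.__module__)
```

With this import, `urlopen(self.site)` would replace `urllib3.urlopen(self.site)` in `scrape`.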

Adopted option 1

Following umyu's answer, I adopted option 1 and changed `from bs3 import BeautifulSoup3` to `from BeautifulSoup import BeautifulSoup`. Thanks to that, the `ImportError: No module named bs3` error is gone (≧∀≦)

There were various other errors, and I fixed the ones I could (probably?), but I can't resolve `TypeError: 'unicode' object is not callable` orz

Current code

python:web.py
# -*-coding:utf-8-*-
# https://github.com/calthoff/self_taught/blob/master/python_ex289.py/
from BeautifulSoup import BeautifulSoup
import requests
import urllib3
from HTMLParser import HTMLParser
class Scraper:
    def __init__(self, site):  # __init__ takes the URL to scrape.
        self.site = site

    def scrape(self):
        r = requests.get(self.site)
        html = r.text()
        parser = HTMLParser
        sp = BeautifulSoup(html, parser)

        for tag in sp.find_all("a"):
            url = tag.get("href")
            if url is None:
                continue
            if "html" in url:
                print("\n" + url)

news = "https://news.google.com"
Scraper(news).scrape()

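## Summary of fixes made in response to errors

`urlopen` didn't seem to be usable, so I switched to `requests.get`.
`parser = "html.parser"` didn't seem to work in Python 2, so I changed it to `parser = HTMLParser`.
I got `AttributeError: 'Response' object has no attribute 'read'`, so I changed `html = r.read()` to `html = r.text()`.

But `TypeError: 'unicode' object is not callable` made me raise the white flag (´；ω；｀)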

Traceback (most recent call last):
  File "python_ex289_ex293.py", line 25, in <module>
    Scraper(news).scrape()
  File "python_ex289_ex293.py", line 13, in scrape
    html = r.text()
TypeError: 'unicode' object is not callable

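Finished after getting advice once more

I got it working after umyu even taught me how to read a stack trace (*≧∀≦)
I removed the parentheses from `html = r.text()`, but then another familiar-looking error appeared (an AttributeError, I think?) and I felt I was going around in circles ⌒((:з)⌒((ε:)⌒((:3, so I lost heart, switched to option 2 (use beautifulsoup4), and went back to `html = r.read()` orz

I had been scraping https://news.google.com, but none of its link URLs contained "html", so I switched the target to the BeautifulSoup4 documentation.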

python
# File name: python_ex289_ex293.py
# -*-coding:utf-8-*-
# Reference: https://github.com/calthoff/self_taught/blob/master/python_ex289.py/
import urllib2
from bs4 import BeautifulSoup

class Scraper:
    def __init__(self, site):
        self.site = site

    def scrape(self):
        r = urllib2.urlopen(self.site)  # Calling urlopen returns a response object.
        html = r.read()
        parser = "html.parser"
        sp = BeautifulSoup(html, parser)
        for tag in sp.find_all("a"):
            url = tag.get("href")
            if url is None:
                continue
            if "html" in url:
                print("\n" + url)

news = "https://www.crummy.com/software/BeautifulSoup/bs4/doc/"
Scraper(news).scrape()
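
The link filter in `scrape` can be tried without network access or a BeautifulSoup install. The sketch below is not the book's code: it swaps BeautifulSoup for the standard-library `HTMLParser`, and `LinkCollector` and `SAMPLE_HTML` are made-up names for illustration. It keeps only `href` values that contain "html", which is why a page whose links never contain "html" prints nothing.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags that contain 'html'."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Same two checks as the loop in scrape(): skip missing hrefs,
        # then keep only URLs that contain the substring "html".
        if href is not None and "html" in href:
            self.urls.append(href)


# Made-up sample page: one matching link, one non-matching link, one <a> without href.
SAMPLE_HTML = '<a href="index.html">Docs</a><a href="/news">News</a><a name="top">Top</a>'

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.urls)  # ['index.html']
```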


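1 answer

Best answer

BeautifulSoup (3.2.1) came up, so I thought "let's add a 3 to bs and BeautifulSoup"

In a situation like this, look for the official documentation. Searching for the keywords "BeautifulSoup doc 3.2.1" brings up the BeautifulSoup page.

Since your BeautifulSoup version is 3.2.1, did you perhaps install it by typing `pip install BeautifulSoup`?

There are two options.
Option 1: Modify the program by following the BeautifulSoup 3.2.1 documentation.

Option 2: Uninstall BeautifulSoup (3.2.1), then install beautifulsoup4.
2-1. Uninstall BeautifulSoup with pip uninstall.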

Python
pip uninstall BeautifulSoup

2-2. Install beautifulsoup4.

pip install beautifulsoup4

Please give that a try.
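
Before choosing between the two options, it can help to check which package the interpreter can actually import. A minimal sketch, assuming only the two import names discussed here (`find_beautifulsoup` is a hypothetical helper, not part of either library):

```python
def find_beautifulsoup():
    """Return (module_name, BeautifulSoup class) for whichever version imports."""
    try:
        from bs4 import BeautifulSoup  # installed by: pip install beautifulsoup4
        return "bs4", BeautifulSoup
    except ImportError:
        pass
    try:
        from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x (Python 2 only)
        return "BeautifulSoup", BeautifulSoup
    except ImportError:
        return None, None


module_name, soup_class = find_beautifulsoup()
print(module_name)  # "bs4", "BeautifulSoup", or None if neither is installed
```

If this prints `BeautifulSoup`, option 1 applies as-is; if it prints `bs4`, the book's `from bs4 import BeautifulSoup` line should work unchanged.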

◇ How to read a stack trace

Python
  File "python_ex289_ex293.py", line 13, in scrape
    html = r.text() # the line that was executing when the error occurred
TypeError: 'unicode' object is not callable # the runtime error

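First, run the error message through Google Translate.
TypeError: 'unicode' object is not callable (that is, a unicode string was called as if it were a function)

Next, narrow down the cause of the error.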

Python
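html = r.text()

1. This is an assignment statement, so the name html on the left-hand side cannot be the cause.

Python
r.text()

2. Next, if r were None, the error would mention NoneType instead, so that is not it either.

So the cause is the following.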

Python
text()

Check the official documentation to see whether this guess is right.

Python
r.text

There it is written without parentheses. So remove the parentheses, like this:

Python
html = r.text
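
The same error can be reproduced without `requests`. In the sketch below, `FakeResponse` is a made-up stand-in (not the real `requests.Response` class) whose `text` is a property returning a string, so `r.text()` first evaluates `r.text` to a string and then tries to call that string:

```python
class FakeResponse(object):
    """Hypothetical stand-in for a response object; not the requests API."""

    @property
    def text(self):
        # A property is read without parentheses and returns a plain string.
        return u"<html><body>hello</body></html>"


r = FakeResponse()

html = r.text  # correct: plain attribute access
print(html)

try:
    r.text()  # wrong: calls the string that r.text returned
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

On Python 2.7 the message names `unicode`; on Python 3 it names `str`, but the cause is the same.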

Posted 2018/08/08 15:00

Edited 2018/08/09 02:18

umyu

Total score 5846

Yukiya025

2018/08/09 01:09

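Hello umyu :D Thank you (≧∀≦) I adopted option 1, and various other errors came up. I fixed the ones I could (?), but I'm stuck again on `TypeError: 'unicode' object is not callable` ヽ(_ _ヽ)彡 Could you take a look at the question body, from the heading about adopting option 1 onward? (>_<)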