lxml.etree.ParserError: Unicode parsing is not supported on this platform

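###Background / What I want to achieve
Following this video, I want to write a function that visits a given site and displays how often each word appears in the heading links.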

###Problem / Error message

Traceback (most recent call last):
  File "word_frequency_counter.py", line 18, in <module>
    start("https://www.thenewboston.com/forum/")
  File "word_frequency_counter.py", line 9, in start
    soup = BeautifulSoup(source_code)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/bs4/__init__.py", line 168, in __init__
    self._feed()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/bs4/__init__.py", line 181, in _feed
    self.builder.feed(self.markup)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/bs4/builder/_lxml.py", line 61, in feed
    self.parser.feed(markup)
  File "parser.pxi", line 1201, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:102246)
  File "parser.pxi", line 1236, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:101171)
lxml.etree.ParserError: Unicode parsing is not supported on this platform

I don't fully understand the error. Judging from lxml.etree.ParserError: Unicode parsing is not supported on this platform, is this simply a compatibility problem between Beautiful Soup and Python 3? Or is there some other cause?

###Source code
This is exactly as taught in the video.

word_frequency_counter.py
import requests
from bs4 import BeautifulSoup
import operator


def start(url):
	word_list = []
	source_code = requests.get(url).text #gonna connect to the link and use it as plain text
	soup = BeautifulSoup(source_code)
	for post_text in soup.findAll('a', {'class': 'title text-semibold'}): #go through all the contents
		content = post_text.string #.string = only get the texts thats inside "soup"
		words = cotent.lower().split()
		for each_word in words:
			print(each_word)
			word_list.append(each_word)
		

start("https://www.thenewboston.com/forum/")

###Additional information (language / framework / tool versions, etc.)

Python 3.4
Beautiful Soup 4

1 answer

Self-resolved

After fixing a few spelling mistakes and changing soup = BeautifulSoup(source_code) to soup = BeautifulSoup(source_code, 'html.parser'), it worked properly.

import requests
from bs4 import BeautifulSoup
import operator


def start(url):
	word_list = []
	source_code = requests.get(url).text #gonna connect to the link and use it as plain text
	soup = BeautifulSoup(source_code, 'html.parser')
	for post_text in soup.findAll('a', {'class': 'title text-semibold'}): #go through all the contents
		content = post_text.string #.string = only get the texts thats inside "soup"
		words = content.lower().split()
		for each_word in words:
			print(each_word)
			word_list.append(each_word)
		

start("https://www.thenewboston.com/forum/")
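
As a side note, here is a minimal offline sketch of the same fix, in case it helps to test the parsing without hitting the site. When no parser is named, Beautiful Soup picks the "best" one installed, which is how lxml ended up in the traceback; passing 'html.parser' selects Python's built-in parser instead. SAMPLE_HTML and extract_words are hypothetical names made up for this illustration, and the markup is not the real forum page. It also uses get_text() rather than .string, on the assumption that some links may contain nested tags (in which case .string would be None).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the forum page (not the real site).
SAMPLE_HTML = """
<a class="title text-semibold" href="/t/1">Dictionary print order</a>
<a class="title text-semibold" href="/t/2">Arduino <b>code</b></a>
<a class="other" href="/t/3">ignored link</a>
"""


def extract_words(html):
    """Return the lowercased words from every matching heading link."""
    # Naming the parser explicitly keeps Beautiful Soup from choosing lxml.
    soup = BeautifulSoup(html, "html.parser")
    words = []
    for link in soup.find_all("a", {"class": "title text-semibold"}):
        # get_text() joins the text of nested tags as well,
        # whereas .string is None when a tag has several children.
        words.extend(link.get_text().lower().split())
    return words


print(extract_words(SAMPLE_HTML))
```

The third link is skipped because its class attribute does not match.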

The corrected script printed results like the following:

dictionary
print
order
permanent
display
of
content
rendering
problems
whenever
i
start
the
android
studio
two
beginner
python
courses?
vector
about
double
buffering
arduino
code
asterisk
before
a
pointer
can
you
provide
me
the
arduino
code
for
eye
blinking
sensor(ir
sensor)
for
accidental
prevention.
can't
import
images
in
android
studio
can't
install
intel
haxm
free
internet
javascript
interpreter
lambda
function
my
funny
litlte
program
navigation
drawer
activity
not
able
to
find
the
problem
need
help
org.apache.http.client.httpclient
deprecated
question
about
themes
someone
share
a
link
to
source
codes??
source
code
?
which
all
views
should
be
turned
on?
x86
emulation
error
error
when
trying
to
build
and
run.
computer
doesn't
support
virtualization.
web
development
using
html
java
game
about
getting
user
input
eclipse
doesn't
recognise
my
imports
other
ways
of
styling
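
The script so far only prints each word, while the stated goal is a frequency count. One possible way to get there is sketched below using collections.Counter from the standard library. This is an assumption on my part and not necessarily how the video does it; count_words is a hypothetical helper name, and the sample list is just a handful of words taken from the output above. Stripping punctuation is also my own choice, so that tokens such as "codes??" and "?" do not distort the counts.

```python
from collections import Counter
import string


def count_words(words):
    """Count how often each word occurs, ignoring surrounding punctuation."""
    cleaned = []
    for word in words:
        # Strip punctuation from both ends so "codes??" is counted as "codes".
        stripped = word.strip(string.punctuation)
        if stripped:  # skip tokens that were punctuation only, such as "?"
            cleaned.append(stripped)
    return Counter(cleaned)


# A few words taken from the printed output (illustrative sample only).
sample = ["arduino", "code", "can't", "arduino", "code",
          "codes??", "?", "the", "the", "the"]

# most_common(n) returns the n most frequent (word, count) pairs.
for word, count in count_words(sample).most_common(3):
    print(word, count)
```

In the real script, word_list from start() could be passed to count_words in place of the sample list.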

Posted 2016/01/04 04:04

hiro_weedslayer
