Edit history

Question edit history

Switched to proof-of-concept code

2017/10/30 22:24

Posted

Withdrawn user

Score 0

@@ -38,9 +38,19 @@
+---
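+**2017-10-31 Code revision**
+Following suyama's advice, I got as far as crawling the pages and collecting the titles. From here, adding a GUI with wxPython or something similar should be all that is left, so I will stop at this point. I am posting the code for reference, for anyone trying to search their clips with Python.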
 ```Python
-# coding: UTF-8
+# coding: UTF-8
 import os
@@ -49,6 +59,8 @@
 import json
 import ssl
+import sys
@@ -66,7 +78,7 @@
-URL = "https://teratail.com/api/v1/users/" + USER_NAME + "/clips"
+URL = "https://teratail.com/api/v1/users/" + USER_NAME + "/clips?limit=100&page="
@@ -74,31 +86,35 @@
-def downloader():
+def downloader(page):
-    # Get clip data from teratail
+    try:
-    with urllib.request.urlopen(URL) as response:
+        # Get clip data from teratail
-       html = response.read()
+        with urllib.request.urlopen(URL + str(page)) as response:
+            html = response.read()
-    text = html.decode('utf-8')
+            text = html.decode('utf-8')
-    # Load clip data as json file
+        # Load clip data as json file
-    json_dat = json.loads(text)
+        json_dat = json.loads(text)
-    # print(json.dumps(json_dat, sort_keys = True, indent = 4)
+        # print(json.dumps(json_dat, sort_keys = True, indent = 4)
-    # return json_dat["meta"]
+        # Return clip data as dict type
+        return json_dat["meta"], json_dat["questions"]
-    # Return clip data as dict type
+    except Exception:
-    return json_dat["questions"]
+        sys.exit("The process was terminated. You may have received a 403 error caused by excessive access [30 requests / hour per IP address].")
@@ -108,12 +124,32 @@
     for dct in lst_questions:
-        print( dct["created"] + dct["title"])
+        # print( dct["created"] + " " + dct["title"])
+        print(dct["title"])
 if __name__ == "__main__":
+    # Initialize page index
+    page = 1
+    page_max = 1
+    # Crawl each page
+    while page <= page_max:
+        meta, title = downloader(page)
+        page_max = meta["total_page"]
-    crop_titles(downloader())
+        crop_titles(title)
+        page = page + 1
 ```
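
The paging loop in the revision above can be tried without touching the API. What follows is only a minimal sketch: `crawl_titles` and `fake_fetch_page` are hypothetical names that do not appear in the post, and the sample titles are made up. The one thing taken from the post is the shape of the data, namely a `meta` dict with a `total_page` entry and a list of questions that each have a `title`.

```python
def crawl_titles(fetch_page):
    """Collect every title by following the page count reported in "meta".

    fetch_page is a hypothetical callable: fetch_page(page) -> (meta, questions),
    the same pair that downloader(page) returns in the post.
    """
    titles = []
    page = 1
    page_max = 1  # unknown until the first response arrives
    while page <= page_max:
        meta, questions = fetch_page(page)
        page_max = meta["total_page"]
        titles.extend(question["title"] for question in questions)
        page += 1
    return titles


def fake_fetch_page(page):
    """Stand-in for downloader(page); the data below is made up."""
    pages = {
        1: ["first clip", "second clip"],
        2: ["third clip"],
    }
    meta = {"total_page": len(pages)}
    questions = [{"title": title} for title in pages[page]]
    return meta, questions


if __name__ == "__main__":
    for title in crawl_titles(fake_fetch_page):
        print(title)
```

Passing the fetcher in as an argument keeps the loop testable offline, which matters here because every real request counts against the hourly limit mentioned in the error message.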

Typo fix

2017/10/30 22:24

Posted

Withdrawn user

Score 0

@@ -28,7 +28,7 @@
 - **1)**
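-When I opened the [URL](https://teratail.com/api/v1/users/slash/) directly in Chrome during testing there was no problem, but running it as-is from Python was rejected with an SSL certificate verification error. So I added the 'ssl._create_default_https_context = ssl._create_unverified_context' line. Does this mean that **teratail's SSL configuration is ？rong？**…
+When I opened the [URL](https://teratail.com/api/v1/users/slash/) directly in Chrome during testing there was no problem, but when I ran it as-is from Python, **the teratail API request was rejected with an SSL certificate verification error**. So I added the 'ssl._create_default_https_context = ssl._create_unverified_context' line. Does this mean that teratail's SSL configuration is wrong? …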
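
As an aside on the workaround mentioned above: replacing `ssl._create_default_https_context` switches certificate verification off for every HTTPS request in the process. A narrower option, sketched below, is to pass an explicit context to `urllib.request.urlopen`. This is a sketch only; whether it resolves the error in the post is not known. One common cause of `CERTIFICATE_VERIFY_FAILED` is a Python installation that cannot find a CA bundle, in which case the `cafile` argument can point at one. `make_verifying_context` and `fetch` are hypothetical helper names.

```python
import ssl
import urllib.request


def make_verifying_context(cafile=None):
    """Build an SSL context that keeps certificate verification switched on.

    cafile is optional; when None, the system's default CA certificates are used.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, context):
    """Fetch a URL with the given SSL context and return the body as text."""
    with urllib.request.urlopen(url, context=context) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    context = make_verifying_context()
    # Verification stays enabled, unlike with ssl._create_unverified_context.
    print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

If verification still fails with a context like this, the problem is more likely on the client side (missing or outdated CA certificates) than in the site's configuration, since the browser accepts the same certificate.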

Addendum

2017/10/29 22:27

Posted

Withdrawn user

Score 0

@@ -4,7 +4,9 @@
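+Following suyama's advice below, I got as far as fetching data with the [teratail API](http://docs.teratailv1.apiary.io/#reference/(user)/3/0?console=1), but the results are **limited to 20 items**.
-However, teratail does not offer clip search, and in the browser the only way to search seems to be clicking "Show more" over and over.
+(Information along the lines of "returned 20 of N items" appears to be held in the "meta" field of the JSON response.)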
@@ -22,15 +24,17 @@
 ---
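-**2017-10-10-29 14:20 Addendum**
+**Test code in Python 3 for the API provided by teratail**
-For the time being, I have decided to do this in Python.
+- **1)**
+When I opened the [URL](https://teratail.com/api/v1/users/slash/) directly in Chrome during testing there was no problem, but running it as-is from Python was rejected with an SSL certificate verification error. So I added the 'ssl._create_default_https_context = ssl._create_unverified_context' line. Does this mean that **teratail's SSL configuration is ？rong？**…
-When I opened the [URL](https://teratail.com/api/v1/users/slash/) directly in Chrome during testing there was no problem, but running it as-is from Python was rejected with an SSL certificate verification error. So I added the 'ssl._create_default_https_context = ssl._create_unverified_context' line. Does this mean that teratail's SSL is wrong **(?)**…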
+- **2)**
+**If you send too many requests, the API answers with `403`**. If anyone tries this, please be careful not to overdo it.
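
To make the `403` case in point 2) visible instead of letting it surface as a generic failure, the request can catch `urllib.error.HTTPError` and check its `code`. The sketch below assumes nothing about the API beyond the status code mentioned in the post; `fetch_text`, the `opener` parameter, and the fake openers are hypothetical and exist only so the example runs without network access.

```python
import io
import urllib.error
import urllib.request


def fetch_text(url, opener=urllib.request.urlopen):
    """Return the response body as text, reporting a 403 explicitly.

    opener is a hypothetical hook so the function can be exercised offline.
    """
    try:
        with opener(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise RuntimeError("403 Forbidden: probably too many requests") from err
        raise  # any other HTTP error is passed through unchanged


def fake_ok(url):
    """Fake opener that returns a small JSON body (made-up data)."""
    return io.BytesIO(b'{"questions": []}')


def fake_forbidden(url):
    """Fake opener that behaves like a rate-limited API."""
    raise urllib.error.HTTPError(url, 403, "Forbidden", {}, None)


if __name__ == "__main__":
    print(fetch_text("https://example.invalid/", opener=fake_ok))
    try:
        fetch_text("https://example.invalid/", opener=fake_forbidden)
    except RuntimeError as err:
        print(err)
```

Catching only `HTTPError` leaves other problems, such as a malformed JSON body or a typo in the code, free to show their own traceback.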
@@ -48,6 +52,12 @@
+USER_NAME = "slash"
+# -------------------
 # To avoid an error
 # ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)
@@ -56,28 +66,54 @@
-USER_NAME = "slash"
+URL = "https://teratail.com/api/v1/users/" + USER_NAME + "/clips"
-URL = "https://teratail.com/api/v1/users/" + USER_NAME + "/"
+# -------------------
 def downloader():
+    # Get clip data from teratail
     with urllib.request.urlopen(URL) as response:
        html = response.read()
+    text = html.decode('utf-8')
-    #json_dat = json.loads(text.read)
+    # Load clip data as json file
+    json_dat = json.loads(text)
+    # print(json.dumps(json_dat, sort_keys = True, indent = 4)
+    # return json_dat["meta"]
+    # Return clip data as dict type
+    return json_dat["questions"]
+def crop_titles(lst_questions):
+    # Crop dictionary from list
-    print(html)
+    for dct in lst_questions:
+        print( dct["created"] + dct["title"])
 if __name__ == "__main__":
-    downloader()
+    crop_titles(downloader())
 ```

Status report

2017/10/29 22:22

Posted

Withdrawn user

Score 0

@@ -17,3 +17,67 @@
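 ***What I looked into**
 A while ago there was a post titled [teratail の過去遺産をもっと活用したい](https://teratail.com/questions/92505) ("I want to make better use of teratail's past legacy"), and I read through the API, but there did not seem to be a "clip search".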
+---
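+**2017-10-10-29 14:20 Addendum**
+For the time being, I have decided to do this in Python.
+When I opened the [URL](https://teratail.com/api/v1/users/slash/) directly in Chrome during testing there was no problem, but running it as-is from Python was rejected with an SSL certificate verification error. So I added the 'ssl._create_default_https_context = ssl._create_unverified_context' line. Does this mean that teratail's SSL is wrong **(?)**…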
+```Python
+# coding: UTF-8
+import os
+import urllib.request
+import json
+import ssl
+# To avoid an error
+# ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)
+ssl._create_default_https_context = ssl._create_unverified_context
+USER_NAME = "slash"
+URL = "https://teratail.com/api/v1/users/" + USER_NAME + "/"
+def downloader():
+    with urllib.request.urlopen(URL) as response:
+       html = response.read()
+    #json_dat = json.loads(text.read)
+    print(html)
+if __name__ == "__main__":
+    downloader()
+```