Question: PDF data cannot be read

Question edit history

1

Added the output obtained when reading the PDF with different source code

Posted 2023/07/21 08:11


title: no changes

body: changed

@@ -87,5 +87,49 @@
     df.to_csv(f)
 ```
+### What I tried
+When I read the PDF with different source code, the following was output:
+```
+'\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c'
+```
+The source code I tried:
+```python
+from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
+from pdfminer.converter import TextConverter
+from pdfminer.layout import LAParams
+from pdfminer.pdfpage import PDFPage
+from io import StringIO
+from glob import glob
+def convert_pdf_to_txt(path):  # path: path to the PDF file to read
+    rsrcmgr = PDFResourceManager()
+    retstr = StringIO()
+    codec = 'utf-8'
+    laparams = LAParams()
+    laparams.detect_vertical = True  # also consider vertical text during layout analysis; setting this to True tends to give cleaner extraction
+    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
+    fp = open(path, 'rb')
+    interpreter = PDFPageInterpreter(rsrcmgr, device)
+    maxpages = 0
+    caching = True
+    pagenos=set()
+    fstr = ''
+    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages,caching=caching, check_extractable=True):
+        interpreter.process_page(page)
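+        text = retstr.getvalue()  # getvalue() returns everything written so far, not just this page
+        fstr += text  # so text from earlier pages is appended again on every iteration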
+    fp.close()
+    device.close()
+    retstr.close()
+    return fstr
+inpname = 'kanban_apc_print.pdf'
+#inpname = 'pdf_test1.pdf'
+convert_pdf_to_txt(inpname)
+```
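One possible reading of the output above, offered as a sketch rather than a diagnosis: as far as I know, pdfminer's `TextConverter` writes a form feed (`\x0c`) after each page, so a result made up only of `\x0c` suggests that no text was extracted from any page. That would fit a PDF without a text layer (for example scanned images), but this is an assumption about the file. Separately, the loop in the posted code appends `retstr.getvalue()` on every page, and `getvalue()` returns the whole buffer, so earlier pages are repeated. The snippet below reproduces both effects with `io.StringIO` only. The function names and the three empty "pages" are hypothetical stand-ins, and pdfminer itself is not called.

```python
from io import StringIO

FORM_FEED = "\x0c"  # page separator written after each page (assumed pdfminer behavior)


def simulate_cumulative_append(page_texts):
    """Mimic the posted loop: append the whole buffer after every page."""
    buffer = StringIO()
    result = ""
    for page_text in page_texts:
        # Stand-in for interpreter.process_page(page)
        buffer.write(page_text + FORM_FEED)
        # getvalue() returns everything written so far, so pages repeat
        result += buffer.getvalue()
    return result


def simulate_single_read(page_texts):
    """Read the buffer once, after all pages have been processed."""
    buffer = StringIO()
    for page_text in page_texts:
        buffer.write(page_text + FORM_FEED)
    return buffer.getvalue()


def has_extractable_text(extracted):
    """Return True if anything other than whitespace (incl. form feeds) was extracted."""
    return bool(extracted.strip())


pages = ["", "", ""]  # hypothetical: three pages with no text layer
print(repr(simulate_cumulative_append(pages)))  # more form feeds than pages
print(repr(simulate_single_read(pages)))        # one form feed per page
print(has_extractable_text(simulate_single_read(pages)))
```

If `has_extractable_text` returns `False` for the real output, the PDF most likely needs OCR or another tool; changing the loop alone would only reduce the number of form feeds.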
 ### Supplementary information (FW/tool versions, etc.)