Question edit history

5

Added the error message

2020/08/13 12:29

Posted

trave

Score 7

test CHANGED
File without changes
test CHANGED
@@ -2,19 +2,7 @@
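 
 
 
- I am trying to do the processing with code like the following, but two problems have come up and I would like to know how to solve them.
+ I am trying to do the processing with code like the following, but for some PDF files the error below occurs, and the file that raised the error is corrupted (it becomes 0 KB and can no longer be read).
-
-
-
- Problem 1. When a PDF file has not been saved (for example, when test2.pdf is not in the folder), I want that part of the processing to be skipped instead of raising an error.
-
-
-
- Problem 2. The error `return NameObject(name.decode('utf-8'))` occurs (perhaps the stored PDF files are a mix of UTF-8 and non-UTF-8 files?).
-
- ⇒ Some files do not raise the error (even though their security settings are the same).
-
-
 
  ![Image description](c66d09cb51d332c692eb5f76828bf19d.png)
 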
@@ -63,3 +51,91 @@
  dst_pdf.write(f)
 
  ```
+
+
+
+ Error message
+
+ PdfReadError Traceback (most recent call last)
+
+ <ipython-input-22-2eb6e479cf31> in <module>
+
+ 22
+
+ 23 with open('Desktop\テスト/'+path,'wb') as f:
+
+ ---> 24 dst_pdf.write(f)
+
+
+
+ ~\Anaconda3\lib\site-packages\PyPDF2\pdf.py in write(self, stream)
+
+ 480 self.stack = []
+
+ 481 if debug: print(("ERM:", externalReferenceMap, "root:", self._root))
+
+ --> 482 self._sweepIndirectReferences(externalReferenceMap, self._root)
+
+ 483 del self.stack
+
+ 484
+
+
+
+ ~\Anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 569 self.stack.append(data.idnum)
+
+ 570 realdata = self.getObject(data)
+
+ --> 571 self._sweepIndirectReferences(externMap, realdata)
+
+ 572 return data
+
+ 573 else:
+
+
+
+ ~\Anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 545 for key, value in list(data.items()):
+
+ 546 origvalue = value
+
+ --> 547 value = self._sweepIndirectReferences(externMap, value)
+
+ 548 if isinstance(value, StreamObject):
+
+ 549 # a dictionary value is a stream. streams must be indirect
+
+
+
+ ~\Anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 575 if newobj == None:
+
+ 576 try:
+
+ --> 577 newobj = data.pdf.getObject(data)
+
+ 578 self._objects.append(None) # placeholder
+
+ 579 idnum = len(self._objects)
+
+
+
+ ~\Anaconda3\lib\site-packages\PyPDF2\pdf.py in getObject(self, indirectReference)
+
+ 1629 indirectReference.generation), utils.PdfReadWarning)
+
+ 1630 #if self.strict:
+
+ -> 1631 raise utils.PdfReadError("Could not find object.")
+
+ 1632 self.cacheIndirectObject(indirectReference.generation,
+
+ 1633 indirectReference.idnum, retval)
+
+
+
+ PdfReadError: Could not find object.
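
A note on the 0 KB files described in this revision: opening the destination with `'wb'` truncates it before `dst_pdf.write(f)` runs, so when the write raises, an empty file is left behind. The traceback also shows objects still being read from the source while `write` is running, so if the destination is the same file that was read (an assumption, since the full script is not shown here), the truncation itself could be what leads to `Could not find object`. One possible way around this, sketched below with only the standard library, is to let the writer fill an in-memory buffer first and touch the disk only after it succeeds. The helper name `write_atomically` and its `write_fn` parameter are made up for illustration; `write_fn` stands in for something like `dst_pdf.write`.

```python
import io
import os
import tempfile


def write_atomically(target_path, write_fn):
    """Call write_fn(stream) on an in-memory buffer, then replace target_path.

    Hypothetical helper: if write_fn raises, target_path is left untouched.
    """
    buffer = io.BytesIO()
    write_fn(buffer)  # e.g. dst_pdf.write; any exception propagates from here

    # Only reached when write_fn succeeded. Write to a temporary file in the
    # same directory and then swap it into place.
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(buffer.getvalue())
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong on disk.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In the question's loop this might be called as `write_atomically(out_path, dst_pdf.write)` inside a `try`/`except` that records which file failed, where `out_path` is the path currently passed to `open`.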

4

Revised the note on Problem 2

2020/08/13 12:28

Posted

trave

Score 7

test CHANGED
File without changes
test CHANGED
@@ -12,7 +12,7 @@
 
  Problem 2. The error `return NameObject(name.decode('utf-8'))` occurs (perhaps the stored PDF files are a mix of UTF-8 and non-UTF-8 files?).
 
- ⇒ Solved. It was just that some security-protected files were mixed in.
+ ⇒ Some files do not raise the error (even though their security settings are the same).
 
 
 

3

Added a note that Problem 2 is solved

2020/08/11 07:12

Posted

trave

Score 7

test CHANGED
File without changes
test CHANGED
@@ -11,6 +11,10 @@
 
 
  Problem 2. The error `return NameObject(name.decode('utf-8'))` occurs (perhaps the stored PDF files are a mix of UTF-8 and non-UTF-8 files?).
+
+ ⇒ Solved. It was just that some security-protected files were mixed in.
+
+
 
  ![Image description](c66d09cb51d332c692eb5f76828bf19d.png)
 
@@ -59,263 +63,3 @@
  dst_pdf.write(f)
 
  ```
-
- The error message is as follows.
-
- UnicodeDecodeError Traceback (most recent call last)
-
- ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readFromStream(stream, pdf)
-
- 483 try:
-
- --> 484 return NameObject(name.decode('utf-8'))
-
- 485 except (UnicodeEncodeError, UnicodeDecodeError) as e:
-
-
-
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 8: invalid start byte
-
-
-
- During handling of the above exception, another exception occurred:
-
-
-
- PdfReadError Traceback (most recent call last)
-
- <ipython-input-95-1c3d6b0830f5> in <module>
-
- 18
-
- 19 with open('Desktop\テスト/'+path,'wb') as f:
-
- ---> 20 dst_pdf.write(f)
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in write(self, stream)
-
- 480 self.stack = []
-
- 481 if debug: print(("ERM:", externalReferenceMap, "root:", self._root))
-
- --> 482 self._sweepIndirectReferences(externalReferenceMap, self._root)
-
- 483 del self.stack
-
- 484
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 569 self.stack.append(data.idnum)
-
- 570 realdata = self.getObject(data)
-
- --> 571 self._sweepIndirectReferences(externMap, realdata)
-
- 572 return data
-
- 573 else:
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 545 for key, value in list(data.items()):
-
- 546 origvalue = value
-
- --> 547 value = self._sweepIndirectReferences(externMap, value)
-
- 548 if isinstance(value, StreamObject):
-
- 549 # a dictionary value is a stream. streams must be indirect
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 584 externMap[data.pdf][data.generation] = {}
-
- 585 externMap[data.pdf][data.generation][data.idnum] = newobj_ido
-
- --> 586 newobj = self._sweepIndirectReferences(externMap, newobj)
-
- 587 self._objects[idnum-1] = newobj
-
- 588 return newobj_ido
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 545 for key, value in list(data.items()):
-
- 546 origvalue = value
-
- --> 547 value = self._sweepIndirectReferences(externMap, value)
-
- 548 if isinstance(value, StreamObject):
-
- 549 # a dictionary value is a stream. streams must be indirect
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 554 elif isinstance(data, ArrayObject):
-
- 555 for i in range(len(data)):
-
- --> 556 value = self._sweepIndirectReferences(externMap, data[i])
-
- 557 if isinstance(value, StreamObject):
-
- 558 # an array value is a stream. streams must be indirect
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 584 externMap[data.pdf][data.generation] = {}
-
- 585 externMap[data.pdf][data.generation][data.idnum] = newobj_ido
-
- --> 586 newobj = self._sweepIndirectReferences(externMap, newobj)
-
- 587 self._objects[idnum-1] = newobj
-
- 588 return newobj_ido
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 545 for key, value in list(data.items()):
-
- 546 origvalue = value
-
- --> 547 value = self._sweepIndirectReferences(externMap, value)
-
- 548 if isinstance(value, StreamObject):
-
- 549 # a dictionary value is a stream. streams must be indirect
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 545 for key, value in list(data.items()):
-
- 546 origvalue = value
-
- --> 547 value = self._sweepIndirectReferences(externMap, value)
-
- 548 if isinstance(value, StreamObject):
-
- 549 # a dictionary value is a stream. streams must be indirect
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 545 for key, value in list(data.items()):
-
- 546 origvalue = value
-
- --> 547 value = self._sweepIndirectReferences(externMap, value)
-
- 548 if isinstance(value, StreamObject):
-
- 549 # a dictionary value is a stream. streams must be indirect
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
-
- 575 if newobj == None:
-
- 576 try:
-
- --> 577 newobj = data.pdf.getObject(data)
-
- 578 self._objects.append(None) # placeholder
-
- 579 idnum = len(self._objects)
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in getObject(self, indirectReference)
-
- 1609 % (indirectReference.idnum, indirectReference.generation, idnum, generation))
-
- 1610 assert generation == indirectReference.generation
-
- -> 1611 retval = readObject(self.stream, self)
-
- 1612
-
- 1613 # override encryption is used for the /Encrypt dictionary
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readObject(stream, pdf)
-
- 64 stream.seek(-2, 1) # reset to start
-
- 65 if peek == b_('<<'):
-
- ---> 66 return DictionaryObject.readFromStream(stream, pdf)
-
- 67 else:
-
- 68 return readHexStringFromStream(stream)
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readFromStream(stream, pdf)
-
- 577 tok = readNonWhitespace(stream)
-
- 578 stream.seek(-1, 1)
-
- --> 579 value = readObject(stream, pdf)
-
- 580 if not data.get(key):
-
- 581 data[key] = value
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readObject(stream, pdf)
-
- 58 if idx == 0:
-
- 59 # name object
-
- ---> 60 return NameObject.readFromStream(stream, pdf)
-
- 61 elif idx == 1:
-
- 62 # hexadecimal string OR dictionary
-
-
-
- ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readFromStream(stream, pdf)
-
- 490 return NameObject(name)
-
- 491 else:
-
- --> 492 raise utils.PdfReadError("Illegal character in Name Object")
-
- 493
-
- 494 readFromStream = staticmethod(readFromStream)
-
-
-
- PdfReadError: Illegal character in Name Object
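
This revision attributed Problem 2 to security-protected files being mixed in (the next revision walks that back). For anyone who wants to sort the files before processing them, a rough standard-library sketch is to look for the `/Encrypt` key that encrypted PDFs carry in their trailer. This is only a heuristic, and `looks_encrypted` is a hypothetical helper name: the same byte string could in principle appear inside an unencrypted file. PyPDF2 1.x readers also expose an `isEncrypted` attribute, which is the more direct check when the file can be opened.

```python
def looks_encrypted(pdf_path):
    """Heuristic: True if the raw bytes of the file contain an /Encrypt key.

    Hypothetical helper for triage only; it does not parse the PDF.
    """
    with open(pdf_path, "rb") as f:
        data = f.read()
    return b"/Encrypt" in data
```

Files flagged this way could be set aside and handled separately, so that the remaining failures are easier to compare.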

2

Added the error message

2020/08/11 05:47

Posted

trave

Score 7

test CHANGED
File without changes
test CHANGED
@@ -59,3 +59,263 @@
  dst_pdf.write(f)
 
  ```
+
+ The error message is as follows.
+
+ UnicodeDecodeError Traceback (most recent call last)
+
+ ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readFromStream(stream, pdf)
+
+ 483 try:
+
+ --> 484 return NameObject(name.decode('utf-8'))
+
+ 485 except (UnicodeEncodeError, UnicodeDecodeError) as e:
+
+
+
+ UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 8: invalid start byte
+
+
+
+ During handling of the above exception, another exception occurred:
+
+
+
+ PdfReadError Traceback (most recent call last)
+
+ <ipython-input-95-1c3d6b0830f5> in <module>
+
+ 18
+
+ 19 with open('Desktop\テスト/'+path,'wb') as f:
+
+ ---> 20 dst_pdf.write(f)
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in write(self, stream)
+
+ 480 self.stack = []
+
+ 481 if debug: print(("ERM:", externalReferenceMap, "root:", self._root))
+
+ --> 482 self._sweepIndirectReferences(externalReferenceMap, self._root)
+
+ 483 del self.stack
+
+ 484
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 569 self.stack.append(data.idnum)
+
+ 570 realdata = self.getObject(data)
+
+ --> 571 self._sweepIndirectReferences(externMap, realdata)
+
+ 572 return data
+
+ 573 else:
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 545 for key, value in list(data.items()):
+
+ 546 origvalue = value
+
+ --> 547 value = self._sweepIndirectReferences(externMap, value)
+
+ 548 if isinstance(value, StreamObject):
+
+ 549 # a dictionary value is a stream. streams must be indirect
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 584 externMap[data.pdf][data.generation] = {}
+
+ 585 externMap[data.pdf][data.generation][data.idnum] = newobj_ido
+
+ --> 586 newobj = self._sweepIndirectReferences(externMap, newobj)
+
+ 587 self._objects[idnum-1] = newobj
+
+ 588 return newobj_ido
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 545 for key, value in list(data.items()):
+
+ 546 origvalue = value
+
+ --> 547 value = self._sweepIndirectReferences(externMap, value)
+
+ 548 if isinstance(value, StreamObject):
+
+ 549 # a dictionary value is a stream. streams must be indirect
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 554 elif isinstance(data, ArrayObject):
+
+ 555 for i in range(len(data)):
+
+ --> 556 value = self._sweepIndirectReferences(externMap, data[i])
+
+ 557 if isinstance(value, StreamObject):
+
+ 558 # an array value is a stream. streams must be indirect
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 584 externMap[data.pdf][data.generation] = {}
+
+ 585 externMap[data.pdf][data.generation][data.idnum] = newobj_ido
+
+ --> 586 newobj = self._sweepIndirectReferences(externMap, newobj)
+
+ 587 self._objects[idnum-1] = newobj
+
+ 588 return newobj_ido
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 545 for key, value in list(data.items()):
+
+ 546 origvalue = value
+
+ --> 547 value = self._sweepIndirectReferences(externMap, value)
+
+ 548 if isinstance(value, StreamObject):
+
+ 549 # a dictionary value is a stream. streams must be indirect
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 545 for key, value in list(data.items()):
+
+ 546 origvalue = value
+
+ --> 547 value = self._sweepIndirectReferences(externMap, value)
+
+ 548 if isinstance(value, StreamObject):
+
+ 549 # a dictionary value is a stream. streams must be indirect
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 545 for key, value in list(data.items()):
+
+ 546 origvalue = value
+
+ --> 547 value = self._sweepIndirectReferences(externMap, value)
+
+ 548 if isinstance(value, StreamObject):
+
+ 549 # a dictionary value is a stream. streams must be indirect
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in _sweepIndirectReferences(self, externMap, data)
+
+ 575 if newobj == None:
+
+ 576 try:
+
+ --> 577 newobj = data.pdf.getObject(data)
+
+ 578 self._objects.append(None) # placeholder
+
+ 579 idnum = len(self._objects)
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\pdf.py in getObject(self, indirectReference)
+
+ 1609 % (indirectReference.idnum, indirectReference.generation, idnum, generation))
+
+ 1610 assert generation == indirectReference.generation
+
+ -> 1611 retval = readObject(self.stream, self)
+
+ 1612
+
+ 1613 # override encryption is used for the /Encrypt dictionary
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readObject(stream, pdf)
+
+ 64 stream.seek(-2, 1) # reset to start
+
+ 65 if peek == b_('<<'):
+
+ ---> 66 return DictionaryObject.readFromStream(stream, pdf)
+
+ 67 else:
+
+ 68 return readHexStringFromStream(stream)
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readFromStream(stream, pdf)
+
+ 577 tok = readNonWhitespace(stream)
+
+ 578 stream.seek(-1, 1)
+
+ --> 579 value = readObject(stream, pdf)
+
+ 580 if not data.get(key):
+
+ 581 data[key] = value
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readObject(stream, pdf)
+
+ 58 if idx == 0:
+
+ 59 # name object
+
+ ---> 60 return NameObject.readFromStream(stream, pdf)
+
+ 61 elif idx == 1:
+
+ 62 # hexadecimal string OR dictionary
+
+
+
+ ~\anaconda3\lib\site-packages\PyPDF2\generic.py in readFromStream(stream, pdf)
+
+ 490 return NameObject(name)
+
+ 491 else:
+
+ --> 492 raise utils.PdfReadError("Illegal character in Name Object")
+
+ 493
+
+ 494 readFromStream = staticmethod(readFromStream)
+
+
+
+ PdfReadError: Illegal character in Name Object
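
On the `UnicodeDecodeError` in this traceback: byte `0x82` can never start a character in UTF-8, but it is a common lead byte in Shift_JIS (cp932), for example for hiragana. So one guess, and it is only a guess, is that a name inside the PDF was written in Shift_JIS. The sketch below reproduces the same error message with plain Python and shows a decode that falls back to other codecs. `decode_name_bytes` and its `fallbacks` parameter are hypothetical and are not part of PyPDF2.

```python
def decode_name_bytes(raw, fallbacks=("cp932", "latin-1")):
    """Decode bytes as UTF-8, trying fallback codecs if that fails.

    Returns (text, codec_used). Hypothetical helper for inspection only.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass  # fall through to the fallback codecs
    for codec in fallbacks:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the codecs could decode the bytes")


# Eight ASCII bytes followed by a Shift_JIS-encoded hiragana character,
# so the first undecodable byte sits at position 8.
sample = b"/ABCDEFG" + "あ".encode("cp932")
text, codec = decode_name_bytes(sample)
print(text, codec)
```

Because `latin-1` accepts every byte, keeping it last means the function always returns something, which is convenient for logging but does not prove the text was really in that encoding.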

1

Revised the title

2020/08/11 04:33

Posted

trave

Score 7

test CHANGED
@@ -1 +1 @@
- I want to enter information from an Excel file into the properties of a PDF file
+ I want to use Python to enter information from an Excel file into the properties of a PDF file
test CHANGED
File without changes
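
For Problem 1 in the earlier revisions (skipping a PDF such as test2.pdf when it is not in the folder instead of stopping with an error), a minimal sketch is to check that each file exists before opening it. The function name `existing_pdfs` and the idea that the file names come from a list are assumptions for illustration; the original script is not shown in this history.

```python
from pathlib import Path


def existing_pdfs(folder, names):
    """Yield (name, path) for each listed file that is present in folder.

    Hypothetical helper: missing files are reported and skipped.
    """
    for name in names:
        path = Path(folder) / name
        if path.is_file():
            yield name, path
        else:
            print(f"skip: {path} not found")
```

The processing loop would then iterate over `existing_pdfs(folder, names)` rather than over the raw list of names.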