teratail header banner

Answer edit history

1

Updated to reflect the changes to the question.

2018/04/26 10:58

Posted

Deactivated user
answer CHANGED
@@ -1,18 +1,73 @@
- jsoup (https://jsoup.org/), an HTML parser implemented in Java, is convenient for this: [jsoup](https://jsoup.org/).
+ jsoup (https://jsoup.org/), an HTML parser implemented in Java, is convenient for this: [jsoup](https://jsoup.org/). To use it, first download jsoup-1.11.3.jar from [Download and install jsoup](https://jsoup.org/download) and add it to your classpath.
  The code below selects the list of span tags and extracts only the text inside them.
  
  ```java
- String html = "<tr style=\"min-height: 27px\">\r\n" +
-         "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
-         "<p style=\"margin-left: 30px; line-height: 13.33px; margin-right: 6px; text-align: left\">\r\n" +
-         "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">流動資産合計</span>\r\n" +
-         "</p>\r\n" +
- .....
- Document doc = Jsoup.parse(html);
- Elements spans = doc.select("span");
- for (Element e : spans)
-     System.out.println(e.text());
+ package stackoverflow;
  
+ import org.jsoup.Jsoup;
+ import org.jsoup.nodes.Document;
+ import org.jsoup.nodes.Element;
+ import org.jsoup.select.Elements;
+
+ public class Main {
+
+     public static void main(String[] args) {
+         String html = "<tr style=\"min-height: 27px\">\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"margin-left: 30px; line-height: 13.33px; margin-right: 6px; text-align: left\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">流動資産合計</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; text-align: center\">&#160;</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; margin-right: 6px; text-align: right\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">34,303</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; margin-right: 6px; text-align: right\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">36,762</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; margin-right: 6px; text-align: right\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">28,016</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "</tr>\r\n" +
+                 "<tr style=\"min-height: 27px\">\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"margin-left: 30px; line-height: 13.33px; margin-right: 6px; text-align: left\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">流動負債合計</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; text-align: center\">&#160;</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; margin-right: 6px; text-align: right\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">6,917</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; margin-right: 6px; text-align: right\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">6,809</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "<td style=\"border-left: 1px solid #000000; border-top: 1px solid #000000; border-right: 1px solid #000000; border-bottom: 1px solid #000000; vertical-align: middle\">\r\n" +
+                 "<p style=\"line-height: 13.33px; margin-right: 6px; text-align: right\">\r\n" +
+                 "<span style=\"font-family: 'MS Mincho'; font-size: 12px\">5,339</span>\r\n" +
+                 "</p>\r\n" +
+                 "</td>\r\n" +
+                 "</tr>";
+         Document doc = Jsoup.parse(html);
+         Elements spans = doc.select("span");
+         for (Element e : spans)
+             System.out.println(e.text());
+     }
+ }
  ```
  
  The result is as follows.