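Overview
Print the differences between CSV files A and B to standard output, using a tool written in Python.
Background
I have started learning Python and am stuck on handling CSV files.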
Prerequisites
The environment I am using for study is:
- Python 3.7
- Anaconda
- pandas
- Jupyter Notebook
I prepared the following data, assuming the format of Japan Post's ken_all.csv.
before.csv
bash
mkdir /tmp/ken_all
cd /tmp/ken_all
vim before.csv
csv
13101,"100 ","1000000","トウキョウト","チヨダク","イカニケイサイガナイバアイ","東京都","千代田区","以下に掲載がない場合",0,0,0,0,0,0
13101,"102 ","1020072","トウキョウト","チヨダク","イイダバシ","東京都","千代田区","飯田橋",0,0,1,0,0,0
13101,"101 ","1010032","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,0
after.csv
bash
vim after.csv
csv
13101,"100 ","1000000","トウキョウト","チヨダク","イカニケイサイガナイバアイ","東京都","千代田区","以下に掲載がない場合",0,0,0,0,0,0
13101,"102 ","1020082","トウキョウト","チヨダク","イチバンチョウ","東京都","千代田区","一番町",0,0,0,0,0,0
13101,"101 ","1010035","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,5
diff
Check the differences between the two files.
bash
diff before.csv after.csv
diff
2,4c2,4
< 13101,"102 ","1020072","トウキョウト","チヨダク","イイダバシ","東京都","千代田区","飯田橋",0,0,1,0,0,0
< 13101,"101 ","1010032","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
< 13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,0
---
> 13101,"102 ","1020082","トウキョウト","チヨダク","イチバンチョウ","東京都","千代田区","一番町",0,0,0,0,0,0
> 13101,"101 ","1010035","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
> 13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,5
Goal
The output format I am aiming for is:
bash
python3 diff_csv.py
diff
-13101,"102 ","1020072","トウキョウト","チヨダク","イイダバシ","東京都","千代田区","飯田橋",0,0,1,0,0,0
+13101,"102 ","1020082","トウキョウト","チヨダク","イチバンチョウ","東京都","千代田区","一番町",0,0,0,0,0,0
-13101,"101 ","1010032","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
+13101,"101 ","1010035","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
-13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,0
+13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,5
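One possible way to get this interleaved output is to pair each removed line with the line that replaced it. The following is only a minimal sketch using the standard library's difflib.SequenceMatcher instead of pandas. The function name interleaved_diff is hypothetical, and the sketch assumes both files have already been read into lists of lines without trailing newlines.

```python
import difflib


def interleaved_diff(before_lines, after_lines):
    """Return '-'/'+' lines, pairing each changed line with its replacement."""
    out = []
    matcher = difflib.SequenceMatcher(a=before_lines, b=after_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not printed
        old = before_lines[i1:i2]
        new = after_lines[j1:j2]
        # Walk both blocks position by position so that the k-th removed line
        # is followed directly by the k-th added line. Leftover lines on either
        # side are emitted as plain deletions or insertions.
        for k in range(max(len(old), len(new))):
            if k < len(old):
                out.append("-" + old[k])
            if k < len(new):
                out.append("+" + new[k])
    return out
```

Reading the files is left to the caller (for example open(path, encoding=...).read().splitlines()), because the actual encoding of the files is not certain. Note that this pairs lines by position inside each changed block, so it only matches the desired output when both files list their records in the same order.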
Notes
- The closest thing to what I want is the output format of git diff.
- I picture that result sorted in half-width kana order.
diff
@@ -1,4 +1,4 @@
 13101,"100 ","1000000","トウキョウト","チヨダク","イカニケイサイガナイバアイ","東京都","千代田区","以下に掲載がない場合",0,0,0,0,0,0
-13101,"102 ","1020072","トウキョウト","チヨダク","イイダバシ","東京都","千代田区","飯田橋",0,0,1,0,0,0
-13101,"101 ","1010032","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
-13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,0
+13101,"102 ","1020082","トウキョウト","チヨダク","イチバンチョウ","東京都","千代田区","一番町",0,0,0,0,0,0
+13101,"101 ","1010035","トウキョウト","チヨダク","イワモトチョウ","東京都","千代田区","岩本町",0,0,1,0,0,0
+13101,"101 ","1010047","トウキョウト","チヨダク","ウチカンダ","東京都","千代田区","内神田",0,0,1,0,0,5
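Read this way, the desired output looks like the changed lines of a git diff-style result reordered by their kana reading. The sketch below tries that idea with difflib.unified_diff from the standard library. It rests on several assumptions: the kana fields sit at column positions 3 to 5 (0-based), every line has at least six fields, and plain Unicode code point order is close enough to kana order (it is not a true dictionary collation). The names KANA_COLUMNS, kana_key and changed_lines_sorted_by_kana are hypothetical.

```python
import csv
import difflib

KANA_COLUMNS = (3, 4, 5)  # assumed positions of the kana reading fields


def kana_key(csv_line):
    """Build a sort key from the kana columns of one CSV line."""
    fields = next(csv.reader([csv_line]))
    return tuple(fields[i] for i in KANA_COLUMNS)


def changed_lines_sorted_by_kana(before_lines, after_lines):
    """Collect the '-'/'+' lines of a unified diff and sort them by kana."""
    diff = list(difflib.unified_diff(before_lines, after_lines, lineterm="", n=0))
    # The first two lines are the '---' / '+++' file headers; the rest are
    # '@@' hunk headers and changed lines (n=0 means no context lines).
    body = diff[2:]
    changed = [d for d in body if d.startswith(("-", "+"))]
    # Sort by kana reading; on a tie, put the removed line before the added one.
    return sorted(
        changed,
        key=lambda d: (kana_key(d[1:]), 0 if d[0] == "-" else 1),
    )
```

Because the sort key is computed from the parsed fields while the original text of each line is kept, the quoting of the input lines is preserved in the result.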
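Steps
python
import pandas as pd

file1 = '/tmp/ken_all/before.csv'
file2 = '/tmp/ken_all/after.csv'

# The sample data has no header row, so header=None keeps the first record as data.
# dtype=str keeps values such as "100 " exactly as written.
data1 = pd.read_csv(file1, header=None, dtype=str, encoding="shift_jis")
data2 = pd.read_csv(file2, header=None, dtype=str, encoding="shift_jis")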
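From two loaded DataFrames, one way to continue with pandas might be an outer merge with indicator=True, which marks each row as present in only the left frame, only the right frame, or both. This is a sketch under assumptions, not a definitive implementation: the files are read without a header row and with every field as text, the kana columns are taken to be positions 3 to 5, and the helper names load_rows and row_diff are hypothetical. pandas discards the original quoting when it parses, so the printed lines will not carry the double quotes of the input.

```python
import pandas as pd


def load_rows(source, encoding="shift_jis"):
    """Read a header-less CSV, keeping every field as literal text."""
    return pd.read_csv(source, header=None, dtype=str,
                       keep_default_na=False, encoding=encoding)


def row_diff(before, after, sort_columns=(3, 4, 5)):
    """Return '-'/'+' lines for rows found in only one of the two frames."""
    # With no 'on' argument, merge joins on all shared columns, so a row
    # counts as unchanged only if every field is identical.
    merged = before.drop_duplicates().merge(
        after.drop_duplicates(), how="outer", indicator=True
    )
    changed = merged[merged["_merge"] != "both"].copy()
    side = changed["_merge"].astype(str)
    changed["sign"] = side.map({"left_only": "-", "right_only": "+"})
    # Helper column so that, for equal sort keys, '-' rows come before '+' rows.
    changed["order"] = side.map({"left_only": 0, "right_only": 1})
    changed = changed.sort_values(list(sort_columns) + ["order"])
    data_columns = list(before.columns)
    rows = changed[data_columns].values.tolist()
    return [sign + ",".join(fields) for sign, fields in zip(changed["sign"], rows)]
```

Calling row_diff(load_rows(file1), load_rows(file2)) and printing each returned line would then give the -/+ listing. If byte-exact lines including the quotes matter, comparing the raw text lines instead of parsed DataFrames is probably the simpler route.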

Edited 2019/02/24 18:36