Edit history

Answer edit history

Added another sample

2020/04/28 10:32

Posted

Score 15898

answer CHANGED Viewed

@@ -72,4 +72,24 @@
 plt.tight_layout()
 plt.show()
 ```
-You could also write it like this.
+You could also write it like this.
+---
+[Addendum]
+A sample that draws the plot with `value_counts().plot.bar()`:
+```Python
+fig, axs = plt.subplots(3,2, figsize=(10,6))
+df.index.year.value_counts(sort=False).plot.bar(ax=axs[0,1])
+pos = {2016:(1,0), 2017:(1,1), 2018:(2,0), 2019:(2,1)}
+for year, d in df.groupby(df.index.year):
+    ax = axs[pos[year][0], pos[year][1]]
+    d.index.month.value_counts(sort=False).plot.bar(ax=ax)
+    ax.set_title(year)
+    ax.set_xlabel("month")
+    ax.set_ylabel("number")
+plt.tight_layout()
+plt.show()
+```
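
A note on the `value_counts(sort=False)` sample above: months with no records produce no bar at all, and the bar order follows whatever order pandas returns. The sketch below shows one way to force all 12 months into calendar order. The timestamps are made up for illustration and are not from the original data.

```Python
import pandas as pd

# Hypothetical timestamps for illustration; most months have no records.
index = pd.DatetimeIndex(["2019-03-01", "2019-01-15", "2019-03-20", "2019-12-31"])

# Count records per month, then force months 1..12 in calendar order,
# filling months that never occur with 0.
month_counts = (
    pd.Series(index.month)
    .value_counts()
    .reindex(range(1, 13), fill_value=0)
)

print(month_counts.tolist())
```

Under these assumptions, `month_counts.plot.bar(ax=ax)` could stand in for the `value_counts(sort=False).plot.bar(ax=ax)` call when a fixed 12-bar layout is wanted.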

Fixed Markdown mistakes

2020/04/28 10:32

Posted

magichan

Score 15898

answer CHANGED Viewed

@@ -22,10 +22,11 @@
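 In the sample above,
 - passing `parse_dates` to `read_csv()` reads the time column as datetime type
 - passing `index_col` to `read_csv()` sets that time column as the Index
-are what is done.
-Passing datetime-type data to the Index to set up a DatetimeIndex is convenient in many ways when working with time-series data, so I recommend it.
+are the two things being done.
+Setting up a DatetimeIndex in this way, by passing datetime-type data to the Index, is convenient in many ways when working with time-series data, so I recommend it.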
 [https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#indexing
 ](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#indexing)
@@ -41,7 +42,7 @@
 As shown, each loop iteration yields the year and the data for that year.
-So, to draw the graphs using this,
+So, to draw the graphs using this, additionally use enumerate to specify the subplot position:
 ```Python
 fig, axs = plt.subplots(3,2, figsize=(10,6))
@@ -71,4 +72,4 @@
 plt.tight_layout()
 plt.show()
 ```
-This might be the better way.
+You could also write it like this.
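
The `enumerate`-based placement mentioned in this revision maps a running counter to a (row, column) cell. A minimal sketch of that arithmetic, using `divmod` in place of the separate `//` and `%` expressions; the `years` list and the `skip` name are assumptions for illustration, standing in for the `groupby` keys and for the two cells of the first row.

```Python
# Hypothetical list of years standing in for the groupby keys.
years = [2016, 2017, 2018, 2019]
ncols = 2
skip = 2  # leave the first row (2 cells) free for other plots

# divmod(p + skip, ncols) gives the same pair as ((p+2)//2, (p+2)%2).
positions = {year: divmod(p + skip, ncols) for p, year in enumerate(years)}
print(positions)
```

This produces the same cells as the hand-written `pos` dict, so either form can be used depending on whether the set of years is known in advance.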

Fixed a mistake in the sample

2020/04/28 02:22

Posted

magichan

Score 15898

answer CHANGED Viewed

@@ -62,7 +62,7 @@
 fig, axs = plt.subplots(3,2, figsize=(10,6))
 pos = {2016:(1,0), 2017:(1,1), 2018:(2,0), 2019:(2,1)}
 for year, d in df.groupby(df.index.year):
-    ax = axs[*pos[year]]
+    ax = axs[pos[year][0], pos[year][1]]
     ax.hist(x=d.index.month, bins=range(1,12+1),alpha=0.5)
     ax.set_title(year)
     ax.set_xlabel("month")
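
On the fix above: to my understanding, a starred expression inside a subscript such as `axs[*pos[year]]` is a syntax error on Python versions before 3.11. Since `axs` is a NumPy array, the tuple itself can also be used as the index. The sketch below checks this on a plain integer array standing in for the array of Axes; `grid` is a hypothetical name.

```Python
import numpy as np

# Stand-in for the 3x2 array of Axes returned by plt.subplots(3, 2).
grid = np.arange(6).reshape(3, 2)
pos = {2016: (1, 0), 2017: (1, 1), 2018: (2, 0), 2019: (2, 1)}

for year, (row, col) in pos.items():
    # Indexing with the tuple selects the same cell as explicit row, col.
    assert grid[pos[year]] == grid[row, col]

print(grid[pos[2018]])
```

So `ax = axs[pos[year]]` would be a shorter equivalent of `axs[pos[year][0], pos[year][1]]`.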

Added explanation

2020/04/28 02:19

Posted

magichan

Score 15898

answer CHANGED Viewed

@@ -1,6 +1,6 @@
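 First, regarding the multiple CSV files: if every row carries date-time information, it is easier to combine all the data and manage it as a single DataFrame (unless the data is so large that it strains memory).
-Creating a single DataFrame from multiple CSV files goes something like this
+Creating a single DataFrame from multiple CSV files looks like this: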
 ```Python
 import pandas as pd
@@ -19,14 +19,32 @@
 print(df)
 ```
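+In the sample above,
-In the sample above, passing `parse_dates` to `read_csv()` reads the time column as datetime type.
+- passing `parse_dates` to `read_csv()` reads the time column as datetime type
+- passing `index_col` to `read_csv()` sets that time column as the Index
+are what is done.
-And the drawing part is
+Passing datetime-type data to the Index to set up a DatetimeIndex is convenient in many ways when working with time-series data, so I recommend it.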
+[https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#indexing
+](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#indexing)
+Then, to process the DataFrame combined as above year by year, you can loop over ``groupby()``:
 ```Python
+for year, d in df.groupby(df.index.year):
+    # some processing
+    print(f"Data for {year}")
+    print(d)
+```
+As shown, each loop iteration yields the year and the data for that year.
+So, to draw the graphs using this,
+```Python
 fig, axs = plt.subplots(3,2, figsize=(10,6))
 for p, (year, d) in enumerate(df.groupby(df.index.year)):
     ax = axs[(p+2)//2, (p+2)%2]
     ax.hist(x=d.index.month, bins=range(1,12+1),alpha=0.5)
@@ -37,4 +55,20 @@
 plt.tight_layout()
 plt.show()
 ```
-I think it would look something like this.
+I think it would look something like this.
+If you don't like the `enumerate` part, you could prepare a dict instead:
+```Python
+fig, axs = plt.subplots(3,2, figsize=(10,6))
+pos = {2016:(1,0), 2017:(1,1), 2018:(2,0), 2019:(2,1)}
+for year, d in df.groupby(df.index.year):
+    ax = axs[*pos[year]]
+    ax.hist(x=d.index.month, bins=range(1,12+1),alpha=0.5)
+    ax.set_title(year)
+    ax.set_xlabel("month")
+    ax.set_ylabel("number")
+plt.tight_layout()
+plt.show()
+```
+This might be the better way.
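
One detail worth checking in the `hist` samples: `bins=range(1,12+1)` supplies 12 bin edges, which makes 11 bins, and the last bin is closed on both sides, so November and December would be counted together. The sketch below illustrates this with `numpy.histogram`, which matplotlib's `hist` relies on for binning as far as I know. The month values are made up.

```Python
import numpy as np

months = [1, 11, 12, 12]  # hypothetical month values

# 12 edges -> 11 bins; the last bin [11, 12] is closed on both sides.
counts_11, _ = np.histogram(months, bins=range(1, 12 + 1))
# 13 edges -> 12 bins, one per month.
counts_12, _ = np.histogram(months, bins=range(1, 13 + 1))

print(len(counts_11), counts_11[-1])
print(len(counts_12), counts_12[-2:].tolist())
```

If one bar per month is wanted, `bins=range(1, 13 + 1)` may be the safer choice.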

Added explanation

2020/04/28 02:12

Posted

magichan

Score 15898

answer CHANGED Viewed

File without changes

Added a sample

2020/04/28 02:12

Posted

magichan

Score 15898

answer CHANGED Viewed

@@ -19,5 +19,22 @@
 print(df)
 ```
-In the sample above,
-passing `parse_dates` to `read_csv()` reads the time column as datetime type
+In the sample above, passing `parse_dates` to `read_csv()` reads the time column as datetime type.
+And the drawing part is
+```Python
+fig, axs = plt.subplots(3,2, figsize=(10,6))
+for p, (year, d) in enumerate(df.groupby(df.index.year)):
+    ax = axs[(p+2)//2, (p+2)%2]
+    ax.hist(x=d.index.month, bins=range(1,12+1),alpha=0.5)
+    ax.set_title(year)
+    ax.set_xlabel("month")
+    ax.set_ylabel("number")
+plt.tight_layout()
+plt.show()
+```
+I think it would look something like this.
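
Because the loading code is cut off by the diff hunks, here is a minimal, self-contained sketch of the approach the answer describes: reading several CSVs with `parse_dates` and `index_col`, combining them, and grouping by year. It is not the author's exact code. The in-memory CSV texts and the `value` column are assumptions for illustration; only the `time` column name comes from the answer.

```Python
import io
import pandas as pd

# Hypothetical CSV contents standing in for files on disk.
csv_texts = [
    "time,value\n2018-01-05 10:00:00,1\n2018-02-10 11:00:00,2\n",
    "time,value\n2019-03-15 12:00:00,3\n",
]

# Parse the time column as datetime and use it as the Index.
frames = [
    pd.read_csv(io.StringIO(text), parse_dates=["time"], index_col="time")
    for text in csv_texts
]
df = pd.concat(frames).sort_index()

# Group by the year taken from the DatetimeIndex.
rows_per_year = {year: len(d) for year, d in df.groupby(df.index.year)}
print(rows_per_year)
```

With real files, the `io.StringIO(text)` argument would be replaced by each file path.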