Cannot append a Series to a pandas DataFrame in one specific case

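Hello, this is my first question.
When I append a Series to a pandas DataFrame, I get an error in one particular case.
Specifically, the error appears when I append a Series in which one value has been changed from NaN to a pd.Timestamp value.
The details are below.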

### Problem and error message
The initial DataFrame looks like this.
The values in last_update are pd.Timestamp objects.

    In [9]: df
    Out[9]: 
      importance interval last_read                last_update    name trigger
    0          2      NaN       NaN  2017-12-09 00:00:00+09:00  foobar     NaN
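To make the problem easier to reproduce, here is a minimal sketch that builds a DataFrame of the same shape. The question does not show how the original DataFrame was created, so this construction is an assumption. The `Asia/Tokyo` timezone is also an assumption: only the +09:00 offset is visible in the output.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the DataFrame shown above.
# 'Asia/Tokyo' is an assumed timezone matching the +09:00 offset.
df = pd.DataFrame({
    'importance': [2],
    'interval': [np.nan],
    'last_read': [np.nan],
    'last_update': [pd.Timestamp('2017-12-09', tz='Asia/Tokyo')],
    'name': ['foobar'],
    'trigger': [np.nan],
})

print(df)
print(df.dtypes)  # last_update is timezone-aware, trigger is an all-NaN float column
```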

Here, the Series to append is a copy of the first row of the DataFrame.

    In [10]: record = df.iloc[0].copy()

Appending this Series to the DataFrame works without any problem.

    In [11]: df.append(record)
    Out[11]: 
      importance interval last_read                last_update    name trigger
    0          2      NaN       NaN  2017-12-09 00:00:00+09:00  foobar     NaN
    0          2      NaN       NaN  2017-12-09 00:00:00+09:00  foobar     NaN

Next, I change one value of the Series (interval, from NaN to 1) and append it. This also works.

    In [12]: record['interval'] = 1
    
    In [13]: df.append(record)
    Out[13]: 
      importance interval last_read                last_update    name trigger
    0          2      NaN       NaN  2017-12-09 00:00:00+09:00  foobar     NaN
    0          2        1       NaN  2017-12-09 00:00:00+09:00  foobar     NaN

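Next, I change a different value of the Series (trigger, from NaN to a pd.Timestamp value) and append it. This time I get an error. (In the session below, record is again an unmodified copy of the first row, so interval is NaN.)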

    In [10]: record
    Out[10]: 
    importance                             2
    interval                             NaN
    last_read                            NaN
    last_update    2017-12-09 00:00:00+09:00
    name                              foobar
    trigger                              NaN
    Name: 0, dtype: object

    In [11]: record['trigger'] = record['last_update']
    
    In [12]: record
    Out[12]: 
    importance                             2
    interval                             NaN
    last_read                            NaN
    last_update    2017-12-09 00:00:00+09:00
    name                              foobar
    trigger        2017-12-09 00:00:00+09:00
    Name: 0, dtype: object
    
    In [13]: df.append(record)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-13-7c027f1cbb54> in <module>()
    ----> 1 df.append(record)
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity)
       4545             to_concat = [self, other]
       4546         return concat(to_concat, ignore_index=ignore_index,
    -> 4547                       verify_integrity=verify_integrity)
       4548 
       4549     def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
        205                        verify_integrity=verify_integrity,
        206                        copy=copy)
    --> 207     return op.get_result()
        208 
        209 
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in get_result(self)
        405             new_data = concatenate_block_managers(
        406                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
    --> 407                 copy=self.copy)
        408             if not self.copy:
        409                 new_data._consolidate_inplace()
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
       4830     blocks = [make_block(
       4831         concatenate_join_units(join_units, concat_axis, copy=copy),
    -> 4832         placement=placement) for placement, join_units in concat_plan]
       4833 
       4834     return BlockManager(blocks, axes)
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0)
       4830     blocks = [make_block(
       4831         concatenate_join_units(join_units, concat_axis, copy=copy),
    -> 4832         placement=placement) for placement, join_units in concat_plan]
       4833 
       4834     return BlockManager(blocks, axes)
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
       4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
       4938                                          upcasted_na=upcasted_na)
    -> 4939                  for ju in join_units]
       4940 
       4941     if len(to_concat) == 1:
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0)
       4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
       4938                                          upcasted_na=upcasted_na)
    -> 4939                  for ju in join_units]
       4940 
       4941     if len(to_concat) == 1:
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
       5210                     pass
       5211                 else:
    -> 5212                     missing_arr = np.empty(self.shape, dtype=empty_dtype)
       5213                     missing_arr.fill(fill_value)
       5214                     return missing_arr
    
    TypeError: data type not understood

Because the message says "data type not understood", I wondered whether pd.Timestamp values simply cannot be used. However, creating a new DataFrame from them works fine:

    In [5]: a = pd.Timestamp('today')
    
    In [6]: b = pd.Timestamp('today') + pd.tseries.offsets.Day(2)
    
    In [7]: date = [a, b]
    
    In [8]: name = ['foo', 'bar']
    
    In [9]: df = pd.DataFrame({'date': date, 'name': name})
    
    In [10]: df
    Out[10]: 
                            date name
    0 2017-12-10 17:56:29.315316  foo
    1 2017-12-12 17:56:37.535777  bar
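There is one difference between the two experiments. The Timestamps here are naive, because pd.Timestamp('today') carries no timezone. In contrast, last_update in the first DataFrame has a +09:00 offset. The small sketch below shows that pandas stores the two kinds as different column dtypes. It uses `Asia/Tokyo` as an assumed timezone for the aware case.

```python
import pandas as pd

# Naive Timestamp, like pd.Timestamp('today') in the experiment above
naive = pd.DataFrame({'date': [pd.Timestamp('2017-12-10 17:56:29')], 'name': ['foo']})
# Timezone-aware Timestamp, like last_update (+09:00); 'Asia/Tokyo' is an assumption
aware = pd.DataFrame({'date': [pd.Timestamp('2017-12-10 17:56:29', tz='Asia/Tokyo')],
                      'name': ['foo']})

print(naive['date'].dtype)  # plain datetime64 dtype, no timezone
print(aware['date'].dtype)  # the timezone is part of the dtype
print(naive['date'].dt.tz, aware['date'].dt.tz)
```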

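I am a beginner, so I cannot tell whether this is intended behavior, a bug, or simply a misunderstanding on my part.
I would appreciate any advice from more experienced users.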

### Supplementary information (language/framework/tool versions, etc.)
The pandas version is 0.20.3.


1 answer

Best answer

This appears to be a bug in pandas.

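After some experimenting, it appears that a TypeError is raised when timezone-aware datetime data (a datetime with a TimeZone) is appended to an empty column, that is, one with no rows or only NaN values.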
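The condition described here can be checked before appending. The sketch below defines a hypothetical helper, `tz_aware_into_empty`, which is not a pandas function. It lists the columns where the DataFrame holds only missing values while the row to be appended holds a timezone-aware Timestamp. It only encodes the condition stated in this answer, not pandas' internal logic. The example DataFrame is an assumed reconstruction of the question's data.

```python
import numpy as np
import pandas as pd

def tz_aware_into_empty(df, row):
    """Hypothetical helper (not part of pandas): return the columns where
    `df` holds only missing values while `row` holds a timezone-aware Timestamp."""
    risky = []
    for col in df.columns:
        value = row.get(col)
        if (df[col].isna().all()
                and isinstance(value, pd.Timestamp)
                and value.tz is not None):
            risky.append(col)
    return risky

# Example data shaped like the question's DataFrame ('Asia/Tokyo' is assumed)
df = pd.DataFrame({
    'importance': [2],
    'interval': [np.nan],
    'last_read': [np.nan],
    'last_update': [pd.Timestamp('2017-12-09', tz='Asia/Tokyo')],
    'name': ['foobar'],
    'trigger': [np.nan],
})
record = df.iloc[0].copy()
print(tz_aware_into_empty(df, record))  # → []
record['trigger'] = record['last_update']
print(tz_aware_into_empty(df, record))  # → ['trigger']
```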

【Test code】

    import pandas as pd
    df = pd.DataFrame(columns=['A', 'B'])
    # Appending a timezone-aware datetime raises an error
    df.append(pd.Series({'A': pd.Timestamp.now(tz='Asia/Tokyo'), 'B': 1}), ignore_index=True)  # <= TypeError
    # Without a timezone there is no problem
    df.append(pd.Series({'A': pd.Timestamp.now(), 'B': 1}), ignore_index=True)
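On newer pandas versions, `DataFrame.append` no longer exists: it was deprecated in pandas 1.4 and removed in 2.0. Below is a minimal sketch of the usual replacement, `pd.concat`, applied to the question's case. The DataFrame is a hypothetical reconstruction with an assumed `Asia/Tokyo` timezone. I have not confirmed that this approach also avoids the error on pandas 0.20.3.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's DataFrame ('Asia/Tokyo' is assumed)
df = pd.DataFrame({
    'importance': [2],
    'interval': [np.nan],
    'last_read': [np.nan],
    'last_update': [pd.Timestamp('2017-12-09', tz='Asia/Tokyo')],
    'name': ['foobar'],
    'trigger': [np.nan],
})
record = df.iloc[0].copy()
record['trigger'] = record['last_update']

# Turn the Series into a one-row DataFrame and concatenate it,
# instead of calling the removed DataFrame.append
result = pd.concat([df, record.to_frame().T], ignore_index=True)
print(result)
```

The row comes from an object-dtype Series, so some columns of `result` may end up with `object` dtype. `result.infer_objects()` can be used to let pandas try to re-infer them.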