ValueError: If using all scalar values, you must pass an index

Question

Consider the following code:

    import MySQLdb as mdb
    import pandas as pd

    con = mdb.connect(db_host, db_user, db_pass, db_name)

    query = """SELECT `TIME`, `BID-CLOSE`
               FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
               WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
               ORDER BY TIME ASC;"""

    # Create a pandas dataframe from the SQL query
    eurusd = pd.read_sql_query(query, con=con, index_col='TIME')

    idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')
    eurusd.reindex(idx, fill_value=None)

This gives the following output:

                         BID-CLOSE
    2006-12-17 22:00:00    1.30971
    2006-12-18 22:00:00    1.31971
    2006-12-19 22:00:00    1.31721
    2006-12-20 22:00:00    1.31771
    2006-12-21 22:00:00    1.31411
    2006-12-22 22:00:00        NaN
    2006-12-23 22:00:00        NaN
    2006-12-24 22:00:00        NaN
    2006-12-25 22:00:00    1.30971
    2006-12-26 22:00:00    1.31131
    2006-12-27 22:00:00    1.31491
    2006-12-28 22:00:00    1.32021
    2006-12-29 22:00:00        NaN
    2006-12-30 22:00:00        NaN
    2006-12-31 22:00:00    1.32731
    2007-01-01 22:00:00    1.32731
    2007-01-02 22:00:00    1.31701
    2007-01-03 22:00:00    1.30831
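For anyone without access to the database, here is a minimal sketch that rebuilds the reindexed frame in memory. The non-missing values are copied from the output shown in the question. Building the frame from a dict instead of the MySQL query is an assumption made only so the snippet runs on its own, and the names `observed`, `reindexed` and `missing` are illustrative.

```python
import pandas as pd

# Observed rows copied from the question's output. The in-memory frame stands
# in for the SQL query (an assumption so the sketch runs without a database).
observed = {
    "2006-12-17 22:00:00": 1.30971,
    "2006-12-18 22:00:00": 1.31971,
    "2006-12-19 22:00:00": 1.31721,
    "2006-12-20 22:00:00": 1.31771,
    "2006-12-21 22:00:00": 1.31411,
    "2006-12-25 22:00:00": 1.30971,
    "2006-12-26 22:00:00": 1.31131,
    "2006-12-27 22:00:00": 1.31491,
    "2006-12-28 22:00:00": 1.32021,
    "2006-12-31 22:00:00": 1.32731,
    "2007-01-01 22:00:00": 1.32731,
    "2007-01-02 22:00:00": 1.31701,
    "2007-01-03 22:00:00": 1.30831,
}
eurusd = pd.DataFrame(
    {"BID-CLOSE": list(observed.values())},
    index=pd.to_datetime(list(observed.keys())),
)
eurusd.index.name = "TIME"

# One timestamp per day; days absent from the data become NaN after reindexing.
idx = pd.date_range("2006-12-17 22:00:00", "2007-01-03 22:00:00")
reindexed = eurusd.reindex(idx)

missing = int(reindexed["BID-CLOSE"].isna().sum())
print(reindexed)
print("rows:", len(reindexed), "missing:", missing)
```

Note that `reindex` returns a new object rather than modifying `eurusd` in place, so the result has to be assigned to a name to be kept.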

Reindex the data:

    eurusd = eurusd.reindex(idx, fill_value=None)

List of interpolation types:

    methods = ['linear', 'quadratic', 'cubic']

The next line throws an exception...

    pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})

    ValueError: If using all scalar values, you must pass an index

I am following the interpolation section of this guide: http://pandas.pydata.org/pandas-docs/stable/missing_data.html How do I correctly "pass an index" in this situation?

Update 1

Output of eurusd.interpolate('linear'):

                         BID-CLOSE
    2006-12-17 22:00:00   1.309710
    2006-12-18 22:00:00   1.319710
    2006-12-19 22:00:00   1.317210
    2006-12-20 22:00:00   1.317710
    2006-12-21 22:00:00   1.314110
    2006-12-22 22:00:00   1.313010
    2006-12-23 22:00:00   1.311910
    2006-12-24 22:00:00   1.310810
    2006-12-25 22:00:00   1.309710
    2006-12-26 22:00:00   1.311310
    2006-12-27 22:00:00   1.314910
    2006-12-28 22:00:00   1.320210
    2006-12-29 22:00:00   1.322577
    2006-12-30 22:00:00   1.324943
    2006-12-31 22:00:00   1.327310
    2007-01-01 22:00:00   1.327310
    2007-01-02 22:00:00   1.317010
    2007-01-03 22:00:00   1.308310

Update 2

    In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
    Out[9]:
                            cubic    linear  quadratic
    2006-12-17 22:00:00  1.309710  1.309710   1.309710
    2006-12-18 22:00:00  1.319710  1.319710   1.319710
    2006-12-19 22:00:00  1.317210  1.317210   1.317210
    2006-12-20 22:00:00  1.317710  1.317710   1.317710
    2006-12-21 22:00:00  1.314110  1.314110   1.314110
    2006-12-22 22:00:00  1.310762  1.313010   1.307947
    2006-12-23 22:00:00  1.309191  1.311910   1.305159
    2006-12-24 22:00:00  1.308980  1.310810   1.305747
    2006-12-25 22:00:00  1.309710  1.309710   1.309710
    2006-12-26 22:00:00  1.311310  1.311310   1.311310
    2006-12-27 22:00:00  1.314910  1.314910   1.314910
    2006-12-28 22:00:00  1.320210  1.320210   1.320210
    2006-12-29 22:00:00  1.323674  1.322577   1.321632
    2006-12-30 22:00:00  1.325553  1.324943   1.323998
    2006-12-31 22:00:00  1.327310  1.327310   1.327310
    2007-01-01 22:00:00  1.327310  1.327310   1.327310
    2007-01-02 22:00:00  1.317010  1.317010   1.317010
    2007-01-03 22:00:00  1.308310  1.308310   1.308310

juanpa.arrivillaga · Accepted Answer

The problem is in how you are calling the DataFrame constructor:

pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})

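Here the value for each m is a DataFrame, which the constructor treats as a scalar value. That is admittedly confusing. For dict input, the constructor expects each value to be some kind of sequence or a Series. The following should fix the problem: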

pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})

This works because selecting a single column returns a Series. So, for example, instead of this:

    In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-34-4b6c095c6da3> in <module>()
    ----> 1 pd.DataFrame({'linear':df.interpolate('linear')})

    /home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
        222                                  dtype=dtype, copy=copy)
        223         elif isinstance(data, dict):
    --> 224             mgr = self._init_dict(data, index, columns, dtype=dtype)
        225         elif isinstance(data, ma.MaskedArray):
        226             import numpy.ma.mrecords as mrecords

    /home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
        358             arrays = [data[k] for k in keys]
        359
    --> 360         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
        361
        362     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

    /home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
       5229     # figure out the index, if necessary
       5230     if index is None:
    -> 5231         index = extract_index(arrays)
       5232     else:
       5233         index = _ensure_index(index)

    /home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
       5268
       5269         if not indexes and not raw_lengths:
    -> 5270             raise ValueError('If using all scalar values, you must pass'
       5271                              ' an index')
       5272

    ValueError: If using all scalar values, you must pass an index

Use this instead:

    In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
    Out[35]:
                           linear
    timestamp
    2016-10-10 22:00:00  1.309710
    2016-10-10 22:00:00  1.319710
    2016-10-10 22:00:00  1.317210
    2016-10-10 22:00:00  1.317710
    2016-10-10 22:00:00  1.314110
    2016-10-10 22:00:00  1.313010
    2016-10-10 22:00:00  1.311910
    2016-10-10 22:00:00  1.310810
    2016-10-10 22:00:00  1.309710
    2016-10-10 22:00:00  1.311310
    2016-10-10 22:00:00  1.314910
    2016-10-10 22:00:00  1.320210
    2016-10-10 22:00:00  1.322577
    2016-10-10 22:00:00  1.324943
    2016-10-10 22:00:00  1.327310
    2016-10-10 22:00:00  1.327310
    2016-10-10 22:00:00  1.317010
    2016-10-10 22:00:00  1.308310

Fair warning, though: I get a LinAlgError: singular matrix error when I try 'quadratic' and 'cubic' interpolation on your data. I'm not sure why.
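The difference between the two calls can be reproduced on a small in-memory frame. This is a sketch under a few assumptions: the five rows are copied from the 2006-12-21 to 2006-12-25 stretch of the question's data, and 'time' stands in for 'quadratic' and 'cubic' because those methods are delegated to SciPy, which may not be installed. The names `error_message` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame: one column with a three-day gap.
idx = pd.date_range("2006-12-21 22:00:00", "2006-12-25 22:00:00")
eurusd = pd.DataFrame(
    {"BID-CLOSE": [1.31411, np.nan, np.nan, np.nan, 1.30971]},
    index=idx,
)

# Methods chosen so the sketch does not depend on SciPy (an assumption;
# the question uses 'linear', 'quadratic' and 'cubic').
methods = ["linear", "time"]

# Dict values are DataFrames: the constructor is expected to reject this.
# The message is captured rather than assumed, since it can vary by version.
try:
    pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("DataFrame values ->", error_message)

# Dict values are Series: each Series brings its own index, so this works.
result = pd.DataFrame(
    {m: eurusd["BID-CLOSE"].interpolate(method=m) for m in methods}
)
print(result)
```

Selecting the column first means the dict maps each method name to a Series, so the constructor can take the index from those Series and no explicit index is needed.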