NumPy version of "Exponential weighted moving average", equivalent to pandas.ewm().mean()

Question

How do I get the exponential weighted moving average in NumPy, just like the following in pandas?

    import pandas as pd
    import pandas_datareader as pdr
    from datetime import datetime

    # Declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # Get PANDAS exponential weighted moving average
    ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().values
    print(ewm_pd)

I tried the following with NumPy:

    import numpy as np
    import pandas_datareader as pdr
    from datetime import datetime

    # From this post: http://stackoverflow.com/a/40085052/3293881 by @Divakar
    def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

    def numpyEWMA(price, windowSize):
        weights = np.exp(np.linspace(-1., 0., windowSize))
        weights /= weights.sum()

        a2D = strided_app(price, windowSize, 1)

        returnArray = np.empty((price.shape[0]))
        returnArray.fill(np.nan)
        for index in (range(a2D.shape[0])):
            returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]
        return np.reshape(returnArray, (-1, 1))

    # Declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # Get NumPy exponential weighted moving average
    ewma_np = numpyEWMA(ibm, windowSize)
    print(ewma_np)

But the results are not similar to the ones from pandas.

Is there a better approach to calculate the exponential weighted moving average directly in NumPy and get exactly the same result as pandas.ewm().mean()?

With the pandas solution, 60,000 requests take about 230 seconds. I am sure that pure NumPy can cut this down significantly.

Jake Walden · Accepted Answer

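Updated 08/06/2019

Pure NumPy, fast and vectorized solution for large inputs

out parameter for in-place computation, dtype parameter, index order parameter

This function is equivalent to pandas' ewm(adjust=False).mean(), but much faster. ewm(adjust=True).mean() (the default for pandas) can produce different values at the start of the result. I am working on adding the adjust functionality to this solution.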
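To make the difference between the two modes concrete, here is a minimal loop-based sketch of both definitions as they are described in the pandas documentation. The names `ewma_recursive` and `ewma_adjusted` are illustrative only; they are not part of pandas or of the solution in this answer.

```python
import numpy as np

def ewma_recursive(x, alpha):
    """adjust=False form: y[0] = x[0], y[t] = alpha*x[t] + (1-alpha)*y[t-1]."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, x.size):
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y

def ewma_adjusted(x, alpha):
    """adjust=True form: weighted mean with weights (1-alpha)**i over all past samples."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    num = 0.0  # running sum of (1-alpha)**i * x[t-i]
    den = 0.0  # running sum of (1-alpha)**i
    for t in range(x.size):
        num = num * (1.0 - alpha) + x[t]
        den = den * (1.0 - alpha) + 1.0
        y[t] = num / den
    return y

alpha = 2.0 / (2 + 1)  # corresponds to span = 2
x = [1.0, 2.0, 3.0]
rec = ewma_recursive(x, alpha)  # 1, 1.66666667, 2.55555556
adj = ewma_adjusted(x, alpha)   # 1, 1.75, 2.61538462
```

The two results differ from the second sample onward. As `den` approaches `1/alpha` the adjusted form behaves more and more like the recursion, which is why the mismatch is concentrated at the start of the output.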

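@Divakar's answer leads to floating point precision problems when the input is too large. This is because (1-alpha)**(n+1) -> 0 when n -> inf and alpha -> 1, which leads to divide-by-zeros and NaN values popping up in the calculation.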
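The failure mode can be reproduced with a condensed restatement of the single-pass cumulative-sum form used elsewhere in this thread. This is a sketch for illustration: `ewma_cumsum_form` is a name chosen here, and alpha = 0.5 is an assumption picked only because it makes the powers underflow quickly.

```python
import numpy as np

def ewma_cumsum_form(data, alpha):
    """Single-pass cumulative-sum form of y[t] = alpha*x[t] + (1-alpha)*y[t-1]."""
    data = np.asarray(data, dtype=np.float64)
    n = data.size
    pows = (1.0 - alpha) ** np.arange(n + 1)
    with np.errstate(all="ignore"):  # silence the overflow/invalid warnings on purpose
        scale = 1.0 / pows[:-1]      # overflows to inf once pows gets too small
        cumsums = np.cumsum(data * (alpha * pows[n - 1]) * scale)
        return data[0] * pows[1:] + cumsums * scale[::-1]

alpha = 0.5
short = ewma_cumsum_form(np.ones(100), alpha)    # fine: the EWMA of a constant is the constant
long_ = ewma_cumsum_form(np.ones(5000), alpha)   # powers underflow, 0 * inf produces NaN
n_bad = int(np.count_nonzero(~np.isfinite(long_)))
```

For the short input the result is the expected constant signal; for the long one `n_bad` is non-zero because the scaling factors leave the representable range of float64.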

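Here is my fastest solution with no precision problems, nearly fully vectorized. It has become a little complicated, but the performance is great, especially for really large inputs. Without using in-place calculations (which are possible through the out parameter and save memory allocation time): 3.62 seconds for a 100M-element input vector, 3.2 ms for a 100K-element input vector, and 293 µs for a 5000-element input vector on a pretty old PC (results will vary with different alpha/row_size values).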

    # tested with python3 & numpy 1.15.2
    import numpy as np

    def ewma_vectorized_safe(data, alpha, row_size=None, dtype=None, order='C', out=None):
        """
        Reshapes data before calculating EWMA, then iterates once over the rows
        to calculate the offset without precision issues

        :param data: Input data, will be flattened.
        :param alpha: scalar float in range (0,1)
            The alpha parameter for the moving average.
        :param row_size: int, optional
            The row size to use in the computation. High row sizes need higher precision,
            low values will impact performance. The optimal value depends on the
            platform and the alpha being used. Higher alpha values require lower
            row size. Default depends on dtype.
        :param dtype: optional
            Data type used for calculations. Defaults to float64 unless
            data.dtype is float32, then it will use float32.
        :param order: {'C', 'F', 'A'}, optional
            Order to use when flattening the data. Defaults to 'C'.
        :param out: ndarray, or None, optional
            A location into which the result is stored. If provided, it must have
            the same shape as the desired output. If not provided or `None`,
            a freshly-allocated array is returned.
        :return: The flattened result.
        """
        data = np.asarray(data)

        if dtype is None:
            if data.dtype == np.float32:
                dtype = np.float32
            else:
                dtype = np.float64
        else:
            dtype = np.dtype(dtype)

        row_size = int(row_size) if row_size is not None else get_max_row_size(alpha, dtype)

        if data.size <= row_size:
            # The normal function can handle this input, use that
            return ewma_vectorized(data, alpha, dtype=dtype, order=order, out=out)

        if data.ndim > 1:
            # flatten input
            data = np.reshape(data, -1, order=order)

        if out is None:
            out = np.empty_like(data, dtype=dtype)
        else:
            assert out.shape == data.shape
            assert out.dtype == dtype

        row_n = int(data.size // row_size)  # the number of rows to use
        trailing_n = int(data.size % row_size)  # the amount of data leftover
        first_offset = data[0]

        # set temporary results to a 2D view of the main part of the out parameter
        # (reshaped even when there is no trailing data, so the row loop below works)
        main_n = row_n * row_size
        out_main_view = np.reshape(out[:main_n], (row_n, row_size))
        data_main_view = np.reshape(data[:main_n], (row_n, row_size))

        # get all the scaled cumulative sums with 0 offset
        ewma_vectorized_2d(data_main_view, alpha, axis=1, offset=0, dtype=dtype,
                           order='C', out=out_main_view)

        scaling_factors = (1 - alpha) ** np.arange(1, row_size + 1)
        last_scaling_factor = scaling_factors[-1]

        # create offset array
        offsets = np.empty(out_main_view.shape[0], dtype=dtype)
        offsets[0] = first_offset
        # iteratively calculate offset for each row
        for i in range(1, out_main_view.shape[0]):
            offsets[i] = offsets[i - 1] * last_scaling_factor + out_main_view[i - 1, -1]

        # add the offsets to the result
        out_main_view += offsets[:, np.newaxis] * scaling_factors[np.newaxis, :]

        if trailing_n > 0:
            # process trailing data in the 2nd slice of the out parameter
            ewma_vectorized(data[-trailing_n:], alpha, offset=out_main_view[-1, -1],
                            dtype=dtype, order='C', out=out[-trailing_n:])
        return out

    def get_max_row_size(alpha, dtype=float):
        assert 0. <= alpha < 1.
        # This will return the maximum row size possible on
        # your platform for the given dtype. I can find no impact on accuracy
        # at this value on my machine.
        # Might not be the optimal value for speed, which is hard to predict
        # due to numpy's optimizations
        # Use np.finfo(dtype).eps if you are worried about accuracy
        # and want to be extra safe.
        epsilon = np.finfo(dtype).tiny
        # If this produces an OverflowError, make epsilon larger
        return int(np.log(epsilon)/np.log(1-alpha)) + 1

The 1D ewma function:

    def ewma_vectorized(data, alpha, offset=None, dtype=None, order='C', out=None):
        """
        Calculates the exponential moving average over a vector.
        Will fail for large inputs.

        :param data: Input data
        :param alpha: scalar float in range (0,1)
            The alpha parameter for the moving average.
        :param offset: optional
            The offset for the moving average, scalar. Defaults to data[0].
        :param dtype: optional
            Data type used for calculations. Defaults to float64 unless
            data.dtype is float32, then it will use float32.
        :param order: {'C', 'F', 'A'}, optional
            Order to use when flattening the data. Defaults to 'C'.
        :param out: ndarray, or None, optional
            A location into which the result is stored. If provided, it must have
            the same shape as the input. If not provided or `None`,
            a freshly-allocated array is returned.
        """
        data = np.asarray(data)

        if dtype is None:
            if data.dtype == np.float32:
                dtype = np.float32
            else:
                dtype = np.float64
        else:
            dtype = np.dtype(dtype)

        if data.ndim > 1:
            # flatten input
            data = data.reshape(-1, order=order)

        if out is None:
            out = np.empty_like(data, dtype=dtype)
        else:
            assert out.shape == data.shape
            assert out.dtype == dtype

        if data.size < 1:
            # empty input, return empty array
            return out

        if offset is None:
            offset = data[0]

        alpha = np.asarray(alpha).astype(dtype, copy=False)

        # scaling_factors -> 0 as len(data) gets large
        # this leads to divide-by-zeros below
        scaling_factors = np.power(1. - alpha, np.arange(data.size + 1, dtype=dtype),
                                   dtype=dtype)
        # create cumulative sum array
        np.multiply(data, (alpha * scaling_factors[-2]) / scaling_factors[:-1],
                    dtype=dtype, out=out)
        np.cumsum(out, dtype=dtype, out=out)

        # cumsums / scaling
        out /= scaling_factors[-2::-1]

        if offset != 0:
            offset = np.asarray(offset).astype(dtype, copy=False)
            # add offsets
            out += offset * scaling_factors[1:]

        return out

The 2D ewma function:

    def ewma_vectorized_2d(data, alpha, axis=None, offset=None, dtype=None, order='C', out=None):
        """
        Calculates the exponential moving average over a given axis.

        :param data: Input data, must be 1D or 2D array.
        :param alpha: scalar float in range (0,1)
            The alpha parameter for the moving average.
        :param axis: The axis to apply the moving average on.
            If axis==None, the data is flattened.
        :param offset: optional
            The offset for the moving average. Must be scalar or a
            vector with one element for each row of data. If set to None,
            defaults to the first value of each row.
        :param dtype: optional
            Data type used for calculations. Defaults to float64 unless
            data.dtype is float32, then it will use float32.
        :param order: {'C', 'F', 'A'}, optional
            Order to use when flattening the data. Ignored if axis is not None.
        :param out: ndarray, or None, optional
            A location into which the result is stored. If provided, it must have
            the same shape as the desired output. If not provided or `None`,
            a freshly-allocated array is returned.
        """
        data = np.asarray(data)

        assert data.ndim <= 2

        if dtype is None:
            if data.dtype == np.float32:
                dtype = np.float32
            else:
                dtype = np.float64
        else:
            dtype = np.dtype(dtype)

        if out is None:
            out = np.empty_like(data, dtype=dtype)
        else:
            assert out.shape == data.shape
            assert out.dtype == dtype

        if data.size < 1:
            # empty input, return empty array
            return out

        if axis is None or data.ndim < 2:
            # use 1D version
            if isinstance(offset, np.ndarray):
                offset = offset[0]
            return ewma_vectorized(data, alpha, offset, dtype=dtype, order=order, out=out)

        assert -data.ndim <= axis < data.ndim

        # create reshaped data views
        out_view = out
        if axis < 0:
            # map a negative axis onto its non-negative equivalent
            axis = data.ndim + int(axis)

        if axis == 0:
            # transpose data views so columns are treated as rows
            data = data.T
            out_view = out_view.T

        if offset is None:
            # use the first element of each row as the offset
            offset = np.copy(data[:, 0])
        elif np.size(offset) == 1:
            offset = np.reshape(offset, (1,))

        alpha = np.asarray(alpha).astype(dtype, copy=False)

        # calculate the moving average
        row_size = data.shape[1]
        row_n = data.shape[0]
        scaling_factors = np.power(1. - alpha, np.arange(row_size + 1, dtype=dtype),
                                   dtype=dtype)
        # create a scaled cumulative sum array
        np.multiply(
            data,
            np.multiply(alpha * scaling_factors[-2], np.ones((row_n, 1), dtype=dtype),
                        dtype=dtype)
            / scaling_factors[np.newaxis, :-1],
            dtype=dtype, out=out_view
        )
        np.cumsum(out_view, axis=1, dtype=dtype, out=out_view)
        out_view /= scaling_factors[np.newaxis, -2::-1]

        if not (np.size(offset) == 1 and offset == 0):
            offset = offset.astype(dtype, copy=False)
            # add the offsets to the scaled cumulative sums
            out_view += offset[:, np.newaxis] * scaling_factors[np.newaxis, 1:]

        return out

Usage:

    data_n = 100000000
    data = ((0.5*np.random.randn(data_n)+0.5) % 1) * 100

    span = 5000  # span >= 1
    alpha = 2/(span+1)  # for pandas` span parameter

    # com = 1000  # com >= 0
    # alpha = 1/(1+com)  # for pandas` center-of-mass parameter

    # halflife = 100  # halflife > 0
    # alpha = 1 - np.exp(np.log(0.5)/halflife)  # for pandas` half-life parameter

    result = ewma_vectorized_safe(data, alpha)

Just a tip

It is easy to calculate a "window size" (technically, exponential averages have infinite "windows") for a given alpha, depending on how much the data in that window contributes to the average. This is useful, for example, for choosing how much of the start of the result to treat as unreliable due to border effects.

    def window_size(alpha, sum_proportion):
        # Increases with increased sum_proportion and decreased alpha
        # solve (1-alpha)**window_size = (1-sum_proportion) for window_size
        return int(np.log(1-sum_proportion) / np.log(1-alpha))

    alpha = 0.02
    sum_proportion = .99  # window covers 99% of contribution to the moving average
    window = window_size(alpha, sum_proportion)  # = 227
    sum_proportion = .75  # window covers 75% of contribution to the moving average
    window = window_size(alpha, sum_proportion)  # = 68

The alpha = 2 / (window_size + 1.0) relation used in this thread (the 'span' option from pandas) is a very rough approximation of the inverse of the above function (with sum_proportion~=0.87). alpha = 1 - np.exp(np.log(1-sum_proportion)/window_size) is more accurate (the 'half-life' option from pandas equals this formula with sum_proportion=0.5).
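The 0.87 and 0.5 figures can be checked numerically. The sketch below assumes only the relation (1-alpha)**window = 1 - sum_proportion stated above; `covered_proportion` is a helper name introduced here for illustration.

```python
import numpy as np

def covered_proportion(alpha, window):
    """Share of the total weight carried by the most recent `window` samples."""
    return 1.0 - (1.0 - alpha) ** window

window = 1000
span_alpha = 2.0 / (window + 1.0)                     # pandas 'span' convention
halflife_alpha = 1.0 - np.exp(np.log(0.5) / window)   # pandas 'half-life' convention

span_share = covered_proportion(span_alpha, window)          # close to 1 - exp(-2)
halflife_share = covered_proportion(halflife_alpha, window)  # 0.5 up to rounding
```

For large windows the span convention tends towards 1 - exp(-2), which is roughly 0.865 and matches the "about 0.87" quoted above.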

In the following example, data represents a continuous noisy signal. cutoff_idx is the first position in result where at least 99% of the value depends on separate values in data (i.e. less than 1% depends on data[0]). The data up to cutoff_idx is excluded from the final results because it depends too heavily on the first value in data and may therefore skew the average.

    result = ewma_vectorized_safe(data, alpha)
    sum_proportion = .99
    cutoff_idx = window_size(alpha, sum_proportion)
    result = result[cutoff_idx:]

To illustrate the problem that the above solves, you can run this a few times. Notice the often-appearing false start of the red line, which is skipped after cutoff_idx:

    data_n = 100000
    data = np.random.rand(data_n) * 100
    window = 1000
    sum_proportion = .99
    alpha = 1 - np.exp(np.log(1-sum_proportion)/window)

    result = ewma_vectorized_safe(data, alpha)

    cutoff_idx = window_size(alpha, sum_proportion)
    x = np.arange(start=0, stop=result.size)

    import matplotlib.pyplot as plt
    plt.plot(x[:cutoff_idx+1], result[:cutoff_idx+1], '-r',
             x[cutoff_idx:], result[cutoff_idx:], '-b')
    plt.show()

Note that cutoff_idx==window because alpha was set with the inverse of the window_size() function, using the same sum_proportion. This is similar to how pandas applies ewm(span=window, min_periods=window).

Divakar · Answer

I think I have finally cracked it!

Here's a vectorized version of the numpy_ewma function from @RaduS's post, which is claimed to produce the correct results -

    def numpy_ewma_vectorized(data, window):

        alpha = 2 /(window + 1.0)
        alpha_rev = 1-alpha

        scale = 1/alpha_rev
        n = data.shape[0]

        r = np.arange(n)
        scale_arr = scale**r
        offset = data[0]*alpha_rev**(r+1)
        pw0 = alpha*alpha_rev**(n-1)

        mult = data*pw0*scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums*scale_arr[::-1]
        return out

Further boost

We can boost it further with some code re-use, like so -

    def numpy_ewma_vectorized_v2(data, window):

        alpha = 2 /(window + 1.0)
        alpha_rev = 1-alpha
        n = data.shape[0]

        pows = alpha_rev**(np.arange(n+1))

        scale_arr = 1/pows[:-1]
        offset = data[0]*pows[1:]
        pw0 = alpha*alpha_rev**(n-1)

        mult = data*pw0*scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums*scale_arr[::-1]
        return out

Runtime test

Let's time these two against the same loopy function for a big dataset.

    In [97]: data = np.random.randint(2,9,(5000))
        ...: window = 20
        ...:

    In [98]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized(data, window))
    Out[98]: True

    In [99]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized_v2(data, window))
    Out[99]: True

    In [100]: %timeit numpy_ewma(data, window)
    100 loops, best of 3: 6.03 ms per loop

    In [101]: %timeit numpy_ewma_vectorized(data, window)
    1000 loops, best of 3: 665 µs per loop

    In [102]: %timeit numpy_ewma_vectorized_v2(data, window)
    1000 loops, best of 3: 357 µs per loop

    In [103]: 6030/357.0
    Out[103]: 16.89075630252101

That is a speedup of around 17 times!

James · Answer

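Here is an implementation using NumPy that is equivalent to using df.ewm(alpha=alpha).mean(). After reading the documentation, it is just a few matrix operations. The trick is constructing the right matrices.

It is worth noting that because we are creating float matrices, you can quickly eat through your memory if the input array is too large.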

    import pandas as pd
    import numpy as np

    def ewma(x, alpha):
        '''
        Returns the exponentially weighted moving average of x.

        Parameters:
        -----------
        x : array-like
        alpha : float {0 <= alpha <= 1}

        Returns:
        --------
        ewma: numpy array
              the exponentially weighted moving average
        '''
        # Coerce x to an array
        x = np.array(x)
        n = x.size

        # Create an initial weight matrix of (1-alpha), and a matrix of powers
        # to raise the weights by
        w0 = np.ones(shape=(n,n)) * (1-alpha)
        p = np.vstack([np.arange(i,i-n,-1) for i in range(n)])

        # Create the weight matrix
        w = np.tril(w0**p,0)

        # Calculate the ewma
        return np.dot(w, x[::np.newaxis]) / w.sum(axis=1)

Let's test it:

    alpha = 0.55
    x = np.random.randint(0,30,15)
    df = pd.DataFrame(x, columns=['A'])
    df.ewm(alpha=alpha).mean()

    # returns:
    #             A
    # 0   13.000000
    # 1   22.655172
    # 2   20.443268
    # 3   12.159796
    # 4   14.871955
    # 5   15.497575
    # 6   20.743511
    # 7   20.884818
    # 8   24.250715
    # 9   18.610901
    # 10  17.174686
    # 11  16.528564
    # 12  17.337879
    # 13   7.801912
    # 14  12.310889

    ewma(x=x, alpha=alpha)

    # returns:
    # array([ 13.        ,  22.65517241,  20.44326778,  12.1597964 ,
    #         14.87195534,  15.4975749 ,  20.74351117,  20.88481763,
    #         24.25071484,  18.61090129,  17.17468551,  16.52856393,
    #         17.33787888,   7.80191235,  12.31088889])
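To get a feeling for the memory warning attached to this matrix approach, the size of a single dense n-by-n matrix can be estimated up front. This is a rough sketch: `dense_matrix_bytes` is a hypothetical helper, and the length of 60,000 is only an illustrative assumption.

```python
def dense_matrix_bytes(n, itemsize=8):
    """Bytes needed for one dense n-by-n matrix with `itemsize`-byte elements."""
    return n * n * itemsize

n = 60_000  # illustrative input length (assumption)
one_matrix_gb = dense_matrix_bytes(n) / 1e9  # 28.8
```

The ewma function holds several matrices of this shape at once (w0, p and w), so the real peak usage is a multiple of this figure.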

Alexander McFarlane · Answer

Fastest EWMA 23x `pandas`

The question strictly asks for a numpy solution; however, it seems that the OP was actually just after a pure numpy solution to speed up the runtime.

I solved a similar problem but instead looked towards numba.jit, which massively speeds up the compute time.

    In [24]: a = np.random.random(10**7)
        ...: df = pd.Series(a)

    In [25]: %timeit numpy_ewma(a, 10)               # /a/42915307/4013571
        ...: %timeit df.ewm(span=10).mean()          # pandas
        ...: %timeit numpy_ewma_vectorized_v2(a, 10) # best w/o numba: /a/42926270/4013571
        ...: %timeit _ewma(a, 10)                    # fastest accurate (below)
        ...: %timeit _ewma_infinite_hist(a, 10)      # fastest overall (below)
    4.14 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    991 ms ± 52.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    396 ms ± 8.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    181 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    39.6 ms ± 979 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Scaling down to smaller arrays of a = np.random.random(100) (results in the same order)

    41.6 µs ± 491 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    945 ms ± 12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    16 µs ± 93.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    1.66 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    1.14 µs ± 5.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

It is also worth pointing out that my functions below are aligned identically to pandas (see the examples in the docstrings), whereas a few of the answers here take various different approximations. For example,

    In [57]: print(pd.DataFrame([1,2,3]).ewm(span=2).mean().values.ravel())
        ...: print(numpy_ewma_vectorized_v2(np.array([1,2,3]), 2))
        ...: print(numpy_ewma(np.array([1,2,3]), 2))
    [1.         1.75       2.61538462]
    [1.         1.66666667 2.55555556]
    [1.         1.18181818 1.51239669]

Source code which I have documented for my own library

    import numpy as np
    from numba import jit
    from numba import float64
    from numba import int64


    @jit((float64[:], int64), nopython=True, nogil=True)
    def _ewma(arr_in, window):
        r"""Exponentially weighted moving average specified by a decay ``window``
        to provide better adjustments for small windows via:

            y[t] = (x[t] + (1-a)*x[t-1] + (1-a)^2*x[t-2] + ... + (1-a)^n*x[t-n]) /
                   (1 + (1-a) + (1-a)^2 + ... + (1-a)^n).

        Parameters
        ----------
        arr_in : np.ndarray, float64
            A single dimensional numpy array
        window : int64
            The decay window, or 'span'

        Returns
        -------
        np.ndarray
            The EWMA vector, same length / shape as ``arr_in``

        Examples
        --------
        >>> import pandas as pd
        >>> a = np.arange(5, dtype=float)
        >>> exp = pd.DataFrame(a).ewm(span=10, adjust=True).mean()
        >>> np.array_equal(_ewma(a, 10), exp.values.ravel())
        True
        """
        n = arr_in.shape[0]
        ewma = np.empty(n, dtype=float64)
        alpha = 2 / float(window + 1)
        w = 1
        ewma_old = arr_in[0]
        ewma[0] = ewma_old
        for i in range(1, n):
            w += (1-alpha)**i
            ewma_old = ewma_old*(1-alpha) + arr_in[i]
            ewma[i] = ewma_old / w
        return ewma


    @jit((float64[:], int64), nopython=True, nogil=True)
    def _ewma_infinite_hist(arr_in, window):
        r"""Exponentially weighted moving average specified by a decay ``window``
        assuming infinite history via the recursive form:

            (2) (i)  y[0] = x[0]; and
                (ii) y[t] = a*x[t] + (1-a)*y[t-1] for t>0.

        This method is less accurate than ``_ewma`` but much faster:

            In [1]: import numpy as np, bars
               ...: arr = np.random.random(100000)
               ...: %timeit bars._ewma(arr, 10)
               ...: %timeit bars._ewma_infinite_hist(arr, 10)
            3.74 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
            262 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

        Parameters
        ----------
        arr_in : np.ndarray, float64
            A single dimensional numpy array
        window : int64
            The decay window, or 'span'

        Returns
        -------
        np.ndarray
            The EWMA vector, same length / shape as ``arr_in``

        Examples
        --------
        >>> import pandas as pd
        >>> a = np.arange(5, dtype=float)
        >>> exp = pd.DataFrame(a).ewm(span=10, adjust=False).mean()
        >>> np.array_equal(_ewma_infinite_hist(a, 10), exp.values.ravel())
        True
        """
        n = arr_in.shape[0]
        ewma = np.empty(n, dtype=float64)
        alpha = 2 / float(window + 1)
        ewma[0] = arr_in[0]
        for i in range(1, n):
            ewma[i] = arr_in[i] * alpha + ewma[i-1] * (1 - alpha)
        return ewma

Divakar · Answer

Given alpha and windowSize, here's an approach to simulate the corresponding behavior in NumPy -

    def numpy_ewm_alpha(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.full(a.shape[0],np.nan)
        out[windowSize-1:] = np.convolve(a,wghts,'valid')
        return out

Sample runs for verification -

    In [54]: alpha = 0.55
        ...: windowSize = 20
        ...:

    In [55]: df = pd.DataFrame(np.random.randint(2,9,(100)))

    In [56]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))
        ...:
    Max. error : 5.10531254605e-07

    In [57]: alpha = 0.75
        ...: windowSize = 30
        ...:

    In [58]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))
    Max. error : 8.881784197e-16
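One plausible reading of the two error magnitudes in the runs above is that the convolution keeps only windowSize weights, while an exponential average in principle weights every earlier sample. The sketch below computes the share of the total weight that such a truncation drops; `truncated_weight_share` is an illustrative name and the interpretation is an assumption, not something stated in the answer.

```python
def truncated_weight_share(alpha, window_size):
    """Share of the total weight that lies beyond the first `window_size` terms.

    The weights (1-alpha)**i sum to 1/alpha over all i >= 0, and the terms
    from i = window_size onward sum to (1-alpha)**window_size / alpha.
    """
    return (1.0 - alpha) ** window_size

tail_a = truncated_weight_share(0.55, 20)  # roughly 1e-7
tail_b = truncated_weight_share(0.75, 30)  # roughly 1e-18
```

With inputs between 2 and 8, a dropped share of about 1e-7 is in line with a maximum error of a few 1e-7, and a share near 1e-18 is below float64 resolution, which leaves only rounding noise.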

Runtime test on a bigger dataset -

    In [61]: alpha = 0.55
        ...: windowSize = 20
        ...:

    In [62]: df = pd.DataFrame(np.random.randint(2,9,(10000)))

    In [63]: %timeit df.ewm(alpha = alpha, min_periods=windowSize).mean()
    1000 loops, best of 3: 851 µs per loop

    In [64]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    1000 loops, best of 3: 204 µs per loop

Further boost

For a further performance boost we could avoid the initialization with NaNs and instead use the array output by np.convolve, like so -

    def numpy_ewm_alpha_v2(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.convolve(a,wghts)
        out[:windowSize-1] = np.nan
        return out[:a.size]

Timings -

    In [117]: alpha = 0.55
         ...: windowSize = 20
         ...:

    In [118]: df = pd.DataFrame(np.random.randint(2,9,(10000)))

    In [119]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    1000 loops, best of 3: 204 µs per loop

    In [120]: %timeit numpy_ewm_alpha_v2(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    10000 loops, best of 3: 195 µs per loop

RaduS · Answer

Here is another solution I came up with in the meantime. It is about four times faster than the pandas solution.

    def numpy_ewma(data, window):
        returnArray = np.empty((data.shape[0]))
        returnArray.fill(np.nan)
        e = data[0]
        alpha = 2 / float(window + 1)
        for s in range(data.shape[0]):
            e = ((data[s]-e) *alpha ) + e
            returnArray[s] = e
        return returnArray

I used this formula as a starting point. I am sure that this can be improved even further, but it is at least a starting point.

Danny · Answer

@Divakar's answer seems to lead to overflow when dealing with

    numpy_ewma_vectorized(np.random.random(500000), 10)

What I have used is:

    def EMA(input, time_period=10):  # For time period = 10
        t_ = time_period - 1
        ema = np.zeros_like(input,dtype=float)
        multiplier = 2.0 / (time_period + 1)
        #multiplier = 1 - multiplier
        for i in range(len(input)):
            # Special Case
            if i > t_:
                ema[i] = (input[i] - ema[i-1]) * multiplier + ema[i-1]
            else:
                ema[i] = np.mean(input[:i+1])
        return ema

However, this is way slower than the pandas solution:

    # pandas.ewma is the old top-level function; it was removed in later pandas releases
    from pandas import ewma as pd_ema

    def EMA_fast(X, time_period = 10):
        out = pd_ema(X, span=time_period, min_periods=time_period)
        out[:time_period-1] = np.cumsum(X[:time_period-1]) / np.asarray(range(1,time_period))
        return out

Samuel Utomo · Answer

This answer may seem irrelevant. However, for those who also need to calculate the exponentially weighted variance (and standard deviation) with NumPy, the following solution will be useful:

    import numpy as np

    def ew(a, alpha, winSize):
        _alpha = 1 - alpha
        ws = _alpha ** np.arange(winSize)
        w_sum = ws.sum()
        ew_mean = np.convolve(a, ws)[winSize - 1] / w_sum
        bias = (w_sum ** 2) / ((w_sum ** 2) - (ws ** 2).sum())
        ew_var = (np.convolve((a - ew_mean) ** 2, ws)[winSize - 1] / w_sum) * bias
        ew_std = np.sqrt(ew_var)
        return (ew_mean, ew_var, ew_std)

Gabriel_F · Answer

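Thanks to @Divakar for the solution; it is really fast. However, it does suffer from the overflow problem pointed out by @Danny. The function does not return correct answers when the length exceeds 13835.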
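The length at which the cumulative-sum form breaks is not a constant; it depends on alpha, because the inverse powers of (1 - alpha) are what leave the float64 range. The sketch below estimates that length. `max_safe_length` is a hypothetical helper written for this illustration, and the span values are assumptions, since the answer does not say which decay produced the 13835 figure.

```python
import numpy as np

def max_safe_length(alpha, dtype=np.float64):
    """Approximate largest n for which (1 - alpha) ** -n is still finite in `dtype`."""
    return int(np.log(np.finfo(dtype).max) / -np.log(1.0 - alpha))

limit_span_10 = max_safe_length(2.0 / (10 + 1.0))  # a few thousand samples
limit_span_39 = max_safe_length(2.0 / (39 + 1.0))  # on the order of 13,800 samples
```

A span of 39 (alpha = 0.05) happens to give a limit close to the number quoted above, while a span of 10 already fails at a much shorter length, so a hard-coded threshold is only appropriate for one particular decay rate.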

The following is my solution, based on Divakar's solution and pandas.ewm().mean()

    import numpy as np
    import pandas as pd

    def numpy_ema(data, com=None, span=None, halflife=None, alpha=None):
        """Summary
        Calculate ema with automatically-generated alpha. Weight of past effect
        decreases as the length of window increasing.

        # these functions reproduce the pandas result when the flag adjust=False is set.
        References:
            https://stackoverflow.com/questions/42869495/numpy-version-of-exponential-weighted-moving-average-equivalent-to-pandas-ewm

        Args:
            data (TYPE): Description
            com (float, optional): Specify decay in terms of center of mass, alpha=1/(1+com), for com>=0
            span (float, optional): Specify decay in terms of span, alpha=2/(span+1), for span>=1
            halflife (float, optional): Specify decay in terms of half-life, alpha=1-exp(log(0.5)/halflife), for halflife>0
            alpha (float, optional): Specify smoothing factor alpha directly, 0<alpha<=1

        Returns:
            TYPE: Description

        Raises:
            ValueError: Description
        """
        n_input = sum(map(bool, [com, span, halflife, alpha]))
        if n_input != 1:
            raise ValueError(
                'com, span, halflife, and alpha are mutually exclusive')

        nrow = data.shape[0]
        if np.isnan(data).any() or (nrow > 13835) or (data.ndim == 2):
            # fall back to pandas for NaNs, long inputs and 2D data
            df = pd.DataFrame(data)
            df_ewm = df.ewm(com=com, span=span, halflife=halflife,
                            alpha=alpha, adjust=False)
            out = df_ewm.mean().values.squeeze()
        else:
            if com:
                alpha = 1 / (1 + com)
            elif span:
                alpha = 2 / (span + 1.0)
            elif halflife:
                alpha = 1 - np.exp(np.log(0.5) / halflife)

            alpha_rev = 1 - alpha
            pows = alpha_rev**(np.arange(nrow + 1))

            scale_arr = 1 / pows[:-1]
            offset = data[0] * pows[1:]
            pw0 = alpha * alpha_rev**(nrow - 1)

            mult = data * pw0 * scale_arr
            cumsums = np.cumsum(mult)
            out = offset + cumsums * scale_arr[::-1]
        return out

handy0815 · Answer

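Here is an implementation for 1D input arrays with infinite window size. As it uses large numbers, it works only with input arrays whose elements have an absolute value < 1e16 when using float32, but that should normally be the case.

The idea is to reshape the input array into slices of a limited length so that no overflow occurs, and then to do the ewm calculation for each slice separately.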

    import numpy as np

    def ewm(x, alpha):
        """
        Returns the exponentially weighted mean y of a numpy array x with scaling factor alpha
        y[0] = x[0]
        y[j] = (1. - alpha) * y[j-1] + alpha * x[j],  for j > 0

        x -- 1D numpy array
        alpha -- float
        """

        n = int(-100. / np.log(1.-alpha))  # Makes sure that the first and last elements in f are very big and very small (about 1e22 and 1e-22)
        f = np.exp(np.arange(1-n, n, 2) * (0.5 * np.log(1. - alpha)))  # Scaling factor for each slice
        tmp = (np.resize(x, ((len(x) + n - 1) // n, n)) / f * alpha).cumsum(axis=1) * f  # Get ewm for each slice of length n

        # Add the last value of each previous slice to the next slice with corresponding scaling factor f and return result
        return np.resize(tmp + np.tensordot(np.append(x[0], np.roll(tmp.T[n-1], 1)[1:]), f * ((1. - alpha) / f[0]), axes=0), len(x))
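The slicing idea can also be written in a more explicit, if slower, form with a Python loop over the slices. This is a sketch of the general technique rather than a rewrite of the function above: `ewma_chunked`, `ewma_loop` and the default `chunk=256` are assumptions made for illustration.

```python
import numpy as np

def ewma_chunked(x, alpha, chunk=256):
    """EWMA with y[0] = x[0], computed slice by slice to keep the scaling factors bounded.

    `chunk` must be small enough that (1 - alpha) ** -chunk stays finite.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    decay = (1.0 - alpha) ** np.arange(1, chunk + 1)  # (1-alpha)**1 ... (1-alpha)**chunk
    carry = x[0] if x.size else 0.0  # seeding with x[0] makes y[0] == x[0]
    for start in range(0, x.size, chunk):
        block = x[start:start + chunk]
        d = decay[:block.size]
        zero_offset = np.cumsum(alpha * block / d) * d  # EWMA of the slice started from 0
        out[start:start + block.size] = zero_offset + carry * d
        carry = out[start + block.size - 1]             # last value feeds the next slice
    return out

def ewma_loop(x, alpha):
    """Plain recursive reference, used only to check the chunked version."""
    y = np.empty(len(x), dtype=np.float64)
    y[0] = x[0]
    for j in range(1, len(x)):
        y[j] = (1.0 - alpha) * y[j - 1] + alpha * x[j]
    return y

rng = np.random.default_rng(0)
x = rng.random(5000)
max_diff = float(np.max(np.abs(ewma_chunked(x, 0.1) - ewma_loop(x, 0.1))))
```

Carrying the last output of one slice into the next keeps every power of (1 - alpha) within a single slice length, which is what avoids the overflow; the price is one Python-level iteration per slice.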

kosnik · Answer

Building on top of Divakar's great answer, here is an implementation which corresponds to the adjust=True flag of the pandas function, i.e. it uses the weights rather than the recursion.

    def numpy_ewma(data, window):
        alpha = 2 /(window + 1.0)
        scale = 1/(1-alpha)
        n = data.shape[0]
        scale_arr = (1-alpha)**(-1*np.arange(n))
        weights = (1-alpha)**np.arange(n)
        pw0 = (1-alpha)**(n-1)
        mult = data*pw0*scale_arr
        cumsums = mult.cumsum()
        out = cumsums*scale_arr[::-1] / weights.cumsum()

        return out
