ゼロが現れるまでリストの累積合計を計算します

Question

ゼロと1がランダムに表示される（長い）リストがあります。

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

List_bを取得したい

0が現れるまでのリストの合計
0が表示される場合、リスト内の0を保持します
```
list_b = [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3] 
```

これを次のように実装できます。

list_b = [] for i, x in enumerate(list_a): if x == 0: list_b.append(x) else: sum_value = 0 for j in list_a[i::-1]: if j != 0: sum_value += j else: break list_b.append(sum_value) print(list_b)

しかし、実際のリストの長さは非常に長いです。

それで、高速化のためにコードを改善したいです。（読めない場合）

このようにコードを変更します：

from itertools import takewhile list_c = [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)] print(list_c)

しかし、それは十分に高速ではありません。より効率的な方法でそれを行うにはどうすればよいですか？

cs95 · Answer

あなたはこれを考え直しています。

オプション1
現在の値が0であるかどうかに基づいて、インデックスを反復処理し、それに応じて更新（累積合計を計算）することができます。

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] for i in range(1, len(data)): if data[i]: data[i] += data[i - 1]

つまり、現在の要素がゼロ以外の場合、現在のインデックスの要素を、現在の値と前のインデックスの値の合計として更新します。

print(data) [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

これにより、リストが更新されます。 new_data = data.copy()が必要ない場合は、事前にコピーを作成して、同じ方法でnew_dataを反復処理できます。

オプション2
パフォーマンスが必要な場合は、pandas APIを使用できます。 0sの配置に基づいてグループを検索し、groupby + cumsumを使用して、上記と同様にグループごとの累積合計を計算します。

import pandas as pd s = pd.Series(data) data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()

print(data) [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

パフォーマンス

まず、セットアップ-

data = data * 100000 s = pd.Series(data)

次、

%%timeit new_data = data.copy() for i in range(1, len(data)): if new_data[i]: new_data[i] += new_data[i - 1] 328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

そして、コピーのタイミングを個別に調整し、

%timeit data.copy() 8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

そのため、コピーにはそれほど時間はかかりません。最後に、

%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist() 122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

pandasアプローチは概念的には線形（他のアプローチと同様）ですが、ライブラリの実装により一定の速度で高速化されています。

Chris_Rands · Answer

コンパクトなネイティブPythonおそらく最もメモリ効率が高いが、最速ではありません（コメントを参照）のソリューション）が必要な場合は、itertoolsから広範囲に描画できます。

_>>> from itertools import groupby, accumulate, chain >>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool))) [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3] _

ここでの手順は、リストを_0_（偽）の存在に基づいてサブリストにグループ化し、各サブリスト内の値の累積合計を取り、サブリストをフラット化します。

Stefan Pochmann コメントのように、リストの内容がバイナリの場合（_1_ sと_0_ sのみで構成される場合）、groupby()であり、恒等関数にフォールバックします。この場合、boolを使用するよりも〜30％高速です。

_>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a))) [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3] _

MSeifert · Answer

個人的に私はこのようなシンプルなジェネレーターを好むでしょう：

def gen(lst): cumulative = 0 for item in lst: if item: cumulative += item else: cumulative = 0 yield cumulative

魔法は何もありません（yieldの仕組みを知っているとき）、読みやすく、かなり高速です。

さらにパフォーマンスが必要な場合は、これをCython拡張タイプとしてラップすることもできます（ここではIPythonを使用しています）。これにより、「わかりやすい」部分が失われ、「重い依存関係」が必要になります。

%load_ext cython %%cython cdef class Cumulative(object): cdef object it cdef object cumulative def __init__(self, it): self.it = iter(it) self.cumulative = 0 def __iter__(self): return self def __next__(self): cdef object nxt = next(self.it) if nxt: self.cumulative += nxt else: self.cumulative = 0 return self.cumulative

両方を消費する必要があります。たとえば、listを使用して目的の出力を提供します。

>>> list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] >>> list(gen(list_a)) [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3] >>> list(Cumulative(list_a)) [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

しかし、あなたが速度について尋ねたので、私は私のタイミングからの結果を共有したかった：

import pandas as pd import numpy as np import random import pandas as pd from itertools import takewhile from itertools import groupby, accumulate, chain def MSeifert(lst): return list(MSeifert_inner(lst)) def MSeifert_inner(lst): cumulative = 0 for item in lst: if item: cumulative += item else: cumulative = 0 yield cumulative def MSeifert2(lst): return list(Cumulative(lst)) def original1(list_a): list_b = [] for i, x in enumerate(list_a): if x == 0: list_b.append(x) else: sum_value = 0 for j in list_a[i::-1]: if j != 0: sum_value += j else: break list_b.append(sum_value) def original2(list_a): return [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)] def Coldspeed1(data): data = data.copy() for i in range(1, len(data)): if data[i]: data[i] += data[i - 1] return data def Coldspeed2(data): s = pd.Series(data) return s.groupby(s.eq(0).cumsum()).cumsum().tolist() def Chris_Rands(list_a): return list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool))) def EvKounis(list_a): cum_sum = 0 list_b = [] for item in list_a: if not item: # if our item is 0 cum_sum = 0 # the cumulative sum is reset (set back to 0) else: cum_sum += item # otherwise it sums further list_b.append(cum_sum) # and no matter what it gets appended to the result def schumich(list_a): list_b = [] s = 0 for a in list_a: s = a+s if a !=0 else 0 list_b.append(s) return list_b def jbch(seq): return list(jbch_inner(seq)) def jbch_inner(seq): s = 0 for n in seq: s = 0 if n == 0 else s + n yield s # Timing setup timings = {MSeifert: [], MSeifert2: [], original1: [], original2: [], Coldspeed1: [], Coldspeed2: [], Chris_Rands: [], EvKounis: [], schumich: [], jbch: []} sizes = [2**i for i in range(1, 20, 2)] # Timing for size in sizes: print(size) func_input = [int(random.random() < 0.75) for _ in range(size)] for func in timings: if size > 10000 and (func is original1 or func is original2): continue res = %timeit -o func(func_input) # if you use IPython, otherwise use the "timeit" module timings[func].append(res) %matplotlib notebook import matplotlib.pyplot as plt import numpy as np fig = plt.figure(1) ax = plt.subplot(111) baseline = MSeifert2 # choose one function as baseline for func in timings: ax.plot(sizes[:len(timings[func])], [time.best / ref.best for time, ref in Zip(timings[func], timings[baseline])], label=func.__name__) # you could also use "func.__name__" here instead ax.set_ylim(0.8, 1e4) ax.set_yscale('log') ax.set_xscale('log') ax.set_xlabel('size') ax.set_ylabel('time relative to {}'.format(baseline)) # you could also use "func.__name__" here instead ax.grid(which='both') ax.legend() plt.tight_layout()

正確な結果に興味がある場合は、 this Gist に入れます。

これは、対数プロットであり、Cythonの回答に関連しています。要するに：低いほど速くなり、2つの大きな目盛りの間の範囲は1桁を表します。

したがって、あなたが持っていた解決策を除いて、すべての解決策は1桁以内（少なくともリストが大きい場合）になる傾向があります。不思議なことに、pandasソリューションは、純粋なPythonアプローチと比較して非常に遅い。ただし、Cythonソリューションは、他のすべてのアプローチよりも2倍高い。

Ev. Kounis · Answer

本当に必要のないときに、投稿したコードでインデックスを使いすぎています。 累積合計を追跡し、0会うたびに0。

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] cum_sum = 0 list_b = [] for item in list_a: if not item: # if our item is 0 cum_sum = 0 # the cumulative sum is reset (set back to 0) else: cum_sum += item # otherwise it sums further list_b.append(cum_sum) # and no matter what it gets appended to the result print(list_b) # -> [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

Shubham Mishra · Answer

質問された質問のように複雑である必要はありません。非常に単純なアプローチはこれです。

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] list_b = [] s = 0 for a in list_a: s = a+s if a !=0 else 0 list_b.append(s) print list_b

jbch · Answer

パフォーマンスが必要な場合はジェネレーターを使用します（それも簡単です）。

def weird_cumulative_sum(seq): s = 0 for n in seq: s = 0 if n == 0 else s + n yield s list_b = list(weird_cumulative_sum(list_a_))

私はあなたがそれより良くなるとは思わない、どんな場合でもあなたは少なくとも一度list_aを反復しなければならないだろう。

結果のlist（）を呼び出してコードのようにリストを取得しますが、list_bを使用するコードがforループまたは結果をリストに1回だけ反復する場合、結果をリストに変換する必要はないことに注意してくださいジェネレーター。

Xavier Guihot · Answer

Python 3.8の開始、および代入式（PEP 572）（:=演算子）の導入により、リスト内包内で変数を使用およびインクリメントできます。

# items = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] total = 0 [total := (total + x if x else x) for x in items] # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

この：

変数totalを、実行中の合計を表す0に初期化します
各アイテムについて、これは両方：
- 代入式を介して現在のループされたアイテム（total := total + x）でtotalをインクリメントするか、アイテムが0の場合は0に戻す
- 同時に、xをtotalの新しい値にマップします