pandas python: how to count the number of records or rows in a dataframe

Question

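Clearly new to pandas. How can I simply count the number of records in a dataframe?

I would have thought something as simple as this would do it, and I can't seem to even find the answer in searches... probably because it is too simple.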

    cnt = df.count
    print(cnt)

The code above actually just prints the whole df.
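A note on why this happens: `df.count` without parentheses refers to the method object rather than calling it, and the printed form of a bound method includes the representation of the object it belongs to, hence the whole frame. A minimal sketch, assuming a made-up three-row frame (the column names and values are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

method = df.count        # no parentheses: the bound method itself, not a result
print(callable(method))  # True

per_column = df.count()  # calling it returns the non-NA count of each column
n_rows = len(df)         # number of rows
print(per_column)
print(n_rows)
```

Adding the parentheses is enough to get numbers back, although, as the answers explain, `count()` reports non-NA values per column rather than a single row count.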

tshauck · Accepted Answer

Regarding your question... counting one field? I decided to treat that as the question, and I hope it helps...

Say I have the following DataFrame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])

You can count a single column with

    df.A.count()
    # or
    df['A'].count()

Both evaluate to 5.

The cool thing (or one of many w.r.t. pandas) is that if you have NA values, count takes that into consideration.

So if I did

    df.loc[1::2, 'A'] = np.nan  # .loc avoids chained assignment
    df.count()

The result would be

    A    3
    B    5
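To make the difference between counting values and counting rows concrete, here is a small sketch that contrasts `count()` with `len()`. It uses fixed values instead of random ones so the numbers are reproducible, and `isna()` is brought in only for illustration; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.1, 0.2, 0.3, 0.4, 0.5],
                   "B": [1.0, 2.0, 3.0, 4.0, 5.0]})
df.loc[1::2, "A"] = np.nan   # blank out rows 1 and 3 of column A

non_na = df.count()          # non-NA values per column
missing = df.isna().sum()    # NA values per column
n_rows = len(df)             # rows, regardless of NA

# For every column, non-NA plus NA adds up to the row count.
print((non_na + missing == n_rows).all())
```

So `count()` answers "how many usable values does each column hold", while `len(df)` answers "how many rows are there".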

user2314737 · Answer

To get the number of rows in a dataframe, use:

    df.shape[0]

(and df.shape[1] to get the number of columns).

As an alternative you can use

    len(df)

or

    len(df.index)

(and len(df.columns) for the columns).
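As a quick check that these spellings agree, the sketch below builds the same 8x3 frame used later in this answer and compares them. The variable names `n_rows` and `n_cols` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple

# All three row counts agree, as do the two column counts.
assert n_rows == len(df) == len(df.index)
assert n_cols == len(df.columns)
print(n_rows, n_cols)       # 8 3
```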

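shape is more versatile and more convenient than len(), especially for interactive work (it just needs to be added at the end), but len is a bit faster (see also this answer).

To avoid: count(), because it returns the number of non-NA/null observations over the requested axis, not the number of rows.

len(df.index) is faster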

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=['A', 'B', 'C'])
    df['A'] = df['A'].astype(float)  # NaN needs a float column
    df.loc[5, 'A'] = np.nan          # .loc avoids chained assignment
    df
    # Out:
    #       A   B   C
    # 0   0.0   1   2
    # 1   3.0   4   5
    # 2   6.0   7   8
    # 3   9.0  10  11
    # 4  12.0  13  14
    # 5   NaN  16  17
    # 6  18.0  19  20
    # 7  21.0  22  23

    %timeit df.shape[0]
    # 100000 loops, best of 3: 4.22 µs per loop

    %timeit len(df)
    # 100000 loops, best of 3: 2.26 µs per loop

    %timeit len(df.index)
    # 1000000 loops, best of 3: 1.46 µs per loop
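The `%timeit` magic only works inside IPython or Jupyter. Outside of those, a comparable measurement can be sketched with the standard library's `timeit` module. The repetition count is an arbitrary choice for this sketch, the lambdas add a small constant overhead to every candidate, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

candidates = {
    "df.shape[0]": lambda: df.shape[0],
    "len(df)": lambda: len(df),
    "len(df.index)": lambda: len(df.index),
}

n_calls = 100_000  # arbitrary repetition count chosen for this sketch
timings = {}
for label, func in candidates.items():
    total = timeit.timeit(func, number=n_calls)   # total seconds for n_calls
    timings[label] = total / n_calls * 1e6         # microseconds per call
    print(f"{label:15s} {timings[label]:.2f} µs per call")
```

All three are in the microsecond range, so the difference rarely matters in practice.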

df.__len__ is just a call to len(df.index)

    import inspect
    print(inspect.getsource(pd.DataFrame.__len__))
    # Out:
    # def __len__(self):
    #     """Returns length of info axis, but here we use the index """
    #     return len(self.index)

Why you should not use count()

    df.count()
    # Out:
    # A    7
    # B    8
    # C    8

Surya · Answer

Simply, row_num = df.shape[0] gives the number of rows. Here is an example:

    import pandas as pd
    import numpy as np

    In [322]: df = pd.DataFrame(np.random.randn(5,2), columns=["col_1", "col_2"])

    In [323]: df
    Out[323]:
          col_1     col_2
    0 -0.894268  1.309041
    1 -0.120667 -0.241292
    2  0.076168 -1.071099
    3  1.387217  0.622877
    4 -0.488452  0.317882

    In [324]: df.shape
    Out[324]: (5, 2)

    In [325]: df.shape[0]   ## Gives no. of rows/records
    Out[325]: 5

    In [326]: df.shape[1]   ## Gives no. of columns
    Out[326]: 2

ekta · Answer

The NaN examples above miss one piece, which makes them less generic. To do this more "generically", use df['column_name'].value_counts(). This will give you the count of each value in that column.

    d = ['A', 'A', 'A', 'B', 'C', 'C', " ", " ", " ", " ", " ", "-1"]  # for simplicity
    df = pd.DataFrame(d)
    df.columns = ["col1"]
    df["col1"].value_counts()
    # Out:
    #       5
    # A     3
    # C     2
    # -1    1
    # B     1
    # dtype: int64

    # len(df) gives 12. The 5 entries with a blank label are the " " strings,
    # i.e. missing values of some form. value_counts() also gives a peek at
    # other invalid entries you might want to ignore, such as -1, 0 or "".
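One caveat worth sketching: by default `value_counts()` leaves real NaN values out, so its total can fall short of `len(df)`. Passing `dropna=False` gives NaN its own row. The column below is a made-up example mixing ordinary values, a blank string, a sentinel, and NaN; it is not the data from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical column, for illustration only.
df = pd.DataFrame({"col1": ["A", "A", "B", " ", "-1", np.nan, np.nan]})

default_counts = df["col1"].value_counts()          # NaN excluded
all_counts = df["col1"].value_counts(dropna=False)  # NaN gets its own row

print(default_counts.sum())  # 5
print(all_counts.sum())      # 7
print(len(df))               # 7
```

The gap between the default total and `len(df)` is the number of NaN entries, while blank strings and sentinels such as "-1" are counted as ordinary values.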