Convert a sparse matrix (csc_matrix) to a pandas DataFrame

Question

I want to convert this csc_matrix into a pandas DataFrame.

In each entry, the first number in the parentheses is the index, the second number is the column, and the final number is the data.

I want to do this for feature selection in text analysis. The first number represents the document, the second is the word feature, and the last number is the TF-IDF score.

Having the data in a DataFrame would let me treat the text analysis problem as a data analysis problem.

Alexander · Accepted Answer

    import numpy as np
    import pandas as pd
    from scipy.sparse import csc_matrix

    csc = csc_matrix(np.array(
        [[0, 0, 4, 0, 0, 0],
         [1, 0, 0, 0, 2, 0],
         [2, 0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0, 1],
         [4, 0, 3, 2, 0, 0]]))

    # Return a coordinate (COO) representation of the compressed sparse column (CSC) matrix.
    coo = csc.tocoo(copy=False)

    # Access the `row`, `col` and `data` properties of the COO matrix.
    >>> pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data}
    ...              )[['index', 'col', 'data']].sort_values(['index', 'col']
    ...              ).reset_index(drop=True)
       index  col  data
    0      0    2     4
    1      1    0     1
    2      1    4     2
    3      2    0     2
    4      2    3     1
    5      3    5     1
    6      4    0     4
    7      4    2     3
    8      4    3     2
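For the feature-selection use case in the question, the same conversion could be wrapped in a small helper and followed by an ordinary pandas aggregation. This is only a sketch: the function name `sparse_to_long_frame`, the column names `doc`, `feature` and `tfidf`, and the optional `feature_names` vocabulary are assumptions made for illustration, not part of the original answer, and ranking features by their summed score is just one possible selection criterion.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix


def sparse_to_long_frame(matrix, feature_names=None):
    """Return one row per non-zero entry as (doc, feature, tfidf).

    `feature_names` is a hypothetical, optional vocabulary list used to
    replace column positions with readable labels.
    """
    coo = matrix.tocoo()
    frame = pd.DataFrame({"doc": coo.row, "feature": coo.col, "tfidf": coo.data})
    if feature_names is not None:
        frame["feature"] = np.asarray(feature_names)[coo.col]
    return frame.sort_values(["doc", "feature"]).reset_index(drop=True)


# Same example matrix as in the answer above.
csc = csc_matrix(np.array(
    [[0, 0, 4, 0, 0, 0],
     [1, 0, 0, 0, 2, 0],
     [2, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [4, 0, 3, 2, 0, 0]]))

long_df = sparse_to_long_frame(csc)

# One possible data-analysis step: rank features by their total score
# across all documents and keep the two highest.
feature_totals = long_df.groupby("feature")["tfidf"].sum().sort_values(ascending=False)
top_features = feature_totals.head(2).index.tolist()

# With a (made-up) vocabulary, the feature column holds labels instead of positions.
named_df = sparse_to_long_frame(csc, feature_names=["w0", "w1", "w2", "w3", "w4", "w5"])
```

Because the long format stores only non-zero entries, features that never occur (column 1 here) simply do not appear in `feature_totals`.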