Python: import a CSV into a list

Question

I have a CSV file with about 2000 records.

Each record has a string and a category for it:

    This is the first line,Line1
    This is the second line,Line2
    This is the third line,Line3

I need to read this file into a list that looks like this:

    List = [('This is the first line', 'Line1'),
            ('This is the second line', 'Line2'),
            ('This is the third line', 'Line3')]

How can I import this CSV into the list I need using Python?

Maciej Gol · Answer

Use the csv module (Python 2.x):

    import csv

    with open('file.csv', 'rb') as f:
        reader = csv.reader(f)
        your_list = list(reader)

    print your_list
    # [['This is the first line', 'Line1'],
    #  ['This is the second line', 'Line2'],
    #  ['This is the third line', 'Line3']]

If you need tuples:

    import csv

    with open('test.csv', 'rb') as f:
        reader = csv.reader(f)
        your_list = map(tuple, reader)

    print your_list
    # [('This is the first line', ' Line1'),
    #  ('This is the second line', ' Line2'),
    #  ('This is the third line', ' Line3')]

Python 3.x version (by @seokhoonlee, below):

    import csv

    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        your_list = list(reader)

    print(your_list)
    # [['This is the first line', 'Line1'],
    #  ['This is the second line', 'Line2'],
    #  ['This is the third line', 'Line3']]

seokhoonlee · Answer

Updated for Python 3:

    import csv

    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        your_list = list(reader)

    print(your_list)
    # [['This is the first line', 'Line1'],
    #  ['This is the second line', 'Line2'],
    #  ['This is the third line', 'Line3']]

Martin Thoma · Answer

Pandas is pretty good at dealing with data. Here is one example of how to use it:

    import pandas as pd

    # Read the CSV into a pandas data frame (df)
    #   With a df you can do many things
    #   most important: visualize data with Seaborn
    df = pd.read_csv('filename.csv', delimiter=',')

    # Or export it in many ways, e.g. a list of tuples
    tuples = [tuple(x) for x in df.values]

    # or export it as a list of dicts
    dicts = df.to_dict().values()

One big advantage is that pandas deals with header rows automatically.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?

Pandas #2

    import pandas as pd

    # Get data - reading the CSV file
    import mpu.pd
    df = mpu.pd.example_df()

    # Convert
    dicts = df.to_dict('records')

The content of df is:

         country   population population_time    EUR
    0    Germany   82521653.0      2016-12-01   True
    1     France   66991000.0      2017-01-01   True
    2  Indonesia  255461700.0      2017-01-01  False
    3    Ireland    4761865.0             NaT   True
    4      Spain   46549045.0      2017-06-01   True
    5    Vatican          NaN             NaT   True

The content of dicts is:

    [{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
     {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
     {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
     {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
     {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
     {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

Pandas #3

    import pandas as pd

    # Get data - reading the CSV file
    import mpu.pd
    df = mpu.pd.example_df()

    # Convert (each row becomes a list, despite the variable name)
    tuples = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of tuples is:

    [['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
     ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
     ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
     ['Ireland', 4761865.0, NaT, True],
     ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
     ['Vatican', nan, NaT, True]]

Algebra · Answer

Update for Python 3:

    import csv
    from pprint import pprint

    with open('text.csv', newline='') as file:
        reader = csv.reader(file)
        l = list(map(tuple, reader))

    pprint(l)
    # [('This is the first line', ' Line1'),
    #  ('This is the second line', ' Line2'),
    #  ('This is the third line', ' Line3')]

If csvfile is a file object, it should be opened with newline=''.
(from the csv module documentation)
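As a rough illustration of why newline='' matters, here is a minimal sketch that writes a record whose quoted field contains a line break and reads it back unchanged. The file name and the sample rows are made up for the example. It also shows skipinitialspace=True, a csv.reader option that drops the space after each comma (the space that shows up as ' Line1' in the output above).

```python
import csv
import io
import os
import tempfile

# Hypothetical sample data: the second record has a field with an embedded newline.
rows_in = [("This is the first line", "Line1"),
           ("This spans\ntwo lines", "Line2")]

# Hypothetical file location, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# newline='' lets the csv module control line endings itself.
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows_in)

with open(path, newline="") as f:
    rows_out = [tuple(row) for row in csv.reader(f)]

print(rows_out == rows_in)

# skipinitialspace=True ignores whitespace right after the delimiter.
sample = io.StringIO("This is the first line, Line1\n")
stripped = next(csv.reader(sample, skipinitialspace=True))
print(stripped)
```

Whether you want skipinitialspace depends on the data; if leading spaces are meaningful in your fields, leave it off.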

Miquel · Answer

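If you are sure there are no commas in your input other than the one separating the category, you can read the file line by line, split on the comma, and push the result to a list.

That said, it looks like you are dealing with a CSV file, so you might consider using the modules for it.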
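To make the caveat about commas concrete, here is a small sketch comparing a plain split with the csv module on a single line whose text field itself contains a comma. The sample line is invented for illustration.

```python
import csv
import io

# Hypothetical record: the text field is quoted because it contains a comma.
line = '"Hello, world",Line1'

# Plain split breaks the quoted field apart and keeps the quote characters.
naive = tuple(line.split(","))

# csv.reader understands the quoting and returns two fields.
parsed = tuple(next(csv.reader(io.StringIO(line))))

print(naive)   # ('"Hello', ' world"', 'Line1')
print(parsed)  # ('Hello, world', 'Line1')
```

If the file is known to contain exactly one comma per line, both approaches give the same result.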

Acid_Snake · Answer

    # `text` is assumed to already hold the file contents as one string
    result = []
    for line in text.splitlines():
        result.append(tuple(line.split(",")))

Francesco Boi · Answer

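As already said in the comments, you can use the csv library in Python. CSV stands for comma-separated values, which seems to be exactly your case: a label and a value separated by a comma.

Since this is a category-and-value kind of data, I would rather use a dictionary than a list of tuples.

Anyway, in the code below I show both ways: d is the dictionary and l is the list of tuples.

    import csv

    file_name = "test.txt"
    d = dict()
    l = list()
    try:
        # Read inside the try block so nothing runs on a missing file
        with open(file_name, 'rt', newline='') as csvfile:
            csvReader = csv.reader(csvfile, delimiter=",")
            for row in csvReader:
                d[row[1]] = row[0]            # category -> text
                l.append((row[0], row[1]))    # (text, category)
    except FileNotFoundError:
        print("File not found")
    print(d)
    print(l)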

Jason Boucher · Answer

Here is the easiest way in Python 3.x to import a CSV into a multidimensional array, and it's only 4 lines of code without importing anything!

    # pull a CSV into a multidimensional array in 4 lines!
    L = []                          # Create an empty list for the main array
    for line in open('log.txt'):    # Open the file and read all the lines
        x = line.rstrip()           # Strip the \n from each line
        L.append(x.split(','))      # Split each line into a list and add it to the
                                    # multidimensional array
    print(L)

Jan Vlcinsky · Answer

If you extend your requirements a bit, do not care about the order of the lines, and want them grouped by category, the following solution may work for you:

    >>> fname = "lines.txt"
    >>> from collections import defaultdict
    >>> dct = defaultdict(list)
    >>> with open(fname) as f:
    ...     for line in f:
    ...         text, cat = line.rstrip("\n").split(",", 1)
    ...         dct[cat].append(text)
    ...
    >>> dct
    defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})

This way, all the relevant lines are available in the dictionary under the key for their category.

Hunter McMillen · Answer

A simple loop would suffice:

    lines = []
    with open('test.txt', 'r') as f:
        for line in f.readlines():
            l, name = line.strip().split(',')
            lines.append((l, name))

    print(lines)

Alexey Antonenko · Answer

Next is a piece of code that uses the csv module but extracts the contents of file.csv into a list of dicts, using the first line as the header of the CSV table:

    import csv

    def csv2dicts(filename):
        # Python 2: the csv module expects the file opened in binary mode
        with open(filename, 'rb') as f:
            reader = csv.reader(f)
            lines = list(reader)
            if len(lines) < 2: return None    # need a header and at least one row
            names = lines[0]
            if len(names) < 1: return None
            dicts = []
            for values in lines[1:]:
                if len(values) != len(names): return None    # malformed row
                d = {}
                for i, _ in enumerate(names):
                    d[names[i]] = values[i]
                dicts.append(d)
            return dicts
        return None

    if __name__ == '__main__':
        your_list = csv2dicts('file.csv')
        print your_list
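For comparison, the standard library's csv.DictReader also uses the first row as the keys. The following is a minimal Python 3 sketch, not a drop-in replacement for the function above: the helper name csv_to_dicts and the sample columns are made up, and the in-memory io.StringIO stands in for a real file. Unlike csv2dicts, DictReader does not return None on rows of the wrong length; by default it fills missing values with None and collects extra values under the key None.

```python
import csv
import io

# Hypothetical sample with a header row; column names are invented for the example.
sample = io.StringIO(
    "text,category\n"
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
)

def csv_to_dicts(f):
    """Return one plain dict per data row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(f)]

dicts = csv_to_dicts(sample)
print(dicts)
```

With a real file, pass an object opened as open('file.csv', newline='') instead of the StringIO.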