Reading an Excel file in Python

Question

I have an Excel file:

    Arm_id  DSPName  DSPCode  HubCode  PinCode  PPTL
    1       JaVAS    01       AGR      282001   1,2
    2       JaVAS    01       AGR      282002   3,4
    3       JaVAS    01       AGR      282003   5,6

I want to save a string in the form Arm_id,DSPCode,Pincode. This format is configurable, i.e. it might change to DSPCode,Arm_id,Pincode. I keep the format in a list like this:

FORMAT = ['Arm_id', 'DSPName', 'Pincode']

Given that FORMAT is configurable, how do I read the content of a specific column by its name?

This is what I have tried. Currently I am able to read all the content in the file:

    from xlrd import open_workbook

    wb = open_workbook('sample.xls')
    for s in wb.sheets():
        #print 'Sheet:',s.name
        values = []
        for row in range(s.nrows):
            col_value = []
            for col in range(s.ncols):
                value = (s.cell(row,col).value)
                try : value = str(int(value))
                except : pass
                col_value.append(value)
            values.append(col_value)
    print(values)

My output is:

[[u'Arm_id', u'DSPName', u'DSPCode', u'HubCode', u'PinCode', u'PPTL'], ['1', u'JaVAS', '1', u'AGR', '282001', u'1,2'], ['2', u'JaVAS', '1', u'AGR', '282002', u'3,4'], ['3', u'JaVAS', '1', u'AGR', '282003', u'5,6']]

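Then I loop over values[0], looking for the FORMAT entries in it to get the indexes of Arm_id, DSPName and Pincode, so that in the next loop I know the index of every FORMAT element.

But this is a very poor solution.

How do I get the values of a specific column, by name, from an Excel file?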

tamasgal · Accepted Answer

Here is one approach:

    from xlrd import open_workbook

    class Arm(object):
        def __init__(self, id, dsp_name, dsp_code, hub_code, pin_code, pptl):
            self.id = id
            self.dsp_name = dsp_name
            self.dsp_code = dsp_code
            self.hub_code = hub_code
            self.pin_code = pin_code
            self.pptl = pptl

        def __str__(self):
            return("Arm object:\n"
                   "  Arm_id = {0}\n"
                   "  DSPName = {1}\n"
                   "  DSPCode = {2}\n"
                   "  HubCode = {3}\n"
                   "  PinCode = {4}\n"
                   "  PPTL = {5}"
                   .format(self.id, self.dsp_name, self.dsp_code,
                           self.hub_code, self.pin_code, self.pptl))

    wb = open_workbook('sample.xls')
    for sheet in wb.sheets():
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols

        items = []

        # Start at row 1 to skip the header row
        for row in range(1, number_of_rows):
            values = []
            for col in range(number_of_columns):
                value = (sheet.cell(row,col).value)
                try:
                    # Numeric cells become integer strings; other cells are kept as-is
                    value = str(int(value))
                except ValueError:
                    pass
                finally:
                    values.append(value)
            item = Arm(*values)
            items.append(item)

    for item in items:
        print(item)
        print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))
        print()

You don't have to use a custom class; you can simply use a dict(). If you do use a class, however, you can access all the values via dot notation, as shown above.

Here is the output of the script above:

    Arm object:
      Arm_id = 1
      DSPName = JaVAS
      DSPCode = 1
      HubCode = AGR
      PinCode = 282001
      PPTL = 1
    Accessing one single value (eg. DSPName): JaVAS

    Arm object:
      Arm_id = 2
      DSPName = JaVAS
      DSPCode = 1
      HubCode = AGR
      PinCode = 282002
      PPTL = 3
    Accessing one single value (eg. DSPName): JaVAS

    Arm object:
      Arm_id = 3
      DSPName = JaVAS
      DSPCode = 1
      HubCode = AGR
      PinCode = 282003
      PPTL = 5
    Accessing one single value (eg. DSPName): JaVAS
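
The dict() alternative mentioned above might look like the following minimal sketch. To keep it runnable without a workbook, it works on a plain list of rows shaped like the `values` list printed in the question. The helper name `rows_to_dicts` is hypothetical and not part of xlrd.

```python
def rows_to_dicts(rows):
    """Turn [header, row1, row2, ...] into a list of dicts keyed by column name."""
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Sample data copied from the question's output (header row first).
rows = [
    ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL'],
    ['1', 'JaVAS', '1', 'AGR', '282001', '1,2'],
    ['2', 'JaVAS', '1', 'AGR', '282002', '3,4'],
    ['3', 'JaVAS', '1', 'AGR', '282003', '5,6'],
]

FORMAT = ['Arm_id', 'DSPName', 'PinCode']  # spelled as in the sheet header

records = rows_to_dicts(rows)
for record in records:
    # Pick the configured columns, in the configured order.
    print(','.join(record[name] for name in FORMAT))
```

Reordering FORMAT changes the order of the printed columns without touching the rest of the code, which is the configurability the question asks for.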

sheinis · Answer

A somewhat late answer, but with pandas it is possible to get a column of an Excel file directly:

    import pandas
    import xlrd  # pandas relies on xlrd to read .xls files

    df = pandas.read_excel('sample.xls')

    # print the column names
    print(df.columns)

    # get the values for a given column
    values = df['Arm_id'].values

    # get a data frame with selected columns
    # ('PinCode' is spelled as in the sheet header; 'Pincode' would raise a KeyError)
    FORMAT = ['Arm_id', 'DSPName', 'PinCode']
    df_selected = df[FORMAT]

Noel Evans · Answer

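So the key parts are to grab the header (col_names = s.row(0)) and, when iterating through the rows, to skip the first row, which isn't needed (for row in range(1, s.nrows)). This is done by using a range from 1 onwards rather than the implicit 0. You then use zip to step through each row, holding 'name' as the header of the column.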

    from xlrd import open_workbook

    wb = open_workbook('Book2.xls')
    values = []
    for s in wb.sheets():
        #print 'Sheet:',s.name
        col_names = s.row(0)  # header cells
        for row in range(1, s.nrows):  # skip the header row
            col_value = []
            for name, col in zip(col_names, range(s.ncols)):
                value = (s.cell(row,col).value)
                try:
                    value = str(int(value))
                except ValueError:
                    pass
                # Pair each value with its column name
                col_value.append((name.value, value))
            values.append(col_value)
    print(values)

Mahabubuzzaman · Answer

With pandas it is easy to read an Excel file.

    import pandas as pd
    import xlrd as xl
    from pandas import ExcelWriter
    from pandas import ExcelFile

    DataF = pd.read_excel("Test.xlsx", sheet_name='Sheet1')
    print("Column headings:")
    print(DataF.columns)

Tested at: https://repl.it Reference: https://pythonspot.com/read-Excel-with-pandas/

poida · Answer

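The approach I took reads the header information from the first row to determine the indexes of the columns of interest.

You mentioned in the question that you also want the values output to a string. I dynamically build a format string for the output from the FORMAT column list. Rows are appended to the values string, separated by a newline character.

The output column order is determined by the order of the column names in the FORMAT list.

In my code below, the case of the column names in the FORMAT list matters. In the question above you have 'Pincode' in your FORMAT list, but 'PinCode' in your Excel file. That would not work below; it needs to be 'PinCode'.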

    from xlrd import open_workbook

    wb = open_workbook('sample.xls')

    FORMAT = ['Arm_id', 'DSPName', 'PinCode']
    values = ""

    for s in wb.sheets():
        headerRow = s.row(0)
        # Index of each FORMAT column in the header, in FORMAT order
        columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]
        # One "%s" per selected column, comma-separated, ending with a newline
        formatString = ("%s,"*len(columnIndex))[0:-1] + "\n"

        for row in range(1,s.nrows):
            currentRow = s.row(row)
            currentRowValues = [currentRow[x].value for x in columnIndex]
            values += formatString % tuple(currentRowValues)

    print(values)

For the sample input above, this code outputs:

    >>>
    1.0,JaVAS,282001.0
    2.0,JaVAS,282002.0
    3.0,JaVAS,282003.0

And I'm a Python noob, so credit goes to: this answer, this answer, this question, this question and this answer.
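
Two details of the approach above are easy to trip over: header names must match exactly, and xlrd returns numeric cells as floats, which is why the output shows 1.0 and 282001.0. The sketch below is one possible way to handle both, using plain Python lists rather than a real sheet. `column_indexes` and `format_cell` are hypothetical helpers, and matching names case-insensitively is an assumption about what is wanted, not something the question asked for.

```python
def column_indexes(header, wanted):
    """Return the index of each wanted column, matching names case-insensitively."""
    lookup = {name.lower(): i for i, name in enumerate(header)}
    return [lookup[name.lower()] for name in wanted]  # KeyError if a column is missing


def format_cell(value):
    """Render whole-number floats (e.g. 282001.0) without the trailing '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


header = ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL']
# Numeric cells as floats, as in the output shown above.
data = [
    [1.0, 'JaVAS', 1.0, 'AGR', 282001.0, '1,2'],
    [2.0, 'JaVAS', 1.0, 'AGR', 282002.0, '3,4'],
]

FORMAT = ['Arm_id', 'DSPName', 'Pincode']  # note the question's lower-case 'c'
indexes = column_indexes(header, FORMAT)
lines = [','.join(format_cell(row[i]) for i in indexes) for row in data]
print('\n'.join(lines))
```

If two header names differ only by case, the later one wins in `lookup`, so exact matching remains the safer choice for such sheets.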

harsha vardhan · Answer

Here is code that reads an Excel file and prints all the cells in column 1 (apart from the first cell, i.e. the header):

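    import xlrd

    # Raw string so the backslashes in the Windows path are not treated as escapes
    file_location = r"C:\pythonprog\xxx.xlsv"
    workbook = xlrd.open_workbook(file_location)
    sheet = workbook.sheet_by_index(0)

    print(sheet.cell_value(0, 0))  # header cell of the first column
    for row in range(1, sheet.nrows):  # start at 1 to skip the header row
        print(sheet.cell_value(row, 0))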

TSeymour · Answer

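I almost always just use pandas for this, but my current little tool is being packaged into an executable, and including pandas is overkill. So I created a version of poida's solution that results in a list of named tuples. His code with this change looks like this: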

    from xlrd import open_workbook
    from collections import namedtuple
    from pprint import pprint

    wb = open_workbook('sample.xls')

    FORMAT = ['Arm_id', 'DSPName', 'PinCode']
    OneRow = namedtuple('OneRow', ' '.join(FORMAT))
    all_rows = []

    for s in wb.sheets():
        headerRow = s.row(0)
        columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]

        for row in range(1,s.nrows):
            currentRow = s.row(row)
            currentRowValues = [currentRow[x].value for x in columnIndex]
            all_rows.append(OneRow(*currentRowValues))

    pprint(all_rows)
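
For readers unfamiliar with named tuples, here is a small sketch of how the resulting rows can be used. The row values are typed in by hand (as the floats xlrd would return for numeric cells) instead of being read from sample.xls, so they are illustrative only.

```python
from collections import namedtuple

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
# namedtuple accepts a list of field names as well as a space-separated string.
OneRow = namedtuple('OneRow', FORMAT)

all_rows = [
    OneRow(1.0, 'JaVAS', 282001.0),
    OneRow(2.0, 'JaVAS', 282002.0),
]

for row in all_rows:
    # Fields are available by name (row.PinCode) and by position (row[2]).
    print(row.DSPName, int(row.PinCode))

# _asdict() gives a name -> value mapping when a dict is more convenient.
first = all_rows[0]._asdict()
```

Because the field names come from FORMAT, every name in it must be a valid Python identifier; a header such as 'Pin Code' would need renaming first.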