Programmatically convert a pandas DataFrame to a Markdown table

Question

I have a Pandas DataFrame generated from a database, and its data has mixed encodings. For example:

    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
    | ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
    | 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
    | 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
    +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

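As you can see, it mixes English and Norwegian (encoded as ISO-8859-1 in the database). I need to get the contents of this DataFrame output as a Markdown table, but without running into encoding problems. I followed this answer (from the question "Generate Markdown tables?") and got the following: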

    import sys, sqlite3
    import pandas as pd

    db = sqlite3.connect("Applications.db")
    df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
    db.close()

    # Collect one tuple per DataFrame row, in the column order used for the table
    rows = []
    for index, row in df.iterrows():
        items = (row['date'], row['path'], row['language'], row['shortest_sentence'], row['longest_sentence'], row['number_words'], row['readability_consensus'])
        rows.append(items)

    headings = ['Date', 'Path', 'Language', 'Shortest Sentence', 'Longest Sentence since', 'Words', 'Grade level']
    fields = [0, 1, 2, 3, 4, 5, 6]
    align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'), ('^','^'), ('^','^')]

    # table() is the helper taken from the linked answer; it is not defined here
    table(sys.stdout, rows, fields, headings, align)

However, this yields a `UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)` error. How can I output the DataFrame as a Markdown table? The goal is to store the table in a file that I use when writing a Markdown document. I need the output to look like this:

    | ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
    |----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
    | 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
    | 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |

OleVik · Accepted Answer

Right, so I took a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:

    # Enforce UTF-8 encoding
    import sys
    stdin, stdout = sys.stdin, sys.stdout
    reload(sys)
    sys.stdin, sys.stdout = stdin, stdout
    sys.setdefaultencoding('UTF-8')

    # SQLite3 database
    import sqlite3
    # Pandas: Data structures and data analysis tools
    import pandas as pd

    # Read database, attach as Pandas dataframe
    db = sqlite3.connect("Applications.db")
    df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
    db.close()
    df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']

    # Parse Dataframe and apply Markdown, then save as 'table.md'
    cols = df.columns
    df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
    df3 = pd.concat([df2, df])
    df3.to_csv("table.md", sep="|", index=False)

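An important precondition is that the shortest_sentence and longest_sentence columns contain no stray line breaks. I removed these by applying `.replace('\n', ' ').replace('\r', '')` before submitting the text to the SQLite database. The solution, it seems, is not to enforce the language-specific encoding (ISO-8859-1 for Norwegian) but to use UTF-8 instead of the default ASCII.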

I ran this in an IPython notebook (Python 2.7.10) and got a table like the following (spacing fixed here for appearance):

    | Path | Language | Date | Shortest Sentence | Longest Sentence | Words | Readability Consensus |
    |-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
    | data/Eng/Something1.txt | Eng | 2015-09-17 | I am able to relocate to London on short notice. | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant. | 306 | 11th and 12th grade |
    | data/Nor/NoeNorrønt.txt | Nor | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205 | 18th and 19th grade |
    | data/Nor/Ørret.txt.txt | Nor | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160 | 18th and 19th grade |

Thus, a Markdown table without encoding problems.
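For readers on Python 3, where `reload(sys)` and `sys.setdefaultencoding()` are no longer available, the same separator-row idea can be sketched roughly as follows. This is not the answer's own code: the function names `write_markdown_table` and `_one_line` are hypothetical, and it assumes the text cells are ordinary Python strings. It folds the line-break cleanup into the export step and passes the encoding to `to_csv` explicitly.

```python
import pandas as pd


def _one_line(value):
    """Collapse line breaks in a text cell; leave non-text values untouched."""
    if isinstance(value, str):
        return value.replace("\r", "").replace("\n", " ")
    return value


def write_markdown_table(df, path):
    """Write df to path as a pipe-delimited table encoded as UTF-8."""
    # Clean every cell so that each record stays on a single line.
    cleaned = df.apply(lambda column: column.map(_one_line))
    # One '---' per column: the separator row between header and body.
    separator = pd.DataFrame([["---"] * len(cleaned.columns)], columns=cleaned.columns)
    table = pd.concat([separator, cleaned], ignore_index=True)
    # Name the encoding explicitly instead of relying on a default.
    table.to_csv(path, sep="|", index=False, encoding="utf-8")
```

Calling `write_markdown_table(df, "table.md")` should then produce a file of the same shape as the one above. Cells that themselves contain a `|` would still need separate handling.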

kpykc · Answer

Improving the answer further, for use in an IPython Notebook:

    import pandas as pd

    def pandas_df_to_markdown_table(df):
        from IPython.display import Markdown, display
        # One '---' per column forms the Markdown separator row
        fmt = ['---' for i in range(len(df.columns))]
        df_fmt = pd.DataFrame([fmt], columns=df.columns)
        df_formatted = pd.concat([df_fmt, df])
        display(Markdown(df_formatted.to_csv(sep="|", index=False)))

    pandas_df_to_markdown_table(infodf)

Or use tabulate:

    pip install tabulate

Usage examples are available in the documentation.

Sebastian Jylanki · Answer

I recommend the python-tabulate library for generating ASCII tables. The library supports pandas.DataFrame as well.

Here is how to use it:

    from pandas import DataFrame
    from tabulate import tabulate

    df = DataFrame({
        "weekday": ["monday", "thursday", "wednesday"],
        "temperature": [20, 30, 25],
        "precipitation": [100, 200, 150],
    }).set_index("weekday")

    print(tabulate(df, tablefmt="pipe", headers="keys"))

Output:

    | weekday   |   temperature |   precipitation |
    |:----------|--------------:|----------------:|
    | monday    |            20 |             100 |
    | thursday  |            30 |             200 |
    | wednesday |            25 |             150 |

Rohit · Answer

Try this out. I got it to work.

A screenshot of my Markdown file converted to HTML is at the end of this answer.

    import pandas as pd

    # You don't need these two lines
    # as you already have your DataFrame in memory
    df = pd.read_csv("nor.txt", sep="|")
    df = df.drop(df.columns[-1], axis=1)  # drop() returns a new DataFrame

    # Get column names
    cols = df.columns

    # Create a new DataFrame with just the markdown
    # strings
    df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)

    # Create a new concatenated DataFrame
    df3 = pd.concat([df2, df])

    # Save as markdown
    df3.to_csv("nor.md", sep="|", index=False)

dubbbdan · Answer

I tried several of the solutions above and found that this one worked most consistently.

To convert a pandas DataFrame to a Markdown table, I suggest using pytablewriter. Using the data provided in this post:

    import pandas as pd
    import pytablewriter
    from io import StringIO

    c = StringIO("""ID, path,language, date,longest_sentence, shortest_sentence, number_words , readability_consensus
    0, data/Eng/Sagitarius.txt , Eng, 2015-09-17 , With administrative experience in the prepa... , I am able to relocate internationally on short not..., 306, 11th and 12th grade
    31 , data/Nor/Høylandet.txt , Nor, 2015-07-22 , Høgskolen i Østfold er et eksempel..., Som skuespiller har jeg både..., 253, 15th and 16th grade
    """)
    df = pd.read_csv(c,sep=',',index_col=['ID'])

    writer = pytablewriter.MarkdownTableWriter()
    writer.table_name = "example_table"
    writer.header_list = list(df.columns.values)
    writer.value_matrix = df.values.tolist()
    writer.write_table()

This results in:

    # example_table
    ID | path |language| date | longest_sentence | shortest_sentence | number_words | readability_consensus
    --:|--------------------------|--------|------------|------------------------------------------------|------------------------------------------------------|-------------:|-----------------------
    0| data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...| 306| 11th and 12th grade
    31| data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253| 15th and 16th grade

Here is a screenshot of the rendered Markdown.

Daniel Himmelstein · Answer

Export a DataFrame to Markdown

I created the following function for exporting a pandas.DataFrame to Markdown in Python:

    def df_to_markdown(df, float_format='%.2g'):
        """
        Export a pandas.DataFrame to markdown-formatted text.
        DataFrame should not contain any `|` characters.
        """
        from os import linesep
        return linesep.join([
            '|'.join(df.columns),
            '|'.join(4 * '-' for i in df.columns),
            df.to_csv(sep='|', index=False, header=False, float_format=float_format)
        ]).replace('|', ' | ')

This function may not automatically fix the OP's encoding issues, but that is a separate problem from converting pandas to Markdown.
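The docstring's caveat about `|` characters can be worked around by escaping them before conversion. The following is only a sketch: `escape_pipes` and `_escape` are hypothetical names, and it assumes the target renderer treats `\|` as a literal pipe inside a table cell, as GitHub-flavoured Markdown does. Other renderers may behave differently.

```python
import pandas as pd


def _escape(value):
    """Backslash-escape pipes in text; leave non-text values untouched."""
    if isinstance(value, str):
        return value.replace("|", "\\|")
    return value


def escape_pipes(df):
    """Return a copy of df whose text cells and column names have | escaped."""
    escaped = df.apply(lambda column: column.map(_escape))
    escaped.columns = [_escape(str(name)) for name in escaped.columns]
    return escaped


if __name__ == "__main__":
    demo = pd.DataFrame({"command": ["cat file | sort"], "runs": [3]})
    print(escape_pipes(demo))
```

The original DataFrame is left unchanged, so the escaped copy can be used for the export alone. Note that a function which later rewrites every `|` in the output, as the one above does with `.replace('|', ' | ')`, would also pad the escaped pipes, so the escaping is best paired with a writer that joins cells itself.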

Gustavo Bezerra · Answer

Here is an example function that uses pytablewriter and some regular expressions to make the Markdown table look more like a DataFrame does in Jupyter (with the row headers in bold):

    import io
    import re
    import pandas as pd
    import pytablewriter

    def df_to_markdown(df):
        """
        Converts Pandas DataFrame to markdown table,
        making the index bold (as in Jupyter) unless it's a
        pd.RangeIndex, in which case the index is completely dropped.
        Returns a string containing markdown table.
        """
        isRangeIndex = isinstance(df.index, pd.RangeIndex)
        if not isRangeIndex:
            df = df.reset_index()
        writer = pytablewriter.MarkdownTableWriter()
        writer.stream = io.StringIO()
        writer.header_list = df.columns
        writer.value_matrix = df.values
        writer.write_table()
        writer.stream.seek(0)
        table = writer.stream.readlines()

        if isRangeIndex:
            return ''.join(table)
        else:
            # Make the indexes bold
            new_table = table[:2]
            for line in table[2:]:
                new_table.append(re.sub(r'^(.*?)\|', r'**\1**|', line))
            return ''.join(new_table)

Peter Zagubisalo · Answer

Yet another solution, this time via a thin wrapper around tabulate: tabulatehelper

    import numpy as np
    import pandas as pd
    import tabulatehelper as th

    df = pd.DataFrame(np.random.random(16).reshape(4, 4), columns=('a', 'b', 'c', 'd'))
    print(th.md_table(df, formats={-1: 'c'}))

Output:

    |        a |        b |        c |    d     |
    |---------:|---------:|---------:|:--------:|
    | 0.413284 | 0.932373 | 0.277797 | 0.646333 |
    | 0.552731 | 0.381826 | 0.141727 |  0.2483  |
    | 0.779889 | 0.012458 | 0.308352 | 0.650859 |
    | 0.301109 | 0.982111 | 0.994024 | 0.43551  |

Ilya Prokin · Answer

Using the external tool pandoc and a pipe:

    def to_markdown(df):
        from subprocess import Popen, PIPE
        # Render the DataFrame as LaTeX, then let pandoc convert it to Markdown
        s = df.to_latex()
        p = Popen('pandoc -f latex -t markdown',
                  stdin=PIPE, stdout=PIPE, shell=True)
        stdoutdata, _ = p.communicate(input=s.encode("utf-8"))
        return stdoutdata.decode("utf-8")

Anake · Answer

For anyone looking for how to do this with tabulate, I thought I would put this here to save you some time:

    print(tabulate(df, tablefmt="pipe", headers="keys", showindex=False))

Alastair McCormack · Answer

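sqlite3 returns Unicode by default for TEXT fields. Everything was set up to work before you introduced the table() function from an external source (which you did not provide in your question).

The table() function has str() calls that do not specify an encoding, so ASCII is used to protect you.

You need to rewrite table() so that it does not do this, especially since you have Unicode objects. You may have some success by simply replacing str() with unicode().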
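Since the table() helper itself is not shown in the question, the idea can only be illustrated with a stand-in. The sketch below is a hypothetical minimal table writer for Python 3, where str is already Unicode text: it keeps every cell as text while the table is built and encodes exactly once, when the file is written. The names `markdown_table` and `save_markdown_table` are assumptions for illustration and are not part of the linked answer.

```python
def markdown_table(headings, rows):
    """Build a Markdown table as one text string from headings and row tuples."""
    lines = [
        "| " + " | ".join(headings) + " |",
        "| " + " | ".join("---" for _ in headings) + " |",
    ]
    for row in rows:
        # On Python 3, str() returns Unicode text, so non-ASCII cells survive.
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines) + "\n"


def save_markdown_table(path, headings, rows):
    """Encode the finished table once, at the file boundary, as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(markdown_table(headings, rows))


if __name__ == "__main__":
    print(markdown_table(["Path", "Words"], [("data/Nor/Høylandet.txt", 253)]))
```

On Python 2 the equivalent change is the one described above: use unicode() where the helper calls str(), and encode only when writing the output.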