Concatenating the elements of tuples in a list in Python

Question

I have a list of tuples that contain strings. For example:

[('this', 'is', 'a', 'foo', 'bar', 'sentences'),
 ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
 ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
 ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
 ('bar', 'sentences', 'and', 'i', 'want', 'to'),
 ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
 ('and', 'i', 'want', 'to', 'ngramize', 'it')]

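Now I want to concatenate the strings in each tuple to create a list of space-separated strings. I used the following method:

NewData = []
for grams in sixgrams:
    NewData.append((''.join([w + ' ' for w in grams])).strip())

It works perfectly fine.

However, the list I have contains over a million tuples. So my question is whether this method is efficient enough, or whether there is a better way to do it. Thanks.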

lvc · Accepted Answer

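With this much data, you should consider whether you need to keep it all in a list. If you process the strings one at a time, you can create a generator that yields each joined string without holding all of them in memory: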

new_data = (' '.join(w) for w in sixgrams)

If you can also get the original tuples from a generator, you can avoid having the sixgrams list in memory as well.
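A minimal sketch of that idea, assuming the six-grams are built from a list of tokens. The `ngrams` helper and the `tokens` sample are hypothetical names introduced here for illustration; they are not from the question. Both stages are lazy, so tuples and joined strings are produced only as they are consumed.

```python
from itertools import islice


def ngrams(tokens, n):
    """Return an iterator of n-word tuples (hypothetical helper)."""
    # zip over n staggered views of the token list; it stops at the shortest.
    return zip(*(islice(tokens, i, None) for i in range(n)))


tokens = "this is a foo bar sentences and i want to ngramize it".split()

# Nothing is computed yet: both the tuples and the joined strings are lazy.
sixgrams = ngrams(tokens, 6)
new_data = (' '.join(w) for w in sixgrams)

first = next(new_data)      # joins only the first tuple
remaining = list(new_data)  # materialized here only for demonstration
```

In real use you would normally iterate over `new_data` with a `for` loop and handle one string at a time, rather than calling `list` on it.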

falsetru · Answer

The list comprehension in your code creates a temporary string for every word (w + ' '). Use ' '.join instead.

>>> words_list = [('this', 'is', 'a', 'foo', 'bar', 'sentences'),
...               ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
...               ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
...               ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
...               ('bar', 'sentences', 'and', 'i', 'want', 'to'),
...               ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
...               ('and', 'i', 'want', 'to', 'ngramize', 'it')]
>>> new_list = []
>>> for words in words_list:
...     new_list.append(' '.join(words))  # <---------------
...
>>> new_list
['this is a foo bar sentences',
 'is a foo bar sentences and',
 'a foo bar sentences and i',
 'foo bar sentences and i want',
 'bar sentences and i want to',
 'sentences and i want to ngramize',
 'and i want to ngramize it']

The for loop above can be expressed as the following list comprehension:

new_list = [' '.join(words) for words in words_list]
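As a quick sanity check that the two approaches agree, here is a small sketch using a shortened `words_list`. The name `original_join` is introduced here only to wrap the method from the question.

```python
words_list = [
    ('this', 'is', 'a', 'foo', 'bar', 'sentences'),
    ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
]


def original_join(grams):
    # Method from the question: pad every word, concatenate, strip the ends.
    return ''.join([w + ' ' for w in grams]).strip()


old_list = [original_join(words) for words in words_list]
new_list = [' '.join(words) for words in words_list]

print(old_list == new_list)
```

The results differ only if a tuple's first word starts with whitespace or its last word ends with whitespace, because `strip()` would remove it while `' '.join` leaves the words untouched.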

thefourtheye · Answer

You can do this efficiently like this:

joiner = " ".join
print(list(map(joiner, sixgrams)))

You can improve the performance a little further by using a list comprehension:

joiner = " ".join
print([joiner(words) for words in sixgrams])

A performance comparison shows that the list comprehension solution above is slightly faster than the other two solutions.

from timeit import timeit

# Assumes sixgrams (the list of tuples from the question) is defined in this module.
joiner = " ".join

def mapSolution():
    # list() forces evaluation, since map is lazy in Python 3
    return list(map(joiner, sixgrams))

def comprehensionSolution1():
    return [" ".join(words) for words in sixgrams]

def comprehensionSolution2():
    return [joiner(words) for words in sixgrams]

print(timeit("mapSolution()", "from __main__ import joiner, mapSolution, sixgrams"))
print(timeit("comprehensionSolution1()", "from __main__ import sixgrams, comprehensionSolution1, joiner"))
print(timeit("comprehensionSolution2()", "from __main__ import sixgrams, comprehensionSolution2, joiner"))

Output on my machine:

1.5691678524
1.66710209846
1.47555398941

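The performance gain comes from not having to look up the join method on the " " string for every tuple: joiner holds the bound method, so the lookup happens only once.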
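A small sketch of what `joiner` actually holds, using a shortened sample list chosen here for illustration. It only shows that the cached bound method gives the same results as looking up `join` each time; it does not measure speed, which varies by machine and interpreter version.

```python
separator = " "
joiner = separator.join  # the attribute lookup happens once, here

# The stored object is a bound method that remembers its separator string.
print(joiner.__self__ == separator)

sixgrams = [('a', 'foo', 'bar'), ('foo', 'bar', 'sentences')]

# The first line reuses the cached method; the second looks up join per tuple.
cached = [joiner(words) for words in sixgrams]
looked_up = [" ".join(words) for words in sixgrams]
print(cached == looked_up)
```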

Edit: Although the performance can be improved this way, the most Pythonic approach is to use a generator, as in lvc's answer.