Is there a simple way to remove multiple spaces in a string?

Question

Suppose this is the string:

The   fox jumped   over    the log.

The result should be:

The fox jumped over the log.

What is the simplest 1-2 liner that can do this, without splitting the string and going into lists?

Josh Lee · Accepted Answer

>>> import re
>>> re.sub(' +', ' ', 'The     quick brown    fox')
'The quick brown fox'

Taylor Leese · Answer

foo is your string:

" ".join(foo.split())

Be warned, though, that this removes "all whitespace characters (space, tab, newline, return, formfeed)" (thanks to hhsaffar, see comments). That is, "this is \t a test\n" will effectively end up as "this is a test".
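To make the warning concrete, here is a small sketch comparing the split/join approach with the ' +' regex from the accepted answer. The sample string and the variable names are made up for illustration. It suggests that split/join also strips leading and trailing whitespace, while the regex only touches runs of the space character.

```python
import re

# Hypothetical sample: padded ends, a tab, and a trailing newline.
foo = "  this   is \t a test\n"

# split() with no arguments splits on any run of whitespace and drops
# empty strings, so the tab, the newline, and the outer padding all go away.
joined = " ".join(foo.split())

# ' +' matches only runs of the space character; the tab and newline stay.
spaces_only = re.sub(" +", " ", foo)

print(repr(joined))       # 'this is a test'
print(repr(spaces_only))  # ' this is \t a test\n'
```

Which of the two is preferable depends on whether tabs and newlines should survive.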

Nasir · Answer

import re
s = "The   fox jumped   over    the log."
re.sub(r"\s\s+" , " ", s)

or

re.sub(r"\s\s+", " ", s)

since the space before the comma is listed as a pet peeve in PEP 8, as mentioned by moose in the comments.
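One detail of the \s\s+ pattern that may be worth checking: it only matches runs of two or more whitespace characters, so a single tab or newline between words is left as it is. A minimal sketch with an invented sample string:

```python
import re

# Hypothetical sample: a lone tab, a double space, and a " \n " run.
s = "one\ttwo  three \n four"

# Runs of two or more whitespace characters become one space;
# the single tab is not touched.
two_or_more = re.sub(r"\s\s+", " ", s)

# For comparison, \s+ also rewrites single whitespace characters.
one_or_more = re.sub(r"\s+", " ", s)

print(repr(two_or_more))  # 'one\ttwo three four'
print(repr(one_or_more))  # 'one two three four'
```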

pythonlarry · Answer

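Using regexes with "\s" and doing simple string.split() calls will also remove other whitespace, such as newlines, carriage returns, and tabs. Unless that is what you want, here are some examples that collapse only multiple spaces.

EDIT: As I'm wont to do, I slept on this, and besides correcting a typo in the last results (v3.3.3 @ 64-bit, not 32-bit), the obvious hit me: the test string was rather trivial.

So, to get more realistic timings, I took 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum. I then added extra spaces of random length throughout:

original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))

I also corrected the "proper join". If you care, the one-liner essentially strips any leading/trailing spaces; this corrected version preserves a leading/trailing space (but only ONE ;-). (I found this because the randomly spaced lorem_ipsum got extra spaces on the end and thus failed the assert.)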

# setup = '''

import re

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')

    return string

def re_replace(string):
    return re.sub(r' {2,}' , ' ', string)

def proper_join(string):
    split_string = string.split(' ')

    # To account for leading/trailing spaces that would simply be removed
    beg = ' ' if not split_string[ 0] else ''
    end = ' ' if not split_string[-1] else ''

    # versus simply ' '.join(item for item in string.split(' ') if item)
    return beg + ' '.join(item for item in split_string if item) + end

original_string = """Lorem ipsum ... no, really, it kept going... malesuada enim feugiat. Integer imperdiet erat."""

assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

#'''

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string

# re_replace_test
new_string = original_string[:]

new_string = re_replace(new_string)

assert new_string != original_string

# proper_join_test
new_string = original_string[:]

new_string = proper_join(new_string)

assert new_string != original_string

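NOTE: The "while version" made a copy of original_string, because I believe that once it is modified on the first run, successive runs would be faster (if only by a bit). As this adds time, I added the same string copy to the other two so that the timings show only the difference in logic. Keep in mind that timeit executes the setup only once and then runs the main stmt repeatedly; the original way I did this, the while loop worked on the same label, original_string, so on the second run there was nothing left to do. The way it is set up now, calling a function and using two different labels, that isn't a problem. I've added assert statements to all the workers to verify that something changes on every iteration (for those who may be dubious). E.g., change it to this and it breaks: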

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string # will break the 2nd iteration

while '  ' in original_string:
    original_string = original_string.replace('  ', ' ')

Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)

test_string = 'The   fox jumped   over\n\t    the log.' # trivial

Python 2.7.3, 32-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001066 |   0.001260 |   0.001128 |   0.001092
re_replace_test      |   0.003074 |   0.003941 |   0.003357 |   0.003349
proper_join_test     |   0.002783 |   0.004829 |   0.003554 |   0.003035

Python 2.7.3, 64-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001025 |   0.001079 |   0.001052 |   0.001051
re_replace_test      |   0.003213 |   0.004512 |   0.003656 |   0.003504
proper_join_test     |   0.002760 |   0.006361 |   0.004626 |   0.004600

Python 3.2.3, 32-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001350 |   0.002302 |   0.001639 |   0.001357
re_replace_test      |   0.006797 |   0.008107 |   0.007319 |   0.007440
proper_join_test     |   0.002863 |   0.003356 |   0.003026 |   0.002975

Python 3.3.3, 64-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001444 |   0.001490 |   0.001460 |   0.001459
re_replace_test      |   0.011771 |   0.012598 |   0.012082 |   0.011910
proper_join_test     |   0.003741 |   0.005933 |   0.004341 |   0.004009

test_string = lorem_ipsum
# Thanks to http://www.lipsum.com/
# "Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

Python 2.7.3, 32-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.342602 |   0.387803 |   0.359319 |   0.356284
re_replace_test      |   0.337571 |   0.359821 |   0.348876 |   0.348006
proper_join_test     |   0.381654 |   0.395349 |   0.388304 |   0.388193

Python 2.7.3, 64-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.227471 |   0.268340 |   0.240884 |   0.236776
re_replace_test      |   0.301516 |   0.325730 |   0.308626 |   0.307852
proper_join_test     |   0.358766 |   0.383736 |   0.370958 |   0.371866

Python 3.2.3, 32-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.438480 |   0.463380 |   0.447953 |   0.446646
re_replace_test      |   0.463729 |   0.490947 |   0.472496 |   0.468778
proper_join_test     |   0.397022 |   0.427817 |   0.406612 |   0.402053

Python 3.3.3, 64-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.284495 |   0.294025 |   0.288735 |   0.289153
re_replace_test      |   0.501351 |   0.525673 |   0.511347 |   0.508467
proper_join_test     |   0.422011 |   0.448736 |   0.436196 |   0.440318

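For the trivial string, it would seem that a while loop is the fastest, followed by the Pythonic string split/join, with the regex bringing up the rear.

For non-trivial strings, there seems to be a bit more to consider. 32-bit 2.7? It's regex to the rescue! 2.7 64-bit? A while loop is best, by a decent margin. 32-bit 3.2, go with the "proper" join. 64-bit 3.3, go for a while loop. Again.

In the end, one can improve performance if/where/when needed, but it is always best to remember the mantra:

Make It Work
Make It Right
Make It Fast

IANAL, YMMV, Caveat Emptor!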
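For readers who want to rerun a comparison like this on their own interpreter, the following is a minimal sketch, not the harness used for the tables above. It assumes a short repeated word list as a stand-in for the Lorem Ipsum text, a fixed random seed, and much smaller repeat counts; absolute numbers will differ from the ones reported.

```python
import random
import re
import timeit

def while_replace(string):
    while "  " in string:
        string = string.replace("  ", " ")
    return string

def re_replace(string):
    return re.sub(r" {2,}", " ", string)

def proper_join(string):
    split_string = string.split(" ")
    # Keep (at most) one leading/trailing space, as in the answer above.
    beg = " " if not split_string[0] else ""
    end = " " if not split_string[-1] else ""
    return beg + " ".join(item for item in split_string if item) + end

# Assumption: a small stand-in text instead of the 1000-word Lorem Ipsum.
random.seed(0)
words = "lorem ipsum dolor sit amet consectetur adipiscing elit".split() * 50
original_string = "".join(w + " " * random.randint(1, 10) for w in words)

expected = while_replace(original_string)
for func in (while_replace, re_replace, proper_join):
    # All three candidates should agree before their timings are compared.
    assert func(original_string) == expected
    best = min(timeit.repeat(lambda: func(original_string), repeat=3, number=100))
    print(f"{func.__name__:>15}: {best:.6f} s per 100 calls")
```

Because Python strings are immutable, each call here receives the same unmodified input, so no explicit copy is needed in this variant.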

Kevin Little · Answer

I have to agree with Paul McGuire's comment above. To me,

' '.join(the_string.split())

is vastly preferable to whipping out a regex.

My measurements (Linux, Python 2.5) show the split-then-join to be almost five times faster than doing the "re.sub(...)" multiple times. And it is by any measure easier to understand; it is much more Pythonic.

Peter · Answer

Similar to the previous solutions, but more specific: replace two or more spaces with one.

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'

HMS · Answer

A simple solution

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.

devinbost · Answer

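You can also use the string splitting technique in a Pandas DataFrame without using .apply(..), which is useful if you need to perform the operation quickly on a large number of strings. Here it is in one line: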

df['message'] = (df['message'].str.split()).str.join(' ')

Rakesh Kumar · Answer

import re
string = re.sub('[ \t\n]+', ' ', 'The     quick brown    \n\n    \t        fox')

This replaces every run of tabs, newlines, and multiple spaces with a single space.
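An explicit character class such as [ \t\n] only covers the characters listed in it. As a rough illustration, using a made-up sample that also contains carriage returns and non-breaking spaces, the class leaves those alone, whereas \s on a Python 3 str pattern matches them too:

```python
import re

# Hypothetical sample: non-breaking spaces (U+00A0), a " \t\n " run,
# and two carriage returns.
sample = "The\u00a0\u00a0quick \t\n brown\r\rfox"

# Only space, tab, and newline are in the class.
class_result = re.sub("[ \t\n]+", " ", sample)

# \s matches any Unicode whitespace for str patterns.
any_whitespace = re.sub(r"\s+", " ", sample)

print(repr(class_result))    # 'The\xa0\xa0quick brown\r\rfox'
print(repr(any_whitespace))  # 'The quick brown fox'
```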

vaultah · Answer

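In some cases it is desirable to replace consecutive occurrences of each whitespace character with a single instance of that character. To do that, use a regular expression with a backreference.

(\s)\1{1,} matches any whitespace character followed by one or more occurrences of that same character. Now all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function:

import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)

>>> normalize_whitespace('The   fox jumped   over    the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t \n\n\nSecond    line')
'First line\t \nSecond line'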
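One consequence of the backreference, shown here as a small sketch with invented inputs: only repeats of the same character are collapsed, so a run that alternates between different whitespace characters passes through unchanged.

```python
import re

def normalize_whitespace(string):
    # Same pattern as in the answer; {1,} is equivalent to +.
    return re.sub(r'(\s)\1{1,}', r'\1', string)

# Two spaces followed by two tabs: each repeated character collapses.
same_char_runs = normalize_whitespace("a  \t\tb")

# Space, newline, space, newline: no character repeats, so nothing matches.
mixed_run = normalize_whitespace("a \n \nb")

print(repr(same_char_runs))  # 'a \tb'
print(repr(mixed_run))       # 'a \n \nb'
```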

ravi tanwar · Answer

I have tried the following method, and it even works with an extreme case like:

str1 = '          i   live    on    earth           '

' '.join(str1.split())

But if you prefer a regular expression, it can be done as:

re.sub(r'\s+', ' ', str1)

although some extra processing is needed to remove the leading and trailing space.

gabchan · Answer

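One line of code to remove all extra spaces before, after, and within a sentence:

sentence = "  The   fox jumped   over    the log.  "
sentence = ' '.join(filter(None, sentence.split(' ')))

Explanation:

Split the entire string into a list.
Filter empty elements from the list.
Rejoin the remaining elements* with a single space.

*The remaining elements should be words, or words with punctuation, etc. I did not test this extensively, but it should be a good starting point. All the best!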
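The three steps in the explanation can be traced one at a time. This is only an unrolled sketch of the one-liner above; the intermediate names parts and words are introduced here for illustration.

```python
sentence = "  The   fox jumped   over    the log.  "

# Step 1: splitting on a single space turns every extra space into
# an empty string in the resulting list.
parts = sentence.split(' ')

# Step 2: filter(None, ...) keeps only truthy items, dropping the
# empty strings.
words = list(filter(None, parts))

# Step 3: rejoin the remaining items with exactly one space.
result = ' '.join(words)

print(words)   # ['The', 'fox', 'jumped', 'over', 'the', 'log.']
print(result)  # The fox jumped over the log.
```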

Kreshnik · Answer

Another alternative:

>>> import re
>>> text = 'this is a \t\t string with    multiple spaces and \t tabs'
>>> text = re.sub('[ \t]+', ' ', text)
>>> print(text)
this is a string with multiple spaces and tabs

jw51 · Answer

def unPretty(S):
    # given a dictionary, json, list, float, int, or even a string..
    # return a string stripped of CR, LF replaced by space, with multiple spaces reduced to one.
    return ' '.join( str(S).replace('\n',' ').replace('\r','').split() )

Hassan Baig · Answer

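The fastest you can get for user-generated strings is:

if '  ' in text:
    while '  ' in text:
        text = text.replace('  ', ' ')

The short-circuiting makes it slightly faster than pythonlarry's comprehensive answer. Go for this if you are after efficiency and are strictly looking to weed out extra whitespace of the single-space variety.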
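It may help to see why a loop is needed at all: a single replace pass only halves each run of spaces. The sketch below wraps the snippet in a hypothetical helper named collapse_spaces (the name is an assumption, not from the answer) and compares it with one pass.

```python
# Hypothetical sample with a run of four spaces.
text = "The    fox"

# One pass turns each pair of spaces into one, so four become two.
one_pass = text.replace("  ", " ")

def collapse_spaces(text):
    # Same logic as the snippet above, wrapped in a function for reuse.
    if "  " in text:
        while "  " in text:
            text = text.replace("  ", " ")
    return text

print(repr(one_pass))               # 'The  fox'
print(repr(collapse_spaces(text)))  # 'The fox'
```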

Anakimi · Answer

This also seems to work:

while "  " in s:
    s = s.replace("  ", " ")

where the variable s represents your string.

Zoran Bajcer · Answer

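I have a simple method that I used in college:

line = "I     have            a       nice    day."

end = 1000
while end != 0:
    # str.replace returns a new string, so the result must be assigned back
    line = line.replace("  ", " ")
    end -= 1

This replaces every double space with a single space, and does so 1000 times. That means you can have 2000 extra spaces and it will still work. :)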
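As a rough check on how many passes are really needed, the sketch below counts replace passes for a single run of spaces. The helper passes_needed is hypothetical and exists only for this illustration. Since each pass roughly halves a run, the count grows with the logarithm of the run length, so a fixed 1000 iterations is far more than a run of 2000 spaces requires.

```python
def passes_needed(run_length):
    # Build a string with one run of spaces and count how many
    # replace passes it takes until no double space is left.
    s = "a" + " " * run_length + "b"
    passes = 0
    while "  " in s:
        s = s.replace("  ", " ")
        passes += 1
    return passes

print(passes_needed(2))     # 1
print(passes_needed(3))     # 2
print(passes_needed(2000))  # 11
```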

CameronE · Answer

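To remove whitespace, accounting for leading, trailing, and extra whitespace between words, use:

(?<=\s) +|^ +(?=\s)| (?= +[\n\0])

The first and second alternatives deal with leading whitespace, and the last one deals with trailing whitespace.

For proof of use, this link provides a test: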

https://regex101.com/r/meBYli/4

Let me know if you find an input that breaks this regex.

Also, this is intended to be used with the re.split function.

jsnklln · Answer

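When the separator is None (the default), split treats runs of whitespace as a single separator, so the returned list contains no empty strings.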

https://docs.python.org/2/library/stdtypes.html#str.split
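The difference between splitting on None and splitting on an explicit space can be seen with a short sketch (the sample string is made up):

```python
# Hypothetical sample with leading, inner, and trailing runs of spaces.
s = "  The   fox  "

# No separator (None): runs of whitespace count as one separator and
# leading/trailing whitespace produces no empty strings.
default_split = s.split()

# Explicit ' ' separator: every single space is a separator, so runs
# of spaces yield empty strings.
space_split = s.split(' ')

print(default_split)  # ['The', 'fox']
print(space_split)    # ['', '', 'The', '', '', 'fox', '', '']
```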

Scott Anderson · Answer

I haven't read the other examples closely, but I wrote this method for consolidating multiple consecutive space characters.

It does not use any libraries, and although it is relatively long as scripts go, it is not a complex implementation.

def spaceMatcher(command):
    """
    function defined to consolidate multiple whitespace characters in
    strings to a single space
    """
    # count of consecutive spaces seen so far in the current run
    space_match = 0
    # build the result character by character, so runs of any length
    # (including leading and trailing ones) are handled
    new_command = ""
    for char in command:
        if char == " ":
            space_match += 1
            # keep only the first space of each run
            if space_match == 1:
                new_command += char
        else:
            space_match = 0
            new_command += char
    return new_command

command = None
command = str(input("Please enter a command ->"))
print(spaceMatcher(command))
print(list(spaceMatcher(command)))

Hassan Abdul-Kareem · Answer

string = 'This is a             string full of spaces          and taps'
string = string.split(' ')
while '' in string:
    string.remove('')
string = ' '.join(string)
print(string)

Result:

This is a string full of spaces and taps