How do I pass a regular expression to string.replace?

Question

I need some help writing a regular expression. My input looks like this:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. and there are many other lines in the txt files with such tags

I tried this:

    #!/usr/bin/python
    import os, sys, re, glob
    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        for line in reader:
            line2 = line.replace('<[1> ', '')
            line = line2.replace('</[1> ', '')
            line2 = line.replace('<[1>', '')
            line = line2.replace('</[1>', '')

            print line

I also tried this (but it seems I am using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I don't want to hard-code the replace calls for every number from 1 to 99...

ridgerunner · Accepted Answer

This tested snippet should do it:

    import re
    line = re.sub(r"</?\[\d+>", "", line)

Edit: here is a commented version explaining how it works:

    line = re.sub(r"""(?x)  # Use free-spacing mode (the flag must come first).
        <    # Match a literal '<'
        /?   # Optionally match a '/'
        \[   # Match a literal '['
        \d+  # Match one or more digits
        >    # Match a literal '>'
        """, "", line)

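Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: "metacharacters" must be escaped (that is, preceded by a backslash), and the rules differ inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!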
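As a small illustration of the point about metacharacters (a sketch added for clarity, not part of the original answer; the sample strings are made up), the following shows what happens when `[` is left unescaped, how `re.escape` can escape a literal string for you, and that `+` and `.` lose their special meaning inside a character class.

```python
import re

# An unescaped '[' opens a character class. Here the class is never
# closed, so the pattern does not even compile.
try:
    re.compile(r"<[\d+>")
    compiled = True
except re.error:
    compiled = False
print(compiled)

# re.escape() backslash-escapes the metacharacters in a literal string,
# so the result matches the text '<[1>' exactly.
literal = re.escape("<[1>")
without_tag = re.sub(literal, "", "a<[1>b")
print(without_tag)

# Inside a character class, '+' and '.' are ordinary characters.
found = re.findall(r"[+.]", "1+2.3")
print(found)
```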

Ignacio Vazquez-Abrams · Answer

str.replace() only does fixed-string replacement. Use re.sub() instead.
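A minimal sketch of the difference, assuming a shortened version of the question's sample text (the variable names are illustrative only):

```python
import re

text = "with<[1> in between</[1> and the<[99> number</[99>"

# str.replace() treats its first argument as a literal string, so
# '<[*>' is searched for verbatim and nothing is replaced.
unchanged = text.replace("<[*>", "")

# re.sub() interprets its first argument as a regular expression.
cleaned = re.sub(r"</?\[\d+>", "", text)

print(unchanged == text)  # True
print(cleaned)            # with in between and the number
```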

Lorenzo Persichetti · Answer

I would do it like this (the regex is explained in the comments):

    import re

    # If you need to use the regex more than once it is suggested to compile it.
    pattern = re.compile(r"</{0,}\[\d+>")

    # </{0,}\[\d+>
    #
    # Match the character “<” literally «<»
    # Match the character “/” literally «/{0,}»
    #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
    # Match the character “[” literally «\[»
    # Match a single digit 0..9 «\d+»
    #    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    # Match the character “>” literally «>»

    subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>"""

    result = pattern.sub("", subject)

    print(result)

If you want to learn more about regexes, I recommend reading Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.

Ezequiel Marquez · Answer

The easiest way:

    import re

    txt = 'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>'

    # Removes every '<...>' span, not only the numbered tags.
    out = re.sub("(<[^>]+>)", '', txt)
    print(out)
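One thing to keep in mind with this pattern: `<[^>]+>` matches any angle-bracketed span, so it also strips tags that are not of the `<[N>` form. A short sketch comparing it with the narrower pattern from the accepted answer, using a made-up sample string:

```python
import re

sample = "keep <b>this</b> and drop<[7> that</[7>"

# Broad pattern: removes everything between '<' and the next '>'.
broad = re.sub(r"<[^>]+>", "", sample)

# Narrow pattern: removes only <[digits> and </[digits> tags.
narrow = re.sub(r"</?\[\d+>", "", sample)

print(broad)   # keep this and drop that
print(narrow)  # keep <b>this</b> and drop that
```

Which one is appropriate depends on whether the text files can contain other angle-bracketed content that should survive.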

Zac · Answer

The replace method of string objects does not accept regular expressions, only fixed strings (see the documentation: http://docs.python.org/2/library/stdtypes.html#str.replace ).

You have to use the re module:

    import re
    # A raw string keeps the backslash in \[ from being read as a string escape.
    newline = re.sub(r"</?\[[0-9]+>", "", line)

kurumi · Answer

(For your sample string) you don't need a regex:

    >>> s
    'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'
    >>> for w in s.split(">"):
    ...     if "<" in w:
    ...         print(w.split("<")[0])
    ...
    this is a paragraph with
     in between
     and then there are cases ... where the
     number ranges from 1-100
    . 
    and there are many other lines in the txt files
    with
     such tags 
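The same split-based idea can be wrapped in a function that returns the cleaned text instead of printing the pieces. This is only a sketch: `strip_tags_by_split` is a hypothetical helper name, and unlike the loop above it also keeps any text that follows the last tag. It assumes `<` and `>` appear only as tag delimiters; a stray `>` in ordinary text would be silently dropped.

```python
def strip_tags_by_split(text):
    """Remove '<...>' spans using only str.split (hypothetical helper)."""
    pieces = []
    for chunk in text.split(">"):
        # Keep only what precedes the next '<'. A chunk without '<'
        # is trailing text and is kept whole.
        pieces.append(chunk.split("<")[0])
    return "".join(pieces)


result = strip_tags_by_split("with<[1> in between</[1> end")
print(result)  # with in between end
```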

Abena Saulka · Answer

    import glob
    import os
    import re
    import sys

    # Matches opening and closing tags such as <[1> and </[99>.
    pattern = re.compile(r"</?\[\d+>")

    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        with open(infile) as reader:
            for line in reader:
                # Pattern.sub(repl, string): replace each tag with an empty string.
                retline = pattern.sub("", line)
                sys.stdout.write(retline)