Extract a string between quotation marks

Question

I want to extract information from user-entered text. Suppose the user enters the following:

SetVariables "a" "b" "c"

How do I extract the text between the first pair of quotation marks? Then the second? Then the third?

jspcal · Accepted Answer

>>> import re
>>> re.findall('"([^"]*)"', 'SetVariables "a" "b" "c" ')
['a', 'b', 'c']
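The question also asks how to get the first value, then the second, then the third. Since re.findall returns a plain list, indexing is enough. A minimal sketch; the names text, values, first, second and third are illustrative and do not come from the answer:

```python
import re

text = 'SetVariables "a" "b" "c"'

# The capture group ([^"]*) makes findall return only the text inside the quotes.
values = re.findall(r'"([^"]*)"', text)

# Index the list to pick out individual values (assumes at least three matches).
first = values[0]
second = values[1]
third = values[2]
print(first, second, third)
```

If the number of quoted values is not known in advance, checking len(values) before indexing avoids an IndexError.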

Roman · Answer

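You can call string.split() on it. If the string is well formed with respect to quotation marks (that is, it contains an even number of them), every odd-indexed item in the resulting list is an element that was between quotation marks.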

>>> s = 'SetVariables "a" "b" "c"'
>>> l = s.split('"')[1::2]  # the [1::2] slice extracts the odd-indexed items
>>> print(l)
['a', 'b', 'c']
>>> print(l[2])  # how to extract an individual item from the output
c
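The split approach depends on the quotation marks being balanced. With an odd number of them, the slice quietly returns the trailing unquoted fragment as if it were a value. One way to guard against that is to count the quotation marks first. This is a sketch only; the helper name quoted_parts is hypothetical:

```python
def quoted_parts(s):
    """Return the substrings between double quotes; raise if the quotes are unbalanced."""
    # An odd number of quotation marks means one quote was never closed.
    if s.count('"') % 2 != 0:
        raise ValueError("unbalanced quotation marks: %r" % s)
    # After splitting on the quote character, odd indexes hold the quoted text.
    return s.split('"')[1::2]


print(quoted_parts('SetVariables "a" "b" "c"'))
```

Raising an exception is only one possible policy; returning an empty list or ignoring the unterminated fragment may suit other callers better.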

This is also faster than the regular-expression approach. Measured with the timeit module, this code runs about 4 times faster:

% python timeit.py -s 'import re' 're.findall("\"([^\"]*)\"", "SetVariables \"a\" \"b\" \"c\" ")'
1000000 loops, best of 3: 2.37 usec per loop
% python timeit.py '"SetVariables \"a\" \"b\" \"c\"".split("\"")[1::2];'
1000000 loops, best of 3: 0.569 usec per loop
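The same comparison can be run from a script rather than the command line. The sketch below is not the measurement quoted above: it precompiles the pattern and wraps each approach in a function, so the ratio it reports may differ, and the absolute numbers depend on the machine and Python version. The names with_regex, with_split and n are illustrative:

```python
import re
import timeit

text = 'SetVariables "a" "b" "c"'
pattern = re.compile(r'"([^"]*)"')


def with_regex():
    return pattern.findall(text)


def with_split():
    return text.split('"')[1::2]


# Both approaches must agree before their speed is worth comparing.
assert with_regex() == with_split()

n = 100000  # number of calls per measurement; an arbitrary choice
regex_time = timeit.timeit(with_regex, number=n)
split_time = timeit.timeit(with_split, number=n)
print("regex: %.3f s, split: %.3f s for %d calls" % (regex_time, split_time, n))
```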

Alex Martelli · Answer

Regular expressions are good at this:

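import re
# Sample input taken from the question; in practice this is the text the user typed.
userInputtedText = 'SetVariables "a" "b" "c"'
# There is no capture group, so each match includes its surrounding quotation marks.
quoted = re.compile('"[^"]*"')
for value in quoted.findall(userInputtedText):
    print(value)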
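Whether the quotation marks appear in the results depends on whether the pattern has a capture group. A short sketch of the difference, using the sample input from the question; the names with_quotes and without_quotes are illustrative:

```python
import re

text = 'SetVariables "a" "b" "c"'

# Without a group, findall returns each whole match, quotation marks included.
with_quotes = re.findall(r'"[^"]*"', text)

# With one group, findall returns only what the group captured.
without_quotes = re.findall(r'"([^"]*)"', text)

print(with_quotes)
print(without_quotes)
```

Neither pattern handles escaped quotation marks inside a value; that would need a more elaborate pattern or a real parser.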