How can I split a URL string into separate parts in Python?

Question

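I decided to learn Python tonight :) I know C pretty well (I wrote an OS with it), so I'm not a beginner at programming and Python seems fairly easy, but I can't figure out how to solve this problem. Let's say I have this address: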

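http://example.com/random/folder/path.html

How can I create two strings from this? One should contain the "base" name of the server, which in this example would be http://example.com/ and the other should contain the address without the final file name, which in this example would be http://example.com/random/folder/ . Of course I know I could just find the 3rd slash and the last slash respectively, but maybe you know a better way :] It would be nice to have the trailing slash in both cases, but I don't mind if not, since it can be added easily. So does anyone have a good, fast, effective solution for this? Or is finding the slashes ("my" solution) the only way?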

Thanks!

sykora · Answer

The urlparse module in Python 2.x (or urllib.parse in Python 3.x) is the way to do it:

>>> from urllib.parse import urlparse
>>> url = 'http://example.com/random/folder/path.html'
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'

If you want to do more work with the file path part of the URL, you can use the posixpath module:

>>> from posixpath import basename, dirname
>>> basename(parse_object.path)
'path.html'
>>> dirname(parse_object.path)
'/random/folder'

After that, you can use posixpath.join to glue the parts back together.
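Putting these pieces together, here is a minimal sketch that builds both strings the question asks for. It assumes Python 3 (`urllib.parse`) and a URL whose path ends in a file name; `split_url` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlparse, urlunparse
import posixpath


def split_url(url):
    """Hypothetical helper: return (base_url, directory_url), both ending in '/'."""
    parts = urlparse(url)
    # Rebuild scheme://netloc/ with empty params, query and fragment
    base = urlunparse((parts.scheme, parts.netloc, '/', '', '', ''))
    # dirname() drops the last path component (the file name)
    folder = posixpath.dirname(parts.path)
    # join() with an empty final component appends the trailing slash
    folder = posixpath.join(folder, '')
    directory = urlunparse((parts.scheme, parts.netloc, folder, '', '', ''))
    return base, directory


base, directory = split_url('http://example.com/random/folder/path.html')
print(base)       # http://example.com/
print(directory)  # http://example.com/random/folder/
```

Because the URL is rebuilt with empty query and fragment fields, anything after `?` or `#` in the input is dropped from both results.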

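Edit: I completely forgot that Windows users would choke on the path separator used by os.path. I read the posixpath module documentation, and it makes a specific reference to URL manipulation, so everything is fine.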

Mike Hamer · Answer

If this is the full extent of your URL parsing, Python's built-in rpartition will do the job:

>>> URL = "http://example.com/random/folder/path.html"
>>> Segments = URL.rpartition('/')
>>> Segments[0]
'http://example.com/random/folder'
>>> Segments[2]
'path.html'

From Pydoc, str.rpartition:

Splits the string at the last occurrence of sep, and returns a 3-Tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-Tuple containing two empty strings, followed by the string itself

This means that rpartition searches the string for you and splits it at the last (rightmost) occurrence of the character you specify (in this case /). It returns a tuple containing:

(everything to the left of char, the character itself, everything to the right of char)
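Because the separator comes back as the middle element, concatenating the first two elements gives the directory with the trailing slash the question asked for. A short sketch (the variable names are only illustrative):

```python
url = "http://example.com/random/folder/path.html"

# rpartition always returns a 3-tuple: (before, sep, after)
head, sep, tail = url.rpartition('/')
directory = head + sep  # keep the slash itself
print(directory)  # http://example.com/random/folder/
print(tail)       # path.html

# When the separator is missing, the whole string ends up in the last slot
print("path.html".rpartition('/'))  # ('', '', 'path.html')
```

This only covers the directory part; rpartition on its own does not locate the slash that ends the host name, so the base http://example.com/ still needs one of the other approaches.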

Sebastian Dietz · Answer

I have no experience with Python, but I found the urlparse module, which should do the trick.

Paul Stephenson · Answer

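In Python, many operations are done with lists. The urlparse module mentioned by Sebastian Dietz may well solve your specific problem, but if you are interested in the general Pythonic way of finding slashes in a string, try something like this: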

url = 'http://example.com/random/folder/path.html'
# Create a list of each bit between slashes
slashparts = url.split('/')
# Now join back the first three sections: 'http:', '' and 'example.com'
basename = '/'.join(slashparts[:3]) + '/'
# All except the last one
dirname = '/'.join(slashparts[:-1]) + '/'
print('slashparts = %s' % slashparts)
print('basename = %s' % basename)
print('dirname = %s' % dirname)

The output of this program is:

slashparts = ['http:', '', 'example.com', 'random', 'folder', 'path.html']
basename = http://example.com/
dirname = http://example.com/random/folder/

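The interesting bits are split, join, the slice notation array[A:B] (including negative values for offsets from the end), and, as a bonus, the % operator on strings, which gives printf-style formatting.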
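A few one-liners illustrating those features on the same list (a sketch; the names are only for demonstration):

```python
parts = 'http://example.com/random/folder/path.html'.split('/')

print(parts[-1])            # path.html  (a negative index counts from the end)
print(parts[:-1])           # every element except the last one
print('/'.join(parts[3:]))  # random/folder/path.html
print('%s has %d parts' % ('the URL', len(parts)))  # the URL has 6 parts
```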

Abbafei · Answer

Thanks to the other answerers here, whose answers pointed me in the right direction!

The posixpath module mentioned in sykora's answer does not seem to be available in my Python setup (Python 2.7.3).

According to this article, the "proper" way to do this seems to be to use...

urlparse.urlparse and urlparse.urlunparse, to detach and reattach the base of the URL
the functions in os.path, to manipulate the path
urllib.url2pathname and urllib.pathname2url (to make the path-name manipulation portable, so that it also works on Windows and the like)

So, for example (not including reattaching the base URL)...

>>> import urlparse, urllib, os.path
>>> os.path.dirname(urllib.url2pathname(urlparse.urlparse("http://example.com/random/folder/path.html").path))
'/random/folder'
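In Python 3 these modules were reorganized: urlparse became `urllib.parse`, and `url2pathname`/`pathname2url` live in `urllib.request`. The following is a minimal sketch of the same idea under that assumption, this time also reattaching the base URL with `urlunparse`. The printed values assume a POSIX system; on Windows, `url2pathname` and `os.path` work with backslash-separated paths in the intermediate step.

```python
import os.path
from urllib.parse import urlparse, urlunparse
from urllib.request import url2pathname, pathname2url

url = "http://example.com/random/folder/path.html"
parts = urlparse(url)

# URL path -> local path syntax, then strip the file name
local_dir = os.path.dirname(url2pathname(parts.path))
print(local_dir)  # /random/folder  (on POSIX)

# Local path -> URL path again, add the trailing slash, reattach scheme and host
url_dir = pathname2url(local_dir) + '/'
print(urlunparse((parts.scheme, parts.netloc, url_dir, '', '', '')))
# http://example.com/random/folder/
```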

Mayank Jaiswal · Answer

You can use the Python library furl:

import furl

f = furl.furl("http://example.com/random/folder/path.html")
print(str(f.path))             # '/random/folder/path.html'
print(str(f.path).split("/"))  # ['', 'random', 'folder', 'path.html']

To access the word after the first "/", use:

str(f.path).split("/")[1]  # 'random'