Coreference resolution in Python NLTK using Stanford CoreNLP

Question

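Stanford CoreNLP provides coreference resolution as mentioned here, and this thread and this one give some insight into its implementation in Java.

However, I am using Python and NLTK, and I am not sure how to use CoreNLP's coreference resolution functionality from my Python code. I have been able to set up StanfordParser in NLTK; this is my code so far.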

from nltk.parse.stanford import StanfordDependencyParser

stanford_parser_dir = 'stanford-parser/'
eng_model_path = stanford_parser_dir + "stanford-parser-models/edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
my_path_to_models_jar = stanford_parser_dir + "stanford-parser-3.5.2-models.jar"
my_path_to_jar = stanford_parser_dir + "stanford-parser.jar"

How can I use CoreNLP's coreference resolution in Python?

Deesha · Accepted Answer

As mentioned by @Igor, you can try the Python wrapper implemented in this GitHub repo: https://github.com/dasmith/stanford-corenlp-python

This repo contains two main files: corenlp.py and client.py

Make the following changes to get CoreNLP working.

In corenlp.py, change the path of the corenlp folder. Set it to the location of the corenlp folder on your local machine by adding the path at line 144 of corenlp.py:

if not corenlp_path: corenlp_path = <path to the corenlp file>

The jar file version numbers in corenlp.py may differ from yours. Set them to match the CoreNLP version you have by changing line 135 of corenlp.py:

jars = ["stanford-corenlp-3.4.1.jar", "stanford-corenlp-3.4.1-models.jar", "joda-time.jar", "xom.jar", "jollyday.jar"]

In this list, replace 3.4.1 with the version of the jars you downloaded.
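The two edits above can be sanity-checked before starting the server. The following is a minimal sketch, not part of the wrapper: `build_jar_list` and `missing_jars` are hypothetical helper names, and it assumes the three auxiliary jars keep the unversioned names shown in the list above, which may not hold for every CoreNLP download.

```python
import os


def build_jar_list(version):
    """Return the jar names corenlp.py expects for a CoreNLP version string.

    Hypothetical helper: it mirrors the `jars` list shown above and only
    substitutes the version number.
    """
    return [
        "stanford-corenlp-%s.jar" % version,
        "stanford-corenlp-%s-models.jar" % version,
        "joda-time.jar",
        "xom.jar",
        "jollyday.jar",
    ]


def missing_jars(corenlp_path, version):
    """Return the expected jars that are not present in corenlp_path."""
    return [
        jar
        for jar in build_jar_list(version)
        if not os.path.isfile(os.path.join(corenlp_path, jar))
    ]
```

Calling `missing_jars(<your corenlp folder>, <your version>)` and getting an empty list suggests that the path and version you set are consistent with what is on disk.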

Run the command:

python corenlp.py

This starts a server.

Now run the main client program:

python client.py

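This returns a dictionary, and you can access the coreference information using 'coref' as the key.

For example, for the input "John is a Computer Scientist. He likes coding." the output is: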

{ "coref": [[[["a Computer Scientist", 0, 4, 2, 5], ["John", 0, 0, 0, 1]], [["He", 1, 0, 0, 1], ["John", 0, 0, 0, 1]]]] }
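As a rough illustration of how that structure can be walked, the sketch below pairs each mention with the mention it refers to. It assumes the nesting seen in the example output (a list of chains, each chain a list of two-element pairs, each element a list whose first item is the mention text); the meaning of the numeric fields is not stated in this answer, so they are ignored. `coref_pairs` is a hypothetical helper name.

```python
def coref_pairs(result):
    """Yield (mention_text, referent_text) for every pair under result['coref']."""
    for chain in result.get("coref", []):
        for mention, referent in chain:
            # Index 0 holds the mention text; the remaining numeric
            # fields are left untouched here.
            yield mention[0], referent[0]


# The dictionary from the example above, copied verbatim.
result = {
    "coref": [[
        [["a Computer Scientist", 0, 4, 2, 5], ["John", 0, 0, 0, 1]],
        [["He", 1, 0, 0, 1], ["John", 0, 0, 0, 1]],
    ]]
}

for mention, referent in coref_pairs(result):
    print("%s -> %s" % (mention, referent))
```

Using `result.get("coref", [])` keeps the loop from failing when the server returns no coreference information for a text.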

I have tried this on Ubuntu 16.04. Use Java version 7 or 8.

Lynten · Answer

stanfordcorenlp is a relatively new wrapper and may work for you.

Suppose the text is "Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008."

The code:

# coding=utf-8
import json

from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2017-06-09', quiet=False)
props = {'annotators': 'coref', 'pipelineLanguage': 'en'}

text = 'Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.'
result = json.loads(nlp.annotate(text, properties=props))

# Take the first coreference chain. list() is needed because
# dict.items() cannot be indexed in Python 3.
num, mentions = list(result['corefs'].items())[0]
for mention in mentions:
    print(mention)

Every "mention" above is a Python dict like this:

{ "id": 0, "text": "Barack Obama", "type": "PROPER", "number": "SINGULAR", "gender": "MALE", "animacy": "ANIMATE", "startIndex": 1, "endIndex": 3, "headIndex": 2, "sentNum": 1, "position": [ 1, 1 ], "isRepresentativeMention": true }
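To go from these mention dicts to something more readable, one option is to group each chain around its representative mention. The sketch below assumes `corefs` maps a chain id to a list of dicts with the `text` and `isRepresentativeMention` keys shown above. `summarize_corefs` is a hypothetical helper, and the sample data is hand-written to match that shape; it is not actual CoreNLP output.

```python
def summarize_corefs(corefs):
    """Map each chain id to (representative_text, [other mention texts])."""
    summary = {}
    for chain_id, mentions in corefs.items():
        # The representative mention is the one flagged by CoreNLP;
        # fall back to None if no mention carries the flag.
        representative = next(
            (m["text"] for m in mentions if m.get("isRepresentativeMention")),
            None,
        )
        others = [
            m["text"] for m in mentions if not m.get("isRepresentativeMention")
        ]
        summary[chain_id] = (representative, others)
    return summary


# Hand-written sample in the shape of result['corefs'] (illustrative only).
sample_corefs = {
    "3": [
        {"text": "Barack Obama", "isRepresentativeMention": True},
        {"text": "He", "isRepresentativeMention": False},
        {"text": "Obama", "isRepresentativeMention": False},
    ]
}

for chain_id, (representative, others) in summarize_corefs(sample_corefs).items():
    print("%s: %s <- %s" % (chain_id, representative, ", ".join(others)))
```

Iterating over all chains this way avoids depending on which chain happens to come first in the dictionary.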

rtrtrt · Answer

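Stanford CoreNLP now has an official Python binding (called StanfordNLP), as you can read on the StanfordNLP website.

I don't think the native API supports the coref processor yet, but you can use the CoreNLPClient interface to call the "standard" CoreNLP (the original Java software) from Python.

So, after following the instructions to set up the Python wrapper (here), you can get the coreference chains like this: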

from stanfordnlp.server import CoreNLPClient

text = 'Barack was born in Hawaii. His wife Michelle was born in Milan. He says that she is very smart.'
print(f"Input text: {text}")

# set up the client
client = CoreNLPClient(properties={'annotators': 'coref', 'coref.algorithm': 'statistical'}, timeout=60000, memory='16G')

# submit the request to the server
ann = client.annotate(text)

mychains = list()
chains = ann.corefChain
for chain in chains:
    mychain = list()
    # Loop through every mention of this chain
    for mention in chain.mention:
        # Get the sentence in which this mention is located, and get the words which are part of this mention
        # (we can have more than one word, for example, a mention can be a pronoun like "he", but also a compound noun like "His wife Michelle")
        words_list = ann.sentence[mention.sentenceIndex].token[mention.beginIndex:mention.endIndex]
        # build a string out of the words of this mention
        ment_word = ' '.join([x.word for x in words_list])
        mychain.append(ment_word)
    mychains.append(mychain)

for chain in mychains:
    print(' <-> '.join(chain))

Igor · Answer

Maybe this works for you? https://github.com/dasmith/stanford-corenlp-python If not, you can try to combine the two yourself using http://www.jython.org/