How do I use Lucene's new AnalyzingInfixSuggester API to implement auto-suggest?

Question

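I am new to Lucene, and I want to implement auto-suggest like Google's: when I type a character such as "G", it shows me a list of suggestions. You can try it yourself.

I have searched all over the web and found nobody who has done this, although Lucene now gives us some new tools in its suggest package.

What I need is an example that shows me how to do it.

Can anyone help?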

John Wiseman · Answer

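Here is a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon and want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:

Ranked results: we suggest the most popular matching products first.
Region-restricted results: we only suggest products that we sell in the customer's country.
Product photos: we store product photo URLs in the suggestion index so that we can display them in the search results without an additional database lookup.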

First, I'll define a simple class in Product.java to hold information about a product:

    class Product implements java.io.Serializable {
        String name;
        String image;
        String[] regions;
        int numberSold;

        public Product(String name, String image, String[] regions,
                       int numberSold) {
            this.name = name;
            this.image = image;
            this.regions = regions;
            this.numberSold = numberSold;
        }
    }

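To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight of each record.

The key is the text you actually want to search on and autocomplete against. In this example, it is the name of the product.

The contexts are a set of additional, arbitrary data that you can use to filter records. In this example, the contexts are the set of ISO codes for the countries we ship a particular product to.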

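The payload is additional, arbitrary data that you want to store in the index for the record. In this example, we serialize each Product instance and store the resulting bytes as the payload. When we later do lookups, we can deserialize the payload and access information in the product instance, such as the image URL.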
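The payload round trip can be tried on its own, without Lucene. The following is a minimal sketch under stated assumptions: PayloadDemoProduct and PayloadRoundTrip are hypothetical names used only for this illustration, and the offset and length parameters mimic how a byte-range reference such as BytesRef may point into a larger backing array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Product so that this sketch compiles by itself.
class PayloadDemoProduct implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String image;
    final int numberSold;

    PayloadDemoProduct(String name, String image, int numberSold) {
        this.name = name;
        this.image = image;
        this.numberSold = numberSold;
    }
}

class PayloadRoundTrip {
    // Serialize any Serializable value to a byte array (the "payload").
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Deserialize from a slice of a byte array. Reading only the slice
    // [offset, offset + length) matters when the payload does not start
    // at index 0 of its backing array.
    static Object fromBytes(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Payload class not found", e);
        }
    }
}
```

Calling toBytes on a product and passing the result to fromBytes with offset 0 and the array's length should give back an equal copy of the original fields.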

The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales of a given product as its weight.
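To make the roles of contexts and weight concrete, here is a plain-Java illustration of the filtering and ordering described above. It is not how Lucene implements suggestions, only a sketch of the observable behaviour: SuggestSemanticsSketch, Entry, suggest and sampleProducts are hypothetical names, and matching is simplified to a case-insensitive prefix match of a single query word against any word of the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SuggestSemanticsSketch {
    // Hypothetical record type: a key, the contexts it belongs to, a weight.
    static final class Entry {
        final String key;
        final Set<String> contexts;
        final long weight;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    // The four sample products used in this answer.
    static List<Entry> sampleProducts() {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("Electric Guitar", 100, "US", "CA"));
        entries.add(new Entry("Electric Train", 100, "US", "CA"));
        entries.add(new Entry("Acoustic Guitar", 80, "US", "ZA"));
        entries.add(new Entry("Guarana Soda", 130, "ZA", "IE"));
        return entries;
    }

    // Keep entries in the given context whose key has a word starting with
    // the prefix, order them by descending weight, and return the top num.
    static List<String> suggest(List<Entry> entries, String prefix,
                                String context, int num) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        List<Entry> matches = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.contexts.contains(context)) {
                continue; // filtered out by context
            }
            for (String word : e.key.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(needle)) {
                    matches.add(e);
                    break;
                }
            }
        }
        matches.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (Entry e : matches.subList(0, Math.min(num, matches.size()))) {
            keys.add(e.key);
        }
        return keys;
    }
}
```

With this sample data, suggest(sampleProducts(), "Gu", "ZA", 2) returns Guarana Soda followed by Acoustic Guitar: both ship to ZA, and the soda has the higher weight.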

Here are the contents of ProductIterator.java:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Comparator;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;
    import org.apache.lucene.search.suggest.InputIterator;
    import org.apache.lucene.util.BytesRef;

    class ProductIterator implements InputIterator {
        private Iterator<Product> productIterator;
        private Product currentProduct;

        ProductIterator(Iterator<Product> productIterator) {
            this.productIterator = productIterator;
        }

        public boolean hasContexts() {
            return true;
        }

        public boolean hasPayloads() {
            return true;
        }

        public Comparator<BytesRef> getComparator() {
            return null;
        }

        // This method needs to return the key for the record; this is the
        // text we'll be autocompleting against.
        public BytesRef next() {
            if (productIterator.hasNext()) {
                currentProduct = productIterator.next();
                return new BytesRef(
                    currentProduct.name.getBytes(StandardCharsets.UTF_8));
            } else {
                return null;
            }
        }

        // This method returns the payload for the record, which is
        // additional data that can be associated with a record and
        // returned when we do suggestion lookups. In this example the
        // payload is a serialized Java object representing our product.
        public BytesRef payload() {
            try {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bos);
                out.writeObject(currentProduct);
                out.close();
                return new BytesRef(bos.toByteArray());
            } catch (IOException e) {
                throw new Error("Well that's unfortunate.");
            }
        }

        // This method returns the contexts for the record, which we can
        // use to restrict suggestions. In this example we use the
        // regions in which a product is sold.
        public Set<BytesRef> contexts() {
            Set<BytesRef> regions = new HashSet<>();
            for (String region : currentProduct.regions) {
                regions.add(
                    new BytesRef(region.getBytes(StandardCharsets.UTF_8)));
            }
            return regions;
        }

        // This method helps us order our suggestions. In this example we
        // use the number of products of this type that we've sold.
        public long weight() {
            return currentProduct.numberSold;
        }
    }

In our driver program, we will do the following:

Create an index directory in RAM.
Create a StandardAnalyzer.
Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
Index a number of products using ProductIterator.
Print the results of some sample lookups.

Here is the driver program, SuggestProducts.java:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
    import org.apache.lucene.search.suggest.Lookup;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.Version;

    public class SuggestProducts {
        // Get suggestions given a prefix and a region.
        private static void lookup(AnalyzingInfixSuggester suggester,
                                   String name, String region) {
            try {
                List<Lookup.LookupResult> results;
                HashSet<BytesRef> contexts = new HashSet<BytesRef>();
                contexts.add(
                    new BytesRef(region.getBytes(StandardCharsets.UTF_8)));
                // Do the actual lookup. We ask for the top 2 results.
                results = suggester.lookup(name, contexts, 2, true, false);
                System.out.println("-- \"" + name + "\" (" + region + "):");
                for (Lookup.LookupResult result : results) {
                    System.out.println(result.key);
                    Product p = getProduct(result);
                    if (p != null) {
                        System.out.println(" image: " + p.image);
                        System.out.println(" # sold: " + p.numberSold);
                    }
                }
            } catch (IOException e) {
                System.err.println("Error");
            }
        }

        // Deserialize a Product from a LookupResult payload.
        private static Product getProduct(Lookup.LookupResult result) {
            try {
                BytesRef payload = result.payload;
                if (payload != null) {
                    // A BytesRef may be a slice of a larger array, so read
                    // only the bytes between offset and offset + length.
                    ByteArrayInputStream bis = new ByteArrayInputStream(
                        payload.bytes, payload.offset, payload.length);
                    ObjectInputStream in = new ObjectInputStream(bis);
                    Product p = (Product) in.readObject();
                    return p;
                } else {
                    return null;
                }
            } catch (IOException|ClassNotFoundException e) {
                throw new Error("Could not decode payload :(");
            }
        }

        public static void main(String[] args) {
            try {
                RAMDirectory index_dir = new RAMDirectory();
                StandardAnalyzer analyzer =
                    new StandardAnalyzer(Version.LUCENE_48);
                AnalyzingInfixSuggester suggester =
                    new AnalyzingInfixSuggester(
                        Version.LUCENE_48, index_dir, analyzer);

                // Create our list of products.
                ArrayList<Product> products = new ArrayList<Product>();
                products.add(
                    new Product(
                        "Electric Guitar",
                        "http://images.example/electric-guitar.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Electric Train",
                        "http://images.example/train.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Acoustic Guitar",
                        "http://images.example/acoustic-guitar.jpg",
                        new String[]{"US", "ZA"},
                        80));
                products.add(
                    new Product(
                        "Guarana Soda",
                        "http://images.example/soda.jpg",
                        new String[]{"ZA", "IE"},
                        130));

                // Index the products with the suggester.
                suggester.build(new ProductIterator(products.iterator()));

                // Do some example lookups.
                lookup(suggester, "Gu", "US");
                lookup(suggester, "Gu", "ZA");
                lookup(suggester, "Gui", "CA");
                lookup(suggester, "Electric guit", "US");
            } catch (IOException e) {
                System.err.println("Error!");
            }
        }
    }

And here is the output from the driver program:

    -- "Gu" (US):
    Electric Guitar
     image: http://images.example/electric-guitar.jpg
     # sold: 100
    Acoustic Guitar
     image: http://images.example/acoustic-guitar.jpg
     # sold: 80
    -- "Gu" (ZA):
    Guarana Soda
     image: http://images.example/soda.jpg
     # sold: 130
    Acoustic Guitar
     image: http://images.example/acoustic-guitar.jpg
     # sold: 80
    -- "Gui" (CA):
    Electric Guitar
     image: http://images.example/electric-guitar.jpg
     # sold: 100
    -- "Electric guit" (US):
    Electric Guitar
     image: http://images.example/electric-guitar.jpg
     # sold: 100

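Addendum

There is a way to avoid writing a full InputIterator that you may find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods, and pass an instance of it to AnalyzingInfixSuggester's build method. Here a ProductIterator over an empty list serves as that stub, since its next method returns null right away: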

suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item you want to index, call AnalyzingInfixSuggester's add method:

suggester.add(text, contexts, weight, payload)

After you have indexed everything, call refresh:

suggester.refresh();

If you are indexing large amounts of data, you can speed up indexing significantly by using this approach with multiple threads: call build, then use multiple threads to add items, and finally call refresh.
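The threading pattern can be sketched without depending on Lucene. This is only an outline under stated assumptions: ParallelIndexingSketch and indexInParallel are hypothetical names, the addOne callback stands in for a call to suggester.add, and the refresh callback stands in for suggester.refresh. Because the real add and refresh methods throw IOException, the callbacks you pass in would need to catch it and rethrow it as an unchecked exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class ParallelIndexingSketch {
    // Runs addOne for every item on a fixed-size thread pool, waits for all
    // of the calls to finish, and only then runs refresh once.
    static <T> void indexInParallel(List<T> items, int threads,
                                    Consumer<T> addOne, Runnable refresh) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (T item : items) {
                pending.add(pool.submit(() -> addOne.accept(item)));
            }
            // Waiting on each future also surfaces any failure from addOne.
            for (Future<?> future : pending) {
                future.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An add call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        // Make the newly added items visible to lookups.
        refresh.run();
    }
}
```

In the product example, items would be the list of Product instances, addOne would build the key, contexts and payload for one product and call suggester.add, and refresh would call suggester.refresh.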

[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]