HBaseシェルを使用してフィルターでスキャンする

Question

誰かがいくつかのスキャンフィルターに基づいてレコードをスキャンする方法を知っていますか？

column:something = "somevalue"

this のようなものですが、HBaseシェルからですか？

havanki4j · Accepted Answer

これを試して。それはちょっとugいですが、私にはうまくいきます。

import org.Apache.hadoop.hbase.filter.CompareFilter import org.Apache.hadoop.hbase.filter.SingleColumnValueFilter import org.Apache.hadoop.hbase.filter.SubstringComparator import org.Apache.hadoop.hbase.util.Bytes scan 't1', { COLUMNS => 'family:qualifier', FILTER => SingleColumnValueFilter.new (Bytes.toBytes('family'), Bytes.toBytes('qualifier'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('somevalue')) }

HBase Shellは〜/ .irbrcにあるものをすべて含むので、そこに次のようなものを入れることができます（私はRubyエキスパート、改善は大歓迎です）：

# imports like above def scan_substr(table,family,qualifier,substr,*cols) scan table, { COLUMNS => cols, FILTER => SingleColumnValueFilter.new (Bytes.toBytes(family), Bytes.toBytes(qualifier), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new(substr)) } end

そして、あなたはシェルで言うことができます：

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'

dape · Answer

scan 'test', {COLUMNS => ['F'],FILTER => \ "(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \ (SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

詳細については、こちらをご覧ください。添付のFilter Language.docxファイルには複数の例があります。

Tony · Answer

使用方法のヘルプに示されているように、scanのFILTERパラメーターを使用します。

hbase(main):002:0> scan ERROR: wrong number of arguments (0 for 1) Here is some help for this command: Scan a table; pass table name and optionally a dictionary of scanner specifications. Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, or COLUMNS. If no columns are specified, all columns will be scanned. To scan all members of a column family, leave the qualifier empty as in 'col_family:'. Some examples: hbase> scan '.META.' hbase> scan '.META.', {COLUMNS => 'info:regioninfo'} hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} hbase> scan 't1', {FILTER => org.Apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)} hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]} For experts, there is an additional option -- CACHE_BLOCKS -- which switches block caching for the scanner on (true) or off (false). By default it is enabled. Examples: hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Chetan Somani · Answer

フィルターの1つはValuefilterで、これを使用してすべての列値をフィルターできます。

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

binaryは、フィルター内で使用されるコンパレータの1つです。目的に応じて、フィルター内で異なるコンパレーターを使用できます。

次のURLを参照できます。http：// www.hadooptpoint.com/filters-in-hbase-Shell/。 HBase Shellでさまざまなフィルターを使用する方法の良い例を提供します。

KannarKK · Answer

Scan scan = new Scan(); FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL); //in case you have multiple SingleColumnValueFilters, you would want the row to pass MUST_PASS_ALL conditions or MUST_PASS_ONE condition. SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter( Bytes.toBytes("SOME COLUMN FAMILY" ), Bytes.toBytes("SOME COLUMN NAME"), CompareOp.EQUAL, Bytes.toBytes("SOME VALUE")); filter_by_name.setFilterIfMissing(true); //if you don't want the rows that have the column missing. Remember that adding the column filter doesn't mean that the rows that don't have the column will not be put into the result set. They will be, if you don't include this statement. list.addFilter(filter_by_name); scan.setFilter(list);

Prateek Mishra · Answer

クエリの最後にsetFilterIfMissing（true）を追加します

hbase(main):009:0> import org.Apache.hadoop.hbase.util.Bytes; import org.Apache.hadoop.hbase.filter.SingleColumnValueFilter; import org.Apache.hadoop.hbase.filter.BinaryComparator; import org.Apache.hadoop.hbase.filter.CompareFilter; import org.Apache.hadoop.hbase.filter. Filter; scan 'test:test8', { FILTER => SingleColumnValueFilter.new(Bytes.toBytes('account'), Bytes.toBytes('ACCOUNT_NUMBER'), CompareFilter::CompareOp.valueOf('EQUAL'), BinaryComparator.new(Bytes.toBytes('0003000587'))).setFilterIfMissing(true)}