Hive - External (Dynamic) Partitioned Table

Question

I have a table in MySQL, namely nas_comps:

select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code;

comp_code  count(leg_id)
'J'        20640
'Y'        39680

First, I imported the data into HDFS (Hadoop version 1.0.2) using Sqoop:

sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
  --username hadoopranch \
  --password hadoopranch \
  --query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
  -m 1 \
  --target-dir /pros/olap2/dataimports/nas_comps

Then I created an external, partitioned Hive table:

/*shows the partitions on 'describe' but not 'show partitions'*/
create external table nas_comps(DS_NAME string, DEP_DATE string,
  CRR_CODE string, FLIGHT_NO string, ORGN string,
  DSTN string, PHYSICAL_CAP int, ADJUSTED_CAP int,
  CLOSED_CAP int)
PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
location '/pros/olap2/dataimports/nas_comps';

When I run describe, the partition columns are shown:

hive> describe extended nas_comps;
OK
ds_name       string
dep_date      string
crr_code      string
flight_no     string
orgn          string
dstn          string
physical_cap  int
adjusted_cap  int
closed_cap    int
leg_id        int
month         int
comp_code     string

Detailed Table Information Table(tableName:nas_comps, dbName:pros_olap2_optim, owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null), FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code, type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null), FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string, comment:null), FieldSchema(name:physical_cap, type:int, comment:null), FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap, type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, comment:null)], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)

But I'm not sure whether the partitions were actually created:

hive> show partitions nas_comps;
OK
Time taken: 0.599 seconds

hive> select count(1) from nas_comps;

This returns 0 records.

How can I create an external Hive table with dynamic partitions?

dimamah · Accepted Answer

Hive doesn't create partitions this way.
Instead, create a table partitioned by the desired partition keys, then run INSERT OVERWRITE TABLE from the external table into the new partitioned table (after setting hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict).
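Applied to the question, that approach might look like the following sketch. It rests on several assumptions not stated in the question: the files Sqoop wrote are comma-delimited (Sqoop's default for text imports), the MySQL columns come in the order listed here (Sqoop's select * keeps the MySQL column order, which the question does not show), dep_date is a 'yyyy-MM-dd' string so that month() can derive the month, and the table names nas_comps_stage and nas_comps_part are hypothetical.

```sql
-- Sketch only: nas_comps_stage and nas_comps_part are hypothetical names, and the
-- column order is assumed to match the MySQL table that Sqoop exported.

-- 1. Unpartitioned external table over the files Sqoop wrote
--    (Sqoop's default text output uses ',' between fields).
CREATE EXTERNAL TABLE nas_comps_stage (
  ds_name STRING, dep_date STRING, crr_code STRING, flight_no STRING,
  orgn STRING, dstn STRING, physical_cap INT, adjusted_cap INT,
  closed_cap INT, leg_id INT, comp_code STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/pros/olap2/dataimports/nas_comps';

-- 2. Partitioned target table; partition columns are not repeated in the column list.
CREATE TABLE nas_comps_part (
  ds_name STRING, dep_date STRING, crr_code STRING, flight_no STRING,
  orgn STRING, dstn STRING, physical_cap INT, adjusted_cap INT,
  closed_cap INT, leg_id INT)
PARTITIONED BY (month INT, comp_code STRING);

-- 3. Enable dynamic partitioning, then load. The dynamic partition columns must be
--    the last expressions in the SELECT, in the same order as in PARTITION (...).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE nas_comps_part PARTITION (month, comp_code)
SELECT ds_name, dep_date, crr_code, flight_no, orgn, dstn,
       physical_cap, adjusted_cap, closed_cap, leg_id,
       month(dep_date), comp_code
FROM nas_comps_stage;
```

In this sketch leg_id is kept as an ordinary column rather than a partition key: a per-flight identifier would likely produce far more partitions than Hive's dynamic-partition limits allow by default.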

If the table has to stay external and partitioned, you must create the directories manually (one directory per partition, each named PARTITION_KEY=VALUE) and then run the MSCK REPAIR TABLE table_name; command.
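A minimal sketch of the manual-directory route, run from the Hive CLI (whose dfs command passes through to HDFS). It assumes a hypothetical base path nas_comps_by_code and a hypothetical table nas_comps_ext partitioned only by comp_code. Each partition directory must contain data files whose columns exclude comp_code, because Hive takes that value from the directory name. The directories can be created by hand as below, or by pointing one Sqoop import per partition at them with --target-dir.

```sql
-- Hypothetical layout, one key=value directory per partition:
--   /pros/olap2/dataimports/nas_comps_by_code/comp_code=J/<data files>
--   /pros/olap2/dataimports/nas_comps_by_code/comp_code=Y/<data files>
-- On Hadoop 1.x, -mkdir also creates missing parents; Hadoop 2+ needs -mkdir -p.
dfs -mkdir /pros/olap2/dataimports/nas_comps_by_code/comp_code=J;
dfs -mkdir /pros/olap2/dataimports/nas_comps_by_code/comp_code=Y;

-- Comma-delimited text, matching Sqoop's default output format (assumption).
CREATE EXTERNAL TABLE nas_comps_ext (
  ds_name STRING, dep_date STRING, crr_code STRING, flight_no STRING,
  orgn STRING, dstn STRING, physical_cap INT, adjusted_cap INT,
  closed_cap INT, leg_id INT)
PARTITIONED BY (comp_code STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/pros/olap2/dataimports/nas_comps_by_code';

-- Register every key=value directory under LOCATION that the metastore
-- does not know about yet, then check the result.
MSCK REPAIR TABLE nas_comps_ext;
SHOW PARTITIONS nas_comps_ext;
```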

Sandeep Singh · Answer

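Dynamic partitions

Partitions are added dynamically as records are inserted into a Hive table.

Supported only with INSERT statements.
Not supported with LOAD DATA statements.
Dynamic partitioning must be enabled before inserting data into the Hive table: hive.exec.dynamic.partition.mode=nonstrict (default value: strict) and hive.exec.dynamic.partition=true (default value: false before Hive 0.9.0, true from 0.9.0 on).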
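These settings apply per session and can be checked and changed from the Hive CLI, as in this sketch. The two hive.exec.max.dynamic.partitions* properties are not part of this answer; they are shown at their usual defaults only because they cap how many partitions a single dynamic insert may create.

```sql
-- SET with no "=value" prints the property's current value.
SET hive.exec.dynamic.partition;
SET hive.exec.dynamic.partition.mode;

-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Limits on partitions created by one statement, in total and per
-- mapper/reducer (shown at their usual defaults of 1000 and 100).
SET hive.exec.max.dynamic.partitions=1000;
SET hive.exec.max.dynamic.partitions.pernode=100;
```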

Dynamic partition query

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT INTO TABLE table_name PARTITION (loaded_date)
select * from table_name1 where loaded_date = 20151217;

Here, loaded_date is the partition column and 20151217 is its value.

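Limitations:

Dynamic partitioning works only with statements like the one above.
It creates partitions dynamically according to the values selected from the loaded_date column of table_name1.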
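A variant of the query above, as a sketch: list the columns explicitly with the partition column last and drop the WHERE clause, so a single statement creates one partition per distinct loaded_date value. The column names col1 and col2 and the table t2 are hypothetical placeholders.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- loaded_date must be the last expression in the SELECT list;
-- its value in each row decides which partition the row goes to.
INSERT INTO TABLE table_name PARTITION (loaded_date)
SELECT col1, col2, loaded_date
FROM table_name1;

-- In strict mode (the default), at least one partition column must be static
-- and static columns come first, e.g. for a hypothetical table t2
-- partitioned by (country, loaded_date):
--   INSERT INTO TABLE t2 PARTITION (country='US', loaded_date)
--   SELECT col1, col2, loaded_date FROM table_name1;
```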

If your case doesn't meet the criteria above, then:

first create the partitioned table, and then add the partitions explicitly:

ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2');
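For an external table whose partition directories already exist, each partition can also be registered with an explicit LOCATION. A sketch assuming a hypothetical external table nas_comps_ext partitioned by comp_code, with hypothetical directory paths:

```sql
-- IF NOT EXISTS skips the error raised when the partition is already registered.
ALTER TABLE nas_comps_ext ADD IF NOT EXISTS PARTITION (comp_code='J')
  LOCATION '/pros/olap2/dataimports/nas_comps_by_code/comp_code=J';
ALTER TABLE nas_comps_ext ADD IF NOT EXISTS PARTITION (comp_code='Y')
  LOCATION '/pros/olap2/dataimports/nas_comps_by_code/comp_code=Y';

SHOW PARTITIONS nas_comps_ext;
```

Unlike MSCK REPAIR TABLE, this form does not require the directories to follow the key=value naming convention, because each LOCATION is given explicitly.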

Or see this link on creating dynamic partitions.