Is this layout for 3D point data suitable for spatial queries in Postgres?

Question

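As presented in another question, I deal with many (> 10,000,000) entries of points in 3D space. These points are defined like this:

    CREATE TYPE float3d AS (x real, y real, z real);

If I am not mistaken, storing one of these points takes 3 * 8 bytes + 8 bytes of padding (MAXALIGN is 8). Is there a better way to store this kind of data? In the question mentioned above it was stated that composite types come with considerable overhead.

I often run spatial queries like this: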

    SELECT t1.id, t1.parent_id,
           (t1.location).x, (t1.location).y, (t1.location).z,
           t1.confidence, t1.radius, t1.skeleton_id, t1.user_id,
           t2.id, t2.parent_id,
           (t2.location).x, (t2.location).y, (t2.location).z,
           t2.confidence, t2.radius, t2.skeleton_id, t2.user_id
    FROM   treenode t1
    INNER  JOIN treenode t2 ON (   (t1.id = t2.parent_id OR t1.parent_id = t2.id)
                                OR (t1.parent_id IS NULL AND t1.id = t2.id))
    WHERE  (t1.LOCATION).z = 41000.0
    AND    (t1.LOCATION).x > 2822.6
    AND    (t1.LOCATION).x < 62680.2
    AND    (t1.LOCATION).y > 33629.8
    AND    (t1.LOCATION).y < 65458.6
    AND    t1.project_id = 1
    LIMIT  5000;

A query like this takes about 160 ms, and I wonder whether this could be reduced.

This is the table layout in which the structure is used:

        Column     |           Type           |                       Modifiers
    ---------------+--------------------------+-------------------------------------------------------
     id            | bigint                   | not null default nextval('location_id_seq'::regclass)
     user_id       | integer                  | not null
     creation_time | timestamp with time zone | not null default now()
     edition_time  | timestamp with time zone | not null default now()
     project_id    | integer                  | not null
     location      | float3d                  | not null
     editor_id     | integer                  |
     parent_id     | bigint                   |
     radius        | real                     | not null default 0
     confidence    | smallint                 | not null default 5
     skeleton_id   | integer                  | not null
    Indexes:
        "treenode_pkey" PRIMARY KEY, btree (id)
        "treenode_parent_id" btree (parent_id)
        "treenode_project_id_location_x_index" btree (project_id, ((location).x))
        "treenode_project_id_location_y_index" btree (project_id, ((location).y))
        "treenode_project_id_location_z_index" btree (project_id, ((location).z))
        "treenode_project_id_skeleton_id_index" btree (project_id, skeleton_id)
        "treenode_project_id_user_id_index" btree (project_id, user_id)
        "treenode_skeleton_id_index" btree (skeleton_id)

Erwin Brandstetter · Accepted Answer

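The composite type is a clean design, but it does not help performance at all.

First of all, float translates to float8 a.k.a. double precision in Postgres. You are building on a misunderstanding.
The real data type occupies 4 bytes (not 8). It has to be aligned at multiples of 4 bytes.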

Measure the actual size with pg_column_size().

An SQL Fiddle demonstrates the actual sizes.

The composite type real3d occupies 36 bytes. That's:

    23 byte tuple header
     1 byte padding
     4 bytes real x
     4 bytes real y
     4 bytes real z
    ---
    36 bytes

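If you embed it in a table, padding may have to be added. On the other hand, the header of the type can be 3 bytes smaller on disk. The on-disk representation is typically a bit smaller than the representation in RAM. It does not make much of a difference.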
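The measurement described above can be sketched as follows. This is only an illustration: it assumes the float3d type and the treenode table from the question exist, and the column aliases are made up for readability.

```sql
-- In-memory size of a single real vs. the composite type.
-- Assumes the float3d type from the question has been created.
SELECT pg_column_size(1::real)            AS one_real,
       pg_column_size('(1,2,3)'::float3d) AS composite_value;

-- Average stored size of the composite column, next to the raw
-- payload of three real values for comparison.
SELECT avg(pg_column_size(location)) AS avg_location_size,
       3 * pg_column_size(1::real)   AS three_reals
FROM   treenode;
```

The gap between the two numbers in the second query is the overhead the composite type adds per row.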


Table layout

Use this equivalent design to reduce the row size substantially:

        Column     |           Type           |            Modifiers
    ---------------+--------------------------+---------------------------------
     id            | bigint                   | not null default nextval(...
     creation_time | timestamp with time zone | not null default now()
     edition_time  | timestamp with time zone | not null default now()
     user_id       | integer                  | not null
     project_id    | integer                  | not null
     location_x    | real                     | not null
     location_y    | real                     | not null
     location_z    | real                     | not null
     radius        | real                     | not null default 0
     skeleton_id   | integer                  | not null
     confidence    | smallint                 | not null default 5
     parent_id     | bigint                   |
     editor_id     | integer                  |
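One possible way to move to this layout is sketched below. The table name treenode_new and the index and constraint names are hypothetical, and only one of the existing indexes is recreated here. The column order appears to put the 8-byte columns first, then the 4-byte ones, then the smallint, with the nullable columns last, which keeps alignment padding low.

```sql
-- Sketch only: treenode_new and the object names below are made up.
-- The remaining indexes from the original table would have to be recreated too.
BEGIN;

CREATE TABLE treenode_new (
    id            bigint      NOT NULL DEFAULT nextval('location_id_seq'::regclass),
    creation_time timestamptz NOT NULL DEFAULT now(),
    edition_time  timestamptz NOT NULL DEFAULT now(),
    user_id       integer     NOT NULL,
    project_id    integer     NOT NULL,
    location_x    real        NOT NULL,
    location_y    real        NOT NULL,
    location_z    real        NOT NULL,
    radius        real        NOT NULL DEFAULT 0,
    skeleton_id   integer     NOT NULL,
    confidence    smallint    NOT NULL DEFAULT 5,
    parent_id     bigint,
    editor_id     integer
);

-- Copy the data, unpacking the composite column into three plain columns.
INSERT INTO treenode_new
      (id, creation_time, edition_time, user_id, project_id,
       location_x, location_y, location_z,
       radius, skeleton_id, confidence, parent_id, editor_id)
SELECT id, creation_time, edition_time, user_id, project_id,
       (location).x, (location).y, (location).z,
       radius, skeleton_id, confidence, parent_id, editor_id
FROM   treenode;

ALTER TABLE treenode_new
  ADD CONSTRAINT treenode_new_pkey PRIMARY KEY (id);

-- Plain column index replacing the expression index on ((location).z).
CREATE INDEX treenode_new_project_id_location_z_idx
  ON treenode_new (project_id, location_z);

COMMIT;
```

With this layout, the filter in the question's query would read location_z = 41000.0 AND location_x > 2822.6 and so on, instead of going through (location).z.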

Test before and after to verify my claim:

    SELECT pg_relation_size('treenode') AS table_size;
    SELECT avg(pg_column_size(t)) AS avg_row_size FROM treenode t;

Details:

Measure the size of a PostgreSQL table row

Evan Carroll · Answer

PostGIS

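PostGIS is easy to use and provides a set of functions for this kind of query. The PostGIS solution is simple and uses an index as well. Here I use 3D (to accommodate z), though I am not sure whether you need it.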

    CREATE EXTENSION IF NOT EXISTS postgis;

    -- PointZ so that the column accepts points with a z coordinate
    CREATE TABLE t (geom geometry(PointZ));

    -- x, y, z stand for your coordinate values
    INSERT INTO t (geom) VALUES (ST_MakePoint(x, y, z));

    CREATE INDEX idx ON t USING gist (geom gist_geometry_ops_nd);

    SELECT *
    FROM   t
    WHERE  geom &&& ST_3DMakeBox(
              ST_MakePoint(2822.6, 33629.8, 41000.0),
              ST_MakePoint(62680.2, 65458.6, 41000.0)
           );

If all of your points and geoms have z = 41000, just use a 2D geom instead.
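A minimal sketch of that 2D variant, assuming a hypothetical table t2d that stores only x and y (the inserted point is a made-up sample value):

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical table: z is implied by the slice, so only x and y are stored.
CREATE TABLE t2d (geom geometry(Point));

INSERT INTO t2d (geom) VALUES (ST_MakePoint(3000.0, 40000.0));  -- sample point

-- The default GiST operator class indexes 2D bounding boxes.
CREATE INDEX t2d_geom_idx ON t2d USING gist (geom);

-- && tests 2D bounding-box overlap; the envelope uses the x/y range
-- from the question's query.
SELECT *
FROM   t2d
WHERE  geom && ST_MakeEnvelope(2822.6, 33629.8, 62680.2, 65458.6);
```

Both ST_MakePoint and ST_MakeEnvelope (without an SRID argument) produce geometries with an unknown SRID, so the comparison does not mix coordinate systems.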