Converting rows to columns

Question

I have data like this:

     created_at          | status
    ---------------------+--------
     2016-04-05 1:27:15  | info
     2016-04-05 3:27:15  | info
     2016-04-05 5:27:15  | warn
     2016-04-05 10:27:15 | info
     2016-04-05 11:27:15 | warn

Using this data, I want to transform it into this:

     status | 2016-04-05 1:00:00 | 2016-04-05 4:00:00 | 2016-04-05 8:00:00 | 2016-04-05 12:00:00
    --------+--------------------+--------------------+--------------------+---------------------
     info   |                  1 |                  1 |                  0 |                   1
     warn   |                  0 |                  0 |                  1 |                   1

Can anyone suggest the best way to do this?

Erwin Brandstetter · Answer

Assuming _2016-04-05 0:27:15_ instead of ~~_2016-04-05 1:27:15_~~ in the underlying table, the question makes more sense to me:

    CREATE TABLE tbl (created_at timestamp, status text);

    INSERT INTO tbl VALUES
      ('2016-04-05 00:27:15', 'info')
    , ('2016-04-05 03:27:15', 'info')
    , ('2016-04-05 05:27:15', 'warn')
    , ('2016-04-05 10:27:15', 'info')
    , ('2016-04-05 11:27:15', 'warn');

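The logic is to count the events that happened up to, but excluding, the next bound. That is a perfect fit for the often overlooked function width_bucket(). To be precise, it takes the variant with arbitrary bounds that was introduced in Postgres 9.5, since there is no regular pattern in the OP's bounds. The explanation straight from the manual: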

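    width_bucket(operand anyelement, thresholds anyarray)

Returns the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns _0_ for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained.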
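To make the bucket numbering concrete, here is a minimal sketch that applies the array variant of width_bucket() to the sample timestamps. The last value (12:30) is not part of the original data; it is a made-up row added only to show what happens past the final bound. The snippet assumes Postgres 9.5 or later.

```sql
-- Bucket numbers for the sample timestamps, using the array variant of width_bucket().
-- The final row (12:30) is a hypothetical value, added for illustration only.
SELECT created_at
     , width_bucket(created_at, '{2016-04-05 01:00
                                , 2016-04-05 04:00
                                , 2016-04-05 08:00
                                , 2016-04-05 12:00}'::timestamp[]) AS bucket
FROM  (
   VALUES (timestamp '2016-04-05 00:27:15')  -- before the first bound -> 0
        , ('2016-04-05 03:27:15')            -- >= 01:00 and < 04:00   -> 1
        , ('2016-04-05 05:27:15')            -- >= 04:00 and < 08:00   -> 2
        , ('2016-04-05 10:27:15')            -- >= 08:00 and < 12:00   -> 3
        , ('2016-04-05 11:27:15')            -- >= 08:00 and < 12:00   -> 3
        , ('2016-04-05 12:30:00')            -- at or after last bound -> 4
   ) v(created_at);
```

Buckets 0 to 3 correspond to the four time windows in the question, and bucket 4 collects everything at or after the last bound.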

For regular buckets you could use another variant, which is also available in Postgres 9.1.
Combine it with crosstab(), re-using the same bounds as column names (the rest of the query works with Postgres 9.1):

    SELECT * FROM crosstab(
       $$SELECT status
              , width_bucket(created_at, '{2016-04-05 01:00
                                         , 2016-04-05 04:00
                                         , 2016-04-05 08:00
                                         , 2016-04-05 12:00}'::timestamp[])
              , count(*)::int
         FROM   tbl
         WHERE  created_at < '2016-04-05 12:00'  -- exclude later rows
         GROUP  BY 1, 2
         ORDER  BY 1, 2$$
     , 'SELECT generate_series(0,3)'
       ) AS t(status text, "2016-04-05 01:00" int
                         , "2016-04-05 04:00" int
                         , "2016-04-05 08:00" int
                         , "2016-04-05 12:00" int);

Result:

     status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
    --------+------------------+------------------+------------------+------------------
     info   |                1 |                1 |                  |                1
     warn   |                  |                  |                1 |                1

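The second crosstab parameter ('SELECT generate_series(0,3)') is a query string that, when executed, returns one row for every target column. Every value not found on both sides (not in the raw data, or not generated by the second parameter) is simply ignored.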
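A minimal sketch of that behavior, using made-up inline rows instead of the table. The values and the column names b0 to b3 are illustrative assumptions, and the snippet assumes the tablefunc extension can be installed in the current database.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;  -- provides crosstab()

SELECT *
FROM   crosstab(
   $$SELECT *
     FROM  (VALUES ('info', 0, 1)   -- (row name, category, value)
                 , ('info', 3, 2)
                 , ('warn', 2, 5)
                 , ('warn', 9, 7)   -- category 9 is not produced by the 2nd parameter
           ) v(status, slot, ct)
     ORDER  BY 1, 2$$
 , 'SELECT generate_series(0,3)'    -- one row per target column: 0, 1, 2, 3
   ) AS t(status text, b0 int, b1 int, b2 int, b3 int);

-- status | b0 | b1 | b2 | b3
-- info   |  1 |    |    |  2
-- warn   |    |    |  5 |
```

Categories without a source row come back as NULL, and the row with category 9 is dropped because no target column matches it.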

Basics for crosstab():

PostgreSQL Crosstab Query

Replace NULL with 0

If you need _0_ instead of NULL in the result, fix it with COALESCE(), but that is merely a cosmetic problem:

    SELECT status
         , COALESCE(t0, 0) AS "2016-04-05 01:00"
         , COALESCE(t1, 0) AS "2016-04-05 04:00"
         , COALESCE(t2, 0) AS "2016-04-05 08:00"
         , COALESCE(t3, 0) AS "2016-04-05 12:00"
    FROM   crosstab(
       $$SELECT status
              , width_bucket(created_at, '{2016-04-05 01:00
                                         , 2016-04-05 04:00
                                         , 2016-04-05 08:00
                                         , 2016-04-05 12:00}'::timestamp[])
              , count(*)::int
         FROM   tbl
         WHERE  created_at < '2016-04-05 12:00'
         GROUP  BY 1, 2
         ORDER  BY 1, 2$$
     , 'SELECT generate_series(0,3)'
       ) AS t(status text, t0 int, t1 int, t2 int, t3 int);

Result:

     status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
    --------+------------------+------------------+------------------+------------------
     info   |                1 |                1 |                0 |                1
     warn   |                0 |                0 |                1 |                1

Add totals

To add totals per status, use the new _GROUPING SETS_ in Postgres 9.5 or later:

    SELECT status
         , COALESCE(t0, 0) AS "2016-04-05 01:00"
         , COALESCE(t1, 0) AS "2016-04-05 04:00"
         , COALESCE(t2, 0) AS "2016-04-05 08:00"
         , COALESCE(t3, 0) AS "2016-04-05 12:00"
         , COALESCE(t4, 0) AS total
    FROM   crosstab(
       $$SELECT status, COALESCE(slot, -1), ct  -- special slot for totals
         FROM  (
            SELECT status
                 , width_bucket(created_at, '{2016-04-05 01:00
                                            , 2016-04-05 04:00
                                            , 2016-04-05 08:00
                                            , 2016-04-05 12:00}'::timestamp[]) AS slot
                 , count(*)::int AS ct
            FROM   tbl
            WHERE  created_at < '2016-04-05 12:00'
            GROUP  BY GROUPING SETS ((1, 2), 1)  -- add totals per status
            ORDER  BY 1, 2
            ) sub$$
     , 'VALUES (0), (1), (2), (3), (-1)'  -- switched to VALUES for more sophisticated series
       ) AS t(status text, t0 int, t1 int, t2 int, t3 int, t4 int);

Result like above, plus:

    ...  | total
    ... -+-------
    ...  |     3
    ...  |     2

Note that total includes all rows that were not excluded before the aggregation, even if they are later filtered out by crosstab().
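To see why a special slot for totals is needed, it can help to run the aggregate on its own, without crosstab(). The sketch below inlines the sample rows as a CTE named tbl (an assumption made so the snippet is self-contained; it shadows the real table for this query only) and groups by column names rather than positions. It requires Postgres 9.5 or later.

```sql
WITH tbl(created_at, status) AS (
   VALUES (timestamp '2016-04-05 00:27:15', 'info')
        , ('2016-04-05 03:27:15', 'info')
        , ('2016-04-05 05:27:15', 'warn')
        , ('2016-04-05 10:27:15', 'info')
        , ('2016-04-05 11:27:15', 'warn')
   )
SELECT status, slot, count(*)::int AS ct
FROM  (
   SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[]) AS slot
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'
   ) sub
GROUP  BY GROUPING SETS ((status, slot), (status))  -- detail rows plus one total row per status
ORDER  BY status, slot;

-- status | slot | ct
-- info   |    0 |  1
-- info   |    1 |  1
-- info   |    3 |  1
-- info   |      |  3   <- total row: slot is NULL
-- warn   |    2 |  1
-- warn   |    3 |  1
-- warn   |      |  2   <- total row: slot is NULL
```

The total rows carry NULL in slot. Wrapping slot in COALESCE(slot, -1) turns that NULL into an ordinary category value that the second crosstab() parameter can list.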

This is in reply to @Vérace's request in a comment, not to the unclear question.

Vérace · Answer

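In order to answer this, I did the following. (The following three threads were helpful - 1, 2 and 3. I also found the material on generate_series and CROSSTAB here and here useful.) This should work on 9.1. It has not been tested there, but the documentation indicates that nothing newer than 9.1 is used.

I created a table: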

ntest=# create table pv_tab(created_at timestamp, status varchar(10));

And populated it:

    INSERT INTO pv_tab VALUES('2016-04-05 01:27:15', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 03:27:15', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 05:27:15', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 10:27:15', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 11:27:15', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 00:27:15', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 00:24:15', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 00:24:13', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 00:24:13', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 01:24:13', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 01:24:13', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 01:12:13', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 01:12:22', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 02:05:45', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 02:34:45', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 10:34:45', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 10:35:45', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 10:36:45', 'warn');
    INSERT INTO pv_tab VALUES('2016-04-05 10:36:45', 'info');
    INSERT INTO pv_tab VALUES('2016-04-05 10:36:34', 'info');

    (20 rows)

The (correct) result of my query is:

    stat  slot1  slot2  slot3  slot4  slot5  slot6  Total
    ----  -----  -----  -----  -----  -----  -----  -----
    info      6      2      0      3      0      0     11
    warn      3      1      1      4      0      0      9

The query that works is as follows:

    SELECT * FROM CROSSTAB
    (
      '
      WITH time_slots AS
      (
        SELECT status,
        CASE
          -- Here I put the "created_at" values into "buckets" - it would
          -- not be desirable to have too many of these buckets - certainly
          -- any more than 12 would make the `SQL` and result unwieldy!
          -- I recommend that you create 2hr slots - 00:00 - 02:00, &c.
          -- This `CTE` splits the times into the various sample slots
          -- 1-4 - you can, of course, have more but it makes the `SQL` and
          -- the answer more messy. Here, I;ve deliberately only used 4
          -- out of 6 in order to illustrate dealing with sparse data in
          -- the result. (I used the OP;s initial slots - easy to change).
          WHEN created_at <  ''2016-04-05 02:00'' THEN 1
          WHEN created_at >= ''2016-04-05 02:00'' AND created_at < ''2016-04-05 04:00'' THEN 2
          WHEN created_at >= ''2016-04-05 04:00'' AND created_at < ''2016-04-05 08:00'' THEN 3
          WHEN created_at >= ''2016-04-05 08:00'' AND created_at < ''2016-04-05 12:00'' THEN 4
        END AS time_slot,
        COUNT(status) AS stat_count
        FROM pv_tab
        GROUP BY status, time_slot
        ORDER BY status, time_slot
      ),
      statuae AS
      -- Get all statuses. Hardly necessary when there are only two, but
      -- could be an issue later if more values are required ("unknown".. &c.).
      (
        SELECT DISTINCT(status) AS stati
        FROM pv_tab
      ),
      all_slots (slots) AS
      -- This `CTE` is necessary to perform a cross-join between statuses
      -- and slots. This is because the `CROSSTAB` table function doesn;t
      -- appear to play well with `NULL`s - see my question to Erwin
      -- Brandstetter in comments.
      (
        SELECT generate_series(1, 6)
        -- six (should be) 2 hour slots. In any case, it is arbitrary!
      ),
      stat_slots AS
      -- Here the statuses and slots are cross-joined - i.e. all slots with all statuses.
      (
        SELECT statuae.stati, all_slots.slots
        FROM statuae, all_slots
      ),
      individual_stati AS
      -- `Left-join` the complete status/slot table with the actual slots in
      -- the sample table. NULL counts are `COALESCE`ed into 0 - necessary, otherwise
      -- `NULL`s "back up" the result and leave blanks in the right-most
      -- columns - and the totals appear in what should be slots.
      (
        SELECT ss.stati AS status, ss.slots AS time_slot, COALESCE(ts.stat_count, 0) AS counts
        FROM stat_slots ss
        LEFT JOIN time_slots ts
        ON ss.stati = ts.status AND ss.slots = ts.time_slot
        ORDER BY 1, 2
      ),
      total_stati AS
      -- This is just pure showing off :-). I;m using this `CTE` to add
      -- a totals field to the query. Not asked for by the OP - can be
      -- ripped out! I got the idea for this from the 3rd link (top of post).
      (
        SELECT status, 7 AS time_slot, count(status) AS counts  -- 7 - an extra slot for totals
        FROM pv_tab
        GROUP BY status
      )
      -- Final query bringing it all together - Nice, simple and elegant. :-)
      SELECT status, time_slot, counts FROM individual_stati
      UNION
      SELECT status, time_slot, counts FROM total_stati
      ORDER BY 1, 2
      '
    ) AS My_Tab("stat" varchar(10), "slot1" bigint, "slot2" bigint, "slot3" bigint,
                "slot4" bigint, "slot5" bigint, "slot6" bigint, "Total" bigint);
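The cross join plus left join step is what keeps the columns aligned: the one-parameter form of crosstab() fills the value columns from left to right, so every status needs a row for every slot. Here is a stripped-down sketch of just that step. The CTE names mirror the query above, but the counts are made up and only three slots are used, purely for illustration.

```sql
WITH time_slots(status, time_slot, stat_count) AS (
   VALUES ('info', 1, 6)   -- sparse input: info has no row for slot 3,
        , ('info', 2, 2)   -- warn has no rows for slots 2 and 3
        , ('warn', 1, 3)
   )
, all_slots(slots) AS (
   SELECT generate_series(1, 3)
   )
SELECT st.status
     , a.slots                    AS time_slot
     , COALESCE(ts.stat_count, 0) AS counts   -- missing combinations become 0
FROM  (SELECT DISTINCT status FROM time_slots) st
CROSS  JOIN all_slots a
LEFT   JOIN time_slots ts ON ts.status = st.status
                         AND ts.time_slot = a.slots
ORDER  BY 1, 2;

-- status | time_slot | counts
-- info   |         1 |      6
-- info   |         2 |      2
-- info   |         3 |      0
-- warn   |         1 |      3
-- warn   |         2 |      0
-- warn   |         3 |      0
```

Every status now has exactly one row per slot, so the counts cannot shift into the wrong column when they are pivoted.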