web-dev-qa-db-ja.com

PostgreSQLのネストされたテーブルから選択するより良い方法

マッチテーブルに数百万行の次のスキーマがあり、各マッチには2つの側面(アウェイとホーム)があります。 IDを名前で置き換える一致に関する最も重要なデータを表示するビューを作成したいと思います。名前は変更される可能性があります。これが、コーチとチームが別々のテーブルを使用する理由です。

schema

2つの方法を考えましたが、どちらにも満足できません。

CREATE VIEW "fumbbl"."matches" AS
  SELECT "fmid", "time",
    (SELECT "name" FROM "fumbbl"."division"
      WHERE "fdid" = m."fdid") "division",
    (SELECT "coachbracket" FROM "fumbbl"."side"
      WHERE "fmid" = m."fmid" AND "away" IS FALSE) "hbracket",
    (SELECT "name" FROM "fumbbl"."coach"
      WHERE "fcid" = (SELECT "fcid" FROM "fumbbl"."side"
        WHERE "fmid" = m."fmid" AND "away" IS FALSE)) "hcoach",
    (SELECT "name" FROM "fumbbl"."team"
      WHERE "ftid" = (SELECT "ftid" FROM "fumbbl"."side"
        WHERE "fmid" = m."fmid" AND "away" IS FALSE)) "hteam",
    (SELECT "kilotv" FROM "fumbbl"."side"
      WHERE "fmid" = m."fmid" AND "away" IS FALSE) "hktv",
    (SELECT "race" FROM "fumbbl"."team"
      WHERE "ftid" = (SELECT "ftid" FROM "fumbbl"."side"
        WHERE "fmid" = m."fmid" AND "away" IS FALSE)) "hrace",
    (SELECT "score" FROM "fumbbl"."side"
      WHERE "fmid" = m."fmid" AND "away" IS FALSE) "hscore",
    (SELECT "score" FROM "fumbbl"."side"
      WHERE "fmid" = m."fmid" AND "away" IS TRUE) "ascore",
    (SELECT "race" FROM "fumbbl"."team"
      WHERE "ftid" = (SELECT "ftid" FROM "fumbbl"."side"
        WHERE "fmid" = m."fmid" AND "away" IS TRUE)) "arace",
    (SELECT "kilotv" FROM "fumbbl"."side"
      WHERE "fmid" = m."fmid" AND "away" IS TRUE) "aktv",
    (SELECT "name" FROM "fumbbl"."team"
      WHERE "ftid" = (SELECT "ftid" FROM "fumbbl"."side"
        WHERE "fmid" = m."fmid" AND "away" IS TRUE)) "ateam",
    (SELECT "name" FROM "fumbbl"."coach"
      WHERE "fcid" = (SELECT "fcid" FROM "fumbbl"."side"
        WHERE "fmid" = m."fmid" AND "away" IS TRUE)) "acoach",
    (SELECT "coachbracket" FROM "fumbbl"."side"
      WHERE "fmid" = m."fmid" AND "away" IS TRUE) "abracket"
  FROM "fumbbl"."match" m ORDER BY "fmid" DESC;

これの問題は、同じサブクエリを複数回実行しているために効果がないことです。ただし、行数が限られている場合、2番目のパフォーマンスよりもパフォーマンスが向上します。ただし、すべてのデータで選択を行う場合、またはfmidで昇順の選択を行う場合は、速度が遅くなります。

説明の最初の行:

"Index Scan using match_fmid_desc on match m  (cost=0.42..46098616.62 rows=289261 width=14)"

ここで使用されるインデックスは

CREATE INDEX "match_fmid_desc" ON "fumbbl"."match" ("fmid" DESC);

2番目のアプローチでは、結合を使用します。

CREATE VIEW "fumbbl"."matches" AS
  SELECT
      m."fmid",
      m."time",
      (SELECT "name" FROM "fumbbl"."division"
         WHERE "fdid" = m."fdid") "division",
      "hbracket",
      (SELECT "name" FROM "fumbbl"."coach"
         WHERE "fcid" = hst."fcid") "hcoach",
      "hteam",
      "hktv",
      "hrace",
      "hscore",
      "ascore",
      "arace",
      "aktv",
      "ateam",
      (SELECT "name" FROM "fumbbl"."coach"
         WHERE "fcid" = ast."fcid") "acoach",
      "abracket"
    FROM (
      SELECT *
      FROM "fumbbl"."match"
      ) m
    JOIN (
      (SELECT
        "fmid", "away", "fcid", "ftid",
        "coachbracket" AS "hbracket",
        "kilotv" AS "hktv",
        "score" AS "hscore"
      FROM "fumbbl"."side") hside
      JOIN (
        SELECT
          "ftid",
          "name" AS "hteam",
          "race" AS "hrace"
        FROM "fumbbl"."team") ht
        ON (hside."ftid" = ht."ftid")
      ) hst
    ON (m."fmid" = hst."fmid" AND hst."away" IS FALSE)
    JOIN (
      (SELECT
        "fmid", "away", "fcid", "ftid",
        "coachbracket" AS "abracket",
        "kilotv" AS "aktv",
        "score" AS "ascore"
      FROM "fumbbl"."side") aside
      JOIN (
        SELECT
          "ftid",
          "name" AS "ateam",
          "race" AS "arace"
        FROM "fumbbl"."team") at
        ON (aside."ftid" = at."ftid")
      ) ast
    ON (m."fmid" = ast."fmid" AND ast."away" IS TRUE)
  ORDER BY "fmid" DESC;

説明の最初の行:

"Sort  (cost=7238489.68..7239207.00 rows=286927 width=84)"

これに関する私の問題は、単一の行を提供する前にテーブル全体を結合することです。ただし、データ全体でパフォーマンスが向上します。

私はPostgreSQLを初めて使用するので、この種のクエリを効果的に実行する方法を見逃していたのかもしれません。各反復で一度だけ両側のデータを取得する反復メソッドがあればうれしいです。

編集:説明値によってうまく機能したアーウィンの答えの後、私はまだ別の方法を試したかったです。マッチのサイドフィールドを取得することで、仕事のほとんどを実行する関数を作成しました。この関数はビューで使用されます。

CREATE OR REPLACE FUNCTION fumbbl.get_sides_data(
  in "fmid" integer,
  out "hcoach" character varying,
  out "hcoachbracket" "fumbbl"."bracket",
  out "hteam" character varying,
  out "hrace" character varying,
  out "hkilotv" smallint,
  out "hscore" smallint,
  out "acoach" character varying,
  out "acoachbracket" "fumbbl"."bracket",
  out "ateam" character varying,
  out "arace" character varying,
  out "akilotv" smallint,
  out "ascore" smallint
) AS
$$
DECLARE
  sfcid integer;
  sftid integer;
BEGIN
  SELECT s."fcid", s."coachbracket", s."kilotv", s."ftid", s."score"
    from fumbbl.side s
    WHERE (s."fmid" = $1 AND s."away" IS FALSE) INTO sfcid, $3, $6, sftid, $7;
  SELECT c."name" from fumbbl.coach c WHERE c."fcid" = sfcid INTO $2;
  SELECT t."name", t."race" from fumbbl.team t WHERE t."ftid" = sftid INTO $4, $5;

  SELECT s."fcid", s."coachbracket", s."kilotv", s."ftid", s."score"
    from fumbbl.side s
    WHERE (s."fmid" = $1 AND s."away" IS TRUE) INTO sfcid, $9, $12, sftid, $13;
  SELECT c."name" from fumbbl.coach c WHERE c."fcid" = sfcid INTO $8;
  SELECT t."name", t."race" from fumbbl.team t WHERE t."ftid" = sftid INTO $10, $11;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE VIEW fumbbl.matches4 AS
SELECT
m.fmid,
m.time,
(SELECT "name" FROM "fumbbl"."division" WHERE "fdid" = m."fdid") "division",
s."hcoach",
s."hcoachbracket",
s."hteam",
s."hrace",
s."hkilotv",
s."hscore",
s."acoach",
s."acoachbracket",
s."ateam",
s."arace",
s."akilotv",
s."ascore"
FROM fumbbl.match m, fumbbl.get_sides_data(m.fmid) s
ORDER  BY m.fmid DESC;

結論:

説明では関数を呼び出すビューの有効性を測定できないことに気づきました。そのため、オプション(Erwinを含む)をタイマーと比較することになりました。当然のことながら、オプション1(多くの選択)とオプション3(関数の呼び出し元)は、発熱のある行を使用した選択でパフォーマンスが向上します。オプション3は、私のデータベースの1〜10行を支配するように見えます。ただし、数十万行の場合は、JOINが適しています。 JOINを使用すると、単一の行を取得する前に操作全体を待機する必要があります。驚いたことに、Erwinの方法はオプション2(ネストされた結合)と同じくらい高速に見えるため、PGがバックグラウンドで何らかの最適化を行っているように感じます。それでも、Erwinのコードは見栄えがよく、より保守可能です。

4
SzieberthAdam

Postgres 9.4では、集約FILTER句を使用して単純化できます。

CREATE VIEW fumbbl.matches AS
SELECT m.fmid, m.time, d.name AS division
     , min(s.coachbracket) FILTER (WHERE NOT s.away) AS hbracket
     , min(t.name)         FILTER (WHERE NOT s.away) AS hteam
     , min(c.name)         FILTER (WHERE NOT s.away) AS hcoach
     , min(s.score)        FILTER (WHERE NOT s.away) AS hscore
     , min(t.race)         FILTER (WHERE NOT s.away) AS hrace
     , min(s.kilotv)       FILTER (WHERE NOT s.away) AS hktv

     , min(s.coachbracket) FILTER (WHERE s.away) AS abracket
     , min(t.name)         FILTER (WHERE s.away) AS ateam
     , min(c.name)         FILTER (WHERE s.away) AS acoach
     , min(s.score)        FILTER (WHERE s.away) AS ascore
     , min(t.race)         FILTER (WHERE s.away) AS arace
     , min(s.kilotv)       FILTER (WHERE s.away) AS aktv
FROM   fumbbl.match    m
JOIN   fumbbl.division d USING (fdid)
JOIN   fumbbl.side     s USING (fmid)
JOIN   fumbbl.team     t USING (ftid)
JOIN   fumbbl.coach    c USING (fcid)
GROUP  BY m.fmid, d.fdid  -- PK columns
ORDER  BY m.fmid DESC;

各テーブルに1回だけ結合します。しかし、ホームとアウェイサイドを1つの行に折りたたむには、集約ステップが必要です。

新しいFILTER句の詳細:

2