


# Projects with at least 50 members with at least 240 yearly commits each
select project_id, count(*) as active_members from (
        select project_members.repo_id as project_id, project_members.user_id, count(*)
        from project_members
        inner join yearly_project_commits on project_members.user_id = yearly_project_commits.committer_id and project_members.repo_id = yearly_project_commits.project_id
        group by project_members.repo_id, project_members.user_id
        having count(*) > 240
) as active_member_projects
group by project_id
having count(*) > 50;

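The query above originally ran for several days without producing a result. When I interrupted it, MySQL was neither consuming significant CPU time nor issuing system calls (as shown by running strace on the mysqld process). At that point, show processlist gave the following output.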

| Id | User      | Host      | db        | Command | Time   | State        | Info                                                                                                 |
| 55 | ghtorrent | localhost | ghtorrent | Query   | 217907 | Sending data | select project_id, count(*) as active_members from (
        select project_members.repo_id as project_id,  |
| 69 | ghtorrent | localhost | ghtorrent | Query   |      0 | NULL         | show processlist                                                                                     |

I also tried running EXPLAIN on the query, but that got stuck as well, with show processlist displaying the following.

| Id | User      | Host      | db        | Command | Time | State                        | Info                                                                                                 |
| 10 | ghtorrent | localhost | ghtorrent | Query   |  564 | Copying to tmp table on disk | explain select populous_projects.name, members, count(populous_projects.project_id) as yearly_projec |
| 40 | ghtorrent | localhost | ghtorrent | Query   |    1 | NULL                         | show processlist                                                                                     |


| id | select_type | table                  | type  | possible_keys           | key        | key_len | ref                               | rows  | Extra       |
|  1 | SIMPLE      | project_members        | index | PRIMARY,user_id         | PRIMARY    | 8       | NULL                              | 53530 | Using index |
|  1 | SIMPLE      | yearly_project_commits | ref   | committer_id,project_id | project_id | 4       | ghtorrent.project_members.repo_id |   113 | Using where |


mysql> show indexes from yearly_project_commits;
| Table                  | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
| yearly_project_commits |          1 | committer_id |            1 | committer_id | A         |     2535097 |     NULL | NULL   | YES  | BTREE      |         |               |
| yearly_project_commits |          1 | project_id   |            1 | project_id   | A         |     7134168 |     NULL | NULL   |      | BTREE      |         |               |

mysql> show indexes from project_members;
| Table           | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
| project_members |          0 | PRIMARY  |            1 | repo_id     | A         |     5559879 |     NULL | NULL   |      | BTREE      |         |               |
| project_members |          0 | PRIMARY  |            2 | user_id     | A         |     5559879 |     NULL | NULL   |      | BTREE      |         |               |
| project_members |          1 | user_id  |            1 | user_id     | A         |     5559879 |     NULL | NULL   |      | BTREE      |         |               |

The corresponding create table statements are as follows.

| Table           | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| project_members | CREATE TABLE `project_members` (
  `repo_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `ext_ref_id` varchar(24) NOT NULL DEFAULT '0',
  PRIMARY KEY (`repo_id`,`user_id`),
  KEY `user_id` (`user_id`),
  CONSTRAINT `project_members_ibfk_1` FOREIGN KEY (`repo_id`) REFERENCES `projects` (`id`),
  CONSTRAINT `project_members_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`)
| yearly_project_commits | CREATE TABLE `yearly_project_commits` (
  `id` int(11) NOT NULL DEFAULT '0',
  `author_id` int(11) DEFAULT NULL,
  `committer_id` int(11) DEFAULT NULL,
  `project_id` int(11) NOT NULL DEFAULT '0',
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  KEY `committer_id` (`committer_id`),
  KEY `project_id` (`project_id`)

The output of show full processlist while EXPLAIN was running is as follows.

| Id | User      | Host      | db        | Command | Time  | State        | Info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| 43 | ghtorrent | localhost | ghtorrent | Query   | 53434 | Sending data | explain
    select project_id, count(*) as active_members from (
            select project_members.repo_id as project_id, project_members.user_id, count(*)
            from project_members
            inner join yearly_project_commits on project_members.user_id = yearly_project_commits.committer_id and project_members.repo_id = yearly_project_commits.project_id
            group by project_members.repo_id, project_members.user_id
            having count(*) > 240
    ) as active_member_projects
    group by project_id
    having count(*) > 50 |
| 47 | ghtorrent | localhost | ghtorrent | Query   |     0 | NULL         | SHOW FULL PROCESSLIST                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |

All of this happens on an otherwise idle machine running MySQL Server version 5.5.43-0+deb8u1 (Debian) on Debian GNU/Linux 8.0 (jessie). The machine has 24GB of RAM and is configured with innodb_buffer_pool_size=1GB.



For some queries, EXPLAIN tries to execute some of the subqueries in order to obtain statistics. This is a known bug that has been fixed in 5.6 and in MariaDB (at least in 10; I don't know about 5.5).
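
One way to sidestep this is to run EXPLAIN on the inner query alone. This is a sketch that assumes the stall comes from EXPLAIN executing the derived table in the FROM clause. The statement below is the subquery from the question with the outer select removed:

```sql
-- EXPLAIN only the inner join. With no derived table in the FROM clause,
-- EXPLAIN has no subquery to execute before it can report a plan.
explain
select project_members.repo_id as project_id, project_members.user_id, count(*)
from project_members
inner join yearly_project_commits
    on project_members.user_id = yearly_project_commits.committer_id
    and project_members.repo_id = yearly_project_commits.project_id
group by project_members.repo_id, project_members.user_id
having count(*) > 240;
```

This should return at once with one row per table in the join, which is enough to see which index the join uses.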


You should create a single multi-column index on both columns (project_id, committer_id). With that index, the subquery becomes an index-only scan that the join can access directly, which makes it much faster.
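
A minimal sketch of that index, using the table and column names from the question. The index name `project_committer` is a hypothetical choice, and building the index on a table of this size may take a while:

```sql
-- Composite index covering both join columns of yearly_project_commits.
-- The name project_committer is illustrative; any unused name works.
alter table yearly_project_commits
    add index project_committer (project_id, committer_id);
```

If the optimizer picks the new index, re-running EXPLAIN on the inner join should report `Using index` in the Extra column for yearly_project_commits.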
