lundi 19 octobre 2015

Worse query plan with a JOIN after ANALYZE

I see that running ANALYZE results in significantly poor performance on a particular JOIN I'm making between two tables.

Suppose the following schema:

CREATE TABLE a ( id INTEGER PRIMARY KEY, name TEXT );
CREATE TABLE b ( a NOT NULL REFERENCES a, value INTEGER, PRIMARY KEY(a, b) );

CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY a.id
ORDER BY a.name

Table a is approximately 10K rows, table b is approximately 48K rows (~5 rows per row in table a).

Before ANALYZE

Now when I run the following query:

SELECT * FROM ab;

The query plan looks as follows:

1|0|0|SCAN TABLE b
1|1|1|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)

This is a good plan, b is larger and I want it to be in the outer loop, making use of the index in table a. It finishes well within a second.

After ANALYZE

When I execute the same query again, the query plan results in two table scans:

1|0|1|SCAN TABLE a
1|1|0|SCAN TABLE b

This is far for optimal. For some reason the query planner thinks that an outer loop of 10K rows and an inner loop of 48K rows is a better fit. This takes about 1.5 minute to complete.

Should I adapt the index in table b to make it work after ANALYZE? Anything else to change to the indexing/schema?

I just try to understand the problem here. I worked around it using a CROSS JOIN, but that feels dirty and I don't really understand why the planner would go with a plan that is orders of magnitude slower than the un-analyzed plan. It seems to be related to GROUP BY, since the query planner puts table b in the outer loop without it (but that renders the query useless for what I want).

Aucun commentaire:

Enregistrer un commentaire