vendredi 20 mars 2015

SQLite: query optimization using index on range query

Here is a simplified version of my real problem: there is a table with two columns and an index on the table. That is,



CREATE TABLE T (a integer primary key, b integer not null);
CREATE INDEX Ti on T (b, a);


Column a is the primary key thus unique but column b can have many duplicates. And the query in question is like



SELECT * from T where b=5 and a>3 order by b, a limit 1;


My expectation is only one binary search should be enough to locate the smallest (b, a) pair that meets the condition if it makes full use of the index. Moreover, the document on SQLite query planner specifically states that



the index might be used if the initial columns of the index (columns a, b, and so forth) appear in WHERE clause terms. The initial columns of the index must be used with the = or IN or IS NULL operators. The right-most column that is used can employ inequalities. For the right-most column of an index that is used, there can be up to two inequalities that must sandwich the allowed values of the column between two extremes.



but the result by explain and explain query plan is quite disappointing. (sqlite3 3.8.8.3)



sqlite> explain query plan select * from T where b=5 and a>3 order by b,a limit 1;
0 0 0 SEARCH TABLE T USING COVERING INDEX Ti (b=?)

sqlite> explain select * from T where b=5 and a>3 order by b,a limit 1;
5 SeekGE 2 14 2 1 00
6 IdxGT 2 14 2 1 00
7 IdxRowid 2 3 0 00
8 Le 4 13 3 54
9 Copy 3 5 0 00
10 Column 2 0 6 00
11 ResultRow 5 2 0 00
12 IfZero 1 14 -1 00
13 Next 2 6 0 00


Definitely it's using the index only for locating the first row with b=5 and then doing linear scan to find rows with a>3. When there are only few rows with duplicating b values it might be okay, but it can be a problem otherwise. Instead of SeekGE with p4=1, SeekGT with p4=2 and IdxGT with p4=1 can be much more efficient because it can directly locate the right row with only one binary search.


So the question is, of course there'd be not much I can do if you say that's just the way it is at this point, but just in case is there anything I'm missing that could make it work better? ANALYZE is not an option because this is about that there is a generally better way to handle the range query not only on specific data set.


Aucun commentaire:

Enregistrer un commentaire