Question Summary
I can read all values out of the single column of a one-column table quite quickly. How can I read all values just as quickly from a single column of a table that has several other columns as well?
Details
I'm using the C++ API to read an SQLite database containing a single table with 2.2 million records.
The data has a "coordinates" column and, optionally, several other columns. The "coordinates" column is a BLOB, currently always 8 bytes long. The other columns are a mix of TEXT and REAL, with the TEXT strings ranging from a few characters to about 100 (the lengths vary record by record).
In one experiment, I created the table with the "coordinates" column, plus about 15 other columns. The total database file size was 745 MB. I did a simple
int rc = sqlite3_exec( db, "select coordinates from facilities", ReadSQLiteCallback, NULL, &errorMessage );
and it took 91 seconds to execute.
I then created the table with just the "coordinates" column and no other data columns. The total database file size was 36 MB. I ran the same select statement and it took 1.23 seconds.
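For reference, a stripped-down, self-contained version of the kind of read loop I'm timing looks like this (the database file name is a placeholder, and the callback is reduced to a stub; assume the real callback does little more than copy each value out):

#include <sqlite3.h>
#include <cstdio>

// Stub callback: sqlite3_exec invokes this once per row, with each value
// delivered as a nul-terminated string in argv.
static int ReadSQLiteCallback( void *notUsed, int argc, char **argv, char **colNames )
{
    // The real callback would copy the coordinate value out here.
    return 0;   // returning non-zero aborts the query
}

int main()
{
    sqlite3 *db = NULL;
    if( sqlite3_open( "facilities.db", &db ) != SQLITE_OK )   // placeholder file name
        return 1;

    char *errorMessage = NULL;
    int rc = sqlite3_exec( db, "select coordinates from facilities",
                           ReadSQLiteCallback, NULL, &errorMessage );
    if( rc != SQLITE_OK )
    {
        fprintf( stderr, "select failed: %s\n", errorMessage );
        sqlite3_free( errorMessage );
    }
    sqlite3_close( db );
    return 0;
}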
I'm trying to understand what accounts for this dramatic difference in speed, and how I can improve the speed when the table has those additional data columns.
I do understand that the larger file simply means more data to read through. But I would expect the slowdown to be at worst roughly linear in the file size: 745 MB is about 20 times 36 MB, so I'd expect maybe 20 times the 1.23 seconds, or about 25 seconds, but not 91 seconds.
Question Part I
I'm not using an index on the table because I generally read most or all of the "coordinates" column at once, as in the simple select above, so I don't need an index for sorting or for quickly accessing a subset of the records. But perhaps having an index would help the engine move from one variable-sized record to the next more quickly as it reads through all the data?
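Concretely, the index I'm wondering about would be an ordinary one-column index, something like this (coord_idx is just an invented name). My possibly wrong understanding is that when an index contains every column a query needs, SQLite can answer the query from the index alone, without touching the wide table rows:

-- Hypothetical index over the column I scan constantly.
CREATE INDEX coord_idx ON facilities(coordinates);

-- The index b-tree holds only (coordinates, rowid), so a scan of it
-- reads far less data than a scan of the full 15-column rows:
SELECT coordinates FROM facilities;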
Is there any other simple idea that might help cut down on those 91 seconds?
Question Part II
Assuming there is no magic bullet for bringing the 91 seconds (with the 15 other data columns included) down close to the 1.23 seconds (with just the coordinates column) in a single table, it seems I could use multiple tables instead: the coordinates in one table, and the rest of the fields (which I don't need such quick access to) in another.
This sounds like a possible use for foreign keys, but my case doesn't seem to require that complexity: there is a simple 1-to-1 correspondence between the coordinates table and the other data table. Each row of the coordinates table corresponds to the same row number of the other data table, so it's really as if each record were "split" across the two tables. A sketch of what I mean follows.
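All names here are invented; the shared INTEGER PRIMARY KEY is just SQLite's rowid, so the same id identifies the two halves of a record:

-- Narrow table holding only the column I scan constantly.
CREATE TABLE coords  ( id INTEGER PRIMARY KEY, coordinates BLOB );

-- Wide table holding everything else, one row per coords row, same id.
CREATE TABLE details ( id INTEGER PRIMARY KEY, name TEXT, capacity REAL );

-- The hot query now scans only the small table:
SELECT coordinates FROM coords;

-- Full records are reassembled with a join on the shared id:
SELECT c.coordinates, d.name, d.capacity
FROM coords AS c JOIN details AS d ON d.id = c.id;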
So the question is: I can of course manage this splitting myself, adding a row to both tables for each of my records and deleting a row from both tables to delete a record. But is there a way to make SQLite manage the splitting for me? (I googled "sqlite split record across tables" but didn't find much.)
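The closest thing I can think of is a view over the two tables with INSTEAD OF triggers, so that inserts and deletes against the view maintain both halves, roughly like this (again using the invented names from the sketch above); I don't know whether this is the intended mechanism:

CREATE VIEW facilities_v AS
SELECT c.id, c.coordinates, d.name, d.capacity
FROM coords AS c JOIN details AS d ON d.id = c.id;

-- Route inserts on the view into both underlying tables.
CREATE TRIGGER facilities_v_insert INSTEAD OF INSERT ON facilities_v
BEGIN
    INSERT INTO coords( coordinates ) VALUES ( NEW.coordinates );
    INSERT INTO details( id, name, capacity )
        VALUES ( last_insert_rowid(), NEW.name, NEW.capacity );
END;

-- Route deletes on the view into both underlying tables.
CREATE TRIGGER facilities_v_delete INSTEAD OF DELETE ON facilities_v
BEGIN
    DELETE FROM details WHERE id = OLD.id;
    DELETE FROM coords  WHERE id = OLD.id;
END;

-- After this, INSERT INTO facilities_v(coordinates, name, capacity) and
-- DELETE FROM facilities_v keep the two tables in step automatically.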