I am currently working with a sqlite3 database of around 70 gigabytes. It has three tables: one with about 30 million rows and two more with roughly 150 and 300 million rows each; every table has between 6 and 11 columns.
The table with the fewest rows consumes the bulk of the space, because it contains a raw-data column of zipped BLOBs that generally run between 1 and 6 kilobytes per row. All other columns in the database are numeric, and the zipped data is immutable, so inefficiency in modification is not a concern.
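For concreteness, here is a minimal sketch of the shape of that table; the table and column names are hypothetical, and only the BLOB column and its size range come from the actual schema:

    CREATE TABLE samples (
        id          INTEGER PRIMARY KEY,  -- alias for the rowid
        source_id   INTEGER NOT NULL,     -- hypothetical numeric column
        captured_at INTEGER NOT NULL,     -- hypothetical numeric column
        raw_data    BLOB NOT NULL         -- zipped payload, ~1-6 KB per row
    );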
I have noticed that creating indexes on the numeric columns of this table:
[15:52:36] Query finished in 723.253 second(s).
takes several times as long as creating a comparable index on the table with five times as many rows:
[15:56:24] Query finished in 182.009 second(s).
[16:06:40] Query finished in 201.977 second(s).
Would it be better practice to store the BLOB data in a separate table and access it with JOINs? The extra width of each row seems to be the most likely cause of this table's slow scan rate.
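A minimal sketch of the split I have in mind, using the hypothetical names from above; the BLOB table shares the integer primary key of the narrow table, so the JOIN is a direct rowid lookup:

    CREATE TABLE samples_meta (
        id          INTEGER PRIMARY KEY,
        source_id   INTEGER NOT NULL,
        captured_at INTEGER NOT NULL
    );

    CREATE TABLE samples_blob (
        id       INTEGER PRIMARY KEY,  -- same key as samples_meta
        raw_data BLOB NOT NULL
    );

    -- Index builds and unindexed scans now touch only the narrow table:
    CREATE INDEX idx_meta_source ON samples_meta(source_id);

    -- Reassemble a full row only when the payload is actually needed:
    SELECT m.source_id, m.captured_at, b.raw_data
    FROM samples_meta AS m
    JOIN samples_blob AS b ON b.id = m.id
    WHERE m.source_id = ?;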
My current suspicions are:
- The slowdown is mostly due to the way data is read from disk: skipping over medium-sized stretches of data within each row is impractical, so the ratio of usable data per sector the operating system reads from disk ends up very low (a quick check is sketched after this list); and
- It is therefore probably standard practice, which I simply did not know as a relative newcomer to relational databases, to avoid putting large, variable-width data in the same table as other data that may need to be scanned without indexes;
but I would appreciate some feedback from someone with more knowledge in the field.
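For reference, this is how I have been sanity-checking the first suspicion (again with the hypothetical names; page_size and page_count are standard SQLite pragmas):

    -- Page geometry of the database file:
    PRAGMA page_size;    -- bytes per page (4096 by default in current SQLite)
    PRAGMA page_count;   -- total pages; page_size * page_count ~ file size

    -- Average payload width of the wide table's rows
    -- (note: this scans every BLOB, so it is itself slow):
    SELECT COUNT(*)              AS row_count,
           AVG(LENGTH(raw_data)) AS avg_blob_bytes
    FROM samples;

The sqlite3_analyzer utility from the SQLite project gives a more detailed per-table breakdown, including how many pages are overflow pages holding BLOB payload.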