Saturday, December 19, 2015

Organizing tables with data-heavy rows to optimize access times

I am currently working with a sqlite3 database of around 70 gigabytes. It has three tables: one with about 30 million rows, and two more with roughly 150 and 300 million rows respectively; each table has between 6 and 11 columns.

The table with the fewest rows consumes the bulk of the space, as it contains a raw-data column of zipped BLOBs generally running between 1 and 6 kilobytes per row. All other columns in the database are numeric, and the zipped data is immutable, so inefficiency in modification is not a concern.
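
For concreteness, the layout is roughly the following; the table and column names are placeholders I am inventing for illustration, not the real schema:

  -- Hypothetical sketch of the data-heavy table: a handful of numeric
  -- columns plus one zipped BLOB of roughly 1-6 KB per row.
  CREATE TABLE samples (
      id         INTEGER PRIMARY KEY,
      source_id  INTEGER NOT NULL,
      timestamp  INTEGER NOT NULL,
      value_a    REAL,
      value_b    REAL,
      raw_data   BLOB NOT NULL  -- compressed payload, written once, never modified
  );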

I have noticed that creating indexes on the numeric columns of this table:

[15:52:36] Query finished in 723.253 second(s).

takes several times as long as creating a comparable index on the table with five times as many rows:

[15:56:24] Query finished in 182.009 second(s).
[16:06:40] Query finished in 201.977 second(s).
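
The indexes in question are ordinary single-column indexes on numeric fields, something along these lines (again with placeholder names, samples being the BLOB-carrying table and points one of the larger all-numeric tables):

  -- On the 30-million-row table that carries the BLOB column:
  CREATE INDEX idx_samples_source ON samples (source_id);

  -- On the ~150-million-row table of purely numeric rows:
  CREATE INDEX idx_points_source ON points (source_id);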

Would it be better practice to store the BLOB data in a separate table accessed via JOINs? The extra width of each row seems like the most likely culprit for this table's slow scan rate.
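
Concretely, the split I have in mind would look something like this, with the placeholder names from above: the numeric columns stay in a narrow table, the BLOBs move to a companion table keyed on the same id, and the payload is only pulled in through a JOIN when it is actually wanted.

  -- Narrow table: numeric columns only, cheap to scan and index.
  CREATE TABLE samples (
      id         INTEGER PRIMARY KEY,
      source_id  INTEGER NOT NULL,
      timestamp  INTEGER NOT NULL,
      value_a    REAL,
      value_b    REAL
  );

  -- Companion table: one BLOB per row, keyed by the same id.
  CREATE TABLE sample_blobs (
      id        INTEGER PRIMARY KEY REFERENCES samples(id),
      raw_data  BLOB NOT NULL
  );

  -- The payload is only read when explicitly joined in:
  SELECT s.source_id, s.timestamp, b.raw_data
  FROM samples AS s
  JOIN sample_blobs AS b ON b.id = s.id
  WHERE s.source_id = 42;

Since sample_blobs.id is an INTEGER PRIMARY KEY (i.e. the rowid), the JOIN amounts to a direct rowid lookup per matching row rather than a second scan.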

My current suspicions are:

  1. This is mostly a consequence of how data is read from disk: skipping over medium-sized chunks within a table is impractical, so only a very small fraction of each sector the operating system reads is actually useful data (a rough check is sketched below, after this list), and
  2. It is therefore probably standard practice, which I simply did not know about as a relative newcomer to relational databases, to avoid putting large, variable-width data in the same table as columns that may need to be scanned without indices,

but I would appreciate some feedback from someone with more knowledge in the field.
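
In the meantime, if the per-row width really is the issue, one quick sanity check against the current single-table layout is to compare how much payload a scan has to read against how little of it an index build actually needs; with the placeholder names from before:

  -- Average and total size of the zipped payload. With 1-6 KB of BLOB
  -- per row against a few dozen bytes of numeric columns, nearly every
  -- byte pulled in during a scan of this table is dead weight.
  SELECT avg(length(raw_data))        AS avg_blob_bytes,
         count(*)                     AS row_count,
         sum(length(raw_data)) / 1e9  AS total_blob_gb
  FROM samples;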
