Saturday, February 7, 2015

How can I optimize the file size of INTEGERs in an SQLite DB?

I want to build a 20,000,000-record table in SQLite, but the resulting file is slightly larger than its TAB-separated plain-text representation.


Are there ways to optimize the stored data size, specific to SQLite?


Details:


Each record has:



  • 8 integers,

  • 3 enums (each represented, for now, as 1-byte text),

  • 7 text fields.


I suspect the numbers are not stored efficiently (value range: 10,000,000 to 900,000,000).


According to the docs, I expect each to take 4 bytes when stored as an integer (SQLite's 3-byte serial type only covers values up to 8,388,607), versus 8-9 bytes when stored as text (plus perhaps a length or terminator byte), i.e. roughly a 1:2 ratio between int storage and text storage.
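As a sanity check on that estimate, here is a small sketch (my own helper, not part of SQLite) that computes how many bytes SQLite's record format uses for an integer payload, based on the serial-type table in the file-format docs:

```python
def int_storage_bytes(v: int) -> int:
    # SQLite stores integers as 1-, 2-, 3-, 4-, 6- or 8-byte
    # big-endian two's-complement values (serial types 1-6).
    # Pick the smallest width that can hold v.
    for bits, size in ((8, 1), (16, 2), (24, 3), (32, 4), (48, 6)):
        if -(1 << (bits - 1)) <= v < (1 << (bits - 1)):
            return size
    return 8

# A 9-digit number needs 4 bytes as an integer but 9 bytes as text:
print(int_storage_bytes(900_000_000))  # 4
print(len(str(900_000_000)))           # 9
```

Note that this counts only the value's payload; each record also carries a header byte per column, a rowid, and cell overhead, which matters later.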


But it doesn't appear so.


So I created an experiment:


Two separate SQLite databases (files), each consisting of one single-column table: an INTEGER column vs. a TEXT column.



CREATE TABLE t(a INTEGER); -- number.db
CREATE TABLE t(b TEXT);    -- text.db


I generated 60,000 random numbers between 20,000,000 and 999,999,999, and inserted them as integers (into number.db) and as text (into text.db):



  • INSERT INTO t VALUES (548704536), (679788835), (500536870), ... ;

  • INSERT INTO t VALUES ('548704536'), ('679788835'), ('500536870'), ... ;


After VACUUM; and .quit, the file sizes are:



  • 723,968 number.db

  • 1,036,288 text.db


Strange; that ratio is quite far from my 1:2 theory. How come? Maybe it's the per-record overhead?
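For reference, the whole experiment can be reproduced with a short script (filenames and the 60,000-row count match the setup above; exact byte counts will vary slightly with the random values, but the ratio should come out well under 2):

```python
import os
import random
import sqlite3
import tempfile

def build(path, as_text):
    """Create a one-column DB, fill it with 60,000 random 9-digit
    numbers (as TEXT or as INTEGER), VACUUM, and return the file size."""
    random.seed(0)  # same values in both databases
    values = [random.randint(20_000_000, 999_999_999) for _ in range(60_000)]
    con = sqlite3.connect(path)
    if as_text:
        con.execute("CREATE TABLE t(b TEXT)")
        con.executemany("INSERT INTO t VALUES (?)", [(str(v),) for v in values])
    else:
        con.execute("CREATE TABLE t(a INTEGER)")
        con.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    con.commit()
    con.execute("VACUUM")  # must run outside a transaction, hence after commit
    con.close()
    return os.path.getsize(path)

with tempfile.TemporaryDirectory() as d:
    n = build(os.path.join(d, "number.db"), as_text=False)
    t = build(os.path.join(d, "text.db"), as_text=True)
    print(n, t, round(t / n, 2))
```

The fixed per-record cost (rowid varint, record header, cell pointer) is paid in both files, which is presumably what dilutes the payload-level 4:9 difference down to the observed ~1.4:1.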

