I want to build a 20,000,000-record table in SQLite, but the file size is coming out slightly larger than its TAB-separated plaintext representation.
Are there ways to optimize data size storage, specific to sqlite?
Details:
Each record has:
- 8 integers
- 3 enums (represented for now as 1-byte text)
- 7 text fields
I suspect that the numbers are not stored efficiently (value range 10,000,000 to 900,000,000).
According to the docs, I expect them to take 4 bytes if stored as a number (SQLite's 3-byte integer class tops out at 8,388,607), and 8-9 bytes if stored as text (plus maybe a termination byte or a size-indicator byte), i.e. roughly a 1:2 ratio between storing as int and storing as text.
But it doesn't appear so.
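To make the per-value arithmetic concrete, here is a small sketch of SQLite's integer serial-type sizes (1/2/3/4/6/8 content bytes, per the file-format docs); the function name is mine, not part of any API:

```python
# Sketch: how many content bytes SQLite's record format uses for a
# signed integer. Serial types 1,2,3,4,5 hold 1,2,3,4,6 bytes; anything
# larger falls back to the 8-byte type.
def int_storage_bytes(v: int) -> int:
    for bits, size in ((8, 1), (16, 2), (24, 3), (32, 4), (48, 6)):
        if -(1 << (bits - 1)) <= v < (1 << (bits - 1)):
            return size
    return 8

# Every value in the 10,000,000..900,000,000 range needs 4 bytes,
# since the 3-byte class maxes out at 8,388,607:
print(int_storage_bytes(10_000_000))   # 4
print(int_storage_bytes(900_000_000))  # 4
# The same value stored as text costs one byte per digit:
print(len(str(900_000_000)))           # 9
```

So per value the int:text ratio really is about 4:9, close to 1:2 — the question is why the files don't reflect it.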
So I created an experiment:
2 different sqlite databases (files), each consists of 1 table, 1 column: an INTEGER vs. a TEXT column.
CREATE TABLE t(a INTEGER); -- number.db
CREATE TABLE t(b TEXT);    -- text.db
I generated 60,000 random numbers between 20,000,000 and 999,999,999, and inserted them as numbers (into number.db) and as text (into text.db):
- insert into t values (548704536), (679788835), (500536870), ... ;
- insert into t values ('548704536'), ('679788835'), ('500536870'), ... ;
After VACUUM; and .quit, the file sizes are:
- 723,968 number.db
- 1,036,288 text.db
Strange; that's rather far from my 1:2 theory. How come? Maybe it's the per-record overhead?
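For anyone wanting to reproduce the experiment, here is a sketch using Python's stdlib sqlite3 module (file names and helper are illustrative). It also hints at the likely answer: every row pays a fixed cost (2-byte cell pointer, payload-length and rowid varints, record header) that is identical in both files and dilutes the 4-byte vs 9-byte content difference:

```python
import os
import random
import sqlite3
import tempfile

random.seed(0)
values = [random.randint(20_000_000, 999_999_999) for _ in range(60_000)]

def build(path, decl, rows):
    """Create a one-column table, bulk-insert rows, VACUUM, and
    return the resulting file size in bytes."""
    con = sqlite3.connect(path)
    con.execute(f"CREATE TABLE t(x {decl})")
    con.executemany("INSERT INTO t VALUES (?)", rows)
    con.commit()
    con.execute("VACUUM")
    con.close()
    return os.path.getsize(path)

with tempfile.TemporaryDirectory() as d:
    n = build(os.path.join(d, "number.db"), "INTEGER",
              [(v,) for v in values])
    t = build(os.path.join(d, "text.db"), "TEXT",
              [(str(v),) for v in values])
    # In the post above the ratio was 1,036,288 / 723,968 ≈ 1.43,
    # well under the naive 2x, because the fixed per-row bytes are
    # the same in both databases.
    print(n, t, round(t / n, 2))
```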