Tuesday, July 7, 2015

256000-Byte Limit of Python's SQLite3 Databases

I have written a web crawler that does some online pre-processing and stores the resulting data in a database. The database layout is quite simple; let me outline it:

  • The table dispatched lists items together with the time at which each was dispatched, which is one of the columns. The 15 other columns describe the items: one of them is of type TEXT and the rest are of type INT.
  • The table missed lists the time periods during which the crawler failed to watch, e.g. because of network problems. It has 4 columns of type INT and 2 columns of type TEXT.
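A layout like the one described above could be sketched as follows. The column names here are pure assumptions for illustration, since the post does not give them; only the column counts and types match the description.

```python
import sqlite3

# Hypothetical schema matching the described layout. All identifiers
# (dispatched_at, description, attrNN, gap_start, ...) are assumptions.
int_cols = ", ".join("attr%02d INTEGER" % i for i in range(14))
schema = """
CREATE TABLE dispatched (
    dispatched_at INTEGER,  -- the dispatch-time column
    description   TEXT,     -- the single TEXT column
    %s                      -- the remaining 14 INTEGER columns
);
CREATE TABLE missed (
    gap_start INTEGER,      -- 4 INT columns describing the gap
    gap_end   INTEGER,
    retries   INTEGER,
    status    INTEGER,
    reason    TEXT,         -- 2 TEXT columns
    note      TEXT
);
""" % int_cols

conn = sqlite3.connect(":memory:")  # a file path in the real crawler
conn.executescript(schema)
```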

I let it run for many hours, twice. Both times, the resulting database file had a total size of exactly 256000 bytes, at least according to ls -l. The recorded data shows that typically 1-3 items were recorded per minute, but from a particular point in time on, no new items are listed any more.

To me, this sounds as if there is a limitation that I hit. Given that the resulting database file size was exactly 1000 * 2^8 bytes both times (which is also exactly 250 pages of SQLite's old default page size of 1024 bytes), I would suspect a limit on the maximum database file size, but the documentation doesn't say anything like that.
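If the cap really is on file size, SQLite's page accounting is worth checking: the engine limits a database to page_size * max_page_count bytes, and with a 1024-byte page size a max_page_count of 250 would stop the file at exactly 256000 bytes. A quick sketch for inspecting both values on a connection (using an in-memory database here purely for illustration):

```python
import sqlite3

# Inspect the two values whose product caps the database file size.
# If something set max_page_count to a small number, inserts beyond
# the cap fail with "database or disk is full".
conn = sqlite3.connect(":memory:")  # use the crawler's file path instead
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
max_pages = conn.execute("PRAGMA max_page_count").fetchone()[0]
print(page_size, max_pages)  # cap in bytes is page_size * max_pages
```

By default max_page_count is enormous (over a billion pages), so a small value here would point at something in the code or environment having lowered it.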

The moment that SQLite stopped appending new rows to the database, there were

  • 5187 rows in dispatched and 3 rows in missed during the first run, and
  • 5212 rows in dispatched and 2 rows in missed during the second run.

I'm using the sqlite3 module for Python 2.7. I'd appreciate any help that points out what is going on: why SQLite stopped appending new rows after the file hit 256000 bytes, and how I can fix this.
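For what it's worth, SQLite itself imposes no 256000-byte cap (its documented limits are orders of magnitude larger), which a small probe can confirm by growing a fresh database well past that size under default settings. Table and column names here are made up:

```python
import os
import sqlite3
import tempfile

# Sanity check: insert enough data into a fresh database to push the
# file well past 256000 bytes, committing explicitly via the
# connection's context manager.
path = os.path.join(tempfile.mkdtemp(), "probe.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (n INTEGER, payload TEXT)")
with conn:  # commits on success, rolls back on exception
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     ((i, "x" * 100) for i in range(5000)))
conn.close()
print(os.path.getsize(path))  # well above 256000 bytes
```

If a probe like this grows past 256000 bytes on the same machine and filesystem, the cap is specific to the crawler's setup rather than to SQLite or the sqlite3 module.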
