lundi 2 mai 2016

Python: Sqlite multiple inserts with only one disk operation

I am having some disk problems when deploying to a docker instance and I can't figure out yet how to solve it.

In the docker instance, there is a process that uses Shove with sqlite as the backend to save data to disk as a dictionary (I am saving around 200'000 records).

The problem occurs because I am using Shove as a dictionary to save, which makes every __setitem__ operation into a insert in sqlite.

Now Shove offers a sync (an integer) parameter to specify when to execute the actual storage into disk, but that only saves a temporary dictionary in memory, and when the time comes it still makes the number of sync operations into disk (which I don't want).

I checked that sqlite offers a executemany operation which should reduce the method-call overheads, but does it really does 1 (or at least close to 1) operations to disk? I can't seem to find a clear explanation for this.

I also checked that a bulk operation could be done on sqlite but it isn't explained on python, so should I just call it with execute and creating a very big string? (I want to be able to at least save 1000 records at a time).

So what should be the best way to write to disk with the minimum amount of operations?

Aucun commentaire:

Enregistrer un commentaire