Saturday, April 16, 2016

Speed up many Python sqlite3 INSERT OR IGNORE and UPDATE commands

I have some array data (in Python) that I want to add to an SQL database, using the data index as the primary key. I am doing this for multiple columns (shown below for a column named theta0, but in practice I step through columns named theta0, theta1, etc.), and each time I would like the data either to be added to an existing row (if the primary key exists) or to create a new row (if the primary key does not exist). Here's the meat of the code, where nonzero_index is a list of indices.

    # Either inserts a new row with the given id and value,
    # or does nothing if the id already exists
    c.executemany("INSERT OR IGNORE INTO "+tablename+" (id, theta0) VALUES (?, ?)", [(i, data[i]) for i in nonzero_index])

    # Updates the value for ids that already existed
    # (rows for which the INSERT above was ignored)
    c.executemany("UPDATE "+tablename+" SET theta0=? WHERE id=?", [(data[i], i) for i in nonzero_index])

The problem is that the nonzero_index list is typically very long, ~8 million integers. This code is very slow, taking 4-5 hours for ~8 million indices. How can I speed this up? Would it be faster to pre-determine which indices already exist in the database and which need new rows, e.g. using Python sets?
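One possible way to avoid the two passes over the data is SQLite's UPSERT syntax (`INSERT ... ON CONFLICT ... DO UPDATE`), available in SQLite 3.24+, which folds the insert-or-ignore and the update into a single statement. A minimal sketch, using a hypothetical in-memory database with a table named `thetas` and made-up sample data:

```python
import sqlite3

# Hypothetical in-memory database and sample data for illustration.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE thetas (id INTEGER PRIMARY KEY, theta0 REAL)")
c.execute("INSERT INTO thetas (id, theta0) VALUES (0, 1.5)")  # pre-existing row

data = {0: 2.5, 1: 3.5, 2: 4.5}
nonzero_index = [0, 1, 2]

# One statement replaces the INSERT OR IGNORE + UPDATE pair
# (requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE).
c.executemany(
    "INSERT INTO thetas (id, theta0) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET theta0=excluded.theta0",
    [(i, data[i]) for i in nonzero_index],
)
conn.commit()

rows = c.execute("SELECT id, theta0 FROM thetas ORDER BY id").fetchall()
print(rows)  # existing id 0 is updated; ids 1 and 2 are inserted
```

Independently of the statement used, making sure all 8 million parameter tuples run inside a single transaction (one `commit()` at the end, not per statement) is usually the biggest win, since SQLite syncs to disk on every commit.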
