I have some array data (in Python) which I want to add to an SQL database, using the data index as its primary key. I am doing this for multiple columns (shown below for a column named theta0, but really I'm stepping through columns named theta0, theta1, etc.), and each time I would like the data either to be added to the existing row (if the primary key exists) or to create a new row (if the primary key does not exist). Here's the meat of the code, where nonzero_index is a list of indices:
# Either inserts a new id with the given value, or ignores it if the id already exists
c.executemany("INSERT OR IGNORE INTO "+tablename+" (id, theta0) VALUES (?, ?)", [(i, data[i]) for i in nonzero_index])
# Updates the value for ids that already existed (the rows the INSERT OR IGNORE skipped)
c.executemany("UPDATE "+tablename+" SET theta0=? WHERE id=?", [(data[i], i) for i in nonzero_index])
The problem is that the nonzero_index list is typically very long, ~8 million integers, and this code is very slow, taking 4-5 hours for ~8 million indices. How can I speed this up? Would it be faster to pre-determine which indices already exist in the database versus which need to be new rows, e.g. using Python sets?
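To make that last idea concrete, here is roughly what I have in mind (just a sketch; existing, to_insert, and to_update are illustrative names, and conn is the sqlite3 connection that c came from):
# Read the existing primary keys once, then split nonzero_index with set lookups
existing = set(row[0] for row in c.execute("SELECT id FROM "+tablename))
to_insert = [i for i in nonzero_index if i not in existing]
to_update = [i for i in nonzero_index if i in existing]

# Plain INSERT for brand-new ids, UPDATE only for ids that already exist
c.executemany("INSERT INTO "+tablename+" (id, theta0) VALUES (?, ?)",
              [(i, data[i]) for i in to_insert])
c.executemany("UPDATE "+tablename+" SET theta0=? WHERE id=?",
              [(data[i], i) for i in to_update])
conn.commit()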