I'm inserting ~500k items into an SQLite database using Python's sqlite3 module and executemany, chunking up the inserts:
c = self.conn.cursor()
for chunk in (values[i:i+CHUNK_SIZE] for i in xrange(0, len(values), CHUNK_SIZE)):
    data = ((x[0], x[1].decode('utf-8', 'ignore')) for x in chunk)
    try:
        c.executemany('INSERT INTO posts VALUES (?,?)', data)
        self.conn.commit()
    except Exception, e:
        print "EXCEPTION:", e, data
The data is not in my control and sometimes contains bad encodings, causing errors in the executemany:
You must not use 8-bit bytestrings unless you use a text_factory that can
interpret 8-bit bytestrings (like text_factory = str). It is highly recommended
that you instead just switch your application to Unicode strings.
I've tried various ways of decoding the text but the errors keep coming back.
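One thing that may be worth checking: the snippet above decodes only x[1], so a byte string in x[0] would still reach SQLite undecoded. Below is a minimal sketch that decodes every column, assuming each value is either a byte string or already text. The names to_text and clean_row are hypothetical helpers, not part of sqlite3.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, dropping bytes that cannot be decoded."""
    if isinstance(value, bytes):
        # errors='ignore' silently discards undecodable bytes
        return value.decode(encoding, 'ignore')
    # Numbers, None and already-decoded text pass through unchanged
    return value


def clean_row(row):
    # Decode every column, not only the second one
    return tuple(to_text(col) for col in row)
```

With this, the generator in the loop could become (clean_row(x) for x in chunk), which stores the affected items partially instead of failing on them.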
I'd like to ignore the errors, skipping the items with the encoding problems (or store them partially; doesn't really matter) while storing the rest.
Is there a way to tell executemany not to bail out on the rest of the batch when it hits an error, i.e. an executemany that ignores errors but still stores the remaining items?
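As far as I know, executemany has no switch for skipping bad rows, so one possible workaround is to fall back to row-by-row inserts only for the chunk that failed. The sketch below assumes that approach. The function name insert_skipping_bad_rows and its parameters are hypothetical, and the SQL statement is supplied by the caller. It catches sqlite3.Error, the base class of the module's exceptions, which should also cover the ProgrammingError shown above.

```python
import sqlite3


def insert_skipping_bad_rows(conn, sql, rows, chunk_size=1000):
    """Insert rows in chunks; return the rows that SQLite rejected."""
    rows = list(rows)
    failed = []
    cur = conn.cursor()
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        try:
            # Fast path: the whole chunk in one call
            cur.executemany(sql, chunk)
            conn.commit()
        except sqlite3.Error:
            # Undo the partially inserted chunk, then retry row by row
            conn.rollback()
            for row in chunk:
                try:
                    cur.execute(sql, row)
                except sqlite3.Error:
                    # Remember the bad row and keep going
                    failed.append(row)
            conn.commit()
    return failed
```

Clean chunks keep the speed of executemany, and only a chunk containing a bad row pays the cost of individual inserts. The returned list can be logged or inspected afterwards.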