I'm writing a large CSV file (100M rows) to a SQLite database with pandas, in chunks of 10k rows:
import pandas as pd
from sqlalchemy import create_engine

disk_engine = create_engine('sqlite:///data.db')  # database file path assumed
index_start = 0
chunksize = 10000

for df in pd.read_csv('100M_rows.csv', chunksize=chunksize, iterator=True, encoding='utf-8'):
    df.index += index_start                            # shift this chunk's index past the previous chunk
    df.to_sql('data', disk_engine, if_exists='append')
    index_start = df.index[-1] + 1                     # next chunk starts after the last row written
Based on "Sqlite appending data performance linear degradation, is this solvable?", I thought that incrementing df.index would solve this issue. However, I'm still seeing non-linear write speeds after the first ~1M rows.
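For reference, this is roughly how the per-chunk write time can be observed (a minimal sketch; the CSV path, table name, and database file are the same placeholders as above):

import time

import pandas as pd
from sqlalchemy import create_engine

disk_engine = create_engine('sqlite:///data.db')  # same assumed database file as above

for i, df in enumerate(pd.read_csv('100M_rows.csv', chunksize=10000, encoding='utf-8')):
    start = time.monotonic()
    df.to_sql('data', disk_engine, if_exists='append')
    print(f'chunk {i}: {time.monotonic() - start:.2f}s')  # time per 10k-row insert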
Is there something else that I need to do to improve the appending performance from pandas to SQLite?
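For context, this is the kind of change I'm wondering about, sketched under the assumption that the slowdown comes from journaling/fsync overhead and from storing the DataFrame index; the database path is a placeholder and I haven't verified that any of this removes the degradation:

import sqlite3

import pandas as pd

con = sqlite3.connect('data.db')        # assumed path, same database as above
con.execute('PRAGMA journal_mode=WAL')  # cheaper journaling for bulk loads
con.execute('PRAGMA synchronous=OFF')   # skip fsync on commit (unsafe on power loss)

for df in pd.read_csv('100M_rows.csv', chunksize=10000, encoding='utf-8'):
    df.to_sql('data', con, if_exists='append', index=False)  # don't write the DataFrame index

con.close()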