Monday, March 30, 2015

Appending rows to sqlite from Pandas takes non-linear time

I'm writing a large CSV (100M rows) to an SQLite database through Pandas in 10k-row chunks.



chunksize = 10000
index_start = 0  # initialize the offset before the loop

for df in pd.read_csv('100M_rows.csv', chunksize=chunksize, iterator=True, encoding='utf-8'):
    df.index += index_start
    df.to_sql('data', disk_engine, if_exists='append')
    index_start += df.index[-1] + 1
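For reference, here is a minimal, self-contained sketch of the same chunked-append pattern, scaled down to a small in-memory example (the tiny CSV, the `:memory:` connection, and the chunk size are stand-ins for the real `100M_rows.csv` and `disk_engine` above):

```python
import io
import sqlite3

import pandas as pd

# Stand-in for the 100M-row CSV: 100 rows of synthetic data.
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(100)))

# Stand-in for disk_engine: to_sql also accepts a raw sqlite3 connection.
conn = sqlite3.connect(":memory:")

chunksize = 25
index_start = 0
for df in pd.read_csv(csv_data, chunksize=chunksize, iterator=True):
    df.index += index_start                      # shift the chunk's index by the offset
    df.to_sql('data', conn, if_exists='append')  # append the chunk to the 'data' table
    index_start += df.index[-1] + 1

count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(count)  # 100 rows appended across 4 chunks
```

Note that `read_csv` with `chunksize` already continues the default index across chunks, so the shifted indices stay unique but become non-contiguous with the `+=` update shown in the question.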


Based on "Sqlite appending data performance linear degradation, is this solvable?", I thought that incrementing df.index would solve this issue. However, I'm still seeing non-linear write speeds after writing 1M rows:

[Plot: non-linear write speeds from Pandas to SQLite]


Is there something else that I need to do to improve the appending performance from Pandas to SQLite?

