I'm writing a large CSV file (100M rows) to a SQLite database with pandas, in chunks of 10k rows:
import pandas as pd
from sqlalchemy import create_engine

disk_engine = create_engine('sqlite:///data.db')  # database file path assumed
index_start = 0
chunksize = 10000

for df in pd.read_csv('100M_rows.csv', chunksize=chunksize, iterator=True, encoding='utf-8'):
    df.index += index_start                            # shift this chunk's index past the previous chunk
    df.to_sql('data', disk_engine, if_exists='append')
    index_start = df.index[-1] + 1                     # next chunk starts after the last row written
Based on "Sqlite appending data performance linear degradation, is this solvable?", I thought that incrementing df.index would solve this issue. However, I'm still seeing non-linear write speeds after the first ~1M rows.
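For reference, this is roughly how the per-chunk write time can be observed (a minimal sketch; the CSV path, table name, and database file are the same placeholders as above):

import time

import pandas as pd
from sqlalchemy import create_engine

disk_engine = create_engine('sqlite:///data.db')  # same assumed database file as above

for i, df in enumerate(pd.read_csv('100M_rows.csv', chunksize=10000, encoding='utf-8')):
    start = time.monotonic()
    df.to_sql('data', disk_engine, if_exists='append')
    print(f'chunk {i}: {time.monotonic() - start:.2f}s')  # time per 10k-row insert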
Is there something else that I need to do to improve the appending performance from pandas to SQLite?
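For context, this is the kind of change I'm wondering about, sketched under the assumption that the slowdown comes from journaling/fsync overhead and from storing the DataFrame index; the database path is a placeholder and I haven't verified that any of this removes the degradation:

import sqlite3

import pandas as pd

con = sqlite3.connect('data.db')        # assumed path, same database as above
con.execute('PRAGMA journal_mode=WAL')  # cheaper journaling for bulk loads
con.execute('PRAGMA synchronous=OFF')   # skip fsync on commit (unsafe on power loss)

for df in pd.read_csv('100M_rows.csv', chunksize=10000, encoding='utf-8'):
    df.to_sql('data', con, if_exists='append', index=False)  # don't write the DataFrame index

con.close()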