I had around 300 GB of CSVs on a network drive that I needed to analyse, so I quickly wrote this to append them into one table in SQLite:
import os
import sqlite3

import pandas as pd

# db_path (directory holding the database) and paths (list of directories
# containing the CSVs) are defined elsewhere in my script.

def push_csvs_to_sql():
    # Connect to SQLite3
    con = sqlite3.connect(db_path + 'my_database.sqlite')
    count = 0
    for path in paths:
        # All CSVs in path that match the "pro_" prefix
        for i in os.listdir(path):
            if i.endswith(".csv") and i.startswith("pro_"):
                my_csv = pd.read_csv(os.path.join(path, i), header=None, sep='~')
                my_csv.columns = [
                    'A',
                    'B',
                    'etc'
                ]
                # Append this file's rows to the single table
                my_csv.to_sql('my_table', con, if_exists='append', index=False)
    # Disconnect
    con.close()
I was able to run a few queries (group-bys) while the database was small (e.g. after only 2 files had been loaded), but after all 25 CSVs were read in, the database became unusable.
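For context, the queries I tried were nothing more elaborate than a simple aggregation, along these lines (a sketch using the placeholder column names from the script above, not my real ones):

import sqlite3

con = sqlite3.connect(db_path + 'my_database.sqlite')
# Count rows per value of column A -- the kind of group-by that
# worked fine when only 2 files had been loaded
for row in con.execute("SELECT A, COUNT(*) FROM my_table GROUP BY A"):
    print(row)
con.close()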
Whenever I submit a query, or use a GUI (like the Firefox extension) to browse the data, everything freezes. I see a lot of disk-read activity for maybe an hour, and then everything dies.
Did I do something wrong in the script I wrote? Or is SQLite not meant for flat databases this large?