Saturday, October 3, 2015

Cannot open huge SQLite database (300GB)

I had around 300 GB of CSVs on a network drive that I needed to analyse, so I quickly wrote this script to append them all into one table in SQLite:

import os
import sqlite3
import pandas as pd

def push_csvs_to_sql(db_path, paths):
    # Connect to SQLite3
    con = sqlite3.connect(db_path + 'my_database.sqlite')
    for path in paths:
        # All matching CSVs in each path
        for i in os.listdir(path):
            if i.endswith(".csv") and i.startswith("pro_"):
                my_csv = pd.read_csv(os.path.join(path, i), header=None, sep='~')
                my_csv.columns = [
                    'A',
                    'B',
                    'etc'
                ]
                # Append each file to the same table
                my_csv.to_sql('my_table', con, if_exists='append', index=False)

    # Disconnect
    con.close()
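
I call it roughly like this (the paths below are placeholders, not the real locations):

push_csvs_to_sql('/path/to/db/', ['/path/to/csvs_1/', '/path/to/csvs_2/'])  # placeholder paths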

I was able to run a few queries (group-bys) while the database was still small (e.g. after only 2 files), but after all 25 CSVs were read in, the database became unusable.
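
The queries were along these lines (just a sketch; the column names are the placeholders from above and the real aggregates don't matter):

import sqlite3

con = sqlite3.connect('my_database.sqlite')
cur = con.cursor()
# Typical group-by I was trying to run (placeholder column names)
cur.execute("SELECT A, COUNT(*) FROM my_table GROUP BY A")
rows = cur.fetchall()
con.close()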

Whenever I submit a query or use a GUI (like the Firefox extension) to browse the data, everything freezes. I see heavy disk-read activity for maybe an hour, and then everything dies.

Did I do something wrong in the script I wrote? Or is SQLite just not meant for flat databases this large?
