I have created an SQLite database containing 32GB worth of financial data (I started from csv files and then used the to_sql command in pandas). More specifically, the database has over 550 million rows and 6 columns (3 string columns, 2 float columns and a date-time column). I want to perform a simple count by group query using pandas but this seems to take a very long time. I did the following:
import pandas as pd
from sqlalchemy import create_engine
disk_engine = create_engine('sqlite:///trade_data.db')
df = pd.read_sql_query('SELECT Type, COUNT(Type) as `number_of_trades`'
'FROM data '
'GROUP BY Type ', disk_engine)
However, when doing this on my machine (a standard laptop) this simple query takes about 1 hour and 10 minutes. Does anyone have an idea on how to speed this up? Thanks!
Aucun commentaire:
Enregistrer un commentaire