jeudi 7 mai 2015

Pandas: speeding up a query to a large SQLite database

I have created an SQLite database containing 32GB worth of financial data (I started from csv files and then used the to_sql command in pandas). More specifically, the database has over 550 million rows and 6 columns (3 string columns, 2 float columns and a date-time column). I want to perform a simple count by group query using pandas but this seems to take a very long time. I did the following:

import pandas as pd
from sqlalchemy import create_engine

disk_engine = create_engine('sqlite:///trade_data.db') 
df = pd.read_sql_query('SELECT Type, COUNT(Type) as `number_of_trades`'
                   'FROM data '
                   'GROUP BY Type ', disk_engine)  

However, when doing this on my machine (a standard laptop) this simple query takes about 1 hour and 10 minutes. Does anyone have an idea on how to speed this up? Thanks!

Aucun commentaire:

Enregistrer un commentaire