Friday, 21 August 2015

Pandas for large(r) datasets

I have a rather complex database which I deliver in CSV format to my client. The logic to arrive at that database is an intricate mix of Python processing and SQL joins done in sqlite3.

There are ~15 source datasets, ranging from a few hundred records to several million (fairly short) records.

For clarity, maintainability and several other reasons, I would love to replace this mix of Python and sqlite3 logic with an efficient set of Python scripts that holds ALL of the logic, and drop sqlite3 altogether.

I understand that Pandas is the usual answer here, but could you please advise whether it is the right track for a rather large database like the one described above?
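To make the question concrete, here is a minimal sketch of what one of the SQL joins might look like in Pandas. The table names, column names and sample rows are all hypothetical (they are not my real data), and the tiny `chunksize` is only there to show the mechanism. The idea is to load a small lookup table whole, stream the multi-million-row table in chunks, and join each chunk as it arrives.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the ~15 source CSV files.
customers_csv = io.StringIO(
    "customer_id,region\n"
    "1,north\n"
    "2,south\n"
)
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "10,1,5.0\n"
    "11,2,7.5\n"
    "12,1,2.5\n"
    "13,3,1.0\n"
)

# Small lookup table: load it whole, with explicit dtypes to keep memory down.
customers = pd.read_csv(
    customers_csv,
    dtype={"customer_id": "int32", "region": "category"},
)

# Large table: read it in chunks so only `chunksize` rows are parsed at a time.
# A real run would use a much larger value than 2.
pieces = []
for chunk in pd.read_csv(
    orders_csv,
    dtype={"order_id": "int32", "customer_id": "int32", "amount": "float64"},
    chunksize=2,
):
    # Rough equivalent of:
    #   SELECT * FROM orders JOIN customers USING (customer_id)
    pieces.append(chunk.merge(customers, on="customer_id", how="inner"))

result = pd.concat(pieces, ignore_index=True)

# Deliver the joined table as CSV (written to memory here instead of a file).
out = io.StringIO()
result.to_csv(out, index=False)
```

In this sketch the joined chunks are still collected in memory before the final `concat`, so it only helps when the result itself fits in RAM. If it does not, each joined chunk could be appended to the output CSV instead.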
