I have looked around for an answer to this question to make sure I avoid duplicates. I hope I have not overlooked anything.
I have around 60 tables, each with 2 columns: an ID and a corresponding value. The IDs overlap across tables, and each table has around 2 million rows. I need to compute some statistics for each ID, for example mean(table1.value, table2.value, table3.value, ...).
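For concreteness, each table has a layout along these lines (the types shown are only indicative; table1 is just the first of the 60):

    CREATE TABLE table1 (
        id    INTEGER,          -- overlaps with the IDs in the other tables
        value DOUBLE PRECISION  -- the per-table measurement
    );
    -- same two-column layout for table2 through table60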
My question is essentially: what is the fastest way to do this?
I have tried two approaches so far:
- Merging: I merge table1 and table2 into tableA, table3 and table4 into tableB, then tableA and tableB, and so on (see the first sketch after this list).
With this approach I hit a memory limit fairly quickly.
- Going through each ID: I use an index of all IDs and, for each one, query each table along the lines of
SELECT value FROM tableA WHERE id=row[id]
This just takes ages; I would not finish within the next month. And from this question it seems I should prefer joins.
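For reference, one merge step of the first approach looks roughly like this (a sketch; the table and column names are placeholders, and I use a FULL OUTER JOIN so that IDs present in only one table are kept — not every engine supports it):

    CREATE TABLE tableA AS
    SELECT COALESCE(t1.id, t2.id) AS id,
           t1.value AS value1,
           t2.value AS value2
    FROM table1 t1
    FULL OUTER JOIN table2 t2 ON t1.id = t2.id;
    -- likewise table3 + table4 -> tableB, then tableA + tableB, and so on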
Thus another question: would a partial join with subsequent stacking of the resulting merged tables be more efficient?
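To clarify what I mean by stacking: vertically concatenating tables with UNION ALL. As a point of comparison, stacking everything directly (no joins at all) and aggregating would look roughly like this (a sketch, assuming all 60 tables share the same two-column layout; AVG then averages over however many tables actually contain each ID):

    SELECT id, AVG(value) AS mean_value
    FROM (
        SELECT id, value FROM table1
        UNION ALL
        SELECT id, value FROM table2
        UNION ALL
        SELECT id, value FROM table3
        -- ... and so on through table60
    ) AS stacked
    GROUP BY id;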