mardi 24 mars 2015

SQLite query performance: UPDATE vs JOIN

I have been fighting with this for a few weeks now -- as you can see here and here. Although the answers I got there are correct, they don't work for me because of performance issues. More below.


I am using SQLite. I have a table where rows represent individuals in a point in time. I am trying create a column (statusLag) with the lagged values of another column (status), like so:



workerID timeVar status statusLag
---------------------------------------------
1 1 0 NULL
1 2 1 0
1 3 1 1
1 4 1 1
----------------------------------------------
2 3 1 NULL
2 4 0 1
2 5 1 0
2 6 0 1


What I have tried


Approach 1: UPDATE



UPDATE myTable
SET statusLag = (SELECT t2.status
FROM myTable t2
WHERE t2.workerID = myTable.workerID AND
t2.timeVar < myTable.timeVar
ORDER BY t2.timeVar desc
LIMIT 1
);


This does the job in the tests I ran. However, I have a very large table -- 3.2 billion rows. I could work for now with one table of 300 million rows, this query uses way too much RAM, more than what my machine (12GB) can handle. So, for all (my) practical purposes, this doesn't work.


Approach 2: JOIN


It seems to me that I could do a JOIN. Something along the lines of



SELECT t1.*, t2.status as statusLag FROM myTable AS t1
LEFT JOIN myTable AS t2
ON t1.workerID=t2.workerID AND t1.timeVar=t2.timeVar+1
ORDER BY t1.workerID, t1.timeVar ;


I am not clear whether this will do the job or not. Also, this is the firs time I do JOIN, and it is my impression that this query alone won't insert or update anything on myTable. Do I need to complement this query to accomplish what I explained at the beginning of my post?


Any thoughts, any help, much appreciated. I have been struggling with this for two weeks now, and I need to get this done.


Aucun commentaire:

Enregistrer un commentaire