mardi 5 janvier 2016

Add next unique value to SQL column

I have two tables which I am trying to join based on two criteria. One of the criteria is that a date from t1 is between a date in t2 and the next date in t2. The other is that the name from t1 matches the name from t2.

I.e. if t2 looks like this:

Record     Name     Date
1          A1234    2016-01-03 04:58:00 
2          A1234    2015-12-15 08:34:00
3          A5678    2016-01-04 03:14:00
4          A1234    2016-01-05 21:06:00

Then:

  • Any records from t1 for Name A1234 with dates between 2016-01-03 04:58:00 and 2016-01-05 21:06:00 would be joined to record 1.
  • Any records from t1 for Name A1234 with dates between 2015-12-15 08:34:00 and 2016-01-03 04:58:00 would be joined to record 2
  • Any records from t1 for A1234 after the date of record 4 would be joined to record 4
  • Any records from t1 for A5678 would be joined to record 3 because there's only one date.

My initial approach is to use a correlated subquery to find the next date. However, due to a large number of records, I determined this would take over a year to execute because it searches all of t2 for the next later date during each iteration. Original SQLite:

CREATE TABLE outputtable AS SELECT * FROM t1, t2 d
WHERE t1.Name = d.Name AND t1.Date BETWEEN d.Date AND (
    SELECT * FROM (
        SELECT Date from t2
        WHERE t2.Name = d.Name
        ORDER BY Date ASC )
    WHERE Date > d.Date
LIMIT 1 )

Now, I would like to find the next date only once for all records in t2 and create a new column in t2 that contains the next date. This way, I only search for the next date about 400,000 times instead of 56 billion times, significantly improving my performance.

Thus the output of the query I'm looking for would make t2 look like this:

Record     Name     Date                    Next_Date
1          A1234    2016-01-03 04:58:00     2016-01-05 21:06:00
2          A1234    2015-12-15 08:34:00     2016-01-03 04:58:00
3          A5678    2016-01-04 03:14:00     2999-12-31 23:59:59
4          A1234    2016-01-05 21:06:00     2999-12-31 23:59:59 

Then I would be able to simply query whether t1.Date is between t2.Date and t2.Next_Date.

How can I build a query that will add the next date to a new column in t2?

Aucun commentaire:

Enregistrer un commentaire