Monday, 24 August 2015

Aggregate group after selecting distinct column combinations

I have a table in a SQLite database with ~500 million rows and 49 columns of various data types, with a composite index on the non-nullable columns user_id and order_id.

Multiple rows appear for the same user_id and order_id, one for each unique item within an order. For instance, an order might look like this (irrelevant columns omitted):

  USER_ID      ORDER_ID  ORDER_TOTAL ORDER_DATE  CATEGORY  ITEM_TOTAL
---------  ------------  ----------- ----------  --------  ----------
094f077b2  47f12c149808       330.41 2015-03-01      BEER       20.30
094f077b2  47f12c149808       330.41 2015-03-01      BEER       36.23 
094f077b2  47f12c149808       330.41 2015-03-01      BEER       55.03 

As you can see, items are missing from the order: the item_total values sum to 111.56, which does not equal the order_total of 330.41.

I would like to extract the unique values of order_date, order_total and category for each order_id in order to compute a valid category total by date.

I have come up with the two most obvious solutions:

SELECT category, order_date, SUM(order_total) 
  FROM (
    SELECT DISTINCT category, order_id, order_date, order_total
      FROM order_history
    )
  GROUP BY category, order_date
;

and

SELECT category, order_date, SUM(order_total) 
  FROM (
    SELECT category, order_id, order_date, AVG(order_total) AS order_total
      FROM order_history
      GROUP BY category, order_id, order_date
    )
  GROUP BY category, order_date
;

Here AVG() could just as well be MIN() or MAX() and give the same result, since order_total is repeated on every row of an order. Is either of these preferred over the other, or is there a better way in general to achieve this?
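For anyone who wants to try the two queries on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. It loads only the three sample rows above into an in-memory table; the column types and the variable names (distinct_sql, group_sql and so on) are assumptions made for illustration, not the real 49-column schema. The inner AVG() is given the alias order_total so that the outer query can refer to it by that name. The sketch says nothing about performance at 500 million rows.

```python
import sqlite3

# Hypothetical in-memory copy of the three sample rows from the question.
# Column types are assumed; the real table has 49 columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE order_history (
           user_id     TEXT NOT NULL,
           order_id    TEXT NOT NULL,
           order_total REAL,
           order_date  TEXT,
           category    TEXT,
           item_total  REAL
       )"""
)
rows = [
    ("094f077b2", "47f12c149808", 330.41, "2015-03-01", "BEER", 20.30),
    ("094f077b2", "47f12c149808", 330.41, "2015-03-01", "BEER", 36.23),
    ("094f077b2", "47f12c149808", 330.41, "2015-03-01", "BEER", 55.03),
]
conn.executemany("INSERT INTO order_history VALUES (?, ?, ?, ?, ?, ?)", rows)

# Variant 1: de-duplicate with DISTINCT, then aggregate.
distinct_sql = """
SELECT category, order_date, SUM(order_total)
  FROM (
    SELECT DISTINCT category, order_id, order_date, order_total
      FROM order_history
    )
  GROUP BY category, order_date
"""

# Variant 2: collapse each order with GROUP BY, then aggregate.
# The alias lets the outer SUM() find the column by name.
group_sql = """
SELECT category, order_date, SUM(order_total)
  FROM (
    SELECT category, order_id, order_date, AVG(order_total) AS order_total
      FROM order_history
      GROUP BY category, order_id, order_date
    )
  GROUP BY category, order_date
"""

distinct_result = conn.execute(distinct_sql).fetchall()
group_result = conn.execute(group_sql).fetchall()

# Naive per-item sum, for comparison with the order total.
item_sum = conn.execute("SELECT SUM(item_total) FROM order_history").fetchone()[0]

print("DISTINCT variant:", distinct_result)
print("GROUP BY variant:", group_result)
print("SUM(item_total): ", item_sum)
```

On this sample both variants return a single BEER / 2015-03-01 row. Because AVG() does floating-point arithmetic, the two totals are best compared with a small tolerance rather than with exact equality; MIN() or MAX() return a stored value unchanged.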
