jeudi 2 juillet 2015

Inner Joins: performance and addressing columns

I have an Amazon-like database products and copurchases (i.e. products that are often bought together) and would like find the 2-nearest neighbors of a certain product.

The two queries below give me the same result set, but the first one is much slower (25 seconds vs 2ms).

SELECT DISTINCT `product`.`asin` AS 'asin' FROM `product`
INNER JOIN `copurchased` AS 'primary' ON `primary`.`product_asin1` = `product`.`asin`
INNER JOIN `copurchased` AS 'secondary' ON `secondary`.`product_asin1` = `primary`.`product_asin2`
WHERE `secondary`.`product_asin2` = '0792838661'
ORDER BY `asin` ASC

SELECT DISTINCT `secondary`.`product_asin2` AS 'asin' FROM `product`
INNER JOIN `copurchased` AS 'primary' ON `primary`.`product_asin1` = `product`.`asin`
INNER JOIN `copurchased` AS 'secondary' ON `secondary`.`product_asin1` = `primary`.`product_asin2`
WHERE `product`.`asin` = '0792838661'
ORDER BY `asin` ASC

I would prefer the first one over the second one in case I wanted to know additional information about the neighbors, like their title.

Why is it so much slower than the first one?

In case it matters (and I think it does) product.asin is the primary key, copurchased.product_asin1 and copurchased.product_asin2 are both foreign keys, while the combination of the two is the table's primary key.


Bonus question: say I would like to exclude the starting product from the result set in the first query. I cannot use AND `asin` != '0792838661', or I get an empty result. I guess it assumes I mean asin from product, even though I do not. How can I tell the interpreter I want the column I renamed to asin within the result set?

Aucun commentaire:

Enregistrer un commentaire