lundi 2 février 2015

Compound SQLite SELECT query using DISTINCT and aggregator across multiple columns

I'm not entirely sure how I can put together a single SQLite query to achieve the following. I can get bits and pieces to work, but can't seem to meld it all into one single one.


I have a table that looks like this (the real data set is several GB):



| ID | ColumnA | ColumnB | ColumnC | ColumnD |
| 1 | 21 | 34 | 10 | 0.12654 |
| 2 | 21 | 34 | 20 | 0.25478 |
| 3 | 21 | 46 | 10 | 0.43564 |
| 4 | 21 | 46 | 20 | 1.02487 |
| 5 | 34 | 21 | 10 | 0.01476 |
| 6 | 34 | 21 | 20 | 0.87265 |
| 7 | 34 | 46 | 10 | 0.46478 |
| 8 | 34 | 46 | 20 | 0.13665 |
| 9 | 46 | 21 | 10 | 0.04189 |
| 10 | 46 | 21 | 20 | 0.91754 |
| 11 | 46 | 34 | 10 | 0.73688 |
| 12 | 46 | 34 | 20 | 0.24299 |


This data is generated by a nested do-loop for some simulations I'm carrying out.


From this table, I essentially need to extract a table that looks like the following, so that I can plot a color/heatmap.



| ID | ColumnA | ColumnB | ColumnC | ColumnD |
| 1 | 21 | 34 | 10 | 0.12654 |
| 3 | 21 | 46 | 10 | 0.43564 |
| 5 | 34 | 21 | 10 | 0.01476 |
| 8 | 34 | 46 | 20 | 0.13665 |
| 9 | 46 | 21 | 10 | 0.04189 |
| 12 | 46 | 34 | 20 | 0.24299 |


So this will enable me to make a colormap based on ColumnD values using a 2D-array (the diagonal should be set to zero, as the values of ColumnA and ColumnB are never equal for a given row; hence data for the diagonal is not present in the database):



| 21 | 34 | 46
------------------
21 | 0 | |
------------------
34 | | 0 |
------------------
46 | | | 0


My question is basically how I can set up a single query to aggregate all the data that goes into the 2D array to generate the colormap.


It's important to note that ColumnA and ColumnB are basically populated by the same set of integers. I can obtain the unique list of integer values using DISTINCT. I've found some SO threads about selecting DISTINCT across multiple columns, however none of the examples show how to use an aggregator to pick values in the other columns. In this case, I want to use min() to select the lowest value in ColumnD, for each pair of ids in ColumnA and ColumnB. Selecting DISTINCT on a single column doesn't work, because it's the coordinate pair (ColumnA,ColumnB) that's distinct.


Any help would be greatly appreciated!


Aucun commentaire:

Enregistrer un commentaire