jeudi 5 février 2015

Retrieving and plotting ordered, 2D heatmap data from a SQLite database

I've been trying to grapple with this most of this week and I figured I'm probably best placed to ask for some assistance here. I have 2D array data stored in a SQLite database, which I would like to fetch and visualize. There are a number of constraints on how I need to fetch and plot the data though that I'm not sure how to handle.


So, I have a database with a table containing something like the following:



| ID | SourceID | TargetID | Parameter | Values |
| 1 | 21 | 34 | 23.46513 | 0.12654 |
| 2 | 21 | 34 | 23.46513 | 0.25478 |
| 3 | 21 | 46 | 23.46513 | 0.43564 |
| 4 | 21 | 46 | 23.46513 | 1.02487 |
| 5 | 34 | 21 | 14.56319 | 0.01476 |
| 6 | 34 | 21 | 14.56319 | 0.87265 |
| 7 | 34 | 46 | 14.56319 | 0.46478 |
| 8 | 34 | 46 | 14.56319 | 0.13665 |
| 9 | 46 | 21 | 7.99581 | 0.04189 |
| 10 | 46 | 21 | 7.99581 | 0.91754 |
| 11 | 46 | 34 | 7.99581 | 0.73688 |
| 12 | 46 | 34 | 7.99581 | 0.24299 |


Some features of this data set to note:



  • ID is the key and is unique

  • The SourceID and TargetID values are from the same set. In this case, they both contain {21, 34, 46}. The SourceID column constitutes the x-axis data for the heatmap and the TargetID column constitutes the y-axis data.

  • The Values column contains the data that has to be plotted in the heatmap.

  • The Parameter column is the column I need to use to sort the data, since I need to create an ordered heatmap. In this case, Parameter is based on values associated with each SourceID and TargetID.


The following table shows the association of each SourceId/TargetId with a Parameter value:



| SourceID/TargetID | Parameter |
| 46 | 7.99581 |
| 34 | 14.56319 |
| 21 | 23.46513 |


Based on another question that I posed, I know that I can reduce this table to unique combinations of (SourceId, TargetId) based on the minimum of Values using a SQL query like:



SELECT SourceID, TargetID, min(Values)
FROM dataset
GROUP BY SourceID, TargetID;


This yields the following reduced table:



| ID | SourceID | TargetID | SourceSort | Values |
| 1 | 21 | 34 | 23.46513 | 0.12654 |
| 3 | 21 | 46 | 23.46513 | 0.43564 |
| 5 | 34 | 21 | 14.56319 | 0.01476 |
| 8 | 34 | 46 | 14.56319 | 0.13665 |
| 9 | 46 | 21 | 7.99581 | 0.04189 |
| 12 | 46 | 34 | 7.99581 | 0.24299 |


The bit I'm struggling with is that I now need to order this table based on the values in Parameter. If I add ORDER BY Parameter to the SQL query, it successfully orders the SourceID column, but the TargetID column doesn't sort according to this for rows that have the same SourceID. In a nutshell, I want to obtain the following table:



| ID | SourceID | TargetID | Values |
| 12 | 46 | 34 | 0.24299 |
| 9 | 46 | 21 | 0.04189 |
| 8 | 34 | 46 | 0.13665 |
| 5 | 34 | 21 | 0.01476 |
| 3 | 21 | 46 | 0.43564 |
| 1 | 21 | 34 | 0.12654 |


An example of what's underlying this sorting: the table is firstly sorted based on SourceID ordering using Parameter values. Subsequently, TargetID is sorted based on Parameter ordering, for rows where the SourceID values are equal, e.g., Row 1 (ID=8) is sorted before Row 2 (ID=5) because TargetID = 46 should be sorted before TargetID = 21 based on the associated Parameter values.


Finally, with this table, I need to construct a heatmap:



(TargetID ordered by `Parameter ASC`)
^
|
|
----------------------------------
21 | 0.04189 | 0.01476 | 0 |
----------------------------------
34 | 0.24299 | 0 | 0.12654 |
----------------------------------
46 | 0 | 0.13665 | 0.43564 |
----------------------------------
| 46 | 34 | 21 --> (SourceID ordered by `Parameter ASC`)


To achieve this, I'm using the Python Pandas library, and executing the SQL query using the read_sql() function. I've discovered that I can use matplotlib's pcolor() function to plot the heatmap. I'd like to plot the values in the 2D array using a colormap, and I'd like the ordering of the data set to be exactly like above. Finally, I'd then like to plot the SourceID and TargetID values as tick labels.


This seems like something that should be doable, but so far, I've only been able to generate the heatmap without the ordering according to Parameter. I don't know if I'm best placed to solve the ordering bit directly in the SQL statement, or if I should just fetch the data and then sort the DataFrame that is returned by read_sql().


In any case, I'd greatly appreciate input! If anything is unclear, please let me know and I'll try to clarify.


Thanks!


Aucun commentaire:

Enregistrer un commentaire