mercredi 23 mars 2016

Generate database diagram by looking at data only

For this question lets assume we are dealing with a SQLite database without any foreign keys defined on the tables. But ideally I would like to create a generic tool that might be used on any database.

Would it be possible to guess what the foreign keys are based on the data?

My idea is to scan the entire database and create a table (lets call this table "hashes") with columns:

hash, tableName, columnName

Where hash is a hash of the cell value found at tableName.columnName for each cell in the database. The hash column is the primary key of hashes.

My question is, say I have tableNames = ["table1", "table2"] would it be possible to write a query or queries that joins hashes on itself and use a count to figure out if there are any columnNames that are likely to be good candidates to join the given tableNames?

Has it been done before? Is there a better way to approach the problem?

Consider some example data:

create table hashes(hash, tableName, columnName);

insert into hashes (hash, tableName, columnName) values
(1, 'table1', 'a'),
(2, 'table1', 'a'),
(3, 'table1', 'a'),
(4, 'table1', 'a'),
(5, 'table1', 'a'),
(5, 'table2', 'a'),
(3, 'table2', 'a'),
(1, 'table2', 'a'),
(2, 'table2', 'b'),
(4, 'table2', 'b'),
(6, 'table2', 'c'),
(7, 'table2', 'c');

select 
    table1.columnName as table1_columns, 
    table2.columnName as table1_columns, 
    count(*) as joins 
from hashes table1
join hashes table2 on table1.hash = table2.hash
where table1.tableName = 'table1'
and table2.tableName = 'table2'
group by table1.columnName, table2.columnName;

The result is:

table1_column, table2_columns, joins
a,             a,              3
a,             b,              2

I can see the likeliest column that joins table1 and table2 is a.

Aucun commentaire:

Enregistrer un commentaire