jeudi 25 juin 2015

How to use an auto-incrementing integer primary key to combine multiple files?

How do you set up a valid auto-incrementing integer primary key on a table if you want to join it with separate files? I get data like this on a daily basis:

Interaction data:

Date | PersonID | DateTime | CustomerID | Other values...

The primary key there would be PersonID + DateTime + CustomerID. If I have an integer key, how can I get that to relate back to another table? I want to know the rows where a specific person interacted with a specific customer so I can tie back those pieces of data together into one master-file.

Survey return data:

Date | PersonID | DateTime | CustomerID | Other values...

I am normally processing all raw data first in pandas before loading it into a database. Some other files also do not have a datetime stamp and only have a date. It is rare for one person to interact with the same customer on the same day so I normally drop all rows where there are duplicates (all instances) so my sample of joins are just purely unique.

Other Data:

Date | PersonID | CustomerID | Other values...

I can't imagine how I can set it up so I know row 56,547 on 'Interaction Data' table matches with row 10,982 on 'Survey Return Data' table. Or should I keep doing it the way I am with a composite key of three columns?

Aucun commentaire:

Enregistrer un commentaire