Is there a code to do this in pandas simply? Or perhaps I should try SQL (I am not very familiar with SQL). Here is what I have so far (fake example, the real one has about 20,000 individuals)
employee_df =
EmpID Name Title ManagerID
abc123 John Head pqr456
pqr456 Jake VP bs92999
zyx987 Jill Lead abc123
bs92999 Bob SVP NaN
Duplicate the dataframe: manager_df = employee_df
roster = pd.merge(manager_df, employee_df, how='outer', left_on ='ManagerID', right_on = 'EmpID')
My output is messy, although seemingly correct (it is telling me who the manager is without having separate tables by referring to itself).
EmpID_x Name_x Title_x ManagerID_x EmpID_y Name_y Title_y ManagerID_y
abc123 John Head pqr456 pqr456 Jake VP bs92999
pqr456 Jake VP bs92999 bs92999 Bob SVP NaN
zyx987 Jill Lead abc123 abc123 John Head pqr456
bs92999 Bob SVP NaN NaN NaN NaN NaN
NaN NaN NaN NaN zyx987 Jill Lead abc123
The most common desired output would be:
EmpID | Name | Title | Manager_Name
But sometimes I would need another level too (the boss of the boss), with a max potential of about 5 layers which seems crazy but there is a lot of hierarchy - such high layers is not necessary all of the time but I would like to be able to roll this data up to a higher level if needed:
EmpID | Name | Title | Manager_Name_1 | Manager_Name_2
The third dataframe is reporting_df:
EmpID | ManagerID | StartDate | EndDate
Sometimes there are management changes that happen even mid-month and the results should reflect on the manager who 'owned' that employee on that date.
file = any file or report that has the EmpID where I might want to find out who the manager (or their manager) was on the date which is also included on the file. Is this the right approach to solve for this?
for i in range(len(file)):
file.ix[i,'Manager'] = reporting_df[(reporting_df.StartDate.shift(-1) > file.StartDate[i]) &(reporting_df.StartDate <= file.Date[i])]
Aucun commentaire:
Enregistrer un commentaire