How to optimize iterating over a DataFrame?
Question:
I want to group a DataFrame's column values by ID. I have the following:
import pandas as pd

data = {
    "ID": ["1", "1", "2"],
    "start": [5, 6, 30],
    "end": [10, 20, 50],
    "label": ["age", "gender", "history"]
}
df = pd.DataFrame(data)
The result I'm looking for is something like this:
Dict = [{'ID': 1, 'Labels': [[5, 10, 'age'], [6, 20, 'gender']]}, {'ID': 2, 'Labels': [[30, 50, 'history']]}]
I tried multiple approaches and they all take extremely long. Is there a way to optimize this code?
inner_dict = {}
inner_list = []
middle_list = []
labels_dict = []
for idx in df.index:
    ID = df['ID'][idx]
    for ddx in df.index:
        if df['ID'][ddx] == ID:
            inner_list = [df['start'][ddx], df['end'][ddx], df['label'][ddx]]
            middle_list.append(inner_list)
        else:
            inner_dict = {'ID': ID, 'Label': middle_list}
            # print(inner_dict)
            # inner_dict = {'Label': inner_list}
    labels_dict.append(inner_dict)
Answers:
Set ID as the index so that only start, end, and label remain as columns, then group by ID inside a list comprehension and collect each group's rows as the Labels for that ID:
s = df.set_index('ID')
[{'ID': k, 'Labels': g.values.tolist()} for k, g in s.groupby('ID')]
Result
[{'ID': '1', 'Labels': [[5, 10, 'age'], [6, 20, 'gender']]},
{'ID': '2', 'Labels': [[30, 50, 'history']]}]
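A variant of the same idea, as a minimal sketch rather than the only way to do it: instead of moving ID into the index, it selects the value columns by name, so the layout of each row does not depend on the column order of df. The names cols and result are illustrative and not part of the original answer; sort=False is an optional choice that keeps IDs in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["1", "1", "2"],
    "start": [5, 6, 30],
    "end": [10, 20, 50],
    "label": ["age", "gender", "history"],
})

# Illustrative name: the columns that make up each inner [start, end, label] row.
cols = ["start", "end", "label"]

# One pass over the groups; sort=False keeps IDs in order of first appearance.
result = [
    {"ID": key, "Labels": group[cols].values.tolist()}
    for key, group in df.groupby("ID", sort=False)
]

print(result)
```

Because groupby splits the frame once, this avoids the nested loop over df.index, which compares every row against every other row. The IDs stay strings here because the sample data stores them as strings; if integer IDs are wanted, as in the question's target output, converting with int(key) inside the comprehension should work for this data.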