Selecting rows from a list or other iterable but in order
Question:
I have a dataframe with a column named "ID".
I also have another dataframe holding a list of ID values that I want to use. I can select a sub-dataframe with the rows whose IDs are in that list.
For example:
import pandas as pd
IDlist_df = pd.DataFrame({"v": [3, 4, 6, 9]})
df = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9], "name": ['menelaus', 'helen', 'ulyses', 'paris', 'hector', 'priamus', 'hecuba', 'andromache', 'achiles', 'ascanius', 'eneas', 'ajax', 'nestor', 'helenus']})
# keep only the rows whose ID appears in IDlist_df['v']
selected_lines = df[df['ID'].isin(IDlist_df['v'])]
print(selected_lines)
With this I get
    ID        name
3    3       paris
4    3      hector
5    4     priamus
6    4      hecuba
7    4  andromache
9    6    ascanius
10   6       eneas
13   9     helenus
This gives me the sub-dataframe of rows with ID 3, 4, 6 or 9.
So far so good.
However, I also want the result to follow the order of the list. If I have
IDlist_df=pd.DataFrame({"v":[3,9,6,4]})
I get the same result as above, because isin only tests membership and ignores the order of the list.
How can I get something like this instead?
    ID        name
3    3       paris
4    3      hector
13   9     helenus
9    6    ascanius
10   6       eneas
5    4     priamus
6    4      hecuba
7    4  andromache
(Note that the order 3, 9, 6, 4 is maintained.)
Answers:
If the values in the IDlist_df.v column are unique, you can use rename with DataFrame.merge:
df = IDlist_df.rename(columns={'v':'ID'}).merge(df, on='ID')
print(df)
   ID        name
0   3       paris
1   3      hector
2   9     helenus
3   6    ascanius
4   6       eneas
5   4     priamus
6   4      hecuba
7   4  andromache
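The uniqueness condition matters because a repeated ID in IDlist_df.v would make merge return the matching rows once per repetition. Below is a minimal sketch of one way to guard against that, assuming you want each ID to appear only at its first position in the list. The repeated 9 and the names order and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical ID list with a repeated value (9 appears twice).
IDlist_df = pd.DataFrame({"v": [3, 9, 6, 4, 9]})
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9],
    "name": ['menelaus', 'helen', 'ulyses', 'paris', 'hector', 'priamus',
             'hecuba', 'andromache', 'achiles', 'ascanius', 'eneas',
             'ajax', 'nestor', 'helenus'],
})

# Keep the first occurrence of each ID, then rename so the key columns match.
order = IDlist_df.drop_duplicates("v").rename(columns={"v": "ID"})

# The default inner merge follows the key order of the left frame.
result = order.merge(df, on="ID")
print(result)
```

Note that the merged frame gets a fresh 0..n-1 index, so the original row labels of df are not kept.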
To preserve the original index, look the IDs up in the desired order with .loc and then restore the index:
(selected_lines.reset_index().set_index('ID').loc[[3, 9, 6, 4]]
.reset_index().set_index('index').rename_axis(''))
result:
    ID        name
3    3       paris
4    3      hector
13   9     helenus
9    6    ascanius
10   6       eneas
5    4     priamus
6    4      hecuba
7    4  andromache
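If the index juggling above feels heavy, a simpler (though probably slower for long ID lists) sketch is to take one slice per ID and concatenate the slices in the requested order. This is an alternative I am suggesting, not the answerer's method. The names id_order and ordered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9],
    "name": ['menelaus', 'helen', 'ulyses', 'paris', 'hector', 'priamus',
             'hecuba', 'andromache', 'achiles', 'ascanius', 'eneas',
             'ajax', 'nestor', 'helenus'],
})
id_order = [3, 9, 6, 4]  # desired order of the IDs

# One boolean-mask slice per ID, concatenated in the requested order.
# Each slice keeps its original index labels, so nothing has to be restored.
ordered = pd.concat([df[df["ID"] == i] for i in id_order])
print(ordered)
```

Each ID costs one full pass over df, so this is best suited to short ID lists.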
Another way is to use sort_values with an ordered categorical as the sort key (the key argument requires pandas 1.1 or later):
lst = [3, 9, 6, 4]
selected_lines.sort_values('ID', key=lambda x: pd.Categorical(x, categories=lst, ordered=True))
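For completeness, here is a self-contained sketch of the categorical sort that starts from the two dataframes in the question. The kind="stable" argument is my addition, on the assumption that rows sharing an ID should keep their original relative order. The default sort algorithm does not guarantee that in general.

```python
import pandas as pd

IDlist_df = pd.DataFrame({"v": [3, 9, 6, 4]})
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9],
    "name": ['menelaus', 'helen', 'ulyses', 'paris', 'hector', 'priamus',
             'hecuba', 'andromache', 'achiles', 'ascanius', 'eneas',
             'ajax', 'nestor', 'helenus'],
})

lst = IDlist_df["v"].tolist()       # desired order; values must be unique
selected_lines = df[df["ID"].isin(lst)]

# Sort by each ID's position in lst rather than by its numeric value.
# kind="stable" (an assumption, not in the original answer) keeps ties
# in their original order.
ordered = selected_lines.sort_values(
    "ID",
    key=lambda s: pd.Categorical(s, categories=lst, ordered=True),
    kind="stable",
)
print(ordered)
```

As with the merge approach, the category list has to be free of duplicates, because pd.Categorical rejects repeated categories.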