Reorganizing a pandas dataframe by a repeating value
Question:
Thanks for the help! I’ve been scouring the site for similar questions, so sorry if this is a repeat, but I haven’t found anything similar.
I have a dataframe coming down from a query with a ton of securities and a single data point over a number of dates. As you can see in the sample below, the raw data repeats each security over all the available dates, with the results all in the same column at the end of the dataframe.
I want to see if I can transform the dataframe to make a column for each security with the dates as the index. I can do this with a for loop, but I was hoping there’d be something more elegant within the raw dataframe that someone might have an idea for.
I was trying some groupbys and some data slices on the ID column, but couldn’t think of a good way to transform the slices.
Thanks!
ID DATE SOURCE ID_DATE
0 NVTS US Equity 2022-03-15 ETF 2023-03-10
1 NVTS US Equity 2022-03-31 ETF 2023-03-10
2 NVTS US Equity 2022-04-14 ETF 2023-03-10
3 NVTS US Equity 2022-04-29 ETF 2023-03-10
4 NVTS US Equity 2022-05-13 ETF 2023-03-10
... ... ... ... ...
1762 BEEM US Equity 2023-01-13 ETF 2023-03-10
1763 BEEM US Equity 2023-01-31 ETF 2023-03-10
1764 BEEM US Equity 2023-02-15 ETF 2023-03-10
1765 BEEM US Equity 2023-02-28 ETF 2023-03-10
Answers:
You can try the pivot function from pandas. Something like this:
pivot_df = df.pivot(columns='ID', index='DATE', values='SOURCE')
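To make that concrete, here is a minimal sketch on a toy frame. The column names and tickers come from the question, but the rows themselves are made up for illustration. It also shows one possible fallback, pivot_table with aggfunc="first", in case the real data has repeated (DATE, ID) pairs, which plain pivot rejects with a ValueError. Whether keeping the first value is the right choice for your data is an assumption you'd need to check.

```python
import pandas as pd

# Toy long-format frame: same column names as the question, made-up rows.
df = pd.DataFrame({
    "ID": ["NVTS US Equity", "NVTS US Equity", "BEEM US Equity", "BEEM US Equity"],
    "DATE": pd.to_datetime(["2022-03-15", "2022-03-31", "2022-03-15", "2022-03-31"]),
    "SOURCE": ["ETF", "ETF", "ETF", "ETF"],
})

# One column per security, dates as the index.
pivot_df = df.pivot(index="DATE", columns="ID", values="SOURCE")

# If a (DATE, ID) pair appears more than once, pivot raises ValueError.
# pivot_table aggregates the duplicates instead; "first" keeps the first value seen.
dupes = pd.concat([df, df.iloc[[0]]], ignore_index=True)
wide = dupes.pivot_table(index="DATE", columns="ID", values="SOURCE", aggfunc="first")

print(pivot_df)
```

Any date that is missing for a given security shows up as NaN in that security's column.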