Python: Create Dataframe where values match the column names
Question:
I would like to bring together 2 lists in one dataframe.
The first list I have contains various dates.
my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
The second list is a list of list that contains various stock names and dates.
my_stocks = [
["AAPL", ['20/12/2024', '31/08/2022']],
["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']],
["TSLA", []],
["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]
]
I would like to bring those two lists together in a dataframe where:
- The header / column names are the values of my_dates
- The first value of each row is the stock name
- The value of the "cells" is the date if it is equal to the column name
20/12/2024
31/08/2022
19/08/2022
21/06/2024
AAPL
20/12/2024
31/08/2022
MSFT
20/12/2024
31/08/2022
21/06/2024
TSLA
META
20/12/2024
31/08/2022
19/08/2022
21/06/2024
I thought about something like
new = []
for elem in my_stocks:
new.append(elem)
df = pd.DataFrame(new)
df = pd.concat([df[0],df[1].apply(pd.Series)],axis=1)
print(df)
But this doesn’t include the headers and doesn’t match the dates with the headers.
As you can see I am new to Python and any help is highly appreciated!
Many thanks
Mika
Answers:
here is an approach that creates each stock as it’s own pandas series and then uses pd.concat
to make them into a dataframe. it then uses the my_dates
to select which dates to use and transposes the table to match your desired output
import pandas as pd
my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
my_stocks = [["AAPL", ['20/12/2024', '31/08/2022']],["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']], ["TSLA", []], ["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]]
stock_df = pd.concat((pd.Series(ds, ds, name=n, dtype='object') for n,ds in my_stocks), axis=1)
stock_df = stock_df.loc[my_dates]
stock_df = stock_df.T
print(stock_df)
Usually in such cases it is better to put the data in the correct format using pure python and only then create a DataFrame with the processed data. It makes your code simpler, more readable and in general even more efficient than using a pandas
-based solution.
import pandas as pd
import numpy as np
my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
my_stocks = [
["AAPL", ['20/12/2024', '31/08/2022']],
["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']],
["TSLA", []],
["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]
]
stocks, data = zip(*my_stocks)
data = [[date if date in row else np.nan
for date in my_dates]
for row in data]
df = pd.DataFrame(data, index=stocks, columns=my_dates)
Output:
>>> df
20/12/2024 31/08/2022 19/08/2022 21/06/2024
AAPL 20/12/2024 31/08/2022 NaN NaN
MSFT 20/12/2024 31/08/2022 NaN 21/06/2024
TSLA NaN NaN NaN NaN
META 20/12/2024 31/08/2022 19/08/2022 21/06/2024
I would like to bring together 2 lists in one dataframe.
The first list I have contains various dates.
my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
The second list is a list of list that contains various stock names and dates.
my_stocks = [
["AAPL", ['20/12/2024', '31/08/2022']],
["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']],
["TSLA", []],
["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]
]
I would like to bring those two lists together in a dataframe where:
- The header / column names are the values of my_dates
- The first value of each row is the stock name
- The value of the "cells" is the date if it is equal to the column name
20/12/2024 | 31/08/2022 | 19/08/2022 | 21/06/2024 | |
---|---|---|---|---|
AAPL | 20/12/2024 | 31/08/2022 | ||
MSFT | 20/12/2024 | 31/08/2022 | 21/06/2024 | |
TSLA | ||||
META | 20/12/2024 | 31/08/2022 | 19/08/2022 | 21/06/2024 |
I thought about something like
new = []
for elem in my_stocks:
new.append(elem)
df = pd.DataFrame(new)
df = pd.concat([df[0],df[1].apply(pd.Series)],axis=1)
print(df)
But this doesn’t include the headers and doesn’t match the dates with the headers.
As you can see I am new to Python and any help is highly appreciated!
Many thanks
Mika
here is an approach that creates each stock as it’s own pandas series and then uses pd.concat
to make them into a dataframe. it then uses the my_dates
to select which dates to use and transposes the table to match your desired output
import pandas as pd
my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
my_stocks = [["AAPL", ['20/12/2024', '31/08/2022']],["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']], ["TSLA", []], ["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]]
stock_df = pd.concat((pd.Series(ds, ds, name=n, dtype='object') for n,ds in my_stocks), axis=1)
stock_df = stock_df.loc[my_dates]
stock_df = stock_df.T
print(stock_df)
Usually in such cases it is better to put the data in the correct format using pure python and only then create a DataFrame with the processed data. It makes your code simpler, more readable and in general even more efficient than using a pandas
-based solution.
import pandas as pd
import numpy as np
my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
my_stocks = [
["AAPL", ['20/12/2024', '31/08/2022']],
["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']],
["TSLA", []],
["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]
]
stocks, data = zip(*my_stocks)
data = [[date if date in row else np.nan
for date in my_dates]
for row in data]
df = pd.DataFrame(data, index=stocks, columns=my_dates)
Output:
>>> df
20/12/2024 31/08/2022 19/08/2022 21/06/2024
AAPL 20/12/2024 31/08/2022 NaN NaN
MSFT 20/12/2024 31/08/2022 NaN 21/06/2024
TSLA NaN NaN NaN NaN
META 20/12/2024 31/08/2022 19/08/2022 21/06/2024