Python: Create Dataframe where values match the column names

Question:

I would like to bring together 2 lists in one dataframe.

The first list I have contains various dates.

my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']

The second list is a list of lists that contains various stock names and their dates.

my_stocks = [
    ["AAPL", ['20/12/2024', '31/08/2022']],
    ["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']], 
    ["TSLA", []], 
    ["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]
]

I would like to bring those two lists together in a dataframe where:

  • The header / column names are the values of my_dates
  • The first value of each row is the stock name
  • The value of the "cells" is the date if it is equal to the column name

      20/12/2024  31/08/2022  19/08/2022  21/06/2024
AAPL  20/12/2024  31/08/2022
MSFT  20/12/2024  31/08/2022              21/06/2024
TSLA
META  20/12/2024  31/08/2022  19/08/2022  21/06/2024

I thought about something like

import pandas as pd

new = []

for elem in my_stocks:
    new.append(elem)

df = pd.DataFrame(new)

# Expand each stock's list of dates into separate columns
df = pd.concat([df[0], df[1].apply(pd.Series)], axis=1)

print(df)

But this doesn’t include the headers and doesn’t match the dates with the headers.

As you can see I am new to Python and any help is highly appreciated!

Many thanks
Mika

Asked By: mika579

||

Answers:

Here is an approach that creates each stock as its own pandas Series and then uses pd.concat to combine them into a DataFrame. It then uses my_dates to select which dates to keep (and in which order) and transposes the table to match your desired output.

import pandas as pd

my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
my_stocks = [
    ["AAPL", ['20/12/2024', '31/08/2022']],
    ["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']],
    ["TSLA", []],
    ["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]
]

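# One Series per stock: each date is both the index label and the value
stock_df = pd.concat(
    (pd.Series(ds, index=ds, name=n, dtype='object') for n, ds in my_stocks),
    axis=1,
)
# Keep only the dates in my_dates, in that order
stock_df = stock_df.loc[my_dates]

# Transpose so that stocks are rows and dates are columns
stock_df = stock_df.T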
print(stock_df)
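
One caveat with .loc[my_dates]: it raises a KeyError if my_dates contains a date that none of the stocks has. A minimal sketch of a more tolerant variant, assuming such dates should simply show up as all-NaN columns, swaps .loc for reindex. The extra date '01/01/2030' is hypothetical and only there to show the difference.

```python
import pandas as pd

# '01/01/2030' is a hypothetical date that no stock contains
my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024', '01/01/2030']
my_stocks = [
    ["AAPL", ['20/12/2024', '31/08/2022']],
    ["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']],
    ["TSLA", []],
    ["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']],
]

# One Series per stock: each date is both the index label and the value
stock_df = pd.concat(
    (pd.Series(ds, index=ds, name=n, dtype='object') for n, ds in my_stocks),
    axis=1,
)

# reindex inserts NaN rows for labels that are missing instead of raising
stock_df = stock_df.reindex(my_dates).T
print(stock_df)
```

With the original four dates this should give the same table as the .loc version, since every date appears in at least one stock.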


Answered By: mitoRibo

Usually in such cases it is better to put the data in the correct format using pure Python and only then create a DataFrame from the processed data. This keeps the code simpler and more readable, and it is often more efficient than a pandas-based solution.

import pandas as pd
import numpy as np 

my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']

my_stocks = [
    ["AAPL", ['20/12/2024', '31/08/2022']],
    ["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']], 
    ["TSLA", []], 
    ["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']]
]

stocks, data = zip(*my_stocks)

data = [[date if date in row else np.nan 
         for date in my_dates] 
         for row in data]

df = pd.DataFrame(data, index=stocks, columns=my_dates)

Output:

>>> df

      20/12/2024  31/08/2022  19/08/2022  21/06/2024
AAPL  20/12/2024  31/08/2022         NaN         NaN
MSFT  20/12/2024  31/08/2022         NaN  21/06/2024
TSLA         NaN         NaN         NaN         NaN
META  20/12/2024  31/08/2022  19/08/2022  21/06/2024
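
If the date lists get long, the check `date in row` scans a list on every lookup. A possible variation, sketched here under the assumption that stock names are unique (they become dictionary keys), converts each list to a set first and builds the frame with pd.DataFrame.from_dict. The names date_sets and rows are made up for this example.

```python
import numpy as np
import pandas as pd

my_dates = ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']
my_stocks = [
    ["AAPL", ['20/12/2024', '31/08/2022']],
    ["MSFT", ['20/12/2024', '31/08/2022', '21/06/2024']],
    ["TSLA", []],
    ["META", ['20/12/2024', '31/08/2022', '19/08/2022', '21/06/2024']],
]

# Assumption: stock names are unique, otherwise later entries overwrite earlier ones
date_sets = {name: set(dates) for name, dates in my_stocks}

# For every stock, keep the date where it is present and NaN where it is not
rows = {
    name: [date if date in held else np.nan for date in my_dates]
    for name, held in date_sets.items()
}

# orient="index" turns each dictionary key into a row label
df = pd.DataFrame.from_dict(rows, orient="index", columns=my_dates)
print(df)
```

For four stocks and four dates the difference is negligible; the set conversion only starts to matter with many dates per stock.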
Answered By: Rodalm