Write "null" if column doesn't exist with KeyError: "['Column'] not in index" in df.to_csv?

Question:

I am getting KeyError: "['CashFinancial'] not in index" on the df.to_csv line because ‘GOOG’ doesn’t have the CashFinancial column. How can I have it write in null for the CashFinancial value for ‘GOOG’?

import pandas as pd
from yahooquery import Ticker
symbols = ['AAPL','GOOG','MSFT'] #This will be 75,000 symbols.
header = ["asOfDate","CashAndCashEquivalents","CashFinancial","CurrentAssets","TangibleBookValue","CurrentLiabilities","TotalLiabilitiesNetMinorityInterest"]

for tick in symbols:
    faang = Ticker(tick)
    faang.balance_sheet(frequency='q')
    df = faang.balance_sheet(frequency='q')
    df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
Asked By: HTMLHelpMe

||

Answers:

What about :

if tick == "GOOG"
    df.loc[:,"CashFinancial"] = None

To set an entire CashFinancial column to "None" only if your "tick" was GOOG, before writing it to csv.

The full code from the example you posted would he something like :

import pandas as pd
from yahooquery import Ticker
symbols = ['AAPL','GOOG','MSFT']
header = ["asOfDate","CashAndCashEquivalents","CashFinancial","CurrentAssets","TangibleBookValue","CurrentLiabilities","TotalLiabilitiesNetMinorityInterest"]

for tick in symbols:
    faang = Ticker(tick)
    faang.balance_sheet(frequency='q')
    df = faang.balance_sheet(frequency='q')#,{"symbol":[1],"asOfDate":[2],"CashAndCashEquivalents":[3],"CashFinancial":[4],"CurrentAssets":[5],"TangibleBookValue":[6],"CurrentLiabilities":[7],"TotalLiabilitiesNetMinorityInterest":[8],"marketCap":[9]}
    for column_name in header :
        if not column_name in df.columns :
            #Here, if any column is missing from the names you defined 
            #in your "header" variable, we add this column and set all 
            #it's row values to None
            df.loc[:,column_name  ] = None
    
    df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)

Answered By: Osamoele

Load all dataframes into a list, then use pd.concat (it will create NaN in missing columns):

import pandas as pd
from yahooquery import Ticker

symbols = ["AAPL", "GOOG", "MSFT"]
header = [
    "asOfDate",
    "CashAndCashEquivalents",
    "CashFinancial",
    "CurrentAssets",
    "TangibleBookValue",
    "CurrentLiabilities",
    "TotalLiabilitiesNetMinorityInterest",
]

all_dfs = []
for tick in symbols:
    faang = Ticker(tick)
    df = faang.balance_sheet(frequency="q")
    all_dfs.append(df)

df = pd.concat(all_dfs)

for symbol, g in df.groupby(level=0):
    print(symbol)
    print(g[header])
    # to save to CSV:
    # g[header].to_csv('filename.csv')
    print("-" * 80)

Prints:

AAPL
         asOfDate  CashAndCashEquivalents  CashFinancial  CurrentAssets  TangibleBookValue  CurrentLiabilities  TotalLiabilitiesNetMinorityInterest
symbol                                                                                                                                             
AAPL   2021-09-30            3.494000e+10   1.730500e+10   1.348360e+11       6.309000e+10        1.254810e+11                         2.879120e+11
AAPL   2021-12-31            3.711900e+10   1.799200e+10   1.531540e+11       7.193200e+10        1.475740e+11                         3.092590e+11
AAPL   2022-03-31            2.809800e+10   1.429800e+10   1.181800e+11       6.739900e+10        1.275080e+11                         2.832630e+11
AAPL   2022-06-30            2.750200e+10   1.285200e+10   1.122920e+11       5.810700e+10        1.298730e+11                         2.782020e+11
AAPL   2022-09-30            2.364600e+10   1.854600e+10   1.354050e+11       5.067200e+10        1.539820e+11                         3.020830e+11
--------------------------------------------------------------------------------
GOOG
         asOfDate  CashAndCashEquivalents  CashFinancial  CurrentAssets  TangibleBookValue  CurrentLiabilities  TotalLiabilitiesNetMinorityInterest
symbol                                                                                                                                             
GOOG   2021-09-30            2.371900e+10            NaN   1.841100e+11       2.203950e+11        6.178200e+10                         1.028360e+11
GOOG   2021-12-31            2.094500e+10            NaN   1.881430e+11       2.272620e+11        6.425400e+10                         1.076330e+11
GOOG   2022-03-31            2.088600e+10            NaN   1.778530e+11       2.296810e+11        6.194800e+10                         1.030920e+11
GOOG   2022-06-30            1.793600e+10            NaN   1.723710e+11       2.300930e+11        6.135400e+10                         9.976600e+10
GOOG   2022-09-30            2.198400e+10            NaN   1.661090e+11       2.226000e+11        6.597900e+10                         1.046290e+11
--------------------------------------------------------------------------------
MSFT
         asOfDate  CashAndCashEquivalents  CashFinancial  CurrentAssets  TangibleBookValue  CurrentLiabilities  TotalLiabilitiesNetMinorityInterest
symbol                                                                                                                                             
MSFT   2021-09-30            1.916500e+10   6.863000e+09   1.743260e+11       9.372900e+10        8.052800e+10                         1.834400e+11
MSFT   2021-12-31            2.060400e+10   6.255000e+09   1.741880e+11       1.016270e+11        7.751000e+10                         1.803790e+11
MSFT   2022-03-31            1.249800e+10   7.456000e+09   1.539220e+11       8.420500e+10        7.743900e+10                         1.816830e+11
MSFT   2022-06-30            1.393100e+10   8.258000e+09   1.696840e+11       8.772000e+10        9.508200e+10                         1.982980e+11
MSFT   2022-09-30            2.288400e+10   7.237000e+09   1.608120e+11       9.529900e+10        8.738900e+10                         1.862180e+11
--------------------------------------------------------------------------------
Answered By: Andrej Kesely
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.