Write "null" if column doesn't exist with KeyError: "['Column'] not in index" in df.to_csv?
Question:
I am getting KeyError: "['CashFinancial'] not in index"
on the df.to_csv
line because ‘GOOG’ doesn’t have the CashFinancial
column. How can I have it write in null
for the CashFinancial
value for ‘GOOG’?
import pandas as pd
from yahooquery import Ticker
symbols = ['AAPL','GOOG','MSFT'] #This will be 75,000 symbols.
header = ["asOfDate","CashAndCashEquivalents","CashFinancial","CurrentAssets","TangibleBookValue","CurrentLiabilities","TotalLiabilitiesNetMinorityInterest"]
for tick in symbols:
faang = Ticker(tick)
faang.balance_sheet(frequency='q')
df = faang.balance_sheet(frequency='q')
df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
Answers:
What about :
if tick == "GOOG"
df.loc[:,"CashFinancial"] = None
To set an entire CashFinancial
column to "None" only if your "tick" was GOOG
, before writing it to csv.
The full code from the example you posted would he something like :
import pandas as pd
from yahooquery import Ticker
symbols = ['AAPL','GOOG','MSFT']
header = ["asOfDate","CashAndCashEquivalents","CashFinancial","CurrentAssets","TangibleBookValue","CurrentLiabilities","TotalLiabilitiesNetMinorityInterest"]
for tick in symbols:
faang = Ticker(tick)
faang.balance_sheet(frequency='q')
df = faang.balance_sheet(frequency='q')#,{"symbol":[1],"asOfDate":[2],"CashAndCashEquivalents":[3],"CashFinancial":[4],"CurrentAssets":[5],"TangibleBookValue":[6],"CurrentLiabilities":[7],"TotalLiabilitiesNetMinorityInterest":[8],"marketCap":[9]}
for column_name in header :
if not column_name in df.columns :
#Here, if any column is missing from the names you defined
#in your "header" variable, we add this column and set all
#it's row values to None
df.loc[:,column_name ] = None
df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
Load all dataframes into a list, then use pd.concat
(it will create NaN in missing columns):
import pandas as pd
from yahooquery import Ticker
symbols = ["AAPL", "GOOG", "MSFT"]
header = [
"asOfDate",
"CashAndCashEquivalents",
"CashFinancial",
"CurrentAssets",
"TangibleBookValue",
"CurrentLiabilities",
"TotalLiabilitiesNetMinorityInterest",
]
all_dfs = []
for tick in symbols:
faang = Ticker(tick)
df = faang.balance_sheet(frequency="q")
all_dfs.append(df)
df = pd.concat(all_dfs)
for symbol, g in df.groupby(level=0):
print(symbol)
print(g[header])
# to save to CSV:
# g[header].to_csv('filename.csv')
print("-" * 80)
Prints:
AAPL
asOfDate CashAndCashEquivalents CashFinancial CurrentAssets TangibleBookValue CurrentLiabilities TotalLiabilitiesNetMinorityInterest
symbol
AAPL 2021-09-30 3.494000e+10 1.730500e+10 1.348360e+11 6.309000e+10 1.254810e+11 2.879120e+11
AAPL 2021-12-31 3.711900e+10 1.799200e+10 1.531540e+11 7.193200e+10 1.475740e+11 3.092590e+11
AAPL 2022-03-31 2.809800e+10 1.429800e+10 1.181800e+11 6.739900e+10 1.275080e+11 2.832630e+11
AAPL 2022-06-30 2.750200e+10 1.285200e+10 1.122920e+11 5.810700e+10 1.298730e+11 2.782020e+11
AAPL 2022-09-30 2.364600e+10 1.854600e+10 1.354050e+11 5.067200e+10 1.539820e+11 3.020830e+11
--------------------------------------------------------------------------------
GOOG
asOfDate CashAndCashEquivalents CashFinancial CurrentAssets TangibleBookValue CurrentLiabilities TotalLiabilitiesNetMinorityInterest
symbol
GOOG 2021-09-30 2.371900e+10 NaN 1.841100e+11 2.203950e+11 6.178200e+10 1.028360e+11
GOOG 2021-12-31 2.094500e+10 NaN 1.881430e+11 2.272620e+11 6.425400e+10 1.076330e+11
GOOG 2022-03-31 2.088600e+10 NaN 1.778530e+11 2.296810e+11 6.194800e+10 1.030920e+11
GOOG 2022-06-30 1.793600e+10 NaN 1.723710e+11 2.300930e+11 6.135400e+10 9.976600e+10
GOOG 2022-09-30 2.198400e+10 NaN 1.661090e+11 2.226000e+11 6.597900e+10 1.046290e+11
--------------------------------------------------------------------------------
MSFT
asOfDate CashAndCashEquivalents CashFinancial CurrentAssets TangibleBookValue CurrentLiabilities TotalLiabilitiesNetMinorityInterest
symbol
MSFT 2021-09-30 1.916500e+10 6.863000e+09 1.743260e+11 9.372900e+10 8.052800e+10 1.834400e+11
MSFT 2021-12-31 2.060400e+10 6.255000e+09 1.741880e+11 1.016270e+11 7.751000e+10 1.803790e+11
MSFT 2022-03-31 1.249800e+10 7.456000e+09 1.539220e+11 8.420500e+10 7.743900e+10 1.816830e+11
MSFT 2022-06-30 1.393100e+10 8.258000e+09 1.696840e+11 8.772000e+10 9.508200e+10 1.982980e+11
MSFT 2022-09-30 2.288400e+10 7.237000e+09 1.608120e+11 9.529900e+10 8.738900e+10 1.862180e+11
--------------------------------------------------------------------------------
I am getting KeyError: "['CashFinancial'] not in index"
on the df.to_csv
line because ‘GOOG’ doesn’t have the CashFinancial
column. How can I have it write in null
for the CashFinancial
value for ‘GOOG’?
import pandas as pd
from yahooquery import Ticker
symbols = ['AAPL','GOOG','MSFT'] #This will be 75,000 symbols.
header = ["asOfDate","CashAndCashEquivalents","CashFinancial","CurrentAssets","TangibleBookValue","CurrentLiabilities","TotalLiabilitiesNetMinorityInterest"]
for tick in symbols:
faang = Ticker(tick)
faang.balance_sheet(frequency='q')
df = faang.balance_sheet(frequency='q')
df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
What about :
if tick == "GOOG"
df.loc[:,"CashFinancial"] = None
To set an entire CashFinancial
column to "None" only if your "tick" was GOOG
, before writing it to csv.
The full code from the example you posted would he something like :
import pandas as pd
from yahooquery import Ticker
symbols = ['AAPL','GOOG','MSFT']
header = ["asOfDate","CashAndCashEquivalents","CashFinancial","CurrentAssets","TangibleBookValue","CurrentLiabilities","TotalLiabilitiesNetMinorityInterest"]
for tick in symbols:
faang = Ticker(tick)
faang.balance_sheet(frequency='q')
df = faang.balance_sheet(frequency='q')#,{"symbol":[1],"asOfDate":[2],"CashAndCashEquivalents":[3],"CashFinancial":[4],"CurrentAssets":[5],"TangibleBookValue":[6],"CurrentLiabilities":[7],"TotalLiabilitiesNetMinorityInterest":[8],"marketCap":[9]}
for column_name in header :
if not column_name in df.columns :
#Here, if any column is missing from the names you defined
#in your "header" variable, we add this column and set all
#it's row values to None
df.loc[:,column_name ] = None
df.to_csv('output.csv', mode='a', index=True, header=False, columns=header)
Load all dataframes into a list, then use pd.concat
(it will create NaN in missing columns):
import pandas as pd
from yahooquery import Ticker
symbols = ["AAPL", "GOOG", "MSFT"]
header = [
"asOfDate",
"CashAndCashEquivalents",
"CashFinancial",
"CurrentAssets",
"TangibleBookValue",
"CurrentLiabilities",
"TotalLiabilitiesNetMinorityInterest",
]
all_dfs = []
for tick in symbols:
faang = Ticker(tick)
df = faang.balance_sheet(frequency="q")
all_dfs.append(df)
df = pd.concat(all_dfs)
for symbol, g in df.groupby(level=0):
print(symbol)
print(g[header])
# to save to CSV:
# g[header].to_csv('filename.csv')
print("-" * 80)
Prints:
AAPL
asOfDate CashAndCashEquivalents CashFinancial CurrentAssets TangibleBookValue CurrentLiabilities TotalLiabilitiesNetMinorityInterest
symbol
AAPL 2021-09-30 3.494000e+10 1.730500e+10 1.348360e+11 6.309000e+10 1.254810e+11 2.879120e+11
AAPL 2021-12-31 3.711900e+10 1.799200e+10 1.531540e+11 7.193200e+10 1.475740e+11 3.092590e+11
AAPL 2022-03-31 2.809800e+10 1.429800e+10 1.181800e+11 6.739900e+10 1.275080e+11 2.832630e+11
AAPL 2022-06-30 2.750200e+10 1.285200e+10 1.122920e+11 5.810700e+10 1.298730e+11 2.782020e+11
AAPL 2022-09-30 2.364600e+10 1.854600e+10 1.354050e+11 5.067200e+10 1.539820e+11 3.020830e+11
--------------------------------------------------------------------------------
GOOG
asOfDate CashAndCashEquivalents CashFinancial CurrentAssets TangibleBookValue CurrentLiabilities TotalLiabilitiesNetMinorityInterest
symbol
GOOG 2021-09-30 2.371900e+10 NaN 1.841100e+11 2.203950e+11 6.178200e+10 1.028360e+11
GOOG 2021-12-31 2.094500e+10 NaN 1.881430e+11 2.272620e+11 6.425400e+10 1.076330e+11
GOOG 2022-03-31 2.088600e+10 NaN 1.778530e+11 2.296810e+11 6.194800e+10 1.030920e+11
GOOG 2022-06-30 1.793600e+10 NaN 1.723710e+11 2.300930e+11 6.135400e+10 9.976600e+10
GOOG 2022-09-30 2.198400e+10 NaN 1.661090e+11 2.226000e+11 6.597900e+10 1.046290e+11
--------------------------------------------------------------------------------
MSFT
asOfDate CashAndCashEquivalents CashFinancial CurrentAssets TangibleBookValue CurrentLiabilities TotalLiabilitiesNetMinorityInterest
symbol
MSFT 2021-09-30 1.916500e+10 6.863000e+09 1.743260e+11 9.372900e+10 8.052800e+10 1.834400e+11
MSFT 2021-12-31 2.060400e+10 6.255000e+09 1.741880e+11 1.016270e+11 7.751000e+10 1.803790e+11
MSFT 2022-03-31 1.249800e+10 7.456000e+09 1.539220e+11 8.420500e+10 7.743900e+10 1.816830e+11
MSFT 2022-06-30 1.393100e+10 8.258000e+09 1.696840e+11 8.772000e+10 9.508200e+10 1.982980e+11
MSFT 2022-09-30 2.288400e+10 7.237000e+09 1.608120e+11 9.529900e+10 8.738900e+10 1.862180e+11
--------------------------------------------------------------------------------