Append row entries of specific column to empty dataframe based on multiple conditions
Question:
I have a dictionary of dataframes called dictoftickersdf.
I loop over the dictionary, so assume each frame is called tickersdf. Two of the frames look like this:
   Country  Type  Ticker
1  US       Pub   AAPL
2  US       Priv  etc
3  GER      Pub   etc
4  HK       Pub   etc

and

   Country  Type  Ticker
1  US       Pub   GE
2  US       Priv  etc
3  GER      Pub   etc
4  HK       Pub   etc
5  US       Pub   MSFT
etc..
I have an empty dataframe, df = pd.DataFrame()
I am running a for loop over various tickersdf which have different companies.
I only want to append the entries in the Ticker column from rows that meet certain conditions (Type == Pub and Country == US).
So I want the end df to look like this
AAPL
GE
MSFT
...
So far I have this,
df = pd.DataFrame()
for subdir, dirs, files in os.walk(r"/Users/xxx/Documents/"):
    for file in files:
        filepath = os.path.join(subdir, file)
        print(filepath)
        dictoftickersdf = pd.read_excel(filepath, sheet_name=None)  # multiple sheets per file
        for key, tickersdf in dictoftickersdf.items():
            df = df.append(tickersdf.loc[(tickersdf['Country']=='US') & (tickersdf['Type']=='Pub'), 'Ticker'])
But the dataframe df comes up empty. What am I doing wrong?
Update:
I added an assignment (df = df.append(...)) at the end and it's not empty anymore, but it's still not working right. Now df looks like this:
        1     1    5     ...
Ticker  AAPL  NaN  NaN   ...
Ticker  NaN   GE   MSFT  ...
Ticker  ....................
Answers:
Looks like ‘Public’ is shortened to just ‘Pub’ in your dataframe. Try shortening that part to see if that fixes it.
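A quick way to check which labels the Type column actually holds is to list its distinct values before writing the condition. This is only a sketch: the frame below is rebuilt by hand from the sample in the question, and the 'Public' comparison is an assumed example of a label that does not match.

```python
import pandas as pd

# Hand-built copy of the first sample frame from the question.
tickersdf = pd.DataFrame(
    {
        "Country": ["US", "US", "GER", "HK"],
        "Type": ["Pub", "Priv", "Pub", "Pub"],
        "Ticker": ["AAPL", "etc", "etc", "etc"],
    },
    index=[1, 2, 3, 4],
)

# Distinct labels in the column, in order of first appearance.
print(tickersdf["Type"].unique())

# A condition written with a label that is not in the column matches no rows.
mask_public = tickersdf["Type"] == "Public"
print(mask_public.sum())

# The shortened label matches three rows in this frame.
mask_pub = tickersdf["Type"] == "Pub"
print(mask_pub.sum())
```

If the first print shows 'Pub' rather than 'Public', the filter has to compare against 'Pub'.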
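I had to start from an empty pd.Series instead of a pd.DataFrame, because selecting a single column with .loc returns a Series each time. Appending a Series to a DataFrame adds it as a new row, which is where the wide table full of NaN came from. Note that append was removed in pandas 2.0, so this version uses pd.concat instead:
import os
import pandas as pd

df = pd.Series(dtype="object")  # empty Series that collects the tickers
for subdir, dirs, files in os.walk(r"/Users/xxx/Documents/"):
    for file in files:
        filepath = os.path.join(subdir, file)
        print(filepath)
        # sheet_name=None returns a dict of {sheet name: DataFrame}
        dictoftickersdf = pd.read_excel(filepath, sheet_name=None)
        for key, tickersdf in dictoftickersdf.items():
            mask = (tickersdf['Country'] == 'US') & (tickersdf['Type'] == 'Pub')
            df = pd.concat([df, tickersdf.loc[mask, 'Ticker']])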
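For anyone who wants to try this without the Excel files, here is a minimal sketch on in-memory frames. The sheets dict is a hypothetical stand-in for what pd.read_excel(filepath, sheet_name=None) returns, filled with the sample rows from the question; the names pieces, tickers and as_row are made up for illustration. It gathers the filtered Series in a list and concatenates once at the end, which is one common alternative to growing the result inside the loop.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by
# pd.read_excel(filepath, sheet_name=None): {sheet name: DataFrame}.
sheets = {
    "Sheet1": pd.DataFrame(
        {
            "Country": ["US", "US", "GER", "HK"],
            "Type": ["Pub", "Priv", "Pub", "Pub"],
            "Ticker": ["AAPL", "etc", "etc", "etc"],
        },
        index=[1, 2, 3, 4],
    ),
    "Sheet2": pd.DataFrame(
        {
            "Country": ["US", "US", "GER", "HK", "US"],
            "Type": ["Pub", "Priv", "Pub", "Pub", "Pub"],
            "Ticker": ["GE", "etc", "etc", "etc", "MSFT"],
        },
        index=[1, 2, 3, 4, 5],
    ),
}

# Collect one filtered Series per sheet.
pieces = []
for name, tickersdf in sheets.items():
    mask = (tickersdf["Country"] == "US") & (tickersdf["Type"] == "Pub")
    pieces.append(tickersdf.loc[mask, "Ticker"])

# Concatenate once; ignore_index=True renumbers the result 0, 1, 2.
tickers = pd.concat(pieces, ignore_index=True)
print(tickers.tolist())  # ['AAPL', 'GE', 'MSFT']

# Why the DataFrame version went wide: a Series laid out as a row uses its
# index labels (here 1 and 5) as column names and its name as the row label.
as_row = pieces[1].to_frame().T
print(as_row)
```

The last two lines only illustrate the shape seen in the question's update: rows labelled Ticker and columns named after the original row numbers.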