Append row entries of specific column to empty dataframe based on multiple conditions

Question

I have a dictionary of dataframes called dictoftickersdf.

I loop over the dictionary with a for loop, so assume each frame is called tickersdf. The frames look like this:

    Country Type  Ticker
1   US      Pub   AAPL
2   US      Priv  etc
3   GER     Pub   etc
4   HK      Pub   etc

and

    Country Type  Ticker
1   US      Pub   GE
2   US      Priv  etc
3   GER     Pub   etc
4   HK      Pub   etc
5   US      Pub   MSFT

etc.

I have an empty dataframe, df = pd.DataFrame().

I am running a for loop over the various tickersdf frames, which contain different companies.

I want to append only the entries in the Ticker column that meet certain conditions (Type == Pub and Country == US).

So I want the final df to look like this:

AAPL
GE
MSFT
...

So far I have this:

import os
import pandas as pd

df = pd.DataFrame()

for subdir, dirs, files in os.walk(r"/Users/xxx/Documents/"):
    for file in files:
        filepath = os.path.join(subdir, file)
        print(filepath)

        dictoftickersdf = pd.read_excel(filepath,sheet_name=None) #multiple sheets per file

        for key, tickersdf in dictoftickersdf.items():
            df = df.append(tickersdf.loc[(tickersdf['Country']=='US') & (tickersdf['Type']=='Pub'),'Ticker'])

But the dataframe df comes up empty. What am I doing wrong?

Update:

I added an assignment at the end and it’s not empty anymore, but it’s still not working right. Now the df looks like this:

          1     1     5    ...
Ticker    AAPL  NaN   NaN  ...
Ticker    NaN   GE    MSFT ...
Ticker    ....................

Asked By: anarchy


Answer 1

Looks like ‘Public’ is shortened to just ‘Pub’ in your dataframe. Try shortening that part to see if that fixes it.
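
One way to check which labels actually occur is to list the distinct values in the filter columns. This is only a minimal sketch: the frame below is a made-up stand-in for one tickersdf, and every ticker other than AAPL is a placeholder.

```python
import pandas as pd

# Hypothetical stand-in for one sheet (tickersdf) from the question;
# "P1".."P3" are placeholder tickers, not real data.
tickersdf = pd.DataFrame({
    "Country": ["US", "US", "GER", "HK"],
    "Type": ["Pub", "Priv", "Pub", "Pub"],
    "Ticker": ["AAPL", "P1", "P2", "P3"],
})

# Distinct labels in each filter column, sorted for readability.
type_labels = sorted(tickersdf["Type"].unique().tolist())
country_labels = sorted(tickersdf["Country"].unique().tolist())

print(type_labels)     # ['Priv', 'Pub']
print(country_labels)  # ['GER', 'HK', 'US']
```

If the strings in the comparison do not match these labels exactly (including case and stray whitespace), the boolean mask is all False and nothing is selected.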

Answered By: BrutalPeanut

Answer 2

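I had to use pd.Series instead of pd.DataFrame because each filtered selection of the Ticker column is a single Series.

import os
import pandas as pd

df = pd.Series(dtype=object)  # explicit dtype for the empty Series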

for subdir, dirs, files in os.walk(r"/Users/xxx/Documents/"):
    for file in files:
        filepath = os.path.join(subdir, file)
        print(filepath)

        dictoftickersdf = pd.read_excel(filepath,sheet_name=None) #multiple sheets per file


        for key, tickersdf in dictoftickersdf.items():
            df = df.append(tickersdf.loc[(tickersdf['Country']=='US') & (tickersdf['Type']=='Pub'),'Ticker'])
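
Note that Series.append and DataFrame.append were removed in pandas 2.0, so the loop above only runs on older versions. A possible equivalent for newer versions is to collect the filtered pieces in a list and combine them once with pd.concat. The sketch below assumes in-memory frames instead of the Excel files: the dictionary is made up from the tables in the question, the sheet names are hypothetical, and tickers other than AAPL, GE and MSFT are placeholders.

```python
import pandas as pd

# Made-up stand-in for the dict returned by pd.read_excel(..., sheet_name=None).
# Sheet names and the "P1".."P6" tickers are placeholders.
dictoftickersdf = {
    "sheet1": pd.DataFrame({
        "Country": ["US", "US", "GER", "HK"],
        "Type": ["Pub", "Priv", "Pub", "Pub"],
        "Ticker": ["AAPL", "P1", "P2", "P3"],
    }),
    "sheet2": pd.DataFrame({
        "Country": ["US", "US", "GER", "HK", "US"],
        "Type": ["Pub", "Priv", "Pub", "Pub", "Pub"],
        "Ticker": ["GE", "P4", "P5", "P6", "MSFT"],
    }),
}

pieces = []
for key, tickersdf in dictoftickersdf.items():
    # Keep only the tickers of public US companies from this sheet.
    mask = (tickersdf["Country"] == "US") & (tickersdf["Type"] == "Pub")
    pieces.append(tickersdf.loc[mask, "Ticker"])

# Combine all pieces into one Series with a fresh 0..n-1 index.
tickers = pd.concat(pieces, ignore_index=True)
print(tickers.tolist())  # ['AAPL', 'GE', 'MSFT']
```

Concatenating once at the end also avoids copying the growing result on every iteration.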

Answered By: anarchy
