Data frame as Global Variable inside each function

Question:

I have a dataframe, df. I want to split my processing steps into separate functions so that I can reuse those functions in future programs.

# check if dataframe has duplicates
    def duplicate_check ():
        global df
        df = df.drop_duplicates(['datetime', 'tagname'])
        df.drop(['tagname'], axis=1, inplace=True)
        return df

    df = duplicate_check()

# Split my dataframe array column to individual column
    def array_split():
        global df
        date = df['datetime']
        df = (
            df['value']
            .str.split('\t', expand=True).fillna('0')
            .replace(r'\s+|\\n', ' ', regex=True)
            .apply(pd.to_numeric)
        )
        df['datetime'] = date  # Join date back to dataframe
        return df

    df = array_split()

# split dataframe df to df and df_spec 
    def remove_duplicate_spec():
        global df, df_spec
        df_spec = df.loc[df[123].isin([1])]
        df = df.loc[df[123].isin([0])]
        df_spec = df_spec.drop_duplicates(119)
        return df, df_spec


    df, df_spec = remove_duplicate_spec()

Question: Should I declare global df / df_spec inside each function?
Is this best practice? If not, how can I improve the code further?

Asked By: user_v27

||

Answers:

The best way is to pass your dataframe as an argument to each function and return the result, instead of declaring it global inside the function.

    import pandas as pd

    df = pd.DataFrame({'datetime': [0, 0, 1, 1, 2], 'tagname': [0, 0, 1, 1, 2], 'other': range(95, 100)})

    def duplicate_check(df):
        # Keep the last row of each (datetime, tagname) pair, then drop tagname
        return df.drop_duplicates(['datetime', 'tagname'], keep='last').drop(['tagname'], axis=1)

    duplicate_check(df)

DataFrame:

   datetime  tagname  other
0         0        0     95
1         0        0     96
2         1        1     97
3         1        1     98
4         2        2     99

Result of duplicate_check(df):

   datetime  other
1         0     96
3         1     98
4         2     99
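
The same pattern can be applied to the other two functions from the question. What follows is only a rough sketch under some assumptions: the sample data is made up, the sep, flag_col and dedup_col parameters are hypothetical names introduced here (the question hard-codes columns 123 and 119), and the whitespace clean-up step from the question is left out for brevity. duplicate_check is repeated from the code above so that the block runs on its own.

```python
import pandas as pd


def duplicate_check(df):
    """Keep the last row of each (datetime, tagname) pair, then drop tagname."""
    return df.drop_duplicates(['datetime', 'tagname'], keep='last').drop(['tagname'], axis=1)


def array_split(df, sep='\t'):
    """Expand the delimited 'value' string column into numeric columns."""
    values = (
        df['value']
        .str.split(sep, expand=True)
        .fillna('0')
        .apply(pd.to_numeric)
    )
    values['datetime'] = df['datetime']  # join the dates back (aligned on the index)
    return values


def remove_duplicate_spec(df, flag_col, dedup_col):
    """Split on a 0/1 flag column; de-duplicate only the flagged rows."""
    df_spec = df.loc[df[flag_col] == 1].drop_duplicates(dedup_col)
    df_main = df.loc[df[flag_col] == 0]
    return df_main, df_spec


# Hypothetical sample data: three tab-separated values per row,
# the last of which plays the role of the 0/1 flag.
raw = pd.DataFrame({
    'datetime': [0, 0, 1, 2],
    'tagname': ['a', 'a', 'a', 'a'],
    'value': ['1.5\t7\t0', '1.5\t7\t0', '2.5\t8\t1', '3.5\t8\t1'],
})

# Each step takes a dataframe and returns a new one; raw itself is not modified.
clean = raw.pipe(duplicate_check).pipe(array_split)
df_main, df_spec = remove_duplicate_spec(clean, flag_col=2, dedup_col=1)
```

Because every function returns a new dataframe, the steps can be chained with DataFrame.pipe or called one by one, and the same functions can be imported into another program without relying on a global variable named df existing there.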
Answered By: Rene