Creating a new column in a Pandas dataframe based on a copy of another dataframe


I'm reading a bunch of CSV files into one large dataframe. The code below works, but it gives a warning and I'm not sure whether I need to do anything about it.

import pandas as pd
import glob

# List of folders

folders = [
(202206, r"\a..."),
(202207, r"\a..."),
(202208, r"\a..."),
(202209, r"\a..."),
(202210, r"\a..."),
(202211, r"\a..."),
(202212, r"\a..."),
(202301, r"\a..."),
]

columns = ['A','B','C','D']

topline = 3

# Loop through folders and append each mpf into a dataframe

list_df = []

for folder in folders:
    csv_files = glob.glob(folder[1])
    for file in csv_files:
        temp_df = pd.read_csv(file, header=topline, skip_blank_lines=True, usecols=columns)
        # tilde removes the Nans and junk at the bottom of the file
        df = temp_df[~temp_df[columns[0]].isna()]
        df['period'] = folder[0]
        # collect the cleaned frame so it can be concatenated after the loop
        list_df.append(df)
data = pd.concat(list_df, axis=0, ignore_index=True)

It gives the following warning:

<ipython-input-17-79f98aa2c09a>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

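See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy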
  df['period'] = folder[0]

Can someone explain what this means and whether I should be concerned? I read the linked page but couldn't make sense of it or see how it relates to what I'm doing.

Asked By: Zain



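What you are seeing is a warning, not an error. It appears because df is produced by filtering temp_df, and pandas cannot tell whether df is a view of temp_df or an independent copy. Assigning a new column to it is therefore flagged as a possible source of bugs later in your program, since the value might not end up where you expect.

If you want to avoid the warning, use .loc to set the value directly on the original dataframe (temp_df) and filter afterwards. That way you can be sure the value is set on the original dataframe rather than on a copy of a slice: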

temp_df.loc[~temp_df[columns[0]].isna(), 'period'] = folder[0]
df = temp_df[~temp_df[columns[0]].isna()]
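
A common alternative, sketched here on a small made-up dataframe rather than the real CSV files, is to take an explicit copy of the filtered rows before adding the column, or to build the new frame in one step with assign. The sample data and the period value are placeholders standing in for the temp_df returned by read_csv and for folder[0]; treat this as an illustration under those assumptions, not as the only fix.

```python
import pandas as pd

# Hypothetical stand-in for one file read with pd.read_csv(...)
temp_df = pd.DataFrame({"A": [1.0, 2.0, None], "B": ["x", "y", None]})
period = 202206  # placeholder for folder[0]

# Option 1: make the filtered result an explicit, independent copy,
# then add the column to that copy
df_copy = temp_df[temp_df["A"].notna()].copy()
df_copy["period"] = period

# Option 2: drop the rows with a missing "A" and add the column in one
# expression; assign returns a new dataframe and leaves temp_df untouched
df_assign = temp_df.dropna(subset=["A"]).assign(period=period)
```

Both options leave temp_df unchanged and make it explicit that df is a separate object, which is what the warning is asking you to clarify.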
Answered By: Saxtheowl