Python – Having trouble assigning value into new col based on data from another cell

Question:

I have data that looks like this:

ID    File
1     this_file_whatever.ext1
2     this_whatever.ext2
3     this_is_ok_pooh.ext3

I am trying to extract the extension from File and put the matching key from a dict in a new column.

    def create_filegroups(row):
        filegroup_dict = {
            'GroupA': 'ext1',
            'GroupB': 'ext2',
            'GroupC': 'ext3'
        }
        if '.' in row['Name']:
            test = row['Name'].split(".",1)[1]
        return test

    DF = build_df()
    DF['COL3'] = DF.apply(create_filegroups(row), axis=1)
    print(DF)

I can’t figure out what I am doing wrong. The dict compare I can do when I get there, but I can’t seem to apply a function to the cells.

Asked By: DuckButts


Answers:

I believe you need pandas.Series.map after extracting the file extension from the column File.

Try this (with filegroup_dict defined outside the function so it is in scope):

df['COL3'] = (
    df['File']
        .str.extract(r'\w+\.(\w+)', expand=False)
        .map({ext: group for group, ext in filegroup_dict.items()})
)

# Output :

print(df)

   ID                     File    COL3
0   1  this_file_whatever.ext1  GroupA
1   2       this_whatever.ext2  GroupB
2   3     this_is_ok_pooh.ext3  GroupC
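
If you would rather keep the row-wise apply from the question, here is a minimal sketch of how it could be repaired. As far as I can tell, the original fails for three reasons: create_filegroups(row) calls the function right away (and row is not defined at that point) instead of passing the function itself to apply; the function reads row['Name'] although the column shown is File; and test is never assigned when the name has no dot. The DataFrame below is a stand-in for build_df(), which is not shown in the question. The ext_to_group name and the use of rsplit (last dot rather than first) are my own choices, not part of the original code.

```python
import pandas as pd

# Mapping taken from the question: group name -> extension.
filegroup_dict = {
    'GroupA': 'ext1',
    'GroupB': 'ext2',
    'GroupC': 'ext3',
}
# Invert once so an extension can be looked up directly (hypothetical helper name).
ext_to_group = {ext: group for group, ext in filegroup_dict.items()}


def create_filegroups(row):
    """Return the group for the extension in row['File'], or None if there is no match."""
    name = row['File']
    if '.' not in name:
        return None
    ext = name.rsplit('.', 1)[1]  # text after the last dot
    return ext_to_group.get(ext)


# Stand-in for build_df() from the question, which is not shown.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'File': ['this_file_whatever.ext1', 'this_whatever.ext2', 'this_is_ok_pooh.ext3'],
})

# Pass the function object itself; with axis=1, apply calls it once per row.
df['COL3'] = df.apply(create_filegroups, axis=1)
print(df)
```

The vectorized str.extract plus map version above is usually the better choice on large frames, since apply with axis=1 runs a Python function call for every row.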
Answered By: abokey