How to rename image files using a data frame?

Question:

I have a data frame with a file_name column and the corresponding text. I want to update file_name, and rename the matching images in the imgs folder, by prepending some text or a number. The structure of input_folder looks like this:

input_folder --|
               |--- imgs -- |-- 0.jpg
                            |-- 1.jpg
                            |-- 2.jpg
                            .........

               |--- train.jsonl

The train.jsonl file contains:

{"file_name": "0.jpg", "text": "The Fulton County Grand Jury said Friday an investigation"}
{"file_name": "1.jpg", "text": "of Atlanta's recent primary election produced \"no evidence\" that"}

What I have tried so far:

import pandas as pd

path = "input_folder/train.jsonl"
df = pd.read_json(path_or_buf=path, lines=True)
print(df.head())

# rename file_name col
new_df = df.copy()
new_df['file_name'] = df['file_name'].apply(lambda x: 'A_' + x)
# TODO: rename the files in imgs to match new_df['file_name']

What I am expecting: the file_name column updated in the resulting data frame, and the images in the imgs folder renamed to match:

out_folder --|
             |-- imgs -- |-- A_0.jpg
                         |-- A_1.jpg
                         |-- A_2.jpg
                           .........

             |---- train.jsonl

The train.jsonl file contains:

{"file_name": "A_0.jpg", "text": "The Fulton County Grand Jury said Friday an investigation"}
{"file_name": "A_1.jpg", "text": "of Atlanta's recent primary election produced \"no evidence\" that"}

Answers:

You can use subprocess to run shell commands that create the new directory and rename the images accordingly:

import pandas as pd
from subprocess import call

path = "input_folder/train.jsonl"
df = pd.read_json(path_or_buf=path, lines=True)

# make a duplicate dataframe and prefix every file name with 'A_'
ndf = df.copy(deep=True)
ndf['file_name'] = 'A_' + ndf['file_name']

# source images folder and output directory
dst = '/Users/username/.../out_folder'
src = '/Users/username/.../input_folder/imgs'

# create the output directory (-p: no error if it already exists)
call(['mkdir', '-p', dst])

# copy the original imgs folder into out_folder (result: out_folder/imgs)
call(['cp', '-a', src, dst])

# write the new dataframe to '/.../out_folder/train.jsonl'
with open(f'{dst}/train.jsonl', 'w') as f:
    f.write(ndf.to_json(orient='records', lines=True))

# rename every .jpg in the copied imgs folder by prefixing 'A_'
cmd = 'for f in *.jpg; do mv "$f" "A_$f"; done'
call(cmd, shell=True, cwd=dst + '/imgs')

This should give you an out_folder in the same directory as input_folder, containing an updated train.jsonl and an imgs directory with the renamed images. Let me know if this works for you. PS: the copy is deep (which is also the default for DataFrame.copy) because a shallow copy shares its data with the original dataframe.
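
If shell commands are not available (for example on Windows), the same steps can probably be done with the standard library alone. The following is only a sketch under assumptions: the function name prefix_dataset and its parameters are made up for illustration, and it copies only the images listed in train.jsonl rather than every .jpg in the folder.

```python
import shutil
from pathlib import Path

import pandas as pd


def prefix_dataset(input_dir, output_dir, prefix="A_"):
    """Copy listed images under prefixed names and write an updated train.jsonl.

    Hypothetical helper: the name and signature are illustrative only.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    df = pd.read_json(input_dir / "train.jsonl", lines=True)

    # create out_folder/imgs (and out_folder itself) if missing
    out_imgs = output_dir / "imgs"
    out_imgs.mkdir(parents=True, exist_ok=True)

    # new dataframe with the prefix added to every file name
    new_df = df.copy()
    new_df["file_name"] = prefix + df["file_name"]

    # copy each listed image under its new name; the originals stay untouched
    for old_name, new_name in zip(df["file_name"], new_df["file_name"]):
        shutil.copy2(input_dir / "imgs" / old_name, out_imgs / new_name)

    # one JSON object per line, same layout as the input file
    new_df.to_json(output_dir / "train.jsonl", orient="records", lines=True)
    return new_df
```

Calling prefix_dataset("input_folder", "out_folder") from the directory that contains input_folder should produce the layout shown in the question. Because the file names come from the data frame, an image that is missing from imgs raises an error instead of being silently skipped.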

Answered By: harriet