How to rename image files using a data frame?
Question:
I have a data frame with a file_name column and the corresponding text. I want to update file_name, and rename the matching image in the imgs folder, by prepending some text or a number. The structure of input_folder looks like:
input_folder --|
               |--- imgs -- |-- 0.jpg
               |            |-- 1.jpg
               |            |-- 2.jpg
               |            .........
               |--- train.jsonl
The train.jsonl file contains:
{"file_name": "0.jpg", "text": "The Fulton County Grand Jury said Friday an investigation"}
{"file_name": "1.jpg", "text": "of Atlanta's recent primary election produced \"no evidence\" that"}
import pandas as pd
path = "input_folder/train.jsonl"
df = pd.read_json(path_or_buf=path, lines=True)
print(df.head())
# rename file_name col
new_df = df.copy()
new_df['file_name'] = df['file_name'].apply(lambda x: 'A_' + x)
# def rename(df['file_name'],new_df['file_name'])
What I am expecting: the file_name column updated in the resulting data frame, and the images in the imgs folder renamed to match:
out_folder --|
             |--- imgs -- |-- A_0.jpg
             |            |-- A_1.jpg
             |            |-- A_2.jpg
             |            .........
             |--- train.jsonl
The train.jsonl file contains:
{"file_name": "A_0.jpg", "text": "The Fulton County Grand Jury said Friday an investigation"}
{"file_name": "A_1.jpg", "text": "of Atlanta's recent primary election produced \"no evidence\" that"}
Answers:
You can use subprocess to run shell commands that create the new directory and rename the images accordingly:
import pandas as pd
from subprocess import call

path = "input_folder/train.jsonl"
df = pd.read_json(path_or_buf=path, lines=True)

# make a duplicate dataframe and prefix every file name with 'A_'
ndf = df.copy(deep=True)
ndf['file_name'] = 'A_' + ndf['file_name']

# output directory and original imgs folder
dst = '/Users/username/.../out_folder'
src = '/Users/username/.../input_folder/imgs'

# create the output directory (-p: no error if it already exists)
call(['mkdir', '-p', dst])

# copy the original imgs folder into out_folder
call(['cp', '-a', src, dst])

# write the new dataframe to '/.../out_folder/train.jsonl'
with open(f'{dst}/train.jsonl', 'w') as f:
    f.write(ndf.to_json(orient='records', lines=True))

# rename every .jpg in out_folder/imgs by prepending 'A_'
cmd = 'for f in *.jpg; do mv "$f" "A_$f"; done'
call(cmd, shell=True, cwd=dst + '/imgs')
This should give you an out_folder in the same directory as input_folder, with an updated train.jsonl and an imgs directory (containing the images with updated names) inside. Let me know if this works for you. PS: use a deep copy (deep=True, which is also the default for DataFrame.copy), because a shallow copy shares its data with the original dataframe.
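If you would rather not depend on shell commands (mkdir, cp and the for loop above assume a Unix-like shell), the copy-and-rename step can also be sketched with the standard library only. This is a minimal sketch, not the method from the answer above: copy_with_prefix is a hypothetical helper name, and the 'A_' default prefix is taken from the expected output in the question. It only touches the files that are actually listed, rather than every *.jpg in the folder.

```python
import shutil
from pathlib import Path


def copy_with_prefix(file_names, src_dir, dst_dir, prefix="A_"):
    """Copy each named file from src_dir into dst_dir under a prefixed name.

    Hypothetical helper: returns the new names in the same order as
    file_names, so the result can be assigned back to a data frame column.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    # create the destination folder (and its parents) if it does not exist
    dst_dir.mkdir(parents=True, exist_ok=True)
    new_names = []
    for name in file_names:
        new_name = f"{prefix}{name}"
        # copy2 also copies file metadata; a missing source file
        # raises FileNotFoundError instead of being skipped silently
        shutil.copy2(src_dir / name, dst_dir / new_name)
        new_names.append(new_name)
    return new_names
```

With the data frame from the answer, this could be used as ndf['file_name'] = copy_with_prefix(df['file_name'], 'input_folder/imgs', 'out_folder/imgs'), followed by ndf.to_json('out_folder/train.jsonl', orient='records', lines=True). The originals in input_folder are left untouched, and the names written to train.jsonl are exactly the names of the copied files.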