How to prepend a string from the same row to a string that starts with a number in a dataframe?

Question:

I have the following dataframe (df):

col_1      col_2 col_3 col_4
sample_001 fjsah AB    11-110
sample_002 dfshb CD    20-210
sample_003 fsvhb EF    N3-303
sample_004 dfbkk GH    Q4-444
sample_005 gnddl IJ    55-005

I want to prepend the string in col_3 to the respective string in col_4 only if the string in col_4 starts with a number, such that the df is as follows:

col_1      col_2 col_3 col_4
sample_001 fjsah AB    AB11-110
sample_002 dfshb CD    CD20-210
sample_003 fsvhb EF    N3-303
sample_004 dfbkk GH    Q4-444
sample_005 gnddl IJ    IJ55-005

I am able to identify which col_4 strings start with a number with:

for n in df['col_4']:
    if n[0].isdigit():
        print(n)

but I can’t figure out how to make the "selective merge" happen in the for loop.

Asked By: Bot75

||

Answers:

You can use Series.str[0].str.isdigit() to create a boolean Series indicating whether the first character in each row is a digit, then use that mask with .loc to modify only the matching rows:

df.loc[df['col_4'].str[0].str.isdigit(), 'col_4'] = df['col_3']+df['col_4']

# df
        col_1  col_2 col_3     col_4
0  sample_001  fjsah    AB  AB11-110
1  sample_002  dfshb    CD  CD20-210
2  sample_003  fsvhb    EF    N3-303
3  sample_004  dfbkk    GH    Q4-444
4  sample_005  gnddl    IJ  IJ55-005
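
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the sample data in the question, and the variable name starts_with_digit is only illustrative. It assumes col_4 contains no missing values, as in the sample.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col_1": ["sample_001", "sample_002", "sample_003", "sample_004", "sample_005"],
    "col_2": ["fjsah", "dfshb", "fsvhb", "dfbkk", "gnddl"],
    "col_3": ["AB", "CD", "EF", "GH", "IJ"],
    "col_4": ["11-110", "20-210", "N3-303", "Q4-444", "55-005"],
})

# Boolean mask: True where the first character of col_4 is a digit
starts_with_digit = df["col_4"].str[0].str.isdigit()

# The right-hand side is computed for every row, but .loc aligns on the
# index and only writes to the rows selected by the mask
df.loc[starts_with_digit, "col_4"] = df["col_3"] + df["col_4"]

print(df)
```

Because the assignment aligns on the index, the rows where the mask is False keep their original col_4 values.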
Answered By: ThePyGuy

Another way is to use apply with a lambda:

df.loc[:, 'col_4'] = df.apply(lambda row: row['col_3'] + row['col_4'] if row['col_4'][0].isdigit() else row['col_4'], axis=1)

Output

        col_1  col_2 col_3     col_4
0  sample_001  fjsah    AB  AB11-110
1  sample_002  dfshb    CD  CD20-210
2  sample_003  fsvhb    EF    N3-303
3  sample_004  dfbkk    GH    Q4-444
4  sample_005  gnddl    IJ  IJ55-005
Answered By: Mortz

You can make a function encapsulating that logic and apply it by row.

def f(row):
    try:
        # int() raises ValueError if the first character is not a digit
        int(row.col_4[0])
        return f'{row.col_3}{row.col_4}'
    except ValueError:
        return row.col_4

df['new_col'] = df.apply(f, axis=1)

        col_1  col_2 col_3   col_4   new_col
0  sample_001  fjsah    AB  11-110  AB11-110
1  sample_002  dfshb    CD  20-210  CD20-210
2  sample_003  fsvhb    EF  N3-303    N3-303
3  sample_004  dfbkk    GH  Q4-444    Q4-444
4  sample_005  gnddl    IJ  55-005  IJ55-005
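
If col_4 might contain empty strings or missing values (an assumption, since the sample data has neither), indexing the first character would raise an IndexError or TypeError instead of the ValueError caught above. A possible variant is sketched below. The helper name prepend_if_digit and the small test frame are hypothetical and only for illustration.

```python
import pandas as pd

def prepend_if_digit(prefix, value):
    """Return prefix + value when value starts with a digit, else value unchanged."""
    # value[:1] is '' for an empty string, and ''.isdigit() is False,
    # so empty strings and non-strings (e.g. None/NaN) pass through untouched
    if isinstance(value, str) and value[:1].isdigit():
        return f"{prefix}{value}"
    return value

# Hypothetical data including an empty string and a missing value
df = pd.DataFrame({
    "col_3": ["AB", "EF", "GH", "IJ"],
    "col_4": ["11-110", "N3-303", "", None],
})

# Pair up the two columns row by row and build the new column
df["new_col"] = [prepend_if_digit(p, v) for p, v in zip(df["col_3"], df["col_4"])]

print(df)
```

Iterating with zip over the two columns avoids building a Series for each row, which is what df.apply(..., axis=1) does.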
Answered By: alec_djinn