How to create a new column using a loop with a condition

Question:

This is my DataFrame, and I want to create a new column using a loop with conditions.

import pandas as pd
student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                             'name':['Kim', 'Yang', 'Park'],
                             'class':['H', 'W', 'S']})


student_card['new'] = pd.Series() #1.create new column
for i, v in student_card['name'].items(): #2.set index and values
    if "Yang" in v: #3.if there's "Yang" in value
        student_card['new'].append(v) #4. append the value of name column in new column

So I tried this method and got stuck with the following error:

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

Which is not true btw (type of this column is Series)

Asked By: Se Bi Y


Answers:

append concatenates one Series onto another and returns a new Series; it does not modify the column in place. That is not what your code does: v is a string and i is the index label of that string, which is why you get the TypeError. You can check this yourself with print(type(v)). The documentation is here:
https://pandas.pydata.org/docs/reference/api/pandas.Series.append.html
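
To make the difference concrete, here is a small sketch (the Series names `names` and `extra` are made up for illustration). It uses pd.concat rather than Series.append, on the assumption that you may be on a pandas version where append is no longer available; pd.concat likewise accepts only Series and DataFrame objects:

```python
import pandas as pd

names = pd.Series(['Kim', 'Yang', 'Park'])
extra = pd.Series(['Lee'])

# .items() yields (index label, value) pairs; each value here is a plain str
first_label, first_value = next(iter(names.items()))
value_type = type(first_value)

# Concatenating two Series works and returns a NEW Series;
# `names` itself is left unchanged
combined = pd.concat([names, extra], ignore_index=True)

# Passing a bare string instead of a Series raises TypeError
try:
    pd.concat([names, 'Lee'])
    concat_failed = False
except TypeError:
    concat_failed = True
```

So the error message is about the argument (the string v), not about the column you call the method on.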

What you are looking for is setting a value at a pre-existing index of a column (or Series, as it is called in pandas). Something like this:

df.loc[index] = value

So in your code, this should do the trick:

import pandas as pd
student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                             'name':['Kim', 'Yang', 'Park'],
                             'class':['H', 'W', 'S']})


student_card['new'] = pd.Series(dtype='object') #1. create a new, empty (all-NaN) column
for i, v in student_card['name'].items(): #2. iterate over (index label, value) pairs
    if "Yang" in v: #3. if there's "Yang" in the value
        student_card.loc[i, 'new'] = v #4. set the new column at row i to the value of the name column
Answered By: ZooPanda

append concatenates two Series. What you want is to access a row. Use an indexer like iloc or iat to do so:

import pandas as pd
student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                             'name':['Kim', 'Yang', 'Park'],
                             'class':['H', 'W', 'S']})


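student_card['new'] = pd.Series(dtype='object') #1. create a new, empty (all-NaN) column
new_pos = student_card.columns.get_loc('new') # integer position of the new column
for i, v in student_card['name'].items(): #2. iterate over (index label, value) pairs
    if "Yang" in v: #3. if there's "Yang" in the value
        student_card.iat[i, new_pos] = v #4. set the value; i works as a row position only because the index is the default 0, 1, 2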

Output:

         ID  name class   new
0  20190103   Kim     H   NaN
1  20190222  Yang     W  Yang
2  20190531  Park     S   NaN
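
One caveat: iat and iloc take integer positions, while the i coming out of .items() is an index label. The two coincide here only because the DataFrame has the default 0, 1, 2 index. As a rough sketch, assuming a hypothetical string index (the labels 'a', 'b', 'c' are made up), the label-based at accessor does the same job:

```python
import pandas as pd

student_card = pd.DataFrame({'name': ['Kim', 'Yang', 'Park'],
                             'class': ['H', 'W', 'S']},
                            index=['a', 'b', 'c'])  # hypothetical non-default index

student_card['new'] = pd.Series(dtype='object')  # all-NaN object column
for label, v in student_card['name'].items():
    if "Yang" in v:
        # .at looks the row up by label, so it works with any index
        student_card.at[label, 'new'] = v
```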
Answered By: tturbo

You should really not use a loop to manipulate a pandas DataFrame; this is an anti-pattern.

Also, append has been deprecated since pandas 1.4 and was removed in pandas 2.0.

Use a vectorized approach with boolean indexing:

# select the rows for which name==Yang and add the same name in the new column
student_card.loc[student_card['name'].eq('Yang'), 'new'] = student_card['name']

Or, using where:

# mask all non matching values (name!=Yang) and copy the column
student_card['new'] = student_card['name'].where(student_card['name'].eq('Yang'))

Output:

         ID  name class   new
0  20190103   Kim     H   NaN
1  20190222  Yang     W  Yang
2  20190531  Park     S   NaN
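
Note that the loop in the question tests "Yang" in v, which is a substring check, whereas eq('Yang') is an exact match. If substring matching is what you actually want, str.contains is probably the closest vectorized counterpart. A minimal sketch, where the name 'Yang Mi' is made-up sample data to show the difference:

```python
import pandas as pd

student_card = pd.DataFrame({'ID': [20190103, 20190222, 20190531],
                             'name': ['Kim', 'Yang Mi', 'Park'],  # 'Yang Mi' is a made-up value
                             'class': ['H', 'W', 'S']})

# regex=False treats 'Yang' as a literal substring, like the `in` operator
mask = student_card['name'].str.contains('Yang', regex=False)

# keep the name where the mask is True, NaN elsewhere
student_card['new'] = student_card['name'].where(mask)
```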
Answered By: mozway