Create a new column based on whether a string ends with certain substrings
Question:
I have a DataFrame and a list. I want to check whether the strings in a column end with any item in the list, and create a new column whose value is "Y" if they do and "N" otherwise. My DataFrame looks like this:
import pandas as pd
city = ['New York', 'Los Angeles', 'Buffalo', 'Miami', 'San Deigo', 'San Francisco']
population = ['8.5','3.9','0.25','0.45','1.4','0.87']
df = pd.DataFrame({'city':city,'population':population})
ending = ['les','sco', 'igo']
The expected result should look like this:
city           population  flag
New York       8.5         N
Los Angeles    3.9         Y
Buffalo        0.25        N
Miami          0.45        N
San Deigo      1.4         Y
San Francisco  0.87        Y
I tried to use an if statement:
if df['city'].str.endswith(tuple(ending)):
    val = 'Y'
elif df['city'].str.endswith(tuple(ending)):
    val = 'Y'
else:
    val = 'N'
I get this error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any suggestions? Thanks!
Answers:
Assuming the ending is always a three-character string, you could use:
df['flag'] = df['city'].map(lambda x: x[-3:] in ending)
which produces
            city population   flag
0       New York        8.5  False
1    Los Angeles        3.9   True
2        Buffalo       0.25  False
3          Miami       0.45  False
4      San Deigo        1.4   True
5  San Francisco       0.87   True
If you really need the outcome to be Y/N instead of True/False, you could perform another map:
# Named to_yn rather than bin so it does not shadow the built-in bin().
def to_yn(arg):
    if arg:
        return 'Y'
    return 'N'
df['flag'] = df['flag'].map(to_yn)
which results in
            city population flag
0       New York        8.5    N
1    Los Angeles        3.9    Y
2        Buffalo       0.25    N
3          Miami       0.45    N
4      San Deigo        1.4    Y
5  San Francisco       0.87    Y
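The built-in any function can help when it is applied to each city string (a Series itself has no endswith method):
df['flag'] = df['city'].map(lambda s: 'Y' if any(s.endswith(e) for e in ending) else 'N')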
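For context on the ValueError in the question: str.endswith returns a whole boolean Series, which Python cannot reduce to a single True/False inside an if statement, whereas any applied to one city string at a time gives one boolean per row. A minimal sketch reusing the question's data (the names mask and flags are illustrative only, not from the original answers):

```python
import pandas as pd

city = ['New York', 'Los Angeles', 'Buffalo', 'Miami', 'San Deigo', 'San Francisco']
df = pd.DataFrame({'city': city})
ending = ['les', 'sco', 'igo']

# str.endswith gives one boolean per row, not a single truth value,
# so `if mask:` would raise the ValueError quoted in the question.
mask = df['city'].str.endswith(tuple(ending))

# Calling any() on each individual string yields one flag per row.
flags = [
    'Y' if any(name.endswith(e) for e in ending) else 'N'
    for name in df['city']
]
df['flag'] = flags
print(df)
```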
You can use pd.Series.isin followed by pd.Series.map with a dictionary mapping. This solution tests specifically the last 3 characters. Otherwise, use @Wen's solution.
ending = ['les', 'sco', 'igo']
mapper = {True: 'Y', False: 'N'}
df['flag'] = df['city'].str[-3:].isin(ending).map(mapper)
print(df)
            city population flag
0       New York        8.5    N
1    Los Angeles        3.9    Y
2        Buffalo       0.25    N
3          Miami       0.45    N
4      San Deigo        1.4    Y
5  San Francisco       0.87    Y
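To illustrate the three-character limitation of the slicing approach, the sketch below adds a hypothetical four-character ending, 'York', which is not in the question. Slicing the last three characters misses it, while str.endswith with a tuple still matches. The names by_slice and by_endswith are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'city': ['New York', 'Los Angeles', 'Buffalo']})
ending = ['les', 'York']  # 'York' is a hypothetical 4-character ending

# Compare only the last three characters against the list.
by_slice = df['city'].str[-3:].isin(ending)

# Test each full ending, whatever its length.
by_endswith = df['city'].str.endswith(tuple(ending))

print(by_slice.tolist())     # [False, True, False]
print(by_endswith.tolist())  # [True, True, False]
```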
Using str.endswith; this does not require the strings in ending to have the same length:
df.city.str.endswith(tuple(ending)).map({True:'Y',False:'N'})
0 N
1 Y
2 N
3 N
4 Y
5 Y
Name: city, dtype: object
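The expression above returns a new Series without storing it. A short sketch of assigning it back as the flag column follows. The row with a missing city and the na=False argument are additions for illustration, not part of the question; they show one way to treat missing names as non-matches. Like the answer above, it assumes a pandas version whose str.endswith accepts a tuple.

```python
import pandas as pd

# The None row is a made-up example of a missing city name.
df = pd.DataFrame({'city': ['New York', 'Los Angeles', None, 'San Deigo']})
ending = ['les', 'sco', 'igo']

# na=False makes missing values count as "does not end with any of these".
mask = df['city'].str.endswith(tuple(ending), na=False)

# Translate the booleans to the Y/N labels and store them as a column.
df['flag'] = mask.map({True: 'Y', False: 'N'})
print(df)
```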
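import numpy as np

col = "city"
# First condition: the city ends with one of the endings; second: it does not.
conditions = [
    df[col].str.endswith(tuple(ending)),
    ~df[col].str.endswith(tuple(ending)),
]
choices = ["Y", "N"]
# A string default keeps the result's dtype consistent with the choices.
df["flag"] = np.select(conditions, choices, default="N")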
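Since there are only two outcomes here, np.where may be a simpler alternative to np.select. The following is a sketch of that variant under the question's data, not part of the original answer; the name mask is illustrative.

```python
import numpy as np
import pandas as pd

city = ['New York', 'Los Angeles', 'Buffalo', 'Miami', 'San Deigo', 'San Francisco']
population = ['8.5', '3.9', '0.25', '0.45', '1.4', '0.87']
df = pd.DataFrame({'city': city, 'population': population})
ending = ['les', 'sco', 'igo']

# One boolean per row: does the city end with any of the endings?
mask = df['city'].str.endswith(tuple(ending))

# Pick 'Y' where the mask is True and 'N' everywhere else.
df['flag'] = np.where(mask, 'Y', 'N')
print(df)
```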