Remove leading spaces from strings in DataFrame column of lists
Question:
Assuming an existing df with a column containing lists of countries…
x = df['Countries'][0]
x
['Australia',
 ' Brazil',
 ' Canada',
 ' China']
Every country after the first has a leading space, which breaks a string comparison further down my script. I am trying to use lstrip() to strip the spaces from each list in the column.
y = []
for eachRow in df['Countries']:
    for country in eachRow:
        country = country.lstrip()
        y.append(country)
y
TypeError: 'float' object is not iterable
I’m sure there’s a simpler way.
Answers:
Try using apply with a lambda that maps str.strip over each list:
import pandas as pd

# sample data
df = pd.DataFrame({'Countries': [['Australia', ' Brazil', ' Canada', ' China']]})
# strip leading and trailing whitespace from every string in each row's list
df['Countries'] = df['Countries'].apply(lambda x: list(map(str.strip, x)))
print(df['Countries'][0])  # -> ['Australia', 'Brazil', 'Canada', 'China']
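The apply call above still raises the asker's TypeError if any row is NaN, because map cannot iterate over a float. A minimal sketch of a NaN-tolerant variant follows. The helper name strip_items and the sample row containing np.nan are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def strip_items(value):
    """Strip whitespace from each string in a list; return anything else unchanged."""
    if isinstance(value, list):
        return [item.strip() if isinstance(item, str) else item for item in value]
    return value  # e.g. NaN for a missing row


# Sample data (assumed): one list row and one missing row
df = pd.DataFrame({'Countries': [['Australia', ' Brazil', ' Canada', ' China'], np.nan]})
df['Countries'] = df['Countries'].apply(strip_items)
print(df['Countries'][0])
```

Unlike the one-line lambda, this keeps the missing row as NaN instead of failing on it.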
The error you are getting indicates that there are NaN values in your df['Countries'] column. NaN is a float, which cannot be iterated over, hence the message. You can modify your code to handle this by checking that each row is a list before iterating over its countries. Here’s an example:
y = []
for eachRow in df['Countries']:
    if isinstance(eachRow, list):  # Check if row is a list
        row_countries = []
        for country in eachRow:
            if isinstance(country, str):  # Check if country is a string
                row_countries.append(country.lstrip())
        y.append(row_countries)
    else:
        y.append(eachRow)  # Append non-list values as is
This code first checks if the value in df['Countries'] is a list, and if so, iterates over each country in the list and uses lstrip() to remove the leading spaces. Items inside a list that are not strings are silently dropped. If the value is not a list, it simply appends the value as is.
Note that this code assumes no other types of values in the df['Countries'] column besides lists and NaN values. If there are, you may need to change the code.
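Before relying on that assumption, it may help to check what the column actually contains. The following is a rough sketch on made-up sample data; the variable name is_list is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Sample data (assumed): two list rows and one missing row
df = pd.DataFrame({'Countries': [['Australia', ' Brazil'], np.nan, [' Canada']]})

# True where the cell holds a list, False otherwise (NaN, plain strings, ...)
is_list = df['Countries'].apply(lambda value: isinstance(value, list))

# Count the Python type of each cell to see what else is in the column
print(df['Countries'].apply(lambda value: type(value).__name__).value_counts())

# Show the rows that would have raised the TypeError in the original loop
print(df[~is_list])
```

If the type counts show anything besides list and float, the loop above would pass those values through untouched, so they need their own handling.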
For a column whose cells are plain strings, use the pandas .str accessor. It skips NaN values. Just do:
df['column'] = df['column'].str.strip()
If the column contains NaNs it will still work:
import numpy as np
import pandas as pd

paises = pd.DataFrame()
paises['name'] = ["1", " Argentina", "Brazil ", " Mexico "]
paises.loc[0, 'name'] = np.nan  # replace the first name with NaN

# Before stripping: NaN is a float, every other value is a str
for i in paises.name:
    if type(i) == type(paises.iloc[0, 0]):
        print("Wrong name")
    elif type(i) == type(paises.iloc[1, 0]):
        print(f"Name: {i} ->> Number of characters {len(i)}")

paises['name'] = paises['name'].str.strip()
print("\n---------------------------------\n")

# After stripping: the NaN is untouched and the lengths shrink
for i in paises.name:
    if type(i) == type(paises.iloc[0, 0]):
        print("Wrong name")
    elif type(i) == type(paises.iloc[1, 0]):
        print(f"Name: {i} ->> Number of characters {len(i)}")
Wrong name
Name: Argentina ->> Number of characters 10
Name: Brazil ->> Number of characters 7
Name: Mexico ->> Number of characters 8
Wrong name
Name: Argentina ->> Number of characters 9
Name: Brazil ->> Number of characters 6
Name: Mexico ->> Number of characters 6
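The example above strips a column whose cells are single strings. When the cells are lists, as in the question, .str.strip() finds no string to strip and returns NaN for those cells. One possible way to still use the .str accessor is to explode the lists first and regroup afterwards. This is a sketch on assumed sample data, not something the answer itself proposes.

```python
import numpy as np
import pandas as pd

# Sample data (assumed): two list rows and one missing row
df = pd.DataFrame({'Countries': [['Australia', ' Brazil'], np.nan, [' Canada', ' China']]})

# On list cells the .str accessor has nothing to strip and yields NaN
print(df['Countries'].str.strip().isna().all())

# One row per country, strip with .str, then regroup by the original index
exploded = df['Countries'].explode()
stripped = exploded.str.strip()
regrouped = stripped.dropna().groupby(level=0).agg(list)

# Rows that were NaN are absent from `regrouped`; reindex restores them as NaN
df['Countries'] = regrouped.reindex(df.index)
print(df['Countries'].tolist())
```

One caveat: explode turns an empty list into NaN, so empty-list rows come back as NaN rather than as empty lists with this approach.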