Remove leading spaces from strings in DataFrame column of lists

Question:

Assuming an existing df with a column containing lists of countries…

x = df['Countries'][0]
x

['Australia',
 ' Brazil',
 ' Canada',
 ' China']

Every country after the first has a leading space, which breaks a string comparison further down my script. I am trying to use lstrip() to strip the spaces from the strings in each list in the column.

y = []

for eachRow in df['Countries']:
    for country in eachRow:
        country = country.lstrip()
    y.append(country)
y

TypeError: 'float' object is not iterable

I’m sure there’s a simpler way.

Asked By: Michael Kessler


Answers:

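Try apply with a lambda that maps str.strip over each list. Note that str.strip removes trailing whitespace as well as leading; use str.lstrip if only the leading spaces should go. This assumes every row holds a list: a NaN row raises the same TypeError as in the question.

import pandas as pd

# sample data
df = pd.DataFrame({'Countries': [['Australia', ' Brazil', ' Canada', ' China']]})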

df['Countries'] = df['Countries'].apply(lambda x: list(map(str.strip, x)))

print(df['Countries'][0])  # -> ['Australia', 'Brazil', 'Canada', 'China']

Answered By: It_is_Chris

The error you are getting indicates that there are NaN values in your df['Countries'] column, which cannot be iterated over. You can modify your code to handle this by checking for NaN values before iterating over the list of countries in each row. Here’s an example:

y = []

for eachRow in df['Countries']:
    if isinstance(eachRow, list):  # Check if row is a list
        row_countries = []
        for country in eachRow:
            if isinstance(country, str):  # Check if country is a string
                row_countries.append(country.lstrip())
        y.append(row_countries)
    else:
        y.append(eachRow)  # Append non-list values as is


This code first checks if the value in df['Countries'] is a list, and if so, iterates over each country in the list and uses lstrip() to remove the leading spaces. If the value is not a list, it simply appends the value as is.

Note that any non-string element inside a list is silently dropped, and any non-list value in df['Countries'] (including NaN) is appended unchanged. If the column holds other container types, such as tuples, you may need to adapt the checks.
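The loop above builds a separate list y and does not write the result back to the DataFrame. As a minimal sketch of the same idea applied in place, the helper below (strip_countries is a hypothetical name, and the two-row sample data is an assumption made for illustration) can be passed to apply. Unlike the loop, it keeps non-string elements rather than dropping them, and it uses strip, which also removes trailing whitespace.

```python
import numpy as np
import pandas as pd


def strip_countries(cell):
    """Strip whitespace from each string in a list; return anything else unchanged."""
    if isinstance(cell, list):
        # Non-string elements are kept as they are instead of being dropped.
        return [c.strip() if isinstance(c, str) else c for c in cell]
    return cell  # e.g. NaN for a missing row


# Assumed sample data: one row with a list, one missing row.
df = pd.DataFrame({'Countries': [['Australia', ' Brazil', ' Canada', ' China'], np.nan]})

df['Countries'] = df['Countries'].apply(strip_countries)

print(df['Countries'][0])
```

Swap strip for lstrip inside the helper if trailing whitespace should be preserved.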

Answered By: SimplyhumanRight

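You should use the pandas .str accessor, which works with NaN. This applies when each cell holds a single string rather than a list (for cells that hold lists, .str.strip() returns NaN instead of stripping the elements). Just do:
df['column'] = df['column'].str.strip()
If the column contains NaNs it will still work: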

import numpy as np
import pandas as pd

paises = pd.DataFrame()
paises['name'] = ["1", " Argentina", "Brazil ", " Mexico "]
paises.loc[0, 'name'] = np.nan  # replace the first entry with a missing value

for i in paises.name:
    if type(i) == type(paises.iloc[0,0]):
        print("Wrong name")
    elif type(i) == type(paises.iloc[1,0]):
        print(f"Name: {i} ->> Number of characters {len(i)}")

paises['name'] = paises['name'].str.strip()
print("\n---------------------------------\n")

for i in paises.name:
    if type(i) == type(paises.iloc[0,0]):
        print("Wrong name")
    elif type(i) == type(paises.iloc[1,0]):
        print(f"Name: {i} ->> Number of characters {len(i)}")

Wrong name
Name:  Argentina ->> Number of characters 10
Name: Brazil  ->> Number of characters 7
Name:  Mexico  ->> Number of characters 8

Wrong name
Name: Argentina ->> Number of characters 9
Name: Brazil ->> Number of characters 6
Name: Mexico ->> Number of characters 6
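
The example above strips a column whose cells are single strings. For a column of lists, as in the question, one possible way to still use the .str accessor is to explode the lists first and regroup afterwards. This is only a sketch under assumptions: the three-row sample data is made up for illustration, and rows holding NaN or an empty list both come back as NaN.

```python
import numpy as np
import pandas as pd

# Assumed sample data: two rows with lists and one missing row.
df = pd.DataFrame({'Countries': [['Australia', ' Brazil', ' Canada'], np.nan, [' China']]})

# One row per country; the original row label is repeated in the index.
exploded = df['Countries'].explode()

# .str.strip() now sees plain strings; the NaN row stays NaN and is dropped.
stripped = exploded.str.strip().dropna()

# Collect the countries back into one list per original row, then restore
# the rows that had no list (they come back as NaN).
df['Countries'] = stripped.groupby(level=0).agg(list).reindex(df.index)

print(df['Countries'].tolist())
```

For small frames the apply-based approaches are usually simpler to read; the explode route is mainly useful when further .str operations are chained on the individual countries.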

Answered By: Marcelo