Remove leading spaces from strings in DataFrame column of lists

Question

Assuming an existing df with a column containing lists of countries…

x = df['Countries'][0]
x

['Australia',
 ' Brazil',
 ' Canada',
 ' China']

Other than the first country, each country has a leading space, which breaks a string comparison further down my script. I am trying to use lstrip() to strip the leading spaces from every list in the column.

y = []

for eachRow in df['Countries']:
    for country in eachRow:
        country = country.lstrip()
    y.append(country)
y

TypeError: 'float' object is not iterable

I’m sure there’s a simpler way.

Asked By: Michael Kessler

Answer 1

Try using apply with a lambda that maps str.strip over each list:

import pandas as pd

# sample data
df = pd.DataFrame({'Countries': [['Australia', ' Brazil', ' Canada', ' China']]})

# strip whitespace from every string in each row's list
df['Countries'] = df['Countries'].apply(lambda x: list(map(str.strip, x)))

print(df['Countries'][0]) # -> ['Australia', 'Brazil', 'Canada', 'China']
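
Note that the one-liner above would raise the same TypeError on a row holding NaN, because a float cannot be passed to map. A minimal sketch of a NaN-tolerant variant follows; the helper name strip_items and the sample data are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def strip_items(value):
    """Strip whitespace from each string in a list; return anything else unchanged."""
    if isinstance(value, list):
        return [item.strip() if isinstance(item, str) else item for item in value]
    return value  # e.g. NaN for a missing row


# Hypothetical sample data: one list row and one missing row
df = pd.DataFrame({'Countries': [['Australia', ' Brazil', ' Canada'], np.nan]})
df['Countries'] = df['Countries'].apply(strip_items)

print(df['Countries'][0])
```

Rows that are not lists pass through untouched, so missing values stay missing rather than raising an error.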

Answered By: It_is_Chris

Answer 2

The error you are getting indicates that there are NaN values in your df['Countries'] column, which cannot be iterated over. You can modify your code to handle this by checking for NaN values before iterating over the list of countries in each row. Here’s an example:

y = []

for eachRow in df['Countries']:
    if isinstance(eachRow, list):  # Check if row is a list
        row_countries = []
        for country in eachRow:
            if isinstance(country, str):  # Check if country is a string
                row_countries.append(country.lstrip())
        y.append(row_countries)
    else:
        y.append(eachRow)  # Append non-list values as is

This code first checks if the value in df['Countries'] is a list, and if so, iterates over each country in the list and uses lstrip() to remove the leading spaces. If the value is not a list, it simply appends the value as is.

Note that this code assumes no other types of values in the df['Countries'] column besides lists and NaN values. If there are, you may need to change the code.
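
To find out which types the column actually contains before deciding how to handle them, something like the following sketch may help. The sample data, including the stray bare string, is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing a list, NaN, and a bare string
df = pd.DataFrame({'Countries': [['Australia', ' Brazil'], np.nan, 'Chile']})

# Count how many rows hold each Python type
type_counts = df['Countries'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Select the rows that are not lists for closer inspection
not_list = df[~df['Countries'].map(lambda v: isinstance(v, list))]
print(not_list)
```

If anything other than list and float (NaN) shows up in the counts, the loop above will need an extra branch for it.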

Answered By: SimplyhumanRight

Answer 3

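If the column holds plain strings rather than lists, you can use the pandas .str accessor, which handles NaN. Just do:
df['column'] = df['column'].str.strip()
Be aware that on a column of lists, as in the question, .str.strip() returns NaN for every list, so the lists have to be handled element by element. On a column of strings containing NaN values it still works: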

import numpy as np
import pandas as pd

paises = pd.DataFrame()
paises['name'] = ["1", " Argentina", "Brazil ", " Mexico "]
paises.loc[0, 'name'] = np.nan  # replace the first entry with a missing value

# before stripping: row 0 is NaN (float), row 1 is a str
for i in paises.name:
    if type(i) == type(paises.iloc[0, 0]):
        print("Wrong name")
    elif type(i) == type(paises.iloc[1, 0]):
        print(f"Name: {i} ->> Number of characters {len(i)}")

paises['name'] = paises['name'].str.strip()
print("\n---------------------------------\n")

# after stripping: NaN is left alone, strings lose surrounding whitespace
for i in paises.name:
    if type(i) == type(paises.iloc[0, 0]):
        print("Wrong name")
    elif type(i) == type(paises.iloc[1, 0]):
        print(f"Name: {i} ->> Number of characters {len(i)}")

Wrong name
Name:  Argentina ->> Number of characters 10
Name: Brazil  ->> Number of characters 7
Name:  Mexico  ->> Number of characters 8

---------------------------------

Wrong name
Name: Argentina ->> Number of characters 9
Name: Brazil ->> Number of characters 6
Name: Mexico ->> Number of characters 6
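
Since the .str accessor works on strings, one way to apply it to a column of lists is to explode the lists into one string per row first and regroup them afterwards. This is only a sketch on hypothetical sample data, and it assumes every row is a non-empty list of strings; rows that are NaN or empty lists would not come back unchanged.

```python
import pandas as pd

# Hypothetical sample data: every row is a non-empty list of strings
df = pd.DataFrame({'Countries': [['Australia', ' Brazil'], [' Canada']]})

stripped = (
    df['Countries']
    .explode()          # one country per row, original index repeated
    .str.strip()        # vectorized strip on plain strings
    .groupby(level=0)   # regroup by the original row index
    .agg(list)          # collect each group back into a list
)
df['Countries'] = stripped

print(df['Countries'].tolist())
```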

Answered By: Marcelo
