Replace all occurrences of a string in a pandas dataframe (Python)
Question:
I have a pandas dataframe with about 20 columns.
It is possible to replace all occurrences of a string (here a newline) by manually writing all column names:
df['columnname1'] = df['columnname1'].str.replace("\n","<br>")
df['columnname2'] = df['columnname2'].str.replace("\n","<br>")
df['columnname3'] = df['columnname3'].str.replace("\n","<br>")
...
df['columnname20'] = df['columnname20'].str.replace("\n","<br>")
This unfortunately does not work:
df = df.replace("\n","<br>")
Is there any other, more elegant solution?
Answers:
You can use replace and pass the strings to find/replace as dictionary keys/items:
df.replace({'\n': '<br>'}, regex=True)
For example:
>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
     a    b
0  1\n  4\n
1  2\n    5
2    3  6\n
>>> df.replace({'\n': '<br>'}, regex=True)
       a      b
0  1<br>  4<br>
1  2<br>      5
2      3  6<br>
Note that this method returns a new DataFrame instance by default (it does not modify the original), so you’ll need to either reassign the output:
df = df.replace({'\n': '<br>'}, regex=True)
or specify inplace=True:
df.replace({'\n': '<br>'}, regex=True, inplace=True)
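A likely reason the plain df.replace("\n", "<br>") from the question appears to do nothing: without regex=True, DataFrame.replace only swaps cells whose entire value equals the search string, while regex=True also matches inside longer strings. Here is a minimal sketch of the difference, using an invented two-column frame (the names whole and partial are just for illustration):

```python
import pandas as pd

# Hypothetical sample data; the last cell of "b" is nothing but a newline.
df = pd.DataFrame({"a": ["1\n", "2\n", "3"], "b": ["4\n", "5", "\n"]})

# Exact-value matching: only a cell that is exactly "\n" is replaced.
whole = df.replace("\n", "<br>")

# Regex matching: every newline inside any string cell is replaced.
partial = df.replace({"\n": "<br>"}, regex=True)

print(whole["b"].tolist())    # only the last cell changes
print(partial["b"].tolist())  # every newline becomes <br>
```

Neither call modifies df itself, which is why the result has to be reassigned or inplace=True has to be passed.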
It seems pandas has changed its API to avoid ambiguity when handling regex. Now you should use:
df.replace({'\n': '<br>'}, regex=True)
For example:
>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
     a    b
0  1\n  4\n
1  2\n    5
2    3  6\n
>>> df.replace({'\n': '<br>'}, regex=True)
       a      b
0  1<br>  4<br>
1  2<br>      5
2      3  6<br>
This collapses every run of whitespace, newlines included, into a single space, which removes all newlines and unnecessary spaces. You can change the ' ' before .join to specify a different replacement string:
df['columnname'] = [' '.join(c.split()) for c in df['columnname'].astype(str)]
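To make the effect of the split/join idiom concrete, here is a small sketch on a plain Python string (the sample text and variable names are invented). str.split() with no arguments splits on any run of whitespace, so the joiner decides what each run becomes. The splitlines() variant is not from the answer above; it is one assumed way to keep the spaces inside each line and put <br> only between lines.

```python
text = "first line\nsecond   line\n"

# Every run of whitespace (spaces and newlines) becomes one space.
collapsed = " ".join(text.split())

# An empty joiner removes all whitespace, including spaces between words.
stripped = "".join(text.split())

# Split on line boundaries only, so spacing within a line is untouched.
breaks = "<br>".join(text.splitlines())

print(collapsed)
print(stripped)
print(breaks)
```

Note that splitlines() drops a trailing newline rather than turning it into a final <br>, so this is not a drop-in equivalent of replacing "\n".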
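You can iterate over all columns and use the method str.replace:
for col in df.columns:
    df[col] = df[col].str.replace('\n', '<br>')
In pandas versions before 2.0 this method treats the pattern as a regex by default; from 2.0 onward the default is regex=False, which is enough here because the pattern is a literal string.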
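One caveat with a loop over every column is that the .str accessor raises an AttributeError on non-string columns such as integers. The sketch below guards against that with pandas.api.types.is_string_dtype and passes regex=False explicitly so the behaviour does not depend on the pandas version. The frame and its column names are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical frame with one text column and one numeric column.
df = pd.DataFrame({"text": ["a\nb", "c"], "count": [1, 2]})

for col in df.columns:
    # Only touch string columns; .str is not available on numeric ones.
    if is_string_dtype(df[col]):
        # regex=False treats "\n" as a literal substring.
        df[col] = df[col].str.replace("\n", "<br>", regex=False)

print(df["text"].tolist())
```

How is_string_dtype treats object columns that mix strings with other values differs between pandas versions, so such columns may need separate handling.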