Filter and move text in another column in substring
Question:
I have the following dataset:
df = pd.DataFrame([
{'Phone': 'Fax(925) 482-1195', 'Fax': None},
{'Phone': 'Fax(406) 226-0317', 'Fax': None},
{'Phone': 'Fax+1 650-383-6305', 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
{'Phone': None, 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1176'}]
)
Which should look like:
What I’m trying to do is: for every row that I see "Fax", I want to truncate it and transfer this record to the column "Fax".
At first, I was trying to query only the matching with this filtering:
df[df['Phone'].str.contains("Fax") == True, "Fax"] = df[df['Phone'].str.contains("Fax") == True]
But it does not works, with the error: "TypeError: unhashable type: ‘Series’".
Any ideas?
Answers:
You have a bunch of rows, that is, a list of dicts.
Simplest approach would be to massage each row
prior to adding it to the dataframe.
rows = [ ... ]
def get_contacts(rows):
for row in rows:
phone, fax = row['Phone'], row['Fax']
if 'Fax' in phone:
phone, fax = None, phone
yield phone, fax
df = pd.DataFrame(get_contacts(rows))
You can force str
instead of None
with a filter like this:
...
yield clean(phone), clean(fax)
...
def clean(s, default=''):
if s is None:
return default
return s
If you really prefer to stick to using Pandas,
you might want to
- identify a mask of rows where
df.Phone
contains ‘Fax’, then
- copy that subset into
df['Fax']
, then
- blank out selected
df['Phone']
entries.
You can verify / debug each step by itself — get (1) right
before moving on to attempt (2).
If you choose to go this route, please post your final solution.
Another option using .assign and np.where
import numpy as np
import pandas as pd
condition = df["Phone"].str.contains("fax", case=False)
df = df.assign(
Fax=np.where(condition, df["Phone"], df["Fax"]),
Phone=np.where(condition, "", df["Phone"])
).fillna("")
print(df)
Phone Fax
0 Fax(925) 482-1195
1 Fax(406) 226-0317
2 Fax+1 650-383-6305
3 Phone(334) 585-1171 Fax(334) 585-1182
4
5 Phone(334) 585-1171 Fax(334) 585-1176
I have the following dataset:
df = pd.DataFrame([
{'Phone': 'Fax(925) 482-1195', 'Fax': None},
{'Phone': 'Fax(406) 226-0317', 'Fax': None},
{'Phone': 'Fax+1 650-383-6305', 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
{'Phone': None, 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1176'}]
)
Which should look like:
What I’m trying to do is: for every row that I see "Fax", I want to truncate it and transfer this record to the column "Fax".
At first, I was trying to query only the matching with this filtering:
df[df['Phone'].str.contains("Fax") == True, "Fax"] = df[df['Phone'].str.contains("Fax") == True]
But it does not works, with the error: "TypeError: unhashable type: ‘Series’".
Any ideas?
You have a bunch of rows, that is, a list of dicts.
Simplest approach would be to massage each row
prior to adding it to the dataframe.
rows = [ ... ]
def get_contacts(rows):
for row in rows:
phone, fax = row['Phone'], row['Fax']
if 'Fax' in phone:
phone, fax = None, phone
yield phone, fax
df = pd.DataFrame(get_contacts(rows))
You can force str
instead of None
with a filter like this:
...
yield clean(phone), clean(fax)
...
def clean(s, default=''):
if s is None:
return default
return s
If you really prefer to stick to using Pandas,
you might want to
- identify a mask of rows where
df.Phone
contains ‘Fax’, then - copy that subset into
df['Fax']
, then - blank out selected
df['Phone']
entries.
You can verify / debug each step by itself — get (1) right
before moving on to attempt (2).
If you choose to go this route, please post your final solution.
Another option using .assign and np.where
import numpy as np
import pandas as pd
condition = df["Phone"].str.contains("fax", case=False)
df = df.assign(
Fax=np.where(condition, df["Phone"], df["Fax"]),
Phone=np.where(condition, "", df["Phone"])
).fillna("")
print(df)
Phone Fax
0 Fax(925) 482-1195
1 Fax(406) 226-0317
2 Fax+1 650-383-6305
3 Phone(334) 585-1171 Fax(334) 585-1182
4
5 Phone(334) 585-1171 Fax(334) 585-1176