Filter rows by substring and move the text to another column

Question:

I have the following dataset:

df = pd.DataFrame([
{'Phone': 'Fax(925) 482-1195', 'Fax': None},
{'Phone': 'Fax(406) 226-0317', 'Fax': None},
{'Phone': 'Fax+1 650-383-6305', 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
{'Phone': None, 'Fax': None},
{'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1176'}]
)


What I'm trying to do is: for every row where the "Phone" column contains "Fax", I want to remove that value from "Phone" and move it to the "Fax" column.

At first, I tried to select only the matching rows with this filter:

df[df['Phone'].str.contains("Fax") == True, "Fax"] = df[df['Phone'].str.contains("Fax") == True]

But it does not work; it fails with the error: "TypeError: unhashable type: 'Series'".

Any ideas?

Asked By: bellotto


Answers:

You have a bunch of rows, that is, a list of dicts.
The simplest approach would be to massage each row
prior to adding it to the dataframe.

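import pandas as pd

rows = [ ... ]  # the list of dicts from the question

def get_contacts(rows):
    for row in rows:
        phone, fax = row['Phone'], row['Fax']
        # Phone may be None, so check for that before the substring test
        if phone is not None and 'Fax' in phone:
            phone, fax = None, phone  # move the value to the Fax column
        yield phone, fax

# Tuples carry no column names, so pass them explicitly
df = pd.DataFrame(get_contacts(rows), columns=['Phone', 'Fax'])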

You can force str instead of None with a filter like this:

        ...
        yield clean(phone), clean(fax)
        ...

def clean(s, default=''):
    if s is None:
        return default
    return s
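For reference, here is a minimal runnable sketch that puts the two pieces together. The three sample rows are taken from the question; the `columns=` argument and the `None` guard are additions for illustration, not part of the snippets above.

```python
import pandas as pd

# A small subset of the rows from the question
rows = [
    {'Phone': 'Fax(925) 482-1195', 'Fax': None},
    {'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
    {'Phone': None, 'Fax': None},
]

def clean(s, default=''):
    # Replace a missing value with a default string
    return default if s is None else s

def get_contacts(rows):
    for row in rows:
        phone, fax = row['Phone'], row['Fax']
        # Guard against None before the substring test
        if phone is not None and 'Fax' in phone:
            phone, fax = None, phone
        yield clean(phone), clean(fax)

df = pd.DataFrame(list(get_contacts(rows)), columns=['Phone', 'Fax'])
print(df)
```

With this version every cell is a string, so no separate fillna step should be needed afterwards.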

If you really prefer to stick to using Pandas,
you might want to

  1. identify a mask of rows where df.Phone contains 'Fax', then
  2. copy that subset into df['Fax'], then
  3. blank out selected df['Phone'] entries.

You can verify / debug each step by itself — get (1) right
before moving on to attempt (2).
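The three steps could be sketched roughly as follows. This is one possible reading of them, not a definitive solution: the sample rows are a subset of the question's data, and `na=False` is an added assumption so that missing Phone values count as "no match". Note that the assignment goes through `df.loc[mask, 'Fax']` rather than `df[mask, 'Fax']`; the latter passes a tuple containing a Series as a column key, which is most likely where the "unhashable type" error in the question comes from.

```python
import pandas as pd

df = pd.DataFrame([
    {'Phone': 'Fax(925) 482-1195', 'Fax': None},
    {'Phone': 'Phone(334) 585-1171', 'Fax': 'Fax(334) 585-1182'},
    {'Phone': None, 'Fax': None},
])

# Step 1: boolean mask of rows whose Phone contains 'Fax'
# (na=False makes missing Phone values evaluate to False)
mask = df['Phone'].str.contains('Fax', na=False)

# Step 2: copy the matching Phone values into the Fax column
df.loc[mask, 'Fax'] = df.loc[mask, 'Phone']

# Step 3: blank out the Phone entries that were moved
df.loc[mask, 'Phone'] = None

print(df)
```

Printing `mask` on its own after step 1 is an easy way to check that only the intended rows are selected before anything is modified.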

If you choose to go this route, please post your final solution.

Answered By: J_H

Another option, using .assign and np.where:

import numpy as np
import pandas as pd



# na=False so that rows with a missing Phone are treated as "no match"
condition = df["Phone"].str.contains("fax", case=False, na=False)
df = df.assign(
    Fax=np.where(condition, df["Phone"], df["Fax"]),
    Phone=np.where(condition, "", df["Phone"])
).fillna("")

print(df)

                 Phone                 Fax
0                        Fax(925) 482-1195
1                        Fax(406) 226-0317
2                       Fax+1 650-383-6305
3  Phone(334) 585-1171   Fax(334) 585-1182
4                                         
5  Phone(334) 585-1171   Fax(334) 585-1176

Answered By: snake_charmer_775