Pandas to fill empty cells in column according to another column

Question:

I have a dataframe like the one below, and I want to fill the empty cells in the "Date" column (where "Area" is West or North) with the content of the "Year" column plus "0601".

[Image: the input dataframe]

Wanted result is as follows:

[Image: the wanted result]

What I have tried:

from io import StringIO
import numpy as np
import pandas as pd


csvfile = StringIO(
"""
Name    Area    Date    Year
David   West        2014
Mike    North   20220919    2022
Kate    West        2017
Lilly   East    20221226    2022
Peter   North   20221226    2022
Cara    Middle      2016

""")

df = pd.read_csv(csvfile, sep='\t', engine='python')


L1 = ['West','North']
m1 = df['Date'].isnull()
m2 = df['Area'].isin(L1)

df['Date'] = df['Date'].mask(m1 & m2, df['Year'] + '0601')      # Try_1

df['Date'] = np.where(np.where(m1 & m2, df['Year'] + '0601'))   # Try_2

Both Try_1 and Try_2 raise the same error.

What is the right way to write these lines?

Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 142, in _na_arithmetic_op
    result = expressions.evaluate(op, left, right)
  File "C:\Python38\lib\site-packages\pandas\core\computation\expressions.py", line 235, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
  File "C:\Python38\lib\site-packages\pandas\core\computation\expressions.py", line 69, in _evaluate_standard
    return op(a, b)
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:My DocumentsScripts(Desktop) WSS 20200323GG.py", line 336, in <module>
    df['Date'] = np.where(np.where(m1 & m2, df['Year'] + '0601'))                   # try 2
  File "C:\Python38\lib\site-packages\pandas\core\ops\common.py", line 65, in new_method
    return method(self, other)
  File "C:\Python38\lib\site-packages\pandas\core\arraylike.py", line 89, in __add__
    return self._arith_method(other, operator.add)
  File "C:\Python38\lib\site-packages\pandas\core\series.py", line 4998, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 189, in arithmetic_op
    res_values = _na_arithmetic_op(lvalues, rvalues, op)
  File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 149, in _na_arithmetic_op
    result = _masked_arith_op(left, right, op)
  File "C:\Python38\lib\site-packages\pandas\core\ops\array_ops.py", line 111, in _masked_arith_op
    result[mask] = op(xrav[mask], y)
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')
Asked By: Mark K


Answers:

Your example works fine, provided the columns are read as strings:

csvfile = StringIO("""
Name    Area        Date    Year
David   West         NaN    2014
Mike    North   20220919    2022
Kate    West         NaN    2017
Lilly   East    20221226    2022
Peter   North   20221226    2022
Cara    Middle       NaN    2016
""")

df = pd.read_csv(csvfile, sep=r'\s+', engine='python', dtype='str')


L1 = ['West','North']
m1 = df['Date'].isnull()
m2 = df['Area'].isin(L1)

df['Date'] = df['Date'].mask(m1 & m2, df['Year'] + '0601')

print(df)

If the "Year" column is not a string, convert it before concatenating:

df['Date'] = df['Date'].mask(m1 & m2, df['Year'].astype(str) + '0601')

Output:

    Name    Area      Date  Year
0  David    West  20140601  2014
1   Mike   North  20220919  2022
2   Kate    West  20170601  2017
3  Lilly    East  20221226  2022
4  Peter   North  20221226  2022
5   Cara  Middle       NaN  2016
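
For Try_2, here is a minimal sketch of what probably was intended, again assuming every column is read as a string. np.where takes either one argument or three (condition, value where true, value where false), so the nested two-argument call in the question would still fail once the string addition is fixed. Passing the existing "Date" column as the third argument keeps the values that should not change:

```python
from io import StringIO

import numpy as np
import pandas as pd

csvfile = StringIO("""
Name    Area        Date    Year
David   West         NaN    2014
Mike    North   20220919    2022
Kate    West         NaN    2017
Lilly   East    20221226    2022
Peter   North   20221226    2022
Cara    Middle       NaN    2016
""")

# Read every column as a string so that Year + '0601' is string concatenation.
df = pd.read_csv(csvfile, sep=r"\s+", dtype=str)

m1 = df["Date"].isna()                    # rows with an empty Date
m2 = df["Area"].isin(["West", "North"])   # rows in the areas to fill

# np.where(condition, value_if_true, value_if_false):
# the third argument keeps the existing Date wherever the mask is False.
df["Date"] = np.where(m1 & m2, df["Year"] + "0601", df["Date"])

print(df)
```

This should give the same "Date" column as the mask version above; which of the two to use is mostly a matter of taste.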

If you have numeric data:

df['Date'] = df['Date'].mask(m1 & m2, df['Year'].mul(10000) + 601)

Output:

    Name    Area        Date  Year
0  David    West  20140601.0  2014
1   Mike   North  20220919.0  2022
2   Kate    West  20170601.0  2017
3  Lilly    East  20221226.0  2022
4  Peter   North  20221226.0  2022
5   Cara  Middle         NaN  2016
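
The ".0" in this output appears because a column that contains NaN is stored as float64. If whole numbers are wanted instead, one option is pandas' nullable "Int64" dtype. The sketch below is only an illustration under that assumption; it rebuilds the sample data by hand rather than reading the CSV:

```python
import pandas as pd

nan = float("nan")

# Hand-built copy of the sample data; Date is float64 because of the NaN gaps.
df = pd.DataFrame({
    "Name": ["David", "Mike", "Kate", "Lilly", "Peter", "Cara"],
    "Area": ["West", "North", "West", "East", "North", "Middle"],
    "Date": [nan, 20220919, nan, 20221226, 20221226, nan],
    "Year": [2014, 2022, 2017, 2022, 2022, 2016],
})

m1 = df["Date"].isna()
m2 = df["Area"].isin(["West", "North"])

# Same numeric fill as above: 2014 * 10000 + 601 -> 20140601.
filled = df["Date"].mask(m1 & m2, df["Year"].mul(10000) + 601)

# Nullable integer dtype: whole numbers are kept and the remaining gap becomes <NA>.
df["Date"] = filled.astype("Int64")

print(df)
```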
Answered By: mozway