Python add column with new value when 2 conditions match

Question:

I am trying to add a new column to my data containing the FIPS code (a 5-digit number). When County in maindata.csv matches County in fipsdata.tsv, I want the corresponding FIPS code (fipsCountyFIPS) written to a new column in the dataframe, i.e. data[fips].

data = pd.read_csv("maindata.csv")
fips = pd.read_csv("fips2county.tsv", sep='\t')

data[fips] = np.where(data.County == fips.CountyName, fipsCountyFIPS)

I also experimented with addfips (https://github.com/fitnr/addfips), which sounds like it should be easier in theory, but I couldn’t get it to work 🙁. I would prefer to just do it with the above if possible.

If anyone could share how to do this that would be amazing!

Asked By: afroduck

Answers:

Is this what you’re looking for?

# 1. Check that every county in the data is present in the FIPS data.
#    If so, add a new column to `data` with each county's corresponding FIPS code.
if data.County.isin(fips.CountyName.unique()).all():
    data = (
        # 2. Merge the data with the FIPS data
        data.merge(
            fips[["FIPS", "CountyName"]],
            left_on="County",
            right_on="CountyName",
        )
        # 3. Rename the FIPS column
        #    This step is only needed if the column name you want to give
        #    to the FIPS codes is different from the original column name
        #    from `fips` dataframe.
        .rename(columns={"FIPS": "fips"})
        # 4. Drop the `"CountyName"` column from merged dataframe.
        .drop(columns="CountyName", errors="ignore")
    )
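
For context, the np.where attempt in the question cannot work as written: np.where needs both a value-if-true and a value-if-false argument (or neither), and comparing data.County with fips.CountyName compares the two columns element by element instead of looking each county up. If a plain lookup is all that is needed, a minimal sketch with Series.map is shown here as an alternative to merge. The toy rows and the CountyFIPS column name are illustrative assumptions, not taken from the real files, and this only works when each county name appears once in the lookup table.

```python
import pandas as pd

# Toy stand-ins for the real files; names and values are illustrative only.
data = pd.DataFrame({"County": ["Autauga", "Baldwin", "Autauga", "Nowhere"]})
fips = pd.DataFrame(
    {"CountyName": ["Autauga", "Baldwin"], "CountyFIPS": ["01001", "01003"]}
)

# Build a county -> FIPS lookup Series indexed by county name.
# Assumption: county names are unique in `fips`.
lookup = fips.set_index("CountyName")["CountyFIPS"]

# map() looks up each county; counties missing from the lookup become NaN.
data["fips"] = data["County"].map(lookup)
print(data)
```

Unlike the guarded merge, this keeps every row of data and leaves NaN where no match exists, so missing counties are easy to spot afterwards.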

Full Example Code

Here’s an example of the above code in action:

Note: fips data downloaded from https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-county


import pandas as pd
import numpy as np


# == Data to run the example ===================================================
# 1. Read in the FIPS data
fips = pd.read_csv(
    "https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-county/download/?format=csv&timezone=America/Argentina/Buenos_Aires&lang=en&use_labels_for_header=true&csv_separator=%3B",
    sep=";",
)

# 2. Rename the columns to match the data
fips = fips.rename(
    columns={'Official Name County': 'CountyName', 'County FIPS Code': 'FIPS'}
)

# 3. Make sure the FIPS column is a string and has 5 digits
fips['FIPS'] = fips['FIPS'].astype(str).str.zfill(5)

# 4. Create a list of counties to sample from
counties = [
    "DeKalb", "Johnson", "Linn", "Macon", "Chase", "Hall", "Hitchcock",
    "Pierce", "Rock", "Wheeler", "St. Lawrence", "Wayne", "Buncombe",
    "Martin", "Perquimans", "Scotland", "Vance", "Fairfield", "Lake"
]

# 5. Create a dataframe with a random sample of counties
data = pd.DataFrame(
    {"County": np.random.choice(counties, size=20, replace=True)}
)

# == Actual Solution ===========================================================
# 6. Check that every county in the data is present in the FIPS data.
#    If so, add a new column to `data` with each county's corresponding FIPS code.
if data.County.isin(fips.CountyName.unique()).all():
    data = (
        # 7. Merge the data with the FIPS data
        data.merge(
            fips[["FIPS", "CountyName"]],
            left_on="County",
            right_on="CountyName",
        )
        # 8. Rename the FIPS column
        #    This step is only needed if the column name you want to give
        #    to the FIPS codes is different from the original column name
        #    from `fips` dataframe.
        .rename(columns={"FIPS": "fips"})
        # 9. Drop the CountyName column
        .drop(columns="CountyName", errors="ignore")
    )
print(data)
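# Prints something like this (rows vary with the random sample; the merge can
# return several rows per county because county names repeat across states):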
#
#           County   fips
# 0         Pierce  00139
# 1         Pierce  00069
# 2         Pierce  00229
# 3         Pierce  00053
# 4         Pierce  00093
# ..           ...    ...
# 93        DeKalb  00049
# 94        DeKalb  00089
# 95        DeKalb  00041
# 96        DeKalb  00037
# 97  St. Lawrence  00089
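
Since the same county name exists in several states, matching on the county alone is ambiguous. If the main data also records the state, merging on both keys gives one FIPS code per input row. The following is only a sketch under that assumption: the State and StateName columns are hypothetical names, and the rows are illustrative.

```python
import pandas as pd

# Illustrative data; "State" and "StateName" are assumed column names.
data = pd.DataFrame(
    {"State": ["Nebraska", "Washington"], "County": ["Pierce", "Pierce"]}
)
fips = pd.DataFrame(
    {
        "StateName": ["Nebraska", "Washington", "Georgia"],
        "CountyName": ["Pierce", "Pierce", "Pierce"],
        "FIPS": ["31139", "53053", "13229"],
    }
)

# Merging on county alone pairs every "Pierce" with every state's "Pierce".
by_county = data.merge(fips, left_on="County", right_on="CountyName")
print(len(by_county))  # more rows than `data` has

# Merging on state AND county keeps one row per input row.
# validate="many_to_one" raises if the key pair is not unique in `fips`.
by_both = data.merge(
    fips,
    how="left",
    left_on=["State", "County"],
    right_on=["StateName", "CountyName"],
    validate="many_to_one",
).drop(columns=["StateName", "CountyName"])
print(by_both)
```

The validate argument is optional, but it turns a silent row explosion into an explicit error when the lookup table has duplicate keys.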

If one or more counties in data are not found in fips, the if condition is false, so the merge is skipped and data is printed unchanged, without a fips column.
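
If you would rather keep the matched rows and see which counties failed to match, one option is a left merge with indicator=True. This is a sketch on made-up rows, not part of the code above; "Atlantis" is a deliberately missing county.

```python
import pandas as pd

# Illustrative data; "Atlantis" has no entry in the FIPS table.
data = pd.DataFrame({"County": ["Linn", "Macon", "Atlantis"]})
fips = pd.DataFrame({"CountyName": ["Linn", "Macon"], "FIPS": ["19113", "01087"]})

# how="left" keeps every row of `data`; indicator=True adds a "_merge"
# column saying whether each row was found in `fips`.
merged = data.merge(
    fips, how="left", left_on="County", right_on="CountyName", indicator=True
)

# Counties that had no match in the FIPS table.
unmatched = merged.loc[merged["_merge"] == "left_only", "County"].tolist()
print(unmatched)

# Final frame: helper columns dropped, unmatched rows keep NaN in "fips".
result = merged.drop(columns=["CountyName", "_merge"]).rename(
    columns={"FIPS": "fips"}
)
print(result)
```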

Answered By: Ingwersen_erik