How can I add a new column to a dataframe in Python based on whether a set of conditions is met in another column?

Question:

I would like to add a new column, "Type," to an existing dataframe, df:

  Circuit        Size
0    6026       Large
1    5011  Very Small
2      50       Small
3    9023  Very Small
4    85GA  Very Small
5     90A       Large

The "Circuit" and "Size" columns both have dtype object.

If the row’s "Circuit" value is a 4-digit integer (no letters), I would like the new column to read "1".
If the row’s "Circuit" value contains any letters of the alphabet, I would like the new column to read "2".
If the row’s "Circuit" value is an integer (no letters) with fewer or more than 4 digits, I would like the new column to read "3".

So the result would be:

  Circuit        Size  Type
0    6026       Large     1
1    5011  Very Small     1
2      50       Small     3
3    9023  Very Small     1
4    85GA  Very Small     2
5     90A       Large     2

I tried the following, but it’s not working.

condition_1 = (df5["Circuit"].isdigit()) & (df5["Circuit"] >= 1000) & (df5["Circuit"] <= 9999)
condition_2 = df5["Circuit"].str.contains('[a-zA-Z]').any()
condition_3 = (df5["Circuit"].isdigit()) & (df5["Circuit"] <= 9999)

conditions = [condition_1, condition_2, condition_3]
choices = [1,2,3]
df["Type"] = np.select(conditions, choices, default="")

How should I go about this? Thanks for your help!

Asked By: vcodingadventures


Answers:

Since the column mixes text and numbers, you might be better off applying a function to each element, because ordinary comparison operations won’t work on these values (e.g., you can’t evaluate "85GA" <= 9999).
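A short check of why the question's attempt fails, using a made-up three-value Series (an assumption, not the asker's full data): ordering comparisons between str and int raise TypeError, and a pandas Series has no isdigit() method of its own; the element-wise string methods live under the .str accessor.

```python
import pandas as pd

circuits = pd.Series(["6026", "85GA", "50"])

# In Python 3, ordering comparisons between str and int raise TypeError.
try:
    _ = "85GA" <= 9999
    comparison_failed = False
except TypeError:
    comparison_failed = True

# A Series has no isdigit() method; the element-wise version is Series.str.isdigit().
has_series_isdigit = hasattr(circuits, "isdigit")
digit_mask = circuits.str.isdigit()

print(comparison_failed)    # True
print(has_series_isdigit)   # False
print(digit_mask.tolist())  # [True, False, True]
```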

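def calc_type(x):
    x = str(x)  # tolerate int values mixed into the object column
    if x.isdigit():
        # digits only: 1 for a number between 1000 and 9999, 3 otherwise
        if 1000 <= int(x) <= 9999:
            return 1
        return 3
    # the value contains a non-digit character, such as a letter
    return 2

df['Type'] = df['Circuit'].apply(calc_type)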
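For comparison, the np.select idea from the question can also work if the conditions are built from .str methods instead of comparing strings with numbers. This is a sketch, assuming every Circuit value can be treated as a string and reading "4-digit" literally as "four characters, all digits" (so a value like "0123" would count as Type 1, unlike in the range check above). The default of 0 is an arbitrary placeholder for values that match none of the rules.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Circuit": ["6026", "5011", "50", "9023", "85GA", "90A"],
    "Size": ["Large", "Very Small", "Small", "Very Small", "Very Small", "Large"],
})

circuit = df["Circuit"].astype(str)
is_digits = circuit.str.isdigit()
has_letter = circuit.str.contains("[a-zA-Z]")

conditions = [
    is_digits & circuit.str.len().eq(4),  # Type 1: exactly four digits
    has_letter,                           # Type 2: contains any letter
    is_digits,                            # Type 3: digits only, any other length
]
# np.select takes the choice for the first condition that is True in each row.
df["Type"] = np.select(conditions, [1, 2, 3], default=0)
print(df["Type"].tolist())  # [1, 1, 3, 1, 2, 2]
```

Because np.select stops at the first matching condition, the broader "digits only" test can safely sit last.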
Answered By: JBernardo

The main issue here is that some values in df["Circuit"] may be integers while others are strings, so string methods such as isalpha() can't be applied to every value.

Once every value is converted to a string, you can solve this with a one-liner:

df["Circuit"] = [str(x) for x in df["Circuit"]]
df["Type"] = [2 if any(char.isalpha() for char in circuit) else (1 if len(circuit) == 4 else 3) for circuit in df["Circuit"]]
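To see why the conversion matters, here is a sketch with a hypothetical column that mixes int and str values (the question only says the dtype is object, so this mix is an assumption). The pandas .str methods return NaN for the non-string entries, while after astype(str), which is a vectorized equivalent of the list comprehension above, every value gets a real boolean.

```python
import pandas as pd

# Hypothetical mixed column: some circuits stored as int, others as str.
circuits = pd.Series([6026, 5011, 50, "9023", "85GA", "90A"], dtype=object)

# .str methods yield NaN for the int entries instead of True/False.
before = circuits.str.isdigit()

# astype(str) converts every element, like [str(x) for x in circuits].
as_text = circuits.astype(str)
after = as_text.str.isdigit()

print(before.isna().tolist())  # [True, True, True, False, False, False]
print(after.tolist())          # [True, True, True, True, False, False]
```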
Answered By: Jakub

I'm not sure I've captured the conditions exactly, but how about using np.where for each condition and adding the results together:

import numpy as np
import pandas as pd

num = pd.to_numeric(df["Circuit"], errors="coerce")  # NaN wherever the value isn't a number
a1 = np.where(num.between(1000, 9999), 1, 0)
a2 = np.where(df["Circuit"].str.contains('[a-zA-Z]', na=False), 2, 0)
a3 = np.where(num.notna() & ~num.between(1000, 9999), 3, 0)

df["Type"] = a1 + a2 + a3
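Because this approach adds the np.where results together, it only yields a valid code when exactly one condition is true per row; a value that satisfies two conditions (or none) would produce a sum such as 4 or 0. A minimal sketch of guarding against that with an assertion, using the question's values plus a boundary value "1000" added for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Circuit": ["6026", "5011", "50", "9023", "85GA", "90A", "1000"]})
circuit = df["Circuit"]

num = pd.to_numeric(circuit, errors="coerce")  # NaN where letters are present
cond1 = num.between(1000, 9999)                # inclusive on both ends
cond2 = circuit.str.contains("[a-zA-Z]", na=False)
cond3 = num.notna() & ~cond1

# Summing the np.where results is only safe if exactly one condition holds per row.
matches = cond1.astype(int) + cond2.astype(int) + cond3.astype(int)
assert (matches == 1).all(), "a row matched zero or several conditions"

df["Type"] = np.where(cond1, 1, 0) + np.where(cond2, 2, 0) + np.where(cond3, 3, 0)
print(df["Type"].tolist())  # [1, 1, 3, 1, 2, 2, 1]
```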
Answered By: Andre S.

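Here is a way that uses isupper() to check for letters: after str.upper(), a string counts as upper case only if it contains at least one cased letter, so digit-only values return False.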

l = df['Circuit'].str.len()
s = df['Circuit'].str.upper().str.isupper()

df.loc[s,'Type'] = 2
df.loc[(l.eq(4)) & (~s),'Type'] = 1
df.loc[(~l.eq(4)) & (~s),'Type'] = 3

Or, with np.select():

import numpy as np

conditions = [df['Circuit'].str.upper().str.isupper(), df['Circuit'].str.len().eq(4)]

df['Type'] = np.select(conditions, [2, 1], default=3)
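A quick run of the np.select version on the question's sample values, assuming they are all strings, shows the isupper() trick in action. One design difference worth knowing: the .loc version creates the Type column with NaN before filling it in, so the column typically ends up as float64 (2.0, 1.0, ...), whereas np.select returns integers directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Circuit": ["6026", "5011", "50", "9023", "85GA", "90A"]})

# Digit-only strings have no cased characters, so isupper() is False for them.
has_letter = df["Circuit"].str.upper().str.isupper()
is_four_chars = df["Circuit"].str.len().eq(4)

# First matching condition wins: letters -> 2, four characters -> 1, else 3.
df["Type"] = np.select([has_letter, is_four_chars], [2, 1], default=3)

print(has_letter.tolist())  # [False, False, False, False, True, True]
print(df["Type"].tolist())  # [1, 1, 3, 1, 2, 2]
```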
Answered By: rhug123