How can I add a new column to a DataFrame in Python based on whether a set of conditions is met in another column?
Question:
I would like to add a new column, "Type," to an existing dataframe, df:
   Circuit  Size
0  6026     Large
1  5011     Very Small
2  50       Small
3  9023     Very Small
4  85GA     Very Small
5  90A      Large
The Circuit and Size columns both have dtype object.
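To reproduce the example, the frame can be built as follows. This is a sketch: the values are copied from the table above, and dtype=object is passed explicitly so that both columns are object columns, as described.

```python
import pandas as pd

# Sample data from the question; dtype=object keeps both columns as object
# (newer pandas versions would otherwise infer a dedicated string dtype).
df = pd.DataFrame(
    {
        "Circuit": ["6026", "5011", "50", "9023", "85GA", "90A"],
        "Size": ["Large", "Very Small", "Small", "Very Small", "Very Small", "Large"],
    },
    dtype=object,
)
print(df.dtypes)
```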
If the row’s "Circuit" value is a four-digit integer (no letters), I would like the new column to read "1".
If the row’s "Circuit" value contains any letters of the alphabet, I would like the new column to read "2".
If the row’s "Circuit" value is an integer (no letters) with fewer or more than four digits, I would like the new column to read "3".
So the result would be:
   Circuit  Size        Type
0  6026     Large       1
1  5011     Very Small  1
2  50       Small       3
3  9023     Very Small  1
4  85GA     Very Small  2
5  90A      Large       2
I tried the following, but it’s not working.
condition_1 = (df5["Circuit"].isdigit()) & (df5["Circuit"] >= 1000) & (df5["Circuit"] <= 9999)
condition_2 = df5["Circuit"].str.contains('[a-zA-Z]').any()
condition_3 = (df5["Circuit"].isdigit()) & (df5["Circuit"] <= 9999)
conditions = [condition_1, condition_2, condition_3]
choices = [1,2,3]
df["Type"] = np.select(conditions, choices, default="")
How should I go about this? Thanks for your help!
Answers:
Since the column mixes text and numbers, you might be better off applying a function per element, because ordinary comparison operations won’t work (e.g., you can’t evaluate "85GA" <= 9999).
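def calc_type(x):
    x = str(x)  # guard against integer values in the object column
    if x.isdigit():
        # all digits: 1 for a four-digit number, 3 otherwise
        if 1000 <= int(x) <= 9999:
            return 1
        return 3
    # anything that is not all digits (e.g. contains letters)
    return 2

df['Type'] = df['Circuit'].apply(calc_type)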
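A quick usage check of the per-element approach, as a sketch on the question's sample data. One value (9023) is stored here as a plain int, a hypothetical case to mimic a mixed object column; converting with astype(str) before apply avoids the AttributeError that int.isdigit would raise.

```python
import pandas as pd

def calc_type(x):
    # 1 = four-digit number, 3 = other all-digit value, 2 = anything else
    if x.isdigit():
        if 1000 <= int(x) <= 9999:
            return 1
        return 3
    return 2

# Object column mixing strings and one (hypothetical) plain integer
df = pd.DataFrame({"Circuit": ["6026", "5011", "50", 9023, "85GA", "90A"]}, dtype=object)

# Convert everything to str first, since int has no .isdigit() method
df["Type"] = df["Circuit"].astype(str).apply(calc_type)
print(df["Type"].tolist())  # [1, 1, 3, 1, 2, 2]
```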
The main issue here is that df["Circuit"] may hold a mix of integers and strings (an object column can contain both).
Once everything is converted to strings, you can solve this with a one-liner:
df["Circuit"] = [str(x) for x in df["Circuit"]]
df["Type"] = [2 if any(char.isalpha() for char in circuit) else (1 if len(circuit) == 4 else 3) for circuit in df["Circuit"]]
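A vectorized alternative, sketched here by swapping the Python loop for pandas string methods with regular expressions (Series.str.contains, and Series.str.fullmatch, which is available since pandas 1.1). Like the one-liner, it treats any four-character all-digit string, including one with a leading zero, as type 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Circuit": ["6026", "5011", "50", "9023", "85GA", "90A"]}, dtype=object)

circuit = df["Circuit"].astype(str)
has_letter = circuit.str.contains(r"[A-Za-z]")  # at least one ASCII letter
four_digits = circuit.str.fullmatch(r"\d{4}")   # exactly four digits, nothing else

# Conditions are checked in order; rows matching neither get 3
df["Type"] = np.select([has_letter, four_digits], [2, 1], default=3)
print(df["Type"].tolist())  # [1, 1, 3, 1, 2, 2]
```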
I’m not sure I have the conditions exactly right, but how about using np.where for each condition and then combining the results:
import numpy as np
import pandas as pd
num = pd.to_numeric(df["Circuit"], errors="coerce")  # NaN for values containing letters
a1 = np.where(num.between(1000, 9999), 1, 0)
a2 = np.where(df["Circuit"].astype(str).str.contains('[a-zA-Z]'), 2, 0)
a3 = np.where(num.notna() & ~num.between(1000, 9999), 3, 0)
df["Type"] = a1 + a2 + a3
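Because this approach adds the masks together, each row must match exactly one of them; overlapping or missing ranges would produce a code such as 4 or 0. A small check, sketched on a hypothetical test column that includes the boundary value 1000 and a five-digit value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Circuit": ["6026", "50", "1000", "12345", "85GA"]}, dtype=object)

num = pd.to_numeric(df["Circuit"], errors="coerce")  # NaN where letters are present
in_range = num.between(1000, 9999)                   # False for NaN
masks = [
    in_range,                                            # type 1
    df["Circuit"].astype(str).str.contains("[a-zA-Z]"),  # type 2
    num.notna() & ~in_range,                             # type 3
]

# Every row should hit exactly one mask, so the summed codes stay in {1, 2, 3}
hits = np.sum([m.to_numpy(dtype=bool) for m in masks], axis=0)
assert (hits == 1).all()

types = np.where(masks[0], 1, 0) + np.where(masks[1], 2, 0) + np.where(masks[2], 3, 0)
print(types.tolist())  # [1, 3, 1, 3, 2]
```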
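Here is a way that uses isupper() to check for letters: after str.upper(), str.isupper() is True whenever a value contains at least one letter.
l = df['Circuit'].astype(str).str.len()
s = df['Circuit'].astype(str).str.upper().str.isupper()  # True if the value contains a letter
df.loc[s, 'Type'] = 2
df.loc[l.eq(4) & ~s, 'Type'] = 1
df.loc[~l.eq(4) & ~s, 'Type'] = 3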
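or with np.select():
import numpy as np
# conditions are checked in order; anything that matches neither becomes 3
l = [df['Circuit'].astype(str).str.upper().str.isupper(), df['Circuit'].astype(str).str.len().eq(4)]
df['Type'] = np.select(l, [2, 1], default=3)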
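A sketch comparing the two variants above on the question's sample data (the column names Type_loc and Type_select are hypothetical, chosen only to keep both results side by side). The .loc route creates its column with NaN first, so it may come back as float; the values themselves should agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Circuit": ["6026", "5011", "50", "9023", "85GA", "90A"]}, dtype=object)
circuit = df["Circuit"].astype(str)

s = circuit.str.upper().str.isupper()  # True if the value has at least one letter
n = circuit.str.len()

# Masked-assignment version
df.loc[s, "Type_loc"] = 2
df.loc[n.eq(4) & ~s, "Type_loc"] = 1
df.loc[~n.eq(4) & ~s, "Type_loc"] = 3

# np.select version: first matching condition wins, otherwise 3
df["Type_select"] = np.select([s, n.eq(4)], [2, 1], default=3)

print(df[["Circuit", "Type_loc", "Type_select"]])
```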