Pandas: check if two columns can be considered the composite key of a dataframe
Question:
A sample dataframe:
import pandas as pd
data = {
"col_A": ["a","a","b","c"],
"col_B": [1, 2, 2, 3],
"col_C": ["demo", "demo", "demo", "demo"]
}
df = pd.DataFrame(data)
Dataframe
col_A col_B col_C
a 1 demo
a 2 demo
b 2 demo
c 3 demo
I can easily check whether all values in col_A are unique with df['col_A'].is_unique.
Is there a way to do the same for two columns, i.e. something like df['col_A', 'col_B'].is_unique, to tell whether col_A and col_B together form a composite key of the dataframe?
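For reference, on the sample dataframe neither column is unique on its own. "a" repeats in col_A and 2 repeats in col_B, so a single-column is_unique check cannot decide whether the pair is a key:

```python
import pandas as pd

df = pd.DataFrame({
    "col_A": ["a", "a", "b", "c"],
    "col_B": [1, 2, 2, 3],
    "col_C": ["demo", "demo", "demo", "demo"],
})

col_a_unique = df["col_A"].is_unique  # False: "a" appears twice
col_b_unique = df["col_B"].is_unique  # False: 2 appears twice
print(col_a_unique, col_b_unique)
```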
Answers:
You can set all columns that should be included in the composite key as the index and then check is_unique on that index.
df.set_index(['col_A', 'col_B']).index.is_unique
#True
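The set_index check can be wrapped in a small reusable function. This is only a sketch: the name is_composite_key is hypothetical, and it tests nothing but uniqueness of the column combination (it does not, for example, reject missing values in the key columns the way a database primary key would). The columns are converted to a list because set_index treats a tuple as a single column label.

```python
from typing import Sequence

import pandas as pd


def is_composite_key(df: pd.DataFrame, cols: Sequence[str]) -> bool:
    """Hypothetical helper: True if the combination of `cols` is unique per row."""
    # set_index treats a tuple as one label, so normalise the columns to a list
    return bool(df.set_index(list(cols)).index.is_unique)


df = pd.DataFrame({
    "col_A": ["a", "a", "b", "c"],
    "col_B": [1, 2, 2, 3],
    "col_C": ["demo", "demo", "demo", "demo"],
})
print(is_composite_key(df, ["col_A", "col_B"]))  # True

# Appending a row that repeats the ("a", 1) pair breaks the key
extra = pd.DataFrame({"col_A": ["a"], "col_B": [1], "col_C": ["other"]})
df_dup = pd.concat([df, extra], ignore_index=True)
print(is_composite_key(df_dup, ["col_A", "col_B"]))  # False
```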
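Alternatively, use DataFrame.duplicated with Series.any() and negate the result; it is True when no (col_A, col_B) pair occurs more than once:
not df[['col_A', 'col_B']].duplicated().any()
#True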
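When the check fails, DataFrame.duplicated can also show which rows break the key. It accepts a subset of columns, and keep=False marks every member of a duplicate group instead of only the later occurrences. In this sketch the fifth row is made up for illustration, so that the pair ("a", 1) repeats:

```python
import pandas as pd

df = pd.DataFrame({
    "col_A": ["a", "a", "b", "c", "a"],
    "col_B": [1, 2, 2, 3, 1],
    "col_C": ["demo", "demo", "demo", "demo", "other"],
})
key = ["col_A", "col_B"]

# True for every row whose (col_A, col_B) pair occurs more than once
mask = df.duplicated(subset=key, keep=False)

is_key = not mask.any()  # False: ("a", 1) appears twice
offending = df[mask]     # rows 0 and 4
print(is_key)
print(offending)
```

Passing subset avoids building the intermediate df[['col_A', 'col_B']] frame; with or without keep=False, any() is True exactly when some pair repeats, so the key test itself gives the same answer.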