Remove palindromic rows based on two columns
Question:
I have the following data frame:
Data_Frame <- data.frame(
  A = c("a", "c", "b", "e", "g", "d", "f", "h"),
  B = c("b", "d", "a", "f", "h", "c", "e", "g"),
  value = c("0.3", "0.2", "0.1", "0.1", "0.5", "0.7", "0.8", "0.1"),
  effect = c("123", "345", "123", "444", "123", "345", "444", "123")
)
I want to find rows where the values in columns A and B are palindromic (the A/B pair of one row is the B/A pair of another row) and the value in effect is equal. For example, in the provided data frame, rows 1 and 3 and rows 2 and 6 meet this condition. Then, from each pair of palindromic rows, I want to retain the row with the lowest value in the "value" column.
The output data frame should look like this:
Data_Frame <- data.frame(
  A = c("c", "b", "e", "h"),
  B = c("d", "a", "f", "g"),
  value = c("0.2", "0.1", "0.1", "0.1"),
  effect = c("345", "123", "444", "123")
)
The levels(Data_Frame$A) and levels(Data_Frame$B) are not equal, and as.character() does not solve my problem.
I appreciate any hints in R or Python!
Answers:
In Python/Pandas, you can do:
# Create a frozenset: [a, b] and [b, a] will have the same representation
AB = df[['A', 'B']].agg(frozenset, axis=1)
out = df.loc[df.groupby([AB, 'effect'])['value'].idxmin().values]
print(out.sort_index())
# Output
   A  B  value  effect
1  c  d    0.2     345
2  b  a    0.1     123
3  e  f    0.1     444
7  h  g    0.1     123
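As a possible alternative to the frozenset key, each (A, B) pair can be sorted so that both orderings collapse to the same key, after which a sort on value followed by drop_duplicates keeps the smallest row per group. This is only a sketch under the assumption that A and B hold plain strings; the helper column names lo and hi are made up for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "c", "b", "e", "g", "d", "f", "h"],
    "B": ["b", "d", "a", "f", "h", "c", "e", "g"],
    "value": [0.3, 0.2, 0.1, 0.1, 0.5, 0.7, 0.8, 0.1],
    "effect": [123, 345, 123, 444, 123, 345, 444, 123],
})

# Sort each (A, B) pair row-wise so that (a, b) and (b, a) give the same key.
pair = np.sort(df[["A", "B"]].to_numpy(), axis=1)

# "lo" and "hi" are hypothetical helper columns holding the sorted pair.
keyed = df.assign(lo=pair[:, 0], hi=pair[:, 1])

out = (
    keyed.sort_values("value", kind="stable")  # smallest value first
    .drop_duplicates(subset=["lo", "hi", "effect"], keep="first")
    .drop(columns=["lo", "hi"])
    .sort_index()  # restore the original row order
)
print(out)
```

Unlike the groupby/idxmin version, this keeps the original A and B orientation of the retained row without a second lookup, at the cost of two temporary columns.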
Reproducible example:
import pandas as pd

data = {'A': ['a', 'c', 'b', 'e', 'g', 'd', 'f', 'h'],  # str
        'B': ['b', 'd', 'a', 'f', 'h', 'c', 'e', 'g'],  # str
        'value': [0.3, 0.2, 0.1, 0.1, 0.5, 0.7, 0.8, 0.1],  # float
        'effect': [123, 345, 123, 444, 123, 345, 444, 123]}  # int
df = pd.DataFrame(data)
print(df)
# Output
   A  B  value  effect
0  a  b    0.3     123
1  c  d    0.2     345
2  b  a    0.1     123
3  e  f    0.1     444
4  g  h    0.5     123
5  d  c    0.7     345
6  f  e    0.8     444
7  h  g    0.1     123
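Note that the reproducible example uses numeric value and effect columns, whereas the data frame in the question stores them as strings. If the data arrives in pandas as strings, it is probably safer to convert value to a number before looking for the minimum, since comparing strings is not the same as comparing numbers. A minimal sketch, assuming the string-typed data from the question:

```python
import pandas as pd

# Same data as in the question, with "value" and "effect" kept as strings.
df = pd.DataFrame({
    "A": ["a", "c", "b", "e", "g", "d", "f", "h"],
    "B": ["b", "d", "a", "f", "h", "c", "e", "g"],
    "value": ["0.3", "0.2", "0.1", "0.1", "0.5", "0.7", "0.8", "0.1"],
    "effect": ["123", "345", "123", "444", "123", "345", "444", "123"],
})

# Convert "value" to float so that the minimum is a numeric minimum.
df["value"] = pd.to_numeric(df["value"])

# Same frozenset key as in the answer: (a, b) and (b, a) compare equal.
AB = df[["A", "B"]].agg(frozenset, axis=1)

# Group order is irrelevant here because the result is re-sorted by index.
idx = df.groupby([AB, "effect"], sort=False)["value"].idxmin()
out = df.loc[idx.values].sort_index()
print(out)
```

The effect column is only used as a grouping key, so it can stay a string.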