Remove palindromic rows based on two columns

Question:

I have the following data frame:

Data_Frame <- data.frame (
  A = c("a", "c", "b", "e", "g", "d", "f", "h"),
  B = c("b", "d", "a", "f", "h", "c", "e", "g"),
  value = c("0.3", "0.2", "0.1", "0.1", "0.5", "0.7", "0.8", "0.1"),
  effect = c("123", "345", "123", "444", "123", "345", "444", "123")
)

I want to find rows where the values in columns A and B are palindromic (the same pair in reverse order, such as a/b and b/a) and the value in effect is equal. For example, in the data frame above, rows 1 and 3, and rows 2 and 6, meet this condition. Then, from each pair of palindromic rows, I want to keep the row with the lowest value in the "value" column.

The output data frame should look like this:

Data_Frame <- data.frame (
  A = c("c", "b", "e", "h"),
  B = c("d", "a", "f", "g"),
  value = c("0.2", "0.1", "0.1", "0.1"),
  effect = c("345", "123", "444", "123")
)

levels(Data_Frame$A) and levels(Data_Frame$B) are not equal, and as.character() does not solve my problem.

I would appreciate any hints in R or Python!

Asked By: RJF

Answers:

In Python/Pandas, you can do:

# Create a frozenset: [a, b] and [b, a] will have the same representation
AB = df[['A', 'B']].agg(frozenset, axis=1)
out = df.loc[df.groupby([AB, 'effect'])['value'].idxmin().values]
print(out.sort_index())

# Output
   A  B  value  effect
1  c  d    0.2     345
2  b  a    0.1     123
3  e  f    0.1     444
7  h  g    0.1     123
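
A variation on the same idea, as a minimal sketch: instead of a frozenset, sort each (A, B) pair within its row so that reversed pairs share the same key. The helper name drop_palindromic_rows and its parameters are made up for illustration, and pd.to_numeric is applied on the assumption that value may be stored as strings, as it is in the question's data frame.

```python
import numpy as np
import pandas as pd


def drop_palindromic_rows(df, a="A", b="B", group="effect", value="value"):
    """Hypothetical helper: per unordered (a, b) pair and group, keep the min-value row."""
    # Sort each (A, B) pair within its row so ("a", "b") and ("b", "a") match.
    pair = np.sort(df[[a, b]].to_numpy(dtype=str), axis=1)
    keys = [pair[:, 0], pair[:, 1], df[group].to_numpy()]
    # Assumption: values may be strings such as "0.3", so convert before comparing.
    values = pd.to_numeric(df[value])
    # Index label of the smallest value in each (pair, group) bucket.
    keep = values.groupby(keys).idxmin()
    # Return the kept rows in their original order.
    return df.loc[np.sort(keep.to_numpy())]


# The question's data, with value and effect as strings.
demo = pd.DataFrame({
    "A": ["a", "c", "b", "e", "g", "d", "f", "h"],
    "B": ["b", "d", "a", "f", "h", "c", "e", "g"],
    "value": ["0.3", "0.2", "0.1", "0.1", "0.5", "0.7", "0.8", "0.1"],
    "effect": ["123", "345", "123", "444", "123", "345", "444", "123"],
})
out = drop_palindromic_rows(demo)
print(out)
```

On the question's data this keeps the rows with index 1, 2, 3 and 7, the same rows as the frozenset version. Sorting the pair keeps the two key parts as plain strings, which may be convenient if you want to inspect the keys afterwards.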

Reproducible example:

import pandas as pd

data = {'A': ['a', 'c', 'b', 'e', 'g', 'd', 'f', 'h'],       # str
        'B': ['b', 'd', 'a', 'f', 'h', 'c', 'e', 'g'],       # str
        'value': [0.3, 0.2, 0.1, 0.1, 0.5, 0.7, 0.8, 0.1],   # float
        'effect': [123, 345, 123, 444, 123, 345, 444, 123]}  # int
df = pd.DataFrame(data)
print(df)

# Output
   A  B  value  effect
0  a  b    0.3     123
1  c  d    0.2     345
2  b  a    0.1     123
3  e  f    0.1     444
4  g  h    0.5     123
5  d  c    0.7     345
6  f  e    0.8     444
7  h  g    0.1     123
Answered By: Corralien