Longest Repeating Subsequence: Edge Cases
Question:
Problem
While solving the Longest Repeating Subsequence problem using bottom-up dynamic programming, I started running into an edge case whenever a letter was repeated an odd number of times.
The goal is to find the longest subsequence that occurs twice in the string using elements at different indices. The ranges can overlap, but the indices should be disjoint (i.e., str[1], str[4] and str[2], str[6] can be a solution, but not str[1], str[2] and str[2], str[3]).
Minimum Reproducible Example
s = 'AXXXA'
n = len(s)
dp = [['' for i in range(n + 1)] for j in range(n + 1)]
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if (i != j and s[i - 1] == s[j - 1]):
            dp[i][j] = dp[i - 1][j - 1] + s[i - 1]
        else:
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
print(dp[n][n])
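A side note on the snippet above, separate from the edge case being asked about: when the dp cells hold strings, max() compares them lexicographically rather than by length, so it may keep a shorter candidate. A minimal sketch of the difference, where longer is a hypothetical helper name that is not part of the original code:

```python
def longer(a: str, b: str) -> str:
    """Return the longer of two strings (the first one on a tie)."""
    return max(a, b, key=len)


print(max("B", "AA"))     # B  (lexicographic: "B" sorts after "AA")
print(longer("B", "AA"))  # AA (length 2 beats length 1)
```

Storing lengths in the grid and rebuilding the string afterwards, as the first answer does, sidesteps the issue entirely.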
Question
Any pointers on how to avoid this?
With input s = 'AXXXA', the answer should be either A or X, but the final result is XX, apparently pairing up the third X with both the first X and the second X.
False Start
I don’t want to add a check on a match (s[i - 1] == s[j - 1]) to see if s[i - 1] in dp[i - 1][j - 1] because another input might be something like AAJDDAJJTATA, which must add the A twice.
Answers:

To retrieve the longest subsequence itself, it’s best to implement a separate function that works from the finished dp grid.
Your algorithm is fine; you only need to store lengths in dp and add 1 whenever s[i - 1] == s[j - 1] and i != j:
n = len(s)
dp = [[0 for _ in range(n + 1)] for _ in range(n + 1)]
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if s[i - 1] == s[j - 1] and i != j:
            dp[i][j] = 1 + dp[i - 1][j - 1]
        else:
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
return dp[-1][-1]
If you want to build the longest subsequence, you can walk back through the dp grid and, whenever the condition s[i - 1] == s[j - 1] and i != j is satisfied, append the character to the result:
def get_longest():
    i, j = n, n
    lrs = []
    while i > 0 and j > 0:
        if s[i - 1] == s[j - 1] and i != j:
            lrs.append(s[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return ''.join(lrs[::-1])
Code
def LRS(s):
    def get_longest():
        i, j = n, n
        lrs = []
        while i > 0 and j > 0:
            if s[i - 1] == s[j - 1] and i != j:
                lrs.append(s[i - 1])
                i -= 1
                j -= 1
            elif dp[i - 1][j] > dp[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return ''.join(lrs[::-1])

    n = len(s)
    dp = [[0 for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if s[i - 1] == s[j - 1] and i != j:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    print(get_longest())
    return dp[-1][-1]

s = 'AXXXA'
print(LRS(s))
Prints
XX
2
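To see which characters the traceback actually pairs up, it can help to record indices instead of characters. The following is only a sketch built on the same grid and the same traceback rule; lrs_with_indices is a hypothetical name, and the returned indices are 0-based:

```python
def lrs_with_indices(s):
    """Return (subsequence, indices of first copy, indices of second copy)."""
    n = len(s)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if s[i - 1] == s[j - 1] and i != j:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner, recording 0-based indices.
    first, second = [], []
    i = j = n
    while i > 0 and j > 0:
        if s[i - 1] == s[j - 1] and i != j:
            first.append(i - 1)
            second.append(j - 1)
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    first.reverse()
    second.reverse()
    return ''.join(s[k] for k in first), first, second


print(lrs_with_indices('AXXXA'))  # ('XX', [2, 3], [1, 2])
```

For AXXXA the middle X (index 2) appears in both copies, but as the first character of one and the second character of the other, so at every position the two copies still use different indices.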
You can keep track of the indices of the last added characters and make sure that, when two characters match, their indices differ not only from each other but also from those of the last added characters:
s = 'AXXXA'
n = len(s)
dp = [[('', 0, 0) for i in range(n + 1)] for j in range(n + 1)]
for i in range(1, n + 1):
    for j in range(1, n + 1):
        last = dp[i - 1][j - 1]
        if last[2] != i != j != last[1] and s[i - 1] == s[j - 1]:
            dp[i][j] = last[0] + s[i - 1], i, j
        else:
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=lambda t: t[0])
print(dp[n][n][0]) # outputs X
Actually, your initial algorithm and its answer are correct (… but this is a good question because others might confuse what an LRS means).
Given your input (in), the subsequences (s1, s2) are:
in: AXXXA
s1:  XX
s2:   XX
So XX (length 2) is indeed the correct answer here. X would be the correct answer for the problem’s non-overlapping version, where the ranges, not just individual indices, must be disjoint.
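The different readings of "different indices" can be compared directly with a brute-force search on tiny inputs. This is a rough sketch for checking intuition, not an efficient algorithm; brute_force_lrs and the three predicate names are hypothetical, and the search is exponential in the length of the string:

```python
from itertools import combinations


def brute_force_lrs(s, compatible):
    """Longest text spelled by two index tuples accepted by `compatible`."""
    n = len(s)
    for k in range(n - 1, 0, -1):  # try the longest candidates first
        by_text = {}
        for idx in combinations(range(n), k):
            by_text.setdefault(''.join(s[i] for i in idx), []).append(idx)
        for text, tuples in by_text.items():
            if any(compatible(p, q) for p, q in combinations(tuples, 2)):
                return text
    return ''


def pointwise_distinct(p, q):
    # Usual LRS rule: the t-th characters of the two copies differ in index.
    return all(a != b for a, b in zip(p, q))


def disjoint_indices(p, q):
    # Stricter: no index may be used by both copies at all.
    return set(p).isdisjoint(q)


def disjoint_ranges(p, q):
    # Non-overlapping version: one copy ends before the other begins.
    return p[-1] < q[0] or q[-1] < p[0]


print(brute_force_lrs('AXXXA', pointwise_distinct))  # XX
print(brute_force_lrs('AXXXA', disjoint_indices))    # A
print(brute_force_lrs('AXXXA', disjoint_ranges))     # A
```

Under the usual rule the answer is XX, matching the DP. Under either stricter reading only a single character survives; the search happens to return A first, and X would be equally valid.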