Remove duplicate numbers separated by a symbol in a string using Hive's REGEXP_REPLACE

Question

I have a Spark DataFrame with a string column that contains numbers separated by semicolons, for example: 862;1595;17;862;49;862;19;100;17;49. I would like to remove the duplicated numbers, leaving the following: 862;1595;17;49;19;100

I have tried the following patterns:

"\b(\d+(?:\.\d+)?) ([^;]+); (?=.*\b\1 \2\b)
(?<=\b\1:.*)\b(\w+):?
\b(\w+)\b(?=.*?\b\1\b)
(\b[^,]+)(?=.*, *\1(?:,|$)), *

But nothing has yielded what I need thus far.

Asked By: Cyrus Mohammadian

Answer 1

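Try the following query (to replace duplicate numbers in a string column):

-- Hive's regexp_replace uses Java regular expressions, so named groups
-- (?<num>...) and backreferences \k<num> are available.
-- Inside a Hive string literal the backslash may need doubling (\\k<num>).
SELECT  regexp_replace
        (
            your_column,
            '(?<=^|;)(?<num>.*?);(?=.*(?<=;)\k<num>(?=;|$))',
            ''
        )
FROM your_table;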

Answered By: RomanPerekhrest
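
A note on ordering: the pattern above removes an entry whenever the same number appears again later in the string, so the survivors are the last occurrences. Traced by hand on the sample from the question, that gives 1595;862;19;100;17;49 rather than the first-occurrence order 862;1595;17;49;19;100 that was asked for. The sketch below is not part of the original answer. It uses Python, where the names dedupe_keep_last and dedupe_keep_first are hypothetical, and the regex is a translation made under one assumption: Python's re module only accepts fixed-width lookbehind, so (?<=^|;) is rewritten as (?<![^;]).

```python
import re

# Python translation of the answer's Java-flavoured pattern (an assumption,
# not the original text): (?<=^|;) becomes the fixed-width (?<![^;]), and
# named groups are written (?P<num>...) with the backreference (?P=num).
KEEP_LAST = re.compile(r"(?<![^;])(?P<num>.*?);(?=.*(?<=;)(?P=num)(?=;|$))")


def dedupe_keep_last(value: str) -> str:
    """Drop every entry that occurs again later in the string."""
    return KEEP_LAST.sub("", value)


def dedupe_keep_first(value: str, sep: str = ";") -> str:
    """Keep the first occurrence of each entry, preserving the original order."""
    # dict.fromkeys keeps insertion order (Python 3.7+), so repeated keys
    # collapse onto their first position.
    return sep.join(dict.fromkeys(value.split(sep)))


sample = "862;1595;17;862;49;862;19;100;17;49"
print(dedupe_keep_last(sample))   # 1595;862;19;100;17;49
print(dedupe_keep_first(sample))  # 862;1595;17;49;19;100
```

If first-occurrence order matters, one option in PySpark would be to wrap a function like dedupe_keep_first in a UDF. Spark 2.4 and later also provide split, array_distinct and array_join, which might be combined for the same purpose, though whether array_distinct preserves order should be checked against the Spark version in use.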
