Remove duplicate numbers separated by a symbol in a string using Hive's REGEXP_REPLACE
Question:
I have a Spark DataFrame with a string column that contains numbers separated by semicolons, for example 862;1595;17;862;49;862;19;100;17;49. I would like to remove the duplicated numbers, leaving the following: 862;1595;17;49;19;100
As far as patterns go, I have tried:
"\b(\d+(?:\.\d+)?) ([^;]+); (?=.*\b\1 \2\b)
(?<=b1:.*)b(w+):?
\b(+)\b(?=.*?\b1\b)
(b[^,]+)(?=.*, *1(?:,|$)), *
But nothing has yielded what I need thus far.
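The intended result can be stated without a regular expression: keep the first occurrence of each number and preserve the original order. The Java sketch below is only a reference for that behavior, not a Hive solution. The class and method names are hypothetical. Java is used because Hive and Spark evaluate these patterns with java.util.regex.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DedupTokens {
    // Keeps the first occurrence of each ';'-separated token.
    // LinkedHashSet ignores repeats and preserves insertion order.
    static String dedup(String s) {
        Set<String> seen = new LinkedHashSet<>();
        for (String token : s.split(";")) {
            seen.add(token);
        }
        return String.join(";", seen);
    }

    public static void main(String[] args) {
        // prints 862;1595;17;49;19;100
        System.out.println(dedup("862;1595;17;862;49;862;19;100;17;49"));
    }
}
```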
Answers:
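Try the following query. It deletes a number (together with its trailing ;) whenever the same number appears again later in the string, so the last occurrence of each number is the one kept. For the example row the result is 1595;862;19;100;17;49, which contains the same numbers as the 862;1595;17;49;19;100 shown in the question but in a different order.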
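SELECT regexp_replace(
    your_column,
    '(?<=^|;)(?<num>.*?);(?=.*(?<=;)\\k<num>(?=;|$))',
    ''
)
FROM your_table;
Hive string literals treat the backslash as an escape character, so the named backreference must be written \\k<num>. With a single backslash, \k is read as a plain k and the pattern no longer refers to the captured number.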
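Hive's regexp_replace is backed by java.util.regex, so you can try the pattern outside Hive with Java's String.replaceAll. The sketch below uses hypothetical class and method names and applies the answer's pattern unchanged, which keeps the last occurrence of each number.

It also shows one way to keep the first occurrence instead, which is not part of the original answer: reverse the string, apply the same pattern, and reverse the result back. Reversing the characters also reverses each number's digits. Two equal numbers are still equal after reversal, so the duplicate check still works.

```java
public class RegexDedup {
    // The pattern from the Hive query. In Java source, as in a Hive string
    // literal, the backslash in \k<num> has to be doubled.
    static final String PATTERN =
        "(?<=^|;)(?<num>.*?);(?=.*(?<=;)\\k<num>(?=;|$))";

    // Deletes "n;" whenever the number n occurs again later in the string,
    // so the LAST occurrence of each number survives.
    static String dedupKeepLast(String s) {
        return s.replaceAll(PATTERN, "");
    }

    // Hypothetical variant: reversing the string first turns "occurs later"
    // into "occurs earlier", so the FIRST occurrence survives instead.
    static String dedupKeepFirst(String s) {
        String reversed = new StringBuilder(s).reverse().toString();
        return new StringBuilder(dedupKeepLast(reversed)).reverse().toString();
    }

    public static void main(String[] args) {
        String input = "862;1595;17;862;49;862;19;100;17;49";
        System.out.println(dedupKeepLast(input));  // prints 1595;862;19;100;17;49
        System.out.println(dedupKeepFirst(input)); // prints 862;1595;17;49;19;100
    }
}
```

In HiveQL the same idea would presumably be reverse(regexp_replace(reverse(your_column), <pattern>, '')), using Hive's built-in reverse() string function. Treat that as an untested sketch.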