How to get consecutive repeated characters from a string
Question:
I need to get the substrings in which the same character is repeated consecutively more than once.
This is my code:
l = []
p = 'abbdccc'
for i in range(len(p)-1):
    m = ''
    if p[i] == p[i+1]:
        m += p[i]
        l.append(m)
print(l)
- My string is 'abbdccc'; b and c are repeated more than once.
- The expected output is ['bb', 'ccc'].
If my string is '34456788', then the output should be ['44', '88'].
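The loop above resets m on every iteration and adds only p[i] for each matching pair, so it can never build up a run longer than one character. A minimal sketch of a fix that keeps the same p[i] == p[i+1] comparison, assuming the only goal is the output shown above (the function name repeated_runs is hypothetical):

```python
def repeated_runs(p):
    """Return runs of one repeated character longer than one, scanning like the loop above."""
    result = []
    m = ''  # current run; reset only when a run ends, not on every iteration
    for i in range(len(p) - 1):
        if p[i] == p[i + 1]:
            # extend the run; on its first matching pair, start it with both characters
            m = m + p[i + 1] if m else p[i] * 2
        elif m:
            result.append(m)
            m = ''
    if m:
        # a run that reaches the end of the string is never closed inside the loop
        result.append(m)
    return result

print(repeated_runs('abbdccc'))   # ['bb', 'ccc']
print(repeated_runs('34456788'))  # ['44', '88']
```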
Answers:
If using regex is possible or of interest to you, re.findall offers a very straightforward way to do this:
import re

inp = "abbdccc"
# (.) captures one character and \2 refers back to it, so ((.)\2+) matches a whole run
matches = [x[0] for x in re.findall(r'((.)\2+)', inp)]
print(matches)  # ['bb', 'ccc']
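Because that pattern has two capturing groups, findall returns a tuple per match, which is why x[0] (the outer group, i.e. the whole run) is taken. A sketch of an equivalent using re.finditer, which needs only one group since the whole match is available as group(0) (RUN and runs_regex are hypothetical names). Note that . does not match a newline unless re.DOTALL is passed.

```python
import re

# (.) captures one character; \1 then requires that same character one or more times
RUN = re.compile(r'(.)\1+')

def runs_regex(text):
    # group(0) is the entire match, i.e. the full run of repeated characters
    return [m.group(0) for m in RUN.finditer(text)]

print(runs_regex('abbdccc'))   # ['bb', 'ccc']
print(runs_regex('34456788'))  # ['44', '88']
```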
If you want to avoid regex (not sure why, as Tim’s solution is really elegant):
main_str = 'abbdccc'
patterns = []
current_pattern = None
for i, char in enumerate(main_str):
    # First iteration
    if i == 0:
        continue
    if char == main_str[i-1]:
        current_pattern = current_pattern + char if current_pattern else char*2
    elif current_pattern:
        patterns.append(current_pattern)
        current_pattern = None
    # Last iteration
    if current_pattern and i+1 == len(main_str):
        patterns.append(current_pattern)
print(patterns)  # ['bb', 'ccc']
Explanation of how it works:
First iteration:
The first iteration is skipped, as there is no previous character to compare to.
Following iterations:
If the char is equal to the previous char:
and a pattern already exists, just add the char to the current pattern.
But if no current pattern exists, start one with char*2. This works because the previous char and char are equal, so previous char + char simplifies to char*2.
If the char is not equal to the previous char
and a current pattern exists, add it to patterns and clear the current pattern.
Last iteration:
This check is needed, otherwise the final pattern never gets added to patterns. A pattern is only added when a char differs from the previous char, and there is no char after the last one to trigger that comparison, so a run at the end of the string (like 'ccc' in 'abbdccc') would be lost.
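The first- and last-iteration special cases can also be moved out of the loop: pairing each character with its predecessor via zip removes the i == 0 check, and a single check after the loop adds any run still open when the string ends. A sketch of that variant (runs_loop is a hypothetical name):

```python
def runs_loop(main_str):
    patterns = []
    current_pattern = ''
    # zip yields (previous char, char) pairs, so the first character is never compared alone
    for prev, char in zip(main_str, main_str[1:]):
        if char == prev:
            current_pattern = current_pattern + char if current_pattern else char * 2
        elif current_pattern:
            patterns.append(current_pattern)
            current_pattern = ''
    # flush the run that reaches the end of the string, once, after the loop
    if current_pattern:
        patterns.append(current_pattern)
    return patterns

print(runs_loop('abbdccc'))  # ['bb', 'ccc']
```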
Solution with groupby
from itertools import groupby

s = 'abbdccc'
# the := (assignment expression) operator requires Python 3.8+
print([v for _, g in groupby(s) if (v := ''.join(g)) and len(v) > 1])
Sample runs for input string s:
# input: 'abbdccc'
# output: ['bb', 'ccc']
# input: '34456788'
# output: ['44', '88']
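On Python versions older than 3.8, where := is unavailable, the same groupby idea can be written with a generator expression. This sketch also adds a hypothetical min_len parameter for the minimum run length (runs_groupby is likewise a hypothetical name):

```python
from itertools import groupby

def runs_groupby(s, min_len=2):
    # groupby yields (char, iterator over that char's consecutive occurrences)
    runs = (''.join(group) for _, group in groupby(s))
    return [run for run in runs if len(run) >= min_len]

print(runs_groupby('34456788'))            # ['44', '88']
print(runs_groupby('abbdccc', min_len=3))  # ['ccc']
```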
If you’re interested in more itertools-style tools, there is the third-party more_itertools package, which includes more powerful functions.
Here is one you could try, included here for future reference:
# third-party package: pip install more-itertools
from more_itertools import run_length

s = 'abbdccc'
print(list(run_length.encode(s)))
# [('a', 1), ('b', 2), ('d', 1), ('c', 3)]  # (char, count) tuples

groups = [char * count for char, count in run_length.encode(s) if count > 1]
print(groups)
# ['bb', 'ccc']

# if you're only interested in runs of length >= 3, just change the filter:
groups = [char * count for char, count in run_length.encode(s) if count >= 3]
print(groups)
# ['ccc']
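If adding a dependency is not an option, the (char, count) pairs that run_length.encode produces can be approximated with itertools.groupby from the standard library. A sketch, where rle_encode is a hypothetical stand-in and not part of more_itertools:

```python
from itertools import groupby

def rle_encode(s):
    # count each group of consecutive equal characters without building a list
    return [(char, sum(1 for _ in group)) for char, group in groupby(s)]

pairs = rle_encode('abbdccc')
print(pairs)                                                  # [('a', 1), ('b', 2), ('d', 1), ('c', 3)]
print([char * count for char, count in pairs if count > 1])   # ['bb', 'ccc']
print([char * count for char, count in pairs if count >= 3])  # ['ccc']
```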