How to get consecutive repeated characters from a string

Question

I need to get the substrings in which the same character appears consecutively more than once.

This is my code:

l = []
p = 'abbdccc'
for i in range(len(p)-1):
    m = ''
    if p[i] == p[i+1]:
        m +=p[i]
        l.append(m)
print(l)  # prints ['b', 'c', 'c'], not the expected ['bb', 'ccc']

My string is 'abbdccc'.
b and c are repeated more than once, so the expected output is ['bb', 'ccc'].

If my string is '34456788', the expected output is ['44', '88'].

Asked By: sim

Answer 1

If using regex is possible or of interest to you, re.findall offers a very straightforward way to do this:

import re

inp = "abbdccc"
# (.) captures one character, \2 matches that same character again
matches = [x[0] for x in re.findall(r'((.)\2+)', inp)]
print(matches)  # ['bb', 'ccc']

Answered By: Tim Biegeleisen
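A note on the regex in Answer 1: `(.)` captures any single character, the backreference `\2` matches that same character again, `+` requires at least one more repeat, and the outer group captures the whole run. Because the pattern has two groups, `re.findall` returns `(run, char)` tuples, which is why the answer takes `x[0]`. A sketch using `re.finditer` with a single group avoids the tuple handling; the function name `repeated_runs` and its `min_len` parameter are illustrative assumptions, not part of the original answer.

```python
import re

def repeated_runs(text, min_len=2):
    """Return runs of one character repeated at least min_len times (min_len >= 2)."""
    # (.) captures one character; \1{n,} requires it to repeat at least n more times.
    # re.DOTALL lets '.' also match newline characters.
    pattern = re.compile(r'(.)\1{%d,}' % (min_len - 1), re.DOTALL)
    # group(0) is the full match, i.e. the whole run.
    return [m.group(0) for m in pattern.finditer(text)]

print(repeated_runs('abbdccc'))     # ['bb', 'ccc']
print(repeated_runs('34456788'))    # ['44', '88']
print(repeated_runs('abbdccc', 3))  # ['ccc']
```

With only one capturing group, the backreference becomes `\1` instead of `\2`.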

Answer 2

If you want to avoid regex (not sure why, as Tim’s solution is really elegant):

main_str = 'abbdccc'

patterns = []
current_pattern = None
for i, char in enumerate(main_str):
    # First iteration
    if i == 0:
        continue

    if char == main_str[i-1]:
        current_pattern = current_pattern + char if current_pattern else char*2
    elif current_pattern:
        patterns.append(current_pattern)
        current_pattern = None

    # Last iteration
    if current_pattern and i+1 == len(main_str):
        patterns.append(current_pattern)

print(patterns)

Explanation of how it works:

First iteration:
The first iteration is skipped, as there is no previous character to compare to.

Following iterations:
If char is equal to the previous char and a current pattern already exists, append char to the current pattern. If no current pattern exists yet, start one with char*2. This works because the previous char and char are equal, so previous char + char simplifies to char*2.

If char is not equal to the previous char and a current pattern exists, append the current pattern to patterns and clear it.

Last iteration:
This check is needed, otherwise a pattern that runs to the end of the string would never be added to patterns. A pattern is only appended when a char differs from the previous one, and there is no char after the last one to trigger that comparison.

Answered By: theQuestionMan
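Answer 2 builds each run one character at a time and needs a special check on the last iteration. A hedged alternative sketch of the same left-to-right scan uses two indices: `j` advances while the character stays the same, and the run is then sliced out as `s[i:j]` in one step. Because the inner loop stops at the end of the string, no separate last-iteration check is needed. The name `runs_by_index` and the `min_len` parameter are hypothetical.

```python
def runs_by_index(s, min_len=2):
    """Return substrings where one character repeats at least min_len times in a row."""
    runs = []
    i = 0
    while i < len(s):
        j = i
        # Advance j past every character equal to s[i].
        while j < len(s) and s[j] == s[i]:
            j += 1
        if j - i >= min_len:
            runs.append(s[i:j])  # slice the whole run at once
        i = j  # continue with the first character of the next run
    return runs

print(runs_by_index('abbdccc'))   # ['bb', 'ccc']
print(runs_by_index('34456788'))  # ['44', '88']
```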

Answer 3

Solution with `groupby`

from itertools import groupby

s = 'abbdccc'  # input string
# The := (walrus) operator requires Python 3.8 or later
result = [v for _, g in groupby(s) if (v := ''.join(g)) and len(v) > 1]

Sample run for input string s:

# input: 'abbdccc'
# output: ['bb', 'ccc']

# input: '34456788'
# output: ['44', '88']

Answered By: Shubham Sharma
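The one-liner in Answer 3 relies on the assignment expression `:=`, which was added in Python 3.8. A sketch that avoids it, and so also runs on older Python 3 versions, joins each group in a generator first and filters afterwards; the function name `grouped_runs` and the `min_len` parameter are illustrative assumptions.

```python
from itertools import groupby

def grouped_runs(s, min_len=2):
    """Return runs of a repeated character that are at least min_len long."""
    # groupby yields (char, iterator over that run) for each run of equal characters;
    # each group is joined before groupby advances to the next one.
    runs = (''.join(group) for _, group in groupby(s))
    return [run for run in runs if len(run) >= min_len]

print(grouped_runs('abbdccc'))   # ['bb', 'ccc']
print(grouped_runs('34456788'))  # ['44', '88']
```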

Answer 4

If you’re interested in more itertools-style tools, there is the third-party more_itertools package, which includes many additional functions.

Here is one you could try, included here for future reference:

from more_itertools import run_length

s = 'abbdccc'

print(list(run_length.encode(s)))
# [('a', 1), ('b', 2), ('d', 1), ('c', 3)]   # returns tuples of (char, count)

groups = [ch * count for ch, count in run_length.encode(s) if count > 1]
print(groups)
# ['bb', 'ccc']

# if you're only interested in runs of length >= 3, just change the filter:
groups = [ch * count for ch, count in run_length.encode(s) if count >= 3]
print(groups)
# ['ccc']

Answered By: Daniel Hao
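`more_itertools` is a third-party package (installed with `pip install more-itertools`). If adding a dependency isn't an option, the same `(char, count)` pairs can be produced with the standard library's `itertools.groupby`. The helper name `rle_encode` below is hypothetical, and this is a sketch rather than a drop-in replacement for `run_length.encode` (it returns a list instead of an iterator).

```python
from itertools import groupby

def rle_encode(s):
    """Return a list of (char, count) pairs, one per run of equal characters."""
    # sum(1 for _ in group) counts the run without building a list.
    return [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]

s = 'abbdccc'
pairs = rle_encode(s)
print(pairs)  # [('a', 1), ('b', 2), ('d', 1), ('c', 3)]

# Rebuild only the runs longer than one character.
groups = [ch * count for ch, count in pairs if count > 1]
print(groups)  # ['bb', 'ccc']
```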
