# Count consecutive characters

## Question:

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?

At first, I thought I could do something like:

```
word = '1000'
counter = 0
print range(len(word))
for i in range(len(word) - 1):
while word[i] == word[i + 1]:
counter += 1
print counter * "0"
else:
counter = 1
print counter * "1"
```

So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when `i`

reaches the last value.

In the example above, I would want Python to tell me that 1 repeats 1, and that 0 repeats 3 times. The code above fails, however, because of my `while`

statement.

How could I do this with just built-in functions?

## Answers:

### Totals (without sub-groupings)

```
#!/usr/bin/python3 -B
charseq = 'abbcccdddd'
distros = { c:1 for c in charseq }
for c in range(len(charseq)-1):
if charseq[c] == charseq[c+1]:
distros[charseq[c]] += 1
print(distros)
```

I’ll provide a brief explanation for the interesting lines.

```
distros = { c:1 for c in charseq }
```

The line above is a dictionary comprehension, and it basically iterates over the characters in `charseq`

and creates a key/value pair for a dictionary where the key is the character and the value is the number of times it has been encountered so far.

Then comes the loop:

```
for c in range(len(charseq)-1):
```

We go from `0`

to `length - 1`

to avoid going out of bounds with the `c+1`

indexing in the loop’s body.

```
if charseq[c] == charseq[c+1]:
distros[charseq[c]] += 1
```

At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):

```
# replacing vars for their values
if charseq[1] == charseq[1+1]:
distros[charseq[1]] += 1
# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
distros['b'] += 1
```

You can see the program output below with the correct counts:

```
➜ /tmp ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}
```

You only need to change `len(word)`

to `len(word) - 1`

. That said, you could also use the fact that `False`

‘s value is 0 and `True`

‘s value is 1 with `sum`

:

```
sum(word[i] == word[i+1] for i in range(len(word)-1))
```

This produces the sum of `(False, True, True, False)`

where `False`

is 0 and `True`

is 1 – which is what you’re after.

If you want this to be safe you need to guard empty words (index -1 access):

```
sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))
```

And this can be improved with `zip`

:

```
sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
```

A solution “that way”, with only basic statements:

```
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
for i in range(1,len(word)):
if word[i-1]==word[i]:
count+=1
else :
length += word[i-1]+" repeats "+str(count)+", "
count=1
length += ("and "+word[i]+" repeats "+str(count))
else:
i=0
length += ("and "+word[i]+" repeats "+str(count))
print (length)
```

Output :

```
'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'
```

## Consecutive counts:

You can use `itertools.groupby`

:

```
s = "111000222334455555"
from itertools import groupby
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
```

After which, `result`

looks like:

```
[("1": 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
```

And you could format with something like:

```
", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
```

## Total counts:

Someone in the comments is concerned that you want a *total* count of numbers so `"11100111" -> {"1":6, "0":2}`

. In that case you want to use a `collections.Counter`

:

```
from collections import Counter
s = "11100111"
result = Counter(s)
# {"1":6, "0":2}
```

## Your method:

As many have pointed out, your method fails because you’re looping through `range(len(s))`

but addressing `s[i+1]`

. This leads to an off-by-one error when `i`

is pointing at the last index of `s`

, so `i+1`

raises an `IndexError`

. One way to fix this would be to loop through `range(len(s)-1)`

, but it’s more pythonic to generate something to iterate over.

For string that’s not absolutely huge, `zip(s, s[1:])`

isn’t a a performance issue, so you could do:

```
counts = []
count = 1
for a, b in zip(s, s[1:]):
if a==b:
count += 1
else:
counts.append((a, count))
count = 1
```

The only problem being that you’ll have to special-case the last character if it’s unique. That can be fixed with `itertools.zip_longest`

```
import itertools
counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
if a==b:
count += 1
else:
counts.append((a, count))
count = 1
```

If you do have a truly *huge* string and can’t stand to hold two of them in memory at a time, you can use the `itertools`

recipe `pairwise`

.

```
def pairwise(iterable):
"""iterates pairwise without holding an extra copy of iterable in memory"""
a, b = itertools.tee(iterable)
next(b, None)
return itertools.zip_longest(a, b, fillvalue=None)
counts = []
count = 1
for a, b in pairwise(s):
...
```

This is my simple code for finding maximum number of consecutive 1’s in binaray string in python 3:

```
count= 0
maxcount = 0
for i in str(bin(13)):
if i == '1':
count +=1
elif count > maxcount:
maxcount = count;
count = 0
else:
count = 0
if count > maxcount: maxcount = count
maxcount
```

If we want to count **consecutive** characters **without looping**, we can make use of `pandas`

:

```
In [1]: import pandas as pd
In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))
In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]
```

The key is to find **the first elements** that are different from their previous values and then make proper groupings in `pandas`

:

```
In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))
In [7]: d.ne(d.shift())
Out[7]:
0 True
1 True
2 False
3 True
dtype: bool
In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0 1
1 2
2 2
3 3
dtype: int32
```

There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indicies.

```
w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # digits
['1', '0', '2', '3', '4']
print(cw) # counts
[3, 3, 3, 2, 2, 5]
w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # characters
print(cw) # digits
['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
```

A one liner that returns the amount of consecutive characters with no imports:

```
def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}
```

That returns the amount of times any repeated character in a list is in a consecutive run of characters.

alternatively, this accomplishes the same thing, albeit much slower:

```
def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}
```

In terms of performance, I ran them with

```
site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))
```

which resulted in:

```
12.528256356999918
5.351301653001428
```

This method can definitely be improved, but without using any external libraries, this was the best I could come up with.

In python

```
your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
current = loop
count = count + 1
if index == len(your_string)-1:
print(f"{count}{current}", end ='')
break
if your_string[index+1] != current:
print(f"{count}{current}",end ='')
count = 0
continue
```

This will output

```
5w1e4a2w4b1n
```

```
#I wrote the code using simple loops and if statement
s='feeekksssh' #len(s) =11
count=1 #f:0, e:3, j:2, s:3 h:1
l=[]
for i in range(1,len(s)): #range(1,10)
if s[i-1]==s[i]:
count = count+1
else:
l.append(count)
count=1
if i == len(s)-1: #To check the last character sequence we need loop reverse order
reverse_count=1
for i in range(-1,-(len(s)),-1): #Lopping only for last character
if s[i] == s[i-1]:
reverse_count = reverse_count+1
else:
l.append(reverse_count)
break
print(l)
```

Today I had an interview and was asked the same question. I was struggling with the original solution in mind:

```
s = 'abbcccda'
old = ''
cnt = 0
res = ''
for c in s:
cnt += 1
if old != c:
res += f'{old}{cnt}'
old = c
cnt = 0 # default 0 or 1 neither work
print(res)
# 1a1b2c3d1
```

Sadly this solution always got unexpected edge cases result(is there anyone to fix the code? maybe i need post another question), and finally timeout the interview.

After the interview I calmed down and soon got a stable solution I think(though I like the groupby best).

```
s = 'abbcccda'
olds = []
for c in s:
if olds and c in olds[-1]:
olds[-1].append(c)
else:
olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)
# [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
# a1b2c3d1a1
```

Here is my simple solution:

```
def count_chars(s):
size = len(s)
count = 1
op = ''
for i in range(1, size):
if s[i] == s[i-1]:
count += 1
else:
op += "{}{}".format(count, s[i-1])
count = 1
if size:
op += "{}{}".format(count, s[size-1])
return op
```

```
data_input = 'aabaaaabbaaaaax'
start = 0
end = 0
temp_dict = dict()
while start < len(data_input):
if data_input[start] == data_input[end]:
end = end + 1
if end == len(data_input):
value = data_input[start:end]
temp_dict[value] = len(value)
break
if data_input[start] != data_input[end]:
value = data_input[start:end]
temp_dict[value] = len(value)
start = end
print(temp_dict)
```

PROBLEM: we need to count consecutive characters and return characters with their count.

```
def countWithString(input_string:str)-> str:
count = 1
output = ''
for i in range(1,len(input_string)):
if input_string[i]==input_string[i-1]:
count +=1
else:
output += f"{count}{input_string[i-1]}"
count = 1
# Used to add last string count (at last else condition will not run and data will not be inserted to ouput string)
output += f"{count}{input_string[-1]}"
return output
```

`countWithString(input)`

`input:'aaabbbaabbcc'`

`output:'3a3b2a2b2c'`

`Time Complexity: O(n)`

`Space Complexity: O(1)`

```
[enter image description here][1]
```

```
temp_str = "aaaajjbbbeeeeewwjjj"
def consecutive_charcounter(input_str):
counter = 0
temp_list = []
for i in range(len(input_str)):
if i==0:
counter+=1
elif input_str[i]== input_str[i-1]:
counter+=1
if i == len(input_str)-1:
temp_list.extend([input_str[i - 1], str(counter)])
else:
temp_list.extend([input_str[i-1],str(counter)])
counter = 1
print("".join(temp_list))
```

consecutive_charcounter(temp_str)