Find frequency of strings without using a loop

Question:

I have a list of words, and I want to print each word together with its total frequency, once for every time the word occurs (it doesn't matter that a word is printed multiple times).

PLEASE NOTE: I don't want to use any loop, such as a 'for' loop, in the solution, but I still want to arrive at the same result.

For example, I have words as follows:

'abc'
'abc'
'abc'
'abc'
'xyz'
'xyz'
'tuf'
'pol'
'pol'
'pol'
'pol'
'pol'
'pol'

and I need this output:

'abc', 4
'abc', 4
'abc', 4
'abc', 4
'xyz', 2
'xyz', 2
'tuf', 1
'pol', 6
'pol', 6
'pol', 6
'pol', 6
'pol', 6
'pol', 6

I am using Python 3. I tried the following code, but it doesn't work as expected:

curr_tk = None
tk = None
count = 0

for items in data:
    line = items.strip()
    file = line.split(",")
    tk = file[0]

    if curr_tk == tk:
        count += 1
    else:
        if curr_tk:
            print('%s , %s' % (curr_tk, count))
        count = 1
        curr_tk = tk

# print last word
if curr_tk == tk:
    print('%s , %s' % (curr_tk, count))

The above code gives me this output instead, with each word printed only once:

'abc', 4
'xyz', 2
'tuf', 1
'pol', 6
Asked By: nkah


Answers:

I think I understand what you want: each string should be printed once per occurrence (for example, 'abc', 4 printed four times), but you don't want to do the repetition with a for loop. I'm not sure why you want that restriction, though.

One approach is to collect the output in a string buffer. The code below shows two ways to do this, selected by the boolean first_way.

curr_tk = None
tk = None
count = 0

first_way = True
base_buffer = '{tk} , {count}\n'  # one output line; '\n' ends the line
output_buffer = ''
for items in data:
    line = items.strip()
    file = line.split(',')
    tk = file[0]

    if curr_tk == tk:
        count += 1
        if first_way:
            output_buffer += base_buffer  # grow the buffer by one line per repeat
    else:
        if curr_tk:
            if not first_way:  # use operator '*' to copy the str
                # I guess the underlying implementation is also a loop;
                # not sure whether this violates the requirement
                output_buffer = base_buffer * count
            print(output_buffer.format(tk=curr_tk, count=count), end='')
        count = 1
        curr_tk = tk
        if first_way:
            output_buffer = base_buffer

# print the last word group
if curr_tk:
    if not first_way:
        output_buffer = base_buffer * count
    print(output_buffer.format(tk=curr_tk, count=count), end='')

Given data = ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'tuf'], you will get the output:

abc , 4
abc , 4
abc , 4
abc , 4
xyz , 2
xyz , 2
tuf , 1
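For illustration, the manual tracking of consecutive equal words above is what Python's itertools.groupby does, and the repetition can use string multiplication as in the second way (first_way = False). The sketch below is an assumption-based variant rather than this answer's code: like the original, it only groups adjacent equal words, and the helper name repeat_with_counts is hypothetical. No `for` statement appears in it, although map and join still iterate internally.

```python
from itertools import groupby


def repeat_with_counts(words):
    """Return one 'word , count' line per input word (hypothetical helper).

    groupby() groups only *consecutive* equal items, matching the
    run-tracking logic above; equal words that are not adjacent end up
    in separate groups.
    """
    def render_group(group):
        word, run = group
        n = len(list(run))              # size of this run of equal words
        return f'{word} , {n}\n' * n    # repeat the line n times via str '*'

    return ''.join(map(render_group, groupby(words)))


data = ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'tuf']
print(repeat_with_counts(data), end='')
```

This prints the same seven lines as the output shown above.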
Answered By: ILS

Some form of loop is unavoidable. But if you prefer not to see it, you can use pandas and let the library do the iteration internally:

import pandas as pd

words = ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'tuf', 'pol', 'pol', 'pol', 'pol', 'pol', 'pol']

df = pd.DataFrame(words, columns=['words'])
# count each distinct word; the result is a Series indexed by word
counts = df['words'].value_counts().rename('counts')
# attach each word's total to every row that contains that word
print(df.join(counts, on='words'))

output:

   words  counts
0    abc       4
1    abc       4
2    abc       4
3    abc       4
4    xyz       2
5    xyz       2
6    tuf       1
7    pol       6
8    pol       6
9    pol       6
10   pol       6
11   pol       6
12   pol       6
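The same "let the library loop" idea also works with only the standard library. In the sketch below, collections.Counter tallies every word in one call, and map pairs each word with its total, so the result keeps the original order and repeats. Unlike the consecutive-grouping code in the question, it does not require equal words to be adjacent. The helper name words_with_counts is hypothetical.

```python
from collections import Counter


def words_with_counts(words):
    """Pair every word with its total frequency (hypothetical helper)."""
    totals = Counter(words)  # word -> number of occurrences
    # keep input order and repeats; the iteration happens inside map()
    return list(map(lambda w: (w, totals[w]), words))


words = ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'tuf',
         'pol', 'pol', 'pol', 'pol', 'pol', 'pol']
print('\n'.join(map(lambda pair: f"'{pair[0]}', {pair[1]}", words_with_counts(words))))
```

The printed lines use the format from the question, for example 'abc', 4.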
Answered By: Mohammad Tehrani

I don't know if this will help, but if you really don't want to use a loop, then don't use Python at all; use SQL instead.
Here's the code:

DECLARE @phrases TABLE (id int, phrase varchar(max))
INSERT @phrases VALUES
(1, 'Red and White'),
(2, 'green'),
(3, 'White and blue'),
(4, 'Blue'),
(5, 'Dark blue');

SELECT word, COUNT(*) c
FROM @phrases
CROSS APPLY (SELECT CAST('<a>' + REPLACE(phrase, ' ', '</a><a>') + '</a>' AS xml) xml1) t1
CROSS APPLY (SELECT n.value('.', 'varchar(max)') AS word FROM xml1.nodes('a') x(n)) t2
GROUP BY word
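The query turns each phrase into an XML fragment with one <a> element per word, then groups and counts the words, so it returns one row per distinct word rather than one row per occurrence. For comparison, the sketch below computes the same counts in Python from the same sample phrases. It lowercases the words on the assumption that the database compares strings case-insensitively, which is common for SQL Server default collations but not guaranteed; drop .lower() for a case-sensitive collation.

```python
from collections import Counter

phrases = ['Red and White', 'green', 'White and blue', 'Blue', 'Dark blue']

# Split every phrase on single spaces, like REPLACE(phrase, ' ', '</a><a>') above.
# Lowercasing mimics an assumed case-insensitive collation.
words = ' '.join(phrases).lower().split(' ')

# word -> occurrences, like GROUP BY word with COUNT(*)
counts = Counter(words)
print(sorted(counts.items()))
```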

Answered By: Babulo