Using only a for loop and if statement (no built-in functions), group the similar values in a column and add the corresponding values in another column

Question

I have the following dataframe, df (this is a demo; the actual one is very big):

Text	Score
‘I love pizza!’	2
‘I love pizza!’	1
‘I love pizza!’	3
‘Python rules!’	0
‘Python rules!’	5

I want to group the rows by their ‘Text’ value and, for each group, add up the corresponding ‘Score’ values.
The output I want is:

Text	Score	Sum
‘I love pizza!’	2	6
‘I love pizza!’	1	6
‘I love pizza!’	3	6
‘Python rules!’	0	5
‘Python rules!’	5	5

I know how to get the desired output using Python/Pandas groupby and sum() (and aggregate) methods, for instance,

df1 = df.groupby('Text')['Score'].sum().reset_index(name='Sum')
df3 = df.merge(df1, on='Text', how='left')

However, I do not want to use any such built-in functions. I want to use only simple for loops and if statements to accomplish this.

I tried doing this the following way:

def func(df):
    # NOTE, CANNOT USE LIST APPEND (as it is an in-built function).
    sum = 0
    n = len(df['Text']) # NEED TO WORK FOR-LOOP USING INTEGERS AND HENCE NEED LENGTH

    for i in range(0,n):
        exists = False  #flag to track repeated values

        for j in range(i+1,n):
            if df['Text'][i] == df['Text'][j]: # IF TRUE, THEN THE 'TEXT' ROWS ARE SIMILAR I.E. GROUPED
                exists = True
                sum = df['Score'][i] + df['Score'][j]

                break
        
        if not exists:
            sum += sum

    return sum

df['Sum'] = func(df)

The output for this script is incorrect:

Text	Score	Sum
‘I love pizza!’	2	10
‘I love pizza!’	1	10
‘I love pizza!’	3	10
‘Python rules!’	0	10
‘Python rules!’	5	10
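
For reference, the value 10 can be reproduced without pandas. The sketch below copies the logic of func above onto a plain dict of lists; `data` and `attempt` are hypothetical names introduced only for this illustration, and the dict is an assumed stand-in for the DataFrame. It suggests two separate problems: the function returns a single number rather than one value per row (so assigning it to df['Sum'] repeats that number in every row), and the running value is overwritten or doubled instead of accumulated per group.

```python
# Plain dict of lists standing in for the DataFrame (an assumption made so the
# sketch runs without pandas); data["Text"][i] indexes like df["Text"][i].
data = {
    "Text": ["I love pizza!", "I love pizza!", "I love pizza!",
             "Python rules!", "Python rules!"],
    "Score": [2, 1, 3, 0, 5],
}


def attempt(table):
    """Same logic as func above; returns one number, not one value per row."""
    total = 0
    n = len(table["Text"])
    for i in range(n):
        exists = False
        for j in range(i + 1, n):
            if table["Text"][i] == table["Text"][j]:
                exists = True
                # Overwrites the running value with just two scores
                total = table["Score"][i] + table["Score"][j]
                break
        if not exists:
            # Doubles the current value instead of adding a new score
            total += total
    return total


result = attempt(data)
print(result)  # prints 10
```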

I have tried playing around with the above script and get different results, but never the correct one. Any help with this is greatly appreciated!
Thank you so much in advance!

Asked By: GaussEuler

Answer 1

Here is a script that produces the correct output for the question above:

def func(df):
    final_result = []
    n = len(df['Text'])
    # One flag per row; 1 marks a row already counted as part of an earlier group
    flags = [0] * n
    result = 0
    for k in range(0, n):
        total = df['Score'][k]  # named total so the built-in sum is not shadowed
        for i in range(0, n):
            # Skip rows that have already been flagged (processed, counted)
            if flags[i]:
                continue
            if i == k:
                for j in range(i + 1, n):
                    if df['Text'][i] == df['Text'][j]:  # same 'Text', so same group
                        # Flag every match so it is not counted again
                        flags[j] = 1
                        total += df['Score'][j]
                result = total
                break
        # A flagged row k never reaches the block above, so it reuses the
        # previous group's result
        final_result += [result]

    return final_result


df['Sum'] = func(df)
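
One caveat: for rows that have already been flagged, the function above reuses the most recently computed group sum, so it appears to rely on rows with the same ‘Text’ sitting next to each other, as they do in the demo data. If that cannot be assumed, a minimal sketch of an order-independent variant is shown below. The names `group_sums`, `demo` and `shuffled` are hypothetical, and a plain dict of lists stands in for the DataFrame so the sketch runs without pandas. Like the script above, it still uses len and range.

```python
def group_sums(table):
    """For each row, sum Score over every row that has the same Text."""
    n = len(table["Text"])
    sums = [0] * n
    for i in range(n):
        total = 0
        # Compare row i against every row, including itself
        for j in range(n):
            if table["Text"][i] == table["Text"][j]:
                total += table["Score"][j]
        sums[i] = total
    return sums


# Demo data from the question (assumed dict stand-in for the DataFrame)
demo = {
    "Text": ["I love pizza!", "I love pizza!", "I love pizza!",
             "Python rules!", "Python rules!"],
    "Score": [2, 1, 3, 0, 5],
}
# Hypothetical data where equal Text values are not adjacent
shuffled = {"Text": ["a", "b", "a"], "Score": [1, 10, 2]}

demo_sums = group_sums(demo)
shuffled_sums = group_sums(shuffled)
print(demo_sums)      # prints [6, 6, 6, 5, 5]
print(shuffled_sums)  # prints [3, 10, 3]
```

Comparing every row with every other row takes time proportional to the square of the number of rows, which may be slow on a very large dataframe. Assuming the dataframe has the default integer index, the same function should also accept it directly, e.g. df['Sum'] = group_sums(df).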

Answered By: GaussEuler
