How do I count the times the word true was printed in my csv file?

Question

The output that is generated is a column of true or false, I need to count how many times the word true was printed every 10 rows
In the code that I present, it reads csv files that are in one folder and prints them in another. In each of these csv’s it contains two columns that were chosen when the dataframe was defined. In addition, two columns were added which, through the how_many_times function, count how many times the value meets the condition that I give it.

Example of my csv(original df has more rows):

In [1]: dff = pd.DataFrame([['20220901-00:00:00', 50.0335,False,True], ['20220901-00:00:01', 50.024,False,False], ['20220901-00:00:02', 50.021,False,False]], columns=['t', 'f','f<49.975','f>50.025'])

This is my code (I used .sum but it didn’t work for what I needed):

import pandas as pd   
import numpy as np       
import glob   
import os  
all_files = glob.glob("C:/Users/Gamer/Documents/Colbun/Saturn/*.csv")   


file_list = []   
for i,f in enumerate(all_files):   
    df = pd.read_csv(f,header=0,usecols=["t","f"])
    how_many_times1= df.apply(lambda x: x['f'] < 49.975, axis=1).sum
    df['f<49.975']=how_many_times1
    how_many_times2= df.apply(lambda x: x['f'] > 50.025, axis=1).sum
    df['f>50.025']=how_many_times2
    df.to_csv(f'C:/Users/Gamer/Documents/Colbun/Saturn2/{os.path.basename(f).split(".")[0]}_ext.csv')

Asked By: Isaac Rojas Oyarzun

||

Source

Answer 1

You apply the .sum() method directly to a Pandas Series being a DataFrame column ( how_many_times1.sum() ). And because True is equivalent to 1 and False to 0 you can directly count the entries without applying a condition.
This makes sense in case you need the total sum. In case you need the sum periodically each ten rows it makes sense to count the True values in the to the column applied function.
The code below defines two ‘apply’ functions which do the job of creating the right entries for the columns and printing the sum each ten rows.

See the code below for how it is done in detail:

import pandas as pd   
df = pd.DataFrame([['20220901-00:00:00', 50.0335], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ]], 
                    columns=['t',          'f'  ])
columns = ['dummy', 'f<49.975','f>50.025']
print(df)

row1 = 1
sum1 = 0
lsm1 = []
def cond1(x):
    global row1, sum1
    cond = False
    if x < 49.975:
       cond=True
       sum1+=1
    if row1%10==0:
        print('sum1:', sum1, 'at row:', row1)
        lsm1.append(sum1)
        sum1=0 # outcomment if cummulative sum required
    else: 
        lsm1.append(None)
    row1 += 1
    return cond     
row2 = 1
sum2 = 0
lsm2 = []
def cond2(x):
    global row2, sum2
    cond = False
    if x > 50.025:
       cond=True
       sum2+=1
    if row2%10==0:
        print('sum2:', sum2, 'at row:', row2)
        lsm2.append(sum2)
        sum2=0 # outcomment if cummulative sum required
    else: 
        lsm2.append(None)
    row2 += 1
    return cond

how_many_times1 = df['f'].apply(cond1)
df[columns[1]]  = how_many_times1
df['sum1']      = lsm1
how_many_times2 = df['f'].apply(cond2)
df[columns[2]]  =how_many_times2
df['sum2']      = lsm2
print(df)

prints

                    t        f
0   20220901-00:00:00  50.0335
1   20220901-00:00:01  50.1000
2   20220901-00:00:02  48.0210
3   20220901-00:00:01  50.1000
4   20220901-00:00:01  50.1000
5   20220901-00:00:02  48.0210
6   20220901-00:00:01  50.1000
7   20220901-00:00:01  50.1000
8   20220901-00:00:02  48.0210
9   20220901-00:00:01  50.1000
10  20220901-00:00:02  48.0210
11  20220901-00:00:01  50.1000
12  20220901-00:00:01  50.1000
13  20220901-00:00:01  50.1000
sum1: 3 at row: 10
sum2: 7 at row: 10
                    t        f  f<49.975  sum1  f>50.025  sum2
0   20220901-00:00:00  50.0335     False   NaN      True   NaN
1   20220901-00:00:01  50.1000     False   NaN      True   NaN
2   20220901-00:00:02  48.0210      True   NaN     False   NaN
3   20220901-00:00:01  50.1000     False   NaN      True   NaN
4   20220901-00:00:01  50.1000     False   NaN      True   NaN
5   20220901-00:00:02  48.0210      True   NaN     False   NaN
6   20220901-00:00:01  50.1000     False   NaN      True   NaN
7   20220901-00:00:01  50.1000     False   NaN      True   NaN
8   20220901-00:00:02  48.0210      True   NaN     False   NaN
9   20220901-00:00:01  50.1000     False   3.0      True   7.0
10  20220901-00:00:02  48.0210      True   NaN     False   NaN
11  20220901-00:00:01  50.1000     False   NaN      True   NaN
12  20220901-00:00:01  50.1000     False   NaN      True   NaN
13  20220901-00:00:01  50.1000     False   NaN      True   NaN

Answered By: Claudio

Answer 2

Another option would be:

df[["f<49.975", "f>50.025"]] = (
    df.assign(f1=df["f"].lt(49.975), f2=df["f"].gt(50.025))
    .groupby(df.index // 10)[["f1", "f2"]].transform("sum")
    .loc[df.index % 10 == 9]
)

Add two columns f1, f2 to df, defined by the two conditions.
Now group every 10 rows and sum over the two new columns to get the truth-count per block. Use .transform to do that to keep the original index.
Then take only every tenth row and assign the result to the two new columns.

Answered By: Timus

How do I count the times the word true was printed in my csv file?

Question:

Answers: