Python: How to print only values that appear at least n times

Question:

I am currently working with a .csv file in my Python code. It has over 1 million rows, and I want to print only the values (timestamps) that appear at least 10 times in the data. This is my current code:

import csv
with open('cut2.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    time_stamp = 't'

    for row, time_stamp in data:
        if time_stamp >= 10:
            print('object found at {}'.format(time_stamp))

The data in my .csv file has the format x,y,p,t. Here is a short snippet of the file:

x,y,p,t
1050,397,1,31531
1074,397,1,31531
1025,398,1,31531
1026,398,1,31531
1048,398,1,31531
1052,398,1,31531
1067,398,1,31531
1084,398,1,31531
1011,399,1,31532
1018,407,1,31532
1024,407,1,31532
1033,407,1,31532
1042,407,1,31532
1054,407,1,31532
1058,407,1,31532
1061,407,1,31532
1077,407,1,31532
1030,406,1,31532
1033,406,1,31532
1044,406,1,31532
1056,406,1,31532
1058,406,1,31532
1063,406,1,31532
1087,406,1,31532
1094,406,1,31532
1036,405,1,31532
1050,405,1,31532
1069,405,1,31532
1079,405,1,31532
1098,405,1,31532

I get this error message:

    for row, time_stamp in data:
ValueError: too many values to unpack (expected 2)

Does anyone know how to fix this? Any help would be appreciated. Thanks.

Asked By: poohbear119


Answers:

You can use pandas to make matching and filtering easier:

import pandas as pd

df = pd.read_csv('cut2.csv')

# Boolean Series indexed by timestamp: True where the timestamp occurs at least 10 times
repeated_timestamps = df.groupby('t').size() >= 10
timestamps_of_interest = [ts for ts, is_repeated in repeated_timestamps.items() if is_repeated]

for ts in timestamps_of_interest:
    print(f'Timestamp: {ts}')
    for _, row in df[df['t'] == ts].iterrows():  # iterrows() yields (index, row) pairs
        print(f'x: {row["x"]}, y: {row["y"]}, p: {row["p"]}')

Output (with your sample data slightly modified so that some rows have p = 1 and others p = 0; only 31532 is listed because 31531 appears just 8 times):

Timestamp: 31532
x: 1011, y: 399, p: 1
x: 1018, y: 407, p: 1
x: 1024, y: 407, p: 1
x: 1033, y: 407, p: 1
x: 1042, y: 407, p: 1
x: 1054, y: 407, p: 1
x: 1058, y: 407, p: 0
x: 1061, y: 407, p: 0
x: 1077, y: 407, p: 0
x: 1030, y: 406, p: 0
x: 1033, y: 406, p: 0
x: 1044, y: 406, p: 0
x: 1056, y: 406, p: 0
x: 1058, y: 406, p: 0
x: 1063, y: 406, p: 0
x: 1087, y: 406, p: 0
x: 1094, y: 406, p: 0
x: 1036, y: 405, p: 0
x: 1050, y: 405, p: 0
x: 1069, y: 405, p: 0
x: 1079, y: 405, p: 0
x: 1098, y: 405, p: 0
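If pandas is not available, the same two steps (count the rows per timestamp, then print the rows of every timestamp that reaches the threshold) can be sketched with the standard library alone. This is a rough equivalent, not part of the original answer: the helper name `rows_by_frequent_timestamp` is hypothetical, the sample is read from an in-memory string via `io.StringIO` so the example is self-contained, and the threshold is lowered to 3 because the sample only has five rows. Like the DataFrame, it keeps all rows in memory.

```python
import csv
import io
from collections import defaultdict

# A few rows from the question, in its x,y,p,t format (stand-in for cut2.csv).
SAMPLE = """x,y,p,t
1050,397,1,31531
1074,397,1,31531
1011,399,1,31532
1018,407,1,31532
1024,407,1,31532
"""

def rows_by_frequent_timestamp(csvfile, min_count):
    """Group rows by their 't' column and keep groups with at least min_count rows."""
    groups = defaultdict(list)  # timestamp -> list of row dicts, in file order
    for row in csv.DictReader(csvfile):
        groups[row['t']].append(row)
    return {ts: rows for ts, rows in groups.items() if len(rows) >= min_count}

frequent = rows_by_frequent_timestamp(io.StringIO(SAMPLE), min_count=3)
for ts, rows in frequent.items():
    print(f'Timestamp: {ts}')
    for row in rows:
        print(f'x: {row["x"]}, y: {row["y"]}, p: {row["p"]}')
```

With the real file you would call `rows_by_frequent_timestamp(csvfile, 10)` inside the `with open('cut2.csv', newline='') as csvfile:` block. Note that `csv.DictReader` returns every field as a string, so the keys of the result are strings such as `'31532'`.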

You can, of course, change the print format (or whatever you do with the data afterwards).

If you want to apply additional filters (after the occurrences of each timestamp have been counted), you can filter the DataFrame before the loop:

filtered_df = df[df['p'] == 1]  # Filtering once, before the loop, is more efficient
for ts in timestamps_of_interest:
    print(f'Timestamp: {ts}')
    for _, row in filtered_df[filtered_df['t'] == ts].iterrows():
        print(f'x: {row["x"]}, y: {row["y"]}, p: {row["p"]}')

The output now shows only the rows with p == 1:

Timestamp: 31532
x: 1011, y: 399, p: 1
x: 1018, y: 407, p: 1
x: 1024, y: 407, p: 1
x: 1033, y: 407, p: 1
x: 1042, y: 407, p: 1
x: 1054, y: 407, p: 1
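The phrase "after the number of timestamps was counted" matters: counting first and then dropping rows with p != 1 can select different timestamps than dropping those rows first and then counting. A small stdlib sketch comparing both orders, using made-up (t, p) pairs (hypothetical data, not from the question) and a threshold of 3:

```python
from collections import Counter

# Hypothetical (t, p) pairs; only t and p matter for this comparison.
rows = [
    (100, 1), (100, 1), (100, 0),  # t=100: 3 rows in total, 2 with p == 1
    (200, 1), (200, 1), (200, 1),  # t=200: 3 rows in total, 3 with p == 1
]
THRESHOLD = 3

# Count first, then filter: the threshold applies to all rows of a timestamp.
all_counts = Counter(t for t, _ in rows)
count_then_filter = sorted({t for t, p in rows if all_counts[t] >= THRESHOLD and p == 1})

# Filter first, then count: only rows with p == 1 contribute to the count.
p1_counts = Counter(t for t, p in rows if p == 1)
filter_then_count = sorted(t for t, c in p1_counts.items() if c >= THRESHOLD)

print(count_then_filter)  # [100, 200]
print(filter_then_count)  # [200]
```

In the pandas snippet above, `timestamps_of_interest` is computed from the unfiltered `df`, so it follows the first order; a timestamp whose rows all have p != 1 would still print its `Timestamp:` header with no rows under it.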
Answered By: nonDucor

There is a bunch of rewriting needed to make your code work the way you want. Most importantly, Python will not magically group your data based on some condition; you need to tell it how to do this. The easiest way is to use collections.Counter, which you increment once for each row with a given timestamp.

import csv
from collections import Counter

with open('cut2.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    counter = Counter()
    for row_dict in data:
        counter[row_dict['t']] += 1  # one more occurrence of this timestamp
    # Report every timestamp seen at least 10 times
    for time_stamp, occurrences in counter.items():
        if occurrences >= 10:
            print('object found at {}'.format(time_stamp))
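As for the error in the question: iterating over a `csv.DictReader` yields one dict per row, so `for row, time_stamp in data` tries to unpack each row dict into two names. Unpacking a dict iterates over its keys, and with four columns (x, y, p, t) there are four keys, hence `too many values to unpack`. Even without that, `time_stamp >= 10` would compare the timestamp itself (a string) with 10, rather than how often it occurs. A minimal sketch generalizing the Counter approach to an arbitrary n; the function name `timestamps_seen_at_least` and the in-memory sample are illustrative assumptions:

```python
import csv
import io
from collections import Counter

def timestamps_seen_at_least(csvfile, n, column='t'):
    """Return values of `column` that occur at least n times, in first-seen order."""
    # Counter accepts any iterable; the generator reads one row at a time,
    # so only the counts (not the rows) are kept in memory.
    counts = Counter(row[column] for row in csv.DictReader(csvfile))
    return [ts for ts, count in counts.items() if count >= n]

# Reproduce the question's error: each row is a dict with four keys, not two values.
row = {'x': '1050', 'y': '397', 'p': '1', 't': '31531'}
try:
    first, second = row
except ValueError as exc:
    message = str(exc)
    print(message)

sample = io.StringIO("x,y,p,t\n1,1,1,5\n2,2,1,5\n3,3,1,7\n")
print(timestamps_seen_at_least(sample, 2))  # ['5']
```

With the real data this would be called as `timestamps_seen_at_least(csvfile, 10)` inside the `with open('cut2.csv', newline='') as csvfile:` block.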
Answered By: matszwecja