Read file.csv (two columns, x and y), then calculate the cumulative moving average of the second column

Question:

I wanted to read my CSV file first.
https://github.com/hamzaal014/file/blob/main/file.csv

The .csv file contains two columns, X and Y.
Here is my script:

import numpy as np
from pandas import DataFrame as df
import csv

origin_data = open("file.csv", "r")
dato = list(csv.reader(origin_data, delimiter=","))
print(dato)

rowcount  = 0
#iterating through the whole file
for row in dato:
  rowcount+= 1
#printing the result
#_ print("Number of lines present:-", rowcount)
print(rowcount)

dati = df(dato, columns=['x', 'y'])

window = 6
roll_avg = dati.rolling(window).mean()

roll_avg_cumulative = dati['y'].cumsum()/np.arange(1, 25)
print(roll_avg_cumulative)

But my script is not working:

Error:

Traceback (most recent call last):
  File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/ops/array_ops.py", line 163, in _na_arithmetic_op
    result = func(left, right)
  File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 239, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
  File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 128, in _evaluate_numexpr
    result = _evaluate_standard(op, op_str, a, b)
  File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 69, in _evaluate_standard
    return op(a, b)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
Asked By: al ahmed


Answers:

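csv.reader returns every field as a string. That is the source of your problem: the strings are never converted into numbers, so dividing them by the integers from np.arange raises the TypeError. You can fix it by asking the DataFrame constructor to convert the values to float:

dati = df(dato, columns=['x', 'y'], dtype=float)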
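To see the problem in isolation, here is a minimal sketch that feeds a small in-memory CSV through csv.reader. The sample values are hypothetical and only stand in for file.csv (assumed here to have no header row). Every field comes back as a str; converting the columns with .astype(float), an alternative to passing dtype=float to the constructor, makes the division work.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for file.csv (no header row, two columns).
sample = io.StringIO("1,10\n2,20\n3,30\n4,40\n")

rows = list(csv.reader(sample, delimiter=","))
print(type(rows[0][1]))  # <class 'str'>: csv.reader always yields strings

# Build the DataFrame, then convert both columns from str to float.
dati = pd.DataFrame(rows, columns=["x", "y"]).astype(float)
print(dati.dtypes)

# With numeric data, the cumulative mean of y no longer raises TypeError.
# The divisor 1, 2, ..., n is sized from the data, not a hard-coded row count.
roll_avg_cumulative = dati["y"].cumsum() / np.arange(1, len(dati) + 1)
print(roll_avg_cumulative.tolist())  # [10.0, 15.0, 20.0, 25.0]
```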

I would also like to point out a few things that may improve your code:

  • you are using pandas as your data container, so I suggest loading the CSV file straight into a DataFrame with pandas.read_csv instead of parsing it manually
  • the row count can be obtained with the built-in len() function, without iterating over all rows
  • please stick to the widely used import aliases (import pandas as pd) instead of creating your own; this makes your code more readable to everyone else

So your code can become:

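import numpy as np
import pandas as pd

# names= supplies the column labels (assumes file.csv has no header row)
dati = pd.read_csv("file.csv", sep=",", dtype=float, names=["x", "y"])
rowcount = len(dati)

window = 6
roll_avg = dati.rolling(window).mean()

# Divide the running sum by 1, 2, ..., rowcount instead of a hard-coded length
roll_avg_cumulative = dati["y"].cumsum() / np.arange(1, rowcount + 1)
print(roll_avg_cumulative)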
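The cumulative moving average of y at row n is the mean of the first n values, CMA_n = (y_1 + ... + y_n) / n, which is exactly what cumsum() divided by 1, 2, ..., n computes. A hard-coded divisor such as np.arange(1, 25) only lines up with a 24-row file. As a sketch (hypothetical sample values in place of file.csv), pandas' expanding().mean() gives the same result without building the divisor by hand:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for file.csv (no header row).
sample = io.StringIO("1,2\n2,4\n3,9\n4,1\n5,4\n")
dati = pd.read_csv(sample, sep=",", dtype=float, names=["x", "y"])

# Cumulative moving average via running sum divided by 1, 2, ..., n.
cma_cumsum = dati["y"].cumsum() / np.arange(1, len(dati) + 1)

# Same quantity via an expanding window that grows by one row each step.
cma_expanding = dati["y"].expanding().mean()

print(cma_expanding.tolist())  # [2.0, 3.0, 5.0, 4.0, 4.0]
print(np.allclose(cma_cumsum, cma_expanding))  # True
```

The expanding window adapts to however many rows the file has, so there is no separate length to keep in sync.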
Answered By: Matteo Zanoni

What went wrong in your code:

  • All values are loaded as str, so arithmetic on them fails.

A simpler way:

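import numpy as np
import pandas as pd

# header=None: no header row, so the columns are labelled 0 and 1
dati = pd.read_csv('file.csv', header=None)

window = 6
roll_avg = dati.rolling(window).mean()
print(dati[1].cumsum())

# Cumulative moving average of the second column (label 1)
roll_avg_cumulative = dati[1].cumsum()/np.arange(1, len(dati) + 1)
print(roll_avg_cumulative)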
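The script computes two different averages. rolling(window).mean() averages only the last window rows, so its first window - 1 results are NaN, while the cumulative average uses every row seen so far. A small sketch with hypothetical headerless data (a window of 3 instead of 6, just to keep the output short):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical headerless sample; header=None labels the columns 0 and 1.
sample = io.StringIO("1,3\n2,6\n3,9\n4,12\n5,15\n")
dati = pd.read_csv(sample, header=None)

window = 3
# Simple moving average: mean of the last `window` values (NaN until filled).
rolling_avg = dati[1].rolling(window).mean()
# Cumulative moving average: mean of all values up to and including each row.
cumulative_avg = dati[1].cumsum() / np.arange(1, len(dati) + 1)

print(pd.DataFrame({"y": dati[1], "rolling": rolling_avg, "cumulative": cumulative_avg}))
```

Here the rolling column is NaN, NaN, 6.0, 9.0, 12.0 and the cumulative column is 3.0, 4.5, 6.0, 7.5, 9.0.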