Scipy processing large data
Question:
I have a dataset which contains only one column (a pandas Series). The dataset is a .dat file with about 2,000,000 rows and 1 column (166 MB). Reading this data with pd.read_csv takes about 7-8 minutes. The data is a signal that needs to be processed (using scipy.signal). When I process the data I get a MemoryError. Is there a way to speed up loading the file, increase the processing speed (scipy.signal.ellip), and avoid the memory problem? Thank you in advance.
Loading the data:
import pandas as pd

data = pd.read_csv('C:/Users/HP/Desktop/Python and programming/Jupyter/Filter/3200_Hz.dat',
                   sep='\r\n', header=None, squeeze=True)
Data processing (takes about 7 minutes too):
from scipy import signal

# Wn (the critical frequencies) is defined elsewhere
b, a = signal.ellip(4, 5, 40, Wn, 'bandpass', analog=False)
output = signal.filtfilt(b, a, data)
# after that, 'output' is plotted with plt
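Not part of the question itself, but one common way to reduce numerical trouble with higher-order IIR filters like this one is to design the filter as second-order sections instead of (b, a) coefficients. A minimal sketch, assuming a 3200 Hz sampling rate (suggested by the file name) and hypothetical band edges:

```python
import numpy as np
from scipy import signal

fs = 3200.0          # assumed sampling rate, taken from the file name
wn = [50.0, 500.0]   # hypothetical band edges in Hz, not from the question

# Second-order sections are numerically more robust than (b, a)
# coefficients for higher-order elliptic designs.
sos = signal.ellip(4, 5, 40, wn, btype='bandpass', fs=fs, output='sos')

data = np.random.randn(10_000)          # stand-in for the loaded signal
output = signal.sosfiltfilt(sos, data)  # zero-phase filtering, like filtfilt
```

`sosfiltfilt` is the second-order-section counterpart of `filtfilt`, so the rest of the pipeline (plotting `output`) stays the same.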
Example of input data:
6954
25903
42882
17820
3485
-11456
4574
34594
25520
26533
9331
-22503
14950
30973
23398
41474
-860
-8528
Answers:
You set '\r\n' as the separator, which (if I understand correctly) means each line becomes a new column. That way you'll end up with millions of columns, and the squeeze argument does nothing.
Don't set the sep argument (leave it at its default): newlines will then separate the records, and squeeze will return the result as a Series.
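A minimal sketch of the corrected load, using an in-memory buffer as a stand-in for the real .dat path (note that the `squeeze=True` keyword was removed in pandas 2.0, so the `.squeeze("columns")` method is used instead):

```python
import io
import pandas as pd

# Stand-in for the real file; in practice pass the .dat path instead.
buf = io.StringIO("6954\n25903\n42882\n17820\n")

# Leave `sep` at its default so each line becomes one row, then
# squeeze the single-column frame into a Series.
data = pd.read_csv(buf, header=None).squeeze("columns")
```

With one value per row, `read_csv` also infers an integer dtype for the column, which keeps memory use far below the millions-of-columns layout produced by the '\r\n' separator.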