# How to slice and calculate the Pearson correlation coefficient between one big and one small array with overlapping windows

## Question:

Suppose I have two very simple arrays with numpy:

``````python
import numpy as np

reference = np.array([0, 1, 2, 3, 0, 0, 0, 7, 8, 9, 10])
probe = np.zeros(3)
``````

I would like to find which slice of array `reference` has the highest Pearson correlation coefficient with array `probe`. To do that, I would like to cut `reference` into overlapping sub-arrays in a for loop, shifting the window one element at a time, and compare each slice against `probe`. I did the slicing with the inelegant code below:

``````python
from statistics import correlation

for i in range(0, len(reference)):
    # get the slice of the data
    sliced_data = reference[i:i + len(probe)]
    # only calculate the correlation when probe and reference have the same number of elements
    if len(sliced_data) == len(probe):
        my_rho = correlation(sliced_data, probe)
``````

I have one issue and one question about this code:

1. When I run the code, I get the error below:

``````text
    my_rho = correlation(sliced_data, probe)
  File "/usr/lib/python3.10/statistics.py", line 919, in correlation
    raise StatisticsError('at least one of the inputs is constant')
statistics.StatisticsError: at least one of the inputs is constant
``````

2. Is there a more elegant way of doing this slicing in Python?

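## Answer:

The error in point 1 is expected: `probe = np.zeros(3)` is constant, so its standard deviation is zero and Pearson's correlation coefficient is undefined; `statistics.correlation` signals this by raising `StatisticsError`. The same happens for any constant slice of `reference`, such as `[0, 0, 0]`.

For point 2, you can use `sliding_window_view` to get the successive windows. For a vectorized computation of the correlation, use a custom function: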

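``````python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

def np_corr(X, y):
    """Pearson's r between each row of X and the 1-D array y (0 where undefined)."""
    n = len(y)
    num = n * np.sum(X * y[None, :], axis=-1) - np.sum(X, axis=-1) * np.sum(y)
    denom = np.sqrt((n * np.sum(X**2, axis=-1) - np.sum(X, axis=-1) ** 2)
                    * (n * np.sum(y**2) - np.sum(y) ** 2))
    # out= fills the undefined entries (denom == 0) with 0; without it they are uninitialized
    return np.divide(num, denom, out=np.zeros_like(denom, dtype=float), where=denom != 0)

corr = np_corr(swv(reference, len(probe)), probe)
``````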

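Output, computed with a non-constant probe such as `np.array([1, 2, 3])` (with `np.zeros(3)` the denominator is zero for every window, so no coefficient is defined):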

``````text
array([ 1.        ,  1.        , -0.65465367, -0.8660254 ,  0.        ,
        0.8660254 ,  0.91766294,  1.        ,  1.        ])
``````
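For reference, `np_corr` evaluates the standard computational form of Pearson's correlation coefficient for each window $x$ of length $n$ (`len(probe)`) against $y$ (`probe`):

$$
r = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{\sqrt{\left(n\sum_i x_i^2 - \left(\sum_i x_i\right)^2\right)\left(n\sum_i y_i^2 - \left(\sum_i y_i\right)^2\right)}}
$$

The second factor under the square root equals $n^2$ times the population variance of `probe`, so it is zero whenever `probe` is constant; the first factor is likewise zero for any constant window.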