Is there a numpy function for generating sequences similar to R's seq function?

Question:

In R, you can create a sequence by specifying the start point, end point, and desired length of output:

seq(1, 1.5, length.out=10)
# [1] 1.000000 1.055556 1.111111 1.166667 1.222222 1.277778 1.333333 1.388889 1.444444 1.500000

In Python, you can use the numpy arange function in a similar way, but there’s no easy way to specify the output length. The best I can come up with:

np.append(np.arange(1, 1.5, step = (1.5-1)/9), 1.5)
# array([ 1.        ,  1.05555556,  1.11111111,  1.16666667,  1.22222222, 1.27777778,  1.33333333,  1.38888889,  1.44444444,  1.5       ])

Is there a cleaner way to perform this operation?

Asked By: C_Z_


Answers:

Yes! An easy way to do this is to use numpy.linspace.

Numpy Docs

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the interval [start, stop].
The endpoint of the interval can optionally be excluded.

Example:

[In 1] np.linspace(start=0, stop=50, num=5)

[Out 1] array([  0. ,  12.5,  25. ,  37.5,  50. ])

Notice that the values are evenly spaced: the interval from start to stop is divided into num - 1 = 4 equal steps of 12.5, so both endpoints are included among the num=5 samples.
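Applied to the sequence from the question, a minimal sketch might look like the following. It assumes numpy is imported as np; retstep=True is used here only to show the spacing that linspace computed.

```python
import numpy as np

# Same sequence as R's seq(1, 1.5, length.out=10): 10 points, both ends included.
values, step = np.linspace(1, 1.5, num=10, retstep=True)
print(values)
print(step)  # spacing between neighbours: (1.5 - 1) / (10 - 1)

# endpoint=False leaves out the stop value, so the spacing becomes (1.5 - 1) / 10.
half_open = np.linspace(1, 1.5, num=10, endpoint=False)
print(half_open)
```

With endpoint=True (the default) the last element is the stop value itself, which is what the np.append workaround in the question was trying to achieve.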

For those having problems installing numpy (a problem less common these days), you might look into using anaconda (or miniconda), or some other similar distribution.

Answered By: PaulG

@PaulG’s answer is a very good way to generate a series of floating point numbers. In case you are looking for something like R's 1:5 to create a numpy vector containing 5 integer elements (here starting at 0 rather than 1), use:

a = np.array(range(0,5))
a
# array([0, 1, 2, 3, 4])

a.dtype
# dtype('int64')

In contrast to R vectors, Python lists and numpy arrays are zero indexed. In general you will use np.array(range(n)), which returns values from 0 to n-1; to get the same values as R's 1:5, use np.array(range(1, 6)).
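np.arange builds the same kind of integer array directly, without going through range. As a small sketch: because the stop value is excluded, the counterpart of R's 1:5 needs a stop of 6.

```python
import numpy as np

zero_based = np.arange(5)      # 0, 1, 2, 3, 4 (same values as np.array(range(0, 5)))
one_based = np.arange(1, 6)    # 1, 2, 3, 4, 5 (the values of R's 1:5)
by_two = np.arange(1, 10, 2)   # 1, 3, 5, 7, 9 (like R's seq(1, 9, by=2))

print(zero_based, one_based, by_two)
```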

Answered By: Paul Rougieux

As an alternative (and for those interested), if one wanted the functionality of seq(start, end, by, length.out) from R, the following function provides the full functionality.

def seq(start, end, by=None, length_out=None):
    # Mimics R's seq(start, end, by, length.out) and returns a list of floats.
    len_provided = length_out is not None
    by_provided = by is not None
    if not by_provided and not len_provided:
        raise ValueError('At least by or length_out must be provided')
    eps = 1e-14
    if by_provided:
        if abs(by) < eps:
            raise ValueError('by must be non-zero.')
        # Switch direction if start and end seem to have been swapped
        # (the sign of by decides this behaviour).
        if (start > end and by > 0) or (start < end and by < 0):
            start, end = end, start
        # Compute the width after any swap so that it matches the sign of by.
        width = abs(end - start)
        absby = abs(by)
        if absby - width < eps:
            # Whole steps that fit into the range, plus one for the start value.
            length_out = int(width / absby + eps) + 1
        else:
            # by is too large to be a step, so we assume it is actually length_out.
            length_out = int(absby)
            by = (end - start) / (length_out - 1)
    else:
        length_out = int(length_out)
        by = (end - start) / (length_out - 1)
    return [float(start) + by * i for i in range(length_out)]

This function is slower than numpy.linspace (about 3x in the benchmarks below), but by compiling it with numba we can obtain a function that is about 2x as fast as np.linspace while keeping the syntax from R.

from numba import jit
@jit(nopython = True, fastmath = True)
def seq(start, end, by = None, length_out = None):
    [function body]

And we can execute this just like we would in R.

seq(3, 5, 0.3)
#out: [3.0, 3.3, 3.6, 3.9, 4.2, 4.5, 4.8]

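The implementation above also (somewhat) tolerates a swap between ‘by’ and ‘length_out’: when the value passed as ‘by’ is larger than the distance between start and end, it is treated as ‘length_out’ instead.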

seq(0, 5, 10)
#out: [0.0,
 0.5555555555555556,
 1.1111111111111112,
 1.6666666666666667,
 2.2222222222222223,
 2.7777777777777777,
 3.3333333333333335,
 3.8888888888888893,
 4.444444444444445,
 5.0]
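For comparison, similar behaviour can be sketched on top of np.linspace so that the result is a numpy array. The helper name seq_np is hypothetical and this is a simplified sketch, not the function above: it does not swap start and end, does not reinterpret a large by as length_out, and ignores by when length_out is given.

```python
import numpy as np

def seq_np(start, end, by=None, length_out=None):
    """Hypothetical helper: an R-like seq built on np.linspace."""
    if by is None and length_out is None:
        raise ValueError("At least by or length_out must be provided")
    if length_out is None:
        if by == 0:
            raise ValueError("by must be non-zero.")
        # Number of whole steps that fit between start and end; the small
        # tolerance guards against quotients such as 2.9999999999999996.
        steps = int(np.floor((end - start) / by + 1e-10))
        if steps < 0:
            raise ValueError("by has the wrong sign for this start and end")
        # The last value actually reached may fall short of end.
        return np.linspace(start, start + steps * by, num=steps + 1)
    return np.linspace(start, end, num=int(length_out))

print(seq_np(1, 1.5, length_out=10))
print(seq_np(3, 5, by=0.3))
```

Letting np.linspace place the points avoids accumulating the step in a Python loop, at the cost of computing the number of steps by hand when by is given.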

Benchmarks:

%timeit -r 100 py_seq(0.5, 1, 1000) #Python no jit
133 µs ± 20.9 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

%timeit -r 100 seq(0.5, 1, 1000) #adding @jit(nopython = True, fastmath = True) prior to function definition
20.1 µs ± 2 µs per loop (mean ± std. dev. of 100 runs, 10000 loops each)

%timeit -r 100 linspace(0.5, 1, 1000)
46.2 µs ± 6.11 µs per loop (mean ± std. dev. of 100 runs, 10000 loops each)
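Outside IPython, where the %timeit magic is not available, a comparable measurement can be sketched with the standard-library timeit module. The pure-Python py_linspace below is a stand-in written only for this example, not the seq function above, and the absolute numbers will vary with the machine and library versions.

```python
import timeit
import numpy as np

def py_linspace(start, end, n):
    # Stand-in pure-Python loop, defined only for this timing sketch.
    step = (end - start) / (n - 1)
    return [start + step * i for i in range(n)]

loops = 1000
t_py = timeit.timeit(lambda: py_linspace(0.5, 1, 1000), number=loops)
t_np = timeit.timeit(lambda: np.linspace(0.5, 1, 1000), number=loops)

# timeit returns the total seconds for all loops; convert to microseconds per call.
print(f"pure Python: {t_py / loops * 1e6:.1f} µs per call")
print(f"np.linspace: {t_np / loops * 1e6:.1f} µs per call")
```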

Answered By: Oliver

You can find more examples here; it covers a lot of R functions implemented with the numpy package.

Answered By: Walid Bousseta