# Define a random variable

## Question:

I’m new with Python. I have to create a new `rv U(t)`. Assuming that `Z` has a standard normal distribution `c = 1.57`, I have that:

``````U(t) = 0 if Z(t) <= c
U(t) = Φ(Z(t)) Z(t) > c
``````

Where `Φ(·)` is the `cdf` of the standard normal distribution `N(0, 1)`.

I start sampling random numbers from the normal distribution and I create an array of zeros:

``````z = np.random.normal(0, 1, 100)
u = np.zeros([1, 100])
``````

Then, I write the following loop:

``````for i in list(range(1, 101, 1)):
if z[:i] < c:
u[:i] = 0
else z[:i] > c:
u[:i] = norm.cdf(z[:i], loc=0, scale=1)
``````

However, there is something wrong. I got this error:

``````File "<ipython-input-837-1cbdd0641a75>", line 4
else z[:i] > c:
^
SyntaxError: invalid syntax
``````

Can someone please help me find out where the error is? Or suggest another way to deal with the problem?

Thank you!

Use the power of numpy!

``````z = np.random.normal(0, 1, 100)
u = np.zeros(z.shape)
``````

Since you initialized `u` to zeros, you don’t need to do anything for the `z <= c` cases. For the others, you can use numpy’s logical indexing to only set the elements that fulfill the condition

``````# Get only the elements of z where z > c
z_filt = z[z > c]

# Calculate the norm.cdf at these values of z,
norm_vals = norm.cdf(z_filt, loc=0, scale=1)

# Assign those to the elements of u where z > c
u[z > c] = norm_vals
``````

Of course, you can condense these three lines down to a single line:

``````u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)
``````

This approach will be significantly faster than iterating over the arrays and setting individual elements.

If you’re curious why your code didn’t work and how to fix it,

• You don’t need to list out a `range` to iterate on it. Just `for i in range(100)` is good enough
• `z[:i] < c` will give an array containing `i` boolean values. Putting an `if` condition on that will give you an error: `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`. I suspect you meant to do `z[i] < c`
• `u[:i] = 0` will set all elements of `u` from the start to the `i`th index to zero. You probably only wanted `u[i] = 0`. This is really not even necessary since you already initialized `u` to be zeros
• `else <condition>` doesn’t work. You want `elif <condition>`. Although you don’t really need that here, since you could just do `if z[i] > c` and that’s the only condition you need.
``````for i in range(100):
if z[i] > c:
u[i] = norm.cdf(z[i], loc=0, scale=1)
``````

Comparing the runtimes of these two approaches:

``````import timeit
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt

def f_numpy(size):
z = np.random.normal(0, 1, size)
u = np.zeros(z.shape)
u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)
return u

def f_loopy(size):
z = np.random.normal(0, 1, size)
u = np.zeros(z.shape)
for i in range(size):
if z[i] > c:
u[i] = norm.cdf(z[i], loc=0, scale=1)
return u

c = 0

sizes = [10, 100, 1000, 10_000, 100_000]
times = np.zeros((len(sizes), 2))
for i, s in enumerate(sizes):
times[i, 0] = timeit.timeit('f_numpy(s)', globals=globals(), number=100) / 100
print(">")
times[i, 1] = timeit.timeit('f_loopy(s)', globals=globals(), number=10) / 10
print(".")

fig, ax = plt.subplots()
ax.plot(sizes, times[:, 0], label="numpy")
ax.plot(sizes, times[:, 1], label="loopy")
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('Array size')
ax.set_ylabel('Time per function call (s)')
ax.legend()
fig.tight_layout()
``````

we get the following plot, which shows that the numpy approach is orders of magnitude faster than the loopy approach.

Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.