# Define a random variable

## Question:

I’m new with Python. I have to create a new `rv U(t)`

. Assuming that `Z`

has a standard normal distribution `c = 1.57`

, I have that:

```
U(t) = 0 if Z(t) <= c
U(t) = Φ(Z(t)) Z(t) > c
```

Where `Φ(·)`

is the `cdf`

of the standard normal distribution `N(0, 1)`

.

I start sampling random numbers from the normal distribution and I create an array of zeros:

```
z = np.random.normal(0, 1, 100)
u = np.zeros([1, 100])
```

Then, I write the following loop:

```
for i in list(range(1, 101, 1)):
if z[:i] < c:
u[:i] = 0
else z[:i] > c:
u[:i] = norm.cdf(z[:i], loc=0, scale=1)
```

However, there is something wrong. I got this error:

```
File "<ipython-input-837-1cbdd0641a75>", line 4
else z[:i] > c:
^
SyntaxError: invalid syntax
```

Can someone please help me find out where the error is? Or suggest another way to deal with the problem?

Thank you!

## Answers:

Use the power of numpy!

```
z = np.random.normal(0, 1, 100)
u = np.zeros(z.shape)
```

Since you initialized `u`

to zeros, you don’t need to do anything for the `z <= c`

cases. For the others, you can use numpy’s logical indexing to only set the elements that fulfill the condition

```
# Get only the elements of z where z > c
z_filt = z[z > c]
# Calculate the norm.cdf at these values of z,
norm_vals = norm.cdf(z_filt, loc=0, scale=1)
# Assign those to the elements of u where z > c
u[z > c] = norm_vals
```

Of course, you can condense these three lines down to a single line:

```
u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)
```

This approach will be *significantly* faster than iterating over the arrays and setting individual elements.

If you’re curious why your code didn’t work and how to fix it,

- You don’t need to
*list*out a`range`

to iterate on it. Just`for i in range(100)`

is good enough `z[:i] < c`

will give an array containing`i`

boolean values. Putting an`if`

condition on that will give you an error:`ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`

. I suspect you meant to do`z[i] < c`

`u[:i] = 0`

will set*all*elements of`u`

from the start to the`i`

th index to zero. You probably only wanted`u[i] = 0`

. This is really not even necessary since you already initialized`u`

to be zeros`else <condition>`

doesn’t work. You want`elif <condition>`

. Although you don’t really need that here, since you could just do`if z[i] > c`

and that’s the only condition you need.

```
for i in range(100):
if z[i] > c:
u[i] = norm.cdf(z[i], loc=0, scale=1)
```

Comparing the runtimes of these two approaches:

```
import timeit
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt
def f_numpy(size):
z = np.random.normal(0, 1, size)
u = np.zeros(z.shape)
u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)
return u
def f_loopy(size):
z = np.random.normal(0, 1, size)
u = np.zeros(z.shape)
for i in range(size):
if z[i] > c:
u[i] = norm.cdf(z[i], loc=0, scale=1)
return u
c = 0
sizes = [10, 100, 1000, 10_000, 100_000]
times = np.zeros((len(sizes), 2))
for i, s in enumerate(sizes):
times[i, 0] = timeit.timeit('f_numpy(s)', globals=globals(), number=100) / 100
print(">")
times[i, 1] = timeit.timeit('f_loopy(s)', globals=globals(), number=10) / 10
print(".")
fig, ax = plt.subplots()
ax.plot(sizes, times[:, 0], label="numpy")
ax.plot(sizes, times[:, 1], label="loopy")
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('Array size')
ax.set_ylabel('Time per function call (s)')
ax.legend()
fig.tight_layout()
```

we get the following plot, which shows that the numpy approach is orders of magnitude faster than the loopy approach.