Generating random numbers

Question:

I am trying to generate random data. My example is below:

import numpy as np
import pandas as pd
import random

df_categories = pd.DataFrame(np.random.choice(a=["0", "1"], size=100, p=[0.7, 0.3]),
columns = ['number'])
df_categories

This code works well and generates data. Now I want to change it to generate integer data: instead of "1", I want integers in the range from 1 to 100.

df_categories = pd.DataFrame(np.random.choice(a=[0, random.randint(0, 100)], size=100, p=[0.7, 0.3]),
columns = ['number'])
df_categories

I tried the code above, but it puts the same single value in 30% of the rows. How can I generate different numbers instead of just one?

Asked By: silent_hunter

Answers:

Why don’t you use numpy.random.randint and a mask?

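# random integers in [0, 100); the upper bound is exclusive, so use
# np.random.randint(1, 101, size=100) for the 1 to 100 range asked about
a = np.random.randint(0, 100, size=100)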
# random mask for ~70% of values
m = np.random.choice([True, False], size=100, p=[0.7, 0.3])

df_categories = pd.DataFrame(np.where(m, 0, a),
                             columns=['number'])
df_categories
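
The attempt in the question repeats one value because random.randint(0, 100) runs once, when the list passed to np.random.choice is built, so that list only ever holds 0 and that single number. The mask approach draws a separate integer for every row. A minimal sketch of the same idea as a reusable function, assuming NumPy's Generator API (np.random.default_rng) and the 1 to 100 range from the question; the function name zero_inflated_ints and its parameters are hypothetical, chosen for illustration:

```python
import numpy as np
import pandas as pd


def zero_inflated_ints(size, prob_zero=0.7, low=1, high=100, seed=None):
    """Return a DataFrame whose 'number' column is 0 with probability
    prob_zero and otherwise a uniform integer in low..high (inclusive)."""
    rng = np.random.default_rng(seed)
    # One independent integer per row; high + 1 because the upper bound is exclusive.
    values = rng.integers(low, high + 1, size=size)
    # True marks the rows that are forced to 0.
    is_zero = rng.random(size) < prob_zero
    return pd.DataFrame({"number": np.where(is_zero, 0, values)})


# Hypothetical usage: 100 rows, seeded only so the run is repeatable.
df_categories = zero_inflated_ints(100, seed=0)
```

Passing a seed is optional; leave it as None to get different data on every run.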
Answered By: mozway

You can do the following:

n = 100
prob_0 = 0.7
# 0 followed by 1..n; starting the range at 1 keeps 0 from appearing twice
a = [0] + list(np.arange(1, n + 1)) # [0, 1, 2, 3, ..., 100]
p = [prob_0] + [(1 - prob_0)/n] * n # [0.7, 0.003, ..., 0.003]
df_categories = pd.DataFrame(np.random.choice(a=a, size=n, p=p), columns=['number'])

Output (for example):

    number
0        0
1       32
2        0
3       39
4        0
..     ...
95       0
96      63
97      55
98       0
99       0

[100 rows x 1 columns]
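
The weights work because the 1 - prob_0 of probability left over after 0 is split evenly over the n candidate integers, so the whole vector sums to 1. A hedged sketch of that bookkeeping, assuming the range 1 to 100 from the question and NumPy's Generator API; build_weights and sample_numbers are hypothetical helper names, not part of the answer:

```python
import numpy as np
import pandas as pd


def build_weights(prob_zero, n_values):
    """Probability prob_zero for 0, the remainder shared equally by n_values integers."""
    rest = np.full(n_values, (1.0 - prob_zero) / n_values)
    return np.concatenate(([prob_zero], rest))


def sample_numbers(size, prob_zero=0.7, low=1, high=100, seed=None):
    """Draw size values: 0 with probability prob_zero, else uniform on low..high."""
    rng = np.random.default_rng(seed)
    # Candidate values: 0 first, then every integer from low to high inclusive.
    candidates = np.concatenate(([0], np.arange(low, high + 1)))
    weights = build_weights(prob_zero, high - low + 1)
    return pd.DataFrame({"number": rng.choice(candidates, size=size, p=weights)})


# Hypothetical usage with the numbers from the question.
weights = build_weights(0.7, 100)
df_categories = sample_numbers(100, seed=0)
```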
Answered By: T C Molenaar

You need this:

import pandas as pd
import numpy as np

my_range = 100

# 0 with probability 0.7; each of 1..my_range with probability 0.3/my_range
values = [0] + list(np.arange(1, my_range + 1))
probs = [0.7] + [0.3 / my_range] * my_range
df_categories = pd.DataFrame(np.random.choice(a=values, size=100, p=probs),
                             columns=['number'])
df_categories

Output:

    number
0        0
1        8
2       40
3       73
4        0
..     ...
95      75
96      94
97       4
98       0
99      25

100 rows × 1 columns
Answered By: Shahab Rahnama