Adding new variable to dataframe

Question:

I am new to Python. I am trying to add a randomly generated variable to an already existing dataframe. I get an error message, but can’t figure out why.

import pandas as pd
import numpy as np

data=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df=pd.DataFrame(data, columns=['age'])


 # Add income:
income_5 = np.random.randint(low=0, high=4, size=(nrows(df,))+1                          
df['income5'] = income_5

What am I doing wrong?

Asked By: Stata_user


Answers:

nrows is not a Python or pandas function, and the parentheses in size=(nrows(df,) are unbalanced. After changing it to size=(len(df),) the code works:

import pandas as pd
import numpy as np

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df = pd.DataFrame(data, columns=['age'])

# Add income: one random integer from 1 to 4 per row
income_5 = np.random.randint(low=0, high=4, size=(len(df),)) + 1
df['income5'] = income_5
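
As a rough sketch of why this works: np.random.randint draws from the half-open range [low, high), so low=0, high=4 gives 0 to 3, and the + 1 shifts that to 1 to 4. The name n_rows below is only an illustrative helper, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# len(df) is the number of rows; (n_rows,) is the matching one-element shape tuple.
n_rows = len(df)  # illustrative helper name

# randint samples from [low, high), so this yields 0..3; adding 1 shifts it to 1..4.
income_5 = np.random.randint(low=0, high=4, size=(n_rows,)) + 1

# The array has one value per row, so it can be assigned as a new column.
df["income5"] = income_5
```

The assignment only succeeds because the array length equals the number of rows; a mismatched length raises a ValueError.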
Answered By: Deepak Tripathi

Pass the number of rows as the size, using either size=df.shape[0] or size=len(df). The + 1 from the question is omitted here, so the values range from 0 to 3:

income_5 = np.random.randint(low=0, high=4, size=df.shape[0])
df['income5'] = income_5

Example output (values vary between runs):

   age  income5
0   10        0
1   20        3
2   30        0
3   40        3
4   50        0
5   60        3
6   70        0
7   80        0
8   90        2
9  100        1
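
A small check, offered as a sketch, that the two spellings are interchangeable: df.shape[0] and len(df) both give the row count, and randint accepts either an int or a one-element tuple for size. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Two equivalent ways to get the number of rows.
n_from_shape = df.shape[0]
n_from_len = len(df)

# size may be an int or a tuple; both produce a 1-D array with one value per row.
a = np.random.randint(low=0, high=4, size=n_from_shape)
b = np.random.randint(low=0, high=4, size=(n_from_len,))

print(n_from_shape == n_from_len, a.shape == b.shape)
```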

NB. You don’t need the intermediate variable:

df['income5'] = np.random.randint(low=0, high=4, size=df.shape[0])
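
If reproducible values are wanted, one possible variation (not part of the original answer) is NumPy's Generator API. This sketch assumes an arbitrary seed of 42 and draws 1 to 4 directly instead of adding 1 afterwards:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# The seed value is an arbitrary choice for illustration; any integer works.
rng = np.random.default_rng(seed=42)

# Generator.integers also uses a half-open range, so high=5 yields 1..4.
df["income5"] = rng.integers(low=1, high=5, size=len(df))
```

Re-running with the same seed gives the same column, which helps when comparing results across runs.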
Answered By: mozway