Adding new variable to dataframe

Question

I am new to Python. I am trying to add a randomly generated variable to an already existing dataframe. I get an error message, but can’t figure out why.

import pandas as pd
import numpy as np

data=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df=pd.DataFrame(data, columns=['age'])


 # Add income:
income_5 = np.random.randint(low=0, high=4, size=(nrows(df,))+1                          
df['income5'] = income_5

What am I doing wrong?

Asked By: Stata_user

Answer 1

nrows is not a Python or pandas function, and the original line also has unbalanced parentheses. Replacing size=(nrows(df,) with size=(len(df),) fixes both, so:

import pandas as pd
import numpy as np

data=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df=pd.DataFrame(data, columns=['age'])


# Add income:
income_5 = np.random.randint(low=0, high=4, size=(len(df),)) + 1
df['income5'] = income_5

Answered By: Deepak Tripathi
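
As a quick sanity check on the fix above, here is a minimal sketch that confirms the new column has one value per row and that the values fall in the expected range. The names n_rows and in_range are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# len(df) is the number of rows. randint's upper bound is exclusive,
# so low=0, high=4 draws from {0, 1, 2, 3}; the +1 shifts that to {1, 2, 3, 4}.
income_5 = np.random.randint(low=0, high=4, size=len(df)) + 1
df["income5"] = income_5

# Illustrative checks: one value per row, all values between 1 and 4.
n_rows = len(df)
in_range = bool(df["income5"].between(1, 4).all())
print(n_rows, in_range)
```

Passing size=len(df) and size=(len(df),) are equivalent here; both produce a one-dimensional array with one entry per row.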

Answer 2

The correct syntax would be size=df.shape[0] or size=len(df):

income_5 = np.random.randint(low=0, high=4, size=df.shape[0])
df['income5'] = income_5

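Example output (the values are random and will differ between runs; without the +1 from the question they range from 0 to 3):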

   age  income5
0   10        0
1   20        3
2   30        0
3   40        3
4   50        0
5   60        3
6   70        0
7   80        0
8   90        2
9  100        1

NB. You don’t need the intermediate variable:

df['income5'] = np.random.randint(low=0, high=4, size=df.shape[0])

Answered By: mozway
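
If reproducible values are wanted, one option not mentioned in the answers is NumPy's Generator interface. The sketch below assumes a seed of 42, which is an arbitrary choice, and draws integers from 1 to 4 directly instead of adding 1 afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# The seed value is an arbitrary assumption; any fixed integer
# makes the draws repeatable across runs.
rng = np.random.default_rng(seed=42)

# high is exclusive by default, so low=1, high=5 yields values in {1, 2, 3, 4}.
df["income5"] = rng.integers(low=1, high=5, size=len(df))

print(df)
```

Re-creating the generator with the same seed gives the same column, which helps when comparing results across runs.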
