Taking a row from a pandas DataFrame and adding it to a new DataFrame
Question:
So, I have a DataFrame with 14 columns (with headers). I’d like to take the values of a row of this DataFrame at an index that’s a random integer between 0 and the maximum length of the DataFrame, extract those values, and then add them to a new DataFrame with the same headers.
The data frame looks something like this:
I’ve tried using various combinations with .iloc, but for some reason that produces a DataFrame that’s nothing but column headers, not the actual numerical values of the DataFrame itself.
What’s the best way to do this?
Thanks
Answers:
Assuming that your dataframe is named df, you should be able to query a specific row using iloc, e.g.:
df.iloc[[i]]
Where i is your random number.
You can then concatenate this dataframe with your new dataframe df_new.
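import numpy as np
import pandas as pd

df_new = pd.DataFrame(columns=df.columns)  # start empty, same headers as df
# draw 10 random row positions in [0, len(df))
for i in np.random.randint(0, len(df), size=10):
    if df_new.empty:
        df_new = df.iloc[[i]]
    else:
        df_new = pd.concat([df_new, df.iloc[[i]]])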
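The double brackets in df.iloc[[i]] matter here, and a quick check may make that clearer. This is only a sketch: the small two-column frame below is a hypothetical stand-in for the 14-column one in the question. With a single integer, iloc returns a Series; with a list of positions, it returns a one-row DataFrame that keeps the column headers.

```python
import pandas as pd

# Hypothetical stand-in for the asker's 14-column frame.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

row_as_series = df.iloc[1]    # single integer: a Series indexed by column name
row_as_frame = df.iloc[[1]]   # list of positions: a one-row DataFrame

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(row_as_frame.shape)
```

Passing the one-row DataFrame to pd.concat lines it up under the existing headers, which is why the answer above uses the list form.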
Let’s say you have some dummy dataframe like this:
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                   'B': ['one', 'one', 'two', 'two', 'one', 'one'],
                   'C': [1, 2, 3, 4, 5, 6],
                   'D': [7, 8, 9, 10, 11, 12]})
Let’s create an empty df with the same columns:
df2 = pd.DataFrame(columns=df.columns)
If you want to take a row from df and add it to df2, use one of these:
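# pandas < 2.0 only: DataFrame.append was deprecated in 1.4 and removed in 2.0
df2 = df2.append(df.iloc[1], ignore_index=True)
# on current pandas, use pd.concat instead (note the double brackets):
df2 = pd.concat([df2, df.iloc[[1]]], axis=0, ignore_index=True)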
If you want a random one you can do:
# pandas < 2.0 only (DataFrame.append was removed in 2.0):
df2 = df2.append(df.sample(1), ignore_index=True)
# or, on any version:
df2 = pd.concat([df2, df.sample(1)], ignore_index=True)
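If you need several random rows rather than one, it may be simpler to draw them in a single call instead of growing df2 row by row. The sketch below assumes duplicates are acceptable (replace=True) and that 10 rows are wanted; both are illustrative choices, not something stated in the question, and random_state is only there to make the run repeatable.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar", "bar", "bar"],
                   "B": ["one", "one", "two", "two", "one", "one"],
                   "C": [1, 2, 3, 4, 5, 6],
                   "D": [7, 8, 9, 10, 11, 12]})

# Draw 10 rows with replacement in one call, then renumber the index 0..9.
df2 = df.sample(n=10, replace=True, random_state=0).reset_index(drop=True)

# Alternative: collect one-row frames in a list and concatenate once at the end.
picks = [df.sample(1) for _ in range(3)]
df3 = pd.concat(picks, ignore_index=True)

print(df2.shape, df3.shape)
```

Concatenating once at the end avoids copying the growing frame on every iteration, which starts to matter when many rows are collected.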