Is it possible to append Series to rows of DataFrame without making a list first?
Question:
I have some data I’m trying to organize into a DataFrame
in Pandas
. I was trying to make each row a Series
and append it to the DataFrame
. I found a way to do it by appending the Series
to an empty list
and then converting the list
of Series
to a DataFrame
e.g. DF = DataFrame([series1,series2],columns=series1.index)
This list
to DataFrame
step seems to be excessive. I’ve checked out a few examples on here but none of the Series
preserved the Index
labels from the Series
to use them as column labels.
My long way where columns are id_names and rows are type_names:
Is it possible to append Series to rows of DataFrame without making a list first?
#!/usr/bin/python
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value)
DF.append(SR_row)
DF.head()
TypeError: Can only append a Series if ignore_index=True or if the Series has a name
Then I tried
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value,name=sample)
DF.append(SR_row)
DF.head()
Empty DataFrame
Tried Insert a row to pandas dataframe
Still getting an empty dataframe :/
I am trying to get the Series to be the rows, where the index of the Series becomes the column labels of the DataFrame
Answers:
Maybe an easier way would be to add the pandas.Series
into the pandas.DataFrame
with ignore_index=True
argument to DataFrame.append()
. Example –
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value)
DF = DF.append(SR_row,ignore_index=True)
Demo –
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1,2],[3,4]],columns=['A','B'])
In [3]: df
Out[3]:
A B
0 1 2
1 3 4
In [5]: s = pd.Series([5,6],index=['A','B'])
In [6]: s
Out[6]:
A 5
B 6
dtype: int64
In [36]: df.append(s,ignore_index=True)
Out[36]:
A B
0 1 2
1 3 4
2 5 6
Another issue in your code is that DataFrame.append()
is not in-place, it returns the appended dataframe, you would need to assign it back to your original dataframe for it to work. Example –
DF = DF.append(SR_row,ignore_index=True)
To preserve the labels, you can use your solution to include name for the series along with assigning the appended DataFrame back to DF
. Example –
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value,name=sample)
DF = DF.append(SR_row)
DF.head()
DataFrame.append
does not modify the DataFrame in place. You need to do df = df.append(...)
if you want to reassign it back to the original variable.
Something like this could work…
mydf.loc['newindex'] = myseries
Here is an example where I used it…
stats = df[['bp_prob', 'ICD9_prob', 'meds_prob', 'regex_prob']].describe()
stats
Out[32]:
bp_prob ICD9_prob meds_prob regex_prob
count 171.000000 171.000000 171.000000 171.000000
mean 0.179946 0.059071 0.067020 0.126812
std 0.271546 0.142681 0.152560 0.207014
min 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.013116
75% 0.309019 0.065248 0.066667 0.192954
max 1.000000 1.000000 1.000000 1.000000
medians = df[['bp_prob', 'ICD9_prob', 'meds_prob', 'regex_prob']].median()
stats.loc['median'] = medians
stats
Out[36]:
bp_prob ICD9_prob meds_prob regex_prob
count 171.000000 171.000000 171.000000 171.000000
mean 0.179946 0.059071 0.067020 0.126812
std 0.271546 0.142681 0.152560 0.207014
min 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.013116
75% 0.309019 0.065248 0.066667 0.192954
max 1.000000 1.000000 1.000000 1.000000
median 0.000000 0.000000 0.000000 0.013116
Convert the series to a dataframe and transpose it, then append normally.
srs = srs.to_frame().T
df = df.append(srs)
This would work as well:
df = pd.DataFrame()
new_line = pd.Series({'A2M': 4.059, 'A2ML1': 4.28}, name='HCC1419')
df = df.append(new_line, ignore_index=False)
The name
in the Series will be the index in the dataframe. ignore_index=False
is the important flag in this case.
append is deprecating so, the best choice would be to_frame().T
df1 = pd.DataFrame({'name':['john','mark'],'job':['manager','salesman'],'age':[43,23]})
ser1 = df1.iloc[-1]
pd.concat([df1,ser1.to_frame().T],ignore_index=True)
name job age
0 john manager 43
1 mark salesman 23
2 mark salesman 23
I have some data I’m trying to organize into a DataFrame
in Pandas
. I was trying to make each row a Series
and append it to the DataFrame
. I found a way to do it by appending the Series
to an empty list
and then converting the list
of Series
to a DataFrame
e.g. DF = DataFrame([series1,series2],columns=series1.index)
This list
to DataFrame
step seems to be excessive. I’ve checked out a few examples on here but none of the Series
preserved the Index
labels from the Series
to use them as column labels.
My long way where columns are id_names and rows are type_names:
Is it possible to append Series to rows of DataFrame without making a list first?
#!/usr/bin/python
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value)
DF.append(SR_row)
DF.head()
TypeError: Can only append a Series if ignore_index=True or if the Series has a name
Then I tried
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value,name=sample)
DF.append(SR_row)
DF.head()
Empty DataFrame
Tried Insert a row to pandas dataframe
Still getting an empty dataframe :/
I am trying to get the Series to be the rows, where the index of the Series becomes the column labels of the DataFrame
Maybe an easier way would be to add the pandas.Series
into the pandas.DataFrame
with ignore_index=True
argument to DataFrame.append()
. Example –
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value)
DF = DF.append(SR_row,ignore_index=True)
Demo –
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1,2],[3,4]],columns=['A','B'])
In [3]: df
Out[3]:
A B
0 1 2
1 3 4
In [5]: s = pd.Series([5,6],index=['A','B'])
In [6]: s
Out[6]:
A 5
B 6
dtype: int64
In [36]: df.append(s,ignore_index=True)
Out[36]:
A B
0 1 2
1 3 4
2 5 6
Another issue in your code is that DataFrame.append()
is not in-place, it returns the appended dataframe, you would need to assign it back to your original dataframe for it to work. Example –
DF = DF.append(SR_row,ignore_index=True)
To preserve the labels, you can use your solution to include name for the series along with assigning the appended DataFrame back to DF
. Example –
DF = DataFrame()
for sample,data in D_sample_data.items():
SR_row = pd.Series(data.D_key_value,name=sample)
DF = DF.append(SR_row)
DF.head()
DataFrame.append
does not modify the DataFrame in place. You need to do df = df.append(...)
if you want to reassign it back to the original variable.
Something like this could work…
mydf.loc['newindex'] = myseries
Here is an example where I used it…
stats = df[['bp_prob', 'ICD9_prob', 'meds_prob', 'regex_prob']].describe()
stats
Out[32]:
bp_prob ICD9_prob meds_prob regex_prob
count 171.000000 171.000000 171.000000 171.000000
mean 0.179946 0.059071 0.067020 0.126812
std 0.271546 0.142681 0.152560 0.207014
min 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.013116
75% 0.309019 0.065248 0.066667 0.192954
max 1.000000 1.000000 1.000000 1.000000
medians = df[['bp_prob', 'ICD9_prob', 'meds_prob', 'regex_prob']].median()
stats.loc['median'] = medians
stats
Out[36]:
bp_prob ICD9_prob meds_prob regex_prob
count 171.000000 171.000000 171.000000 171.000000
mean 0.179946 0.059071 0.067020 0.126812
std 0.271546 0.142681 0.152560 0.207014
min 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000 0.013116
75% 0.309019 0.065248 0.066667 0.192954
max 1.000000 1.000000 1.000000 1.000000
median 0.000000 0.000000 0.000000 0.013116
Convert the series to a dataframe and transpose it, then append normally.
srs = srs.to_frame().T
df = df.append(srs)
This would work as well:
df = pd.DataFrame()
new_line = pd.Series({'A2M': 4.059, 'A2ML1': 4.28}, name='HCC1419')
df = df.append(new_line, ignore_index=False)
The name
in the Series will be the index in the dataframe. ignore_index=False
is the important flag in this case.
append is deprecating so, the best choice would be to_frame().T
df1 = pd.DataFrame({'name':['john','mark'],'job':['manager','salesman'],'age':[43,23]})
ser1 = df1.iloc[-1]
pd.concat([df1,ser1.to_frame().T],ignore_index=True)
name job age
0 john manager 43
1 mark salesman 23
2 mark salesman 23