Appending row to dataframe with concat()
Question:
I have defined an empty data frame with
df = pd.DataFrame(columns=['Name', 'Weight', 'Sample'])
and want to append rows in a for loop like this:
for key in my_dict:
...
row = {'Name':key, 'Weight':wg, 'Sample':sm}
df = pd.concat(row, axis=1, ignore_index=True)
But I get this error
cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid
If I use df = df.append(row, ignore_index=True), it works, but it seems that append is deprecated. So I want to use concat(). How can I fix that?
Answers:
You can transform your dict into a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Weight', 'Sample'])
for key in my_dict:
...
# transform your dict into a single-row DataFrame
new_df = pd.DataFrame([row])
df = pd.concat([df, new_df], axis=0, ignore_index=True)
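A complete, runnable sketch of this pattern (the contents of my_dict and the variables wg and sm are placeholders here, since the question elides them):

```python
import pandas as pd

# placeholder data standing in for the question's my_dict
my_dict = {'a': (1.0, 10), 'b': (2.0, 20)}

df = pd.DataFrame(columns=['Name', 'Weight', 'Sample'])
for key in my_dict:
    wg, sm = my_dict[key]
    row = {'Name': key, 'Weight': wg, 'Sample': sm}
    # wrapping the dict in a list makes pd.DataFrame build a single-row frame
    new_df = pd.DataFrame([row])
    df = pd.concat([df, new_df], axis=0, ignore_index=True)

print(df)
```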
concat() needs a list of Series or DataFrame objects as its first argument.
import pandas as pd
my_dict = {'the_key': 'the_value'}
for key in my_dict:
row = {'Name': 'name_test', 'Weight':'weight_test', 'Sample':'sample_test'}
df = pd.concat([pd.DataFrame(row, index=[key])], axis=1, ignore_index=True)
print(df)
0 1 2
the_key name_test weight_test sample_test
As user7864386 suggested, the most efficient way would be to collect the dicts and concatenate them all at once later, but if you for some reason have to add rows in a loop, a more efficient way is .loc, because that way you don't have to turn your dict into a single-row DataFrame first:
df.loc[len(df),:] = row
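A minimal runnable sketch of the .loc variant, again with placeholder data in place of the question's elided loop body:

```python
import pandas as pd

df = pd.DataFrame(columns=['Name', 'Weight', 'Sample'])
for key, (wg, sm) in {'a': (1.0, 10), 'b': (2.0, 20)}.items():
    row = {'Name': key, 'Weight': wg, 'Sample': sm}
    # len(df) is the next free integer label, so this appends one row in place;
    # the dict's keys are aligned to the column labels
    df.loc[len(df), :] = row

print(df)
```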
It's rather hard to benchmark this properly, because %timeit of that row will grow the DataFrame and make the call slower over time, while the alternative pd.concat([df, pd.DataFrame(row)], axis=0, ignore_index=True) does not mutate df, and df = ... can't be %timeit-ed as it causes an UnboundLocalError. Running one %timeit before and one after the other makes me assume a speed advantage of about a factor of 2, though.
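For completeness, the collect-then-concatenate approach attributed to user7864386 might look like this sketch (placeholder data again); building the DataFrame once at the end avoids the quadratic cost of repeated copies:

```python
import pandas as pd

rows = []
for key, (wg, sm) in {'a': (1.0, 10), 'b': (2.0, 20)}.items():
    rows.append({'Name': key, 'Weight': wg, 'Sample': sm})

# one DataFrame construction at the end instead of a concat per iteration
df = pd.DataFrame(rows, columns=['Name', 'Weight', 'Sample'])
print(df)
```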