How to convert DataFrame.append() to pandas.concat()?
Question:
In pandas 1.4.0, append() was deprecated, and the docs say to use concat() instead.
FutureWarning: The frame.append method is deprecated and will be
removed from pandas in a future version. Use pandas.concat instead.
Codeblock in question:
def generate_features(data, num_samples, mask):
    """
    The main function for generating features to train or evaluate on.
    Returns a pd.DataFrame()
    """
    logger.debug("Generating features, number of samples", num_samples)
    features = pd.DataFrame()
    for count in range(num_samples):
        row, col = get_pixel_within_mask(data, mask)
        input_vars = get_pixel_data(data, row, col)
        features = features.append(input_vars)
        print_progress(count, num_samples)
    return features
These are the two options I’ve tried, but neither worked:
features = pd.concat([features],[input_vars])
and
pd.concat([features],[input_vars])
This is the line that is deprecated and triggers the warning:
features = features.append(input_vars)
Answers:
This "appends" to the blank DataFrame and avoids the future error by using concat. pd.concat takes one list of objects as its first argument, so both frames go inside a single list rather than two separate lists:
features = pd.concat([features, input_vars])
However, without access to the actual data and data structures, this is hard to test or replicate.
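A minimal runnable sketch of this in-loop replacement, assuming input_vars is a one-row DataFrame. The make_row helper and its column names are made up for illustration and only stand in for get_pixel_data:

```python
import pandas as pd


def make_row(i):
    # Hypothetical stand-in for get_pixel_data(): returns a one-row DataFrame.
    return pd.DataFrame({"row": [i], "value": [i * 10]})


features = pd.DataFrame()
for count in range(3):
    input_vars = make_row(count)
    # Pass ONE list holding both frames; a second positional list is not data.
    features = pd.concat([features, input_vars])

print(features.shape)
```

Each one-row frame keeps its own index label here, so the result has repeated index values unless you reset the index afterwards.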
You can store the DataFrames generated in the loop in a list and concatenate them into features once you finish the loop.
In other words, replace the loop:
for count in range(num_samples):
    # .... code to produce `input_vars`
    features = features.append(input_vars)  # remove this `DataFrame.append`
with the one below:
tmp = []  # initialize list
for count in range(num_samples):
    # .... code to produce `input_vars`
    tmp.append(input_vars)  # append to the list (not the DataFrame)
features = pd.concat(tmp)  # concatenate after the loop
You can certainly concatenate inside the loop, but it’s more efficient to do it only once, because every call to pd.concat copies all the rows accumulated so far.
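The list-based pattern can be tried end to end with toy data. This is only a sketch: make_row is a hypothetical stand-in for get_pixel_data, and ignore_index=True is an optional extra, not part of the answer above, that renumbers the rows from 0.

```python
import pandas as pd


def make_row(i):
    # Hypothetical stand-in for get_pixel_data(): returns a one-row DataFrame.
    return pd.DataFrame({"row": [i], "value": [i * 10]})


def generate_features(num_samples):
    frames = []  # plain Python list; appending to it is cheap
    for count in range(num_samples):
        frames.append(make_row(count))
    # Single concat after the loop; ignore_index=True gives a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


features = generate_features(4)
print(features)
```

Note that pd.concat raises a ValueError when given an empty list, so code that may run with num_samples equal to 0 needs to handle that case separately.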
For example, suppose you have a list of DataFrames called collector, e.g. for cryptocurrencies, and you want to harvest the first row of two particular columns from each DataFrame in collector. You can do it as follows:
pd.concat([cap[['Ticker', 'Market Cap']].iloc[:1] for cap in collector])
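To make that one-liner concrete, here is a small sketch with invented data. The tickers, the extra Price column, and all numbers are placeholders, not real figures:

```python
import pandas as pd

# Made-up frames standing in for the `collector` list described above.
collector = [
    pd.DataFrame({"Ticker": ["AAA", "AAA"], "Market Cap": [100, 90], "Price": [1.0, 0.9]}),
    pd.DataFrame({"Ticker": ["BBB", "BBB"], "Market Cap": [50, 55], "Price": [2.0, 2.2]}),
]

# Take the first row of two columns from every frame, then concatenate once.
first_rows = pd.concat(
    [cap[["Ticker", "Market Cap"]].iloc[:1] for cap in collector]
)
print(first_rows)
```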