How do I CONCAT data from a dataframe to another dataframe?

Question:

I have built the following function, but .append will be removed from pandas in a future version, so I want to convert this code to use concat.

def MyDF(self,DF1,DF2):
    OutputDf = pd.DataFrame([]).reset_index(drop=True)
    for i in range(0,len(DF2)):
        OutputDf = OutputDf.append(DF2.loc[[i]])
        OutputDf = OutputDf.append(DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ])
        OutputDf = OutputDf.reset_index(drop=True)
    return OutputDf

I don’t know how to use concat in this case. How can I avoid .append here?

I am not sure whether this would work:

OutputDf = pd.Concat(OutputDf,DF2.loc[[i]])
Asked By: TourEiffel

||

Answers:

pandas.DataFrame.append and pandas.Series.append have been deprecated since version 1.4.0 and were removed in pandas 2.0. See "Deprecated DataFrame.append and Series.append" in the pandas release notes.

The alternative is using pandas.concat.

In OP’s case, .append() is being used in two cases:

  1. OutputDf = OutputDf.append(DF2.loc[[i]])

  2. OutputDf = OutputDf.append(DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ])


Case 1

This can be changed to the following:

OutputDf = pd.concat([OutputDf, DF2.loc[[i]]], ignore_index=True)

Case 2

This can be changed to the following:

OutputDf = pd.concat([OutputDf, DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ]], ignore_index=True)
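
Putting both cases together, here is a minimal sketch of the whole loop. It is not the OP's exact code: the sample data and the function name build_output are hypothetical, and it assumes both dataframes have a default 0..n-1 index, as the original loop does. Instead of calling pd.concat on every iteration, it collects the pieces in a list and concatenates once at the end, which avoids copying the growing result each time.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
DF1 = pd.DataFrame({
    "TheName": ["a", "a", "b", "b"],
    "WGT": ["1.5", "0", "2.0", "-1"],
})
DF2 = pd.DataFrame({"TheName": ["a", "b"], "WGT": ["10", "20"]})


def build_output(DF1, DF2):
    # Hypothetical helper: gather the pieces first, concatenate once.
    positive = DF1["WGT"].astype(float) > 0  # same test as the lambda in the question
    pieces = []
    for i in range(len(DF2)):
        pieces.append(DF2.loc[[i]])  # Case 1: the i-th row of DF2
        # Case 2: the DF1 rows with the same name and a positive weight
        pieces.append(DF1.loc[(DF1["TheName"] == DF2["TheName"][i]) & positive])
    if not pieces:  # pd.concat raises ValueError on an empty list
        return pd.DataFrame()
    return pd.concat(pieces, ignore_index=True)


result = build_output(DF1, DF2)
print(result)
```

With this sample data, each DF2 row is followed by its matching DF1 rows, and the index is renumbered from 0.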

Notes:

  • As I do not have access to the dataframes and do not know the desired output, some adjustments might be needed.
Answered By: Gonçalo Peres

I think pandas.concat() is easy to understand, so you can say goodbye to append and keep up with pandas.

To begin with, just pay attention to the objs, ignore_index and axis arguments. To stack rows one under the other, pass axis=0 (the default); this concatenates DataFrame objects vertically, like .append() did. With axis=1, the concatenation is done horizontally. As the documentation says:

axis : {0/’index’, 1/’columns’}, default 0
The axis to concatenate along.

Also, you can use ignore_index instead of reset_index: passing ignore_index=True renumbers the resulting index from 0.
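
To make the effect of axis and ignore_index concrete, here is a small sketch. The frames top and bottom are made-up examples, not data from the question.

```python
import pandas as pd

# Made-up frames for illustration only.
top = pd.DataFrame({"x": [1, 2]})
bottom = pd.DataFrame({"x": [3, 4]})

# axis=0 (the default) stacks rows; the original index labels are kept.
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([top, bottom], ignore_index=True)

# axis=1 places the frames side by side, aligned on the index.
side_by_side = pd.concat([top, bottom.rename(columns={"x": "y"})], axis=1)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(side_by_side.shape)         # (2, 2)
```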

In summary, for the two dataframes in your question, you can use something like this:

def MyDF(self,DF1,DF2):
    OutputDf = pd.DataFrame([]).reset_index(drop=True)
    for i in range(0,len(DF2)):
        process1 = DF2.loc[[i]]
        process2 = DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ]
        # Include OutputDf in the list so rows from earlier iterations are kept
        OutputDf = pd.concat([OutputDf, process1, process2], ignore_index=True)
    return OutputDf

You can make this code shorter, but at the cost of readability. You may want to use:

def MyDF(self,DF1,DF2):
    OutputDf = pd.DataFrame([]).reset_index(drop=True)
    for i in range(0,len(DF2)):
        # OutputDf comes first in the list so earlier rows are not overwritten
        OutputDf = pd.concat([OutputDf, DF2.loc[[i]], DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ]], ignore_index=True)
    return OutputDf

Alternatively, you could return the pd.concat() expression directly, but that is harder to read, so it is your decision. Just don’t forget the [] and be careful with how concat is called:

pd.concat([process1, process2])  # use [] inside concat for dataframes

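If you call pd.concat(process1, process2) directly, it will raise an error (a TypeError), because concat expects a single iterable of objects, such as a list, as its first argument.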
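
A quick way to see this for yourself is the sketch below. The frames are made up, and the exact error message differs between pandas versions, so only the exception type is checked.

```python
import pandas as pd

# Made-up one-row frames for illustration only.
process1 = pd.DataFrame({"x": [1]})
process2 = pd.DataFrame({"x": [2]})

try:
    pd.concat(process1, process2)  # wrong: the frames are not wrapped in a list
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

combined = pd.concat([process1, process2], ignore_index=True)  # correct
print(combined)
```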

Answered By: Furkan Akdag