Assign column from one dataframe to another, if it doesn't exist keep it as an empty column

Question:

I’m scraping data from a private API, I’ve managed to gather all the responses I need and then transform the json object into a dataframe, let’s call it raw_df.
What I’d like to do now is to convert this dataframe into another dataframe, called df, with a specific layout so it can be used in another area that uses excel only.
Raw_df consists of 32 columns of which only 12 are useful to me, the desired output in df should be those 12 columns with changed headers, as well as changed datatypes.
The main problem arises when raw_df is missing one of the columns that I need, which gives an error since I can’t assign it to df (raw_df could be missing one or more columns, it depends on the api call), in this case it would mean that such column in df should be empty.

I’ve came up with a rather unelegant solution, I initialized an empty dataframe with 12 columns with their respective datatypes, then assign each column in raw_df to its counterpart in df. Then I added a try except to ignore the error in the cases in which there may not be a column in raw_df.

df = pd.DataFrame({"ID de respuesta": pd.Series(dtype='int'),
                       "TiendaID": pd.Series(dtype='int'),
                       "Dirección IP": pd.Series(dtype='str'),
                       "Marca de tiempo (dd/mm/yyyy)": pd.Series(dtype='float'),
                       "Tiempo necesario para completar (segundos)": pd.Series(dtype='int'),
                       "Latitud": pd.Series(dtype='float'),
                       "Longitud": pd.Series(dtype='float'),
                       "¿El vendedor te recomendó algún producto adicional?": pd.Series(dtype='int'),
                       "¿Cuán probable es que recomiendes las tiendas Sx a un familiar o amigo?": pd.Series(dtype='int'),
                       "¿Por qué nos felicitas?": pd.Series(dtype='str'),
                       "¿En qué debemos mejorar?": pd.Series(dtype='str'),
                       "¿En qué fallamos?": pd.Series(dtype='str')})
    
    df["ID de respuesta"] = raw_df["responseID"]
    df["TiendaID"] = raw_df["surveyID"]
    df["Dirección IP"] = raw_df["ipAddress"]
    df["Marca de tiempo (dd/mm/yyyy)"] = raw_df["timestamp"]
    df["Tiempo necesario para completar (segundos)"] = raw_df["timeTaken"]
    df["Latitud"] = raw_df["location.latitude"]
    df["Longitud"] = raw_df["location.longitude"]
    df["¿El vendedor te recomendó algún producto adicional?"] = raw_df["¿El vendedor te recomendó algún producto adicional? "]
    df["¿Cuán probable es que recomiendes las tiendas Sx a un familiar o amigo?"] = raw_df["¿Cuán probable es que recomiendes las tiendas Sx a un familiar o amigo?"]
    
    try:
        df["¿Por qué nos felicitas?"] = raw_df["¿Por qué nos felicitas?"]
        df["¿En qué debemos mejorar?"] = raw_df["¿En qué debemos mejorar?"]
        df["¿En qué fallamos?"] = raw_df["¿En qué fallamos?"]
    except Exception:
        pass 

I would like help to find a more pythonic way to do this, since I think this is not the best way. Thank you for your help.

Asked By: Irving Pérez

||

Answers:

You could e.g. specify the mapping from your column names in df to the respective column names in raw_df in a dictionary d and then iterate over the key, val pairs of it and first check whether the column is present in raw_df with an if statement, and only assign it in that case, as follows:

d = {
    "ID de respuesta": "responseID",
    "TiendaID" = "surveyID",
     ...
}

for key, val in d.items():
    if val in raw_df.columns:
        df[key] = raw_df[val]
Answered By: Michael Hodel
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.