Extracting values from a special string type dataframe column

Question

I have a pandas DataFrame column of string type whose values look like this:

{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}

df:

ID                min_max_config  
1    {"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}  
2    {"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}  
3    {"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}  
4    {"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}
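To experiment with the answers, the sample frame can be rebuilt directly. This is a sketch that assumes ID is an ordinary column (not the index) and that each min_max_config cell is a Python str holding JSON text, which is why it has to be parsed before the lists can be summed.

```python
import json
import pandas as pd

# Sample frame from the question (assumption: ID is a regular column, cells are JSON strings)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "min_max_config": [
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}',
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}',
        '{"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}',
        '{"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}',
    ],
})

# Each cell is plain text; parsing it yields a dict whose values are lists
cell = df.loc[0, "min_max_config"]
first = json.loads(cell)
print(type(cell).__name__, first["min"])
```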

I want to create separate min and max columns containing the sum of the values in each list:

output_df:

ID   min   max   min_max_config
1    1.1   1.4   {"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}
2    1.1   1.5   {"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}
3    1.6   1.7   {"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}
4    1.8   1.2   {"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}

How can I achieve this output?

Asked By: genz_on_code



Answer 1

You can use apply() with axis=1 to iterate over the rows of the DataFrame, and the json module to parse the JSON string in the min_max_config column. For example:

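import json
import pandas as pd

def extract_min_max(row):
    # Parse the JSON string stored in this row
    config = json.loads(row['min_max_config'])
    # Return both sums as a Series so apply() builds a DataFrame with 'min' and 'max' columns
    return pd.Series({'min': sum(config['min']), 'max': sum(config['max'])})

# Join the new columns back onto the original rows (aligned on the index)
output_df = df.join(df.apply(extract_min_max, axis=1))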

This returns output_df, a copy of df with two new columns, min and max, holding the sums of the min and max lists for each row. The original df is left unchanged.
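The row-wise apply() above builds a new Series for every row. Since only one column is involved, a lighter variant (a sketch, not part of the original answer) maps json.loads over that single column and then sums each list. The two-row frame is a subset of the question's sample data.

```python
import json
import pandas as pd

# Two rows taken from the question's sample data
df = pd.DataFrame({
    "ID": [1, 4],
    "min_max_config": [
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}',
        '{"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}',
    ],
})

# Parse each JSON string once; the result is a Series of dicts
parsed = df["min_max_config"].map(json.loads)

# Sum the list under each key; assign() returns a new frame with the extra columns
output_df = df.assign(
    min=parsed.map(lambda d: sum(d["min"])),
    max=parsed.map(lambda d: sum(d["max"])),
)
print(output_df[["ID", "min", "max"]])
```

Using Series.map avoids constructing a Series per row, which is usually cheaper than apply(axis=1) on larger frames.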

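You could also use pandas.json_normalize() (the older pandas.io.json.json_normalize import path is deprecated) to expand the parsed JSON into min and max columns. Because each key maps to a list, those columns hold lists, so each one still has to be summed:

import json
import pandas as pd

# Parse each JSON string, then flatten the dicts into 'min' and 'max' columns of lists
df = df.join(pd.json_normalize(df['min_max_config'].apply(json.loads).tolist()))
# Use bracket access: df.min and df.max are the DataFrame.min()/max() methods, not the columns
df['min'] = df['min'].apply(sum)
df['max'] = df['max'].apply(sum)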
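Columns named min or max cannot be reached with attribute access, because df.min and df.max resolve to the DataFrame.min() and DataFrame.max() methods; calling .apply on them fails. The small sketch below uses a hypothetical one-row frame shaped like the json_normalize result to show the difference.

```python
import pandas as pd

# Hypothetical frame shaped like the json_normalize result: each cell holds a list
df = pd.DataFrame({"min": [[0, 1, 0.1]], "max": [[0, 1, 0.4]]})

# Attribute access gives the bound method, not the column
print(callable(df.min), isinstance(df.min, pd.Series))  # True False

# Bracket access returns the column, whose lists can then be summed
min_sums = df["min"].apply(sum)
max_sums = df["max"].apply(sum)
print(min_sums.tolist(), max_sums.tolist())
```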

Answered By: Zanmato

Answer 2

Here is one way to do it with ast.literal_eval and the pandas.Series.str accessor:

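from ast import literal_eval
import numpy as np

# Convert each string into a Python dict (the sample strings are also valid Python literals)
df["min_max_config"] = df["min_max_config"].apply(literal_eval)

# .str[k] looks up key k in each dict; np.sum adds up the resulting list
out = df.assign(**{k: df['min_max_config'].str[k].apply(np.sum) for k in ["min", "max"]})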

Output:

print(out)

   ID                                              min_max_config  min  max
0   1  {'min': [0, 1, 0.1, 0, 0, 0], 'max': [0, 1, 0.4, 0, 0, 0]}  1.1  1.4
1   2  {'min': [0, 1, 0.1, 0, 0, 0], 'max': [0, 1, 0.5, 0, 0, 0]}  1.1  1.5
2   3  {'min': [0, 1, 0.6, 0, 0, 0], 'max': [0, 1, 0.7, 0, 0, 0]}  1.6  1.7
3   4  {'min': [0, 1, 0.8, 0, 0, 0], 'max': [0, 1, 0.2, 0, 0, 0]}  1.8  1.2
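ast.literal_eval works here because the sample strings happen to be valid Python literals as well as valid JSON. If the real configs may contain JSON-only tokens such as true, false or null, json.loads is the safer parser. The sketch below shows both cases; the enabled key is hypothetical and not part of the question's data.

```python
import json
from ast import literal_eval

# One cell from the question: valid JSON and, coincidentally, a valid Python literal
sample = '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}'
same_on_sample = literal_eval(sample) == json.loads(sample)
print(same_on_sample)  # True

# Hypothetical config using a JSON boolean, which is not Python syntax
text = '{"min":[0,1,0.1], "enabled": true}'
parsed = json.loads(text)  # json maps true to Python True

try:
    literal_eval(text)
    literal_eval_ok = True
except ValueError as exc:
    # 'true' parses as a bare name, which literal_eval refuses to evaluate
    literal_eval_ok = False
    print("literal_eval failed:", exc)
```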

Answered By: Timeless
