Extracting values from a special string type dataframe column

Question:

I have a string-type pandas dataframe column:

{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}  

df:

ID                min_max_config  
1    {"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}  
2    {"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}  
3    {"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}  
4    {"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}  

I want to make separate columns out of the sum of the values of min and max:

output_df:

ID   min    max                 min_max_config
1    1.1    1.4        {"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}
2    1.1    1.5        {"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}
3    1.6    1.7        {"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}
4    1.8    1.2        {"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}

How can I achieve this output?

Asked By: genz_on_code


Answers:

You can use the apply() function to iterate through each row of the dataframe, and the json library to parse the JSON strings in the min_max_config column. Here is an example of how you can achieve this:

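import json
import pandas as pd

def extract_min_max(row):
    # Parse the JSON text in this row, then sum each list
    config = json.loads(row['min_max_config'])
    return pd.Series({'min': sum(config['min']), 'max': sum(config['max'])})

# axis=1 calls the function once per row; join aligns the result on the index
output_df = df.join(df.apply(extract_min_max, axis=1))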

This will add two new columns, min and max, to the original dataframe df, containing the sum of the values of the min and max lists respectively, for each row.
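
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question. It is a variation rather than the exact code above: it parses the column once and sums per column instead of going row by row. The name parsed is only illustrative.

```python
import json

import pandas as pd

# Sample data copied from the question; each cell is JSON text, not a dict.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "min_max_config": [
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}',
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}',
        '{"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}',
        '{"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}',
    ],
})

# Parse every JSON string once; the result is a Series of dicts.
parsed = df["min_max_config"].map(json.loads)

# assign returns a new frame, so df itself is left unchanged.
output_df = df.assign(
    min=parsed.map(lambda config: sum(config["min"])),
    max=parsed.map(lambda config: sum(config["max"])),
)

print(output_df[["ID", "min", "max"]])
```

Parsing once keeps the JSON decoding separate from the summing, which can make it easier to add further derived columns later.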

You could also use pandas.json_normalize() (available as pandas.io.json.json_normalize() before pandas 1.0) to expand the parsed JSON into min and max columns that hold the lists, and then sum each list.

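import json
import pandas as pd

# pd.json_normalize is the pandas >= 1.0 home of pandas.io.json.json_normalize
parsed = df['min_max_config'].apply(json.loads).tolist()
df = df.join(pd.json_normalize(parsed))  # assumes df has the default 0..n-1 index
# Use bracket indexing: df.min and df.max are DataFrame methods, not the new columns
df['min'] = df['min'].apply(sum)
df['max'] = df['max'].apply(sum)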
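
A self-contained sketch of the json_normalize route, assuming pandas 1.0 or later and a default integer index. It uses the first two rows of the question's data; the names lists and out are only illustrative. It also shows why the columns have to be reached with brackets: attribute access such as df.min returns the DataFrame method.

```python
import json

import pandas as pd

# First two rows of the question's data.
df = pd.DataFrame({
    "ID": [1, 2],
    "min_max_config": [
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}',
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}',
    ],
})

# json_normalize takes a list of dicts; each list value stays in a single cell.
parsed = df["min_max_config"].apply(json.loads).tolist()
lists = pd.json_normalize(parsed)  # columns "min" and "max", one list per cell

# Sum each list. Bracket indexing is required because lists.min is a method.
sums = pd.DataFrame({
    "min": lists["min"].map(sum),
    "max": lists["max"].map(sum),
})

# Both frames share the default RangeIndex, so join lines the rows up.
out = df.join(sums)
print(out[["ID", "min", "max"]])
```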
Answered By: Zanmato

Here is one way to do it with the help of ast.literal_eval and pandas.Series.str:

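from ast import literal_eval

import numpy as np

# Turn each string into a dict (this overwrites the original string column)
df["min_max_config"] = df["min_max_config"].apply(literal_eval)

# .str[k] looks up key k in each dict; np.sum adds up the resulting list
out = df.assign(**{k: df['min_max_config'].str[k].apply(np.sum) for k in ["min", "max"]})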

Output:

print(out)

   ID                                              min_max_config  min  max
0   1  {'min': [0, 1, 0.1, 0, 0, 0], 'max': [0, 1, 0.4, 0, 0, 0]}  1.1  1.4
1   2  {'min': [0, 1, 0.1, 0, 0, 0], 'max': [0, 1, 0.5, 0, 0, 0]}  1.1  1.5
2   3  {'min': [0, 1, 0.6, 0, 0, 0], 'max': [0, 1, 0.7, 0, 0, 0]}  1.6  1.7
3   4  {'min': [0, 1, 0.8, 0, 0, 0], 'max': [0, 1, 0.2, 0, 0, 0]}  1.8  1.2
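
One caveat worth knowing: literal_eval works on this data because each string happens to be a valid Python literal as well as valid JSON. The sketch below shows where the two parsers diverge. The second string is a made-up example, not data from the question.

```python
import json
from ast import literal_eval

# A string from the question: valid JSON and also a valid Python literal.
python_compatible = '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}'

# Hypothetical string using JSON-only tokens (null, true) that Python lacks.
json_only = '{"min":[0,null,0.1], "enabled": true}'

# Both parsers give the same dict for the question's data.
same_result = literal_eval(python_compatible) == json.loads(python_compatible)

# literal_eval rejects null and true because they are not Python literals.
try:
    literal_eval(json_only)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False

# json.loads maps null to None and true to True.
decoded = json.loads(json_only)

print(same_result, literal_eval_ok, decoded)
```

If the column may ever contain null, true or false, json.loads is the safer parser.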
Answered By: Timeless