Extracting values from a special string type dataframe column
Question:
I have a string-type pandas dataframe column:
{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}
df:
ID min_max_config
1 {"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}
2 {"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}
3 {"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}
4 {"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}
I want to make separate columns out of the sum of the values of min and max:
output_df:
ID min max min_max_config
1 1.1 1.4 {"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}
2 1.1 1.5 {"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}
3 1.6 1.7 {"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}
4 1.8 1.2 {"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}
How can I achieve this output?
Answers:
You can use the apply() function to iterate through each row of the dataframe, and the json library to parse the JSON strings in the min_max_config column. Here is an example of how you can achieve this:
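import json
import pandas as pd

def extract_min_max(row):
    # Parse the JSON string and sum each list
    config = json.loads(row['min_max_config'])
    return pd.Series({'min': sum(config['min']), 'max': sum(config['max'])})

# Join the two new columns onto the original frame
output_df = df.join(df.apply(extract_min_max, axis=1))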
This returns a new dataframe, output_df, with two added columns, min and max, containing the sums of the min and max lists for each row. The original df is left unchanged.
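To try this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample in the question, and the variable name parsed is my own. It is a column-wise variant of the same idea (parse the column once, then sum each list), not the row-wise apply shown above, so treat it as one possible arrangement rather than the only way.

```python
import json
import pandas as pd

# Sample data copied from the question: each cell is a JSON string.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "min_max_config": [
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.4,0,0,0]}',
        '{"min":[0,1,0.1,0,0,0], "max":[0,1,0.5,0,0,0]}',
        '{"min":[0,1,0.6,0,0,0], "max":[0,1,0.7,0,0,0]}',
        '{"min":[0,1,0.8,0,0,0], "max":[0,1,0.2,0,0,0]}',
    ],
})

# Parse every JSON string once; the result is a Series of dicts.
parsed = df["min_max_config"].map(json.loads)

# Sum the list stored under each key and attach the totals as new columns.
output_df = df.assign(
    min=parsed.map(lambda config: sum(config["min"])),
    max=parsed.map(lambda config: sum(config["max"])),
)

print(output_df[["ID", "min", "max"]])
```

Parsing once and mapping over the resulting Series avoids building a pd.Series object per row, which may matter on larger frames.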
You could also use pandas.json_normalize() (the older pandas.io.json.json_normalize import path has been deprecated since pandas 1.0) to expand the parsed JSON into separate list-valued min and max columns, and then sum each list.
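import json
import pandas as pd

# Parse each string, then expand the dicts into list-valued min and max columns
# (.tolist() yields a default 0..n-1 index, so df is assumed to have one too)
df = df.join(pd.json_normalize(df['min_max_config'].apply(json.loads).tolist()))
# Bracket access is required: df.min and df.max are DataFrame methods
df['min'] = df['min'].apply(sum)
df['max'] = df['max'].apply(sum)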
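One thing to watch with columns named min and max: attribute-style access (df.min, df.max) resolves to the DataFrame methods of the same name, not to the columns. A small sketch with a made-up one-row frame shows the difference.

```python
import pandas as pd

# Hypothetical one-row frame whose columns already hold lists.
df = pd.DataFrame({"min": [[0, 1, 0.1]], "max": [[0, 1, 0.4]]})

# Attribute access returns the bound DataFrame.min method, so .apply is not available on it.
print(callable(df.min))  # True

# Bracket access always returns the column as a Series.
print(type(df["min"]).__name__)  # Series

# Summing each list therefore has to go through bracket access.
df["min"] = df["min"].apply(sum)
df["max"] = df["max"].apply(sum)
```

The same applies to any column whose name collides with a DataFrame attribute, such as count or sum.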
Here is one way to do it with the help of ast.literal_eval and pandas.Series.str:
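from ast import literal_eval
import numpy as np
# Convert each string into a Python dict
df["min_max_config"] = df["min_max_config"].apply(literal_eval)
# .str[k] pulls the list stored under key k from each dict; np.sum totals it
out = df.assign(**{k: df['min_max_config'].str[k].apply(np.sum) for k in ["min", "max"]})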
Output:
print(out)
ID min_max_config min max
0 1 {'min': [0, 1, 0.1, 0, 0, 0], 'max': [0, 1, 0.4, 0, 0, 0]} 1.1 1.4
1 2 {'min': [0, 1, 0.1, 0, 0, 0], 'max': [0, 1, 0.5, 0, 0, 0]} 1.1 1.5
2 3 {'min': [0, 1, 0.6, 0, 0, 0], 'max': [0, 1, 0.7, 0, 0, 0]} 1.6 1.7
3 4 {'min': [0, 1, 0.8, 0, 0, 0], 'max': [0, 1, 0.2, 0, 0, 0]} 1.8 1.2