Exploding specific type of json, into columns pandas
Question:
So i have the following sample json, contained into a series of mine
s = pd.Series(['{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityAddressConfirmation","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}'])
I dont actually know a way of exploding this type of json, into columns, would appreciate some help? I tried json_normalize loads and so on but i get empty results.
Wanted result would be something like this:
df = pd.DataFrame({'city':['Uberlandia','Uberlandia','Uberlandia'],'bot-origin':[null,null,null]}) # There are more columns but you get the jist.
Since there were a lot of answers would appreciate if someone showed me the most time efficient way, have lots of rows.
Answers:
The following should work
import json
s = pd.Series(['{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityAddressConfirmation","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}'])
df = s.apply(lambda row: pd.Series(json.loads(row)))
Output:
city
bot-origin
campaign-source
lastState
main-installation-date
userid
full-name
alternative-installation-date
chosen-product
bank
postalcode
due-date
cpf
origin-link
payment
state
api-orders-hash-id
email
api-orders-error
plan-name
userphone
plan-offer
completed-address
type-of-person
onboarding-simplified
type-of-product
main-installation-period-day
plan-value
alternative-installation-period-day
data-change
0
Uberlândia
carrinho-abandonado-ecommerce-sms
productAvailabilityCepInDatabaseEqualTrue
[email protected]
Internet
38405328
01548226041
MG
34 9342-8011
38405328 – R IGUACU, 1289 – UMUARAMA – null – Uberlândia – MG
CPF
Residencial
false
1
Uberlândia
carrinho-abandonado-ecommerce-sms
productAvailabilityCepInDatabaseEqualTrue
[email protected]
Internet
38405328
01548226041
MG
34 9342-8011
38405328 – R IGUACU, 1289 – UMUARAMA – null – Uberlândia – MG
CPF
Residencial
false
2
Uberlândia
carrinho-abandonado-ecommerce-sms
productAvailabilityAddressConfirmation
[email protected]
Internet
38405328
01548226041
MG
34 9342-8011
38405328 – R IGUACU, 1289 – UMUARAMA – null – Uberlândia – MG
CPF
Residencial
false
The null
values are converted to None
. If you want to convert them to NaN you can use .fillna(np.nan)
:
import json
import numpy as np
df = (
s.apply(lambda row: pd.Series(json.loads(row)))
.fillna(np.nan)
)
Try this:
df = pd.DataFrame.from_records(s.map(json.loads))
print(df)
only printed a part of the whole df for presentation.
city bot-origin campaign-source lastState main-installation-date
0 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue None
1 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue None
2 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityAddressConfirmation None
Looks like you have string
formatted json objects, first we need to convert them then pass them into a json_normalize
method.
import json
df1 = pd.json_normalize(s.map(json.loads))
city bot-origin campaign-source lastState ... main-installation-period-day plan-value alternative-installation-period-day data-change
0 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue ... None None None false
1 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue ... None None None false
2 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityAddressConfirmation ... None None None false
So i have the following sample json, contained into a series of mine
s = pd.Series(['{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityAddressConfirmation","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}'])
I dont actually know a way of exploding this type of json, into columns, would appreciate some help? I tried json_normalize loads and so on but i get empty results.
Wanted result would be something like this:
df = pd.DataFrame({'city':['Uberlandia','Uberlandia','Uberlandia'],'bot-origin':[null,null,null]}) # There are more columns but you get the jist.
Since there were a lot of answers would appreciate if someone showed me the most time efficient way, have lots of rows.
The following should work
import json
s = pd.Series(['{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityCepInDatabaseEqualTrue","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}',
'{"city":"Uberlândia","bot-origin":null,"campaign-source":"carrinho-abandonado-ecommerce-sms","lastState":"productAvailabilityAddressConfirmation","main-installation-date":null,"userid":"[email protected]","full-name":null,"alternative-installation-date":null,"chosen-product":"Internet","bank":null,"postalcode":"38405328","due-date":null,"cpf":"01548226041","origin-link":"","payment":null,"state":"MG","api-orders-hash-id":null,"email":null,"api-orders-error":null,"plan-name":null,"userphone":"34 9342-8011","plan-offer":null,"completed-address":"38405328 - R IGUACU, 1289 - UMUARAMA - null - Uberlândia - MG","type-of-person":"CPF","onboarding-simplified":null,"type-of-product":"Residencial","main-installation-period-day":null,"plan-value":null,"alternative-installation-period-day":null,"data-change":"false"}'])
df = s.apply(lambda row: pd.Series(json.loads(row)))
Output:
city | bot-origin | campaign-source | lastState | main-installation-date | userid | full-name | alternative-installation-date | chosen-product | bank | postalcode | due-date | cpf | origin-link | payment | state | api-orders-hash-id | api-orders-error | plan-name | userphone | plan-offer | completed-address | type-of-person | onboarding-simplified | type-of-product | main-installation-period-day | plan-value | alternative-installation-period-day | data-change | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Uberlândia | carrinho-abandonado-ecommerce-sms | productAvailabilityCepInDatabaseEqualTrue | [email protected] | Internet | 38405328 | 01548226041 | MG | 34 9342-8011 | 38405328 – R IGUACU, 1289 – UMUARAMA – null – Uberlândia – MG | CPF | Residencial | false | |||||||||||||||||
1 | Uberlândia | carrinho-abandonado-ecommerce-sms | productAvailabilityCepInDatabaseEqualTrue | [email protected] | Internet | 38405328 | 01548226041 | MG | 34 9342-8011 | 38405328 – R IGUACU, 1289 – UMUARAMA – null – Uberlândia – MG | CPF | Residencial | false | |||||||||||||||||
2 | Uberlândia | carrinho-abandonado-ecommerce-sms | productAvailabilityAddressConfirmation | [email protected] | Internet | 38405328 | 01548226041 | MG | 34 9342-8011 | 38405328 – R IGUACU, 1289 – UMUARAMA – null – Uberlândia – MG | CPF | Residencial | false |
The null
values are converted to None
. If you want to convert them to NaN you can use .fillna(np.nan)
:
import json
import numpy as np
df = (
s.apply(lambda row: pd.Series(json.loads(row)))
.fillna(np.nan)
)
Try this:
df = pd.DataFrame.from_records(s.map(json.loads))
print(df)
only printed a part of the whole df for presentation.
city bot-origin campaign-source lastState main-installation-date
0 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue None
1 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue None
2 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityAddressConfirmation None
Looks like you have string
formatted json objects, first we need to convert them then pass them into a json_normalize
method.
import json
df1 = pd.json_normalize(s.map(json.loads))
city bot-origin campaign-source lastState ... main-installation-period-day plan-value alternative-installation-period-day data-change
0 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue ... None None None false
1 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityCepInDatabaseEqualTrue ... None None None false
2 Uberlândia None carrinho-abandonado-ecommerce-sms productAvailabilityAddressConfirmation ... None None None false