Extracting information from a list of json Python

Question:

Identifier Properties  
   1       [{"$id":"2","SMName":"pia.redoabs.com","Type":"sms"},{"$id":"3","Name":"_18_Lucene41_0.doc","Type":"file"}]
   2       [{"$id":"2","SMName":"pred.redocad.com","Type":"sms"},{"$id":"3","Name":"_18_Nil41_0.doc","Type":"file"}]
   3      [{"$id":"2","SMName":"promomaster.com","Type":"sms"},{"$id":"3","Name":"_17_Litre41_0.doc","Type":"file"}]
   4      [{"$id":"2","SMName":"admaster.com","Type":"sms"},{"$id":"3","Name":"_k.pos","Type":"file"}]
   5      [{"$id":"2","SMName":"plan.com.com","Type":"sms"},{"$id":"3","Name":"_3_Lucene41_0.doc","Type":"file"}]
   6     [{"$id":"2","Name":"jm460","ETNDomain":"ent.ad.ent","Sid":"S-1-5-21-117609710-2111687655-839522115-432193","AadUserId":"7133971dffgh5r-b9b8-4af3-bbfd-85b1b56d1f6f","IsDomainJoined":true,"Type":"account","UserPrincipalName":"[email protected]"},{"$id":"3","Directory":"C:\CR\new_cbest_malware","Name":"ent_Survey.zip","hash":[{"$id":"4","Algorithm":"hsa1","Value":"cecce931f21697876efc80f5897a31481c396795","Type":"hash"},{"$id":"5","Algorithm":"MI5","Value":"12c216630a5f24faab06d463c9ce72a5","Type":"hash"},{"$id":"6","Algorithm":"TM345","Value":"cbb327b70a83fefeaf744458f2ed62021e529ce0ece36566761779f17d4c07a6","Type":"hash"}],"CreatedTimeUtc":"2022-08-22T17:42:02.4272869Z","Type":"file"},{"$ref":"4"},{"$ref":"5"},{"$ref":"6"},{"$id":"7","ProcessId":"54884","CommandLine":""7zG.exe" a -i#7zMap23807:40278:7zEvent24942 -ad -saa -- "C:\CR\CR_2"","ElevationToken":"Default","CreationTimeUtc":"2022-10-03T17:59:35.2339055Z","ImageFile":{"$id":"8","Directory":"C:\Program Files\7-Zip","Name":"9zG.exe","FileHashes":[{"$id":"9","Algorithm":"HSA2","Value":"df22612647e9404a515d48ebad490349685250de","Type":"hash"},{"$id":"10","Algorithm":"MI5","Value":"04fb3ae7f05c8bc333125972ba907398","Type":"hash"},{"$id":"11","Algorithm":"hsa1","Value":"2fb898bacb587f2484c9c4aa6da2729079d93d1f923a017bb84beef87bf74fef","Type":"hash"}],"CreatedTimeUtc":"2020-09-21T16:34:33.1299959Z","Type":"file"},"ParentProcess":{"$id":"12","ProcessId":"13516","CreationTimeUtc":"2022-09-21T12:41:32.4609401Z","CreatedTimeUtc":"2022-09-21T12:41:32.4609401Z","Type":"process"},"CreatedTimeUtc":"2022-10-03T17:59:35.2339055Z","Type":"process"},{"$ref":"12"},{"$ref":"8"},{"$ref":"9"},{"$ref":"10"},{"$ref":"11"},{"$id":"13","DnsDomain":"ent.ad.ent.com","HostName":"ilch-l788441","OSFamily":"Windows","OSVersion":"20H2","Tags":[{"ProviderName":"tmdp","TagId":"VwanPov","TagName":"VwanPov","TagType":"UserDefined"},{"ProviderName":"dmpt","TagId":"Proxy Allow Personal Storage","TagName":"Proxy Allow Personal Storage","TagType":"UserDefined"},{"ProviderName":"dmpt","TagId":"Proxy Allow Webmail","TagName":"Proxy Allow Webmail","TagType":"UserDefined"},{"ProviderName":"dmpt","TagId":"proxy-allow-social-media","TagName":"proxy-allow-social-media","TagType":"UserDefined"}],"Type":"host","dmptDeviceId":"fa52ff90ab60ee6eac86ec60ed2ac748a33e29fa","FQDN":"ilch-567.ent.ad.ent.com","AadDeviceId":"e1d59b69-dd3f-4f33-96b5-db9233654c16","RiskScore":"Medium","HealthStatus":"Active","LastSeen":"2022-10-03T18:09:32.7812655","LastExternalIpAddress":"208.95.144.39","LastIpAddress":"10.14.126.52","AvStatus":"Updated","OnboardingStatus":"Onboarded","LoggedOnUsers":[{"AccountName":"jmjklo460","DomainName":"ENT"}]}]

This is a dataframe with 2 columns "Identifier" & "Properties". The "Properties" column appears as a list of json.The aim is to create 2 different columns for "sms" & "file".
I noticed that one of the rows have entirely different information & I would need a few of them.For eg- ENTDomain,UserPrincipalName
The code that you provided works for the first 5 observations but tends to fail when it encounters the 6th observation.
Can we make the code dynamic to be able to extract any values ?
Further I used kql to parse this data & it was relatively straightforward.

Asked By: Lopa

||

Answers:

You can try:

import json

df["Properties"] = df["Properties"].apply(
    lambda x: {
        d["Type"]: (d["SMName"] if d["Type"] == "sms" else d["Name"])
        for d in json.loads(x)
    }
)

df = pd.concat([df, df.pop("Properties").apply(pd.Series)], axis=1)

print(df)

Prints:

   Identifier               sms                file
0           1   pia.redoabs.com  _18_Lucene41_0.doc
1           2  pred.redocad.com     _18_Nil41_0.doc
2           3   promomaster.com   _17_Litre41_0.doc
3           4      admaster.com              _k.pos
4           5      plan.com.com   _3_Lucene41_0.doc
Answered By: Andrej Kesely
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.