Read CSV with JSON feature
Question:
I am trying to read a large CSV that includes JSON features (location here). For roughly the first 100 lines, the file looks like this:
Time,location,labelA,labelB
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
I followed this question to parse the location column. The solution basically defines a helper as:
def CustomParser(data):
    import json
    j1 = json.loads(data)
    return j1
and then
df=pd.read_csv('data.csv', nrows=100,converters={'location':CustomParser},header=0)
I get the following error, which is related to the JSON format:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Q1: How can I parse the location feature into new columns?
Q2 (general case): Beyond the first 100 rows, the last features (labelA and labelB) also contain JSON, with different keys and values. How can I read the entire CSV and parse every feature that contains JSON (even partially)?
test100v1.csv
Zeit,device,Text,Typ,Position,Data,Data1,Data2
2019-09-10T12:13:24.000Z,CO 5052994,Lifesign,cgmon_Lifesign,{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582},N/A,N/A,N/A
2019-09-10T12:13:23.000Z,CO 5050450,Lifesign,cgmon_Lifesign,{"lng":14.3195367,"alt":260.0,"time":"2019-09-10T12:12:37Z","error":10.0,"lat":48.2695571},N/A,N/A,N/A
2019-09-10T12:13:21.000Z,CO 5050903,Location updated,c8y_LocationUpdate,{"lng":15.2678846,"alt":494.0,"time":"2019-09-10T12:13:21Z","error":11.0,"lat":48.7477466},N/A,N/A,N/A
2019-09-10T12:13:20.000Z,CO 5051466,Location updated,c8y_LocationUpdate,{"lng":17.64815,"alt":106.0,"time":"2019-09-10T12:13:20Z","error":3.0,"lat":47.6851036},N/A,N/A,N/A
2019-09-10T12:13:20.000Z,CO 5050569,Location updated,c8y_LocationUpdate,{"lng":14.0582286,"alt":286.0,"time":"2019-09-10T12:13:20Z","error":14.0,"lat":48.1808019},N/A,N/A,N/A
2019-09-10T12:13:18.000Z,CO 5050666,Location updated,c8y_LocationUpdate,{"lng":14.5788998,"alt":25.0,"time":"2019-09-10T12:13:18Z","error":12.0,"lat":53.4233772},N/A,N/A,N/A
2019-09-10T12:13:17.000Z,CO 5051113,Location updated,c8y_LocationUpdate,{"lng":14.325237,"alt":254.0,"time":"2019-09-10T12:13:17Z","error":13.0,"lat":48.2600698},N/A,N/A,N/A
2019-09-10T12:13:10.000Z,CO 5050666,Lifesign,cgmon_Lifesign,{"lng":14.5788998,"alt":25.0,"time":"2019-09-10T12:13:18Z","error":12.0,"lat":53.4233772},N/A,N/A,N/A
2019-09-10T12:13:07.000Z,CO 5051887,Location updated,c8y_LocationUpdate,{"lng":13.8064589,"alt":510.0,"time":"2019-09-10T12:13:07Z","error":10.0,"lat":46.5672814},N/A,N/A,N/A
2019-09-10T12:12:58.000Z,CO 5051131,Lifesign,cgmon_Lifesign,{"lng":11.4933341,"alt":581.0,"time":"2019-09-10T12:08:43Z","error":13.0,"lat":47.2738262},N/A,N/A,N/A
2019-09-10T12:12:55.000Z,CO 5051696,Lifesign,cgmon_Lifesign,{"lng":14.3200391,"alt":249.0,"time":"2019-09-10T12:04:38Z","error":10.0,"lat":48.26912},N/A,N/A,N/A
2019-09-10T12:12:48.000Z,CO 5051326,Lifesign,cgmon_Lifesign,{"lng":9.7326865,"alt":403.0,"time":"2019-09-10T12:04:34Z","error":10.0,"lat":47.4595067},N/A,N/A,N/A
2019-09-10T12:12:47.000Z,CO 5052218,Lifesign,cgmon_Lifesign,{"lng":14.3197285,"alt":262.0,"time":"2019-09-10T12:10:11Z","error":9.0,"lat":48.2688562},N/A,N/A,N/A
2019-09-10T12:12:45.000Z,CO 5050405,Lifesign,cgmon_Lifesign,{"lng":14.2755301,"alt":253.0,"time":"2019-09-08T12:13:37Z","error":8.0,"lat":48.2468603},N/A,N/A,N/A
2019-09-10T12:12:44.000Z,CO 5050706,Lifesign,cgmon_Lifesign,{"lng":15.0519029,"alt":124.0,"time":"2019-09-10T12:07:07Z","error":13.0,"lat":59.0569164},N/A,N/A,N/A
2019-09-10T12:12:42.000Z,CO 5050903,Lifesign,cgmon_Lifesign,{"lng":15.2678846,"alt":494.0,"time":"2019-09-10T12:13:21Z","error":11.0,"lat":48.7477466},N/A,N/A,N/A
2019-09-10T12:12:38.000Z,CO 5051303,Lifesign,cgmon_Lifesign,{"lng":21.9561564,"alt":244.0,"time":"2019-09-10T09:04:08Z","error":11.0,"lat":42.9978861},N/A,N/A,N/A
2019-09-10T12:12:37.000Z,CO 5051558,Location updated,c8y_LocationUpdate,{"lng":13.806765,"alt":514.0,"time":"2019-09-10T12:12:37Z","error":6.0,"lat":46.5672868},N/A,N/A,N/A
2019-09-10T12:12:37.000Z,CO 5050450,Location updated,c8y_LocationUpdate,{"lng":14.3195367,"alt":260.0,"time":"2019-09-10T12:12:37Z","error":10.0,"lat":48.2695571},N/A,N/A,N/A
2019-09-10T12:12:37.000Z,CO 5050450,Location updated,c8y_LocationUpdate,{"lng":14.3195367,"alt":260.0,"time":"2019-09-10T12:12:37Z","error":10.0,"lat":48.2695571},N/A,N/A,N/A
2019-09-10T12:12:26.000Z,CO 5050408,Lifesign,cgmon_Lifesign,{"lng":14.2761472,"alt":280.0,"time":"2019-09-08T12:13:28Z","error":11.0,"lat":48.246868},N/A,N/A,N/A
2019-09-10T12:12:25.000Z,CO 5051418,Location updated,c8y_LocationUpdate,{"lng":15.5343521,"alt":550.0,"time":"2019-09-10T12:12:25Z","error":11.0,"lat":48.7483843},N/A,N/A,N/A
2019-09-10T12:12:24.000Z,CO 5050556,Location updated,c8y_LocationUpdate,{"lng":13.0783658,"alt":435.0,"time":"2019-09-10T12:12:24Z","error":6.0,"lat":47.7692905},N/A,N/A,N/A
2019-09-10T12:12:22.000Z,CO 5052730,Lifesign,cgmon_Lifesign,{"lng":14.3180816,"alt":251.0,"time":"2019-09-10T12:07:29Z","error":14.0,"lat":48.2771342},N/A,N/A,N/A
2019-09-10T12:12:11.000Z,CO 5051654,Location updated,c8y_LocationUpdate,{"lng":15.3298821,"alt":404.0,"time":"2019-09-10T12:12:11Z","error":13.0,"lat":47.1319909},N/A,N/A,N/A
2019-09-10T12:12:01.000Z,CO 5051400,Location updated,c8y_LocationUpdate,{"lng":13.4580769,"alt":306.0,"time":"2019-09-10T12:12:01Z","error":6.0,"lat":48.4494078},N/A,N/A,N/A
2019-09-10T12:11:25.000Z,CO 5050495,Location updated,c8y_LocationUpdate,{"lng":13.3380207,"alt":423.0,"time":"2019-09-10T12:11:25Z","error":14.0,"lat":48.6001935},N/A,N/A,N/A
2019-09-10T12:11:15.000Z,CO 5052483,Motion started,c8y_MotionDetected,{"lng":12.0622763,"alt":511.0,"time":"2019-09-10T12:11:04Z","error":5.0,"lat":47.4938857},N/A,N/A,N/A
2019-09-10T12:11:13.000Z,CO 5052999,Location updated,c8y_LocationUpdate,{"lng":13.06406,"alt":425.0,"time":"2019-09-10T12:11:13Z","error":5.0,"lat":47.8167399},N/A,N/A,N/A
2019-09-10T12:11:04.000Z,CO 5052483,Location updated,c8y_LocationUpdate,{"lng":12.0622763,"alt":511.0,"time":"2019-09-10T12:11:04Z","error":5.0,"lat":47.4938857},N/A,N/A,N/A
2019-09-10T12:11:01.000Z,CO 5051844,Location updated,c8y_LocationUpdate,{"lng":11.5022149,"alt":556.0,"time":"2019-09-10T12:11:01Z","error":6.0,"lat":47.2765674},N/A,N/A,N/A
2019-09-10T12:11:01.000Z,CO 5051920,Lifesign,cgmon_Lifesign,{"lng":15.0575633,"alt":619.0,"time":"2019-09-10T12:10:44Z","error":13.0,"lat":47.3821983},N/A,N/A,N/A
2019-09-10T12:10:59.000Z,CO 5051679,Location updated,c8y_LocationUpdate,{"lng":15.0565198,"alt":599.0,"time":"2019-09-10T12:10:59Z","error":14.0,"lat":47.3821768},N/A,N/A,N/A
2019-09-10T12:10:55.000Z,CO 5050630,Location updated,c8y_LocationUpdate,{"lng":15.0587754,"alt":596.0,"time":"2019-09-10T12:10:55Z","error":14.0,"lat":47.3820239},N/A,N/A,N/A
2019-09-10T12:10:52.000Z,CO 5051844,Lifesign,cgmon_Lifesign,{"lng":11.5022149,"alt":556.0,"time":"2019-09-10T12:11:01Z","error":6.0,"lat":47.2765674},N/A,N/A,N/A
2019-09-10T12:10:51.000Z,CO 5052999,Lifesign,cgmon_Lifesign,{"lng":13.06406,"alt":425.0,"time":"2019-09-10T12:11:13Z","error":5.0,"lat":47.8167399},N/A,N/A,N/A
2019-09-10T12:10:50.000Z,CO 5051921,Lifesign,cgmon_Lifesign,{"lng":15.0581282,"alt":606.0,"time":"2019-09-10T12:10:36Z","error":6.0,"lat":47.3817808},N/A,N/A,N/A
2019-09-10T12:10:49.000Z,CO 5051679,Lifesign,cgmon_Lifesign,{"lng":15.0565198,"alt":599.0,"time":"2019-09-10T12:10:59Z","error":14.0,"lat":47.3821768},N/A,N/A,N/A
2019-09-10T12:10:47.000Z,CO 5050630,Lifesign,cgmon_Lifesign,{"lng":15.0587754,"alt":596.0,"time":"2019-09-10T12:10:55Z","error":14.0,"lat":47.3820239},N/A,N/A,N/A
2019-09-10T12:10:44.000Z,CO 5051920,Location updated,c8y_LocationUpdate,{"lng":15.0575633,"alt":619.0,"time":"2019-09-10T12:10:44Z","error":13.0,"lat":47.3821983},N/A,N/A,N/A
2019-09-10T12:10:41.000Z,CO 5051088,Location updated,c8y_LocationUpdate,{"lng":16.6432683,"alt":161.0,"time":"2019-09-10T12:10:41Z","error":8.0,"lat":48.3200659},N/A,N/A,N/A
2019-09-10T12:10:41.000Z,CO 5050020,Location updated,c8y_LocationUpdate,{"lng":15.9287275,"alt":193.0,"time":"2019-09-10T12:10:41Z","error":8.0,"lat":48.3246395},N/A,N/A,N/A
2019-09-10T12:10:40.000Z,CO 5052681,Location updated,c8y_LocationUpdate,{"lng":16.4388427,"alt":173.0,"time":"2019-09-10T12:10:40Z","error":8.0,"lat":48.1359584},N/A,N/A,N/A
2019-09-10T12:10:36.000Z,CO 5051921,Location updated,c8y_LocationUpdate,{"lng":15.0581282,"alt":606.0,"time":"2019-09-10T12:10:36Z","error":6.0,"lat":47.3817808},N/A,N/A,N/A
2019-09-10T12:10:35.000Z,CO 5051406,Location updated,c8y_LocationUpdate,{"lng":19.0824957,"alt":108.0,"time":"2019-09-10T12:10:35Z","error":7.0,"lat":47.4680908},N/A,N/A,N/A
2019-09-10T12:10:33.000Z,CO 5052676,Location updated,c8y_LocationUpdate,{"lng":16.4368017,"alt":166.0,"time":"2019-09-10T12:10:33Z","error":7.0,"lat":48.1376442},N/A,N/A,N/A
2019-09-10T12:10:33.000Z,CO 5051767,Location updated,c8y_LocationUpdate,{"lng":14.3252332,"alt":266.0,"time":"2019-09-10T12:10:33Z","error":6.0,"lat":48.2598268},N/A,N/A,N/A
2019-09-10T12:10:32.000Z,CO 5050710,Location updated,c8y_LocationUpdate,{"lng":16.4767327,"alt":164.0,"time":"2019-09-10T12:10:32Z","error":5.0,"lat":48.2780685},N/A,N/A,N/A
2019-09-10T12:10:32.000Z,CO 5050565,Location updated,c8y_LocationUpdate,{"lng":15.0918659,"alt":544.0,"time":"2019-09-10T12:10:32Z","error":12.0,"lat":47.3648989},N/A,N/A,N/A
2019-09-10T12:10:31.000Z,CO 5051820,Location updated,c8y_LocationUpdate,{"lng":13.3525861,"alt":296.0,"time":"2019-09-10T12:10:31Z","error":12.0,"lat":48.5992175},N/A,N/A,N/A
2019-09-10T12:10:25.000Z,CO 5051464,Location updated,c8y_LocationUpdate,{"lng":14.3240624,"alt":271.0,"time":"2019-09-10T12:10:25Z","error":12.0,"lat":48.2607067},N/A,N/A,N/A
2019-09-10T12:10:22.000Z,CO 5050655,Lifesign,cgmon_Lifesign,{"lng":16.4315322,"alt":190.0,"time":"2019-09-10T12:01:19Z","error":13.0,"lat":48.1431609},N/A,N/A,N/A
2019-09-10T12:10:20.000Z,CO 5050581,Location updated,c8y_LocationUpdate,{"lng":13.045159,"alt":422.0,"time":"2019-09-10T12:10:20Z","error":11.0,"lat":47.8110246},N/A,N/A,N/A
2019-09-10T12:10:18.000Z,CO 5051496,Location updated,c8y_LocationUpdate,{"lng":14.3246911,"alt":271.0,"time":"2019-09-10T12:10:18Z","error":7.0,"lat":48.2602569},N/A,N/A,N/A
2019-09-10T12:10:17.000Z,CO 5051111,Location updated,c8y_LocationUpdate,{"lng":12.9975553,"alt":398.0,"time":"2019-09-10T12:10:17Z","error":11.0,"lat":47.8261238},N/A,N/A,N/A
2019-09-10T12:10:11.000Z,CO 5052218,Location updated,c8y_LocationUpdate,{"lng":14.3197285,"alt":262.0,"time":"2019-09-10T12:10:11Z","error":9.0,"lat":48.2688562},N/A,N/A,N/A
2019-09-10T12:10:11.000Z,CO 5052218,Location updated,c8y_LocationUpdate,{"lng":14.3197285,"alt":262.0,"time":"2019-09-10T12:10:11Z","error":9.0,"lat":48.2688562},N/A,N/A,N/A
2019-09-10T12:10:10.000Z,CO 5050889,Location updated,c8y_LocationUpdate,{"lng":15.2681143,"alt":526.0,"time":"2019-09-10T12:10:10Z","error":6.0,"lat":48.7494337},N/A,N/A,N/A
2019-09-10T12:10:06.000Z,CO 5050941,Location updated,c8y_LocationUpdate,{"lng":14.3259313,"alt":254.0,"time":"2019-09-10T12:10:06Z","error":12.0,"lat":48.2594256},N/A,N/A,N/A
2019-09-10T12:10:02.000Z,CO 5052698,Location updated,c8y_LocationUpdate,{"lng":16.4387847,"alt":155.0,"time":"2019-09-10T12:10:02Z","error":12.0,"lat":48.1361544},N/A,N/A,N/A
2019-09-10T12:09:58.000Z,CO 5052994,Location updated,c8y_LocationUpdate,{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582},N/A,N/A,N/A
2019-09-10T12:09:58.000Z,CO 5052994,Location updated,c8y_LocationUpdate,{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582},N/A,N/A,N/A
2019-09-10T12:09:53.000Z,CO 5050172,Location updated,c8y_LocationUpdate,{"lng":12.5073911,"alt":413.0,"time":"2019-09-10T12:09:53Z","error":6.0,"lat":48.2486859},N/A,N/A,N/A
2019-09-10T12:09:46.000Z,CO 5050036,Location updated,c8y_LocationUpdate,{"lng":15.5402195,"alt":546.0,"time":"2019-09-10T12:09:46Z","error":10.0,"lat":48.7482861},N/A,N/A,N/A
2019-09-10T12:09:42.000Z,CO 5051360,Location updated,c8y_LocationUpdate,{"lng":15.5412234,"alt":546.0,"time":"2019-09-10T12:09:42Z","error":14.0,"lat":48.7482963},N/A,N/A,N/A
2019-09-10T12:09:41.000Z,CO 5052254,Lifesign,cgmon_Lifesign,{"lng":14.1636504,"alt":497.0,"time":"2019-09-10T12:06:33Z","error":3.0,"lat":47.8020297},N/A,N/A,N/A
2019-09-10T12:09:36.000Z,CO 5051886,Location updated,c8y_LocationUpdate,{"lng":14.0586228,"alt":317.0,"time":"2019-09-10T12:09:36Z","error":4.0,"lat":48.1806919},N/A,N/A,N/A
2019-09-10T12:09:36.000Z,CO 5052270,Lifesign,cgmon_Lifesign,{"lng":14.1637559,"alt":497.0,"time":"2019-09-10T12:06:33Z","error":13.0,"lat":47.8015199},N/A,N/A,N/A
2019-09-10T12:09:35.000Z,CO 5050625,Location updated,c8y_LocationUpdate,{"lng":15.0918728,"alt":551.0,"time":"2019-09-10T12:09:35Z","error":14.0,"lat":47.3645485},N/A,N/A,N/A
2019-09-10T12:09:35.000Z,CO 5052165,Location updated,c8y_LocationUpdate,{"lng":13.8262713,"alt":535.0,"time":"2019-09-10T12:09:35Z","error":14.0,"lat":46.5696408},N/A,N/A,N/A
2019-09-10T12:09:32.000Z,CO 5051569,Location updated,c8y_LocationUpdate,{"lng":15.0962545,"alt":251.0,"time":"2019-09-10T12:09:32Z","error":9.0,"lat":48.1569883},N/A,N/A,N/A
2019-09-10T12:09:29.000Z,CO 5051886,Lifesign,cgmon_Lifesign,{"lng":14.0586228,"alt":317.0,"time":"2019-09-10T12:09:36Z","error":4.0,"lat":48.1806919},N/A,N/A,N/A
2019-09-10T12:09:26.000Z,CO 5050079,Location updated,c8y_LocationUpdate,{"lng":14.3260754,"alt":273.0,"time":"2019-09-10T12:09:26Z","error":12.0,"lat":48.259309},N/A,N/A,N/A
2019-09-10T12:09:24.000Z,CO 5051608,Lifesign,cgmon_Lifesign,{"lng":13.0620331,"alt":443.0,"time":"2019-09-10T12:01:33Z","error":4.0,"lat":47.8183534},N/A,N/A,N/A
2019-09-10T12:09:22.000Z,CO 5050636,Location updated,c8y_LocationUpdate,{"lng":15.7496359,"alt":214.0,"time":"2019-09-10T12:09:22Z","error":10.0,"lat":48.3474868},N/A,N/A,N/A
2019-09-10T12:09:13.000Z,CO 5051374,Lifesign,cgmon_Lifesign,{"lng":16.2192937,"alt":290.0,"time":"2019-09-10T12:00:44Z","error":11.0,"lat":47.7971662},N/A,N/A,N/A
2019-09-10T12:09:13.000Z,CO 5050449,Lifesign,cgmon_Lifesign,{"lng":14.5795362,"alt":1.0,"time":"2019-09-10T11:58:43Z","error":5.0,"lat":53.4248321},N/A,N/A,N/A
2019-09-10T12:09:09.000Z,CO 5052285,Location updated,c8y_LocationUpdate,{"lng":14.3242807,"alt":279.0,"time":"2019-09-10T12:09:09Z","error":11.0,"lat":48.2603765},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":22.6966869,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324294807614198,"Latitude":48.260394023504993},"Distance":2.05000634,"MappedObject":380848,"Source":352093,"Target":355952,"Length":0.5924257},{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324331226025482,"Latitude":48.260439145193047},"Distance":7.32469217,"MappedObject":384713,"Source":355935,"Target":355945,"Length":0.7556776},{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324228797434349,"Latitude":48.260311767396061},"Distance":7.55397675,"MappedObject":304419,"Source":278400,"Target":278401,"Length":0.2397567}],"LastCoordUsed":{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324228797434349,"Latitude":48.260311767396061},"Distance":7.55397675,"MappedObject":304419,"Source":278400,"Target":278401,"Length":0.2397567}}","cgmon_TrackSegmentIDs":[384707,384708,555759,555760,380849,384723,304419],"cgmon_TrackLength":0.7655877,"time":"2019-09-10T12:13:48.8192688+00:00","cgmon_MappedPoint":{"lng":14.324294807614198,"offset":2.05000634,"lat":48.26039402350499}},N/A,N/A
2019-09-10T12:09:09.000Z,CO 5052731,Lifesign,cgmon_Lifesign,{"lng":14.3181143,"alt":252.0,"time":"2019-09-09T11:59:34Z","error":12.0,"lat":48.2771772},N/A,N/A,N/A
2019-09-10T12:09:08.000Z,CO 5051642,Lifesign,cgmon_Lifesign,{"lng":14.163689,"alt":477.0,"time":"2019-09-10T12:06:20Z","error":10.0,"lat":47.8022479},N/A,N/A,N/A
2019-09-10T12:09:07.000Z,CO 5052267,Lifesign,cgmon_Lifesign,{"lng":14.1631847,"alt":471.0,"time":"2019-09-10T12:06:42Z","error":11.0,"lat":47.80162},N/A,N/A,N/A
2019-09-10T12:09:07.000Z,CO 5051478,Lifesign,cgmon_Lifesign,{"lng":14.1641262,"alt":497.0,"time":"2019-09-10T12:06:15Z","error":7.0,"lat":47.8003779},N/A,N/A,N/A
2019-09-10T12:09:01.000Z,CO 5052393,Lifesign,cgmon_Lifesign,{"lng":13.0494004,"alt":428.0,"time":"2019-09-10T12:03:39Z","error":11.0,"lat":47.8189722},N/A,N/A,N/A
2019-09-10T12:08:57.000Z,CO 5051020,Lifesign,cgmon_Lifesign,{"lng":16.2196522,"alt":287.0,"time":"2019-09-10T12:01:08Z","error":4.0,"lat":47.7972928},N/A,N/A,N/A
2019-09-10T12:08:51.000Z,CO 5050301,Location updated,c8y_LocationUpdate,{"lng":2.9992244,"alt":-2.0,"time":"2019-09-10T12:08:51Z","error":17.0,"lat":43.1661339},N/A,N/A,N/A
2019-09-10T12:08:50.000Z,CO 5051365,Location updated,c8y_LocationUpdate,{"lng":22.169639,"alt":60.0,"time":"2019-09-10T12:08:50Z","error":14.0,"lat":48.3902318},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":2148.951632500001,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169632788779996,"Latitude":48.390239719391744},"Distance":0.92519023,"MappedObject":1387861,"Source":1210580,"Target":1236897,"Length":0.8962704},{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169663000344876,"Latitude":48.390201166002555},"Distance":3.55078237,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932},{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169597136212552,"Latitude":48.390285304388065},"Distance":6.21585648,"MappedObject":1388154,"Source":1236890,"Target":1236891,"Length":0.9307675}],"LastCoordUsed":{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169663000344876,"Latitude":48.390201166002555},"Distance":3.55078237,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932}}","cgmon_TrackSegmentIDs":[1356958,1356952,1356950,1387850,1387851,1387852,1387853,1387860,1357049,1388168],"cgmon_TrackLength":2.4847659999999996,"time":"2019-09-10T12:11:51.8831079+00:00","cgmon_MappedPoint":{"lng":22.169632788779996,"offset":0.92519023,"lat":48.390239719391744}},N/A,N/A
2019-09-10T12:08:48.000Z,CO 5050995,Location updated,c8y_LocationUpdate,{"lng":22.1701667,"alt":99.0,"time":"2019-09-10T12:08:48Z","error":11.0,"lat":48.3905254},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":3214.932654,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.170165780943215,"Latitude":48.390526570555245},"Distance":0.14129331,"MappedObject":1357050,"Source":1236896,"Target":1210581,"Length":0.8176738},{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.170200136954,"Latitude":48.3904827851519},"Distance":4.9398585,"MappedObject":1388164,"Source":1210575,"Target":1236894,"Length":0.7718482},{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.17013252678472,"Latitude":48.390569018631112},"Distance":5.06730103,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932}],"LastCoordUsed":{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.17013252678472,"Latitude":48.390569018631112},"Distance":5.06730103,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932}}","cgmon_TrackSegmentIDs":[1356958,1356952,1356950,1387850,1387851,1387852,1387853,1387860,1357049,1388168],"cgmon_TrackLength":2.4847659999999996,"time":"2019-09-10T12:11:03.6011894+00:00","cgmon_MappedPoint":{"lng":22.170165780943215,"offset":0.14129331,"lat":48.390526570555245}},N/A,N/A
2019-09-10T12:08:43.000Z,CO 5051131,Location updated,c8y_LocationUpdate,{"lng":11.4933341,"alt":581.0,"time":"2019-09-10T12:08:43Z","error":13.0,"lat":47.2738262},N/A,N/A,N/A
2019-09-10T12:08:43.000Z,CO 5051131,Location updated,c8y_LocationUpdate,{"lng":11.4933341,"alt":581.0,"time":"2019-09-10T12:08:43Z","error":13.0,"lat":47.2738262},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":742.7216828999998,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":11.4933341,"Latitude":47.2738262},"MappedCoords":{"Longitude":11.49333458398241,"Latitude":47.27383398720535},"Distance":0.86590321,"MappedObject":563332,"Source":502560,"Target":509686,"Length":0.115316},{"OriginalCoords":{"Longitude":11.4933341,"Latitude":47.2738262},"MappedCoords":{"Longitude":11.493332248142858,"Latitude":47.2737926515935},"Distance":3.72901274,"MappedObject":375313,"Source":502559,"Target":346447,"Length":0.150019},{"OriginalCoords":{"Longitude":11.4933341,"Latitude":47.2738262},"MappedCoords":{"Longitude":11.493342343757833,"Latitude":47.2742148522778},"Distance":43.21458138,"MappedObject":309483,"Source":283444,"Target":283445,"Length":0.1669769}],"LastCoordUsed":{"OriginalCoords":{"Longitude":11.4933341,"Latitude":47.2738262},"MappedCoords":{"Longitude":11.493332248142858,"Latitude":47.2737926515935},"Distance":3.72901274,"MappedObject":375313,"Source":502559,"Target":346447,"Length":0.150019}}","cgmon_TrackSegmentIDs":[],"cgmon_TrackLength":0.0,"time":"2019-09-10T12:11:24.3216749+00:00","cgmon_MappedPoint":{"lng":11.49333458398241,"offset":0.86590321,"lat":47.27383398720535}},N/A,N/A
2019-09-10T12:08:35.000Z,CO 5050866,Location updated,c8y_LocationUpdate,{"lng":14.3215058,"alt":267.0,"time":"2019-09-10T12:08:35Z","error":4.0,"lat":48.2636151},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":4978.0674611,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":14.3215058,"Latitude":48.2636151},"MappedCoords":{"Longitude":14.321510549351785,"Latitude":48.263617838889992},"Distance":0.43737558,"MappedObject":555759,"Source":278547,"Target":355946,"Length":0.0388289},{"OriginalCoords":{"Longitude":14.3215058,"Latitude":48.2636151},"MappedCoords":{"Longitude":14.321524282808905,"Latitude":48.263622620442391},"Distance":1.53920182,"MappedObject":384719,"Source":278547,"Target":355950,"Length":0.0197342},{"OriginalCoords":{"Longitude":14.3215058,"Latitude":48.2636151},"MappedCoords":{"Longitude":14.321537838128785,"Latitude":48.263639596569256},"Distance":3.3491457,"MappedObject":384714,"Source":355943,"Target":355950,"Length":0.0222092}],"LastCoordUsed":{"OriginalCoords":{"Longitude":14.3215058,"Latitude":48.2636151},"MappedCoords":{"Longitude":14.321524282808905,"Latitude":48.263622620442391},"Distance":1.53920182,"MappedObject":384719,"Source":278547,"Target":355950,"Length":0.0197342}}","cgmon_TrackSegmentIDs":[384707,384708,384719],"cgmon_TrackLength":0.0800117,"time":"2019-09-10T12:10:46.0909247+00:00","cgmon_MappedPoint":{"lng":14.321510549351785,"offset":0.43737558,"lat":48.26361783888999}},N/A,N/A
2019-09-10T12:08:33.000Z,CO 5051872,Location updated,c8y_LocationUpdate,{"lng":9.1817503,"alt":317.0,"time":"2019-09-10T12:08:33Z","error":7.0,"lat":48.8762001},N/A,N/A,N/A
2019-09-10T12:08:33.000Z,CO 5051872,Location updated,c8y_LocationUpdate,{"lng":9.1817503,"alt":317.0,"time":"2019-09-10T12:08:33Z","error":7.0,"lat":48.8762001},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":2164.2141009,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":9.1817503,"Latitude":48.8762001},"MappedCoords":{"Longitude":9.1817456392881081,"Latitude":48.876203640311175},"Distance":0.48084995,"MappedObject":271673,"Source":253908,"Target":253909,"Length":0.7860666},{"OriginalCoords":{"Longitude":9.1817503,"Latitude":48.8762001},"MappedCoords":{"Longitude":9.1817918660285152,"Latitude":48.876168565239695},"Distance":4.28556164,"MappedObject":271808,"Source":25656,"Target":253942,"Length":0.7500596},{"OriginalCoords":{"Longitude":9.1817503,"Latitude":48.8762001},"MappedCoords":{"Longitude":9.1817007140337878,"Latitude":48.876240680236577},"Distance":5.3165598,"MappedObject":53249,"Source":65777,"Target":65778,"Length":0.7357852}],"LastCoordUsed":{"OriginalCoords":{"Longitude":9.1817503,"Latitude":48.8762001},"MappedCoords":{"Longitude":9.1817918660285152,"Latitude":48.876168565239695},"Distance":4.28556164,"MappedObject":271808,"Source":25656,"Target":253942,"Length":0.7500596}}","cgmon_TrackSegmentIDs":[271722,271721,271808],"cgmon_TrackLength":0.8470511000000001,"time":"2019-09-10T12:13:19.9039541+00:00","cgmon_MappedPoint":{"lng":9.181745639288108,"offset":0.48084995,"lat":48.876203640311175}},N/A,N/A
2019-09-10T12:08:33.000Z,CO 5052718,Location updated,c8y_LocationUpdate,{"lng":14.3244721,"alt":261.0,"time":"2019-09-10T12:08:33Z","error":7.0,"lat":48.2604709},N/A,N/A,N/A
2019-09-10T12:08:32.000Z,CO 5051786,Location updated,c8y_LocationUpdate,{"lng":15.0934742,"alt":520.0,"time":"2019-09-10T12:08:32Z","error":8.0,"lat":47.3602516},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":955.3499446999999,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":15.0934742,"Latitude":47.3602516},"MappedCoords":{"Longitude":15.093487876023856,"Latitude":47.360256466815642},"Distance":1.12824369,"MappedObject":543110,"Source":500082,"Target":495975,"Length":0.01726},{"OriginalCoords":{"Longitude":15.0934742,"Latitude":47.3602516},"MappedCoords":{"Longitude":15.093411454625754,"Latitude":47.360229436743339},"Distance":5.16984112,"MappedObject":543115,"Source":495974,"Target":495978,"Length":0.1121813},{"OriginalCoords":{"Longitude":15.0934742,"Latitude":47.3602516},"MappedCoords":{"Longitude":15.0935137,"Latitude":47.3601839},"Distance":8.09452479,"MappedObject":543111,"Source":495975,"Target":495976,"Length":0.0788724}],"LastCoordUsed":{"OriginalCoords":{"Longitude":15.0934742,"Latitude":47.3602516},"MappedCoords":{"Longitude":15.093411454625754,"Latitude":47.360229436743339},"Distance":5.16984112,"MappedObject":543115,"Source":495974,"Target":495978,"Length":0.1121813}}","cgmon_TrackSegmentIDs":[391386,391385,391384,543119,543118,543117,543116,543115],"cgmon_TrackLength":1.1705694,"time":"2019-09-10T12:13:03.6134601+00:00","cgmon_MappedPoint":{"lng":15.093487876023856,"offset":1.12824369,"lat":47.36025646681564}},N/A,N/A
2019-09-10T12:08:31.000Z,CO 5051710,Location updated,c8y_LocationUpdate,{"lng":13.5639021,"alt":90.0,"time":"2019-09-10T12:08:31Z","error":5.0,"lat":55.9589473},N/A,N/A,N/A
2019-09-10T12:08:25.000Z,CO 5050045,Location updated,c8y_LocationUpdate,{"lng":16.1018443,"alt":459.0,"time":"2019-09-10T12:08:25Z","error":6.0,"lat":47.5827225},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":8331.496018900007,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":16.1018443,"Latitude":47.5827225},"MappedCoords":{"Longitude":16.101728908989276,"Latitude":47.58270774508388},"Distance":8.7637246,"MappedObject":378585,"Source":349716,"Target":349717,"Length":0.2371537},{"OriginalCoords":{"Longitude":16.1018443,"Latitude":47.5827225},"MappedCoords":{"Longitude":16.1015723781081,"Latitude":47.58268623088636},"Distance":20.67398342,"MappedObject":522532,"Source":349716,"Target":479666,"Length":0.3900395}],"LastCoordUsed":{"OriginalCoords":{"Longitude":16.1018443,"Latitude":47.5827225},"MappedCoords":{"Longitude":16.101728908989276,"Latitude":47.58270774508388},"Distance":8.7637246,"MappedObject":378585,"Source":349716,"Target":349717,"Length":0.2371537}}","cgmon_TrackSegmentIDs":[],"cgmon_TrackLength":0.0,"time":"2019-09-10T12:09:18.2576463+00:00","cgmon_MappedPoint":{"lng":16.101728908989276,"offset":8.7637246,"lat":47.58270774508388}},N/A,N/A
2019-09-10T12:08:19.000Z,CO 5050276,Lifesign,cgmon_Lifesign,{"lng":14.9097604,"alt":292.0,"time":"2019-09-10T12:04:01Z","error":12.0,"lat":48.1208139},N/A,N/A,N/A
2019-09-10T12:08:11.000Z,CO 5051153,Lifesign,cgmon_Lifesign,{"lng":15.2786873,"alt":476.0,"time":"2019-09-10T12:01:48Z","error":10.0,"lat":47.4239228},N/A,N/A,N/A
2019-09-10T12:08:02.000Z,CO 5051710,Lifesign,cgmon_Lifesign,{"lng":13.5639021,"alt":90.0,"time":"2019-09-10T12:08:31Z","error":5.0,"lat":55.9589473},N/A,N/A,N/A
Answers:
The problem here is that the commas inside your JSON string are being treated as delimiters. You should modify the input data (if you can't modify the file itself, you can always read its contents into a list of strings using open first).
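To see the problem concretely, here is a small sketch that splits the sample row with the standard-library csv module. It is not pandas, whose parser may handle the stray quotes a little differently, but it splits on the same unquoted commas: the row comes out as 8 fields for a 4-column header, and the location field is cut off after its first key.

```python
import csv
import io

header = ["Time", "location", "labelA", "labelB"]
# First data row from the question: the JSON is not quoted
line = '2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan\n'

fields = next(csv.reader(io.StringIO(line)))
print(len(header), len(fields))  # 4 8
print(fields[1])                 # {"lng":12.9
```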
Here are a few modification options that you can try:
Option 1: Quote the JSON string with single quotes
Use a single quote (or another character that doesn't otherwise appear in your data) as the quote character for your JSON string.
>> cat data.csv
Time,location,labelA,labelB
2019-09-10,'{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}',nan,nan
Then use quotechar="'" when you read the data:
import pandas as pd
import json
df=pd.read_csv('data.csv', converters={'location':json.loads}, header=0, quotechar="'")
Option 2: Quote the JSON string with double quotes and escape
If the single quote can't be used, you can actually use the double quote as the quotechar, as long as you escape the quotes inside the JSON string by doubling them:
>> cat data.csv
Time,location,labelA,labelB
2019-09-10,"{""lng"":12.9,""alt"":413.0,""time"":""2019-09-10"",""error"":7.0,""lat"":17.8}",nan,nan
Notice that this now matches the format of the question you linked.
df=pd.read_csv('data.csv', converters={'location':json.loads}, header=0, quotechar='"')
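If you can change the code that writes the file, you don't need to escape anything by hand. A minimal sketch, assuming the JSON is serialized with json.dumps before writing: the csv module's default dialect quotes any field containing a comma or a quote and doubles the inner quotes, which yields exactly the Option 2 line, and reading it back recovers the original dict.

```python
import csv
import io
import json

location = {"lng": 12.9, "alt": 413.0, "time": "2019-09-10", "error": 7.0, "lat": 17.8}

# Default dialect: QUOTE_MINIMAL with doubled quotes inside quoted fields
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["Time", "location", "labelA", "labelB"])
writer.writerow(["2019-09-10", json.dumps(location, separators=(",", ":")), "nan", "nan"])
text = buf.getvalue()
print(text)

# Reading it back gives one intact JSON string in the location field
rows = list(csv.reader(io.StringIO(text)))
parsed = json.loads(rows[1][1])
```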
Option 3: Change the delimiter
Use a different character as the delimiter, for example |:
>> cat data.csv
Time|location|labelA|labelB
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
Now use the sep argument to specify the new delimiter:
df=pd.read_csv('data.csv', converters={'location':json.loads}, header=0, sep="|")
Each of these methods produces the same output:
print(df)
# Time location labelA labelB
#0 2019-09-10 {u'lat': 17.8, u'lng': 12.9, u'error': 7.0, u'... NaN NaN
Once you have that, you can expand the location column using one of the methods described in Flatten JSON column in a Pandas DataFrame:
new_df = df.join(pd.json_normalize(df["location"])).drop(["location"], axis=1)
print(new_df)
# Time labelA labelB alt error lat lng time
#0 2019-09-10 NaN NaN 413.0 7.0 17.8 12.9 2019-09-10
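For Q2, the later columns hold JSON in some rows and N/A, nan or even malformed JSON in others (the Data rows in test100v1.csv embed cgmon_PrivateMappingData as a string with unescaped quotes, so json.loads rejects them). A per-value parser that falls back to an empty dict keeps such a column usable. The helper name parse_json_or_empty is hypothetical; once the file has been quoted or re-delimited as above, it could be passed through converters in place of json.loads, and json_normalize should then turn the {} rows into NaN.

```python
import json


def parse_json_or_empty(value):
    """Hypothetical helper: parse a JSON object, or return {} for
    missing, non-JSON (e.g. "N/A") or malformed values."""
    if not isinstance(value, str):
        return {}
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return {}
    # Only keep JSON objects; numbers or lists are treated as missing
    return parsed if isinstance(parsed, dict) else {}
```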
Fix the file:
- Unfortunately, the file is difficult to read because each row contains a dict whose key-value pairs are separated by commas.
- The easiest way to resolve the issue is to change the separators outside of each dict from , to |.
- The following code reads the existing file.
- It assumes the first row is the header and uses .replace(',', '|') on it.
- The remaining rows use a regular expression to replace each , outside of {} with |.
- Each line is written to a new file.
Code:
Data:
Time,location,labelA,labelB
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
The code uses Path.cwd(), i.e. it assumes the file is in the current working directory; if this is not the case, Path('c:/some_path_to_my_file') / 'file_name.poo' can be used instead.
- pathlib is part of the standard library
- Python 3’s pathlib Module: Taming the File System
File repair:
import re
from pathlib import Path

p = Path.cwd() / 'test.csv'    # original file
p2 = Path.cwd() / 'test2.csv'  # repaired copy

with p.open('r') as f:
    with p2.open('w') as f2:
        for cnt, line in enumerate(f):
            if cnt == 0:
                # header: no JSON, so every comma is a separator
                line = line.replace(',', '|')
            else:
                # replace only the commas that are outside of {...}
                line = re.sub(r',(?=(((?!}).)*{)|[^{}]*$)', '|', line)
            f2.write(line)
New file:
Time|location|labelA|labelB
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
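The regex in the repair step only looks ahead to the next brace, so inside an object it will also replace any comma that is followed by a nested {, which some Data values in test100v1.csv contain. A minimal sketch of an alternative (swapped in here, not part of the answer above): track brace depth and replace only commas at depth 0. split_top_level is a hypothetical helper, and it assumes braces are balanced and never appear inside plain text fields or quoted strings.

```python
def split_top_level(line, sep=",", new_sep="|"):
    """Replace `sep` with `new_sep` only where it is outside every {...} block."""
    out = []
    depth = 0  # current nesting level of braces
    for ch in line:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            out.append(new_sep)  # a real column separator
        else:
            out.append(ch)       # ordinary character or comma inside an object
    return "".join(out)


row = '2019-09-10,{"a":1,"b":{"c":2,"d":3}},nan,nan'
print(split_top_level(row))
# 2019-09-10|{"a":1,"b":{"c":2,"d":3}}|nan|nan
```

Because the header contains no braces, the same function also converts it, so the cnt == 0 special case in the repair loop would no longer be needed.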
Parse the new file:
- Now .read_csv will separate the columns properly.
- However, the location, labelA and labelB columns are str.
- Use ast.literal_eval to convert them to dict.
- literal_eval won't work on nan, so replace nan with {}.
- for col in df.columns[1:]: loops through each of the columns and:
  - the try-except catches any column that is not properly formed
  - converts the values from str to dict
  - separates the keys into columns
  - concats the new columns to the existing dataframe
  - drops the old column
import pandas as pd
from ast import literal_eval
df = pd.read_csv('test2.csv', sep='|')
print(df)
Time location labelA labelB
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
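for col in df.columns[1:]:
    try:
        # literal_eval can't parse NaN, so fill missing cells with an empty dict
        df[col] = df[col].fillna('{}')
        df[col] = df[col].apply(literal_eval)  # str -> dict
        # expand the dict keys into new columns, then drop the original column
        df = pd.concat([df, df[col].apply(pd.Series)], axis=1)
        df.drop(columns=[col], inplace=True)
    except (SyntaxError, ValueError) as e:
        # report columns whose values are not valid Python literals
        print(f'{col}: {e}')
print(df)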
Time lng alt time error lat ack bar foo bar
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
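In the output above, bar appears twice because labelA and labelB share a key. A sketch, assuming you want to keep both, that prefixes each expanded column with its source column name and builds the expanded frame with pd.DataFrame(list_of_dicts), which is typically faster than apply(pd.Series). The small inline frame stands in for test2.csv:

```python
from ast import literal_eval

import pandas as pd

# Made-up frame shaped like test2.csv after read_csv(sep='|')
df = pd.DataFrame({
    "Time": ["2019-09-10", "2019-09-10"],
    "labelA": ['{"ack":123,"bar":456}', float("nan")],
    "labelB": ['{"foo":123,"bar":456}', float("nan")],
})

parts = [df[["Time"]]]
for col in ["labelA", "labelB"]:
    # Missing cells become empty dicts, then each string is parsed into a dict
    dicts = df[col].fillna("{}").apply(literal_eval)
    # One column per key, named "<source column>.<key>" so names stay unique
    expanded = pd.DataFrame(dicts.tolist(), index=df.index).add_prefix(f"{col}.")
    parts.append(expanded)

flat = pd.concat(parts, axis=1)
print(flat)
```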
Literal Eval Notes:
- Pandas has methods for importing data in many forms, such as dict or list.
- However, read_csv doesn't interpret containers (e.g. dict); they are read as strings unless you specify the converters parameter, e.g. pd.read_csv('test3.csv', sep='|', converters={'a': literal_eval}).
- literal_eval will not work on a column comprised of both containers and strings or NaN, unless the strings are only numeric (e.g. '8654').
- That is why the code above first replaces every nan with {}, so literal_eval doesn't raise an error.
- Given the following mixed column example:
column_a
{"ack":123,"bar":456}
some string
{"ack":123,"bar":456}
some string
{"ack":123,"bar":456}
some string
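literal_eval will raise an error on the string rows: SyntaxError for multi-word text such as some string, or ValueError: malformed node or string for a single bare word.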
- The difference between the two solutions is that the other solution fixes one column, whereas this one fixes all the columns and removes the need to read only the first 100 rows.
- You can forgo the loop and fix just the location column, if every value in it is a dict. Use the following code:
df['location'] = df['location'].apply(literal_eval)
df = pd.concat([df, df['location'].apply(pd.Series)], axis=1)
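Because literal_eval raises on any cell that is not a Python literal, one option is to pass a more forgiving function through the converters parameter. A sketch, assuming values that start with { are JSON and that nan, N/A and empty cells mean missing; parse_json_or_keep is a hypothetical name. It uses json.loads instead of literal_eval, which also accepts JSON's true, false and null.

```python
import io
import json

import pandas as pd


def parse_json_or_keep(value):
    """Hypothetical converter: decode JSON objects, leave other text alone.

    Empty cells and missing-value markers become {} so the column can be
    expanded later; any other non-JSON text is returned unchanged.
    """
    text = str(value).strip()
    if text in ("", "nan", "NaN", "N/A"):
        return {}
    if text.startswith("{"):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            return text  # malformed JSON is kept as the raw string
    return text


# Made-up pipe-separated data with a mixed column
data = io.StringIO(
    "Time|column_a\n"
    '2019-09-10|{"ack":123,"bar":456}\n'
    "2019-09-10|some string\n"
    "2019-09-10|N/A\n"
)

df = pd.read_csv(data, sep="|", converters={"column_a": parse_json_or_keep})
print(df["column_a"].tolist())
```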
Note about the actual data, test100v1.csv:
- The location column is not formed properly:
'{"lng":12.9975201,alt:413.0,"time:""2019-09-10T12:09:58Z""",error:7.0,lat:47.8258582}'
- Here is the expected form:
'{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582}'
Fix the location column:
- The location column is Position in the real data.
def fix_pos(x):
    # quote the bare keys and repair the misplaced quotes around "time" and "error"
    word_dict = {'alt': '"alt"',
                 '"time:"': '"time":',
                 '"",error:': ',"error":',
                 'lat': '"lat"'}
    for k, v in word_dict.items():
        x = x.replace(k, v)
    return x

df['Position'] = df['Position'].apply(fix_pos)
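A quick check, assuming the malformed value shown above is representative: run fix_pos on it and confirm json.loads accepts the result. The function is repeated here only so the snippet runs on its own.

```python
import json


def fix_pos(x):
    # Same replacements as fix_pos above: quote bare keys and repair the
    # misplaced quotes around "time" and "error"
    word_dict = {'alt': '"alt"',
                 '"time:"': '"time":',
                 '"",error:': ',"error":',
                 'lat': '"lat"'}
    for k, v in word_dict.items():
        x = x.replace(k, v)
    return x


bad = '{"lng":12.9975201,alt:413.0,"time:""2019-09-10T12:09:58Z""",error:7.0,lat:47.8258582}'
fixed = fix_pos(bad)
print(json.loads(fixed))
```

The replacements target this one corruption pattern only. On a value that is already well formed, 'alt' becomes '""alt""' and the string no longer parses, so apply fix_pos only to rows that actually show the broken quoting.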
- Use the following loop with the real data file:
  - Zeit, device, Text & Typ don't need to be processed.
  - Position is at index 4.
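for col in df.columns[4:]:
    try:
        # read_csv turns N/A into NaN; replace it with an empty dict for literal_eval
        df[col] = df[col].fillna('{}')
        df[col] = df[col].apply(literal_eval)
        # expand the dict keys into new columns, then drop the original column
        df = pd.concat([df, df[col].apply(pd.Series)], axis=1)
        df.drop(columns=[col], inplace=True)
    except (SyntaxError, ValueError) as e:
        print(f'{col}: {e}')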
- The loop that applies literal_eval to all columns has been updated with a try-except.
- If there's an exception, the column name and error message are printed out.
- There are a total of 64 columns in the real data, and most of them are awful.
Errors:
- These are the errors for all the columns in the supplied csv file.
device: unexpected EOF while parsing (<unknown>, line 1)
Text: malformed node or string: <_ast.Name object at 0x00000203B8473C08>
Typ: malformed node or string: <_ast.Name object at 0x00000203BE217E08>
Data: unexpected EOF while parsing (<unknown>, line 1)
Data1: invalid syntax (<unknown>, line 1)
Data2: invalid syntax (<unknown>, line 1)
Unnamed: 8: invalid syntax (<unknown>, line 1)
Unnamed: 9: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 10: invalid syntax (<unknown>, line 1)
Unnamed: 11: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 12: invalid syntax (<unknown>, line 1)
Unnamed: 13: invalid syntax (<unknown>, line 1)
Unnamed: 14: invalid syntax (<unknown>, line 1)
Unnamed: 15: invalid syntax (<unknown>, line 1)
Unnamed: 16: invalid syntax (<unknown>, line 1)
Unnamed: 17: invalid syntax (<unknown>, line 1)
Unnamed: 18: invalid syntax (<unknown>, line 1)
Unnamed: 19: invalid syntax (<unknown>, line 1)
Unnamed: 20: invalid syntax (<unknown>, line 1)
Unnamed: 21: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 22: invalid syntax (<unknown>, line 1)
Unnamed: 23: invalid syntax (<unknown>, line 1)
Unnamed: 24: invalid syntax (<unknown>, line 1)
Unnamed: 25: invalid syntax (<unknown>, line 1)
Unnamed: 26: invalid syntax (<unknown>, line 1)
Unnamed: 27: invalid syntax (<unknown>, line 1)
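Several of the errors above (device, Text, Typ) come from columns that never contained JSON. A sketch of a pre-check, assuming a cell counts as JSON-like when its stripped text starts with { and ends with }; json_like_columns is a hypothetical helper and the sample frame is made up. This is a heuristic, not a validator, and the returned names could stand in for df.columns[4:] in the loop.

```python
import pandas as pd


def json_like_columns(df):
    """Return names of columns where at least one string cell looks like {...}."""
    found = []
    for col in df.columns:
        # Keep only string cells; NaN and numbers cannot be JSON text
        values = df[col][df[col].map(lambda v: isinstance(v, str))]
        if values.empty:
            continue  # no strings at all, so the .str accessor is not usable
        stripped = values.str.strip()
        if (stripped.str.startswith("{") & stripped.str.endswith("}")).any():
            found.append(col)
    return found


df = pd.DataFrame({
    "Zeit": ["2019-09-10T12:13:24.000Z", "2019-09-10T12:13:23.000Z"],
    "Text": ["Lifesign", "Location updated"],
    "Position": ['{"lng":12.99,"lat":47.82}', '{"lng":14.31,"lat":48.26}'],
    "Data": [float("nan"), '{"cgmon_TrackVersion":1}'],
})
print(json_like_columns(df))  # ['Position', 'Data']
```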
The problem here is that the commas inside your JSON string are being treated as delimiters. You should modify the input data (if you can’t edit the file directly, you can always read its contents into a list of strings using open() first).
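To see why the converter fails, here is a minimal sketch that splits one sample row with Python's standard csv module. It is used here as a stand-in for the pandas parser, whose exact behaviour and error message may differ. The location field ends at the first comma, so json.loads only ever sees a fragment.

```python
import csv
import io
import json

row = '2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan\n'

# A CSV parser splits on every comma that is not inside a quoted field
fields = next(csv.reader(io.StringIO(row)))
print(len(fields), 'fields instead of the 4 in the header')
print(fields[1])  # only the beginning of the JSON object

try:
    json.loads(fields[1])
    parsed_ok = True
except json.JSONDecodeError as exc:
    # the fragment is not a complete JSON document
    parsed_ok = False
    print('JSONDecodeError:', exc)
```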
Here are a few modification options that you can try:
Option 1: Quote the JSON string with single quotes
Use a single quote (or another character that doesn’t otherwise appear in your data) as the quote character for your JSON string.
>> cat data.csv
Time,location,labelA,labelB
2019-09-10,'{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}',nan,nan
Then use quotechar="'" when you read the data:
import pandas as pd
import json
df = pd.read_csv('data.csv', converters={'location': json.loads}, header=0, quotechar="'")
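As a quick self-contained check of Option 1, the sketch below feeds the single-quoted sample row to read_csv through io.StringIO instead of a file on disk (that substitution is the only assumption).

```python
import io
import json

import pandas as pd

# The sample row with the JSON wrapped in single quotes
data = """Time,location,labelA,labelB
2019-09-10,'{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}',nan,nan
"""

df = pd.read_csv(io.StringIO(data), converters={'location': json.loads},
                 header=0, quotechar="'")

# location now holds a dict; labelA/labelB are read as NaN
print(df.loc[0, 'location']['lat'])
```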
Option 2: Quote the JSON string with double quotes and escape the inner quotes
If the single quote can’t be used, you can actually use the double quote as the quotechar, as long as you escape the double quotes inside the JSON string by doubling them:
>> cat data.csv
Time,location,labelA,labelB
2019-09-10,"{""lng"":12.9,""alt"":413.0,""time"":""2019-09-10"",""error"":7.0,""lat"":17.8}",nan,nan
Notice that this now matches the format of the question you linked.
df = pd.read_csv('data.csv', converters={'location': json.loads}, header=0, quotechar='"')
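If you can regenerate the file yourself, you don't have to do this escaping by hand: Python's csv module, with its default QUOTE_MINIMAL quoting, wraps any field containing a comma or quote in double quotes and doubles the inner quotes, which is the Option 2 format. A minimal sketch with the sample row (json.dumps with compact separators is an assumption made to match the original layout):

```python
import csv
import io
import json

location = {"lng": 12.9, "alt": 413.0, "time": "2019-09-10", "error": 7.0, "lat": 17.8}

buf = io.StringIO()
writer = csv.writer(buf)  # default QUOTE_MINIMAL: quote fields containing , or "
writer.writerow(["Time", "location", "labelA", "labelB"])
writer.writerow(["2019-09-10", json.dumps(location, separators=(",", ":")), "nan", "nan"])
text = buf.getvalue()
print(text)

# Reading it back: the quoted field comes out as one intact JSON string
rows = list(csv.reader(io.StringIO(text)))
print(json.loads(rows[1][1]))
```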
Option 3: Change the delimiter
Use a different character, for example |, as the delimiter:
>> cat data.csv
Time|location|labelA|labelB
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
Now use the sep argument to specify the new delimiter:
df = pd.read_csv('data.csv', converters={'location': json.loads}, header=0, sep="|")
Each of these methods produces the same output:
print(df)
# Time location labelA labelB
#0 2019-09-10 {u'lat': 17.8, u'lng': 12.9, u'error': 7.0, u'... NaN NaN
Once you have that, you can expand the location column using one of the methods described in Flatten JSON column in a Pandas DataFrame:
new_df = df.join(pd.json_normalize(df["location"].tolist())).drop(["location"], axis=1)
print(new_df)
# Time labelA labelB alt error lat lng time
#0 2019-09-10 NaN NaN 413.0 7.0 17.8 12.9 2019-09-10
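json_normalize also flattens nested objects into dotted column names, which matters for the later columns of the real data (for example cgmon_MappedPoint inside Data). A small sketch with a hand-made DataFrame that already holds parsed dicts (values shortened from the sample rows):

```python
import pandas as pd

# Columns that already hold parsed dicts (values shortened from the sample data)
df = pd.DataFrame({
    "Time": ["2019-09-10"],
    "location": [{"lng": 12.9, "alt": 413.0, "lat": 17.8}],
    "Data": [{"cgmon_TrackVersion": 1,
              "cgmon_MappedPoint": {"lng": 9.18, "offset": 0.48, "lat": 48.87}}],
})

# Nested keys become "parent.child" columns
flat = pd.json_normalize(df["Data"].tolist())
print(flat.columns.tolist())

new_df = df.drop(columns=["Data"]).join(flat)
print(new_df)
```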
Fix the file:
- Unfortunately, the file is difficult to read because each row contains a dict whose key-value pairs are separated by commas.
- The easiest way to resolve the issue is to change the separators outside of each dict from , to |.
- The following code will read the existing file:
  - It assumes the first row is the header and uses .replace(',', '|') on it.
  - The remaining rows use a regular expression to replace only the , outside of {}.
  - Each line is written to a new file.
Code:
Data:
Time,location,labelA,labelB
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
- Path.cwd() assumes the file is in the current working directory; if this is not the case, Path('c:/some_path_to_my_file') / 'file_name.poo' can be used.
- pathlib is part of the standard library.
- Python 3’s pathlib Module: Taming the File System
File repair:
import re
from pathlib import Path

p = Path.cwd() / 'test.csv'
p2 = Path.cwd() / 'test2.csv'

with p.open('r') as f, p2.open('w') as f2:
    for cnt, line in enumerate(f):
        if cnt == 0:
            # header row: every comma is a column separator
            line = line.replace(',', '|')
        else:
            # data rows: replace only the commas that are outside {...}
            line = re.sub(r',(?=(((?!}).)*{)|[^{}]*$)', '|', line)
        f2.write(line)
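To check what the regular expression does, here is a small sketch on one made-up line in the format of the example data: a comma is replaced only when the next brace after it is an opening { (so the comma sits between columns) or when no braces follow at all.

```python
import re

# Same pattern as in the repair code
OUTSIDE_BRACES = re.compile(r',(?=(((?!}).)*{)|[^{}]*$)')

line = '2019-09-10,{"lng":12.9,"alt":413.0,"lat":17.8},{"ack":123,"bar":456},nan\n'
fixed = OUTSIDE_BRACES.sub('|', line)
print(fixed)
```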
New file:
Time|location|labelA|labelB
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
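One caveat, visible in the Data column of the real rows: the regular expression only looks at the next brace, so a comma inside a dict that is followed by a nested object is also replaced. A swapped-in alternative, sketched below, is to count brace depth and split only at depth 0. split_outside_braces is a hypothetical helper; it assumes braces and brackets are balanced, and it does not repair values that are invalid JSON for other reasons (the cgmon_PrivateMappingData strings in the sample rows contain unescaped quotes).

```python
import re

OUTSIDE_BRACES = re.compile(r',(?=(((?!}).)*{)|[^{}]*$)')


def split_outside_braces(line, sep=','):
    """Hypothetical helper: split line on sep, ignoring separators inside {...} or [...]."""
    fields, current, depth = [], [], 0
    for ch in line:
        if ch in '{[':
            depth += 1
        elif ch in '}]':
            depth -= 1
        if ch == sep and depth == 0:
            # top-level separator: close the current field
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))
    return fields


nested = 'CO 1,{"a":1,"b":{"c":2}},N/A'

# The regex also splits inside the outer dict, just before "b"
print(OUTSIDE_BRACES.sub('|', nested))
# Depth counting keeps the whole dict in one field
print(split_outside_braces(nested))

# To write a repaired line, join the fields with the new separator
repaired = '|'.join(split_outside_braces(nested))
```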
Parse the new file:
- Now the columns will be properly separated by .read_csv.
- However, the location, labelA and labelB columns are str.
- Use ast.literal_eval to convert them to dict.
  - literal_eval won’t work on nan, so replace nan with {}.
- for col in df.columns[1:]: loops through each of the columns (except Time) and:
  - converts them from str to dict
  - separates the keys into columns
  - concats the columns to the existing dataframe
  - drops the old column
  - try-except catches any columns that are not properly formed
import pandas as pd
from ast import literal_eval
df = pd.read_csv('test2.csv', sep='|')
print(df)
Time location labelA labelB
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} {"ack":123,"bar":456} {"foo":123,"bar":456}
2019-09-10 {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8} NaN NaN
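for col in df.columns[1:]:
    try:
        # literal_eval can't parse NaN, so use an empty dict instead
        df[col] = df[col].fillna('{}')
        # convert each str to a dict
        df[col] = df[col].apply(literal_eval)
        # expand the dict keys into new columns and drop the original column
        df = pd.concat([df, df[col].apply(pd.Series)], axis=1)
        df.drop(columns=[col], inplace=True)
    except (SyntaxError, ValueError) as e:
        print(f'{col}: {e}')
print(df)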
Time lng alt time error lat ack bar foo bar
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 123.0 456.0 123.0 456.0
2019-09-10 12.9 413.0 2019-09-10 7.0 17.8 NaN NaN NaN NaN
Literal Eval Notes:
- Pandas has methods for importing data in many forms, such as dict or list.
- However, read_csv doesn’t interpret containers (e.g. dict) well; they are read as strings unless you specify the converters parameter (pd.read_csv('test3.csv', sep='|', converters={'a': literal_eval})).
- literal_eval will not work on a column comprised of both containers and strings or NaN, unless the string is only numeric (e.g. ‘8654’).
- Part of the code above first replaced all nan with {}, so literal_eval wouldn’t raise an error.
- Given the following mixed column example:
  column_a
  {"ack":123,"bar":456}
  some string
  {"ack":123,"bar":456}
  some string
  {"ack":123,"bar":456}
  some string
  literal_eval will throw an error on the some string rows (SyntaxError for text containing a space like this; a single bare word raises ValueError: malformed node or string:).
- The difference between the two solutions is that the other solution fixes one column, whereas this solution fixes all the columns and removes the need to read only the first 100 rows.
- You can forgo the loop that fixes all the columns and just fix the location column, if it is all dicts. Use the following code:
df['location'] = df['location'].apply(literal_eval)
df = pd.concat([df, df['location'].apply(pd.Series)], axis=1)
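If a column mixes JSON objects with plain text or NaN, one option is a tolerant per-cell parser instead of literal_eval. The sketch below swaps in json.loads (which, unlike literal_eval, understands JSON's true, false and null) and maps anything that isn't a JSON object to {}. parse_cell is a hypothetical helper, and discarding the non-JSON values is a choice you may want to change.

```python
import json

import pandas as pd


def parse_cell(value):
    """Hypothetical helper: JSON-object strings become dicts, anything else becomes {}."""
    if isinstance(value, str) and value.lstrip().startswith('{'):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            pass  # malformed JSON: fall through to the empty dict
    return {}


column_a = pd.Series(['{"ack":123,"bar":456}', 'some string', float('nan')])

parsed = column_a.apply(parse_cell)
# Expand the dicts into columns; rows that were {} get NaN
expanded = pd.json_normalize(parsed.tolist())
print(expanded)
```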
Note about the actual data test100v1.csv:
- The location column is not formed properly: '{"lng":12.9975201,alt:413.0,"time:""2019-09-10T12:09:58Z""",error:7.0,lat:47.8258582}'
- Here is the expected form:
'{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582}'
Fix the location column:
- The location column is Position in the real data.
def fix_pos(x):
    # map each malformed fragment to its properly quoted JSON form
    word_dict = {'alt': '"alt"',
                 '"time:"': '"time":',
                 '"",error:': ',"error":',
                 'lat': '"lat"'}
    for k, v in word_dict.items():
        x = x.replace(k, v)
    return x

df['Position'] = df['Position'].apply(fix_pos)
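A quick way to check fix_pos is to run it on the malformed example and confirm that the result matches the expected form. Because the replacements are unconditional, running it on an already-valid string would double the quotes ("alt" becomes ""alt""), so this sketch also adds a hypothetical fix_if_needed guard that only repairs strings json.loads rejects.

```python
import json


def fix_pos(x):
    # same replacements as in fix_pos
    word_dict = {'alt': '"alt"',
                 '"time:"': '"time":',
                 '"",error:': ',"error":',
                 'lat': '"lat"'}
    for k, v in word_dict.items():
        x = x.replace(k, v)
    return x


def fix_if_needed(x):
    """Hypothetical guard: only repair strings that are not valid JSON already."""
    try:
        json.loads(x)
        return x
    except json.JSONDecodeError:
        return fix_pos(x)


malformed = '{"lng":12.9975201,alt:413.0,"time:""2019-09-10T12:09:58Z""",error:7.0,lat:47.8258582}'
expected = '{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582}'

print(fix_if_needed(malformed) == expected)  # malformed input is repaired
print(fix_if_needed(expected) == expected)   # valid input is left unchanged
```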
- Use the following loop with the real data file.
  - Zeit, device, Text and Typ don’t need to be processed; Position is at index 4.
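for col in df.columns[4:]:
    try:
        # literal_eval can't parse NaN, so use an empty dict instead
        df[col] = df[col].fillna('{}')
        # convert each str to a dict
        df[col] = df[col].apply(literal_eval)
        # expand the dict keys into new columns and drop the original column
        df = pd.concat([df, df[col].apply(pd.Series)], axis=1)
        df.drop(columns=[col], inplace=True)
    except (SyntaxError, ValueError) as e:
        print(f'{col}: {e}')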
- The loop that applies literal_eval to all columns has been updated with try-except.
  - If there’s an exception, the column name and error message will be printed out.
- There are a total of 64 columns in the real data, and most of them are terrible.
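Rather than trying every column from index 4 onward and reading the errors, you could first pick out the columns that contain JSON-looking values at all. json_like_columns is a hypothetical helper; it only checks whether some value starts with {, so a selected column can still fail later if its JSON is malformed.

```python
import pandas as pd


def json_like_columns(df):
    """Hypothetical helper: names of columns in which at least one value starts with '{'."""
    return [col for col in df.columns
            if df[col].astype(str).str.lstrip().str.startswith('{').any()]


df = pd.DataFrame({
    'Zeit': ['2019-09-10T12:13:24.000Z'],
    'device': ['CO 5052994'],
    'Position': ['{"lng":12.9975201,"alt":413.0}'],
    'Data': [float('nan')],
})
print(json_like_columns(df))
```

The returned list could stand in for df.columns[4:] in the loop.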
Errors:
- These are the errors for all the columns in the supplied csv file.
device: unexpected EOF while parsing (<unknown>, line 1)
Text: malformed node or string: <_ast.Name object at 0x00000203B8473C08>
Typ: malformed node or string: <_ast.Name object at 0x00000203BE217E08>
Data: unexpected EOF while parsing (<unknown>, line 1)
Data1: invalid syntax (<unknown>, line 1)
Data2: invalid syntax (<unknown>, line 1)
Unnamed: 8: invalid syntax (<unknown>, line 1)
Unnamed: 9: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 10: invalid syntax (<unknown>, line 1)
Unnamed: 11: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 12: invalid syntax (<unknown>, line 1)
Unnamed: 13: invalid syntax (<unknown>, line 1)
Unnamed: 14: invalid syntax (<unknown>, line 1)
Unnamed: 15: invalid syntax (<unknown>, line 1)
Unnamed: 16: invalid syntax (<unknown>, line 1)
Unnamed: 17: invalid syntax (<unknown>, line 1)
Unnamed: 18: invalid syntax (<unknown>, line 1)
Unnamed: 19: invalid syntax (<unknown>, line 1)
Unnamed: 20: invalid syntax (<unknown>, line 1)
Unnamed: 21: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 22: invalid syntax (<unknown>, line 1)
Unnamed: 23: invalid syntax (<unknown>, line 1)
Unnamed: 24: invalid syntax (<unknown>, line 1)
Unnamed: 25: invalid syntax (<unknown>, line 1)
Unnamed: 26: invalid syntax (<unknown>, line 1)
Unnamed: 27: invalid syntax (<unknown>, line 1)