pandas to_csv: set column order when columns are occasionally missing


I'm using pandas to convert JSON data to CSV, but I want the columns to be in a certain order. Sometimes some columns don't exist in the JSON data. This is what I use so far:

cols = ['a','b','c','d','e','f']

Sometimes, if d doesn't exist, the request fails with an error saying [d] is not in index. Is there a way to make pandas ignore nonexistent columns but still maintain the column order?
By the way, the JSON contains nested objects, but only one child level at most.

So, when a column is missing, the column order should still be a,b,c,d,e,f; the missing columns should simply be empty in every row.
For example, if b and d are missing, the CSV should still have columns a,b,c,d,e,f, with b and d empty in every row.



Asked By: medaliama



Perhaps try:

cols = ['a','b','c','d','e','f']
df = pd.DataFrame(pd.json_normalize(json), columns=cols)

If instead you want just the columns that are in df, but in the order of cols:

df.to_csv(columns=[k for k in cols if k in df.columns])
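
To make the filtering variant concrete, here is a minimal sketch. The small frame is made up for illustration (it stands in for JSON in which 'b', 'd' and 'f' never appeared); only the list comprehension comes from the line above.

```python
import pandas as pd

# Hypothetical frame: 'b', 'd' and 'f' never showed up in the JSON,
# and the columns arrived in an arbitrary order.
df = pd.DataFrame({"c": [3], "a": [1], "e": [5]})
cols = ["a", "b", "c", "d", "e", "f"]

# Keep only the wanted columns that exist, preserving the order given in cols.
present = [k for k in cols if k in df.columns]

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(columns=present, index=False)
print(csv_text)
```

Here the CSV contains only a, c and e, in that order; the missing columns are dropped rather than written as empty.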

Example (using one of the pd.json_normalize examples):

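import pandas as pd

data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    # this record has no "id", so that cell becomes NaN
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
]
# max_level=1 flattens one level of nesting into dotted column names
df = pd.json_normalize(data, max_level=1)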

>>> df
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60


cols = ['id', 'name', 'age', 'fitness.height', 'fitness.weight']

>>> print(pd.DataFrame(df, columns=cols).to_csv())
,id,name,age,fitness.height,fitness.weight
0,1.0,Cole Volk,,130,60
1,,Mark Reg,,130,60
2,2.0,Faye Raker,,130,60

Notice the column 'age' is not present in that df, so the column in the CSV is empty.
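
As an alternative sketch (my own addition, not necessarily the method used above), df.reindex(columns=cols) should also yield every column in cols, in that order, with the missing ones filled with NaN, which to_csv writes as empty fields. Passing index=False drops the leading index column if you don't want it in the file.

```python
import pandas as pd

data = [
    {"id": 1, "name": "Cole Volk", "fitness": {"height": 130, "weight": 60}},
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {"id": 2, "name": "Faye Raker", "fitness": {"height": 130, "weight": 60}},
]
df = pd.json_normalize(data, max_level=1)

cols = ["id", "name", "age", "fitness.height", "fitness.weight"]

# reindex adds 'age' as an all-NaN column and orders the columns as in cols.
out = df.reindex(columns=cols)

# index=False leaves the row index out of the CSV.
csv_text = out.to_csv(index=False)
print(csv_text)
```

The header row then lists all five names from cols, and the 'age' field is empty in every row.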

Answered By: Pierre D