pandas to_csv: set column order when columns are occasionally missing
Question:
I'm using pandas to convert JSON data to CSV, and I want the columns to be in a certain order. Sometimes some of the columns don't exist in the JSON data. This is what I use so far:
cols = ['a','b','c','d','e','f']
pd.DataFrame(pd.json_normalize(json)).to_csv(columns=cols)
Sometimes, if d doesn't exist, it complains that the request failed because [d] is not in index. Is there a way to make pandas ignore non-existing columns but still maintain the column order?
By the way, the JSON contains nested objects, but only one child level at most.
So, when a column is missing, the column order should still be a,b,c,d,e,f, with the value left empty in every row for the missing columns.
Example, if b and d are missing:
a,b,c,d,e,f
one,,three,,five,six
Thanks
Answers:
Perhaps try:
cols = ['a','b','c','d','e','f']
df = pd.DataFrame(pd.json_normalize(json))
df.reindex(columns=cols).to_csv()
If instead you want just the columns that are in df, but in the order of cols:
df.to_csv(columns=[k for k in cols if k in df.columns])
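To make the difference between the two options concrete, here is a small sketch. The sample record and the names full and present_only are made up for illustration, and index=False is only there to keep the output short; it is not part of the snippets above.

```python
import pandas as pd

# Hypothetical record in which 'b' and 'd' are missing.
df = pd.json_normalize([{"a": "one", "c": "three", "e": "five", "f": "six"}])
cols = ["a", "b", "c", "d", "e", "f"]

# Option 1: reindex adds the missing columns (all NaN) in the order of cols,
# and to_csv writes NaN as an empty field by default.
full = df.reindex(columns=cols).to_csv(index=False)

# Option 2: keep only the columns that actually exist, still in the order of cols.
present_only = df.to_csv(columns=[k for k in cols if k in df.columns], index=False)

print(full)
print(present_only)
```

The first call keeps all six headers with empty fields for b and d; the second drops those two columns entirely.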
Example (using one of the pd.json_normalize examples):
import pandas as pd

# Sample records; the second one has no "id" key.
data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
]
# Flatten one level of nesting into dotted column names.
df = pd.json_normalize(data, max_level=1)
>>> df
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
Then:
cols = ['id', 'name', 'age', 'fitness.height', 'fitness.weight']
print(df.reindex(columns=cols).to_csv())
,id,name,age,fitness.height,fitness.weight
0,1.0,Cole Volk,,130,60
1,,Mark Reg,,130,60
2,2.0,Faye Raker,,130,60
Notice that the column 'age' is not present in that df, so the column in the CSV is empty.
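One thing to watch: the CSV above also contains the DataFrame index as an unnamed first column, while the expected output in the question has no such column. A minimal sketch of the same example with the index left out, assuming that is what you want (csv_text is just an illustrative name):

```python
import pandas as pd

data = [
    {"id": 1, "name": "Cole Volk", "fitness": {"height": 130, "weight": 60}},
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {"id": 2, "name": "Faye Raker", "fitness": {"height": 130, "weight": 60}},
]
df = pd.json_normalize(data, max_level=1)

cols = ["id", "name", "age", "fitness.height", "fitness.weight"]

# index=False drops the leading row-number column from the CSV.
csv_text = df.reindex(columns=cols).to_csv(index=False)
print(csv_text)
```

The header and rows then start directly with id, and 'age' is still written as an empty field in every row.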