How to sort output before writing to JSON
Question:
I have a problem with sorting data. When I run json_dic.sort(key=lambda x: x['lastLogonTimestamp'], reverse=False), I get the error TypeError: '<' not supported between instances of 'str' and 'list'. When I checked the values with type(), I found two different classes: <class 'list'> for empty entries such as "lastLogonTimestamp": [], and <class 'str'> for entries that hold a date, such as "lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00".
My input:
json_dic = [
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "batman"
  },
  {
    "lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00",
    "sAMAccountName": "superman"
  },
  {
    "lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00",
    "sAMAccountName": "green-lantern"
  },
  {
    "lastLogonTimestamp": "2022-09-13 04:48:43.275908+00:00",
    "sAMAccountName": "wonder-woman"
  },
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "hulk"
  }
]
EDIT: After following the advice, I used this and it works fine:
def lastLogonTimestamp_sort(value):
    timestamp = value['lastLogonTimestamp']
    is_str = isinstance(timestamp, str)
    return is_str, str(timestamp)

json_dic.sort(key=lastLogonTimestamp_sort, reverse=False)
Answers:
Convert the sort key to a str. This does not change the type of the stored values:
>>> json_dic.sort(key= lambda x: str(x['lastLogonTimestamp']))
>>> json_dic
[
  {
    "lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00",
    "sAMAccountName": "superman"
  },
  {
    "lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00",
    "sAMAccountName": "green-lantern"
  },
  {
    "lastLogonTimestamp": "2022-09-13 04:48:43.275908+00:00",
    "sAMAccountName": "wonder-woman"
  },
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "batman"
  },
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "hulk"
  }
]
If you want empty lists to come first, you could use:
>>> json_dic.sort(key= lambda x: x['lastLogonTimestamp'] or "")
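The or "" trick works because an empty list is falsy, so the key falls back to the empty string, which sorts before every timestamp. A minimal sketch, using a shortened two-record list (the name records is only for illustration):

```python
# Shortened sample data; "records" is a hypothetical name for illustration.
records = [
    {"lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00", "sAMAccountName": "superman"},
    {"lastLogonTimestamp": [], "sAMAccountName": "batman"},
]

# [] is falsy, so "[] or ''" evaluates to '' and every key is a str.
records.sort(key=lambda x: x["lastLogonTimestamp"] or "")

names = [r["sAMAccountName"] for r in records]
print(names)  # ['batman', 'superman']
```

Note that this only covers falsy placeholders such as [] or None; a non-empty list would still raise the TypeError.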
You need to make sure that the values returned by the key function have a uniform type. One option would be to convert to a string in all cases. However, if you just convert every item to a string for the sort, non-string values may end up sorted in between actual strings, depending on what str returns for them.
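To illustrate that pitfall, here is a small sketch with made-up values (the list values is hypothetical, not the asker's data). Since str([]) is '[]', and '[' compares greater than digits but less than lowercase letters, the empty list lands in the middle:

```python
# Hypothetical mixed values: two strings and one empty list.
values = ["2021-02-15", [], "never"]

# str([]) is "[]"; "[" sorts after "2" but before "n".
result = sorted(values, key=str)
print(result)  # ['2021-02-15', [], 'never']
```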
To make sorting more predictable, you can write a key function that returns a tuple of items to compare on, one per condition you want to check.
import json

def timestamp_sort(value):
    timestamp = value['lastLogonTimestamp']
    # False sorts before True, so non-string values are grouped first.
    is_str = isinstance(timestamp, str)
    return is_str, str(timestamp)

json_dic.sort(key=timestamp_sort, reverse=False)
print(json.dumps(json_dic, indent=2))
[
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "batman"
  },
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "hulk"
  },
  {
    "lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00",
    "sAMAccountName": "superman"
  },
  {
    "lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00",
    "sAMAccountName": "green-lantern"
  },
  {
    "lastLogonTimestamp": "2022-09-13 04:48:43.275908+00:00",
    "sAMAccountName": "wonder-woman"
  }
]
By changing return is_str, str(timestamp) to return not is_str, str(timestamp), you can easily flip whether non-string values show up at the start or the end of the list, but never in between.
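As a rough sketch of that flip, the tuple key can be wrapped in a small factory. The names make_key and empty_first are assumptions introduced here for illustration, and the data is a shortened version of the question's input:

```python
def make_key(empty_first=True):
    """Return a sort key that groups non-string values at one end (hypothetical helper)."""
    def key(item):
        timestamp = item["lastLogonTimestamp"]
        is_str = isinstance(timestamp, str)
        # False sorts before True, so negating the flag moves the group to the other end.
        group = is_str if empty_first else not is_str
        return group, str(timestamp)
    return key

records = [
    {"lastLogonTimestamp": [], "sAMAccountName": "batman"},
    {"lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00", "sAMAccountName": "green-lantern"},
    {"lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00", "sAMAccountName": "superman"},
]

first = [r["sAMAccountName"] for r in sorted(records, key=make_key(True))]
last = [r["sAMAccountName"] for r in sorted(records, key=make_key(False))]
print(first)  # ['batman', 'superman', 'green-lantern']
print(last)   # ['superman', 'green-lantern', 'batman']
```

In both cases the timestamps keep their ascending order; only the position of the empty entries changes.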
When there’s an empty list in the lastLogonTimestamp field, use an empty string as the key. Otherwise, use lastLogonTimestamp. Using a helper function instead of a lambda function makes it easier to see what’s happening.
def use_timestamp_key(list_item):
    # Empty lists get "" as their key, so they sort before every timestamp string.
    if isinstance(list_item["lastLogonTimestamp"], list):
        return ""
    return list_item["lastLogonTimestamp"]

json_dic.sort(key=use_timestamp_key, reverse=False)
The issue with sorting timestamps as strings (as proposed in the other answers) is that 2022-09-13 04:48:43.275908+00:00 will sort before 2022-09-13 04:48:43.275908+01:00 even though the first is actually the later time. It would be better to sort using datetime objects instead. For example:
from datetime import datetime

def ts(item):
    try:
        return datetime.strptime(item['lastLogonTimestamp'], '%Y-%m-%d %H:%M:%S.%f%z').timestamp()
    except (KeyError, TypeError, ValueError):
        # Missing or malformed timestamps sort first;
        # return float('inf') instead if you want them to sort last.
        return 0

sorted(json_dic, key=ts)
Output:
[
{'lastLogonTimestamp': [], 'sAMAccountName': 'batman'},
{'lastLogonTimestamp': [], 'sAMAccountName': 'hulk'},
{'lastLogonTimestamp': '2021-02-15 06:35:34.363626+00:00', 'sAMAccountName': 'superman'},
{'lastLogonTimestamp': '2022-09-12 04:08:01.311201+00:00', 'sAMAccountName': 'green-lantern'},
{'lastLogonTimestamp': '2022-09-13 04:48:43.275908+00:00', 'sAMAccountName': 'wonder-woman'}
]
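The sample data only uses +00:00 offsets, so the difference from string sorting does not show up in that output. A small sketch with the two timestamps mentioned above (the variable names are illustrative, and parsing %z offsets that contain a colon assumes Python 3.7 or later):

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"

a = "2022-09-13 04:48:43.275908+00:00"  # 04:48 UTC
b = "2022-09-13 04:48:43.275908+01:00"  # 03:48 UTC, i.e. one hour earlier

# Plain string comparison looks at the offset text only.
by_text = sorted([a, b])

# Timezone-aware datetime objects compare by the actual instant.
by_time = sorted([a, b], key=lambda s: datetime.strptime(s, FORMAT))

print(by_text == [a, b])  # True
print(by_time == [b, a])  # True
```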