How to sort output before writing to JSON

Question:

I have a problem sorting data. When I run json_dic.sort(key=lambda x: x['lastLogonTimestamp'], reverse=False) I get TypeError: '<' not supported between instances of 'str' and 'list'. Checking the values with type() shows two different types: <class 'list'> for entries with no timestamp ("lastLogonTimestamp": []) and <class 'str'> for entries that have a date ("lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00").

My input:

json_dic = [
    {
        "lastLogonTimestamp": [],
        "sAMAccountName": "batman"
    },
    {
        "lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00",
        "sAMAccountName": "superman"
    },
    {
        "lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00",
        "sAMAccountName": "green-lantern"
    },
    {
        "lastLogonTimestamp": "2022-09-13 04:48:43.275908+00:00",
        "sAMAccountName": "wonder-woman"
    },
    {
        "lastLogonTimestamp": [],
        "sAMAccountName": "hulk"
    }
]

EDIT: After following the advice in the answers, I used this and it works fine:

def lastLogonTimestamp_sort(value):
    timestamp = value['lastLogonTimestamp']
    is_str = isinstance(timestamp, str)
    return is_str, str(timestamp)

json_dic.sort(key=lastLogonTimestamp_sort, reverse=False)
Asked By: Kubix

||

Answers:

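Convert the sort key to a str. This only affects the comparison; the stored values keep their original types. Empty lists end up last because "[]" sorts after strings that start with a digit: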

>>> json_dic.sort(key= lambda x: str(x['lastLogonTimestamp']))
>>> json_dic
[
  {
    "lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00",
    "sAMAccountName": "superman"
  },
  {
    "lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00",
    "sAMAccountName": "green-lantern"
  },
  {
    "lastLogonTimestamp": "2022-09-13 04:48:43.275908+00:00",
    "sAMAccountName": "wonder-woman"
  },
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "batman"
  },
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "hulk"
  }
]

If you want empty lists to come first, you could use:

>>> json_dic.sort(key= lambda x: x['lastLogonTimestamp'] or "")
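
The reason this works is that an empty list is falsy, so the or expression falls through to "", which sorts before any non-empty string. A small sketch with a made-up two-row sample (rows, keys and order are just illustrative names, not from the question):

```python
# Hypothetical two-row sample; the real list is json_dic from the question.
rows = [
    {"lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00", "sAMAccountName": "superman"},
    {"lastLogonTimestamp": [], "sAMAccountName": "batman"},
]

# [] is falsy, so `[] or ""` evaluates to ""; a non-empty string is returned unchanged.
keys = [row["lastLogonTimestamp"] or "" for row in rows]

rows.sort(key=lambda x: x["lastLogonTimestamp"] or "")
order = [row["sAMAccountName"] for row in rows]
print(order)
```

Note that any other falsy value (None, for instance) would be mapped to "" in the same way.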
Answered By: Selcuk

You need to make sure that the values returned by the key function have a uniform type. One option would be to convert to a string in all cases.

However, if you only convert the items with str for the sort, non-string values can end up in between actual strings, because their position depends on what their string representation happens to look like (an empty list becomes "[]", for example).
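
To illustrate the point about str, here is a small sketch. The values "unknown" and None do not appear in the question's data; they are assumptions added only to show the effect:

```python
# Hypothetical mix of values; "unknown" and None are not in the question's data.
values = ["2021-02-15 06:35:34", [], "unknown", None]

# str([]) is "[]" and str(None) is "None". In code point order,
# "2" < "N" < "[" < "u", so both non-strings land between the two real strings.
result = sorted(values, key=str)
print(result)
```

With the question's actual data this does not show up, since every string starts with a digit and "[" sorts after digits, but it is worth knowing before relying on str as a key.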

To make sorting more predictable, you can write a key function that returns a tuple, with one element per condition you want to compare on.

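import json

def timestamp_sort(value):
    timestamp = value['lastLogonTimestamp']
    # Tuples compare element by element: False < True, so non-strings
    # (the empty lists) sort first, then strings in lexicographic order.
    is_str = isinstance(timestamp, str)
    return is_str, str(timestamp)


json_dic.sort(key=timestamp_sort, reverse=False)
print(json.dumps(json_dic, indent=2))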
[
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "batman"
  },
  {
    "lastLogonTimestamp": [],
    "sAMAccountName": "hulk"
  },
  {
    "lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00",
    "sAMAccountName": "superman"
  },
  {
    "lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00",
    "sAMAccountName": "green-lantern"
  },
  {
    "lastLogonTimestamp": "2022-09-13 04:48:43.275908+00:00",
    "sAMAccountName": "wonder-woman"
  }
]

By changing return is_str, str(timestamp) to return not is_str, str(timestamp) you can flip whether non-string values show up at the start or at the end of the list. Either way, they never end up in between the strings.
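
A minimal sketch of the flipped variant, using a shortened sample rather than the full list (the function name timestamp_sort_strings_first and the sample variable are made up for this illustration):

```python
def timestamp_sort_strings_first(value):
    timestamp = value['lastLogonTimestamp']
    is_str = isinstance(timestamp, str)
    # False sorts before True, so negating is_str puts the strings first
    # and pushes everything else (the empty lists) to the end.
    return not is_str, str(timestamp)


# Shortened sample in the same shape as json_dic from the question.
sample = [
    {"lastLogonTimestamp": [], "sAMAccountName": "batman"},
    {"lastLogonTimestamp": "2022-09-12 04:08:01.311201+00:00", "sAMAccountName": "green-lantern"},
    {"lastLogonTimestamp": "2021-02-15 06:35:34.363626+00:00", "sAMAccountName": "superman"},
]

sample.sort(key=timestamp_sort_strings_first)
names = [item["sAMAccountName"] for item in sample]
print(names)
```

The dated entries come out oldest first, followed by the entry with no timestamp.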

Answered By: flakes

When there’s an empty list in the lastLogonTimestamp field, use an empty string as the key. Otherwise, use lastLogonTimestamp.

Using a helper function instead of a lambda function makes it easier to see what’s happening.

def use_timestamp_key(list_item):
    # An empty list means there is no timestamp; "" sorts before any date string
    if isinstance(list_item["lastLogonTimestamp"], list):
        return ""
    return list_item["lastLogonTimestamp"]

json_dic.sort(key=use_timestamp_key, reverse=False)
Answered By: Chase

The issue with sorting timestamps as strings (as proposed in the other answers) is that the UTC offset is compared as plain text: 2022-09-13 04:48:43.275908+00:00 will sort before 2022-09-13 04:48:43.275908+01:00 even though it is actually the later time (the second one is 03:48:43 in UTC). It would be better to sort using datetime objects instead. For example:

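from datetime import datetime

def ts(item):
    try:
        return datetime.strptime(item['lastLogonTimestamp'], '%Y-%m-%d %H:%M:%S.%f%z').timestamp()
    except (TypeError, ValueError):
        # TypeError: value is not a string (e.g. an empty list)
        # ValueError: string does not match the expected format
        # use float('inf') instead of 0 if you want errors to sort last
        return 0    # errors will sort first

sorted(json_dic, key=ts)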

Output:

[
 {'lastLogonTimestamp': [], 'sAMAccountName': 'batman'},
 {'lastLogonTimestamp': [], 'sAMAccountName': 'hulk'},
 {'lastLogonTimestamp': '2021-02-15 06:35:34.363626+00:00', 'sAMAccountName': 'superman'},
 {'lastLogonTimestamp': '2022-09-12 04:08:01.311201+00:00', 'sAMAccountName': 'green-lantern'},
 {'lastLogonTimestamp': '2022-09-13 04:48:43.275908+00:00', 'sAMAccountName': 'wonder-woman'}
]
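
To see the difference between the two approaches, here is a short sketch comparing a plain string sort with a datetime-based sort on the two example timestamps above. The names FMT, a, b, by_string and by_instant are only for illustration, and it assumes Python 3.7 or later, where %z accepts an offset written with a colon:

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f%z'

a = '2022-09-13 04:48:43.275908+00:00'
b = '2022-09-13 04:48:43.275908+01:00'  # same wall-clock time, but 03:48:43 in UTC

# Plain string comparison only looks at the characters, so "+00:00" < "+01:00".
by_string = sorted([b, a])

# Timezone-aware datetimes compare by the actual instant, so b comes first.
by_instant = sorted([a, b], key=lambda s: datetime.strptime(s, FMT))

print(by_string)
print(by_instant)
```

The two results are in opposite order, which is the case the string-based keys get wrong.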
Answered By: Nick