Push None or empty values to the end of a JSON array sorted by datetime with a lambda key

Question:

I am building a simple function, but I am stuck on an error. I am trying to sort a JSON array by the date defined in the response, but the array also contains some None and empty-string dates like "". So it is showing

KeyError: 'date'
when it sees a None or empty date value.

So I am trying to push the items that have None or empty-string date values to the end of the sorted JSON array.

example_response = [
    {
      "id": 2959,
      "original_language": "Permanent Job",
      "date": "2012-10-26",
      "absent": False
    },
    {
      "id": 8752,
      "original_language": "Intern Job",
      "date": "",
      "absent": True
    },
    {
      "adult": False,
      "id": 1300,
      "title": "Training Job",
      "date": "2020-07-25",
      "absent": False
    },
    {
      "adult": False,
      "id": 7807,
      "title": "Training Job",
      "absent": False
    },
]

program.py

from datetime import datetime

def sorting_function(response):
    sorted_data = []
    if response:
        # Raises KeyError for the dict that has no "date" key
        sorted_data = sorted(example_response, key=lambda x: datetime.strptime(x['date'], "%Y-%m-%d"))
        print(sorted_data)

    return sorted_data

As you can see, in example_response one dict has an empty string for "date" and one doesn't have a "date" key at all.
When I run this function, it raises KeyError: 'date'.

What have I tried?

I have also tried using

sorted_data = sorted(example_response, key=lambda x: (x['date'] is None, x['date'] == "", x['date'], datetime.strptime(x['date']), "%Y-%m-%d"))

But it still raises a KeyError.

Any help would be much appreciated.

Asked By: Vaneseaa


Answers:

Dictionaries have a very useful get() function which you could utilise thus:

example_response = [
    {
      "id": 2959,
      "original_language": "Permanent Job",
      "date": "2012-10-26",
      "absent": False
    },
    {
      "id": 8752,
      "original_language": "Intern Job",
      "date": "",
      "absent": True
    },
    {
      "adult": False,
      "id": 1300,
      "title": "Training Job",
      "date": "2020-07-25",
      "absent": False
    },
    {
      "adult": False,
      "id": 7807,
      "title": "Training Job",
      "absent": False
    }
]

example_response.sort(key=lambda d: d.get('date', ''))

print(example_response)

In this case, missing or empty ‘date’ values would precede any other dates.

Output:

[{'id': 8752, 'original_language': 'Intern Job', 'date': '', 'absent': True}, {'adult': False, 'id': 7807, 'title': 'Training Job', 'absent': False}, {'id': 2959, 'original_language': 'Permanent Job', 'date': '2012-10-26', 'absent': False}, {'adult': False, 'id': 1300, 'title': 'Training Job', 'date': '2020-07-25', 'absent': False}]
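
Since the question asks for missing or empty dates to come last rather than first, one possible adjustment is to sort on a pair whose first element flags a missing or empty date. This is only a minimal sketch, assuming every date that is present and non-empty is a zero-padded YYYY-MM-DD string; the records list and the date_last_key name are illustrative and not part of the original answer.

```python
# Trimmed-down copy of the question's data (illustrative only).
records = [
    {"id": 2959, "date": "2012-10-26"},
    {"id": 8752, "date": ""},
    {"id": 1300, "date": "2020-07-25"},
    {"id": 7807},
]

def date_last_key(d):
    # d.get("date") returns None when the key is absent; `or ""` also
    # normalises an explicit None to the empty string.
    date = d.get("date") or ""
    # False sorts before True, so records with a real date come first,
    # ordered among themselves by the date string.
    return (date == "", date)

ordered = sorted(records, key=date_last_key)
print([d["id"] for d in ordered])  # [2959, 1300, 8752, 7807]
```

Because sorted() is stable, records without a usable date keep their original relative order at the end of the list.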
Answered By: OldBill

Don’t call strptime if x['date'] is None

If the key is

lambda x: (x['date'] is None, datetime.strptime(x['date'], "%Y-%m-%d"))

Then the pair will be computed for all values, which means strptime will be called on all x['date'], including those that are None.

I suggest using a conditional, in order to only call strptime if x['date'] is not None:

lambda x: (0, datetime.strptime(x['date'], "%Y-%m-%d")) if x['date'] is not None else (1, 0)
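
The difference between the two keys can be checked directly. The sketch below wraps both lambdas in named functions (eager_key and guarded_key are illustrative names) and applies them to a dict whose date is None.

```python
from datetime import datetime

def eager_key(x):
    # Both tuple elements are evaluated before the tuple is built,
    # so strptime still receives None.
    return (x["date"] is None, datetime.strptime(x["date"], "%Y-%m-%d"))

def guarded_key(x):
    # A conditional expression evaluates only the branch it selects.
    return (0, datetime.strptime(x["date"], "%Y-%m-%d")) if x["date"] is not None else (1, 0)

row = {"date": None}

try:
    eager_key(row)
    eager_failed = False
except TypeError:
    # strptime requires a str as its first argument
    eager_failed = True

print(eager_failed)      # True
print(guarded_key(row))  # (1, 0)
```

The placeholder 0 in (1, 0) is never compared with a datetime, because tuples whose first elements differ are ordered by that first element alone.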

Use x.get('date') instead of x['date'] if x might be missing the 'date' key

If x is a dict that doesn’t have a 'date', then attempting to access x['date'] will always cause a KeyError, even for something as simple as x['date'] is None.

Instead, you can use dict.get, which doesn’t cause errors. If a value is missing, dict.get will return None, or another value which you can provide as a second argument:

x = { "id": 2959, "original_language": "Permanent Job" }

print(x['date'])
# KeyError

print(x.get('date'))
# None

print(x.get('date', 42))
# 42

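Finally, the key function for the sort becomes the following. Testing the value's truthiness rather than comparing it with None also skips the empty string, which strptime would reject with a ValueError:

lambda x: (0, datetime.strptime(x.get('date'), "%Y-%m-%d")) if x.get('date') else (1, 0)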

Note that if the key function becomes too complex, it might be better to write it using def instead of lambda:

def key(x):
    date = x.get('date')
    # Treat a missing key, None, and "" alike: strptime cannot parse any of them
    if not date:
        return (1, 0)
    else:
        return (0, datetime.strptime(date, "%Y-%m-%d"))
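
To see a key function of this shape applied to data like the question's, here is a minimal self-contained sketch. It assumes that an empty string should be treated the same as a missing date, so it tests truthiness; the name date_sort_key and the use of datetime.min as a placeholder are illustrative choices, not part of the answer above.

```python
from datetime import datetime

# Trimmed-down copy of the question's data (illustrative only).
example_response = [
    {"id": 2959, "date": "2012-10-26"},
    {"id": 8752, "date": ""},
    {"id": 1300, "date": "2020-07-25"},
    {"id": 7807},
]

def date_sort_key(x):
    date = x.get("date")
    if not date:
        # Missing key, None, or empty string: rank after every real date.
        # datetime.min keeps the second element the same type in all keys.
        return (1, datetime.min)
    return (0, datetime.strptime(date, "%Y-%m-%d"))

sorted_data = sorted(example_response, key=date_sort_key)
print([x["id"] for x in sorted_data])  # [2959, 1300, 8752, 7807]
```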
Answered By: Stef

You’re nearly on the right track, but you need to find a way to not evaluate the date string when it is invalid (key not present, or the value is the empty string).

The nice thing about dates is that chronological order is the same as lexicographical order (for ISO-8601 date formats — %Y-%m-%d). So you don’t actually have to convert them to dates or datetimes — just sort them as strings.
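
A quick way to convince yourself of this is to sort the same strings both ways and compare. The sample dates below are made up for illustration; the second part shows that the property depends on zero padding.

```python
from datetime import datetime

dates = ["2020-07-25", "2012-10-26", "2012-02-03", "1999-12-31"]

by_string = sorted(dates)
by_datetime = sorted(dates, key=lambda s: datetime.strptime(s, "%Y-%m-%d"))
print(by_string == by_datetime)  # True

# Without zero padding the property breaks: as strings, "2012-10-26"
# sorts before "2012-2-03" even though February precedes October.
unpadded = ["2012-2-03", "2012-10-26"]
print(sorted(unpadded))  # ['2012-10-26', '2012-2-03']
```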

That takes care of items in the sequence which have date keys. But what about ones where the date key is not present? There are three options.

  1. Use a default value, e.g. the empty string. However, this means no-date-items will be mixed together with empty-date-items.
  2. Use a fixed-length tuple where the first item indicates whether the date key is present and the second is the date, or a default when the key is not present, e.g. (False, ''), (True, '') and (True, '2022-10-03'). These values will sort in the order I gave them.
  3. Use a variable-length tuple. Tuples with different lengths have a total ordering iff their shared elements are comparable, much like strings do, e.g. car sorts before care. So we can use () to represent a no-date-item, ('',) to represent an empty-date-item and ('2022-10-03',) to represent a normal date string.
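
The three options can be compared side by side by building the keys for a small sample. This is only a sketch; the items list and the three function names are made up for illustration.

```python
items = [
    {"id": 1, "date": "2022-10-03"},
    {"id": 2, "date": ""},
    {"id": 3},
]

def default_key(item):
    # Option 1: a missing key and an empty string both become "".
    return item.get("date", "")

def fixed_length_key(item):
    # Option 2: (is the key present?, value or default)
    return ("date" in item, item.get("date", ""))

def variable_length_key(item):
    # Option 3: () for a missing key, a 1-tuple otherwise.
    return (item["date"],) if "date" in item else ()

print(sorted(default_key(i) for i in items))
# ['', '', '2022-10-03']
print(sorted(fixed_length_key(i) for i in items))
# [(False, ''), (True, ''), (True, '2022-10-03')]
print(sorted(variable_length_key(i) for i in items))
# [(), ('',), ('2022-10-03',)]
```

Only options 2 and 3 produce distinct keys for the missing and the empty case.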

Using the third possibility you can do:

sorted(
    example_response,
    key=lambda item: (item['date'],) if 'date' in item else ()
)

This ensures that items without a date key are sorted separately from items where the value of the date key is the empty string. However, both are sorted before all valid dates.

The keys and their sort order for your example would be:

[(), ('',), ('2012-10-26',), ('2020-07-25',)]
Answered By: Dunes