Push None or empty values to the end of a JSON array in a datetime-sorted lambda function
Question:
I am building a simple function, but I am stuck on an error. I am trying to sort a JSON array based on the datetime defined in the response. But the JSON array also contains some None and empty-string dates like "", so it raises KeyError: 'date' when it sees a None or empty date value. I am therefore trying to push the items whose date is None or an empty string to the end of the sorted JSON array.
example_response = [
    {
        "id": 2959,
        "original_language": "Permanent Job",
        "date": "2012-10-26",
        "absent": False
    },
    {
        "id": 8752,
        "original_language": "Intern Job",
        "date": "",
        "absent": True
    },
    {
        "adult": False,
        "id": 1300,
        "title": "Training Job",
        "date": "2020-07-25",
        "absent": False
    },
    {
        "adult": False,
        "id": 7807,
        "title": "Training Job",
        "absent": False
    },
]
program.py
from datetime import datetime

def sorting_function(response):
    if response == True:
        sorted_data = sorted(example_response, key=lambda x: datetime.strptime(x['date'], "%Y-%m-%d"))
        print(sorted_data)
        return sorted_data
As you can see in example_response, one dict has an empty string for "date" and one doesn't have "date" at all.
When I run this function, it raises KeyError: 'date'.
What have I tried?
I have also tried using
sorted_data = sorted(example_response, key=lambda x: (x['date'] is None, x['date'] == "", x['date'], datetime.strptime(x['date']), "%Y-%m-%d"))
But it still raises a KeyError.
Any help would be much appreciated.
Answers:
Dictionaries have a very useful get() function which you could utilise thus:
example_response = [
    {
        "id": 2959,
        "original_language": "Permanent Job",
        "date": "2012-10-26",
        "absent": False
    },
    {
        "id": 8752,
        "original_language": "Intern Job",
        "date": "",
        "absent": True
    },
    {
        "adult": False,
        "id": 1300,
        "title": "Training Job",
        "date": "2020-07-25",
        "absent": False
    },
    {
        "adult": False,
        "id": 7807,
        "title": "Training Job",
        "absent": False
    }
]
example_response.sort(key=lambda d: d.get('date', ''))
print(example_response)
In this case, missing or empty ‘date’ values would precede any other dates.
Output:
[{'id': 8752, 'original_language': 'Intern Job', 'date': '', 'absent': True}, {'adult': False, 'id': 7807, 'title': 'Training Job', 'absent': False}, {'id': 2959, 'original_language': 'Permanent Job', 'date': '2012-10-26', 'absent': False}, {'adult': False, 'id': 1300, 'title': 'Training Job', 'date': '2020-07-25', 'absent': False}]
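If the missing or empty dates should come last instead, as the question asks, one option is a two-element tuple key whose first element flags an absent date. This is only a minimal sketch, assuming every real date is an ISO-formatted string (so it sorts correctly as text); the names records and date_last_key are illustrative and not from the original post.

```python
records = [
    {"id": 2959, "date": "2012-10-26"},
    {"id": 8752, "date": ""},
    {"id": 1300, "date": "2020-07-25"},
    {"id": 7807},
]

def date_last_key(d):
    # A missing key, None and "" are all normalised to ""
    date = d.get("date") or ""
    # False sorts before True, so items with a real date come first
    return (date == "", date)

result = sorted(records, key=date_last_key)
print([r["id"] for r in result])
```

Because Python's sort is stable, the undated items keep their original relative order at the end of the list.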
Don't call strptime if x['date'] is None
If the key is
lambda x: (x['date'] is None, datetime.strptime(x['date'], "%Y-%m-%d"))
then the whole pair is computed for every item, which means strptime will be called on every x['date'], including those that are None.
I suggest using a conditional, in order to only call strptime if x['date'] is not None:
lambda x: (0, datetime.strptime(x['date'], "%Y-%m-%d")) if x['date'] is not None else (1, 0)
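To make the difference concrete, here is a small sketch comparing the two keys. It assumes a hypothetical list rows in which every dict has a 'date' key but one value is None; the names rows, eager_key and safe_key are made up for this illustration.

```python
from datetime import datetime

rows = [
    {"id": 1, "date": "2020-07-25"},
    {"id": 2, "date": None},
    {"id": 3, "date": "2012-10-26"},
]

# Eager version: strptime is called even when the date is None
eager_key = lambda x: (x["date"] is None, datetime.strptime(x["date"], "%Y-%m-%d"))
try:
    sorted(rows, key=eager_key)
    eager_failed = False
except TypeError:
    # strptime() only accepts strings, so None is rejected
    eager_failed = True

# Conditional version: strptime only runs for real date strings
safe_key = lambda x: (0, datetime.strptime(x["date"], "%Y-%m-%d")) if x["date"] is not None else (1, 0)
ordered_ids = [r["id"] for r in sorted(rows, key=safe_key)]
print(eager_failed, ordered_ids)
```

The placeholder 0 in (1, 0) is never compared with a datetime, because the first tuple elements already differ whenever a dated item meets an undated one.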
Use x.get('date') instead of x['date'] if x might be missing the 'date' key
If x is a dict that doesn't have a 'date' key, then attempting to access x['date'] will always cause a KeyError, even for something as simple as x['date'] is None.
Instead, you can use dict.get, which doesn't raise an error. If the key is missing, dict.get will return None, or another value which you can provide as a second argument:
x = { "id": 2959, "original_language": "Permanent Job" }
print(x['date'])
# KeyError
print(x.get('date'))
# None
print(x.get('date', 42))
# 42
Finally, the key function for the sort becomes:
lambda x: (0, datetime.strptime(x.get('date'), "%Y-%m-%d")) if x.get('date') is not None else (1, 0)
Note that if the key function becomes too complex, it might be better to write it using def instead of lambda:
def key(x):
    date = x.get('date')
    if date is None:
        return (1, 0)
    else:
        return (0, datetime.strptime(date, "%Y-%m-%d"))
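The example data in the question also contains an empty string, which strptime rejects with a ValueError, so the is None check alone may not be enough for that input. A possible variant is sketched below, assuming that any falsy date (missing key, None or "") should go last; date_key and sample are names invented for this example.

```python
from datetime import datetime

def date_key(x):
    date = x.get("date")
    if not date:
        # Missing key, None or "": rank 1 sorts after every dated item.
        # datetime.min is only a placeholder that keeps the tuples comparable.
        return (1, datetime.min)
    return (0, datetime.strptime(date, "%Y-%m-%d"))

sample = [
    {"id": 2959, "date": "2012-10-26"},
    {"id": 8752, "date": ""},
    {"id": 1300, "date": "2020-07-25"},
    {"id": 7807},
]

sorted_sample = sorted(sample, key=date_key)
print([d["id"] for d in sorted_sample])
```

A date string in any other format would still raise a ValueError here; whether to catch that or let it propagate depends on how much the input can be trusted.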
You’re nearly on the right track, but you need to find a way to not evaluate the date string when it is invalid (key not present, or the value is the empty string).
The nice thing about dates is that chronological order is the same as lexicographical order (for ISO-8601 date formats such as %Y-%m-%d). So you don't actually have to convert them to dates or datetimes; you can just sort them as strings.
That takes care of items in the sequence which have date keys. But what about ones where the date key is not present? There are three options.
- Use a default value, e.g. the empty string. However, this means no-date items will be mixed together with empty-date items.
- Use a fixed-length tuple where the first item indicates whether the date key is present, followed by a default when the value is not present, e.g. (False, ''), (True, '') and (True, '2022-10-03'). These values will sort in the order I gave them.
- Use a variable-length tuple. Tuples with different lengths have a total ordering iff their shared elements are comparable, much like strings do, e.g. car sorts before care. So we can use () to represent a no-date item, ('',) to represent an empty-date item and ('2022-10-03',) to represent a normal date string.
Using the third possibility you can do:
sorted(
    example_response,
    key=lambda item: (item['date'],) if 'date' in item else ()
)
This ensures that items without a date key are sorted separately from items where the value of the date key is the empty string. However, both are sorted before all valid dates.
The keys and their sort order for your example would be:
[(), ('',), ('2012-10-26',), ('2020-07-25',)]
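The tuple ordering relied on above can be checked directly. Since the question asked for undated items to come last rather than first, the sketch below also adapts the fixed-length tuple option with an explicit rank. The ranking scheme and the names items and undated_last are assumptions made for this illustration, not part of the original answer.

```python
# A shorter tuple sorts before a longer one that shares its prefix
assert () < ("",) < ("2012-10-26",) < ("2020-07-25",)

items = [
    {"id": 2959, "date": "2012-10-26"},
    {"id": 8752, "date": ""},
    {"id": 1300, "date": "2020-07-25"},
    {"id": 7807},
]

def undated_last(item):
    # rank 0: real date string, rank 1: empty (or None) date, rank 2: key missing
    if "date" not in item:
        return (2, "")
    return (0, item["date"]) if item["date"] else (1, "")

ids = [i["id"] for i in sorted(items, key=undated_last)]
print(ids)
```

Keeping separate ranks for empty and missing dates preserves the distinction the variable-length tuples make, while placing both groups after the valid dates.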