dateparser python ignore trigger words

Question:

Let me first share a text:

I am Fox Sin of Greed came on Earth in 1666 BC. due date   right after
St. P was build in 16.05.1703 and bluh bluh  I moved to Moscow Feb
2nd, 2022 to work as per deadline  And today I read manga Due date for
my project is September 12, 2022 I wonder if Ill be able to pay by Oct
07, 2023 and so  The deadline is unknown by I assume would be 9102023
Bluh bluh Due Date 12-11-2022 30/08/2021 and 9/19/23

This is a randomly generated text to test dateparser and regex.
I wrote a function that is pretty good at recognising dates with regex, except those in the format [month as letters] [day as number], [year as number].
This is where I usually use dateparser, as it’s capable of recognising those. However, when there are ‘trigger words’ such as ‘may’, ‘to pay’ (??) and the like, it fails.
Example:

I moved to Moscow Feb 2nd, 2022 to work as per deadline

 [('to', datetime.datetime(2022, 9, 8, 0, 0)), ('Feb 2nd, 2022 to', datetime.datetime(2022, 2, 2, 0, 0))]

This is good. It recognised ‘Feb 2nd, 2022’, even though it appended ‘to’ to it.

But next one:

I wonder if Ill be able to pay by Oct 07, 2023 and so

[('to pay', datetime.datetime(2022, 9, 8, 0, 0)), ('07, 2023', datetime.datetime(2023, 7, 8, 0, 0))]

It failed to connect October to ‘07, 2023’.

This is used for extracting data from invoices, and I have no control over which formats the dates come in, so I was wondering if more experienced dateparser users (or users of other Python tools) can help me avoid this problem.
Right now it seems to me that I need to avoid words such as ‘may’, ‘to pay’, ‘now’, etc.

Asked By: FoxSinofGreed


Answers:

If you know the language of the target text, you can provide it, which should prevent problems caused by a bad language guess. After specifying the language en, I get one date, as expected:

from dateparser.search import search_dates
print(search_dates('I wonder if Ill be able to pay by Oct 07, 2023 and so', languages=['en']))

gives output

[('by Oct 07, 2023 and', datetime.datetime(2023, 10, 7, 0, 0))]

Nonetheless, the docs warn that

Warning Support for searching dates is really limited and needs a lot
of improvement

so you should be prepared to still get results that are not what you want.
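One possible guard against leftover false positives is to post-filter the results. In the outputs quoted in the question, the bogus matches (‘to’, ‘to pay’) contain no digits and resolve to the current date, so a minimal sketch could simply drop digitless matches. The helper name drop_digitless_matches is hypothetical and not part of dateparser, and the sketch assumes the input has the same shape as search_dates output (a list of (text, datetime) tuples, or None when nothing is found). Note that it would also discard legitimate relative dates such as ‘today’, which may or may not be acceptable for invoices.

```python
import datetime
import re


def drop_digitless_matches(matches):
    """Keep only (text, datetime) pairs whose matched text contains a digit.

    Hypothetical helper, not part of dateparser. `matches` is assumed to look
    like the search_dates output shown above, or to be None.
    """
    if not matches:  # covers both None and an empty list
        return []
    return [(text, dt) for text, dt in matches if re.search(r"\d", text)]


# Sample copied from the first output quoted in the question.
sample = [
    ("to", datetime.datetime(2022, 9, 8, 0, 0)),
    ("Feb 2nd, 2022 to", datetime.datetime(2022, 2, 2, 0, 0)),
]
print(drop_digitless_matches(sample))
# [('Feb 2nd, 2022 to', datetime.datetime(2022, 2, 2, 0, 0))]
```

This only removes matches like ‘to’ and ‘to pay’. It does not repair a split such as ‘07, 2023’, so it is best combined with the languages=['en'] argument rather than used instead of it.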

Answered By: Daweo