detecting year in list of strings
Question:
I have a list of strings like this:
words = ['hello', 'world', 'name', '1', '2018']
I am looking for the fastest way (Python 3.6) to detect a year “word” in the list. For example, “2018” is a year; “1” is not. Let’s define the acceptable year range as 2000-2020.
Possible solution
Check whether the word is a number ('2018'.isdigit()), then convert it to an int and check that it falls in the valid range.
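A minimal sketch of the approach described in the question, assuming an inclusive 2000-2020 range. The name find_years_by_parsing is hypothetical, and str.isdecimal() is swapped in for isdigit() because isdigit() also accepts characters such as '²' that int() cannot parse.

```python
def find_years_by_parsing(words, lo=2000, hi=2020):
    """Return the words that parse as an integer year in [lo, hi]."""
    years = []
    for word in words:
        # isdecimal() instead of isdigit(): isdigit() also accepts
        # characters such as '²', which would make int() raise ValueError.
        if word.isdecimal() and lo <= int(word) <= hi:
            years.append(word)
    return years


words = ['hello', 'world', 'name', '1', '2018']
print(find_years_by_parsing(words))  # ['2018']
```

Note that int() also accepts zero-padded strings such as '02018', so this check is slightly looser than comparing the strings directly.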
What is the fastest way to do it in Python?
Answers:
Concatenate the list into one string with a separator character, then search it with a regular expression.
For example:
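import re
word_tmp = " ".join(words)
# Raw string keeps the \b word boundaries; 20(?:[01]\d|20) matches 2000-2020 only
re.search(r"\b20(?:[01]\d|20)\b", word_tmp)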
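re.search() only returns the first match. A sketch that collects every year instead, assuming the same 2000-2020 range; YEAR_RE and find_years_by_regex are hypothetical names, and compiling the pattern once is a choice made here, not part of the answer.

```python
import re

# 20 followed by 00-19, or the literal 2020: the inclusive range 2000-2020.
YEAR_RE = re.compile(r"\b20(?:[01]\d|20)\b")


def find_years_by_regex(words):
    # Joining with a space puts a word boundary (\b) between the items.
    text = " ".join(words)
    # The group is non-capturing, so findall() returns the whole matches.
    return YEAR_RE.findall(text)


words = ['hello', 'world', 'name', '1', '2018']
print(find_years_by_regex(words))  # ['2018']
```

Because \b also matches next to punctuation, a word like 'v-2018' would still yield '2018'; if the words may contain such characters, an exact per-word comparison is safer.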
You can build a set of your valid years (as strings). Then loop through each of the words you want to test to check if it is a valid year:
words = ['hello', 'world', 'name', '1', '2018']
valid_years = {str(x) for x in range(2000, 2021)}  # strings '2000' through '2020'
for word in words:
    if word in valid_years:
        print(word)  # Python 3 print function
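The same idea wrapped in reusable functions, as a sketch: the set is built once at module level, and a second helper answers the yes/no question "does the list contain a year at all?". The names VALID_YEARS, find_years_by_set and contains_year are hypothetical, and frozenset is used only to signal that the table is not meant to change.

```python
# Built once; string keys mean no int() conversion is needed per word.
VALID_YEARS = frozenset(str(year) for year in range(2000, 2021))


def find_years_by_set(words):
    """Return every word that is a valid year, keeping the original order."""
    # Each `in` test is a hash lookup, O(1) on average.
    return [word for word in words if word in VALID_YEARS]


def contains_year(words):
    """Return True if at least one word is a valid year."""
    # isdisjoint() accepts any iterable, so words need not be a set.
    return not VALID_YEARS.isdisjoint(words)


words = ['hello', 'world', 'name', '1', '2018']
print(find_years_by_set(words))  # ['2018']
print(contains_year(words))      # True
```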
As Martijn Pieters mentioned in the comments, a set is the fastest option here because a membership test takes O(1) time on average:
“Sets let you test for membership in O(1) time; using a list has a linear O(length_of_list) cost.”
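A rough way to check the claim on your own machine with the standard timeit module. Absolute timings depend on the machine and Python version; the point is that the list scan grows with the list's length while the set lookup does not. Looking up '2020', the last element, is a deliberate worst case for the list.

```python
import timeit

valid_list = [str(year) for year in range(2000, 2021)]
valid_set = set(valid_list)

# '2020' is the last list element, so the list must scan all 21 entries.
for name, container in (("list", valid_list), ("set", valid_set)):
    seconds = timeit.timeit(lambda: '2020' in container, number=1_000_000)
    print(f"{name}: {seconds:.3f} s for 1,000,000 lookups")
```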
EDIT:
As you can see in the comments, there are many different ways of generating the set of valid_years. As long as your data structure is a set, you get the O(1) average membership test, which is what makes this approach fast.
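For illustration, three equivalent constructions (not necessarily the ones suggested in the comments). Since the set is built only once, the choice among them matters far less than using a set for the lookups.

```python
# Three equivalent ways to build the same set of year strings.
by_comprehension = {str(year) for year in range(2000, 2021)}
by_map = set(map(str, range(2000, 2021)))
by_format = {'{}'.format(year) for year in range(2000, 2021)}

# All three contain exactly the 21 strings '2000' through '2020'.
assert by_comprehension == by_map == by_format
print(len(by_map), min(by_map), max(by_map))  # 21 2000 2020
```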
You can read more here:
- List comprehension
- Sets
- Complexities for different Python data structures (so you can understand which data structures in Python are quicker for specific operations)