Python: Using AND and OR with .FIND() method
Question:
Imagine I type the following code into the interpreter:

```python
var1 = 'zuuzuu'
```

Now suppose I type:

```python
var1.find('a')
```

The interpreter returns -1, which I understand, because the substring has not been found. But please help me understand this:

```python
var1.find('a' or 'z')   # case 1
```

returns -1, but

```python
var1.find('a' and 'z')  # case 2
```

returns 0.

According to the logic in my head, the interpreter should return -1 for case 2, because the substrings 'a' AND 'z' are NOT both located in the string, while in case 1, 0 should be returned since 'z' is a substring.

Thanks
Answers:
The expression `'a' or 'z'` always yields `'a'`, and the expression `'a' and 'z'` always yields `'z'`. It's not some kind of DSL for making queries into containers; it's a plain boolean expression, and `find` is called with its result. If you want to ask "is 'a' or 'z' in the string?", you need to do:

```python
var1.find('a') != -1 or var1.find('z') != -1
```

And for the second one (both 'a' and 'z' in the string):

```python
var1.find('a') != -1 and var1.find('z') != -1
```
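As a quick sanity check, the corrected expressions evaluate as expected on the original string (a minimal sketch):

```python
var1 = 'zuuzuu'

# 'z' occurs in the string, so the "or" check succeeds
print(var1.find('a') != -1 or var1.find('z') != -1)   # True

# 'a' does not occur anywhere, so the "and" check fails
print(var1.find('a') != -1 and var1.find('z') != -1)  # False
```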
This is because the `find` method does not in fact support `or` and `and`; it only accepts a single string to search for.

So, what is really going on? It turns out that `or` and `and` are operators that can be applied to strings: `or` returns its first truthy operand, and `and` returns its last operand when all operands are truthy (and any non-empty string is truthy):

```python
'a' and 'z'  # --> 'z'
'a' or 'z'   # --> 'a'
```

So there you have it: you're really just searching for `'a'` (case 1) and `'z'` (case 2) as normal.
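For this kind of membership test, the `in` operator combined with `any()`/`all()` is usually more idiomatic than comparing `find()` results against -1 (a small sketch, not part of the original answers):

```python
var1 = 'zuuzuu'

# any(): True if at least one of the characters occurs in the string
print(any(c in var1 for c in 'az'))  # True, because 'z' is present

# all(): True only if every character occurs in the string
print(all(c in var1 for c in 'az'))  # False, because 'a' is missing
```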
```python
def count_tokens(text):
    """Tokenize the given text and return a dictionary with the count of each distinct token."""
    # First, split the text into individual words
    words = text.split()
    # Next, create an empty dictionary to hold the token counts
    token_counts = {}
    # Loop over the words and count how many times each one appears
    for word in words:
        if word in token_counts:
            token_counts[word] += 1
        else:
            token_counts[word] = 1
    # Finally, return the token counts dictionary
    return token_counts

text = "This is a clock. This is only a clock."
counts = count_tokens(text)
print(counts)
```
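For comparison, the standard library's `collections.Counter` performs the same counting in one step; `count_tokens_counter` below is a hypothetical alternative name, not part of the original code:

```python
from collections import Counter

def count_tokens_counter(text):
    # Counter tallies occurrences of each token produced by split()
    return dict(Counter(text.split()))

text = "This is a clock. This is only a clock."
print(count_tokens_counter(text))
# {'This': 2, 'is': 2, 'a': 2, 'clock.': 2, 'only': 1}
```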
### stopword function

```python
import nltk
from nltk.corpus import stopwords

# The stopword corpus must be downloaded once beforehand:
# nltk.download('stopwords')

def count_tokens(text):
    """Tokenize the given text, remove stopwords, and return a dictionary with the count of each distinct token."""
    # First, split the text into individual words
    words = text.split()
    # Next, remove stopwords from the words
    stop_words = set(stopwords.words('english'))
    words = [word for word in words if word.lower() not in stop_words]
    # Next, create an empty dictionary to hold the token counts
    token_counts = {}
    # Loop over the words and count how many times each one appears
    for word in words:
        if word in token_counts:
            token_counts[word] += 1
        else:
            token_counts[word] = 1
    # Finally, return the token counts dictionary
    return token_counts

text = "This is a clock. This is only a clock."
counts = count_tokens(text)
print(counts)
```