python read only integers from file
Question:
I can’t figure out how to read only the integers from this file:
34
-1
2 48
+0
++2
+1
2.4
1000
-0
three
-1
The function should return:
[34, -1, 0, 1, -1]
A number with a leading + or - is valid, but one with ++ or any letters is not.
A number containing a space (for example 2 48) is not valid.
A number > 999 is not valid.
This is as far as I have got:
my_list = []
with open('test.txt') as f:
    lines = f.readlines()
    for line in lines:
        my_list.append(line.strip())
I tried turning the contents into a string and stripping punctuation with translate, but I am not sure whether that makes things more complicated.
I am also not sure about using a regex. I tried a simple one, but I don’t have experience with them.
Answers:
I think regex is the way to go for you. You can achieve what you want with something like this: [-+]?\d*
It looks for a + or - (the question mark makes it optional), followed by an arbitrary number of digits.
An easy way to find the right regex for your case is https://regex101.com/. You can see directly what your regex matches, and each part is explained to you. In Python, regular expressions are available through the re module (https://docs.python.org/2/library/re.html).
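As a rough sketch of how such a pattern behaves, the snippet below runs it against the sample lines from the question. Two things here are my assumptions rather than part of the suggestion above: it uses \d+ instead of \d* (so that a bare sign or an empty string does not match), and it uses re.fullmatch so the whole string has to fit the pattern. The names SIGNED_INT, samples and matched are only illustrative.

```python
import re

# Assumption: \d+ rather than \d*, so at least one digit is required.
SIGNED_INT = re.compile(r'[-+]?\d+')

samples = ['34', '-1', '2 48', '+0', '++2', '+1', '2.4', '1000', '-0', 'three', '-1']

# fullmatch requires the entire string to fit the pattern, not just a prefix.
matched = [s for s in samples if SIGNED_INT.fullmatch(s)]
print(matched)  # ['34', '-1', '+0', '+1', '1000', '-0', '-1']
```

Note that the pattern on its own still lets 1000 and -0 through, so the > 999 rule and the -0 case would need a separate check.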
Hope this helps you.
You can convert a string to an integer using int(). It will raise ValueError if the string is not an integer. So try this:
my_list = []
with open('test.txt') as f:
    for line in f:
        try:
            n = int(line)
            if n > 999 or line.strip() == '-0':
                # filter out numbers > 999 and the string '-0'
                continue
            my_list.append(n)
        except ValueError:
            pass
print(my_list)
Output: [34, -1, 0, 1, -1]
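To see what int() accepts and rejects on its own, here is a small sketch. The helper try_int is hypothetical (it is not part of the answer above) and simply returns None where int() raises ValueError. The last sample is my own addition: since Python 3.6, int() also accepts underscores between digits, which may or may not be what you want for this file.

```python
def try_int(text):
    """Return int(text), or None if int() raises ValueError."""
    try:
        return int(text)
    except ValueError:
        return None

# int() ignores surrounding whitespace, so the trailing '\n' is harmless.
checks = [' 7\n', '+0', '-0', '++2', '2 48', '2.4', '1_0']
results = {text: try_int(text) for text in checks}
print(results)
# {' 7\n': 7, '+0': 0, '-0': 0, '++2': None, '2 48': None, '2.4': None, '1_0': 10}
```

This is why the loop above has to compare the stripped line with '-0' explicitly: int('-0') is simply 0.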
If you want to do it by hand (note that a regex solution or calling int are probably more suitable, but these are already covered in other answers), you can also implement each check yourself:
import string
characters_and_whitespaces = set(string.ascii_letters + ' .')
mylist = []
# lines is the list read with f.readlines() in the question
for line in lines:
    # remove leading and trailing whitespace
    val = line.strip()
    # Check if valid (!= -0)
    if val == '-0':
        continue
    # Must not start with ++, +-, ....
    if val.startswith(('++', '+-', '-+', '--')):
        continue
    # Must not contain letters, spaces or a dot
    if characters_and_whitespaces.intersection(val):
        continue
    # Must contain at most 3 digits (<= 999), or 4 characters if it starts with + or -
    if val.startswith(('+', '-')):
        if len(val) >= 5:
            continue
    elif len(val) >= 4:
        continue
    # Remove leading "+"
    val = val.lstrip('+')
    # val is still a string here; wrap it in int() if integers are needed
    mylist.append(val)
If you wanted to do this via regular expressions:
import re
exp = re.compile(r'^[\+\-]?[0-9]{1,3}$')
my_list = []
with open('input.txt') as f:
    lines = f.readlines()
    for line in lines:
        if re.match(exp, line.strip()):
            my_list.append(int(line.strip()))
Let's explain the regular expression.
^[\+\-]? : ^ means the string must start with what follows, which is a set of two characters, + and -. The backslashes escape the special characters (inside a character class they are optional for + and for a trailing -, but harmless). The final ? makes the preceding set optional (so the string can start with a + or -, or with neither).
[0-9]{1,3}$ : [0-9] specifies the set of digit characters. {1,3} specifies that they should occur a minimum of one time and a maximum of three times (hence satisfying your <= 999 constraint). The $ sign matches the end of the string, so the string must end with this run of digits.
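A pattern of this shape on its own also accepts -0, which the expected output in the question leaves out. Below is a minimal sketch that adds that check. The function name parse_valid_ints and the in-memory sample list are hypothetical, and rejecting only the literal string -0 is my assumption about the rule, based on the expected output.

```python
import re

# Optional sign followed by one to three digits.
VALID = re.compile(r'[+-]?[0-9]{1,3}')

def parse_valid_ints(lines):
    """Return the ints from lines that are an optional sign plus 1-3 digits."""
    result = []
    for line in lines:
        text = line.strip()
        # Assumption: '-0' is rejected because the expected output omits it.
        if text == '-0':
            continue
        # fullmatch anchors the pattern at both ends of the stripped line.
        if VALID.fullmatch(text):
            result.append(int(text))
    return result

sample = ['34\n', '-1\n', '2 48\n', '+0\n', '++2\n', '+1\n',
          '2.4\n', '1000\n', '-0\n', 'three\n', '-1\n']
print(parse_valid_ints(sample))  # [34, -1, 0, 1, -1]
```

With a real file you could pass the open file object directly, for example parse_valid_ints(f) inside the with block.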
Hope this all helps.
Here’s a regexp solution:
import re
rgx = re.compile(r'^\s*[-+]?\s*(?:0|0*\d{1,3})\s*$', re.M)
with open('test.txt') as f:
    my_list = [int(match) for match in rgx.findall(f.read())]
Output:
[34, -1, 0, 1, 0, -1]