Python: read only integers from a file

Question:

I can’t figure out how to read only the integers from this file:

34
-1
2 48
  +0
++2
+1
 2.4
1000
-0
three
-1  

The function should return:

[34, -1, 0, 1, -1]

A number with a single leading + or - is valid, but one with ++ or any letters is not.

A line with a space inside the number (for example 2 48) is not valid.

A number greater than 999 is not valid.

This is as far as I got:

my_list = []
with open('test.txt') as f:
    lines = f.readlines()
    for line in lines:
        my_list.append(line.strip())

I tried converting the content to a string and stripping punctuation with translate, but I am not sure whether that just makes things more complicated.

I am also not sure about using regex. I tried a simple one, but I don’t have experience with regular expressions.

Asked By: George


Answers:

I think regex is the way to go for you. You can achieve what you want with something like this: [-+]?\d*
It looks for a + or - (the question mark makes the sign optional), followed by an arbitrary number of digits.
An easy way to find the right regex for your case is https://regex101.com/. It shows you directly what your regex matches and explains each part of it. In Python, regular expressions are available through the re module (https://docs.python.org/2/library/re.html).
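
As a rough sketch of how such a pattern might be applied with the re module: the helper name looks_like_integer is made up for illustration, and the pattern uses \d+ rather than \d* because \d* also matches zero digits (an empty string or a bare sign). It only checks the shape of the text, not the 999 limit or the -0 case.

```python
import re

# Optional sign followed by one or more digits.
# \d+ (not \d*) so that '' and '+' on their own are rejected.
SIGNED_INT = re.compile(r'[-+]?\d+')

def looks_like_integer(text):
    """Return True if the stripped text is an optionally signed run of digits."""
    # fullmatch requires the whole string to match, not just a prefix.
    return SIGNED_INT.fullmatch(text.strip()) is not None

for sample in ['34', '  +0', '++2', '2 48', ' 2.4', 'three', '']:
    print(repr(sample), looks_like_integer(sample))
```

Using fullmatch avoids having to add ^ and $ anchors to the pattern yourself.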

Hope this helps you.

Answered By: molig

You can convert a string to an integer using int(). It raises ValueError if the string is not a valid integer. So try this:

my_list = []
with open('test.txt') as f:
    for line in f:
        try:
            n = int(line)
            if n > 999 or line.strip() == '-0':
                #filtering numbers >999 and strings with '-0' 
                continue 
            my_list.append(n)
        except ValueError:
            pass

print(my_list)

Output: [34, -1, 0, 1, -1]
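
One caveat, offered as a hedged aside: int() is a little more lenient than the rules in the question. On Python 3.6+ it accepts underscores between digits, so a line such as 1_0 would be read as 10. If that matters for your data, a stricter wrapper might look like the sketch below. The name parse_small_int and the inlined sample lines are assumptions for illustration, not part of the original answer.

```python
def parse_small_int(line):
    """Return the integer on this line, or None if the line is rejected."""
    text = line.strip()
    # int() would accept '1_0' on Python 3.6+, so reject underscores explicitly.
    # '-0' is rejected because the question treats it as invalid.
    if '_' in text or text == '-0':
        return None
    try:
        n = int(text)
    except ValueError:
        return None
    return n if n <= 999 else None

# Sample lines from the question, inlined instead of read from test.txt.
samples = ['34\n', '-1\n', '2 48\n', '  +0\n', '++2\n', '+1\n', ' 2.4\n',
           '1000\n', '-0\n', 'three\n', '-1  \n']
result = [n for n in map(parse_small_int, samples) if n is not None]
print(result)  # [34, -1, 0, 1, -1]
```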

Answered By: Yevhen Kuzmovych

If you want to do it by hand (note that a regex solution or calling int is probably more suitable, but those are already covered in other answers), you can also implement each check yourself:

import string

characters_and_whitspaces = set(string.ascii_letters + ' .')

mylist = []

for line in lines:
    # remove leading and trailing whitespaces
    val = line.strip()

    # Check if valid (!= -0)
    if val == '-0':
        continue
    # Must not start with ++, +-, ....
    if val.startswith(('++', '+-', '-+', '--')):
        continue
    # Must not contain letters or whitespaces or a dot
    if characters_and_whitspaces.intersection(val):
        continue
    # Must only contain 3 or less digits (<= 999) or 4 if it starts with + or -
    if val.startswith(('+', '-')):
        if len(val) >= 5:
            continue
    elif len(val) >= 4:
        continue

    # Remove leading "+"
    val = val.lstrip('+')

    mylist.append(val)

Answered By: MSeifert

If you wanted to do this via regular expressions:

import re
exp = re.compile(r'^[+-]?[0-9]{1,3}$')

my_list = []
with open('input.txt') as f:
    lines = f.readlines()
    for line in lines:
        if re.match(exp, line.strip()):
            my_list.append(int(line.strip()))

Let me explain the regular expression.

^[+-]?
^ means the expression must start with what follows, which is a set of two characters, + and -. Inside square brackets neither of them needs escaping as long as - comes last. The final ? makes the preceding set optional (so the string can start with a + or -, or nothing).

[0-9]{1,3}$
[0-9] specifies the set of digit characters. {1,3} specifies that they should occur a minimum of one time and a maximum of 3 times (hence satisfying your <= 999 constraint). The $ sign matches the end of the string, so the string must end with this set of characters.
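
To try a pattern of this shape without a file, here is a small sketch with the question's sample lines inlined, using [+-] as the sign set. Note that the pattern on its own accepts -0, which the question wants excluded, so the second list adds an explicit check for it. That check is an addition for illustration, not part of the answer above.

```python
import re

exp = re.compile(r'^[+-]?[0-9]{1,3}$')

# Sample lines from the question, inlined instead of read from a file.
lines = ['34', '-1', '2 48', '  +0', '++2', '+1', ' 2.4',
         '1000', '-0', 'three', '-1  ']

# Pattern only: '-0' passes and is converted to 0.
matched = [int(s.strip()) for s in lines if exp.match(s.strip())]
print(matched)  # [34, -1, 0, 1, 0, -1]

# Pattern plus an explicit rejection of '-0'.
strict = [int(s.strip()) for s in lines
          if exp.match(s.strip()) and s.strip() != '-0']
print(strict)  # [34, -1, 0, 1, -1]
```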

Hope this all helps.

Answered By: Qichao Zhao

Here’s a regexp solution:

import re

rgx = re.compile(r'^\s*[-+]?\s*(?:0|0*\d{1,3})\s*$', re.M)

with open('test.txt') as f:
    my_list = [int(match) for match in rgx.findall(f.read())]

Output:

[34, -1, 0, 1, 0, -1]
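
If -0 should be dropped as the question asks, one possible variant (a sketch, not the answer's own pattern) adds a negative lookahead for a negative zero. It also uses [ \t]* instead of a general whitespace class, so a match cannot run across line breaks, and it does not allow whitespace between the sign and the digits, which int() would reject anyway. The sample text is inlined here rather than read from test.txt.

```python
import re

# Variant of the multi-line pattern:
#  - [ \t]* keeps each match on a single line
#  - (?!-0+[ \t]*$) rejects a negative zero such as "-0"
rgx = re.compile(r'^[ \t]*(?!-0+[ \t]*$)[-+]?0*\d{1,3}[ \t]*$', re.M)

# Sample data from the question, inlined instead of read from test.txt.
text = '34\n-1\n2 48\n  +0\n++2\n+1\n 2.4\n1000\n-0\nthree\n-1  \n'

# int() tolerates the leading and trailing spaces kept in each match.
my_list = [int(match) for match in rgx.findall(text)]
print(my_list)  # [34, -1, 0, 1, -1]
```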
Answered By: ekhumoro