Python: Argument Parsing Validation Best Practices

Question:

Is it possible when using the argparse module to add validation when parsing arguments?

from argparse import ArgumentParser

parser = ArgumentParser(description='Argument parser for PG restore')

parser.add_argument('--database', dest='database',
                    default=None, required=False, help='Database to restore')

parser.add_argument('--backup', dest='backup',
                    required=True, help='Location of the backup file')

parsed_args = parser.parse_args()

Is it possible to add a validation check to this argument parser to make sure the backup file and the database exist, rather than having to add an extra check after parsing for every parameter, such as:

from os.path import exists
if not database_exists(parsed_args.database):
    raise DatabaseNotFoundError
if not exists(parsed_args.backup):
    raise FileNotFoundError
Asked By: AK47


Answers:

Surely! You just have to specify a custom action as a class and override __call__(...). See the argparse documentation on Action classes.

Something like:

import argparse

class FooAction(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        if values != "bar":
            print("Got value:", values)
            raise ValueError("Not a bar!")
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
parser.add_argument("--foo", action=FooAction)

parsed_args = parser.parse_args()

In your particular case, I imagine you’d have DatabaseAction and FileAction (or something like that).
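As a rough sketch of what such a FileAction could look like (the message text and the build_parser helper are illustrative assumptions, not part of argparse), the action below checks that the path names an existing file before storing it. It raises argparse.ArgumentError rather than ValueError, so argparse prints the usage and an error message instead of a traceback:

```python
import argparse
import os


class FileAction(argparse.Action):
    """Store the value only if it names an existing file."""

    def __call__(self, parser, namespace, values, option_string=None):
        if not os.path.isfile(values):
            # ArgumentError is caught by the parser, which prints usage plus
            # this message and exits with status 2.
            raise argparse.ArgumentError(self, f"file not found: {values!r}")
        setattr(namespace, self.dest, values)


def build_parser():
    # Hypothetical helper; mirrors the --backup argument from the question.
    parser = argparse.ArgumentParser(description="Argument parser for PG restore")
    parser.add_argument("--backup", dest="backup", required=True,
                        action=FileAction, help="Location of the backup file")
    return parser
```

A DatabaseAction would follow the same shape, with whatever database check you already have in place of os.path.isfile.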

Answered By: UltraInstinct

The argparse.FileType is a type factory class that can open a file, and of course, in the process raise an error if the file does not exist or cannot be created. You could look at its code to see how to create your own class (or function) to test your inputs.

The argument type parameter is a callable (function, etc) that takes a string, tests it as needed, and converts it (as needed) into the kind of value you want to save to the args namespace. So it can do any kind of testing you want. If the type raises an error, then the parser creates an error message (and usage) and exits.
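To illustrate both halves of that (testing and converting), here is a small sketch of a type callable that checks the file and hands back a pathlib.Path instead of the raw string. The name existing_file and the message wording are my own choices for the example:

```python
import argparse
from pathlib import Path


def existing_file(astring):
    """Type callable: test the string, then convert it to a Path."""
    path = Path(astring)
    if not path.is_file():
        # argparse turns this into a usage message plus an error and exits.
        raise argparse.ArgumentTypeError(f"backup file not found: {astring}")
    return path


parser = argparse.ArgumentParser(description="Argument parser for PG restore")
parser.add_argument("--backup", dest="backup", required=True,
                    type=existing_file, help="Location of the backup file")
```

After parsing, args.backup is already a Path, so the rest of the program does not need to repeat the check or the conversion.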

Now whether that’s the right place to do the testing or not depends on your situation. Sometimes opening a file with FileType is fine, but then you have to close it yourself, or wait for the program to end. You can’t use that open file in a with open(filename) as f: context. The same could apply to your database. In a complex program you may not want to open or create the file right away.

For a Python bug/issue I wrote a variation on FileType that created a context object, one that could be used in a with statement. I also used os tests to check whether the file existed or could be created, without actually opening or creating it. But it required further tricks when the file was stdin/stdout, which you don’t want to close. Sometimes trying to do things like this in argparse is just more work than it’s worth.
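This is not the code from that issue, just a minimal sketch of the "check without opening" idea using os.path and os.access. The function names are made up for the example, and it ignores the stdin/stdout case entirely. Keep in mind that a check like this can go stale: the file may disappear or change permissions between parsing and the later open.

```python
import argparse
import os


def readable_path(astring):
    """Check that a file exists and is readable, without opening it."""
    if not os.path.isfile(astring):
        raise argparse.ArgumentTypeError(f"no such file: {astring}")
    if not os.access(astring, os.R_OK):
        raise argparse.ArgumentTypeError(f"file is not readable: {astring}")
    return astring


def writable_path(astring):
    """Check that a file could be written or created, without creating it."""
    if os.path.exists(astring):
        ok = os.path.isfile(astring) and os.access(astring, os.W_OK)
    else:
        # The file does not exist yet: test the directory it would go in.
        parent = os.path.dirname(astring) or "."
        ok = os.path.isdir(parent) and os.access(parent, os.W_OK)
    if not ok:
        raise argparse.ArgumentTypeError(f"cannot write to: {astring}")
    return astring
```

Since these return the plain path, the program can still do with open(args.backup) as f: at the point where it actually needs the file.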

Anyways, if you have an easy testing method, you could wrap it in a simple type function like this:

def database(astring):
    # database_exists is your own check, as in the question
    if not database_exists(astring):
        raise ValueError  # or TypeError, or argparse.ArgumentTypeError
    return astring

parser.add_argument('--database', dest='database',
                    type=database,
                    default=None, required=False, help='Database to restore')

I don’t think it matters a whole lot whether you implement testing like this in the type or Action. I think the type is simpler and more in line with the developer’s intentions.

Answered By: hpaulj

This is a better version of https://stackoverflow.com/a/37471954/1338570
I could not explain the differences well in a one-line comment. Raising a ValueError inside an Action will cause a traceback in the terminal.
Instead of raising a ValueError, you should call parser.error with a message, like this:

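from argparse import Action
from validators.url import url

class ValidateUrl(Action):
    def __call__(self, parser, namespace, values, option_string=None):
        # values is a single string unless the argument sets nargs; normalize to a list
        candidates = [values] if isinstance(values, str) else values
        for value in candidates:
            # url() returns a falsy failure object for invalid input
            if not url(value):
                parser.error(f"Please enter a valid url. Got: {value}")
        setattr(namespace, self.dest, values)

# In your parser code:
parser.add_argument("-u", "--url", dest="url", action=ValidateUrl, help="A url to download")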
Answered By: miigotu

With this script I can test the proposed alternatives.

import argparse

class ValidateUrl(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        if values != "bar":
            parser.error(f"Please enter a valid. Got: {values}")
        setattr(namespace, self.dest, values)

class FooAction(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        if values != "bar":
            print("Got value:", values)
            #raise ValueError("Not a bar!")  # shows a traceback, not usage
            raise argparse.ArgumentError(self, 'Not a bar')
        setattr(namespace, self.dest, values)

def database(astring):
    if astring != "bar":
        #raise argparse.ArgumentTypeError("not a bar")   # custom message
        raise ValueError('not a bar') # standard error
        # error: argument --data: invalid database value: 'xxx'
    return astring

parser = argparse.ArgumentParser()
parser.add_argument("--url", action=ValidateUrl)
parser.add_argument("--foo", action=FooAction)
parser.add_argument('--data', type=database)

if __name__=='__main__':
    args = parser.parse_args()
    print(args)

A working case:

1254:~/mypy$ python3 stack37471636.py --url bar --foo bar --data bar
Namespace(data='bar', foo='bar', url='bar')

Errors:

Usage and exit for the parser.error case:

1255:~/mypy$ python3 stack37471636.py --url xxx
usage: stack37471636.py [-h] [--url URL] [--foo FOO] [--data DATA]
stack37471636.py: error: Please enter a valid. Got: xxx

The standardized message from a ValueError in the type function:

1256:~/mypy$ python3 stack37471636.py --data xxx
usage: stack37471636.py [-h] [--url URL] [--foo FOO] [--data DATA]
stack37471636.py: error: argument --data: invalid database value: 'xxx'

With ArgumentTypeError, the message is displayed as is:

1246:~/mypy$ python3 stack37471636.py --url bar --foo bar --data xxx
usage: stack37471636.py [-h] [--url URL] [--foo FOO] [--data DATA]
stack37471636.py: error: argument --data: not a bar

FooAction with ArgumentError:

1257:~/mypy$ python3 stack37471636.py --foo xxx
Got value: xxx
usage: stack37471636.py [-h] [--url URL] [--foo FOO] [--data DATA]
stack37471636.py: error: argument --foo: Not a bar

Errors in type get converted to an ArgumentError. Note that ArgumentError identifies the argument. Calling parser.error does not.
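One way to see that conversion directly, assuming Python 3.9 or later where ArgumentParser accepts exit_on_error=False, is to let the ArgumentError propagate and look at its text. The describe_failure helper is just something I made up for this sketch:

```python
import argparse


def database(astring):
    if astring != "bar":
        raise ValueError("not a bar")  # converted by argparse to ArgumentError
    return astring


# exit_on_error=False (Python 3.9+) lets ArgumentError propagate instead of
# printing usage and exiting.
parser = argparse.ArgumentParser(exit_on_error=False)
parser.add_argument("--data", type=database)


def describe_failure(argv):
    """Return the text of the ArgumentError raised for argv, or None."""
    try:
        parser.parse_args(argv)
    except argparse.ArgumentError as err:
        return str(err)
    return None
```

Calling describe_failure(["--data", "xxx"]) should return the same "argument --data: ..." text that appears after "error:" in the transcript above, with the argument name filled in by ArgumentError.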

If FooAction raises a ValueError, a regular traceback is displayed, without usage:

1246:~/mypy$ python3 stack37471636.py --url bar --foo xxx --data bar
Got value: xxx
Traceback (most recent call last):
  File "stack37471636.py", line 27, in <module>
    args = parser.parse_args()
  File "/usr/lib/python3.8/argparse.py", line 1780, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/usr/lib/python3.8/argparse.py", line 1812, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/usr/lib/python3.8/argparse.py", line 2018, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/usr/lib/python3.8/argparse.py", line 1958, in consume_optional
    take_action(action, args, option_string)
  File "/usr/lib/python3.8/argparse.py", line 1886, in take_action
    action(self, namespace, argument_values, option_string)
  File "stack37471636.py", line 13, in __call__
    raise ValueError("Not a bar!")
ValueError: Not a bar!

I believe ArgumentError and ArgumentTypeError are the preferred, or at least intended, choices. The errors that argparse generates automatically use these.

Usually parser.error is used after parsing. For example, calling parser.error('not a bar') after print(args) results in:

1301:~/mypy$ python3 stack37471636.py
Namespace(data=None, foo=None, url=None)
usage: stack37471636.py [-h] [--url URL] [--foo FOO] [--data DATA]
stack37471636.py: error: not a bar
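
A post-parse check along those lines, applied to the parser from the question, might look like the sketch below. The parse_and_validate wrapper and the message wording are assumptions for the example. Since parser.error does not name the argument, the message includes it by hand:

```python
import argparse
import os


def parse_and_validate(argv=None):
    """Parse the arguments, then validate them with parser.error."""
    parser = argparse.ArgumentParser(description="Argument parser for PG restore")
    parser.add_argument("--database", dest="database", default=None,
                        required=False, help="Database to restore")
    parser.add_argument("--backup", dest="backup", required=True,
                        help="Location of the backup file")
    args = parser.parse_args(argv)

    # parser.error prints usage and the message, then exits with status 2.
    if not os.path.isfile(args.backup):
        parser.error(f"--backup: file not found: {args.backup}")
    return args
```

This keeps the checks in one place after parsing, which is handy when a check depends on more than one argument.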
Answered By: hpaulj