Python: how to force one string to match the format of another

Question:

I have written a few Python scripts for the Assessor’s office where I work. Most of them ask for a parcel ID number as input, which is then used to pull certain data through an ODBC connection. The users are not very consistent about how they enter parcel IDs.

The problem is that they enter a parcel ID in one of three ways:

1: ‘1005191000060’

2: ‘001005191000060’

3: ‘0010-05-19-100-006-0’

The third way is the correct way, so I need to make sure the input is always converted to that format. Of course, they would rather type in the ID one of the first two ways. The parcel numbers must always be 15 digits long (20 characters with dashes).

I currently have a working method for fixing the parcel ID, but it is very ugly. I am wondering if anyone knows a better way (or a more “Pythonic” way). I have a function that usually gets imported into all these scripts. Here is what I have:

import re

def FormatPID(in_pid):
    pid_format = re.compile(r'\d{4}-\d{2}-\d{2}-\d{3}-\d{3}-\d{1}')
    pid = in_pid.zfill(15) 
    if not pid_format.match(pid):
        fixed_pid = '-'.join([pid[:4],pid[4:6],pid[6:8],pid[8:11],pid[11:-1],pid[-1]])
        return fixed_pid
    else:
        return pid

if __name__ == '__main__':

    pid = '1005191000060'
##    pid = '001005191000060'
##    pid = '0010-05-19-100-006-0'

    # test
    t = FormatPID(pid)
    print(t)

This works just fine, but the ugly code has bothered me for a while and I think there has got to be a better way than slicing. I am hoping there is a way to “force” the input string to match the format described by the “pid_format” variable. Any ideas? I couldn’t find anything in the regular expressions module that does this.

Asked By: cmackey


Answers:

Instead of manual slicing you can use itertools.islice:

import re
from itertools import islice
groups = (4, 2, 2, 3, 3, 1)
def FormatPID(in_pid):
    pid_format = re.compile(r'\d{4}-\d{2}-\d{2}-\d{3}-\d{3}-\d{1}')
    in_pid = in_pid.zfill(15)
    if not pid_format.match(in_pid):
        it = iter(in_pid)
        return '-'.join(''.join(islice(it, i)) for i in groups)
    return in_pid

print(FormatPID('1005191000060'))
print(FormatPID('001005191000060'))
print(FormatPID('0010-05-19-100-006-0'))

Output:

0010-05-19-100-006-0
0010-05-19-100-006-0
0010-05-19-100-006-0
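
The approach works because every islice call pulls from the same iterator, so each call resumes where the previous one stopped. A minimal sketch of just that step, assuming Python 3; the helper name split_by_widths is hypothetical and not part of the answer above:

```python
from itertools import islice

def split_by_widths(text, widths):
    """Cut text into consecutive chunks of the given widths (hypothetical helper)."""
    it = iter(text)  # one shared iterator; each islice call advances it
    return [''.join(islice(it, width)) for width in widths]

chunks = split_by_widths('001005191000060', (4, 2, 2, 3, 3, 1))
print(chunks)            # ['0010', '05', '19', '100', '006', '0']
print('-'.join(chunks))  # 0010-05-19-100-006-0
```

If a fresh iterator were created inside the loop instead, every chunk would start again from the first character.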
Answered By: Ashwini Chaudhary

I wouldn’t bother using regexes. You just want to get all the digits, ignoring hyphens, left-pad with 0s, then insert the hyphens in the right places, right? So:

def format_pid(pid):
    p = pid.replace('-', '')
    if not p.isdigit():
        raise ValueError('Invalid format: {}'.format(pid))
    p = p.zfill(15)
    # You can use your `join` call instead of the following if you prefer.
    # Or Ashwini's islice call.
    return '{}-{}-{}-{}-{}-{}'.format(p[:4], p[4:6], p[6:8], p[8:11], p[11:14], p[14:])
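
One thing the function above does not catch is an input with more than 15 digits, which gets sliced without complaint. A stricter variant might look like the sketch below, assuming Python 3.4+ (for fullmatch); the names PID_DIGITS and format_pid_strict are illustrative only:

```python
import re

# Capture groups mirror the 4-2-2-3-3-1 layout of a 15-digit parcel ID.
PID_DIGITS = re.compile(r'(\d{4})(\d{2})(\d{2})(\d{3})(\d{3})(\d)')

def format_pid_strict(pid):
    """Normalize a parcel ID, rejecting empty, non-numeric, or over-long input."""
    digits = pid.replace('-', '')
    # An empty string would otherwise be padded to fifteen zeros, so skip the match.
    match = PID_DIGITS.fullmatch(digits.zfill(15)) if digits else None
    if match is None:
        raise ValueError('Invalid parcel ID: {}'.format(pid))
    return '-'.join(match.groups())

print(format_pid_strict('1005191000060'))  # 0010-05-19-100-006-0
```

Because fullmatch must consume the whole padded string, anything that is not exactly 15 digits after padding raises ValueError.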
Answered By: abarnert

All of these answers are a little overdone, IMHO.

rstr is a helper module for easily generating random strings of
various types. It could be useful for fuzz testing, generating dummy
data, or other applications.

import rstr

ASSESSOR_PARCEL = rstr.xeger(r'^\d{14}$')
print(ASSESSOR_PARCEL)

>>> 57203112454660
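
Random inputs like this can be fed to a formatter to check that its output always has the expected shape. A minimal sketch follows, assuming Python 3 and using the standard library random module in place of rstr; format_pid here is a condensed stand-in for the formatters in the other answers, and PID_FORMAT is an illustrative name:

```python
import random
import re

PID_FORMAT = re.compile(r'\d{4}-\d{2}-\d{2}-\d{3}-\d{3}-\d')

def format_pid(pid):
    """Formatter under test: strip hyphens, left-pad to 15 digits, re-insert hyphens."""
    p = pid.replace('-', '').zfill(15)
    return '-'.join([p[:4], p[4:6], p[6:8], p[8:11], p[11:14], p[14:]])

rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(1000):
    length = rng.randint(1, 15)
    raw = ''.join(rng.choice('0123456789') for _digit in range(length))
    formatted = format_pid(raw)
    # Every result must have the dashed 4-2-2-3-3-1 layout.
    assert PID_FORMAT.fullmatch(formatted), (raw, formatted)
    # Formatting already-formatted output should change nothing.
    assert format_pid(formatted) == formatted
```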
Answered By: DanielBell99