How to Select Numbers from specific positions using regex?

Question:

Given the following string:

{"orders":[{"id":1},{"id":2},{"id":3},{"id":4},{"id":5},{"id":6},{"id":7},{"id":8},{"id":9},{"id":10},{"id":11},{"id":648},{"id":649},{"id":650},{"id":651},{"id":652},{"id":653}],"errors":[{"code":3,"message":"[PHP Warning #2] count(): Parameter must be an array or an object that implements Countable (153)"}]}

I want to select only the highlighted numbers which you can see in the image below:

enter image description here

I tried using: [^#:](\d) but it did not end well.

Asked By: Karthik Bhandary


Answers:

If I were asked this in an interview, I would point out that you don’t need a regular expression here, because you can parse the string with ast.literal_eval and then grab the integers. Granted, a regular expression is easier in this case, but as always with regular expressions you have to be concerned with how well they generalize. For example, for this specific string, you can use

".*?"\s*:\s*(\d+)\s*[},]

but this assumes the string will always be json with string keys, and therefore the keys will always be quoted. If that’s not the case, you’d have to expand the regular expression. The literal_eval approach might take some more work in terms of unpacking the values, but it could be safer and behave as expected more often than a regular expression would.
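As a rough sketch of the literal_eval route, the snippet below parses a shortened sample with the same shape as the string in the question and walks the result to collect the integers. The helper name collect_ints and the shortened string are my own assumptions for illustration, not part of the question.

```python
import ast

# Shortened sample with the same shape as the string in the question (assumption).
s = ('{"orders":[{"id":1},{"id":2},{"id":653}],'
     '"errors":[{"code":3,"message":"[PHP Warning #2] count(): (153)"}]}')

def collect_ints(node):
    """Yield every integer value found in nested dicts and lists (hypothetical helper)."""
    if isinstance(node, bool):
        return  # bool is a subclass of int, so skip it explicitly
    if isinstance(node, int):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from collect_ints(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_ints(item)

data = ast.literal_eval(s)
numbers = list(collect_ints(data))
print(numbers)  # [1, 2, 653, 3]
```

The digits inside the message text are ignored because that value is a string, not an int. If the input is known to be JSON, json.loads should work the same way here and also accepts true, false and null, which ast.literal_eval rejects.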

To break down the expression:

  • ".*?" – lazily match any characters between quotes; this matches a key
  • \s*:\s* – allow any amount of whitespace after the key, then a colon, then any amount of whitespace before the value
  • (\d+) – capture all of the consecutive digits
  • \s*[},] – allow any amount of trailing whitespace; the value is then terminated by either a closing brace or a comma
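The same breakdown can be written into the pattern itself with re.VERBOSE, which lets you put whitespace and comments inside the expression. This is only a sketch on a shortened sample string (my assumption, same shape as the one in the question):

```python
import re

# Same pattern as above, split into its four parts with re.VERBOSE.
pattern = re.compile(r'''
    ".*?"       # a quoted key, matched lazily
    \s*:\s*     # a colon with optional whitespace on both sides
    (\d+)       # the captured run of digits
    \s*[},]     # optional whitespace, then a closing brace or a comma
''', re.VERBOSE)

# Shortened sample with the same shape as the string in the question (assumption).
s = ('{"orders":[{"id":1},{"id":2},{"id":653}],'
     '"errors":[{"code":3,"message":"[PHP Warning #2] count(): (153)"}]}')

found = pattern.findall(s)
print(found)  # ['1', '2', '653', '3']
```

Since the pattern has exactly one capturing group, findall returns just the captured digits rather than the whole match.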

The result of using re.findall with this pattern is:

['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '648', '649', '650', '651', '652', '653', '3']

so you just have to convert them to int from there.
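For example, taking the list of strings shown above as given, a list comprehension does the conversion:

```python
# The strings returned by re.findall, copied from the result above.
matches = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11',
           '648', '649', '650', '651', '652', '653', '3']

# Convert each captured string to an integer.
numbers = [int(m) for m in matches]
print(numbers)
```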

Answered By: Kraigolas

Refer to the syntax of re:

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in 'abcdef', since the lookbehind will back up 3 characters and check if the contained pattern matches.

>>> import re
>>> s = '{"orders":[{"id":1},{"id":2},{"id":3},{"id":4},{"id":5},{"id":6},{"id":7},{"id":8},{"id":9},{"id":10},{"id":11},{"id":648},{"id":649},{"id":650},{"id":651},{"id":652},{"id":653}],"errors":[{"code":3,"message":"[PHP Warning #2] count(): Parameter must be an array or an object that implements Countable (153)"}]}'
>>> re.findall(r'(?<=:)\d+', s)
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '648', '649', '650', '651', '652', '653', '3']

Answered By: Mechanic Pig