Matching everything except for a character followed by a newline

Question

This seems like a simple match, but I’m unable to figure out how to match all text that starts with a known block of text and ends with a semicolon + newline. What I have right now mostly works:

pattern = r'''[ ]+(value \w+\n)([^;]+)'''

This lets me parse an example section of text such as:

   value Y1N5NALC
      1 = 'Yes'  
      5 = 'No'  
      7 = 'Not ascertained' ;
   value AGESCRN
      15 = '15 years'  
      16 = '16 years';

However, if any of the key/value pairs contains a semicolon inside the quoted string, the match ends early, since the regex stops at the first semicolon it finds. An example:

   value Y1N5NALC
      1 = 'Yes'  
      5 = 'No;Maybe'  
      7 = 'Not ascertained' ;

What I’d like to do is end the match by looking for a semicolon + Optional(space or tab) + newline. Using ([^;\n]+) fails since the newline is then excluded by the negated character class, so the match cannot span multiple lines.
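A minimal reproduction of the problem, assuming the sample text is held in a string (the variable names here are illustrative only):

```python
import re

# The current pattern: group 2 stops at the FIRST semicolon it sees.
pattern = r'''[ ]+(value \w+\n)([^;]+)'''

text = (
    "   value Y1N5NALC\n"
    "      1 = 'Yes'  \n"
    "      5 = 'No;Maybe'  \n"
    "      7 = 'Not ascertained' ;\n"
)

m = re.search(pattern, text)
# Group 2 is cut off inside 'No;Maybe', so the last line is lost.
print(repr(m.group(2)))
```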

Asked By: Hooked

Answer 1

You can use

(?sm)^ +(value \w+\n)(.*?);$

See the regex demo.

Details:

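(?sm) – re.S and re.M are on
^ – start of a line
 + – one or more spaces (a literal space followed by +)
(value \w+\n) – Group 1: value, space, one or more word chars, and an LF line break
(.*?) – Group 2: any zero or more chars (including line breaks, since re.S is on), as few as possible
; – a ;
$ – at the end of a line.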
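As a quick check, here is a small sketch that applies this pattern with re.findall to the sample text from the question. The variable names are illustrative, and the sample is assumed to use LF line endings.

```python
import re

# Inline flags (?sm): "." matches newlines, "^" and "$" match at line boundaries.
PATTERN = re.compile(r"(?sm)^ +(value \w+\n)(.*?);$")

text = (
    "   value Y1N5NALC\n"
    "      1 = 'Yes'  \n"
    "      5 = 'No;Maybe'  \n"
    "      7 = 'Not ascertained' ;\n"
    "   value AGESCRN\n"
    "      15 = '15 years'  \n"
    "      16 = '16 years';\n"
)

# findall returns one (group 1, group 2) tuple per matched block.
blocks = PATTERN.findall(text)
for header, body in blocks:
    print(repr(header))
    print(repr(body))
```

The semicolon inside 'No;Maybe' is skipped because it is not at the end of a line, so the first block runs through to the semicolon that closes it.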

In case there can be CRLF endings, you need

(?sm)^ +(value \w+\r?\n)(.*?);\r?$
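A sketch of the CRLF-tolerant variant in use. Note that the extra [ \t]* before the line end is an addition of mine, not part of the pattern above; it is there to allow the optional space or tab after the semicolon that the question mentions. The helper name parse_blocks is hypothetical.

```python
import re

# CRLF-tolerant pattern; "[ \t]*" (an assumption added here) permits
# trailing spaces or tabs between the closing ";" and the line end.
PATTERN = re.compile(r"(?sm)^ +(value \w+\r?\n)(.*?);[ \t]*\r?$")

def parse_blocks(text):
    """Return a list of (name, body) pairs, one per 'value' block."""
    results = []
    for m in PATTERN.finditer(text):
        # Group 1 looks like "value NAME\n"; keep only NAME.
        name = m.group(1).split()[1]
        results.append((name, m.group(2)))
    return results

lf_text = (
    "   value Y1N5NALC\n"
    "      5 = 'No;Maybe'  \n"
    "      7 = 'Not ascertained' ; \t\n"
    "   value AGESCRN\n"
    "      16 = '16 years';\n"
)
crlf_text = lf_text.replace("\n", "\r\n")

print([name for name, _ in parse_blocks(lf_text)])
print([name for name, _ in parse_blocks(crlf_text)])
```

Both inputs should yield the same block names; with CRLF input, the bodies keep their internal \r\n line breaks.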

Answered By: Wiktor Stribiżew
