regexp: match character group or end of line

Question:

How do you match ^ (beginning of line) and $ (end of line) inside a [] (character group)?


simple example

haystack string: zazty

rules:

  1. match any “z” or “y”
  2. if preceded by
    1. an “a”, “b”; or
    2. at the beginning of the line.

pass:
match the first two “z”

a regexp that would work is:
(?:^|[aAbB])([zZyY])

But I keep thinking it would be much cleaner with something that meant beginning/end of line inside the character group, like:
[^aAbB]([zZyY])
(this example assumes the ^ means beginning of line, and not what it really means there: negation of the character group)
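
To see what the ^ actually does in that position, here is a minimal Python sketch (the variable names are only for illustration). As the first character inside [], the ^ turns the class into "any character except a, A, b, B", so on the sample string the pattern picks the “y” after the “t”, which is the opposite of what the rules ask for:

```python
import re

haystack = "zazty"

# Inside [], a leading ^ negates the class: "any char except a, A, b, B".
negated = re.compile(r"[^aAbB]([zZyY])")

# Start offsets of the captured z/y (group 1).
negated_starts = [m.start(1) for m in negated.finditer(haystack)]
print(negated_starts)  # [4]
```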


Note: I'm using Python, but knowing how to do this in bash and vim would be good too.

Update: I read the manual again. It says that inside a set of characters everything loses its special meaning, except the character classes (e.g. \w).

Further down the list of character classes there is \A for the start of the string, but this does not work: [\AaAbB]([zZyY])

Any idea why?

Asked By: gcb

Answers:

Why not try escaping the characters with a backslash? ([\^\$])

UPDATE:
If you want to find all Zs and Ys preceded by “a”, then you can use a positive lookbehind. There is probably no way to specify anchors such as ^ inside a character group, because inside the brackets they are treated as ordinary characters. (If there is, I would be pleased to know about it.)

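import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Lookbehind: the z/y must follow the start of the line or an "a"/"A".
    private static final Pattern PATTERN = Pattern.compile("(?<=(?:^|[aA]))([zZyY])");

    public static void main(String[] args) {
        Matcher matcher = PATTERN.matcher("zazty");

        // Print each match and its start offset.
        while (matcher.find()) {
            System.out.println("matcher.group(0) = " + matcher.group(0));
            System.out.println("matcher.start() = " + matcher.start());
        }
    }
}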

Output:

matcher.group(0) = z
matcher.start() = 0
matcher.group(0) = z
matcher.start() = 2
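
Since the question is about Python, here is a hedged sketch of the same idea with the standard re module. As far as I know, re only accepts fixed-width lookbehinds, so the Java pattern above (a zero-width ^ next to a one-character class) is not expected to compile there. The workaround assumed below keeps ^ outside and puts only the class inside the lookbehind; the variable names are made up for the example.

```python
import re

haystack = "zazty"

# A direct port of the Java pattern mixes a zero-width ^ with a one-character
# class inside the lookbehind, which Python's re module rejects.
try:
    re.compile(r"(?<=(?:^|[aA]))([zZyY])")
    java_style_rejected = False
except re.error:
    java_style_rejected = True

# Workaround: keep ^ outside, put only the fixed-width class in the lookbehind.
pattern = re.compile(r"(?:^|(?<=[aA]))([zZyY])")
lookbehind_starts = [m.start() for m in pattern.finditer(haystack)]
print(java_style_rejected, lookbehind_starts)
```

Because neither branch consumes the preceding character, group(0) is just the z/y itself, the same as in the Java output.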
Answered By: Alex Nikolaenkov

You can’t match a ^ or $ within a [] because the only characters with special meaning inside a character class are ^ (as in “everything but”) and - (as in “range”) (and the character classes such as \w). \A and \Z just don’t count as character classes.

This holds for all (standard) flavours of regex, so you’re stuck with (^|[stuff]) and ($|[stuff]) (which aren’t all that bad, really).
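
A minimal Python sketch of both alternation forms on the question's sample string (the names before, after and haystack are only illustrative). Note that the alternation consumes the neighbouring character, so the z/y has to be read from group 1 rather than from the whole match:

```python
import re

haystack = "zazty"

# z/y preceded by start of line or by a/b: the (^|[stuff]) form.
before = re.compile(r"(?:^|[aAbB])([zZyY])")
before_starts = [m.start(1) for m in before.finditer(haystack)]

# z/y followed by a/b or by end of line: the ($|[stuff]) form.
after = re.compile(r"([zZyY])(?:[aAbB]|$)")
after_starts = [m.start(1) for m in after.finditer(haystack)]

print(before_starts)  # [0, 2]
print(after_starts)   # [0, 4]
```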

Answered By: mathematical.coffee

Try this one:

(?<![^abAB])([yzYZ])
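
Why this works, as a rough Python sketch: the double negative reads "not preceded by a character that is something other than a/b", which is true both after an a/b and at the very start of the string, where there is no preceding character at all. One assumption worth flagging: in multi-line input the character before a line start is a newline, so the second pattern below adds \n to the class to cover that case (that variant is my own addition, not part of the answer).

```python
import re

# Fixed-width negative lookbehind over a negated class.
pattern = re.compile(r"(?<![^abAB])([yzYZ])")
starts = [m.start() for m in pattern.finditer("zazty")]
print(starts)  # [0, 2]

# Multi-line input: the newline before "z" counts as "not a/b",
# so the original pattern skips the z at the start of line two.
text = "ty\nz"
plain_starts = [m.start() for m in pattern.finditer(text)]

# Assumed variant: also allow a newline as the preceding character.
per_line = re.compile(r"(?<![^abAB\n])([yzYZ])")
per_line_starts = [m.start() for m in per_line.finditer(text)]
print(plain_starts, per_line_starts)  # [] [3]
```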
Answered By: guido

Concatenate the character ‘a’ to the beginning of the string. Then use [aAbB]([zZyY]).
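
A small Python sketch of this trick, assuming you want match positions relative to the original string (the helper name find_after_ab is hypothetical). The prepended ‘a’ acts as a sentinel, so every offset has to be shifted back by one:

```python
import re

def find_after_ab(haystack):
    """Return start offsets of z/y preceded by a/b or by the string start."""
    # Prepend a sentinel "a" so the start of the string looks like "after a".
    padded = "a" + haystack
    pattern = re.compile(r"[aAbB]([zZyY])")
    # Subtract 1 to map offsets in the padded string back to the original.
    return [m.start(1) - 1 for m in pattern.finditer(padded)]

result = find_after_ab("zazty")
print(result)  # [0, 2]
```

This only handles the start of the string; for multi-line input each line would need its own sentinel.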

Answered By: polina-c