regexp: match character group or end of line
Question:
How do you match ^ (beginning of line) and $ (end of line) inside a [] (character group)?
simple example
haystack string: zazty
rules:
- match any “z” or “y”
- if preceded by
  - an “a” or “b”; or
  - the beginning of the line.
pass:
match the first two “z”
A regexp that works is:
(?:^|[aAbB])([zZyY])
But I keep thinking it would be much cleaner with something that meant beginning/end of line inside the character group, like:
[^aAbB]([zZyY])
(This example assumes the ^ means beginning of line, and not what it really means there: negation of the character group.)
Note: I'm using Python, but knowing how to do this in bash and vim would be good too.
Update: I read the manual again. It says that inside a set of characters everything loses its special meaning, except the character classes (e.g. \w).
Further down the list of character classes there's \A for the beginning of the string, but this does not work: [\AaAbB]([zZyY])
Any idea why?
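For reference, here is a minimal sketch of how the working pattern behaves in Python's re module on the sample string. The variable names are only illustrative.

```python
import re

# The alternation handles "start of line" outside the character group.
pattern = re.compile(r"(?:^|[aAbB])([zZyY])")

# Collect where each captured z/y (group 1) starts in the haystack.
starts = [m.start(1) for m in pattern.finditer("zazty")]
print(starts)
```

Only the first two “z” are reported; the final “y” follows a “t”, so it is skipped.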
Answers:
Why not try the escape character \? ([\^\$])
UPDATE:
If you want to find all z's and y's preceded by “a” (or at the beginning of the line), then you can use a positive lookbehind. There is probably no way to specify anchors such as ^ and $ inside character groups, because inside a group they are treated as ordinary characters. (If there is, I would be pleased to know about it.)
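import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo { // class name is arbitrary
    // Match z/y only at the start of the line or right after an "a" (use [aAbB] to allow "b" too).
    private static final Pattern PATTERN = Pattern.compile("(?<=(?:^|[aA]))([zZyY])");

    public static void main(String[] args) {
        Matcher matcher = PATTERN.matcher("zazty");
        while (matcher.find()) {
            System.out.println("matcher.group(0) = " + matcher.group(0));
            System.out.println("matcher.start() = " + matcher.start());
        }
    }
}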
Output:
matcher.group(0) = z
matcher.start() = 0
matcher.group(0) = z
matcher.start() = 2
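Since the question is about Python: Python's re module only accepts fixed-width lookbehinds, so the ^ alternative cannot sit inside the lookbehind there. A rough Python equivalent, as a sketch with the anchor moved next to the lookbehind, might look like this:

```python
import re

# "^" is zero-width and "[aA]" is one character wide, so they cannot share
# a single lookbehind in Python's re; the anchor is placed beside it instead.
pattern = re.compile(r"(?:^|(?<=[aA]))([zZyY])")

results = [(m.group(0), m.start()) for m in pattern.finditer("zazty")]
for text, start in results:
    print("group(0) =", text, "start =", start)
```

Because the lookbehind consumes nothing, group(0) is just the matched letter, as in the Java version.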
You can’t match a ^ or $ within a [] because the only characters with special meaning inside a character class are ^ (as in “everything but”), - (as in “range”), and the shorthand character classes such as \w. \A and \Z are anchors, not character classes, so they don’t count.
This holds for all (standard) flavours of regex, so you’re stuck with (^|[stuff]) and ($|[stuff]) (which aren’t all that bad, really).
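A small Python sketch of these points, using made-up sample strings: a leading ^ in a class negates, a ^ anywhere else in the class is a literal caret, and the end-of-line case is handled by alternation outside the class.

```python
import re

# A leading ^ negates the class: every character except "a" or "b".
negated = re.findall(r"[^ab]", "abc^")

# A ^ that is not first is just a literal caret.
literal = re.findall(r"[a^]", "abc^")

# The ($|[stuff]) form: z/y followed by "t" or by the end of the line.
before_end_or_t = [m.start(1) for m in re.finditer(r"([zZyY])(?:$|[tT])", "zazty")]

print(negated, literal, before_end_or_t)
```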
Try this one:
(?<![^abAB])([yzYZ])
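This works because the negative lookbehind says “not preceded by a character other than a/b”, which is also true when there is no preceding character at all. A sketch in Python follows; the multi-line variant with \n added to the class is my own assumption about how one might extend it to the start of every line, not part of the answer above.

```python
import re

# Succeeds after a/b/A/B, and also at the very start of the string,
# where there is no preceding character for [^abAB] to match.
pattern = re.compile(r"(?<![^abAB])([yzYZ])")
starts = [m.start() for m in pattern.finditer("zazty")]

# Assumed extension: also exclude "\n" so the start of later lines counts too.
multiline = re.compile(r"(?<![^abAB\n])([yzYZ])")
multi_starts = [m.start() for m in multiline.finditer("zazty\nzb")]

print(starts, multi_starts)
```

The lookbehind is a single character wide, so it is accepted by Python's re module as written.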
Concatenate the character ‘a’ to the beginning of the string, then use [aAbB]([zZyY]).
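A minimal Python sketch of this sentinel trick. Note that match positions shift by one because of the added character, so the sketch subtracts 1 to map them back; the variable names are illustrative.

```python
import re

haystack = "zazty"

# Prefix a sentinel "a" so the start of the string looks like "preceded by a".
padded = "a" + haystack

# Subtract 1 to translate positions in the padded string back to the original.
starts = [m.start(1) - 1 for m in re.finditer(r"[aAbB]([zZyY])", padded)]
print(starts)
```

This only covers the start of the whole string; for multi-line input each line would need its own sentinel.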