What is a regular expression to catch all git commit footer/trailer lines in git log output?

Question:

I am writing a script to remove git commit tags (e.g. Signed-off-by:, Reviewed-by:) from each git commit message. The script is currently written in Python. Right now I have a very simple re.match("Signed-off-by:", line) check, but I think there should be a more elegant solution using a regular expression.

I am assuming that a footer line begins with one or more words separated by -, followed by a colon. For example:

Bug:, Issue:, Reviewed-by:, Tested-by:, Ack-by:, Suggested-by:, Signed-off-by:

The pattern should ignore case. I need help coming up with a regular expression for this. I also want to learn more about regular expressions; what is a good starting point?

The actual Python script is here: https://gerrit-review.googlesource.com/#/c/33213/2/tools/gitlog2asciidoc.py

You could also comment on the script if you sign up for an account.

Thanks

Asked By: DeenSeth


Answers:

While the regular expression approach would be nice, and you can ignore case with just a flag, in this case I think you can simply use startswith to achieve the same goal:

prefixes = ['bug:', 'issue:', 'reviewed-by:', 'tested-by:',
            'ack-by:', 'suggested-by:', 'signed-off-by:']
...
lower_line = line.lower()
for prefix in prefixes:
    if lower_line.startswith(prefix):
        print('prefix matched:', prefix)
        break  # without break, the else clause below would always run
else:
    print('no match found')
Answered By: jcollado

>>> import re
>>> def match_commit(s):
...     r = re.compile(r'((\w+-)*\w+:)')
...     return r.match(s) is not None
...
>>> match_commit("Signed-off-by:")
True
>>> match_commit("Signed-off+by:")
False
>>> match_commit("Signed--by:")
False
>>> match_commit("Bug:")
True
>>> match_commit("Bug-:")
False

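The first group, (\w+-)*, matches zero or more repetitions of "word followed by -"; the final \w+: matches the last word followed by ":". Since \w matches both upper- and lower-case letters, no case-insensitive flag is needed for this pattern.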
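If you do want the explicit, case-insensitive list of keys from the question, a minimal sketch could compile both variants side by side. KNOWN_KEYS, KNOWN_TRAILER_RE, GENERIC_TRAILER_RE and is_trailer_line are hypothetical names, and the key list is simply the one from the question:

```python
import re

# Keys taken from the question; re.escape protects the '-' characters.
KNOWN_KEYS = ["Bug", "Issue", "Reviewed-by", "Tested-by",
              "Ack-by", "Suggested-by", "Signed-off-by"]

# One of the known keys followed by ':' at the start of the line,
# matched regardless of case thanks to re.IGNORECASE.
KNOWN_TRAILER_RE = re.compile(
    r"(?:" + "|".join(re.escape(k) for k in KNOWN_KEYS) + r"):",
    re.IGNORECASE,
)

# Generic form from the answer above: words joined by single '-', then ':'.
GENERIC_TRAILER_RE = re.compile(r"(?:\w+-)*\w+:")


def is_trailer_line(line, known_only=True):
    """Return True if `line` starts with a trailer key.

    re.match only anchors at the start of the string, so anything after
    the colon (name, e-mail, ...) is allowed.
    """
    pattern = KNOWN_TRAILER_RE if known_only else GENERIC_TRAILER_RE
    return pattern.match(line) is not None
```

With known_only=False, a key such as Nacked-by: is accepted too, but so is any ordinary line that happens to start with "Word:", for example a subject like "Bug: bad documentation". A line-based check alone cannot tell a trailer from body text.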

Answered By: Emmanuel

This is a nice use case for any (reusing the prefixes list from the first answer):

for line in logfile:
    if any(line.lower().startswith(prefix) for prefix in prefixes):
        print(line, end='')  # line already ends with a newline
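As a small variation, str.startswith also accepts a tuple of prefixes, so the any(...) loop can be folded into a single call. A minimal sketch, assuming the same lower-cased prefixes stored as a tuple; PREFIXES and drop_trailer_lines are hypothetical names:

```python
# Same lower-cased prefixes as in the first answer, as a tuple so that
# str.startswith can test all of them in one call.
PREFIXES = ("bug:", "issue:", "reviewed-by:", "tested-by:",
            "ack-by:", "suggested-by:", "signed-off-by:")


def drop_trailer_lines(lines):
    """Yield only the lines that do not start with one of PREFIXES (case-insensitive)."""
    for line in lines:
        if not line.lower().startswith(PREFIXES):
            yield line
```

Because the check is a literal prefix comparison, variants not in the list (e.g. Acked-by:) are kept.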
Answered By: Katriel

I’m assuming that the original question was about removing all trailers.

I am assuming that a footer line begins with one or more words separated by -, followed by a colon.

From man git-interpret-trailers:

Existing trailers are extracted from the input message by looking for a
group of one or more lines that (i) is all trailers, or (ii) contains
at least one Git-generated or user-configured trailer and consists of
at least 25% trailers. The group must be preceded by one or more empty
(or whitespace-only) lines. The group must either be at the end of the
message or be the last non-whitespace lines before a line that starts
with --- (followed by a space or the end of the line). Such three minus
signs start the patch part of the message. See also --no-divider below.

(git version 2.39.2)

(I will ignore the “divider line” part since that is irrelevant for commit messages.)

That sounds too involved for a regex.
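To make the rule concrete, here is a deliberately simplified Python sketch of case (i) only: the last paragraph of the message counts as the trailer block if every line in it looks like "Key: value". It ignores the 25% rule, Git's list of generated/configured trailers, continuation lines and the --- divider, so it is not a reimplementation of git interpret-trailers. TRAILER_LINE_RE and split_trailers are hypothetical names:

```python
import re

# Assumed trailer shape: words joined by '-', a colon, then whitespace.
TRAILER_LINE_RE = re.compile(r"(?:\w+-)*\w+:\s")


def split_trailers(message):
    """Split a commit message into (body, trailer_lines).

    Simplified rule: the trailers are the lines after the last blank line,
    provided every one of them matches TRAILER_LINE_RE. Trailing whitespace
    of the message is stripped in the returned body.
    """
    lines = message.rstrip().splitlines()
    # Index of the last blank (or whitespace-only) line, if any.
    last_blank = max(
        (i for i, line in enumerate(lines) if not line.strip()), default=None
    )
    if last_blank is None:
        # The group must be preceded by a blank line, so no trailers here.
        return "\n".join(lines), []
    block = lines[last_blank + 1:]
    if block and all(TRAILER_LINE_RE.match(line) for line in block):
        return "\n".join(lines[:last_blank]).rstrip(), block
    return "\n".join(lines), []
```

Position is what saves the body line "Signed-off-by: trailers are supposed to be used..." in the example below: it matches the line pattern, but it is not in the final paragraph.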

git interpret-trailers can already parse trailers for you. Done. Right? Not quite.

git interpret-trailers has the --only-trailers option, but no inverse option (something like --whole-message-except-trailers) that outputs everything except the trailers. So it looks like we have to do some work.[1]

Get the whole commit message of a SHA1:

git log -1 --format='%s%n%n%b' 688ce90c53d7565f6f8e1d5e438b960620630448

In this example that would be:

Bug: bad documentation of commit conventions

It has come to my attention that some of our committers don’t know how
Signed-off-by: trailers are supposed to be used. Unacceptable! Let me
elucidate this in our docs.

Make haste!

Keywords: nitpicking
Cautioned-against-by: Victor Version Control <[email protected]>
Reviewed-by: Sophia Change My Mind <[email protected]>
Nacked-by: Hector Relaxed <[email protected]>
Yawned-at-by: Yellow Baggers <[email protected]>

I want to remove the last five lines.

We can use grep --invert-match --fixed-strings. The problem, though, is that we want to exclude several different lines: keep only the lines that match none of the trailers. We can do that by passing one --regexp per trailer:

grep --invert-match --fixed-strings --regexp='Keywords: nitpicking' […]

And we can build up that command using (sigh)… Bash.

#!/usr/bin/env bash

trailers=$(git log -1 --format='%s%n%n%b' 688ce90c53d7565f6f8e1d5e438b960620630448 \
    | git interpret-trailers --only-trailers)

# Collect the grep arguments in an array instead of a string passed to eval,
# so trailers containing quotes are passed through verbatim
grep_args=(--invert-match --fixed-strings)
while IFS= read -r trailer; do
    # one `--regexp=<trailer>` per trailer line
    grep_args+=(--regexp="$trailer")
done <<< "$trailers"

# `git log` reprise
git log -1 --format='%s%n%n%b' 688ce90c53d7565f6f8e1d5e438b960620630448 \
    | grep "${grep_args[@]}"

Output:

Bug: bad documentation of commit conventions

It has come to my attention that some of our committers don’t know how
Signed-off-by: trailers are supposed to be used. Unacceptable! Let me
elucidate this in our docs.

Make haste!

This seems to output one or two extra newlines at the end; I guess those can be post-processed away.
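If you would rather do that filtering and post-processing in Python, a hedged sketch of the same idea: drop every line that is exactly one of the extracted trailer lines (a whole-line comparison, unlike grep --fixed-strings, which also matches substrings), then strip the trailing blank lines. It assumes you already have the trailer lines, e.g. the output of git interpret-trailers --only-trailers split into lines; remove_trailer_lines is a hypothetical name:

```python
def remove_trailer_lines(message, trailer_lines):
    """Return `message` without the given trailer lines and without trailing blank lines."""
    # Ignore blank entries so an empty trailer list cannot drop blank body lines.
    to_drop = {t.strip() for t in trailer_lines if t.strip()}
    kept = [line for line in message.splitlines() if line.strip() not in to_drop]
    # The post-processing step: remove the extra newlines at the end.
    return "\n".join(kept).rstrip() + "\n"
```

Like the grep version, a body line that is identical to one of the trailers would also be removed.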

Notes

  1. git log supports format placeholders like %(trailers) and %b (body), but seemingly not body-except-trailers.
Answered By: Guildenstern