Get lines of change from a Git diff file for a GitHub repo without using git command

Question:

The goal

I am building a git stats script in Python that can only access the historical git diff patches, i.e. files like this:

diff --git a/README b/README
index 980a0d5f..fef29374 100644
--- a/README
+++ b/README
@@ -1 +1,3 @@
 Hello World!
+
+Hello planet! - DD
\ No newline at end of file

What do I want exactly?

  1. Take a list of git diff files as input
  2. Calculate how many lines were changed (optional), added and removed in each diff file
  3. Sum it all up
  4. Print "total lines added = X, total lines removed = Y" etc.
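
The steps above can be sketched roughly as follows, assuming the inputs are unified diffs like the sample shown earlier. The function names `count_diff_lines` and `summarize` are hypothetical, chosen only for illustration. The sketch relies on the line counts in each `@@ -a,b +c,d @@` hunk header to know where a hunk ends, so file headers such as `--- a/README` and `+++ b/README` are never counted. A unified diff has no separate notion of a "changed" line: a modification appears as one removal plus one addition.

```python
import re
from typing import Iterable, Tuple

# Matches a hunk header such as "@@ -1 +1,3 @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_diff_lines(diff_text: str) -> Tuple[int, int]:
    """Return (added, removed) line counts for one unified diff."""
    added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.split("\n"):
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                added += 1
                new_left -= 1
            elif line.startswith("-"):
                removed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker, not a real line
            else:
                # Context line: present in both the old and the new file.
                old_left -= 1
                new_left -= 1
            continue
        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_left = int(match.group(2)) if match.group(2) is not None else 1
    return added, removed


def summarize(diffs: Iterable[str]) -> Tuple[int, int]:
    """Sum (added, removed) over the text of several diff files."""
    total_added = total_removed = 0
    for text in diffs:
        added, removed = count_diff_lines(text)
        total_added += added
        total_removed += removed
    return total_added, total_removed


if __name__ == "__main__":
    import sys

    # Usage: python diffstats.py first.diff second.diff ...
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts.append(handle.read())
    total_added, total_removed = summarize(texts)
    print(f"total lines added = {total_added}, total lines removed = {total_removed}")
```

For the README diff shown earlier, `count_diff_lines` would report 2 added lines and 0 removed lines. Binary patches and other non-unified formats are outside what this sketch handles.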

Constraints

The system running this script does not have access to the git repository, nor does it have git installed – introducing an interesting complication.

I have no problem with calling an API from Python or writing code to calculate things manually. The only obstacle is the constraint above: no git and no local repository.

The repository is hosted on GitHub, and usage of GitHub-specific facilities is allowed.

Other sources

I checked and know there are many similar questions on this topic. I’m just having trouble finding a solution in Python that doesn’t run git against the repository directly… (happy to close this if someone can point me to one)


So then, any ideas? I assume I can just manually parse each of the diff files and sum it up, but I’m hoping for a silver bullet from a git magician!

Asked By: milosmns

||

Answers:

It turns out that the GitHub API already offers these diff stats, with no need to have git installed or the repository checked out locally.

GitHub Docs

Since this is good enough for my case, I’m going to use it. New answers are still welcome though.
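
The answer does not say which endpoint it used. As one possibility, the REST API's "get a commit" endpoint (`GET /repos/{owner}/{repo}/commits/{ref}`) returns a `stats` object with `additions`, `deletions` and `total`. Below is a minimal sketch under that assumption, using only the standard library. The helper names `fetch_commit`, `stats_from_commit` and `total_stats` are hypothetical, as are the `OWNER`, `REPO` and commit placeholders in the usage comment. The token is optional for public repositories, although unauthenticated requests are rate limited more strictly.

```python
import json
import urllib.request
from typing import Iterable, Optional, Tuple

API_ROOT = "https://api.github.com"


def fetch_commit(owner: str, repo: str, ref: str, token: Optional[str] = None) -> dict:
    """Fetch one commit as JSON from the GitHub REST API."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/commits/{ref}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def stats_from_commit(payload: dict) -> Tuple[int, int]:
    """Extract (additions, deletions) from a commit payload."""
    stats = payload.get("stats", {})
    return stats.get("additions", 0), stats.get("deletions", 0)


def total_stats(payloads: Iterable[dict]) -> Tuple[int, int]:
    """Sum (additions, deletions) over several commit payloads."""
    total_added = total_removed = 0
    for payload in payloads:
        added, removed = stats_from_commit(payload)
        total_added += added
        total_removed += removed
    return total_added, total_removed


# Example usage (placeholders, requires network access):
#   commits = [fetch_commit("OWNER", "REPO", sha) for sha in ("SHA1", "SHA2")]
#   added, removed = total_stats(commits)
#   print(f"total lines added = {added}, total lines removed = {removed}")
```

Keeping the network call separate from the stats extraction makes it possible to check the summing logic against saved JSON responses without calling the API.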

Answered By: milosmns