Is there a good Python library that can parse C++?

Question:

Google didn’t turn up anything that seemed relevant.

I have a bunch of existing, working C++ code, and I’d like to use python to crawl through it and figure out relationships between classes, etc.

EDIT: Just wanted to point out: I don’t think I need or want to parse every bit of C++; I just need something smart enough to pick up on class, function and member variable declarations, and to skip over function definitions.

Asked By: csbrooks

||

Answers:

This page shows a C++ grammar written in Antlr, and you can generate Python code from it.

There also seems to be someone who was working on a C++ parser in pyparsing, but I was not able to find out who or its current status.

Answered By: Kathy Van Stone

There is no (free) good library to parse C++ in any language.
Your best choices are probably Dehydra g++ plugin, clang, or Elsa.

Answered By: Employed Russian

C++ is notoriously hard to parse. Most people who try to do this properly end up taking apart a compiler. In fact this is (in part) why LLVM started: Apple needed a way they could parse C++ for use in XCode that matched the way the compiler parsed it.

That’s why there are projects like GCC_XML which you could combine with a python xml library.

Some non-compiler projects that seem to do a pretty good job at parsing C++ are:

  • Eclipse CDT
  • OpenGrok
  • Doxygen
Answered By: Stef

You won’t find a drop-in Python library to do this. Parsing C++ is fiddly, and few parsers have been written that aren’t part of a compiler. You can find a good summary of the issues here.

The best bet might be clang, as its C++ support is well-established. Though this is not a Python solution, it sounds as though it would be amenable to re-use within a Python wrapper, given the emphasis on encapsulation and good design in its development.

Answered By: jlarcombe

If you’ve formatted your comments in a compatible way, doxygen does a fantastic job. It’ll even draw inheritance diagrams if you’ve got graphviz installed.

For example, running doxygen over the following:

/// <summary>
/// A summary of my class
/// </summary>
public class MyClass
{
protected:
    int m_numOfWidgets; /// Keeps track of the number of widgets stored

public:
    /// <summary>
    /// Constructor for the class.
    /// </summary>
    /// <param paramName="numOfWidgets">Specifies how many widgets to start with</param>
    MyClass(int numOfWidgets)
    {
        m_numOfWidgets = numOfWidgets;
    }

    /// <summary>
    /// Increments the number of widgets stored by the amount supplied.
    /// </summary>
    /// <param paramName="numOfWidgets">Specifies how many widgets to start with</param>
    /// <returns>The number of widgets stored</returns>
    IncreaseWidgets(int numOfWidgetsToAdd)
    {
        m_numOfWidgets += numOfWidgets;
        return m_numOfWidgets;
    }
};

Will turn all those comments into entries in .html files. With more complicated designs, the result is even more beneficial – often much easier than trying to browse through the source.

Answered By: Jon Cage

The pyparsing wiki shows this example – all it does is parse struct declarations, so this might give you just a glimpse at the magnitude of the problem.

I suggest you (or even better, your employer) shell out $200 and buy Enterprise Architect from sparxsystems. This software is amazingly powerful for the price, and includes pretty good code reverse engineering features. You will spend far more than this in your own time to only get about 2% of the job done. In this case, “buys” wins over “make”.

Answered By: PaulMcG

Ctypes uses gcc-xml for code generation. It’s possible that cpptypes does also. Even if it doesn’t, you could use gcc-xml to generate XML from your C++ file, then parse the xml with one of the built-in or third-party Python XML parsers.

Answered By: Jason R. Coombs

Not an answer as such, but just to demonstrate how hard parsing C++ correctly actually is. My favorite demo:

template<bool> struct a_t;

template<> struct a_t<true> {
    template<int> struct b {};
};

template<> struct a_t<false> {
    enum { b };
};

typedef a_t<sizeof(void*)==sizeof(int)> a;

enum { c, d };
int main() {
    a::b<c>d; // declaration or expression?
}

This is perfectly valid, standard-compliant C++, but the exact meaning of commented line depends on your implementation. If sizeof(void*)==sizeof(int) (typical on 32-bit platforms), it is a declaration of local variable d of type a::b<c>. If the condition doesn’t hold, then it is a no-op expression ((a::b < c) > d). Adding a constructor for a::b will actually let you expose the difference via presence/absence of side effects.

Answered By: Pavel Minaev

Pycparser is a complete and functional parser for ANSI C.
Perhaps you can extend it to c++ 🙂

Answered By: Eli Bendersky

Here’s a SourceForge project that claims to parse c++ headers. As the other commenters have pointed out, there’s no general solution, but you this sounds like it will do enough for your needs. (I just ran across it for a similar need and haven’t tried it myself yet.)

http://sourceforge.net/projects/cppheaderparser/

Answered By: Bill

The Clang project provides libraries for just parsing C++ code.

Either with Clang and GCC you can generate an XML representation of the code

If you prefer a more Pythonian solution you could also search for a C++ yacc grammar and use py-ply (Yacc for Python), but that seems the solution that needs more work

Answered By: SystematicFrank

For many years I’ve been using pygccxml, which is a very nice Python wrapper around GCC-XML. It’s a very full featured package that forms the basis of some well used code-generation tools out there such as py++ which is from the same author.

Answered By: jkp

I would keep an eye on the gcc.gnu.org/wiki/plugins as it seems like plugins are the way to go. Also the gcc-python-plugin seems like it has a nice implementation.

Answered By: stephenmm
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.