Pros and cons for different configuration formats?

Question:

I’ve seen people using *.cfg (Python Buildout), *.xml (Gnome), *.json (Chrome extension), *.yaml (Google App Engine), *.ini and even *.py for app configuration files (like Django).

My question is: why there are so many different configuration file formats? I can see an advantage from a xml vs json approach (much less verbose) or a Python one (sometimes you have a Python app and don’t want to use a specific module just to parse a config file), but what about the other approaches?

I know there are even more formats than those configuration files I exemplified. What are really their advantages in comparison to each other? Historical reasons? Compatibility with different systems?

If you would start an application to read some kind of configuration files (with a plugin ecosystem), which one would you use?

Which ones that I gave as example are the oldest ones? Do you know it’s history?

Answers:

Note, this is pure opinion and speculation on my part but I suspect that the single biggest reason for the plethora of formats is likely due to the lack of a readily available, omnipresent configuration file parsing library. Lacking that, most programs have to write their own parsers so it would often come down to balance between how complex the configuration structure needs to be (hierarchical vs flat, purely data vs embedded logic like if statements, etc), how much effort the developers were willing to spend on writing a configuration file parser, and how much of a pain it should be for the end user. However, probably every reason you’ve listed and could think has likely been the motivation for a project or two in choosing their format.

For my own projects I tend to use .ini simply because there’s an excellent parser already built in to Python and it’s been “good enough” for most of my use cases. In the couple of cases it’s been insufficient, I’ve used an XML based configuration file due, again, to the relative simplicity of implementation.

Answered By: Rakis

It’s mostly personal preference, purpose, and available libraries. Personally I think xml is way too verbose for config files, but it is popular and has great libraries.

.cfg, .ini are legacy formats that work well and many languages have an included library that reads them. I’ve used it in Java, Python, C++ without issues. It doesn’t really work as a data interchange format and if I am passing data I will probably use the same format for config and data interchange.

yaml, and json are between xml and cfg/ini. You can define many data structures in both, or it can be a simple key-value like with cfg. Both of these formats have great libraries in python and I’m assuming many other languages have libraries as well. I believe json is subset of yaml.

I’ve never used a python file as config, but it does seem to work well for django. It does allow you to have some code in the config which might be useful.

Last time I was choosing a format I chose yaml. It’s simple but has some nice features, and the python library was easy to install and really good. Json was a close second and since the yaml library parsed json I chose yaml over it.

Answered By: mfperzel

It really depends on whether the reader/writer of the configuration file is, or can be, a non-programmer human or if it has strictly programmatic access.

XML, JSON, etc., are totally unsuitable for human consumption, while the INI format is half-way reasonable for humans.

If the configuration file has only programmatic access (with occasional editing by a programmer), then any of the supported formats (XML, JSON, or INI) are suitable. The INI file is not suitable for structured parameters, where XML and JSON can support arrays, structs/objects, etc.

Answered By: Wayne Cannon

According to Python official docs, ‘future enhancements to configuration functionality will
be added to dictConfig()’.

So any config file which can be used with dictconfig() should be fine.

Answered By: vamsikrishna

Most configuration file formats inherit significant complexity because they support too many data types. For example, 2 can be interpreted as an integer, a real number, and a string, and since most configuration languages convert values into these underlying data types, a value like 2 must be written in such a way to convey both the value and the type. Thus, 2 is generally interpreted as an integer, 2.0 is interpreted as a real number, and "2" is interpreted as a string. This makes these languages hard to use for non-programmers who do not know these conventions. It also adds complexity and clutter to the language. This is particularly true for strings. Strings are a sequence of arbitrary characters, and so valid strings may contain the quote character. To distinguish between an embedded quote character and the quote character that terminates the string, the embedded quote character must be escaped. This leads to more complexity and clutter.

JSON and Python are unambiguous and handle hierarchy well, but are cluttered by type cues, quoting and escaping.

YAML makes the quoting optional, which in turn makes the format ambiguous.

INI takes all values as strings, which requires the end application to convert to the expected data type if required. This eliminates the need for type cues, quoting and escaping. But INI does not naturally support hierarchy, and is not standardized and different implementations are inconsistent with their interpretation of files.

A new format is now available that is similar to INI in that all leaf values are strings, but it is standardized (it has an unambiguous spec and a comprehensive test suite) and naturally handles hierarchy. NestedText results in a very simple configuration format that is suitable for both programmers and non-programmers. It should be considered as the configuration file and structured data file formats for all new applications, particularly for those where the files are to be read, written, or modified by casual users.

Answered By: August West