"Flat is better than nested" – for data as well as code?

Question:

This question got me thinking: should we apply the principle that “flat is better than nested” to data as well as to code? Even when there is a “logical tree structure” to the data?

In this case, I suppose it would mean representing children as a list of IDs, rather than an actual list of children, with all the nodes in a single list:

[ {'id': 4, 'children': ()},
  {'id': 2, 'children': (1, 7)},
  {'id': 1, 'children': (6, 5)},
  {'id': 6, 'children': ()},
  {'id': 5, 'children': ()},
  {'id': 7, 'children': (3,)},
  {'id': 3, 'children': ()} ]

(I used tuples because I prefer not to give myself the flexibility to mutate an object until that flexibility proves itself to be useful and usable in a clear manner. In any case I would never use None here instead of an empty sequence, because it complicates the logic, and “special cases aren’t special enough” – here, it isn’t special at all.)

Certainly this is shorter, but the tree structure is obscured. Does that contradict “explicit is better than implicit”?

Personally I find that “flat is better than nested” is of limited applicability, and nowhere near the most important aspect of the Zen. (Certainly I could not do a lot of the nice functional-programming things that I do if I didn’t allow myself significant nesting.) I suspect that the problem with “nested” is that it requires context switching when you comprehend the information. I really think this is more of a problem when following imperative logic than for parsing data or functional-style code – where it’s easier to just mentally name the nested block, and consider its workings separately from the outer context.

What say you?

Asked By: Karl Knechtel

||

Answers:

I wouldn’t inherently prefer either, but rather use whatever seems best suited to the task.

If the structure is important, nesting makes life simple. If you’re regularly operating over each node, the flat structure makes it easy to use for node in tree. Of course, if you define your own class, its easy enough to abstract it so both options are simple; but it may be harder to use with external systems, such as converting to JSON.

Answered By: Thomas K

This is a completely subjective question. The answer is, “it depends.”

It depends on the primary use of your data. If you continually have to reference the nested structure, then it makes sense to represent it that way. And if you never reference the flat representation except when building the nested structure, then why have the flat structure at all?

The “flat” representation is one of the basics of the relational database model: each type of data exists in a table just for that type, and any relationships among the items are contained in separate tables. It’s a useful abstraction, but at times difficult to work with in code. On the other hand, processing all the data of a particular type is trivial.

Consider, for example, if I wanted to find all the descendants of the record with id 2 in your example data. If the data is already in the hierarchy (i.e. native representation is the “nested” structure), then it’s trivial to locate record id 2 and then traverse its children, children’s children, etc.

But if the native representation is sequential as you’ve shown it then I have to pass through the entire data set to create the hierarchical structure and then find record 2 and its children.

So, as I said, “it depends.”

Answered By: Jim Mischel

Does that contradict “explicit is
better than implicit”?

Yes, particularly when being explicit prevents you from implicitly shooting yourself in the foot. The “tree” in your example can have multiple parents claiming to own the same children. It can also have multiple root nodes (and it does: 2 is a root node; 4 is a root as well as a leaf node.)

Answered By: Dan Breslau

This question can’t be answered “in general” – there is no right answer.

For this particular example, I actually like the tree structure better. Without knowing anything about the original application, and just looking at the structure, the relationship between the items is obvious. With the flat structure, I have to read some documentation, or application code to “know” that your tuple of children refer to id’s.

The tree structure is self-documenting – the flat structure isn’t.

Answered By: Gerrat

should we apply the principle that “flat is better than nested” to data as well as to code?

No.

Even when there is a “logical tree structure” to the data?

That’s what “flat is better than nested” doesn’t apply to data. Only to code.

… the tree structure is obscured. Does that contradict “explicit is better than implicit”?

It’s not even comparable. The “flat tree” is a common SQL implementation because SQL can’t handle indefinite recursion of a proper tree.

Comparing the Zen of Python (the language) with data structure design is nonsensical. The two are no more comparable than than the number “2” and splitting a brick of cheese into “2” pieces. One’s a thing, the other’s an action.

Data structures are things.

Python describes actions.

Answered By: S.Lott

“Everything should be made as simple as possible, but not simpler.” Flat is simpler than nested. If you’re dealing with data with relevant nesting, then flattening it probably violates the “but not simpler” part. I took the Zen of Python instead to be encouraging you not to complicate your life with nesting you don’t really need, like an XML config file where a simpler flat format might suffice.

Answered By: Darius Bacon
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.