Connect parent to children in graph structure stored in CSV file

Question:

I have a CSV file like the one below; its first row is the header. Within each value of the ‘type’ column, the ‘tree’ column encodes the hierarchy. For example, where ‘type’ is ‘asset’, ‘2’ is the parent of ‘2-1’, ‘2-2’, ‘2-3’, ‘2-4’, ‘2-5’, and ‘2-6’. In other words, a child’s tree code is its parent’s code followed by ‘-’ and a number.

type,value,tree
asset,804,1
asset,498,2
asset,0,2-1
asset,0,2-2
asset,0,2-3
asset,0,2-4
asset,4981064,2-5
asset,0,2-6
asset,0,3
asset,0,4
asset,980,5
asset,919,5-1
asset,103,5-1-1
asset,699,5-1-2
asset,315,5-1-3
asset,113,5-1-4
asset,378,5-1-5
asset,0,5-1-6
asset,615,5-1-7
asset,101,5-1-8
asset,137,5-1-9
asset,0,5-1-10
asset,0,5-1-11
asset,179,5-1-12
asset,398,5-1-13
asset,608,5-2
asset,496,5-2-1
asset,497,5-2-2
asset,0,5-2-3
asset,111,5-2-4
asset,484,5-2-5
asset,0,5-2-6
asset,237,5-2-7
asset,226,5-2-8
asset,379,5-2-9
asset,0,5-2-10
asset,0,5-2-11
asset,419,5-2-12
asset,340,5-2-13
asset,262,12-4
asset,181,13
asset,539,14
asset,181,15
asset,0,16
asset,533,17
asset,0,18
liability,137,1
liability,312,1-1
liability,0,1-1-1
liability,0,1-1-2
liability,0,1-1-3
liability,0,1-1-4
liability,0,1-1-5
liability,312,1-1-6
liability,591,1-2
liability,413,1-2-1
liability,506,1-2-2
liability,103,1-2-3
liability,237,1-2-4
liability,0,1-2-5
liability,194,1-3
liability,0,1-4
liability,463,1-5
liability,0,1-6
liability,881,1-7
liability,707,2
liability,615,2-1
liability,858,2-2
liability,831,2-3
liability,0,2-4
liability,266,2-5
liability,554,2-6
liability,312,3
liability,400,3-1
liability,105,3-2
liability,352,3-3
liability,0,3-6
liability,0,3-4
liability,316,3-5
revenue,290,1
revenue,283,1-1
revenue,149,1-1-1
revenue,126,1-1-1-1
revenue,836,1-1-1-2
revenue,294,1-1-1-3
revenue,321,1-1-1-4
revenue,835,1-1-1-5
revenue,0,1-1-1-6
revenue,143,1-1-1-7
revenue,101,1-1-1-8
revenue,165,1-1-1-9
revenue,0,1-1-1-10
revenue,0,1-1-1-11
revenue,0,1-1-1-12
revenue,862,1-1-1-13
revenue,512,1-1-2
revenue,512,1-1-2-1
revenue,0,1-1-2-2
revenue,0,1-1-2-3
revenue,175,1-1-3
revenue,0,1-1-3-1
revenue,0,1-1-3-2
revenue,773,1-1-3-3
revenue,0,1-1-3-4
revenue,0,1-1-3-5
revenue,341,1-1-3-6
revenue,0,1-1-3-7
revenue,336,1-1-3-8
revenue,285,1-1-3-9
revenue,703,1-1-3-10
revenue,125,1-1-3-11
revenue,0,1-1-3-12
revenue,0,1-1-3-13
revenue,0,1-1-3-14
revenue,0,1-1-3-15
revenue,570,1-1-3-16
revenue,111,1-1-4
revenue,0,1-1-5
revenue,0,1-1-6
revenue,690,1-2
revenue,690,1-2-1
revenue,0,1-2-1-1
revenue,0,1-2-1-2
revenue,690,1-2-1-3
cost,123,1
cost,110,1-1
cost,109,1-1-1
cost,109,1-1-1-1
cost,0,1-1-1-2
cost,0,1-1-1-3
cost,0,1-1-1-4
cost,0,1-1-1-5
cost,0,1-1-1-6
cost,0,1-1-1-7
cost,654,1-1-2
cost,0,1-1-2-1
cost,0,1-1-2-2
cost,0,1-1-2-3
cost,654,1-1-2-4
cost,127,1-2
cost,917,1-2-1
cost,562,1-2-1-1
cost,355,1-2-1-2
cost,749,1-2-2
cost,607,1-2-2-1
cost,135,1-2-2-2
cost,220,1-2-3
cost,0,1-2-3-1
cost,220,1-2-3-2
cost,574,1-2-4
cost,473,1-2-4-1
cost,100,1-2-4-2
cost,0,1-2-5
cost,0,1-2-5-1
cost,0,1-2-5-2
cost,0,1-2-6

I want to write Python code that produces a tuple of accounting constraint strings like this:

('asset_2 == asset_2-1 + asset_2-2 + asset_2-3 + asset_2-4 + asset_2-5 + asset_2-6',
 'asset_5 == asset_5-1 + asset_5-2',
 'asset_5-1 == asset_5-1-1 + asset_5-1-2 + asset_5-1-3 + asset_5-1-4 + asset_5-1-5 + asset_5-1-6 + asset_5-1-7 + asset_5-1-8 + asset_5-1-9 + asset_5-1-10 + asset_5-1-11 + asset_5-1-12 + asset_5-1-13',
...,
'liability_1 == liability_1-1 + liability_1-2 + liability_1-3 + liability_1-4 + liability_1-5 + liability_1-6 + liability_1-7',
...,
'revenue_1 == revenue_1-1 + revenue_1-2',
'revenue_1-1 == revenue_1-1-1 + revenue_1-1-2 + revenue_1-1-3 + revenue_1-1-4 + revenue_1-1-5 + revenue_1-1-6',
...,
'cost_1 == cost_1-1 + cost_1-2',
'cost_1-1 == cost_1-1-1 + cost_1-1-2',
...
)

As you can see in the above output, all children are now connected to their parents.

Answers:

You could use a dictionary to assemble a structure of parent/children from the composite names (using prefixes as the hierarchical relationship):

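with open('csv.txt') as f:
    # splitlines() leaves no trailing empty string; [1:] skips the header row
    csv_lines = [line for line in f.read().splitlines()[1:] if line.strip()]

# "asset,498,2" -> "asset_2": combine the type and tree columns
names = ["{0}_{2}".format(*line.split(",")) for line in csv_lines]

# every name starts with an empty list of children
grouped = { name:[] for name in names }
# the parent of "asset_5-1-3" is "asset_5-1": everything before the last "-"
grouped.update( (parent,grouped[parent]+[child])
                for child in names for parent in child.rsplit("-",1)[:1]
                if parent != child and parent in grouped)

# keep only the names that actually have children
result = tuple( parent + " == " + " + ".join(children)
                for parent,children in grouped.items() if children )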

output:

print(*result,sep="\n")

asset_2 == asset_2-1 + asset_2-2 + asset_2-3 + asset_2-4 + asset_2-5 + asset_2-6
asset_5 == asset_5-1 + asset_5-2
asset_5-1 == asset_5-1-1 + asset_5-1-2 + asset_5-1-3 + asset_5-1-4 + asset_5-1-5 + asset_5-1-6 + asset_5-1-7 + asset_5-1-8 + asset_5-1-9 + asset_5-1-10 + asset_5-1-11 + asset_5-1-12 + asset_5-1-13
asset_5-2 == asset_5-2-1 + asset_5-2-2 + asset_5-2-3 + asset_5-2-4 + asset_5-2-5 + asset_5-2-6 + asset_5-2-7 + asset_5-2-8 + asset_5-2-9 + asset_5-2-10 + asset_5-2-11 + asset_5-2-12 + asset_5-2-13
liability_1 == liability_1-1 + liability_1-2 + liability_1-3 + liability_1-4 + liability_1-5 + liability_1-6 + liability_1-7
liability_1-1 == liability_1-1-1 + liability_1-1-2 + liability_1-1-3 + liability_1-1-4 + liability_1-1-5 + liability_1-1-6
liability_1-2 == liability_1-2-1 + liability_1-2-2 + liability_1-2-3 + liability_1-2-4 + liability_1-2-5
liability_2 == liability_2-1 + liability_2-2 + liability_2-3 + liability_2-4 + liability_2-5 + liability_2-6
liability_3 == liability_3-1 + liability_3-2 + liability_3-3 + liability_3-6 + liability_3-4 + liability_3-5
revenue_1 == revenue_1-1 + revenue_1-2
revenue_1-1 == revenue_1-1-1 + revenue_1-1-2 + revenue_1-1-3 + revenue_1-1-4 + revenue_1-1-5 + revenue_1-1-6
revenue_1-1-1 == revenue_1-1-1-1 + revenue_1-1-1-2 + revenue_1-1-1-3 + revenue_1-1-1-4 + revenue_1-1-1-5 + revenue_1-1-1-6 + revenue_1-1-1-7 + revenue_1-1-1-8 + revenue_1-1-1-9 + revenue_1-1-1-10 + revenue_1-1-1-11 + revenue_1-1-1-12 + revenue_1-1-1-13
revenue_1-1-2 == revenue_1-1-2-1 + revenue_1-1-2-2 + revenue_1-1-2-3
revenue_1-1-3 == revenue_1-1-3-1 + revenue_1-1-3-2 + revenue_1-1-3-3 + revenue_1-1-3-4 + revenue_1-1-3-5 + revenue_1-1-3-6 + revenue_1-1-3-7 + revenue_1-1-3-8 + revenue_1-1-3-9 + revenue_1-1-3-10 + revenue_1-1-3-11 + revenue_1-1-3-12 + revenue_1-1-3-13 + revenue_1-1-3-14 + revenue_1-1-3-15 + revenue_1-1-3-16
revenue_1-2 == revenue_1-2-1
revenue_1-2-1 == revenue_1-2-1-1 + revenue_1-2-1-2 + revenue_1-2-1-3
cost_1 == cost_1-1 + cost_1-2
cost_1-1 == cost_1-1-1 + cost_1-1-2
cost_1-1-1 == cost_1-1-1-1 + cost_1-1-1-2 + cost_1-1-1-3 + cost_1-1-1-4 + cost_1-1-1-5 + cost_1-1-1-6 + cost_1-1-1-7
cost_1-1-2 == cost_1-1-2-1 + cost_1-1-2-2 + cost_1-1-2-3 + cost_1-1-2-4
cost_1-2 == cost_1-2-1 + cost_1-2-2 + cost_1-2-3 + cost_1-2-4 + cost_1-2-5 + cost_1-2-6
cost_1-2-1 == cost_1-2-1-1 + cost_1-2-1-2
cost_1-2-2 == cost_1-2-2-1 + cost_1-2-2-2
cost_1-2-3 == cost_1-2-3-1 + cost_1-2-3-2
cost_1-2-4 == cost_1-2-4-1 + cost_1-2-4-2
cost_1-2-5 == cost_1-2-5-1 + cost_1-2-5-2
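
For comparison, the same prefix-based grouping can be written as an explicit loop. This is only a sketch, not the code from the answer above: the function name `build_constraints` and the inline sample data are made up for illustration, and it assumes the header names `type` and `tree` from the question's file.

```python
import csv
import io


def build_constraints(lines):
    """Return 'parent == child + child ...' strings from CSV lines.

    `lines` is any iterable of text lines whose header contains the
    columns 'type' and 'tree' (an assumption taken from the question).
    """
    reader = csv.DictReader(lines)
    names = [f"{row['type']}_{row['tree']}" for row in reader]

    # Every name starts with no children; dicts keep insertion order.
    children = {name: [] for name in names}
    for name in names:
        # "asset_5-1-3" -> parent "asset_5-1"; top-level names have no "-".
        parent, sep, _ = name.rpartition("-")
        if sep and parent in children:
            children[parent].append(name)

    return tuple(
        f"{parent} == {' + '.join(kids)}"
        for parent, kids in children.items()
        if kids  # skip names without children
    )


# Hypothetical miniature of the file; "12-4" has no parent row "12".
sample = io.StringIO(
    "type,value,tree\n"
    "asset,498,2\n"
    "asset,0,2-1\n"
    "asset,0,2-2\n"
    "asset,262,12-4\n"
)
constraints = build_constraints(sample)
print(*constraints, sep="\n")
```

Passing an open file object instead of the `io.StringIO` sample should work the same way, since `csv.DictReader` accepts any iterable of lines.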

Note: based on your example, I intentionally excluded parents that don’t have any children. Those could be included easily by removing the if children condition from the result comprehension.
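
To illustrate what dropping that condition does, here is a small sketch on a hand-written stand-in for the `grouped` dictionary (the two entries are invented for the example). An entry without children ends up with an empty right-hand side, so you may want to format those differently.

```python
# Hypothetical stand-in for the `grouped` dictionary built in the answer.
grouped = {
    "asset_1": [],                          # no children
    "asset_2": ["asset_2-1", "asset_2-2"],  # two children
}

# Same comprehension as in the answer, but without the `if children` filter.
all_entries = tuple(
    parent + " == " + " + ".join(children)
    for parent, children in grouped.items()
)
print(*all_entries, sep="\n")
```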

Answered By: Alain T.