Make NetworkX node attributes into Pandas Dataframe columns

Question:

I have a NetworkX graph called G, created below:

import networkx as nx
G = nx.Graph()
G.add_node(1,job= 'teacher', boss = 'dee')
G.add_node(2,job= 'teacher', boss = 'foo')
G.add_node(3,job= 'admin', boss = 'dee')
G.add_node(4,job= 'admin', boss = 'lopez')

I would like to store the node number and the attributes job and boss in separate columns of a pandas DataFrame.

I have attempted to do this with the code below, but it produces a DataFrame with two columns: one with the node number and one with all of the attributes:

graph = G.nodes(data = True)
import pandas as pd
df = pd.DataFrame(graph)

df
Out[19]: 
    0                                      1
0  1  {u'job': u'teacher', u'boss': u'dee'}
1  2  {u'job': u'teacher', u'boss': u'foo'}
2  3    {u'job': u'admin', u'boss': u'dee'}
3  4  {u'job': u'admin', u'boss': u'lopez'}

Note: I acknowledge that NetworkX has a to_pandas_dataframe function, but it does not provide a DataFrame with the output I am looking for.

Asked By: BeeGee

||

Answers:

I don’t know how representative your data is, but it should be straightforward to adapt my code to your real network:

In [32]:
data={}
data['node']=[x[0] for x in graph]
data['boss'] = [x[1]['boss'] for x in graph]
data['job'] = [x[1]['job'] for x in graph]
df1 = pd.DataFrame(data)
df1

Out[32]:
    boss      job  node
0    dee  teacher     1
1    foo  teacher     2
2    dee    admin     3
3  lopez    admin     4

All I’m doing here is constructing a dict from the graph data. pandas accepts a dict as data, where the keys are the column names and the values must be array-like, in this case lists of values.
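The dict-of-lists idea can be sketched as a self-contained script. This is a minimal sketch, assuming NetworkX 2.x or later, where G.nodes(data=True) returns a view rather than a list; the names pairs and df_cols are only illustrative.

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node(1, job='teacher', boss='dee')
G.add_node(2, job='teacher', boss='foo')

# Materialise the node view as a list of (node, attribute_dict) pairs
# so it can be iterated several times.
pairs = list(G.nodes(data=True))

# Keys become column names; each value is a list with one entry per node.
data = {
    'node': [n for n, _ in pairs],
    'job': [attrs['job'] for _, attrs in pairs],
    'boss': [attrs['boss'] for _, attrs in pairs],
}
df_cols = pd.DataFrame(data)
print(df_cols)
```

Every list must have the same length, which holds here because each one is built from the same pairs.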

A more dynamic method:

In [42]:
def func(graph):
    data={}
    data['node']=[x[0] for x in graph]
    other_cols = graph[0][1].keys()
    for key in other_cols:
        data[key] = [x[1][key] for x in graph]
    return data
pd.DataFrame(func(graph))

Out[42]:
    boss      job  node
0    dee  teacher     1
1    foo  teacher     2
2    dee    admin     3
3  lopez    admin     4
Answered By: EdChum

I updated this solution to work with my updated version of NetworkX (2.0) and thought I would share. I also had the function return a Pandas DataFrame.

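def nodes_to_df(graph):
    import pandas as pd
    # Materialise the (node, attribute_dict) pairs once
    nodes = list(graph.nodes(data=True))
    data={}
    data['node']=[x[0] for x in nodes]
    # Take the column names from the first node's attributes;
    # graph.nodes[0] would raise a KeyError unless a node labelled 0 exists
    other_cols = nodes[0][1].keys()
    for key in other_cols:
        data[key] = [x[1][key] for x in nodes]
    return pd.DataFrame(data)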
Answered By: LuisZaman

Here’s a one-liner (graph here is the NetworkX graph itself, i.e. G in the question).

pd.DataFrame.from_dict(dict(graph.nodes(data=True)), orient='index')
Answered By: iamjli

I think this is even simpler:

pandas.DataFrame.from_dict(graph.nodes, orient='index')

Without having to convert to another dict.
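
For reference, here is a self-contained run of the from_dict approach on the question's graph. It is a sketch that uses the explicit dict(...) conversion from the previous answer, since passing graph.nodes directly relies on the node view behaving like a mapping; node_attrs and df_nodes are illustrative names.

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node(1, job='teacher', boss='dee')
G.add_node(2, job='teacher', boss='foo')
G.add_node(3, job='admin', boss='dee')
G.add_node(4, job='admin', boss='lopez')

# dict(...) turns the (node, attrs) pairs into {node: attrs}
node_attrs = dict(G.nodes(data=True))

# orient='index' makes each outer key (the node) a row label
df_nodes = pd.DataFrame.from_dict(node_attrs, orient='index')
print(df_nodes)
```

The node numbers end up in the index, and each attribute gets its own column.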

Answered By: Mitar

I have solved this with a dictionary comprehension.

d = {n:dag.nodes[n] for n in dag.nodes}

df = pd.DataFrame.from_dict(d, orient='index')

The dictionary d maps each node n to dag.nodes[n].
Each value dag.nodes[n] is itself a dictionary containing all of that node's attributes: {attribute_name: attribute_value}

So your dictionary d has the form:

{node_id : {attribute_name : attribute_value} }

The advantage I see is that you do not need to know the names of your attributes.
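
A related point is what happens when nodes do not all carry the same attributes. The sketch below assumes a small hypothetical graph G (standing in for dag) in which one node lacks the boss attribute; df_mixed is an illustrative name.

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node('a', job='teacher', boss='dee')
G.add_node('b', job='admin')  # this node has no 'boss' attribute

# Same nested-dict construction as above: {node_id: {attribute: value}}
d = {n: G.nodes[n] for n in G.nodes}
df_mixed = pd.DataFrame.from_dict(d, orient='index')

# The missing attribute shows up as NaN instead of raising a KeyError
print(df_mixed)
```

This differs from the list-comprehension approaches above, which look up every key on every node and fail if one is missing.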

If you want the node IDs in a column rather than as the index, you can add as a final command:

df.reset_index(drop=False, inplace=True)
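
After reset_index the former index lands in a column that pandas names index by default. A minimal sketch, assuming you would rather call that column node (the name node and the variable df_flat are illustrative choices):

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node(1, job='teacher', boss='dee')
G.add_node(2, job='teacher', boss='foo')

df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index')

# Name the index first, then move it into a regular column
df_flat = df.rename_axis('node').reset_index()
print(df_flat)
```

This returns a new DataFrame instead of modifying df in place.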
Answered By: Aneho