Merge output from cugraph over vertex_id with input data

Question:

If I create a graph with cuGraph and then compute node positions or communities, I get back a DataFrame containing the results and a vertex ID.

So I have three questions:

  1. How is the vertex id created?

  2. Is there a way to merge the output data over the vertex id with the input data?

  3. Is it possible to store the information like in networkx directly in the graph object?

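     import cugraph

     # edges is a cuDF DataFrame with 'source' and 'target' columns
     G = cugraph.from_cudf_edgelist(edges, source='source', destination='target')

     # louvain returns the per-vertex partitions and the modularity score
     communities, modularity = cugraph.louvain(G)

     pos = cugraph.force_atlas2(G, max_iter=10)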

#################################

Answer to 2.

With the help of @Don_A's answer and the comments from @BradRees, I was able to merge the output data with the input data. The first step is to build a unique node list from the edge list; the second is to merge that node list with the output data.

import cudf
import cugraph as cnx

edges = cudf.read_csv('edges.csv')

# Build a unique node list from both endpoints of every edge
nodes_source = edges.loc[:, ['Source', 'retweet_author']].rename(columns={"Source": "node", "retweet_author": "author"})
nodes_target = edges.loc[:, ['Target', 'orginal_author']].rename(columns={"Target": "node", "orginal_author": "author"})
# cudf.concat replaces DataFrame.append, which is deprecated
node_list = cudf.concat([nodes_source, nodes_target]).drop_duplicates('node')

G = cnx.from_cudf_edgelist(edges, source='Source', destination='Target', edge_attr='weight')

communities, modularity_score = cnx.louvain(G)

# Attach each vertex's community to its row in the node list
merged = node_list.merge(communities, left_on="node", right_on="vertex").reset_index()
Asked By: padul

||

Answers:

1: How is the vertex id created?
In your example you have an "edges" dataframe that contains the COO data. That data specifies the vertex IDs. cuGraph uses the IDs that you specify; it does not create new ones.
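To make that concrete, here is a minimal sketch that lists the vertex IDs implied by an edge list. The edge values are made up for illustration, and the pandas fallback is only an assumption so the snippet also runs on a machine without a GPU; with cuDF installed the same calls run on the GPU.

```python
try:
    import cudf as xdf    # GPU DataFrame library used in the question
except ImportError:
    import pandas as xdf  # CPU fallback, assumed only so the sketch runs anywhere


def to_list(series):
    """Return a plain Python list from a cuDF or pandas Series."""
    return series.to_pandas().tolist() if hasattr(series, "to_pandas") else series.tolist()


# Hypothetical edge list in COO form: one row per edge.
edges = xdf.DataFrame({
    "source": [10, 10, 42, 7],
    "target": [42, 7, 7, 99],
})

# The vertex IDs are the distinct values found in the two endpoint columns.
vertex_ids = (
    xdf.concat([edges["source"], edges["target"]])
    .drop_duplicates()
    .sort_values()
    .reset_index(drop=True)
)

vertex_list = to_list(vertex_ids)
print(vertex_list)
```

These are the same values that appear in the `vertex` column of the algorithm output, which is why that column can be used as a join key.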

2: Is there a way to merge the output data over the vertex id with the input data?
Yes. Your input is a dataframe of edge data, while the algorithms return vertex data. You can join the cluster information back onto the edge data twice: first on the src column and then on the dst column. That is all done with cuDF.
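The double join described above could look roughly like this. It is a sketch under assumptions: the `communities` frame is a hand-written stand-in for the Louvain output (columns `vertex` and `partition`), the `source_partition` and `target_partition` column names are invented for the example, and pandas is used as a fallback when cuDF is not installed.

```python
try:
    import cudf as xdf    # GPU DataFrame library used in the question
except ImportError:
    import pandas as xdf  # CPU fallback, assumed only so the sketch runs anywhere


def to_list(series):
    """Return a plain Python list from a cuDF or pandas Series."""
    return series.to_pandas().tolist() if hasattr(series, "to_pandas") else series.tolist()


# Hypothetical edge data.
edges = xdf.DataFrame({
    "Source": [1, 1, 2, 3],
    "Target": [2, 3, 3, 4],
    "weight": [1.0, 2.0, 1.0, 3.0],
})

# Stand-in for the per-vertex result of cugraph.louvain.
communities = xdf.DataFrame({"vertex": [1, 2, 3, 4], "partition": [0, 0, 1, 1]})

# Rename the vertex frame once per endpoint so each join adds its own column.
src_part = communities.rename(columns={"vertex": "Source", "partition": "source_partition"})
dst_part = communities.rename(columns={"vertex": "Target", "partition": "target_partition"})

# Join on the source column, then on the destination column.
# cuDF merges do not guarantee row order, so sort afterwards.
edges_with_parts = (
    edges.merge(src_part, on="Source", how="left")
    .merge(dst_part, on="Target", how="left")
    .sort_values(["Source", "Target"])
    .reset_index(drop=True)
)

print(edges_with_parts)
```

Each edge row now carries the partition of both endpoints, which makes it easy to filter for edges inside one community or edges that cross communities.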

3: Is it possible to store the information like in networkx directly in the graph object?
Yes. You just need to use the new PropertyGraph class. See the example below, taken from a presentation at a recent GTC.

import cudf 
import cugraph 
from cugraph.experimental import PropertyGraph

# Import a built-in dataset
from cugraph.experimental.datasets import karate

# Create a graph using the imported Dataset object
graph = cugraph.Graph(directed=False)
G = karate.get_graph(create_using=graph, fetch=True)

# Read edgelist data into a DataFrame, load into PropertyGraph as edge data.
df = G.edgelist.edgelist_df
pG = PropertyGraph()
pG.add_edge_data(df, vertex_col_names=("src", "dst"))

# Run Louvain to get the partition number for each vertex.
# Set resolution accordingly to identify two primary partitions.
(partition_info, _) = cugraph.louvain(pG.extract_subgraph(create_using=graph), resolution=0.6)

# Add the partition numbers back to the Property Graph as vertex properties.
pG.add_vertex_data(partition_info, vertex_col_name="vertex")

# Use the partition properties to extract a Graph for each partition.
G0 = pG.extract_subgraph(selection=pG.select_vertices("partition == 0"))
G1 = pG.extract_subgraph(selection=pG.select_vertices("partition == 1"))

# Run pagerank on each graph, print results.
pageranks0 = cugraph.pagerank(G0)
pageranks1 = cugraph.pagerank(G1)
print(pageranks0.sort_values(by="pagerank", ascending=False).head(3))
print(pageranks1.sort_values(by="pagerank", ascending=False).head(3))
Answered By: Don A