Creating a directed adjacency matrix from a dataframe with many columns

Question

I want to create a directed adjacency matrix from data like this:

x1	x2	x3	x4	x5	x6	x7	x8
1	1	1	1	1	1	1	2
22	22	22	3	3	3	2	3
3	3	3	5	5	2	3	23

Where the columns represent states in time.

The adjacency matrix should reflect the following logic:

For the column x1:
1 should go to the 3 rows in column x2,

22 should go to the 3 rows in column x2,

3 should go to the 3 rows in column x2

For the column x2: The same pattern going to column x3.
And this for all columns. So it’s like linking each element in a given column to all elements of the following column, and so on.

The output should be an N x N matrix (where N is the number of unique values in the whole data frame), i.e. an adjacency matrix.

This dataframe is just a sample, the one I have to use has hundreds of columns.

For these 8 columns, the output should resemble something like this:

	1	2	3	5	22	23
1	6	1	0	0	0	0
2	0	0	2	0	0	0
3	0	1	4	1	0	1
5	0	1	0	1	0	0
22	0	0	1	0	2	0
23	0	0	0	0	0	0

[Image: a representation of how the graph should look]

I’ve been trying to make it work, but am really lost by now…
TIA

P.S. I’m working with R but Python could also work.

Asked By: James Simon

Answer 1

It seems that you may misunderstand how an adjacency matrix works.

The matrix contains Boolean values (true or false).

The nodes should be indexed 1, 2, 3, 4, …

If there is a link from node 1 to node 2, then the cell in row 2, column 1 will be true.

Let’s index your first two columns like this:

1 4
2 5
3 6

So node 1 is linked to nodes 4, 5, and 6, and the same holds for nodes 2 and 3.

The adjacency matrix then looks like this:

  1 2 3 4 5 6
1 
2 
3
4 1 1 1
5 1 1 1
6 1 1 1
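
As a rough illustration of this indexing scheme, the sketch below builds the Boolean matrix for the first two columns in base R. The names n_rows, src, dst and adj are made up for this example, and it assumes the convention stated above (row = target node, column = source node).

```r
# Number the cells of the first two columns 1..6: x1 holds nodes 1-3, x2 holds nodes 4-6.
n_rows <- 3
src <- seq_len(n_rows)            # nodes in column x1
dst <- n_rows + seq_len(n_rows)   # nodes in column x2

# Start with no links, then mark every x1 -> x2 link.
node_ids <- as.character(seq_len(2 * n_rows))
adj <- matrix(FALSE, nrow = 2 * n_rows, ncol = 2 * n_rows,
              dimnames = list(to = node_ids, from = node_ids))
adj[dst, src] <- TRUE             # row = target, column = source

print(adj * 1L)                   # show as 0/1 for readability
```

Extending this to all eight columns would give a 24 x 24 matrix, one node per cell of the data frame.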

Answered By: ravenspoint

Answer 2

I don’t think an adjacency matrix is quite what you are after. I guess you want a summary of the transition counts. You can try the base R code below (without igraph):

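# Pair each column index with its successor: the rows of embed() are (i + 1, i)
d <- do.call(
  rbind,
  apply(
    embed(seq_along(df), 2),
    1,
    function(k) {
      # rev(k) restores the (i, i + 1) order; expand.grid() forms every from/to pair
      expand.grid(
        setNames(
          df[rev(k)],
          c("from", "to")
        )
      )
    }
  )
)
# Use one shared set of levels so that the resulting table is square
lvls <- sort(unique(unlist(d)))
table(list2DF(lapply(d, factor, levels = lvls)))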

which gives

    to
from 1 2 3 5 22 23
  1  6 3 7 2  2  1
  2  1 2 2 0  0  1
  3  6 3 7 2  2  1
  5  2 1 2 1  0  0
  22 3 0 3 1  2  0
  23 0 0 0 0  0  0

data

> dput(df)
structure(list(x1 = c(1L, 22L, 3L), x2 = c(1L, 22L, 3L), x3 = c(1L, 
22L, 3L), x4 = c(1L, 3L, 5L), x5 = c(1L, 3L, 5L), x6 = c(1L,
3L, 2L), x7 = 1:3, x8 = c(2L, 3L, 23L)), class = "data.frame", row.names = c(NA,
-3L))
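
With hundreds of columns, the same counts can probably be obtained without materialising every pair. The following is only a sketch and not part of the answer above: it assumes the same df, and the names counts and adj are hypothetical. Each column is reduced to a vector of value counts. The number of u -> v links between column i and column i + 1 is the count of u in column i times the count of v in column i + 1, so summing over i amounts to a single matrix product.

```r
df <- data.frame(
  x1 = c(1L, 22L, 3L), x2 = c(1L, 22L, 3L), x3 = c(1L, 22L, 3L),
  x4 = c(1L, 3L, 5L),  x5 = c(1L, 3L, 5L),  x6 = c(1L, 3L, 2L),
  x7 = 1:3,            x8 = c(2L, 3L, 23L)
)

lvls <- sort(unique(unlist(df)))

# counts[v, i]: how many times value lvls[v] occurs in column i
counts <- vapply(
  df,
  function(col) tabulate(match(col, lvls), nbins = length(lvls)),
  integer(length(lvls))
)

# Sum over consecutive column pairs of the outer product of their count vectors
k <- ncol(counts)
adj <- counts[, -k, drop = FALSE] %*% t(counts[, -1, drop = FALSE])
dimnames(adj) <- list(from = as.character(lvls), to = as.character(lvls))

print(adj)
```

For the sample data, adj["1", "3"] is 7, which agrees with the table above.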

Answered By: ThomasIsCoding

Answer 3

You could do:

as.data.frame.matrix(xtabs(~factor(x1, unique(c(x1, values)))+values, cbind(df[1], stack(df[-1]))))
   1 2 3 5 22 23
1  6 1 0 0  0  0
22 0 1 4 0  2  0
3  0 1 3 2  0  1
5  0 0 0 0  0  0
2  0 0 0 0  0  0
23 0 0 0 0  0  0


xtabs(~x1+x, transform(reshape(df, names(df)[-1], dir='long', sep=''), x1 = factor(x1, unique(c(x,x1)))))
    x
x1   1 2 3 5 22 23
  1  6 1 0 0  0  0
  22 0 1 4 0  2  0
  3  0 1 3 2  0  1
  5  0 0 0 0  0  0
  2  0 0 0 0  0  0
  23 0 0 0 0  0  0

library(tidyverse)
df %>%
   mutate(x1 = factor(x1, unique(unlist(.)))) %>%
   pivot_longer(-x1) %>%
   xtabs(~x1+value,.) %>%
   as.data.frame.matrix()

   1 2 3 5 22 23
1  6 1 0 0  0  0
22 0 1 4 0  2  0
3  0 1 3 2  0  1
5  0 0 0 0  0  0
2  0 0 0 0  0  0
23 0 0 0 0  0  0

Answered By: onyambu

Answer 4

Starting with the data frame of @ThomasIsCoding.

  df <- structure(list(x1 = c(1L, 22L, 3L), x2 = c(1L, 22L, 3L), x3 = c(1L, 
  22L, 3L), x4 = c(1L, 3L, 5L), x5 = c(1L, 3L, 5L), x6 = c(1L,
  3L, 2L), x7 = 1:3, x8 = c(2L, 3L, 23L)), class = "data.frame", row.names = c(NA,
  -3L))

The first alternative is to combine all nodes without regard to time (x1, x2, …).

m1 <- formatC(as.matrix(df), width = 2, format = "d", flag = "0")

Output.

      x1   x2   x3   x4   x5   x6   x7   x8  
 [1,] "01" "01" "01" "01" "01" "01" "01" "02"
 [2,] "22" "22" "22" "03" "03" "03" "02" "03"
 [3,] "03" "03" "03" "05" "05" "02" "03" "23"

Alternative (II) takes into account the time of observation.

m2 <- 
  rbind(
  c1=paste(m1[1,], names(df), sep="_"),
  c2=paste(m1[2,], names(df), sep="_"),
  c3=paste(m1[3,], names(df), sep="_")
  )

Output.

  [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]   
c1 "01_x1" "01_x2" "01_x3" "01_x4" "01_x5" "01_x6" "01_x7" "02_x8"
c2 "22_x1" "22_x2" "22_x3" "03_x4" "03_x5" "03_x6" "02_x7" "03_x8"
c3 "03_x1" "03_x2" "03_x3" "05_x4" "05_x5" "02_x6" "03_x7" "23_x8"
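
The three paste() calls hard-code a three-row input. One possible generalisation, sketched here on the assumption that m1 keeps the column names of df (as the m1 output above suggests), labels every cell in one step. The name m2_general is hypothetical, and unlike m2 the result has no row names.

```r
df <- data.frame(
  x1 = c(1L, 22L, 3L), x2 = c(1L, 22L, 3L), x3 = c(1L, 22L, 3L),
  x4 = c(1L, 3L, 5L),  x5 = c(1L, 3L, 5L),  x6 = c(1L, 3L, 2L),
  x7 = 1:3,            x8 = c(2L, 3L, 23L)
)
m1 <- formatC(as.matrix(df), width = 2, format = "d", flag = "0")

# A matrix is stored column by column, so repeating each column name
# nrow(m1) times lines the names up with the cells of m1.
col_labels <- rep(colnames(m1), each = nrow(m1))
m2_general <- matrix(paste(m1, col_labels, sep = "_"), nrow = nrow(m1))

print(m2_general)
```

This works for any number of rows, so the later loop over columns can use it in place of m2 unchanged.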

expand.grid() combines all occurrences at x(i) with those at x(i+1), for i = 1 through 7.
Choose m1 or m2 depending on the scenario at hand.

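library(igraph)

mc <- m1  # or m2
mmm <- c()
for (i in seq(ncol(mc) - 1)) {
  # all pairs between column i and column i + 1
  mmm <- rbind(mmm, expand.grid(x = mc[, i], y = mc[, i + 1]))
}
table(mmm)
# directed = TRUE keeps the from -> to orientation of the edge list
g <- graph_from_data_frame(mmm, directed = TRUE)
plot(g)
g[]  # adjacency matrix as a sparse Matrix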

Output (I), with mc <- m1. It can be checked against table(mmm).

6 x 6 sparse Matrix of class "dgCMatrix"
   01 22 03 05 02 23
01  6  2  7  2  3  1
22  3  2  3  1  .  .
03  6  2  7  2  3  1
05  2  .  2  1  1  .
02  1  .  2  .  2  1
23  .  .  .  .  .  .

Output (II).

24 x 24 sparse Matrix of class "dgCMatrix"
   [[ suppressing 24 column names ‘01_x1’, ‘22_x1’, ‘03_x1’ ... ]]
                                                     
01_x1 . . . 1 1 1 . . . . . . . . . . . . . . . . . .
22_x1 . . . 1 1 1 . . . . . . . . . . . . . . . . . .
03_x1 . . . 1 1 1 . . . . . . . . . . . . . . . . . .
01_x2 . . . . . . 1 1 1 . . . . . . . . . . . . . . .
22_x2 . . . . . . 1 1 1 . . . . . . . . . . . . . . .
03_x2 . . . . . . 1 1 1 . . . . . . . . . . . . . . .
01_x3 . . . . . . . . . 1 1 1 . . . . . . . . . . . .
22_x3 . . . . . . . . . 1 1 1 . . . . . . . . . . . .
03_x3 . . . . . . . . . 1 1 1 . . . . . . . . . . . .
01_x4 . . . . . . . . . . . . 1 1 1 . . . . . . . . .
03_x4 . . . . . . . . . . . . 1 1 1 . . . . . . . . .
05_x4 . . . . . . . . . . . . 1 1 1 . . . . . . . . .
01_x5 . . . . . . . . . . . . . . . 1 1 1 . . . . . .
03_x5 . . . . . . . . . . . . . . . 1 1 1 . . . . . .
05_x5 . . . . . . . . . . . . . . . 1 1 1 . . . . . .
01_x6 . . . . . . . . . . . . . . . . . . 1 1 1 . . .
03_x6 . . . . . . . . . . . . . . . . . . 1 1 1 . . .
02_x6 . . . . . . . . . . . . . . . . . . 1 1 1 . . .
01_x7 . . . . . . . . . . . . . . . . . . . . . 1 1 1
02_x7 . . . . . . . . . . . . . . . . . . . . . 1 1 1
03_x7 . . . . . . . . . . . . . . . . . . . . . 1 1 1
02_x8 . . . . . . . . . . . . . . . . . . . . . . . .
03_x8 . . . . . . . . . . . . . . . . . . . . . . . .
23_x8 . . . . . . . . . . . . . . . . . . . . . . . .

Answered By: clp
