Creating a directed adjacency matrix from a dataframe with many columns
Question:
I want to create a directed adjacency matrix from data like this:
x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
22 | 22 | 22 | 3 | 3 | 3 | 2 | 3 |
3 | 3 | 3 | 5 | 5 | 2 | 3 | 23 |
Where the columns represent states in time.
The adjacency matrix should reflect the following logic:
For the column x1:
1 should go to the 3 rows in column x2,
22 should go to the 3 rows in column x2,
3 should go to the 3 rows in column x2
For the column x2: The same pattern going to column x3.
And this for all columns. So it’s like linking each element in a given column to all elements of the following column, and so on.
The output should be a matrix with N x N rows and columns (where N is the number of unique values in the whole matrix) and… well, an adjacency matrix.
This dataframe is just a sample, the one I have to use has hundreds of columns.
For these 8 columns, the output should resemble something like this:
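 | 1 | 2 | 3 | 5 | 22 | 23 |
---|---|---|---|---|---|---|
1 | 6 | 1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 2 | 0 | 0 | 0 |
3 | 0 | 1 | 4 | 1 | 0 | 1 |
5 | 0 | 1 | 0 | 1 | 0 | 0 |
22 | 0 | 0 | 1 | 0 | 2 | 0 |
23 | 0 | 0 | 0 | 0 | 0 | 0 |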
This is a representation of what the graph should look like. [image]
I’ve been trying to make it work, but am really lost by now…
TIA
P.S. I’m working with R but Python could also work.
Answers:
It seems that you may misunderstand how an adjacency matrix works.
The matrix contains Boolean values (true or false).
The nodes should be indexed 1,2,3,4, …
If there is a link from node 1 to node 2, then the cell in row 2, column 1 will be true.
Let’s index your first two columns like this
1 4
2 5
3 6
So node 1 is linked to nodes 4, 5, and 6 (as are nodes 2 and 3),
and the adjacency matrix looks like this:
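 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | | | | | | |
2 | | | | | | |
3 | | | | | | |
4 | 1 | 1 | 1 | | | |
5 | 1 | 1 | 1 | | | |
6 | 1 | 1 | 1 | | | |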
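The indexing scheme above can be sketched in Python, which the question says would also work. This is only a minimal sketch: it assumes nodes 1 to 3 are the three rows of the first column and nodes 4 to 6 those of the second, and it follows this answer's convention that a link from node i to node j sets the cell in row j, column i. The names `sources`, `targets` and `adj` are illustrative.

```python
# Boolean adjacency matrix for six indexed nodes.
# Convention from the answer: a link i -> j sets the cell in row j, column i.
n_nodes = 6
sources = [1, 2, 3]   # nodes of the first column
targets = [4, 5, 6]   # nodes of the second column

adj = [[False] * n_nodes for _ in range(n_nodes)]
for i in sources:
    for j in targets:
        adj[j - 1][i - 1] = True   # shift to 0-based list indices

# Print one row per node, "1" for a link and "." for no link.
for row_label, row in enumerate(adj, start=1):
    print(row_label, " ".join("1" if cell else "." for cell in row))
```

Rows 4 to 6 each show three links in columns 1 to 3, and every other cell stays empty.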
I don’t think the adjacency matrix is the thing you are after. I guess it should be the summary info of transitions. You can try the base R code below (without igraph)
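# Pair each column with the one that follows it and expand to all from/to combinations
d <- do.call(
  rbind,
  apply(
    embed(seq_along(df), 2),  # rows of (i + 1, i) column indices
    1,
    function(k) {
      expand.grid(
        setNames(
          df[rev(k)],         # columns i and i + 1, in that order
          c("from", "to")
        )
      )
    }
  )
)
# Use one common set of levels so the table is square (N x N)
lvls <- sort(unique(unlist(d)))
table(list2DF(lapply(d, factor, levels = lvls)))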
which gives
to
from 1 2 3 5 22 23
1 6 3 7 2 2 1
2 1 2 2 0 0 1
3 6 3 7 2 2 1
5 2 1 2 1 0 0
22 3 0 3 1 2 0
23 0 0 0 0 0 0
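Since the question says Python could also work, here is a sketch of the same tabulation using only the standard library. It assumes the data is held as a dict of columns in time order; `columns`, `transition_counts`, `levels` and `matrix` are illustrative names, not part of any library.

```python
from collections import Counter
from itertools import product

# Sample data from the question, one list per time step (column).
columns = {
    "x1": [1, 22, 3], "x2": [1, 22, 3], "x3": [1, 22, 3],
    "x4": [1, 3, 5], "x5": [1, 3, 5], "x6": [1, 3, 2],
    "x7": [1, 2, 3], "x8": [2, 3, 23],
}

def transition_counts(cols):
    """Count (from, to) pairs between every pair of consecutive columns."""
    series = list(cols.values())
    counts = Counter()
    for current, following in zip(series, series[1:]):
        # Link every element of one column to every element of the next.
        counts.update(product(current, following))
    return counts

counts = transition_counts(columns)
levels = sorted({v for col in columns.values() for v in col})
# Rows are "from" values, columns are "to" values, both in sorted order.
matrix = [[counts[(a, b)] for b in levels] for a in levels]

print("from/to", *levels)
for label, row in zip(levels, matrix):
    print(label, *row)
```

For the sample data this reproduces the table above, for example the row `1 6 3 7 2 2 1`.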
data
> dput(df)
structure(list(x1 = c(1L, 22L, 3L), x2 = c(1L, 22L, 3L), x3 = c(1L,
22L, 3L), x4 = c(1L, 3L, 5L), x5 = c(1L, 3L, 5L), x6 = c(1L,
3L, 2L), x7 = 1:3, x8 = c(2L, 3L, 23L)), class = "data.frame", row.names = c(NA,
-3L))
You could do:
as.data.frame.matrix(xtabs(~factor(x1, unique(c(x1, values)))+values, cbind(df[1], stack(df[-1]))))
1 2 3 5 22 23
1 6 1 0 0 0 0
22 0 1 4 0 2 0
3 0 1 3 2 0 1
5 0 0 0 0 0 0
2 0 0 0 0 0 0
23 0 0 0 0 0 0
xtabs(~x1+x, transform(reshape(df, names(df)[-1], dir='long', sep=''), x1 = factor(x1, unique(c(x,x1)))))
x
x1 1 2 3 5 22 23
1 6 1 0 0 0 0
22 0 1 4 0 2 0
3 0 1 3 2 0 1
5 0 0 0 0 0 0
2 0 0 0 0 0 0
23 0 0 0 0 0 0
library(tidyverse)
df %>%
  mutate(x1 = factor(x1, unique(unlist(.)))) %>%
  pivot_longer(-x1) %>%
  xtabs(~x1+value, .) %>%
  as.data.frame.matrix()
1 2 3 5 22 23
1 6 1 0 0 0 0
22 0 1 4 0 2 0
3 0 1 3 2 0 1
5 0 0 0 0 0 0
2 0 0 0 0 0 0
23 0 0 0 0 0 0
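Judging by the outputs, these three variants tabulate each x1 value against the values that share its row in x2 to x8, rather than linking each column to the next one. A Python sketch of that row-wise tabulation, assuming the data is held as a list of rows (`rows`, `row_counts` and `to_levels` are illustrative names):

```python
from collections import Counter

# Rows of the sample data frame (x1 first, then x2..x8).
rows = [
    (1, 1, 1, 1, 1, 1, 1, 2),
    (22, 22, 22, 3, 3, 3, 2, 3),
    (3, 3, 3, 5, 5, 2, 3, 23),
]

# For each row, pair its x1 value with every later value in the same row.
row_counts = Counter()
for first, *rest in rows:
    row_counts.update((first, value) for value in rest)

to_levels = sorted({v for row in rows for v in row})
for first in (row[0] for row in rows):
    print(first, [row_counts[(first, v)] for v in to_levels])
```

The three printed rows match the non-zero rows of the tables above (`1`, `22` and `3`). Values that never occur in x1 have all-zero rows.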
Starting with the dataframe of @ThomasisCoding.
df <- structure(list(x1 = c(1L, 22L, 3L), x2 = c(1L, 22L, 3L), x3 = c(1L,
22L, 3L), x4 = c(1L, 3L, 5L), x5 = c(1L, 3L, 5L), x6 = c(1L,
3L, 2L), x7 = 1:3, x8 = c(2L, 3L, 23L)), class = "data.frame", row.names = c(NA,
-3L))
The first alternative is to combine all nodes without regard to time (x1, x2, …).
m1 <- formatC(as.matrix(df), width = 2, format = "d", flag = "0")
Output.
x1 x2 x3 x4 x5 x6 x7 x8
[1,] "01" "01" "01" "01" "01" "01" "01" "02"
[2,] "22" "22" "22" "03" "03" "03" "02" "03"
[3,] "03" "03" "03" "05" "05" "02" "03" "23"
Alternative (II) takes into account the time of observation.
m2 <-
  rbind(
    c1 = paste(m1[1, ], names(df), sep = "_"),
    c2 = paste(m1[2, ], names(df), sep = "_"),
    c3 = paste(m1[3, ], names(df), sep = "_")
  )
Output.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
c1 "01_x1" "01_x2" "01_x3" "01_x4" "01_x5" "01_x6" "01_x7" "02_x8"
c2 "22_x1" "22_x2" "22_x3" "03_x4" "03_x5" "03_x6" "02_x7" "03_x8"
c3 "03_x1" "03_x2" "03_x3" "05_x4" "05_x5" "02_x6" "03_x7" "23_x8"
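The two labelling schemes can be sketched in Python, which the question allows. The sketch assumes the data is held as a dict of columns; `columns`, `plain` and `stamped` are illustrative names.

```python
# Sample data from the question, one list per time step (column).
columns = {
    "x1": [1, 22, 3], "x2": [1, 22, 3], "x3": [1, 22, 3],
    "x4": [1, 3, 5], "x5": [1, 3, 5], "x6": [1, 3, 2],
    "x7": [1, 2, 3], "x8": [2, 3, 23],
}

# Alternative (I): zero-padded value only, so the same value is the
# same node at every time step.
plain = {name: [f"{v:02d}" for v in col] for name, col in columns.items()}

# Alternative (II): value plus column name, so each time step gets
# its own set of nodes.
stamped = {name: [f"{v:02d}_{name}" for v in col] for name, col in columns.items()}

print(plain["x8"])    # ['02', '03', '23']
print(stamped["x8"])  # ['02_x8', '03_x8', '23_x8']
```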
expand.grid() combines every value at x(i) with every value at x(i+1), for i = 1 through 7.
Choose m1 or m2 depending on the scenario at hand.
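library(igraph)

mc <- m1  # use m2 instead for the time-aware version
mmm <- c()
for (i in seq_len(ncol(mc) - 1)) {
  # every value in column i paired with every value in column i + 1
  mmm <- rbind(mmm, expand.grid(x = mc[, i], y = mc[, i + 1]))
}
table(mmm)
# directed = TRUE keeps the x -> y direction shown in the outputs
g <- graph_from_data_frame(mmm, directed = TRUE)
plot(g)
g[]  # adjacency matrix of g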
Output (I). Check this output with table(mmm).
6 x 6 sparse Matrix of class "dgCMatrix"
01 22 03 05 02 23
01 6 2 7 2 3 1
22 3 2 3 1 . .
03 6 2 7 2 3 1
05 2 . 2 1 1 .
02 1 . 2 . 2 1
23 . . . . . .
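The question mentions hundreds of columns, and the real data may also have many rows. Here is a sketch, in Python since the question allows it, that arrives at the same counts as Output (I) without listing every pair. It tallies each column once and multiplies the tallies of neighbouring columns. The names `columns` and `transition_counts_fast` are illustrative.

```python
from collections import Counter

# Sample data from the question, one list per time step (column).
columns = {
    "x1": [1, 22, 3], "x2": [1, 22, 3], "x3": [1, 22, 3],
    "x4": [1, 3, 5], "x5": [1, 3, 5], "x6": [1, 3, 2],
    "x7": [1, 2, 3], "x8": [2, 3, 23],
}

def transition_counts_fast(cols):
    """Count column-to-next-column links without listing every pair."""
    tallies = [Counter(col) for col in cols.values()]
    counts = Counter()
    for current, following in zip(tallies, tallies[1:]):
        for a, n_a in current.items():
            for b, n_b in following.items():
                # n_a copies of a each link to n_b copies of b.
                counts[(a, b)] += n_a * n_b
    return counts

counts = transition_counts_fast(columns)
print(counts[(1, 3)], counts[(22, 1)], counts[(1, 22)])  # 7 3 2
```

The work per pair of columns then depends on the number of distinct values rather than on the number of rows.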
Output (II).
24 x 24 sparse Matrix of class "dgCMatrix"
[[ suppressing 24 column names ‘01_x1’, ‘22_x1’, ‘03_x1’ ... ]]
01_x1 . . . 1 1 1 . . . . . . . . . . . . . . . . . .
22_x1 . . . 1 1 1 . . . . . . . . . . . . . . . . . .
03_x1 . . . 1 1 1 . . . . . . . . . . . . . . . . . .
01_x2 . . . . . . 1 1 1 . . . . . . . . . . . . . . .
22_x2 . . . . . . 1 1 1 . . . . . . . . . . . . . . .
03_x2 . . . . . . 1 1 1 . . . . . . . . . . . . . . .
01_x3 . . . . . . . . . 1 1 1 . . . . . . . . . . . .
22_x3 . . . . . . . . . 1 1 1 . . . . . . . . . . . .
03_x3 . . . . . . . . . 1 1 1 . . . . . . . . . . . .
01_x4 . . . . . . . . . . . . 1 1 1 . . . . . . . . .
03_x4 . . . . . . . . . . . . 1 1 1 . . . . . . . . .
05_x4 . . . . . . . . . . . . 1 1 1 . . . . . . . . .
01_x5 . . . . . . . . . . . . . . . 1 1 1 . . . . . .
03_x5 . . . . . . . . . . . . . . . 1 1 1 . . . . . .
05_x5 . . . . . . . . . . . . . . . 1 1 1 . . . . . .
01_x6 . . . . . . . . . . . . . . . . . . 1 1 1 . . .
03_x6 . . . . . . . . . . . . . . . . . . 1 1 1 . . .
02_x6 . . . . . . . . . . . . . . . . . . 1 1 1 . . .
01_x7 . . . . . . . . . . . . . . . . . . . . . 1 1 1
02_x7 . . . . . . . . . . . . . . . . . . . . . 1 1 1
03_x7 . . . . . . . . . . . . . . . . . . . . . 1 1 1
02_x8 . . . . . . . . . . . . . . . . . . . . . . . .
03_x8 . . . . . . . . . . . . . . . . . . . . . . . .
23_x8 . . . . . . . . . . . . . . . . . . . . . . . .
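The time-aware variant behind Output (II) can be sketched in Python as well, which the question allows. This sketch stores the matrix as plain nested lists, with rows as sources and columns as targets, matching the layout printed above. `stamped`, `edges`, `nodes`, `index` and `adj` are illustrative names.

```python
from itertools import product

# Sample data from the question, one list per time step (column).
columns = {
    "x1": [1, 22, 3], "x2": [1, 22, 3], "x3": [1, 22, 3],
    "x4": [1, 3, 5], "x5": [1, 3, 5], "x6": [1, 3, 2],
    "x7": [1, 2, 3], "x8": [2, 3, 23],
}

# Time-stamped labels as in alternative (II), e.g. "01_x1".
stamped = [[f"{v:02d}_{name}" for v in col] for name, col in columns.items()]

# Directed edges from every node at one time step to every node at the next.
edges = [
    pair
    for current, following in zip(stamped, stamped[1:])
    for pair in product(current, following)
]

nodes = [label for col in stamped for label in col]
index = {label: i for i, label in enumerate(nodes)}
adj = [[0] * len(nodes) for _ in nodes]
for src, dst in edges:
    adj[index[src]][index[dst]] += 1   # row = source, column = target

print(len(nodes), len(edges))  # 24 63
```

Because every label carries its column name, no edge repeats and the nodes of x8 have no outgoing links, as in the last three rows above.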