Convert a numpy float64 sparse matrix to a pandas data frame
Question:
I have an n x n numpy float64 sparse matrix (data, where n = 44), where the rows and columns are graph nodes and the values are edge weights:
>>> data
<44x44 sparse matrix of type '<class 'numpy.float64'>'
with 668 stored elements in Compressed Sparse Row format>
>>> type(data)
<class 'scipy.sparse.csr.csr_matrix'>
>>> print(data)
(0, 7) 0.11793236293516568
(0, 9) 0.10992000939300195
(0, 21) 0.7422196678913772
(0, 23) 0.0630039712667936
(0, 24) 0.027037442463504143
(0, 27) 0.16908845414214152
(0, 28) 0.6109227233402952
(0, 32) 0.0514765253537568
(0, 33) 0.016341754080557713
(1, 6) 0.015070325434709386
(1, 10) 9.346673769086203e-05
(1, 11) 0.2471018034781923
(1, 14) 0.0020684269551621776
(1, 18) 0.015258704502643251
(1, 20) 0.021798149289490358
(1, 22) 0.0087026831764125
(1, 24) 0.1454235884185166
(1, 25) 0.022060777594183015
(1, 29) 0.9117391202819067
(1, 30) 0.018557883854566116
(1, 31) 0.001876070225734826
(1, 32) 0.025841354399637764
(1, 33) 0.014766488228364438
(1, 39) 0.002791226433410351
(1, 43) 1.0
: :
(41, 7) 0.8922099840113696
(41, 10) 0.015776226631920767
(41, 12) 1.0
(41, 15) 0.1839408706622038
(41, 18) 0.5151025641025642
(41, 20) 0.4599130036630037
(41, 22) 0.29378473237788827
(41, 33) 0.47474890700697153
(41, 39) 1.0
(42, 2) 1.0
(42, 10) 0.023305789342610222
(42, 11) 0.011349136164776494
(42, 12) 1.0
(42, 17) 0.886081346522542
(42, 18) 1.0
(42, 30) 1.0
(42, 40) 1.0
(43, 1) 1.0
(43, 6) 1.0
(43, 11) 0.039948959300013256
(43, 13) 1.0
(43, 14) 0.02669811947637717
(43, 29) 1.0
(43, 30) 1.0
(43, 36) 0.3381986531986532
I’d like to convert it to a pandas data frame with the columns node1, node2, edge_weight, in order to write it to a file, which would give:
node1, node2, edge_weight
0, 7, 0.11793236293516568
0, 9, 0.10992000939300195
:, :, :
43, 36, 0.3381986531986532
Any idea how to do that?
Note that:
>>> pandas.DataFrame(data)
gives:
0
0 (0, 7)\t0.11793236293516568\n (0, 9)\t0.109...
1 (0, 6)\t0.015070325434709386\n (0, 10)\t9.3...
And
>>> pandas.DataFrame(print(data))
Gives:
(0, 7) 0.11793236293516568
(0, 9) 0.10992000939300195
So I guess pandas.DataFrame(print(data)) is close to what I’m looking for.
Answers:
This ipython session shows one way you could do it. The two steps are: convert the sparse matrix to COO format, and then create the Pandas DataFrame using the .row, .col and .data attributes of the COO matrix.
In [50]: data
Out[50]:
<15x15 sparse matrix of type '<class 'numpy.float64'>'
with 11 stored elements in Compressed Sparse Row format>
In [51]: print(data)
(1, 12) 0.8581958095588134
(6, 12) 0.03828052946099181
(6, 14) 0.7908634838351427
(7, 1) 0.7995008873930302
(7, 11) 0.48477191537121145
(7, 13) 0.6226526443518743
(9, 4) 0.37242576669669103
(11, 1) 0.9604278557580955
(11, 5) 0.13285436036287313
(12, 11) 0.5631419223609928
(13, 8) 0.16481624650723847
In [52]: import pandas as pd
In [53]: c = data.tocoo()
In [54]: df = pd.DataFrame({'node1': c.row, 'node2': c.col, 'edge_weight': c.data})
In [55]: df
Out[55]:
node1 node2 edge_weight
0 1 12 0.858196
1 6 12 0.038281
2 6 14 0.790863
3 7 1 0.799501
4 7 11 0.484772
5 7 13 0.622653
6 9 4 0.372426
7 11 1 0.960428
8 11 5 0.132854
9 12 11 0.563142
10 13 8 0.164816
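As a self-contained version of the session above, here is a minimal sketch that also covers the "write it to a file" part of the question. The helper name sparse_to_edge_frame and the small 4x4 example matrix are made up for illustration; the CSV is written to an in-memory buffer, and a file path could be passed to to_csv instead.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse


def sparse_to_edge_frame(matrix):
    """Return one row per stored element: node1, node2, edge_weight."""
    coo = matrix.tocoo()  # COO exposes parallel .row, .col and .data arrays
    return pd.DataFrame(
        {"node1": coo.row, "node2": coo.col, "edge_weight": coo.data}
    )


# Hypothetical 4x4 weighted graph, stored in CSR format like the question's data
dense = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.25, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.75, 0.0, 0.0, 0.0],
])
data = sparse.csr_matrix(dense)

df = sparse_to_edge_frame(data)

# index=False keeps only the three requested columns in the output
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Only the stored elements end up in the frame, so the output has one line per edge rather than n x n entries.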
Can you try toarray? It converts the sparse matrix to a dense array that pandas can read directly:
pd.DataFrame(A.toarray())
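For comparison, a rough sketch of what the toarray route produces. It yields a dense n x n frame rather than the three-column edge list, so an extra step is needed to get node1, node2, edge_weight. The 2x2 matrix A below is a made-up example, and the nonzero filter is an assumption about what counts as an edge: any explicitly stored zero weights would be dropped on this route.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical 2x2 sparse matrix standing in for the real data
A = sparse.csr_matrix(np.array([[0.0, 0.5], [1.0, 0.0]]))

dense = A.toarray()
wide = pd.DataFrame(dense)  # dense n x n frame: rows and columns are nodes

# Recover an edge list from the dense array by keeping nonzero entries only
rows, cols = np.nonzero(dense)
edges = pd.DataFrame(
    {"node1": rows, "node2": cols, "edge_weight": dense[rows, cols]}
)
```

This is fine for a 44 x 44 matrix, but the dense array grows with n squared, so the COO approach is likely the better fit for large graphs.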
I ran into a similar problem when using OneHotEncoder. I fixed it by changing sparse to False (in scikit-learn 1.2 and later this parameter is named sparse_output):
enc = OneHotEncoder(sparse=False)