How can I transform a 3-column pandas DataFrame into a matrix in Python?
Question:
I have three columns in my DataFrame: X, Y and Z, all numeric. I want to arrange the Z values into a matrix indexed by X and Y. X and Y contain duplicate entries, so a pivot table doesn't work.
My code (where n is the number of rows):
```python
mat = numpy.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        if Y[j] == Y[i]:
            mat[i, j] = Z[j]
        if X[j] == X[i]:
            mat[i, j] = Z[i]
```
which yields:

```
[[6 10 0 0]
 [6 10 10 0]
 [0 10 10 0]
 [0 0 0 6]]
```
The data looks like this:

```python
X = array([100, 10, 10, 50])
Y = array([20, 20, 40, 60])
Z = array([6, 10, 10, 6])
```
So the correct matrix should be:

```
[[6 10 10 0]
 [6 10 10 0]
 [0 10 10 0]
 [0 0 0 6]]
```
which comes from this layout, with the X values along the columns and the Y values along the rows:

```
     | 100   10   10   50
-----+--------------------
  20 |   6   10   10    0
  20 |   6   10   10    0
  40 |   0   10   10    0
  60 |   0    0    0    6
```
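Read cell by cell, the table above appears to follow this rule (my reading of the example, not something the question states explicitly), with i running over the X values (columns) and j over the Y values (rows). In the sample data each (X, Y) pair occurs at most once, so the row k is unique:

```latex
M_{ji} =
\begin{cases}
  Z_k & \text{if some row } k \text{ has } X_k = X_i \text{ and } Y_k = Y_j, \\
  0   & \text{otherwise.}
\end{cases}
```

The original loop instead compares rows i and j with each other, so it fills a cell whenever those two rows share an X or a Y, which is why entry (0, 2) comes out as 0 rather than 10.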
Answers:
I currently don’t see how to do this faster than with two for-loops. This should work:
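```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'X': np.array([100, 10, 10, 50]),
    'Y': np.array([20, 20, 40, 60]),
    'Z': np.array([6, 10, 10, 6])
})

# Map each (x, y) pair to its z value; a repeated pair keeps the z of its last row
mapping = {(x, y): z for (x, y, z) in data[["X", "Y", "Z"]].values}

n = len(data)
mat = np.zeros((n, n))
# Columns follow the X values, rows follow the Y values; pairs that never occur stay 0
for i, x in enumerate(data["X"]):
    for j, y in enumerate(data["Y"]):
        mat[j, i] = mapping.get((x, y), 0)
print(mat)
```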
Output:
```
[[ 6. 10. 10.  0.]
 [ 6. 10. 10.  0.]
 [ 0. 10. 10.  0.]
 [ 0.  0.  0.  6.]]
```
I create a mapping that assigns each (x, y) pair its z value, (x, y) ⟶ z. With this in place, filling the result matrix mat is straightforward: cell (j, i) gets the z for the pair (X[i], Y[j]), or 0 if that pair does not occur in the data.
Note, however, that if multiple rows share the same values for both x and y, the z value of the last such row is used.
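If the double loop gets slow on larger frames, one vectorized alternative (a different technique from the dictionary lookup above, offered as a sketch that I have only reasoned through against the sample data) is to pivot on the unique (Y, X) pairs and then reindex with the full X and Y columns. On its own, `pivot` collapses repeated X and Y values into single labels (a 3 × 3 table here), which is presumably why a pivot table seemed unsuitable; `reindex` accepts repeated target labels as long as the pivoted axes are unique, so it restores the repeated rows and columns. Dropping duplicate pairs with `keep="last"` beforehand is my own choice, made to mirror the dictionary's last-row-wins behavior.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "X": [100, 10, 10, 50],
    "Y": [20, 20, 40, 60],
    "Z": [6, 10, 10, 6],
})

# Keep one Z per (X, Y) pair; keep="last" mirrors the dict, where later rows overwrite earlier ones
unique_pairs = data.drop_duplicates(subset=["X", "Y"], keep="last")

# Rows are the unique Y values, columns the unique X values; pairs that never occur become NaN
table = unique_pairs.pivot(index="Y", columns="X", values="Z")

# Expand back to one row per original Y and one column per original X (repeats allowed),
# then turn the missing pairs into 0
mat = table.reindex(index=data["Y"], columns=data["X"]).fillna(0).to_numpy()

print(mat)
```

For the sample data this gives the same 4 × 4 matrix as the loop version. The work happens inside pandas rather than in a Python-level double loop, at the cost of building an intermediate table with one row per unique Y and one column per unique X.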