How to convert a DataFrame to a tensor

Question:

I have a dataframe like this:

         ids  dim
0         1    2
1         1    0
2         1    1
3         2    1
4         2    2
5         3    0
6         3    2
7         4    1
8         4    2
9         Nan  0
10        Nan  1
11        Nan  0

I want to build a TensorFlow tensor out of it so that the result looks like the one below.
The columns correspond to the dim column in df: since it has three distinct values (0, 1, 2), the tensor has three columns.

The values of the tensor are the associated ids from df.

1   1   1
Nan 2   2
3   Nan 3
Nan 4   4

What I did:

I tried to convert the df to a NumPy array and then to a tensor; however, the result does not look like what I want:

tf.constant(df[['ids', 'dim']].values, dtype=tf.int32)
Asked By: sariii
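
To illustrate why the direct conversion in the question does not give the desired layout, here is a minimal sketch using only pandas and NumPy (the TensorFlow call is left out). It assumes the same data as in the question; the variable name `raw` is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 1, 1, 2, 2, 3, 3, 4, 4, np.nan, np.nan, np.nan],
    "dim": [2, 0, 1, 1, 2, 0, 2, 1, 2, 0, 1, 0],
})

# Selecting the two columns copies them side by side without any regrouping.
raw = df[["ids", "dim"]].to_numpy()

print(raw.shape)  # one row per DataFrame row, one column per selected column
print(raw.dtype)  # a float dtype, because the NaN entries force ids to float
```

The array keeps one row per original record, so the ids still need to be regrouped by dim before they can form the 4 x 3 layout.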

Answers:

Check my code:

import numpy as np
import pandas as pd
import tensorflow as tf


df = pd.DataFrame([[1, 2],
                   [1, 0],
                   [1, 1],
                   [2, 1],
                   [2, 2],
                   [3, 0],
                   [3, 2],
                   [4, 1],
                   [4, 2],
                   [np.nan, 0],
                   [np.nan, 1],
                   [np.nan, 0]], columns=['ids', 'dim'])
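# Sort row positions by dim so that rows sharing a dim value become contiguous.
dim_array = np.array(df['dim'])
sort = dim_array.argsort(kind='stable')  # stable sort keeps the original order within each dim
# Assumes every dim value occurs exactly 4 times: 3 groups of 4 ids, transposed to shape (4, 3).
final = np.array([df.ids[sort]]).reshape((3, 4)).T
final_result = tf.constant(final, dtype=tf.int32)  # use tf.float32 to retain NaN in the tensor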
print(final_result)

# <tf.Tensor: shape=(4, 3), dtype=int32, numpy=
# array([[          1,           1,           1],
#        [          3,           2,           2],
#        [-2147483648,           4,           3],
#        [-2147483648, -2147483648,           4]], 
# dtype=int32)>

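An int32 tensor cannot represent NaN, so the NaN entries come out as a meaningless integer (-2147483648 in the output above). Use tf.float32 to keep them as NaN.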

Answered By: Davinder Singh
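
The dtype point above can be sketched with NumPy alone. This is only an illustration under stated assumptions: the array `values` is copied from the output shown in that answer, and the sentinel `-1` is an arbitrary choice that does not come from the original post.

```python
import numpy as np

# Same layout as the output of the answer above: one column per dim value.
values = np.array([[1.0, 1.0, 1.0],
                   [3.0, 2.0, 2.0],
                   [np.nan, 4.0, 3.0],
                   [np.nan, np.nan, 4.0]])

# A float dtype can store NaN, so the missing entries survive the cast.
as_float = values.astype(np.float32)
nan_mask = np.isnan(as_float)

# Integer dtypes have no NaN, so replace the missing entries explicitly
# before casting. SENTINEL is a hypothetical placeholder value.
SENTINEL = -1
as_int = np.where(nan_mask, SENTINEL, values).astype(np.int32)

print(as_float)
print(as_int)
```

Either array could then be passed to tf.constant, with dtype=tf.float32 for `as_float` or dtype=tf.int32 for `as_int`.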

You can use pivot_table() for a more concise computation:

import numpy as np
import pandas as pd
import tensorflow as tf

df = pd.DataFrame([[1, 2],
                   [1, 0],
                   [1, 1],
                   [2, 1],
                   [2, 2],
                   [3, 0],
                   [3, 2],
                   [4, 1],
                   [4, 2],
                   [np.nan, 0],
                   [np.nan, 1],
                   [np.nan, 0]], columns=['ids', 'dim'])

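# Mark each (ids, dim) pair that occurs; rows with NaN ids are dropped by the pivot.
df['val'] = 1
df = df.pivot_table(index='ids', columns='dim', values='val')
# Multiply each row by its id, so present cells hold the id and missing ones stay NaN.
df = df.multiply(np.array(df.index), axis=0)

tensor = tf.constant(df)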
Answered By: thushv89
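
As a rough check of the pivot approach above, the sketch below rebuilds the grid with pandas and NumPy only and stops just before the TensorFlow call. The names `present`, `grid` and `result` are illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 1, 1, 2, 2, 3, 3, 4, 4, np.nan, np.nan, np.nan],
    "dim": [2, 0, 1, 1, 2, 0, 2, 1, 2, 0, 1, 0],
})

# Mark every (ids, dim) pair that occurs. Rows whose ids is NaN are dropped,
# because NaN is not kept as a group key by the pivot.
present = df.assign(val=1).pivot_table(index="ids", columns="dim", values="val")

# Scale each row by its own id: present cells hold the id, absent cells stay NaN.
grid = present.multiply(present.index.to_numpy(), axis=0)

# A float dtype is needed here so that the NaN cells can be represented.
result = grid.to_numpy(dtype=np.float32)
print(result)
```

The resulting array has one row per id and one column per dim value, which is the layout requested in the question. It could then be wrapped with tf.constant(result).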

Try PyTorch:

import torch

item = torch.tensor(df.values)
Answered By: Golden Lion