Converting columns of object type to a PyTorch tensor

Question:

I am new to machine learning and Python.
I am working on data that has 2 columns of object type and a large number of columns of float type.
To convert the float-type columns to a tensor, the code below works fine:

    cont_cols = ['row_id', 'player1', 'player2', 'playervar_0', 'player_1']
    conts = np.stack([train_df[col].values for col in cont_cols], 1)
    conts = torch.tensor(conts, dtype=torch.float)

But when I try the same with the object-type columns:

    obj_cols = ['winner','team'] 
    objs = np.stack([train_df[col].values for col in obj_cols],1)
    objs = torch.tensor(objs, dtype= torch.float)

I am getting the error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Cell In[60], line 2
          1 objs = np.stack([train_df[col].values for col in obj_cols],1)
    ----> 2 objs = torch.tensor(objs, dtype= torch.float)

    TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

It would be a great help if someone could guide me on this.

Edit
To make the question clearer: column ‘winner’ contains values such as winner, loser, draw, loser, winner, winner, draw, …
The column ‘team’ contains team1, team2, team1, team1, team2, …

Edit2

I tried the approach below. Is it fine, or is there a better approach?

    train_df['winner'] = train_df['winner'].map({'loser': 0, 'winner': 1, 'draw': 2})
    train_df['team'] = train_df['team'].map({'team1': 0, 'team2': 1})
    obj_cols = ['winner', 'team']
    objs = np.stack([train_df[col].values.tolist() for col in obj_cols], 1)
    objs = torch.tensor(objs, dtype=torch.float)
    objs[:5]
    # Output:
    # tensor([[1., 0.],
    #         [0., 1.],
    #         [0., 0.],
    #         [0., 1.],
    #         [2., 0.]])
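One caveat with the `map` approach: `Series.map` with a dict returns NaN for any value that is not a key, so an unexpected label would silently end up as NaN in the tensor. A minimal sketch that checks for this and converts the mapped columns in one go with `DataFrame.to_numpy`, which makes the `np.stack`/`.tolist()` step unnecessary. The toy data and the names `winner_map`, `team_map` and `encoded` are illustrative assumptions; writing the codes to a new frame (rather than overwriting `train_df`) is just a choice that keeps the original labels around.

```python
import pandas as pd
import torch

# Made-up stand-in for train_df with the two string columns
train_df = pd.DataFrame({
    "winner": ["winner", "loser", "loser", "loser", "draw"],
    "team": ["team1", "team2", "team1", "team2", "team1"],
})

# Hypothetical mapping dicts, same encoding as in Edit2
winner_map = {"loser": 0, "winner": 1, "draw": 2}
team_map = {"team1": 0, "team2": 1}

encoded = pd.DataFrame({
    "winner": train_df["winner"].map(winner_map),
    "team": train_df["team"].map(team_map),
})

# map() yields NaN for labels missing from the dict; fail loudly instead
unmapped = encoded.isna().any()
if unmapped.any():
    raise ValueError(f"unmapped labels in: {list(unmapped[unmapped].index)}")

# Both columns are now integer-typed, so to_numpy() gives a numeric array
objs = torch.tensor(encoded.to_numpy(), dtype=torch.float)
print(objs)
```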
Asked By: Deepika


Answers:

Try this

    obj_cols = ['winner', 'team']
    objs = np.stack([train_df[col].values.tolist() for col in obj_cols], 1)
    objs = torch.tensor(objs, dtype=torch.float)
Answered By: Awal
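A note on the answer above: `.tolist()` only helps when the object column already holds numbers (for example after the `map` step in Edit2), because NumPy then re-infers a numeric dtype from the plain Python values. On raw string labels it just produces a NumPy string array, which `torch.tensor` cannot convert either. A small sketch with made-up data illustrating both cases:

```python
import numpy as np
import pandas as pd
import torch

# Numbers stored in object-dtype columns (made-up values)
numeric_obj = pd.DataFrame({"winner": [1, 0, 2], "team": [0, 1, 0]}, dtype=object)
# Raw string labels in object-dtype columns (made-up values)
strings = pd.DataFrame({"winner": ["winner", "loser", "draw"],
                        "team": ["team1", "team2", "team1"]}, dtype=object)
cols = ["winner", "team"]

# .values keeps the object dtype, which torch.tensor rejects
kept_object = np.stack([numeric_obj[c].values for c in cols], 1)
print(kept_object.dtype)  # object

# .tolist() gives plain Python ints, so NumPy infers an integer dtype
ints = np.stack([numeric_obj[c].values.tolist() for c in cols], 1)
objs = torch.tensor(ints, dtype=torch.float)

# With strings, .tolist() only yields a NumPy unicode array (dtype kind 'U')
str_arr = np.stack([strings[c].values.tolist() for c in cols], 1)
print(str_arr.dtype.kind)  # U
```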

Regarding your edit: "column ‘winner’ contains winner, loser, draw, loser, winner, winner, draw, … The column team contains team1, team2, team1, team1, team2, …"

Your problem is that your data are strings (stored with the "object" dtype), which cannot be converted to a tensor directly.
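To see where the error comes from: stacking object columns produces a NumPy array whose dtype is `object`, and that dtype is not among the types listed in the error message. A minimal sketch reproducing this with a made-up stand-in for `train_df` (the values are assumptions shaped like the ones in the question):

```python
import numpy as np
import pandas as pd
import torch

# Made-up stand-in for train_df; dtype=object mirrors the question's columns
train_df = pd.DataFrame({
    "winner": ["winner", "loser", "draw", "loser"],
    "team": ["team1", "team2", "team1", "team1"],
}, dtype=object)
obj_cols = ["winner", "team"]

objs = np.stack([train_df[col].values for col in obj_cols], 1)
print(objs.dtype)  # object

# torch.tensor refuses object arrays, exactly as in the traceback
raised = False
try:
    torch.tensor(objs, dtype=torch.float)
except TypeError as err:
    raised = True
    print("TypeError:", err)
```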

You have to convert each unique string value into a number somehow. What you did in "Edit2" is on the right track 🙂

If you want to preserve the labels of your columns, you could convert each column to pandas.Categorical and then use the .cat.codes attribute to get the integers for the tensor. For example, given this data:

    winner  team
    loser   team1
    winner  team2
    draw    team1
    winner  team1

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "winner": ["loser", "winner", "draw", "winner"],
        "team": ["team1", "team2", "team1", "team1"]
    })
    # you can control the order here, i.e. winner -> 0, loser -> 1, etc.
    df["winner"] = pd.Categorical(df["winner"], ["winner", "loser", "draw"])
    df["team"] = pd.Categorical(df["team"], ["team1", "team2"])

    objs = np.stack([df[col].cat.codes for col in ["winner", "team"]], 1)

    # Output of objs:
    # array([[1, 0],
    #        [0, 1],
    #        [2, 0],
    #        [0, 0]], dtype=int8)
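The Categorical example stops at the NumPy array. A sketch of the remaining step, rebuilding the same toy `df`: the codes go into `torch.tensor`, and the column's `categories` map codes back to labels. Using `torch.long` is an assumption that suits codes used as class indices; `torch.float`, as in the question, works the same way. Values not listed in the categories get code -1, so the sketch checks for that.

```python
import numpy as np
import pandas as pd
import torch

df = pd.DataFrame({
    "winner": ["loser", "winner", "draw", "winner"],
    "team": ["team1", "team2", "team1", "team1"],
})
df["winner"] = pd.Categorical(df["winner"], ["winner", "loser", "draw"])
df["team"] = pd.Categorical(df["team"], ["team1", "team2"])

codes = np.stack([df[col].cat.codes for col in ["winner", "team"]], 1)
# Labels missing from the category list would show up as -1
if (codes < 0).any():
    raise ValueError("found a label that is not in the categories")

# int8 codes -> int64 tensor
objs = torch.tensor(codes, dtype=torch.long)

# The categories decode codes back to labels, here for the first column
labels = df["winner"].cat.categories[objs[:, 0].tolist()]
print(list(labels))  # ['loser', 'winner', 'draw', 'winner']
```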

Or you can simply use pandas.factorize() to get an integer representation of your labels:

    # factorize() returns a tuple (codes, uniques); [0] takes the integer codes
    objs = np.stack([train_df[col].factorize()[0] for col in obj_cols], 1)
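`Series.factorize()` returns a tuple `(codes, uniques)`, so the integer codes are its first element. Unlike the Categorical approach, the codes follow order of first appearance (unless `sort=True` is passed), so it is worth keeping `uniques` if you need to decode later. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd
import torch

# Made-up stand-in for train_df
train_df = pd.DataFrame({
    "winner": ["loser", "winner", "draw", "winner"],
    "team": ["team1", "team2", "team1", "team1"],
})
obj_cols = ["winner", "team"]

codes_list, uniques = [], {}
for col in obj_cols:
    # codes are numbered by first appearance: loser -> 0, winner -> 1, draw -> 2
    codes, uniques[col] = train_df[col].factorize()
    codes_list.append(codes)

objs = torch.tensor(np.stack(codes_list, 1), dtype=torch.float)
print(objs)
print(list(uniques["winner"]))  # ['loser', 'winner', 'draw']
```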
Answered By: Rafael-WO