Using PyTorch tensors with scikit-learn

Question:

Can I use PyTorch tensors instead of NumPy arrays while working with scikit-learn?

I tried some functions from scikit-learn, like train_test_split and StandardScaler, and they seem to work just fine, but is there anything I should know when I'm using PyTorch tensors instead of NumPy arrays?

According to the scikit-learn FAQ at https://scikit-learn.org/stable/faq.html#how-can-i-load-my-own-datasets-into-a-format-usable-by-scikit-learn :

numpy arrays or scipy sparse matrices. Other types that are convertible to numeric arrays such as pandas DataFrame are also acceptable.

Does that mean using PyTorch tensors is completely safe?

Asked By: SMMousaviSP


Answers:

I don't think PyTorch tensors are directly supported by scikit-learn, but you can always get the underlying NumPy array from a PyTorch tensor:

my_nparray = my_tensor.numpy()

and then use it with scikit-learn functions.
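As a minimal sketch of that workflow (the data here is made up for illustration), note that .numpy() only works on CPU tensors that do not require gradients; otherwise use .detach().cpu().numpy():

```python
import torch
from sklearn.preprocessing import StandardScaler

X_tensor = torch.randn(10, 3)  # hypothetical feature matrix

# .numpy() shares memory with the tensor; the tensor must live on the CPU
# and must not require gradients (use .detach().cpu().numpy() otherwise).
X_np = X_tensor.numpy()

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_np)  # returns a NumPy array, not a tensor

# Converting the result back to a tensor afterwards:
X_back = torch.from_numpy(X_scaled)
```

The round trip is cheap because torch.from_numpy and .numpy() share the underlying buffer rather than copying it.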

Answered By: ayandas

skorch could be an option to consider.
It aims to "make it possible to use PyTorch with sklearn".

It lets you use PyTorch tensors directly with scikit-learn-style estimators.

Out of the box, skorch works with many types of data, be it PyTorch Tensors, NumPy arrays, Python dicts, and so on.

Beyond tensors, it also lets you use other PyTorch features, such as the torch.nn module for neural networks and PyTorch DataLoaders, through the familiar sklearn interface.

Answered By: Noob ML Dude