Using PyTorch tensors with scikit-learn
Question:
Can I use PyTorch tensors instead of NumPy arrays while working with scikit-learn?
I tried some methods from scikit-learn like train_test_split and StandardScaler, and it seems to work just fine, but is there anything I should know when I’m using PyTorch tensors instead of NumPy arrays?
According to this question on https://scikit-learn.org/stable/faq.html#how-can-i-load-my-own-datasets-into-a-format-usable-by-scikit-learn :
numpy arrays or scipy sparse matrices. Other types that are convertible to numeric arrays such as pandas DataFrame are also acceptable.
Does that mean using PyTorch tensors is completely safe?
Answers:
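I don’t think PyTorch tensors are directly supported by scikit-learn. But you can always get the underlying NumPy array from a CPU tensor
my_nparray = my_tensor.numpy()
and then use it with scikit-learn functions. If the tensor lives on a GPU or requires grad, .numpy() raises an error, so call my_tensor.detach().cpu().numpy() instead.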
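To make the conversion concrete, here is a minimal sketch of the round trip: tensors to NumPy, through train_test_split and StandardScaler, and back to a tensor. The data is random and the variable names (X_tensor, y_tensor, and so on) are made up for illustration. Note that for a CPU tensor the array returned by .numpy() shares memory with the tensor, so in-place changes to one show up in the other.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical example data: 100 samples, 4 features, binary labels.
torch.manual_seed(0)
X_tensor = torch.randn(100, 4)
y_tensor = torch.randint(0, 2, (100,))

# detach() drops autograd tracking, cpu() moves off the GPU if needed,
# numpy() then returns a NumPy view of the tensor's data.
X = X_tensor.detach().cpu().numpy()
y = y_tensor.detach().cpu().numpy()

# Plain scikit-learn from here on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# Convert back to a float32 tensor for use in PyTorch.
X_train_back = torch.from_numpy(X_train_scaled).float()
```

Converting explicitly like this keeps the dtype and device under your control instead of relying on whatever implicit conversion scikit-learn happens to attempt.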
skorch could be an option to consider.
It aims to "make it possible to use PyTorch with sklearn".
It allows you to use PyTorch tensors with scikit-learn.
Out of the box, skorch works with many types of data, be it PyTorch Tensors, NumPy arrays, Python dicts, and so on.
In addition to tensors, it also lets you use other PyTorch features, such as torch.nn modules for neural networks and PyTorch DataLoaders, through the familiar sklearn interface.