# How to load a list of numpy arrays to pytorch dataset loader?

## Question:

I have a huge list of numpy arrays, where each array represents an image, and I want to load it using a `torch.utils.data.DataLoader` object. But the documentation of `torch.utils.data.DataLoader` mentions that it loads data directly from a folder. How do I adapt it to my use case? I am new to PyTorch and any help would be greatly appreciated.

My numpy array for a single image looks something like this. The image is an RGB image.

```
[[[ 70 82 94]
[ 67 81 93]
[ 66 82 94]
...,
[182 182 188]
[183 183 189]
[188 186 192]]
[[ 66 80 92]
[ 62 78 91]
[ 64 79 95]
...,
[176 176 182]
[178 178 184]
[180 180 186]]
[[ 62 82 93]
[ 62 81 96]
[ 65 80 99]
...,
[169 172 177]
[173 173 179]
[172 172 178]]
...,
```

## Answers:

I think what `DataLoader` actually requires is an input that subclasses `Dataset`. You can either write your own dataset class that subclasses `Dataset` or use `TensorDataset`, as I have done below:

```
import torch
import numpy as np
from torch.utils.data import TensorDataset, DataLoader

my_x = [np.array([[1.0, 2], [3, 4]]), np.array([[5., 6], [7, 8]])]  # a list of numpy arrays
my_y = [np.array([4.]), np.array([2.])]  # another list of numpy arrays (targets)

# Stack each list into one array first; building a tensor straight from a list of arrays is slow
tensor_x = torch.Tensor(np.array(my_x))  # transform to torch tensor (float32)
tensor_y = torch.Tensor(np.array(my_y))

my_dataset = TensorDataset(tensor_x, tensor_y)  # create your dataset
my_dataloader = DataLoader(my_dataset)  # create your dataloader
```

Works for me.
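
For image data like the arrays in the question, the same approach might look like the sketch below. It assumes every image has the same height x width x channel shape and `uint8` dtype; the random images, the labels, and the 32x32 size are made-up stand-ins, not data from the question.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in data: 8 RGB images, 32x32, laid out as H x W x C, uint8
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(8)]
labels = [i % 2 for i in range(8)]

# Stack into one (N, H, W, C) array, then move channels first: (N, C, H, W)
x = torch.from_numpy(np.stack(images)).permute(0, 3, 1, 2).float()
y = torch.tensor(labels, dtype=torch.long)

loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)
for xb, yb in loader:
    print(xb.shape, yb.shape)  # torch.Size([4, 3, 32, 32]) torch.Size([4])
```

Note that stacking copies the whole list into one array, so this only fits when the full dataset fits in memory.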

PyTorch's `DataLoader` needs a `Dataset`, as you can check in the docs. The right way to do that is to use:

```
torch.utils.data.TensorDataset(*tensors)
```

This is a `Dataset` for wrapping tensors, where each sample is retrieved by indexing the tensors along the first dimension.

The parameter `*tensors` means one or more tensors that all have the same size in the first dimension.

The other class, `torch.utils.data.Dataset`, is an abstract class.
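
To make the "indexing along the first dimension" part concrete, here is a small sketch. The tensors `features` and `targets` are made-up examples.

```python
import torch
from torch.utils.data import TensorDataset

features = torch.arange(12, dtype=torch.float32).reshape(4, 3)  # 4 samples, 3 features each
targets = torch.tensor([0, 1, 0, 1])                            # 4 labels

ds = TensorDataset(features, targets)
print(len(ds))  # 4, the size of the first dimension

# Indexing returns a tuple with one slice per wrapped tensor
x0, y0 = ds[0]
print(x0)  # tensor([0., 1., 2.])
print(y0)  # tensor(0)
```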

Here is how to convert numpy arrays to tensors:

```
import torch
import numpy as np
n = np.arange(10)
print(n) #[0 1 2 3 4 5 6 7 8 9]
t1 = torch.Tensor(n) # as torch.float32
print(t1) #tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
t2 = torch.from_numpy(n) # keeps the NumPy integer dtype (int32 or int64, depending on platform)
print(t2) #tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), shown with dtype=torch.int32 when n is int32
```

The accepted answer used the `torch.Tensor` constructor.
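
One practical difference worth knowing: `torch.from_numpy` shares memory with the source array, while `torch.tensor` makes a copy. The sketch below uses `torch.tensor` rather than the `torch.Tensor` constructor for the copying case, and the array `n` is a made-up example.

```python
import numpy as np
import torch

n = np.arange(3, dtype=np.float32)

shared = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # copies the data

n[0] = 100.0  # modify the NumPy array in place

print(shared[0].item())  # 100.0, the change is visible through the shared tensor
print(copied[0].item())  # 0.0, the copy is unaffected
```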

If you have an image with pixels from 0-255 you may use this:

```
timg = torch.from_numpy(img).float()
```

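Or use torchvision's `to_tensor` function, which converts a PIL Image or `numpy.ndarray` to a tensor. For a `uint8` array of shape H x W x C it also scales the values to the range [0, 1] and moves the channels first.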

But here is a little trick: you can pass your numpy arrays in directly.

```
x1 = np.array([1,2,3])
d1 = DataLoader( x1, batch_size=3)
```

This also works, but if you print the type of `d1.dataset`:

```
print(type(d1.dataset)) # <class 'numpy.ndarray'>
```

However, we need tensors to work with CUDA, so it is better to feed tensors to the `DataLoader`.
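
As far as I can tell, the default collate function converts NumPy values into tensors when it assembles a batch, so the batches themselves come out as tensors even though `d1.dataset` is still a NumPy array. A quick way to check this yourself:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

x1 = np.array([1, 2, 3])
d1 = DataLoader(x1, batch_size=3)

batch = next(iter(d1))
print(type(d1.dataset))  # <class 'numpy.ndarray'>
print(type(batch))       # <class 'torch.Tensor'>
print(batch.tolist())    # [1, 2, 3]
```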

Since you have images, you probably want to perform transformations on them, so `TensorDataset` is not the best option here. Instead, you can create your own `Dataset`. Something like this:

```
import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import numpy as np
from PIL import Image


class MyDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        self.data = data  # list of C x H x W arrays
        self.targets = torch.LongTensor(targets)
        self.transform = transform

    def __getitem__(self, index):
        x = self.data[index]
        y = self.targets[index]

        if self.transform:
            # PIL expects H x W x C uint8, so move the channel axis last
            x = Image.fromarray(x.astype(np.uint8).transpose(1, 2, 0))
            x = self.transform(x)

        return x, y

    def __len__(self):
        return len(self.data)


# Let's create 10 RGB images of size 128x128 and 10 labels {0, 1}
data = list(np.random.randint(0, 256, size=(10, 3, 128, 128)))  # upper bound is exclusive
targets = list(np.random.randint(2, size=(10)))

transform = transforms.Compose([transforms.Resize(64), transforms.ToTensor()])
dataset = MyDataset(data, targets, transform=transform)
dataloader = DataLoader(dataset, batch_size=5)
```
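
The arrays shown in the question appear to be in height x width x channel order, and the list is described as huge, so a variant that converts one image at a time may be a better fit. This is only a sketch under those assumptions: `NumpyImageDataset` is a hypothetical name, the random images are stand-ins, and `transform` is any optional callable that takes and returns a tensor.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class NumpyImageDataset(Dataset):
    """Hypothetical dataset over a list of H x W x C uint8 arrays."""

    def __init__(self, images, targets, transform=None):
        assert len(images) == len(targets)
        self.images = images
        self.targets = targets
        self.transform = transform  # optional callable applied to the tensor

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # Convert lazily, one image per call: H x W x C uint8 -> C x H x W float in [0, 1]
        x = torch.from_numpy(self.images[index]).permute(2, 0, 1).float() / 255.0
        if self.transform is not None:
            x = self.transform(x)
        return x, int(self.targets[index])


# Stand-in data: 10 RGB images of size 128x128 and 10 labels in {0, 1}
images = [np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8) for _ in range(10)]
targets = np.random.randint(2, size=10)

loader = DataLoader(NumpyImageDataset(images, targets), batch_size=5)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([5, 3, 128, 128]) torch.Size([5])
```

Because the conversion happens in `__getitem__`, the original list is never copied into one large tensor; only the images in the current batch are converted.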