How to convert a PyTorch Dataset object to a pandas DataFrame?

Question:

I have a pandas DataFrame called df that I split into train and test sets using the random_split function from the torch library.

import pandas as pd
import torch
from torch.utils.data import random_split

df = pd.read_csv('some_txt_file.txt', sep=' ')
len_test = len(df) // 10          # hold out roughly 10% of the rows for testing
len_train = len(df) - len_test    # the remaining rows form the training set
lengths = [len_train, len_test]
train, test = random_split(df, lengths, generator=torch.Generator().manual_seed(42))

Is there a way to convert these Dataset objects back into pandas dataframes? I tried

df_train = pd.DataFrame(train)
df_test = pd.DataFrame(test)

but it throws the error

ValueError: DataFrame constructor not properly called!
Asked By: CoolMathematician


Answers:

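torch.utils.data.random_split returns torch.utils.data.Subset objects. Each Subset stores a reference to the original dataset (its dataset attribute) and the list of selected positions (its indices attribute).

To get the corresponding sub-DataFrame, index the original DataFrame with those positions: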

df_train = train.dataset.iloc[train.indices]
# or
df_train = df.iloc[train.indices]
Answered By: Ynjxsjmh
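
For anyone who wants to try this end to end, here is a minimal sketch of the approach above. The small in-memory DataFrame and its column names x and y are made up for illustration and stand in for the file read in the question; the rest uses only random_split, Subset.indices, and DataFrame.iloc as described in the answer.

```python
import pandas as pd
import torch
from torch.utils.data import random_split

# Hypothetical stand-in for the DataFrame loaded from disk in the question.
df = pd.DataFrame({"x": range(20), "y": [i * i for i in range(20)]})

len_test = len(df) // 10
len_train = len(df) - len_test

# A seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(42)
train, test = random_split(df, [len_train, len_test], generator=generator)

# Each Subset keeps the row positions it was assigned, so the
# original DataFrame can be sliced with them directly.
df_train = df.iloc[train.indices]
df_test = df.iloc[test.indices]

print(len(df_train), len(df_test))
```

The resulting DataFrames keep the row labels of the original df. If fresh 0-based labels are preferred, calling reset_index(drop=True) on each result should do it.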