How to convert a pytorch Dataset object to a pandas dataframe?
Question:
I have a pandas dataframe called df that I split into train/test sets using the random_split function from the torch library.
import pandas as pd
import torch
from torch.utils.data import random_split
df = pd.read_csv('some_txt_file.txt', sep=' ')
len_test = len(df) // 10            # hold out 10% of the rows (rounded down)
len_train = len(df) - len_test      # the remaining rows go to training
lengths = [len_train, len_test]
# fixed seed so the split is reproducible
train, test = random_split(df, lengths, generator=torch.Generator().manual_seed(42))
Is there a way to convert these Dataset objects back into pandas dataframes? I tried
df_train = pd.DataFrame(train)
df_test = pd.DataFrame(test)
but it throws the error
ValueError: DataFrame constructor not properly called!
Answers:
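torch.utils.data.random_split returns torch.utils.data.Subset objects, not DataFrames. A Subset only stores a reference to the original dataset (its dataset attribute) and the list of row positions assigned to it (its indices attribute). pd.DataFrame() cannot build a frame from such a wrapper directly, hence the ValueError.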
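A minimal sketch of what the two Subset objects hold, assuming a recent PyTorch version and a small in-memory DataFrame (the column name x is made up for illustration) in place of the file from the question:

```python
import pandas as pd
import torch
from torch.utils.data import Subset, random_split

# Toy stand-in for the question's df (column name "x" is hypothetical)
df = pd.DataFrame({"x": range(10)})

train, test = random_split(df, [9, 1], generator=torch.Generator().manual_seed(42))

# random_split hands back Subset wrappers, not DataFrames
print(isinstance(train, Subset))  # True
# Each Subset keeps a reference to the very same DataFrame...
print(train.dataset is df)        # True
# ...plus a list of row positions; together the two lists cover every row once
print(sorted(train.indices + test.indices) == list(range(len(df))))  # True
```

Because nothing is copied, recovering a sub-frame is just a matter of selecting those positions from the original DataFrame.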
To get the corresponding sub-dataframe, select those positions from the original frame with .iloc:
df_train = train.dataset.iloc[train.indices]
# or, equivalently, since train.dataset is df itself
df_train = df.iloc[train.indices]
df_test = df.iloc[test.indices]
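The indices produced by random_split are row positions (0 to len(df) - 1), not index labels, which is why .iloc is the right accessor. The difference only shows up when df has a non-default index. A small pandas-only sketch, using a hand-picked positions list as a hypothetical stand-in for train.indices:

```python
import pandas as pd

# DataFrame whose index labels do not match the row positions
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[3, 2, 1, 0])

positions = [0, 2]  # hypothetical stand-in for train.indices

by_position = df.iloc[positions]  # 1st and 3rd rows
by_label = df.loc[positions]      # rows labelled 0 and 2 (not what the split meant)

print(by_position["x"].tolist())  # [10, 30]
print(by_label["x"].tolist())     # [40, 20]

# Optional: renumber the sub-frame's index from 0 after splitting
df_train = df.iloc[positions].reset_index(drop=True)
print(df_train.index.tolist())    # [0, 1]
```

With a default RangeIndex both accessors happen to return the same rows, so the bug would go unnoticed until the index changes; using .iloc avoids that.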