Is there any difference between a DNN model in Keras and the same DNN model in PyTorch?
Question:
Here is my code for a DNN in PyTorch and in Keras.
I train both on the same data but get very different AUC results (the Keras version reaches 0.74 and the PyTorch version reaches 0.67).
I'm confused!
I have tried many times, and the results are always different.
Is there any difference between the two models?
categorical_embed_sizes = [589806, 21225, 2565, 2686, 343, 344, 10, 2, 8, 8, 7, 7, 2, 2, 2, 17, 17, 17]

# keras model
cat_input, embeds = [], []
for i in range(cat_len):
    input_ = Input(shape=(1,))
    cat_input.append(input_)
    nums = categorical_embed_sizes[i]
    embed = Embedding(nums, 8)(input_)
    embeds.append(embed)
cont_input = Input(shape=(cont_len,), name='cont_input', dtype='float32')
cont_input_r = Reshape((1, cont_len))(cont_input)
embeds.append(cont_input_r)
# Merge_L = concatenate([train_emb, trainnumber_emb, departstationname_emb, arrivestationname_emb, seatname_emb, orderofftime_emb, fromcityname_emb, tocityname_emb, daytype_emb, num_input_r])
Merge_L = concatenate(embeds, name='cat_1')
Merge_L = Dense(256, activation=None, name='dense_0')(Merge_L)
Merge_L = PReLU(name='merge_0')(Merge_L)
Merge_L = BatchNormalization(name='bn_0')(Merge_L)
Merge_L = Dense(128, activation=None, name='dense_1')(Merge_L)
Merge_L = PReLU(name='prelu_1')(Merge_L)
Merge_L = BatchNormalization(name='bn_1')(Merge_L)
Merge_L = Dense(64, activation=None, name='Dense_2')(Merge_L)
Merge_L = PReLU(name='prelu_2')(Merge_L)
Merge_L = BatchNormalization(name='bn_2')(Merge_L)
Merge_L = Dense(32, activation=None, name='Dense_3')(Merge_L)
Merge_L = PReLU(name='prelu_3')(Merge_L)
Merge_L = BatchNormalization(name='bn_3')(Merge_L)
Merge_L = Dense(16, activation=None, name='Dense_4')(Merge_L)
Merge_L = PReLU(name='prelu_4')(Merge_L)
predictions = Dense(1, activation='sigmoid', name='Dense_rs')(Merge_L)
predictions = Reshape((1,), name='pred')(predictions)
cat_input.append(cont_input)
model = Model(inputs=cat_input,
              outputs=predictions)
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=[tf.keras.metrics.BinaryAccuracy(), tf.keras.metrics.AUC()])

# torch model
class DNN(nn.Module):
    def __init__(self, categorical_length, categorical_embed_sizes, categorical_embed_dim, in_size):
        super(DNN, self).__init__()
        self.categorical_length = categorical_length
        self.categorical_embed_sizes = categorical_embed_sizes
        self.categorical_embed_dim = categorical_embed_dim
        self.in_size = in_size
        self.nn = torch.nn.Sequential(
            nn.Linear(self.in_size, 256),
            nn.PReLU(256),
            nn.BatchNorm1d(256),
            nn.Linear(256, 128),
            nn.PReLU(128),
            nn.BatchNorm1d(128),
            nn.Linear(128, 64),
            nn.PReLU(64),
            nn.BatchNorm1d(64),
            nn.Linear(64, 32),
            nn.PReLU(32),
            nn.BatchNorm1d(32),
            nn.Linear(32, 16),
            nn.PReLU(16)
        )
        self.out = torch.nn.Sequential(
            nn.Linear(16, 1),
            nn.Sigmoid()
        )
        self.embedding = nn.Embedding(self.categorical_embed_sizes, self.categorical_embed_dim)

    def forward(self, x):
        x_categorical = x[:, :self.categorical_length].long()
        x_categorical = self.embedding(x_categorical).view(x_categorical.size(0), -1)
        x = torch.cat((x_categorical, x[:, self.categorical_length:]), dim=1)
        x = self.nn(x)
        out = self.out(x)
        return out

Answers:
The results of a network depend on its weight initialization scheme, and Keras and PyTorch use different defaults: Keras Dense layers use Glorot (Xavier) uniform initialization, while PyTorch nn.Linear layers use a Kaiming-style uniform initialization.
Even if you use the same scheme in both, the results will not be identical (though they should be close), because the weights are drawn randomly each time you start a new training run.
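
If you want to rule initialization out as a cause, one option is to re-initialize the PyTorch linear layers with Glorot uniform weights and zero biases, which are the Keras Dense defaults. The following is only a minimal sketch: the helper name init_like_keras and the small stand-in network are made up for illustration, and it does not touch the embedding, PReLU, or batch-norm layers.

```python
import torch
import torch.nn as nn


def init_like_keras(module: nn.Module) -> None:
    """Hypothetical helper: give nn.Linear layers Glorot-uniform weights and zero biases."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)


torch.manual_seed(0)  # fix the seed so repeated runs draw the same weights

# Small stand-in network, not the full model from the question.
net = nn.Sequential(nn.Linear(32, 16), nn.PReLU(16), nn.Linear(16, 1))
net.apply(init_like_keras)  # apply() visits every submodule recursively
```

With the real model you would call model.apply(init_like_keras) once after constructing it. Matching the initializer only narrows the gap; it does not make the two frameworks produce identical numbers.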
I finally found the real reason for the discrepancy. It has nothing to do with the model structure or parameters. The direct cause was that I passed the wrong input to sklearn's roc_auc_score function.
sklearn.metrics.roc_auc_score needs at least y_true and y_score. y_true holds the true labels of the dataset, and y_score holds the predicted probabilities of label 1 (for binary tasks).
But when I used the torch outputs to calculate the two metrics (accuracy and AUC), I first converted the outputs to 0/1 vectors. So my y_score was no longer probabilities but hard 0/1 predictions, which produced the wrong AUC.
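
To illustrate the effect, here is a small sketch with made-up labels and probabilities (not the data from the question). It computes the AUC once from the probabilities and once from the thresholded 0/1 predictions; only the accuracy should be computed from the thresholded values.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy data, invented for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.9])  # sigmoid outputs of the model

# Hard predictions at a 0.5 threshold: fine for accuracy, wrong for AUC.
y_pred = (y_prob >= 0.5).astype(int)

auc_from_probs = roc_auc_score(y_true, y_prob)   # correct: uses the ranking of the scores
auc_from_labels = roc_auc_score(y_true, y_pred)  # misleading: ranking collapsed to 0/1
acc = accuracy_score(y_true, y_pred)

print(f"AUC from probabilities: {auc_from_probs:.3f}")
print(f"AUC from 0/1 labels:    {auc_from_labels:.3f}")
print(f"Accuracy:               {acc:.3f}")
```

In this toy case every positive example is ranked above every negative one, so the AUC from probabilities is 1.0, while thresholding first drops it to about 0.667. With the torch model from the question, the probabilities could be taken from the output tensor with something like out.detach().cpu().numpy().ravel() before calling roc_auc_score.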