Is there any difference between a DNN model in Keras and a DNN model in PyTorch?

Question:

Here is my code for a DNN in PyTorch and in Keras.
I train both on the same data, but they end up with very different AUC results (the Keras version reaches 0.74 while the PyTorch version only reaches 0.67).
I'm quite confused!
I have retried many times and the difference remains.
Is there any difference between the two models?

categorical_embed_sizes = [589806, 21225, 2565, 2686, 343, 344, 10, 2, 8, 8, 7, 7, 2, 2, 2, 17, 17, 17]

# Keras model
import tensorflow as tf
from tensorflow.keras.layers import (Input, Embedding, Reshape, Dense,
                                     PReLU, BatchNormalization, concatenate)
from tensorflow.keras.models import Model

cat_len = len(categorical_embed_sizes)  # one entry per categorical feature
# cont_len (the number of continuous features) is defined elsewhere

# one Embedding per categorical feature, each mapping to 8 dimensions
cat_input, embeds = [], []
for i in range(cat_len):
    input_ = Input(shape=(1, ))
    cat_input.append(input_)
    nums = categorical_embed_sizes[i]
    embed = Embedding(nums, 8)(input_)
    embeds.append(embed)
cont_input = Input(shape=(cont_len,), name='cont_input', dtype='float32')
cont_input_r = Reshape((1, cont_len))(cont_input)

embeds.append(cont_input_r)

#Merge_L=concatenate([train_emb,trainnumber_emb,departstationname_emb,arrivestationname_emb,seatname_emb,orderofftime_emb,fromcityname_emb,tocityname_emb,daytype_emb,num_input_r])
Merge_L = concatenate(embeds, name='cat_1')
Merge_L = Dense(256, activation=None, name='dense_0')(Merge_L)
Merge_L = PReLU(name='merge_0')(Merge_L)
Merge_L = BatchNormalization(name='bn_0')(Merge_L)
Merge_L = Dense(128, activation=None, name='dense_1')(Merge_L)
Merge_L = PReLU(name='prelu_1')(Merge_L)
Merge_L = BatchNormalization(name='bn_1')(Merge_L)
Merge_L = Dense(64, activation=None, name='Dense_2')(Merge_L)
Merge_L = PReLU(name='prelu_2')(Merge_L)
Merge_L = BatchNormalization(name='bn_2')(Merge_L)
Merge_L = Dense(32, activation=None, name='Dense_3')(Merge_L)
Merge_L = PReLU(name='prelu_3')(Merge_L)
Merge_L = BatchNormalization(name='bn_3')(Merge_L)
Merge_L = Dense(16, activation=None, name='Dense_4')(Merge_L)
Merge_L = PReLU(name='prelu_4')(Merge_L)

predictions = Dense(1, activation='sigmoid', name='Dense_rs')(Merge_L)
predictions = Reshape((1,), name='pred')(predictions)

cat_input.append(cont_input)
model = Model(inputs=cat_input, outputs=predictions)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[tf.keras.metrics.BinaryAccuracy(), tf.keras.metrics.AUC()])

# PyTorch model
import torch
import torch.nn as nn

class DNN(nn.Module):
    def __init__(self, categorical_length, categorical_embed_sizes, categorical_embed_dim, in_size):
        super(DNN, self).__init__()
        self.categorical_length = categorical_length
        self.categorical_embed_sizes = categorical_embed_sizes
        self.categorical_embed_dim = categorical_embed_dim
        self.in_size = in_size
        self.nn = torch.nn.Sequential(
            nn.Linear(self.in_size, 256),
            nn.PReLU(256),
            nn.BatchNorm1d(256),
            nn.Linear(256, 128),
            nn.PReLU(128),
            nn.BatchNorm1d(128),
            nn.Linear(128, 64),
            nn.PReLU(64),
            nn.BatchNorm1d(64),
            nn.Linear(64, 32),
            nn.PReLU(32),
            nn.BatchNorm1d(32),
            nn.Linear(32, 16),
            nn.PReLU(16)
        )
        self.out = torch.nn.Sequential(
            nn.Linear(16, 1),
            nn.Sigmoid()
        )
        # a single embedding table shared by all categorical features
        # (note: categorical_embed_sizes is passed as a single integer here)
        self.embedding = nn.Embedding(self.categorical_embed_sizes, self.categorical_embed_dim)

    def forward(self, x):
        # look up embeddings for the categorical columns and flatten them
        x_categorical = x[:, :self.categorical_length].long()
        x_categorical = self.embedding(x_categorical).view(x_categorical.size(0), -1)
        # concatenate the embeddings with the continuous features
        x = torch.cat((x_categorical, x[:, self.categorical_length:]), dim=1)
        x = self.nn(x)
        out = self.out(x)
        return out
Asked By: Junming Liang


Answers:

The results of the networks depend on the weight-initialization scheme, and Keras and PyTorch use different defaults: Keras uses Glorot (Xavier) initialization, while PyTorch uses Kaiming (He) initialization.

Even if you use the same scheme, the results will not be identical (though they should be close), because the weights are initialized randomly every time you start a new training run.
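
If you want to rule out initialization as the cause, one option is to re-initialize the PyTorch Linear layers with Glorot/Xavier initialization so they match Keras's default. A minimal sketch (the init_like_keras helper and the toy model below are illustrative, not from the question):

import torch.nn as nn

def init_like_keras(module):
    # Keras Dense layers default to Glorot-uniform weights and zero biases,
    # while PyTorch's nn.Linear uses a Kaiming-uniform scheme by default
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# example: works on any model that contains nn.Linear layers
model = nn.Sequential(nn.Linear(32, 16), nn.PReLU(16), nn.Linear(16, 1))
model.apply(init_like_keras)  # .apply() visits every submodule recursively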

Answered By: the_ordinary_guy

Finally I found the real reason for the error I met. It has nothing to do with the model structure or parameters. The direct cause was passing the wrong input to sklearn's roc_auc_score function.

As we know, sklearn.metrics.roc_auc_score needs at least y_true and y_score: y_true holds the true labels of the dataset, and y_score holds the predicted probabilities of label 1 (for binary tasks).

But when I used the torch model's outputs to compute the two metrics (accuracy and AUC), I first thresholded the outputs into 0/1 vectors. So my y_score was no longer a vector of probabilities but a 0/1 vector.

Then the error happened…
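
A minimal sketch of the pitfall (the arrays below are illustrative): accuracy needs hard 0/1 predictions, but AUC needs the raw probabilities, because thresholding throws away the ranking information that AUC measures.

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.30, 0.90, 0.45, 0.50, 0.55])  # sigmoid outputs
y_pred = (y_prob >= 0.5).astype(int)               # thresholded 0/1 labels

print(accuracy_score(y_true, y_pred))  # correct: accuracy wants hard labels
print(roc_auc_score(y_true, y_prob))   # correct: AUC wants probabilities
print(roc_auc_score(y_true, y_pred))   # wrong: thresholding changes the AUC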

Answered By: Junming Liang