CNN Negative Number of Parameters

Question:

I am trying to build a CNN model with Keras. When I add two blocks of Conv2D and MaxPooling2D, everything is normal. However, once the third block is added (as shown in the code), the number of trainable parameters becomes negative. Any idea how this can happen?

import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = keras.models.Sequential()

# # # First Block
model.add(Conv2D(filters=16, kernel_size=(5, 5), padding='valid',
                 input_shape=(157, 462, 14), activation='tanh'))
model.add(MaxPooling2D((2, 2)))

# # # Second Block
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='valid', activation='tanh'))
model.add(MaxPooling2D((2, 2)))

# # # Third Block
model.add(Conv2D(filters=64, kernel_size=(5, 5), padding='valid', activation='tanh'))
model.add(MaxPooling2D((2, 2)))

model.add(Flatten())
model.add(Dense(157 * 462))
model.compile(loss='mean_squared_error',
              optimizer=keras.optimizers.Adamax(),
              metrics=['mean_absolute_error'])

print(model.summary())

The result of this code is the following:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 153, 458, 16)      5616      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 76, 229, 16)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 225, 32)       12832     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 112, 32)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 108, 64)       51264     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 54, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 55296)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 72534)             -284054698
=================================================================
Total params: -283,984,986
Trainable params: -283,984,986
Non-trainable params: 0
_________________________________________________________________
None
Asked By: Nijat

Answers:

Yes, this can happen: your Dense layer has a weight matrix of size 55296 x 72534, which contains 55296 x 72534 = 4,010,840,064 weights, plus 72,534 biases. That is about 4.01 billion parameters in a single layer.
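To make the arithmetic concrete, the Dense layer's parameter count can be reproduced with the standard formula (inputs × units for the weights, plus one bias per unit); the layer sizes are taken from the model summary above:

```python
# Flatten output: last MaxPooling2D produces (16, 54, 64)
flatten_units = 16 * 54 * 64   # 55296
dense_units = 157 * 462        # 72534

weights = flatten_units * dense_units  # weight matrix entries
biases = dense_units                   # one bias per output unit
params = weights + biases

print(params)  # 4010912598
```

This matches the (non-overflowed) Param # reported for dense_1 in the GPU run below.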

Somewhere in the Keras code the number of parameters is stored as a signed int32, which can only hold values up to 2^31 - 1 = 2,147,483,647. Your roughly 4.01 billion parameters exceed that limit, so the count overflows and wraps around into negative territory.
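A minimal sketch of the wraparound, using Python's struct module to reinterpret the true count as a signed 32-bit integer (the exact Keras code path is an assumption, but the arithmetic is identical):

```python
import struct

true_count = 4_010_912_598  # Dense-layer parameters (weights + biases)

# Pack as unsigned 32-bit, unpack as signed 32-bit: same bit pattern, new sign
wrapped = struct.unpack('<i', struct.pack('<I', true_count))[0]

print(wrapped)  # -284054698
```

This is exactly the negative value shown for dense_1 in the question's summary.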

I would recommend not building a model with such a large number of parameters; you would not be able to train it anyway without a huge amount of RAM.
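A rough back-of-the-envelope estimate of that memory cost (a sketch assuming float32 parameters, and that training needs roughly four copies per parameter: the weights, the gradients, and Adamax's two moment slots):

```python
total_params = 4_010_982_310  # total from the non-overflowed model summary
bytes_per_float32 = 4

weights_gib = total_params * bytes_per_float32 / 1024**3
training_gib = 4 * weights_gib  # weights + gradients + two optimizer slots

print(f"weights alone: {weights_gib:.1f} GiB, training footprint: ~{training_gib:.0f} GiB")
```

So the weights alone need about 15 GiB, and training would need on the order of 60 GiB, which is far beyond most single machines.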

Answered By: Dr. Snoopy

The problem seems to occur when you run your code on CPU, where the Keras backend (TensorFlow or Theano) does not report the parameter count correctly. I was able to run your code on a GPU in Google Colab, and this is what I got:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 153, 458, 16)      5616      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 76, 229, 16)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 225, 32)       12832     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 112, 32)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 108, 64)       51264     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 54, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 55296)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 72534)             4010912598
=================================================================
Total params: 4,010,982,310
Trainable params: 4,010,982,310
Non-trainable params: 0

I recommend using a GPU for training such a huge network.