When is it appropriate to use sample_weights in keras?

According to this question, I learnt that class_weight in keras is applying a weighted loss during training, and sample_weight is doing something sample-wise if I don’t have equal confidence in all the training samples.

So my questions would be,

  1. Is the loss during validation weighted by the class_weight, or is it only weighted during training?
  2. My dataset has 2 classes, and I don’t actually have a seriously imbalanced class ditribution. The ratio is approx. 1.7 : 1. Is that neccessary to use class_weight to balance the loss or even use oversampling? Is that OK to leave the slightly imbalanced data as the usual dataset treated?
  3. Can I simply consider sample_weight as the weights I give to each train sample? And my trainig samples can be treated with equal confidence, so I probably I don’t need to use this.


  1. From the keras documentation it says

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to “pay more attention” to samples from an under-represented class.

So class_weight does only affect the loss during traning. I myself have been interested in understanding how the class and sample weights is handled during testing and training. Looking at the keras github repo and the code for metric and loss, it does not seem that either loss or metric is affected by them. The printed values are quite hard to track in the training code like model.fit() and its corresponding tensorflow backend training functions. So I decided to make a test code to test the possible scenarios, see code below. The conclusion is that both class_weight and sample_weight only affect training loss, no effect on any metrics or validation loss. A little surprising as val_sample_weights (which you can specify) seems to do nothing(??).

  1. This types of question always depends on you problem, how skewed the date is and in what way you try to optimize the model. Are you optimizing for accuracy, then as long as the training data is equally skewed as when the model is in production, the best result will be achieved just training without any over/under sampling and/or class weights. If you on the other hand have something where one class is more important (or expensive) than another then you should be weighting the data. For example in fraud prevention, where fraud normally is much more expensive than the income of non-fraud. I would suggest you try out unweighted classes, weighted classes and some under/over-sampling and check which gives the best validation results. Use a validation function (or write your own) that best will compare different models (for-example weighting true-positive, false-positive, true-negative and false-negative differently dependent on cost). A relatively new loss-function that has shown great result at kaggle competitions on skewed data is Focal-loss. Focal-loss reduce the need for over/under-sampling. Unfortunately Focal-loss is not a built inn function in keras (yet), but can be manually programmed.

  2. Yes I think you are correct. I normally use sample_weight for two reasons. 1, the training data have some kind of measuring uncertainty, which if known can be used to weight accurate data more than inaccurate measurements. Or 2, we can weight newer data more than old, forcing the model do adapt to new behavior more quickly, without ignoring valuable old data.

The code for comparing with and without class_weights and sample_weights, while holding the model and everything else static.

import tensorflow as tf
import numpy as np

data_size = 100

x_train = np.random.rand(data_size ,input_size)
y_train= np.random.randint(0,classes,data_size )
#sample_weight_train = np.random.rand(data_size)
x_val = np.random.rand(data_size ,input_size)
y_val= np.random.randint(0,classes,data_size )
#sample_weight_val = np.random.rand(data_size )

inputs = tf.keras.layers.Input(shape=(input_size))
pred=tf.keras.layers.Dense(classes, activation='softmax')(inputs)

model = tf.keras.models.Model(inputs=inputs, outputs=pred)

loss = tf.keras.losses.sparse_categorical_crossentropy
metrics = tf.keras.metrics.sparse_categorical_accuracy

model.compile(loss=loss , metrics=[metrics], optimizer='adam')

# Make model static, so we can compare it between different scenarios
for layer in model.layers:
    layer.trainable = False

# base model no weights (same result as without class_weights)
# model.fit(x=x_train,y=y_train, validation_data=(x_val,y_val))
model.fit(x=x_train,y=y_train, class_weight=class_weights, validation_data=(x_val,y_val))
# which outputs:
> loss: 1.1882 - sparse_categorical_accuracy: 0.3300 - val_loss: 1.1965 - val_sparse_categorical_accuracy: 0.3100

#changing the class weights to zero, to check which loss and metric that is affected
model.fit(x=x_train,y=y_train, class_weight=class_weights, validation_data=(x_val,y_val))
# which outputs:
> loss: 0.0000e+00 - sparse_categorical_accuracy: 0.3300 - val_loss: 1.1945 - val_sparse_categorical_accuracy: 0.3100

#changing the sample_weights to zero, to check which loss and metric that is affected
sample_weight_train = np.zeros(100)
sample_weight_val = np.zeros(100)
model.fit(x=x_train,y=y_train,sample_weight=sample_weight_train, validation_data=(x_val,y_val,sample_weight_val))
# which outputs:
> loss: 0.0000e+00 - sparse_categorical_accuracy: 0.3300 - val_loss: 1.1931 - val_sparse_categorical_accuracy: 0.3100

There are some small deviations between using weights and not (even when all weights are one), possible due to fit using different backend functions for weighted and unweighted data or due to rounding error?

Leave a Reply

Your email address will not be published. Required fields are marked *