Why does the same notebook allocate very different amounts of VRAM in two different environments?

You can see that this notebook is trainable on Kaggle within Kaggle's 16 GB VRAM limit: https://www.kaggle.com/firefliesqn/g2net-gpu-newbie

I just tried to run the same notebook locally on an RTX 3090 GPU, where I have torch 1.8 installed, and the same notebook allocates around 23.3 GB of VRAM. Why is this happening, and how can I optimize my local environment to behave like Kaggle? Even if I reduce the batch size compared to what is used on Kaggle, my notebook still allocates around 23 GB of VRAM locally.

On Kaggle I see torch 1.7 and tensorflow 2.4 installed. Locally, since I use an RTX 3090, newer versions of torch and TF are recommended, hence I used torch 1.8.1 and tensorflow 2.6.

Answer

By default, TensorFlow maps nearly all of the memory of every GPU visible to the process, regardless of how much the model actually needs.

When using TensorFlow, one can limit the memory used by the following snippet:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 12GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=12288)])
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

where 12288 = 1024 x 12 (memory_limit is given in MB).

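Another solution (see discussion below; this worked for the OP) is to enable memory growth, so that TensorFlow allocates GPU memory as it is needed instead of reserving it all up front. Use this only if you do not need a specific upper limit on the memory used: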

tf.config.experimental.set_memory_growth(gpus[0], True)

https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth
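If editing the notebook's TensorFlow setup code is inconvenient, memory growth can also be requested through the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable. This is not part of the original answer; it is a minimal sketch that assumes the variable is set before TensorFlow initializes the GPU, which in a notebook usually means the very first cell:

```python
import os

# Ask TensorFlow to grow GPU memory on demand instead of reserving it all.
# This must run before TensorFlow touches the GPU, so set it before the import.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

Like set_memory_growth, this does not impose an upper limit; it only stops TensorFlow from reserving the whole card up front.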

In PyTorch, this is even easier:

import torch

# Use at most 1/2 of the memory of GPU 0 (similar in effect to the TF memory limit)
torch.cuda.set_per_process_memory_fraction(0.5, 0)

# The total memory of GPU 0 (in bytes) can then be checked with
total_memory_available = torch.cuda.get_device_properties(0).total_memory
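
To mimic Kaggle's 16 GB limit on a larger card, the fraction can be derived from the reported total memory rather than hard-coded. The sketch below is only illustrative: `fraction_for_limit` is a hypothetical helper (not a PyTorch function), the 16 GiB figure is taken from the question, and the PyTorch part assumes torch 1.8 or newer with a CUDA device available.

```python
def fraction_for_limit(limit_gib, total_bytes):
    """Fraction of device memory equal to limit_gib GiB, clamped to [0, 1]."""
    limit_bytes = limit_gib * 1024 ** 3
    return max(0.0, min(1.0, limit_bytes / total_bytes))

try:
    import torch
except ImportError:  # PyTorch not installed; the helper above still works
    torch = None

if (torch is not None and torch.cuda.is_available()
        and hasattr(torch.cuda, "set_per_process_memory_fraction")):
    # Total memory of GPU 0 in bytes
    total = torch.cuda.get_device_properties(0).total_memory
    # Cap this process at roughly 16 GiB on GPU 0
    torch.cuda.set_per_process_memory_fraction(fraction_for_limit(16, total), 0)
```

Note that this cap does not make the model use less memory; once the process tries to allocate beyond the fraction, PyTorch raises an out-of-memory error.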