With recent advances in deep learning, increasingly complex networks, such as giant transformer models and wider, deeper ResNets, have emerged, and they carry a large memory footprint. More often than not, deep learning practitioners need multiple GPUs to train these networks efficiently. …