Quantizing AI Models for Inference

Overview

This talk gives a concise, accessible introduction to quantization, a technique that cuts the memory and compute cost of running a model by storing its values at lower numerical precision. It walks through a basic implementation of Singular Value Decomposition (SVD) quantization applied to the Stable Diffusion XL (SDXL) model. With minimal code, the approach reduces GPU VRAM usage from 6.5 GB to 3.5 GB. Aimed at professionals and enthusiasts alike, the talk highlights the potential for resource optimization in machine learning workflows.
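To make the idea concrete, here is a minimal NumPy sketch of the two concepts named in the overview: plain symmetric quantization, and an SVD-based variant that keeps a small low-rank part of the weight in floating point and quantizes only the residual. This is not the code from the talk's notebook. The function names, the rank of 8, the 4-bit width, and the toy weight matrix are all assumptions chosen for illustration, and the exact scheme used in the talk may differ.

```python
import numpy as np


def quantize_absmax(w, bits=8):
    """Symmetric per-row quantization onto signed integers (illustrative helper)."""
    qmax = 2 ** (bits - 1) - 1                        # 127 for 8 bits, 7 for 4 bits
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)          # guard against all-zero rows
    # Values are held in int8 here for simplicity; a real 4-bit kernel would
    # pack two values per byte to actually realize the memory saving.
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float32)


def dequantize(q, scale):
    """Map integers back to approximate float weights."""
    return q.astype(np.float32) * scale


def svd_quantize(w, rank=8, bits=4):
    """Keep a rank-`rank` approximation in float; quantize only what is left."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_a = (u[:, :rank] * s[:rank]).astype(np.float32)   # shape (out, rank)
    low_b = vt[:rank].astype(np.float32)                   # shape (rank, in)
    residual = w - low_a @ low_b                           # smaller dynamic range
    q, scale = quantize_absmax(residual, bits=bits)
    return low_a, low_b, q, scale


def svd_dequantize(low_a, low_b, q, scale):
    """Reconstruct the weight from the low-rank part plus the quantized residual."""
    return low_a @ low_b + dequantize(q, scale)


# Toy weight (assumption): a strong low-rank component plus small noise.
rng = np.random.default_rng(0)
w = (3.0 * rng.normal(size=(64, 4)) @ rng.normal(size=(4, 128))
     + 0.1 * rng.normal(size=(64, 128))).astype(np.float32)

q4, s4 = quantize_absmax(w, bits=4)
err_plain = float(np.abs(w - dequantize(q4, s4)).mean())

parts = svd_quantize(w, rank=8, bits=4)
err_svd = float(np.abs(w - svd_dequantize(*parts)).mean())

print(f"mean abs error, plain 4-bit:      {err_plain:.4f}")
print(f"mean abs error, SVD + 4-bit rest: {err_svd:.4f}")
```

On this toy matrix the SVD variant reconstructs the weight more accurately at the same bit width, because the low-rank part absorbs the large values and leaves a residual with a much narrower range. Real model weights are not this cleanly low-rank, so the gap will typically be smaller.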

Links

https://github.com/rishabh063/intro-to-Quant/blob/main/into_to_quan...
Jupyter notebook (Python) accompanying the talk.

Tech stack