Quantizing SDXL for Inference
This talk explains quantization principles and demonstrates SVD quantization on Stable Diffusion XL, showing how to reduce GPU VRAM usage effectively for inference.
This presentation gives a concise, accessible introduction to quantization, a technique that represents model values at lower numerical precision to cut memory use and compute cost. It then walks through a basic implementation of Singular Value Decomposition (SVD) quantization applied to the Stable Diffusion XL (SDXL) model. With minimal code, the approach reduces GPU VRAM usage from 6.5 GB to 3.5 GB. Aimed at professionals and enthusiasts alike, the talk highlights the potential for resource optimization in machine learning workflows.
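The talk's own code is not reproduced on this page, so the following is only a minimal sketch of one common reading of "SVD quantization": keep the top singular directions of a weight matrix in float16 and quantize the remaining residual to 8-bit integers. The function names `svd_quantize` and `dequantize`, the rank of 8, and the random stand-in matrix are all assumptions made for illustration. It does not load SDXL and does not reproduce the 6.5 GB to 3.5 GB figure.

```python
import numpy as np


def svd_quantize(weight, rank=8, bits=8):
    """Split `weight` into a float16 low-rank part and an int8 residual.

    Hypothetical helper for illustration; not the talk's implementation.
    """
    assert 2 <= bits <= 8, "values are stored in int8, so bits must be <= 8"

    # Low-rank branch: the top `rank` singular directions, kept in float16.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = (u[:, :rank] * s[:rank]).astype(np.float16)
    right = vt[:rank].astype(np.float16)

    # Residual: whatever the float16 low-rank branch does not capture.
    residual = weight - left.astype(np.float32) @ right.astype(np.float32)

    # Symmetric absmax quantization with a single scale for the whole tensor.
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(residual).max()) / qmax
    if scale == 0.0:
        scale = 1.0  # all-zero residual: any scale works
    q = np.clip(np.round(residual / scale), -qmax, qmax).astype(np.int8)
    return left, right, q, scale


def dequantize(left, right, q, scale):
    """Rebuild an approximate float32 weight from the two branches."""
    low_rank = left.astype(np.float32) @ right.astype(np.float32)
    return low_rank + q.astype(np.float32) * scale


# Random stand-in for one layer's weights (an assumption, not SDXL data).
rng = np.random.default_rng(0)
weight = rng.standard_normal((256, 256)).astype(np.float32)

left, right, q, scale = svd_quantize(weight, rank=8, bits=8)
restored = dequantize(left, right, q, scale)

max_error = float(np.abs(weight - restored).max())
fp16_bytes = weight.size * 2                       # storing the layer in float16
packed_bytes = q.nbytes + left.nbytes + right.nbytes

print(f"max abs error: {max_error:.5f} (scale / 2 = {scale / 2:.5f})")
print(f"float16: {fp16_bytes} bytes, low-rank + int8: {packed_bytes} bytes")
```

Because rounding to the nearest integer step can be off by at most half a step, the per-element reconstruction error stays within about `scale / 2`. The memory saving here comes from storing the residual in one byte per value instead of two; how much that translates to for a full model depends on which layers are quantized and at what bit width.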