PhD Seminar • Systems and Networking • CachedVid: Efficient Caching for Text-to-Video Diffusion Models

Friday, May 30, 2025 1:00 pm - 2:00 pm EDT (GMT -04:00)

Please note: This PhD seminar will take place in DC 1304.

Desen Sun, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Sihang Liu

Text-to-video diffusion models have garnered significant attention from both academia and industry. However, video generation remains computationally expensive, often requiring several minutes per video. Previous research on text-to-image diffusion models has introduced approximate caching mechanisms that reuse intermediate results from similar prompts to reduce computational overhead, but these approaches are inefficient for text-to-video diffusion models.

We present CachedVid, an approximate caching framework for efficient text-to-video diffusion models. First, we introduce a cache compression mechanism that reduces the cache size by 4.1–6.7× on average. Second, we identify opportunities to decouple the object and the background in a video and look each up in the cache separately. This decoupled look-up improves both the cache hit rate and the similarity between the cached result and the prompt, which saves more computation.
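
The idea of a decoupled cache look-up can be illustrated with a toy sketch. This is not CachedVid's implementation: the class ApproxCache, the function decoupled_lookup, the two-dimensional embeddings, the 0.9 similarity threshold, and the string placeholders standing in for cached intermediate results are all assumptions made for illustration. The sketch only shows why matching the object and the background separately can hit the cache when a whole-prompt match would miss.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ApproxCache:
    """Hypothetical approximate cache: returns the most similar entry
    if its similarity to the query reaches the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_state)

    def put(self, embedding, state):
        self.entries.append((embedding, state))

    def lookup(self, query):
        best_score, best_state = -1.0, None
        for embedding, state in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_state = score, state
        if best_score >= self.threshold:
            return best_state, best_score  # approximate hit
        return None, best_score  # miss


def decoupled_lookup(object_cache, background_cache, obj_emb, bg_emb):
    """Query the object and background caches independently.
    A part that misses maps to None."""
    obj_state, _ = object_cache.lookup(obj_emb)
    bg_state, _ = background_cache.lookup(bg_emb)
    return {"object": obj_state, "background": bg_state}


# Made-up embeddings: the cache has seen "cat on a beach" and
# "dog in a forest"; the new prompt is close to "cat in a forest".
object_cache = ApproxCache(threshold=0.9)
object_cache.put([1.0, 0.0], "cat-state")
object_cache.put([0.0, 1.0], "dog-state")

background_cache = ApproxCache(threshold=0.9)
background_cache.put([1.0, 0.0], "beach-state")
background_cache.put([0.0, 1.0], "forest-state")

# Whole-prompt cache keyed by the concatenated embedding, for comparison.
whole_cache = ApproxCache(threshold=0.9)
whole_cache.put([1.0, 0.0, 1.0, 0.0], "cat-on-beach-state")
whole_cache.put([0.0, 1.0, 0.0, 1.0], "dog-in-forest-state")

query_obj, query_bg = [0.9, 0.1], [0.1, 0.9]
whole_hit, _ = whole_cache.lookup(query_obj + query_bg)
parts = decoupled_lookup(object_cache, background_cache, query_obj, query_bg)
print("whole-prompt hit:", whole_hit)
print("decoupled hits:", parts)
```

In this toy setup the whole-prompt look-up misses because no cached prompt combines a cat with a forest, while the decoupled look-up finds a close match for each part.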