PhD Seminar • Systems and Networking • CachedVid: Efficient Caching for Text-to-Video Diffusion Models

Please note: This PhD seminar will take place in DC 1304.

Desen Sun, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Sihang Liu

Text-to-video diffusion models have garnered significant attention from both academia and industry. However, video generation remains computationally expensive, often requiring several minutes per video. Previous research on text-to-image diffusion models has introduced approximate caching mechanisms that reuse intermediate results from similar prompts to reduce computational overhead, but these approaches are inefficient for text-to-video diffusion models.

We present CachedVid, an approximate caching framework for efficient text-to-video diffusion models. First, we introduce a cache compression mechanism that reduces the cache size by 4.1–6.7× on average. Second, we identify opportunities to decouple the object and the background in a video and look each up in the cache separately. This decoupled look-up improves both the cache hit rate and the similarity between the cached result and the prompt, which saves more computation.

Location Information

Location Address: DC - William G. Davis Computer Research Centre
200 University Avenue West
DC 1304
Waterloo, ON, CA N2L 3G1
