Please note: This PhD seminar will take place in DC 1304.
Desen Sun, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Sihang Liu
Text-to-video diffusion models have garnered significant attention from both academia and industry. However, video generation remains computationally expensive, often requiring several minutes per video. While previous research on text-to-image diffusion models has introduced approximate caching mechanisms that leverage intermediate results from similar prompts to reduce computational overhead, these approaches are inefficient for text-to-video diffusion models.
We present CachedVid, an approximate caching framework for efficient text-to-video diffusion models. First, we introduce a cache compression mechanism that reduces the cache size by 4.1× to 6.7× on average. Second, we identify opportunities to decouple the object and the background in the video and perform decoupled cache look-up, which improves not only the cache hit rate but also the similarity between the cached results and the prompt, saving more computation.