Seminar /computer-science/ en PhD Seminar • Data Systems • Semantic Table Discovery in Model Lakes: A Benchmark /computer-science/events/phd-seminar-data-systems-semantic-table-discovery-in-model-lakes-benchmark <span class="field field--name-title field--type-string field--label-hidden">PhD Seminar • Data Systems • Semantic Table Discovery in Model Lakes: A Benchmark</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Thu, 07/10/2025 - 18:07</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This PhD seminar will take place in DC 3301.</span></span></h2> <p><span><span><strong>Zhengyuan Dong, PhD candidate</strong><br /><em>David R. Cheriton School of Computer Science</em></span></span></p> <p><span><span><strong>Supervisor</strong>: Professor Renée J. Miller</span></span></p> <p><span><span>Model Lakes are emerging large-scale repositories of machine learning artifacts. Although they greatly facilitate model sharing, discovery still relies on keyword or full-text search over textual metadata, which overlooks the rich, structured information (especially performance and configuration tables) embedded in model reports.</span></span></p> <p><span><span>In this work, we advance model discovery by leveraging table-discovery techniques within Model Lakes. 
We first formalize a novel ground-truth methodology for model relatedness, based on three complementary signals: explicit references in model cards, citation links among associated papers, and shared training datasets. We then build and publicly release a benchmark over 100K Hugging Face models, extracting every table from model cards, GitHub READMEs, arXiv preprints, and BibTeX entries. Compared to standard data-lake tables, our tables are smaller but exhibit far denser inter-table relationships, reflecting the tight coupling of model evolution. To retrieve related models, we adapt a canonical Data Lake task, unionable table search, and compare against dense and sparse IR baselines.</span></span></p> <p><span><span>Our union-based semantic search achieves 54.8% P@1 overall (54.7% on paper-citation ground truth, 30.8% on model-card inheritance, 30.2% on shared-dataset signals), while simple metadata retrieval peaks at 36.8% P@1. Denser citation-graph edges boost precision to 74.8%, and a header-value concatenation augmentation raises overall P@1 to 60.3%. To our knowledge, this is the first empirical study applying Data Lake management principles to Model Discovery using large-scale real-world machine learning artifacts. 
By demonstrating that structured table information uncovers deep model relationships, we lay the groundwork for more accurate retrieval, systematic comparison, and seamless integration of models within Model Lakes.</span></span></p> </div> </div> </div> </div> </section> Thu, 10 Jul 2025 22:07:57 +0000 Joe Petrik 3990 at /computer-science Seminar • Algorithms and Complexity • Closure Results for Polynomial Factorization and Some Applications /computer-science/events/seminar-algorithms-and-complexity-closure-results-for-polynomial-factorization-and-some-applications <span class="field field--name-title field--type-string field--label-hidden">Seminar • Algorithms and Complexity • Closure Results for Polynomial Factorization and Some Applications</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Thu, 07/10/2025 - 14:59</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This seminar will take place in DC 1304 and online.</span></span></h2> <p><span><span><strong>Shubhangi Saraf, Associate Professor</strong><br /><em>Departments of Mathematics and Computer Science, University of Toronto</em></span></span></p> <p><span><span>I will talk about a recent result showing that algebraic formulas and constant-depth circuits are closed under taking factors. 
In other words, the complexity of factors of polynomials computable by algebraic formulas or constant depth algebraic circuits is not much more than the complexity of the original polynomial itself.</span></span></p> <p><span><span>This result turns out to be an elementary consequence of a fundamental and surprising result of Furstenberg from the 1960s, which gives a non-iterative description of the power series roots of a bivariate polynomial. Combined with standard structural ideas in algebraic complexity, we observe that this theorem yields the desired closure results. We will see applications of this result to deterministic algorithms for factoring, hardness/randomness tradeoffs, as well as GCD computation of polynomials.</span></span></p> <p><em><span><span>This talk is based on joint works with Somnath Bhattacharjee, Mrinal Kumar, Shanthanu Rai, Varun Ramanathan and Ramprasad Saptharishi.</span></span></em></p> <hr /><p><span><span>To attend this seminar in person, please go to DC 1304. 
You can also <a href="https://uwaterloo.zoom.us/j/93695940795">attend virtually on Zoom</a>.</span></span></p> </div> </div> </div> </div> </section> Thu, 10 Jul 2025 18:59:16 +0000 Joe Petrik 3989 at /computer-science PhD Seminar • Machine Learning | Information Retrieval • Modern IR Evaluation in the Retrieval Augmented Generation (RAG) Era /computer-science/events/phd-seminar-ml-ir-modern-ir-evaluation-in-the-retrieval-augmented-generation-era <span class="field field--name-title field--type-string field--label-hidden">PhD Seminar • Machine Learning | Information Retrieval • Modern IR Evaluation in the Retrieval Augmented Generation (RAG) Era </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Tue, 07/08/2025 - 13:42</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This PhD seminar will take place in DC 3301.</span></span></h2> <p><span><span><strong>Nandan Thakur, PhD candidate</strong></span></span><br /><em><span><span>David R. Cheriton School of Computer Science</span></span></em></p> <p><span><span><strong>Supervisor</strong>: Professor Jimmy Lin</span></span></p> <p><span><span>Traditional IR evaluation (e.g., TREC, Cranfield paradigm) constructs test collections that use fixed corpora and pool relevance judgments, a practice that minimally captures the challenges of RAG applications. 
This talk begins with the limitations of prevalent IR benchmarks, which suffer from stale data, incomplete labels, or simplistic queries. In particular, we motivate why retrieval evaluation must evolve and why modern systems call for a shift in evaluation metrics. We survey FreshStack, a holistic benchmark that addresses these gaps by constructing test collections from recent StackOverflow Q&As and GitHub documents to reflect real-world programming questions, and we provide insight into the diversity-focused metrics used in IR evaluation. The goal is to give practitioners insight into the limits of traditional IR evaluation and to guide them toward more realistic, robust evaluation practices for IR systems in modern RAG applications.</span></span></p> </div> </div> </div> </div> </section> Tue, 08 Jul 2025 17:42:14 +0000 Joe Petrik 3986 at /computer-science Seminar • Algorithms and Complexity • Synthesis and Arithmetic of Quantum Circuits /computer-science/events/seminar-algorithms-and-complexity-synthesis-and-arithmetic-of-quantum-circuits <span class="field field--name-title field--type-string field--label-hidden">Seminar • Algorithms and Complexity • Synthesis and Arithmetic of Quantum Circuits </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Fri, 06/27/2025 - 13:19</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This 
seminar will take place in DC 1304 and online.</span></span></h2> <p><span><span><strong>Amolak Ratan Kalra, PhD candidate</strong><br /><em>Institute for Quantum Computing, University of Waterloo</em></span></span></p> <p><span><span>Efficient decomposition of a unitary operator U using words from a universal gate set G is a fundamental problem in quantum computing. The process by which this is achieved is called circuit synthesis. This problem arises naturally in the context of quantum circuit compilation.</span></span></p> <p><span><span>In this talk, I will introduce this problem and explain how one can use tools from number theory to solve it. I will then explain some of our more recent results that build on this connection.</span></span></p> <hr /><p><span><span>To attend this seminar in person, please go to DC 1304. You can also <a href="https://uwaterloo.zoom.us/j/92702227809">attend virtually on Zoom</a>.</span></span></p> </div> </div> </div> </div> </section> Fri, 27 Jun 2025 17:19:50 +0000 Joe Petrik 3980 at /computer-science PhD Seminar • Systems and Networking • Dynamic SLA-aware Network Slice Monitoring with Programmable Data Planes /computer-science/events/phd-seminar-syn-dynamic-sla-aware-network-slice-monitoring-with-programmable-data-planes <span class="field field--name-title field--type-string field--label-hidden">PhD Seminar • Systems and Networking • Dynamic SLA-aware Network Slice Monitoring with Programmable Data Planes</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Mon, 06/23/2025 - 21:06</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region 
layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This PhD seminar will take place in DC 1304.</span></span></h2> <p><span><span><strong>Niloy Saha, PhD candidate</strong><br /><em>David R. Cheriton School of Computer Science</em></span></span></p> <p><span><span><strong>Supervisor</strong>: Professor Raouf Boutaba</span></span></p> <p><span><span>Next generation networks increasingly rely on network slices — logical networks tailored to specific application requirements, each with distinct Service-Level Agreements (SLAs). Ensuring compliance with these SLAs requires continuous, real-time monitoring of end-to-end performance metrics for each slice, within a limited telemetry budget. However, existing monitoring solutions based on sketches or probabilistic sampling lack end-to-end visibility and treat all traffic uniformly. This leads to inaccurate monitoring of critical slices in order to stay within budget. We present SliceScope, a slice-aware telemetry system that dynamically allocates monitoring resources across a diverse set of slices, based on SLA criticality and evolving network conditions.</span></span></p> <p><span><span>SliceScope combines: (1) a data-plane primitive that enables per-packet end-to-end visibility for each slice with tunable accuracy-overhead trade-off, and (2) a control strategy that adjusts this trade-off per-slice to allocate limited telemetry budget where it matters most. 
Our evaluation results, conducted on a testbed with programmable switches and in large-scale simulations with a mixture of different slice types, demonstrate that SliceScope provides real-time, fine-grained monitoring of per-slice metrics, and tracks critical slices up to 4× more accurately compared to static or SLA-agnostic baselines.</span></span></p> </div> </div> </div> </div> </section> Tue, 24 Jun 2025 01:06:33 +0000 Joe Petrik 3974 at /computer-science PhD Seminar • Machine Learning | Deep Learning • Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization /computer-science/events/phd-seminar-ml-dl-continuation-kd-improved-knowledge-distillation-through-the-lens-of-continuation-optimization <span class="field field--name-title field--type-string field--label-hidden">PhD Seminar • Machine Learning | Deep Learning • Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Thu, 06/12/2025 - 21:19</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This PhD seminar will take place online.</span></span></h2> <p><span><span><strong>Aref Jafari, PhD candidate</strong><br /><em>David R. 
Cheriton School of Computer Science</em></span></span></p> <p><span><span><strong>Supervisor</strong>: Professor Ali Ghodsi</span></span></p> <p><span><span>Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model’s (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher’s output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features.</span></span></p> <p><span><span>We propose a new KD method that addresses these problems and facilitates the training compared to previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with the smoothed version of this objective and making it more complex as the training proceeds. 
Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).</span></span></p> <hr /><p><span><span><a href="https://teams.microsoft.com/l/meetup-join/19%3ameeting_MTVkY2NmZmItYWJiNS00YjNlLThmNTEtMWJmMzQ5ZDg4ZTFj%40thread.v2/0?context=%7b%22Tid%22%3a%22723a5a87-f39a-4a22-9247-3fc240c01396%22%2c%22Oid%22%3a%2264f62c73-88b9-4c19-8211-4276afd5e4ee%22%7d ">Attend this PhD seminar virtually on MS Teams</a>.</span></span></p> </div> </div> </div> </div> </section> Fri, 13 Jun 2025 01:19:23 +0000 Joe Petrik 3963 at /computer-science PhD Seminar • Artificial Intelligence • Towards Cost-Effective Reward Guided Text Generation /computer-science/events/phd-seminar-artificial-intelligence-towards-cost-effective-reward-guided-text-generation <span class="field field--name-title field--type-string field--label-hidden">PhD Seminar • Artificial Intelligence • Towards Cost-Effective Reward Guided Text Generation</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Tue, 06/10/2025 - 10:51</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This PhD seminar will take place in DC 2584 and online.</span></span></h2> <p><span><span><strong>Ahmad Rashid, PhD candidate</strong><br /><em>David R. 
Cheriton School of Computer Science</em></span></span></p> <p><span><span><strong>Supervisor</strong>: Professor Pascal Poupart</span></span></p> <p><span><span>Reward-guided text generation (RGTG) has emerged as a viable alternative to offline reinforcement learning from human feedback (RLHF). RGTG methods can align baseline language models to human preferences without further training as in standard RLHF methods. However, they rely on a reward model to score each candidate token generated by the language model at inference, incurring significant test-time overhead. Additionally, the reward model is usually only trained to score full sequences, which can lead to sub-optimal choices for partial sequences.</span></span></p> <p><span><span>In this work, we present a novel reward model architecture that is trained, using a Bradley-Terry loss, to prefer the optimal expansion of a sequence with just a single call to the reward model at each step of the generation process. That is, a score for all possible candidate tokens is generated simultaneously, leading to efficient inference. We theoretically analyze various RGTG reward models and demonstrate that prior techniques prefer sub-optimal sequences compared to our method during inference. Empirically, our reward model leads to significantly faster inference than other RGTG methods. It requires fewer calls to the reward model and performs competitively compared to previous RGTG and offline RLHF methods.</span></span></p> <hr /><p><span><span>To attend this PhD seminar in person, please go to DC 2584. 
You can also <a href="https://vectorinstitute.zoom.us/j/88617927392">attend virtually on Zoom</a>.</span></span></p> </div> </div> </div> </div> </section> Tue, 10 Jun 2025 14:51:54 +0000 Joe Petrik 3960 at /computer-science PhD Seminar • Algorithms and Complexity • An Improved Fully Dynamic Algorithm for Counting 4-Cycles in General Graphs /computer-science/events/phd-seminar-algorithms-and-complexity-improved-fully-dynamic-algorithm-for-counting-4-cycles-in-general-graphs <span class="field field--name-title field--type-string field--label-hidden">PhD Seminar • Algorithms and Complexity • An Improved Fully Dynamic Algorithm for Counting 4-Cycles in General Graphs</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Fri, 06/06/2025 - 14:22</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This PhD seminar will take place in DC 1304 and online.</span></span></h2> <p><span><span><strong>Vihan Shah, PhD candidate</strong><br /><em>David R. Cheriton School of Computer Science</em></span></span></p> <p><span><span><strong>Supervisor</strong>: Professor Sepehr Assadi</span></span></p> <p><span><span>We study subgraph counting over fully dynamic graphs, which undergo edge insertions and deletions. Maintaining the number of triangles in fully dynamic graphs is very well studied and has an upper bound of O(m^{1/2}) for the update time [KNN+20]. 
There is also a conditional lower bound of approximately Omega(m^{1/2}) for the update time [HKNS15] under the OMv conjecture implying that Theta(m^{1/2}) is the “right answer” for the update time of counting triangles. More recently, [HHH22] studied the problem of maintaining the number of 4-cycles in fully dynamic graphs and designed an algorithm with O(m^{2/3}) update time which is a natural generalization of the approach for counting triangles. Thus, it seems natural that O(m^{2/3}) might be the correct answer for the complexity of the update time for counting 4-cycles.</span></span></p> <p><span><span>In this work, we present an improved algorithm for maintaining the number of 4-cycles in fully dynamic graphs. Our algorithm achieves a worst-case update time of O(m^{2/3-eps}) for some constant eps>0. Our approach crucially uses fast matrix multiplication and leverages recent developments therein to get an improved runtime. Using the current best value of the matrix multiplication exponent omega=2.371339 we get eps=0.009811 and if we assume the best possible exponent i.e., omega=2 then we get eps=1/24. The lower bound for the update time is Omega(m^{1/2}), so there is still a big gap between the best-known upper and lower bounds. The key message of our paper is demonstrating that O(m^{2/3}) is not the correct answer for the complexity of the update time.</span></span></p> <hr /><p><span><span>To attend this PhD seminar in person, please go to DC 1304. 
You can also <a href="https://uwaterloo.zoom.us/j/99646595174">attend virtually on Zoom</a>.</span></span></p> </div> </div> </div> </div> </section> Fri, 06 Jun 2025 18:22:12 +0000 Joe Petrik 3957 at /computer-science Seminar • Algorithms and Complexity • Linear Hashing Is Optimal /computer-science/events/seminar-algorithms-and-complexity-linear-hashing-is-optimal <span class="field field--name-title field--type-string field--label-hidden">Seminar • Algorithms and Complexity • Linear Hashing Is Optimal</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Fri, 06/06/2025 - 09:14</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This seminar will take place in DC 1304 and online.</span></span></h2> <p><span><span><strong>Vinayak Kumar, PhD student</strong><br /><em>Computer Science Theory Group, UT Austin</em></span></span></p> <p><span><span>When n balls are independently and uniformly tossed into n bins, the expected max-load — the number of balls in the heaviest bin — is O(log n / log log n). This classical result plays a central role in the analysis of hashing with chaining and load balancing. 
However, implementing a truly random hash function is often impractical due to its high computational and storage costs.</span></span></p> <p><span><span>In this talk, I will present a recent result showing that hashing n balls into n bins via a random matrix over F_2 achieves the same expected max-load of O(log n / log log n). This simple and efficient hash family matches the performance of a fully random function and resolves an open question posed by Alon, Dietzfelbinger, Miltersen, Petrank, and Tardos.</span></span></p> <p><em><span><span>Based on joint work with Michael Jaber and David Zuckerman.</span></span></em></p> <hr /><p><span><span>To attend this seminar in person, please go to DC 1304. You can also <a href="https://uwaterloo.zoom.us/j/99906110877">attend virtually on Zoom</a>.</span></span></p> </div> </div> </div> </div> </section> Fri, 06 Jun 2025 13:14:14 +0000 Joe Petrik 3956 at /computer-science PhD Seminar • Data Systems • UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction /computer-science/events/phd-seminar-data-systems-ui-vision-desktop-centric-gui-benchmark-for-visual-perception-and-interaction <span class="field field--name-title field--type-string field--label-hidden">PhD Seminar • Data Systems • UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/computer-science/users/jpetrik" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Joe Petrik</span></span> <span class="field field--name-created field--type-created field--label-hidden">Tue, 06/03/2025 - 10:33</span> <section class="uw-contained-width uw-section-spacing--default uw-section-separator--none uw-column-separator--none layout layout--uw-1-col"><div class="layout__region layout__region--first"> <div class="uw-text-align--left block block-layout-builder block-inline-blockuw-cbl-copy-text"> <div 
class="uw-copy-text"> <div class="uw-copy-text__wrapper "> <h2><span><span>Please note: This PhD seminar will take place in DC 3301.</span></span></h2> <p><span><span><strong>Xiangru Jian, PhD candidate</strong><br /><em>David R. Cheriton School of Computer Science</em></span></span></p> <p><span><span><strong>Supervisor</strong>: Professor Tamer Özsu</span></span></p> <p><span><span>Autonomous agents that navigate Graphical User Interfaces (GUIs) to automate tasks like document editing and file management can greatly enhance computer workflows. While existing research focuses on online settings, desktop environments, critical for many professional and everyday tasks, remain underexplored due to data collection challenges and licensing issues.</span></span></p> <p><span><span>We introduce UI-Vision, the first comprehensive, license-permissive benchmark for offline, fine-grained evaluation of computer use agents in real-world desktop environments. Unlike online benchmarks, UI-Vision provides: (i) dense, high-quality annotations of human demonstrations, including bounding boxes, UI labels, and action trajectories (clicks, drags, and keyboard inputs) across 83 software applications, and (ii) three fine-to-coarse grained tasks — Element Grounding, Layout Grounding, and Action Prediction — with well-defined metrics to rigorously evaluate agents’ performance in desktop environments. Our evaluation reveals critical limitations in state-of-the-art models like UI-TARS-72B, including issues with understanding professional software, spatial reasoning, and complex actions like drag-and-drop. These findings highlight the challenges in developing fully autonomous computer-use agents. With UI-Vision, we aim to advance the development of more capable agents for real-world desktop tasks.</span></span></p> </div> </div> </div> </div> </section> Tue, 03 Jun 2025 14:33:16 +0000 Joe Petrik 3955 at /computer-science