Welcome to the Data Systems Group
The Data Systems Group at the University of Waterloo builds innovative, high-impact platforms, systems, and applications for processing, managing, analyzing, and searching the vast collections of data that are integral to modern information societies, colloquially known as "big data" technologies.
Our capabilities span the full spectrum from unstructured text collections to relational data, and everything in between, including semi-structured sources such as time series, log data, graphs, and other data types. We work at multiple layers in the software stack, ranging from storage management and execution platforms to user-facing applications and studies of user behaviour.
Our research tackles all phases of the information lifecycle, from ingest and cleaning to inference and decision support.
News
Professor Xiao Hu and her colleagues receive 2025 SIGMOD Research Highlight Award
Professor Xiao Hu and her collaborators Binyang Dai and Ke Yi from the Hong Kong University of Science and Technology have received a 2025 SIGMOD Research Highlight Award for their paper, Reservoir Sampling over Joins.
Professor Xiao Hu and her colleagues win Distinguished Paper Award at PODS 2025
Professor Xiao Hu and her collaborators have received a Distinguished Paper Award at the 2025 ACM SIGMOD/PODS International Conference on Management of Data.
Their paper, Fast Matrix Multiplication Meets the Submodular Width, introduces a new and unified framework for determining how efficiently any Boolean conjunctive query can be answered using fast matrix multiplication techniques.
Professor Xiao Hu wins Best Paper Award at the ACM Symposium on Principles of Database Systems
Professor Xiao Hu has received a Best Paper Award at the 2025 ACM SIGMOD/PODS International Conference on Management of Data for her research on optimizing join-aggregate queries.
Her paper, Output-Optimal Algorithms for Join-Aggregate Queries, addresses a long-standing open problem in database theory, establishing output-optimal bounds on the efficiency with which such queries can be processed.