Some of our current projects
This project aims to make petabytes of historical Internet content accessible to scholars and others interested in researching the recent past. We are developing web archive search and data analysis tools that enable scholars, librarians, and archivists to access, share, and investigate recent history since the early days of the World Wide Web.
This project is an in-memory graph database that we are building from scratch to evaluate both one-time and continuous queries. We study the fundamental components of graph databases, such as storage, query optimization, query processing, and triggers, and build each component ourselves.
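The difference between a one-time query and a continuous query can be illustrated with a small sketch. This is not the project's implementation: the `TinyGraph` class, its method names, and the two-hop path query are hypothetical and chosen only for illustration. A one-time query scans the current graph once, while a continuous query is registered in advance and notified of new matches as edges arrive.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory directed graph with one continuous query."""

    def __init__(self):
        self.out_edges = defaultdict(set)  # u -> {v, ...}
        self.in_edges = defaultdict(set)   # v -> {u, ...}
        self.subscribers = []              # callbacks for new two-hop paths

    def two_hop_paths(self):
        """One-time query: scan the whole graph for paths a -> b -> c."""
        return {
            (a, b, c)
            for a, targets in self.out_edges.items()
            for b in targets
            for c in self.out_edges.get(b, ())
        }

    def subscribe(self, callback):
        """Register a continuous query over two-hop paths."""
        self.subscribers.append(callback)

    def add_edge(self, u, v):
        """Insert edge u -> v and report only the paths it newly creates."""
        if v in self.out_edges[u]:
            return  # duplicate edge, nothing new to report
        self.out_edges[u].add(v)
        self.in_edges[v].add(u)
        # New paths either end with the new edge or start with it.
        new_paths = {(a, u, v) for a in self.in_edges[u]}
        new_paths |= {(u, v, c) for c in self.out_edges[v]}
        for callback in self.subscribers:
            for path in sorted(new_paths):
                callback(path)


graph = TinyGraph()
seen = []
graph.subscribe(seen.append)
for edge in [("a", "b"), ("b", "c"), ("c", "a")]:
    graph.add_edge(*edge)
```

In this sketch the continuous query touches only the neighbours of the inserted edge, whereas `two_hop_paths` recomputes every match from scratch.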
gStore is an RDF graph database system that employs a native graph representation. It combines a subgraph-matching-based query strategy with a series of query optimization techniques and a structure-aware index to build an efficient graph-native SPARQL query engine. It supports SPARQL 1.1, the standard RDF query language, and can be deployed on a single machine or in a scale-out setting.
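To make the idea of answering a SPARQL query by subgraph matching concrete, here is a minimal sketch. It is not gStore's algorithm or API: the triples, the `evaluate` function, and the naive backtracking search are assumptions made for illustration, with no index or query optimization. The query's triple patterns form a small graph with variables, and each answer is a binding of those variables that maps the query graph onto the data graph.

```python
# Toy RDF graph as (subject, predicate, object) triples; the data is made up.
TRIPLES = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
}


def is_var(term):
    """Variables are written SPARQL-style, with a leading question mark."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Return an extended copy of `binding` if `pattern` fits `triple`, else None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check that an earlier binding agrees.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns, triples, binding=None):
    """Backtracking search that yields every binding satisfying all patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from evaluate(rest, triples, extended)


# Roughly: SELECT * WHERE { ?x knows ?y . ?y knows ?z . ?z worksAt ?w }
QUERY = [("?x", "knows", "?y"), ("?y", "knows", "?z"), ("?z", "worksAt", "?w")]
results = list(evaluate(QUERY, TRIPLES))
```

A production engine would replace the linear scan over `triples` with index lookups and choose a pattern order that prunes the search early, which is where query optimization and structure-aware indexing come in.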
Hi-CAL™ is an open-source project that facilitates the efficient identification of all or nearly all relevant documents in a corpus. Hi-CAL™ allows users to judge documents as fast as possible with no perceptible interface lag.
HoloClean is a statistical inference engine to impute, clean, and enrich data. As a weakly supervised machine learning system, HoloClean leverages available quality rules, value correlations, reference data, and other signals to build a probabilistic model that captures the data generation process, and uses the model in a variety of data curation tasks.
Kùzu is an in-process property graph database management system (GDBMS) built for graph data science workloads. Kùzu is optimized for query speed and scalability, and aims to perform well on complex, join-heavy analytical workloads over very large graph databases. We are building Kùzu as a feature-rich, usable GDBMS under a permissive license. In our research, we design, implement, and study each component of the system.
This project is a Streaming Graph Management System that addresses the processing of OLTP and OLAP queries over very large graphs with high streaming rates. Such graphs are increasingly deployed to capture relationships between entities (e.g., customers and catalog items in an online retail environment), both for transactional processing and for analytics (e.g., recommendation systems).
Distributed ledgers such as blockchains are used to store transactions in a secure and verifiable manner without the need for a trusted third party. In the Sirius project, we are working on technologies to make blockchains more scalable, and we are investigating novel applications of high-velocity blockchains, such as transactive energy and clean transportation.
This benchmark is designed to measure how an RDF data management system performs across a wide spectrum of SPARQL queries with varying structural characteristics and selectivity classes. It is a micro-benchmark that stress-tests systems with a wide variety of queries over data sets of varying sizes.