Source author record

Jennifer Lee

Jennifer Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Artificial Intelligence Databases Information Retrieval Machine Learning

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

OpenZL: Using Graphs to Compress Smaller and Faster

In the last few decades, research techniques have improved lossless compression ratios by significantly increasing processing time. However, these techniques have not gained popularity in industry because production systems require high throughput and low resource utilization. Instead, real world improvements in compression are increasingly realized by building application-specific compressors which can exploit knowledge about the structure and semantics of the data being compressed. Application-specific compressor systems outperform even the best generic compressors, but these techniques have severe drawbacks -- they are inherently limited in applicability, are hard to develop, and are difficult to maintain and deploy. In this work, we show that these challenges can be overcome with a new compression strategy. We propose the "graph model" of compression, a new theoretical framework for representing compression as a directed acyclic graph of modular codecs. OpenZL implements this framework and compresses data into a self-describing wire format, any configuration of which can be decompressed by a universal decoder. OpenZL's design enables rapid development of application-specific compressors with minimal code. Experimental results demonstrate that OpenZL achieves superior compression ratios and speeds compared to state-of-the-art general-purpose compressors on a variety of real-world datasets. Compared to ratio-focused deep-learning compressors, OpenZL is competitive on ratio while being many orders of magnitude faster. Internal deployments at Meta have also shown consistent improvements in size and/or speed, with development timelines reduced from months to days. OpenZL thus represents a significant advance in practical, scalable, and maintainable data compression for modern data-intensive applications.

preprint2021arXiv

Conversational Neuro-Symbolic Commonsense Reasoning

In order for conversational AI systems to hold more natural and broad-ranging conversations, they will require much more commonsense, including the ability to identify unstated presumptions of their conversational partners. For example, in the command "If it snows at night then wake me up early because I don't want to be late for work" the speaker relies on commonsense reasoning of the listener to infer the implicit presumption that they wish to be woken only if it snows enough to cause traffic slowdowns. We consider here the problem of understanding such imprecisely stated natural language commands given in the form of "if-(state), then-(action), because-(goal)" statements. More precisely, we consider the problem of identifying the unstated presumptions of the speaker that allow the requested action to achieve the desired goal from the given state (perhaps elaborated by making the implicit presumptions explicit). We release a benchmark data set for this task, collected from humans and annotated with commonsense presumptions. We present a neuro-symbolic theorem prover that extracts multi-hop reasoning chains, and apply it to this problem. Furthermore, to accommodate the reality that current AI commonsense systems lack full coverage, we also present an interactive conversational framework built on our neuro-symbolic system, that conversationally evokes commonsense knowledge from humans to complete its reasoning chains.

preprint2020arXiv

Rationalizing Medical Relation Prediction from Corpus-level Statistics

Nowadays, the interpretability of machine learning models is becoming increasingly important, especially in the medical domain. Aiming to shed some light on how to rationalize medical relation prediction, we present a new interpretable framework inspired by existing theories on how human memory works, e.g., theories of recall and recognition. Given the corpus-level statistics, i.e., a global co-occurrence graph of a clinical text corpus, to predict the relations between two entities, we first recall rich contexts associated with the target entities, and then recognize relational interactions between these contexts to form model rationales, which will contribute to the final prediction. We conduct experiments on a real-world public clinical dataset and show that our framework can not only achieve competitive predictive performance against a comprehensive list of neural baseline models, but also present rationales to justify its prediction. We further collaborate with medical experts deeply to verify the usefulness of our model rationales for clinical decision making.