Researcher profile

Jason Li

Jason Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2026arXiv

Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection

Multimodal deepfakes are proliferating on social media and threaten authenticity, information integrity, and digital forensics. Existing benchmarks are constrained by their single-modality scope, simplified manipulations, or unrealistic distributions, which limit their ability to assess real-world robustness. To address these limitations, we present Omni-Fake, a unified omni-dataset for comprehensive multimodal deepfake detection in social-media settings. It comprises Omni-Fake-Set, a large-scale, high-quality dataset with 1M+ samples, and Omni-Fake-OOD, an out-of-distribution benchmark with 200k+ samples intentionally excluded from training to evaluate generalization. Omni-Fake spans four modalities (image, audio, video, and audio-video talking head) and supports a joint detection-localization-explanation protocol. On top of Omni-Fake, we further propose Omni-Fake-R1, a reinforcement-learning-driven multimodal detector that adaptively integrates visual and auditory cues and outputs structured decisions, localization, and natural-language explanations. Extensive experiments show significant gains in detection accuracy, cross-modal generalization, and explainability over state-of-the-art baselines. Project page: https://tianxiao1201.github.io/omni-fake-project-page/

preprint2023arXiv

Near-Linear Time Approximations for Cut Problems via Fair Cuts

We introduce the notion of {\em fair cuts} as an approach to leverage approximate $(s,t)$-mincut (equivalently $(s,t)$-maxflow) algorithms in undirected graphs to obtain near-linear time approximation algorithms for several cut problems. Informally, for any $α\geq 1$, an $α$-fair $(s,t)$-cut is an $(s,t)$-cut such that there exists an $(s,t)$-flow that uses $1/α$ fraction of the capacity of \emph{every} edge in the cut. (So, any $α$-fair cut is also an $α$-approximate mincut, but not vice-versa.) We give an algorithm for $(1+ε)$-fair $(s,t)$-cut in $\tilde{O}(m)$-time, thereby matching the best runtime for $(1+ε)$-approximate $(s,t)$-mincut [Peng, SODA '16]. We then demonstrate the power of this approach by showing that this result almost immediately leads to several applications: - the first nearly-linear time $(1+ε)$-approximation algorithm that computes all-pairs maxflow values (by constructing an approximate Gomory-Hu tree). Prior to our work, such a result was not known even for the special case of Steiner mincut [Dinitz and Vainstein, STOC '94; Cole and Hariharan, STOC '03]; - the first almost-linear-work subpolynomial-depth parallel algorithms for computing $(1+ε)$-approximations for all-pairs maxflow values (again via an approximate Gomory-Hu tree) in unweighted graphs; - the first near-linear time expander decomposition algorithm that works even when the expansion parameter is polynomially small; this subsumes previous incomparable algorithms [Nanongkai and Saranurak, FOCS '17; Wulff-Nilsen, FOCS '17; Saranurak and Wang, SODA '19].

preprint2022arXiv

Adapting TTS models For New Speakers using Transfer Learning

Training neural text-to-speech (TTS) models for a new speaker typically requires several hours of high quality speech data. Prior works on voice cloning attempt to address this challenge by adapting pre-trained multi-speaker TTS models for a new voice, using a few minutes of speech data of the new speaker. However, publicly available large multi-speaker datasets are often noisy, thereby resulting in TTS models that are not suitable for use in products. We address this challenge by proposing transfer-learning guidelines for adapting high quality single-speaker TTS models for a new speaker, using only a few minutes of speech data. We conduct an extensive study using different amounts of data for a new speaker and evaluate the synthesized speech in terms of naturalness and voice/style similarity to the target speaker. We find that fine-tuning a single-speaker TTS model on just 30 minutes of data, can yield comparable performance to a model trained from scratch on more than 27 hours of data for both male and female target speakers.

preprint2022arXiv

Breaking the Cubic Barrier for All-Pairs Max-Flow: Gomory-Hu Tree in Nearly Quadratic Time

In 1961, Gomory and Hu showed that the All-Pairs Max-Flow problem of computing the max-flow between all $n\choose 2$ pairs of vertices in an undirected graph can be solved using only $n-1$ calls to any (single-pair) max-flow algorithm. Even assuming a linear-time max-flow algorithm, this yields a running time of $O(mn)$, which is $O(n^3)$ when $m = Θ(n^2)$. While subsequent work has improved this bound for various special graph classes, no subcubic-time algorithm has been obtained in the last 60 years for general graphs. We break this longstanding barrier by giving an $\tilde{O}(n^{2})$-time algorithm on general, weighted graphs. Combined with a popular complexity assumption, we establish a counter-intuitive separation: all-pairs max-flows are strictly easier to compute than all-pairs shortest-paths. Our algorithm produces a cut-equivalent tree, known as the Gomory-Hu tree, from which the max-flow value for any pair can be retrieved in near-constant time. For unweighted graphs, we refine our techniques further to produce a Gomory-Hu tree in the time of a poly-logarithmic number of calls to any max-flow algorithm. This shows an equivalence between the all-pairs and single-pair max-flow problems, and is optimal up to poly-logarithmic factors. Using the recently announced $m^{1+o(1)}$-time max-flow algorithm (Chen et al., March 2022), our Gomory-Hu tree algorithm for unweighted graphs also runs in $m^{1+o(1)}$-time.

preprint2022arXiv

Deterministic Min-cut in Poly-logarithmic Max-flows

We give a deterministic algorithm for finding the minimum (weight) cut of an undirected graph on $n$ vertices and $m$ edges using $\text{polylog}(n)$ calls to any maximum flow subroutine. Using the current best deterministic maximum flow algorithms, this yields an overall running time of $\tilde O(m \cdot \min(\sqrt{m}, n^{2/3}))$ for weighted graphs, and $m^{4/3+o(1)}$ for unweighted (multi)-graphs. This marks the first improvement for this problem since a running time bound of $\tilde O(mn)$ was established by several papers in the early 1990s. Our global minimum cut algorithm is obtained as a corollary of a minimum Steiner cut algorithm, where a minimum Steiner cut is a minimum (weight) set of edges whose removal disconnects at least one pair of vertices among a designated set of terminal vertices. The running time of our deterministic minimum Steiner cut algorithm matches that of the global minimum cut algorithm stated above. Using randomization, the running time improves to $m^{1+o(1)}$ because of a faster maximum flow subroutine; this improves the best known randomized algorithm for the minimum Steiner cut problem as well. Our main technical contribution is a new tool that we call *isolating cuts*. Given a set of vertices $R$, this entails finding cuts of minimum weight that separate (or isolate) each individual vertex $v\in R$ from the rest of the vertices $R\setminus \{v\}$. Naïvely, this can be done using $|R|$ maximum flow calls, but we show that just $O(\log |R|)$ suffice for finding isolating cuts for any set of vertices $R$. We call this the *isolating cut lemma*.

preprint2022arXiv

Faster Parallel Algorithm for Approximate Shortest Path

We present the first $m\,\text{polylog}(n)$ work, $\text{polylog}(n)$ time algorithm in the PRAM model that computes $(1+ε)$-approximate single-source shortest paths on weighted, undirected graphs. This improves upon the breakthrough result of Cohen~[JACM'00] that achieves $O(m^{1+ε_0})$ work and $\text{polylog}(n)$ time. While most previous approaches, including Cohen's, leveraged the power of hopsets, our algorithm builds upon the recent developments in \emph{continuous optimization}, studying the shortest path problem from the lens of the closely-related \emph{minimum transshipment} problem. To obtain our algorithm, we demonstrate a series of near-linear work, polylogarithmic-time reductions between the problems of approximate shortest path, approximate transshipment, and $\ell_1$-embeddings, and establish a recursive algorithm that cycles through the three problems and reduces the graph size on each cycle. As a consequence, we also obtain faster parallel algorithms for approximate transshipment and $\ell_1$-embeddings with polylogarithmic distortion. The minimum transshipment algorithm in particular improves upon the previous best $m^{1+o(1)}$ work sequential algorithm of Sherman~[SODA'17]. To improve readability, the paper is almost entirely self-contained, save for several staple theorems in algorithms and combinatorics.

preprint2022arXiv

On the Fixed-Parameter Tractability of Capacitated Clustering

We study the complexity of the classic capacitated k-median and k-means problems parameterized by the number of centers, k. These problems are notoriously difficult since the best known approximation bound for high dimensional Euclidean space and general metric space is $Θ(\log k)$ and it remains a major open problem whether a constant factor exists. We show that there exists a $(3+ε)$-approximation algorithm for the capacitated k-median and a $(9+ε)$-approximation algorithm for the capacitated k-means problem in general metric spaces whose running times are $f(ε,k) n^{O(1)}$. For Euclidean inputs of arbitrary dimension, we give a $(1+ε)$-approximation algorithm for both problems with a similar running time. This is a significant improvement over the $(7+ε)$-approximation of Adamczyk et al. for k-median in general metric spaces and the $(69+ε)$-approximation of Xu et al. for Euclidean k-means.

preprint2020arXiv

A Deterministic Algorithm for Balanced Cut with Applications to Dynamic Connectivity, Flows, and Beyond

We consider the classical Minimum Balanced Cut problem: given a graph $G$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets. We present the first {\em deterministic, almost-linear time} approximation algorithm for this problem. Specifically, our algorithm, given an $n$-vertex $m$-edge graph $G$ and any parameter $1\leq r\leq O(\log n)$, computes a $(\log m)^{r^2}$-approximation for Minimum Balanced Cut on $G$, in time $O\left ( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right )$. In particular, we obtain a $(\log m)^{1/ε}$-approximation in time $m^{1+O(1/\sqrtε)}$ for any constant $ε$, and a $(\log m)^{f(m)}$-approximation in time $m^{1+o(1)}$, for any slowly growing function $m$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $G$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $n$-vertex graph is $n^{o(1)}$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to subpolynomial in $n$ factors, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs.

preprint2020arXiv

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Many data we collect today are in tabular form, with rows as records and columns as attributes associated with each record. Understanding the structural relationship in tabular data can greatly facilitate the data science process. Traditionally, much of this relational information is stored in table schema and maintained by its creators, usually domain experts. In this paper, we develop automated methods to uncover deep relationships in a single data table without expert or domain knowledge. Our method can decompose a data table into layers of smaller tables, revealing its deep structure. The key to our approach is a computationally lightweight forward addition algorithm that we developed to recursively extract the functional dependencies between table columns that are scalable to tables with many columns. With our solution, data scientists will be provided with automatically generated, data-driven insights when exploring new data sets.

preprint2020arXiv

Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition

In this paper, we demonstrate the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks. We start with a pre-trained English ASR model and show that transfer learning can be effectively and easily performed on: (1) different English accents, (2) different languages (German, Spanish and Russian) and (3) application-specific domains. Our experiments demonstrate that in all three cases, transfer learning from a good base model has higher accuracy than a model trained from scratch. It is preferred to fine-tune large models than small pre-trained models, even if the dataset for fine-tuning is small. Moreover, transfer learning significantly speeds up convergence for both very small and very large target datasets.

preprint2020arXiv

Cycle Text-To-Image GAN with BERT

We explore novel approaches to the task of image generation from their respective captions, building on state-of-the-art GAN architectures. Particularly, we baseline our models with the Attention-based GANs that learn attention mappings from words to image features. To better capture the features of the descriptions, we then built a novel cyclic design that learns an inverse function to maps the image back to original caption. Additionally, we incorporated recently developed BERT pretrained word embeddings as our initial text featurizer and observe a noticeable improvement in qualitative and quantitative performance compared to the Attention GAN baseline.

preprint2020arXiv

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay. In our experiments on neural networks for image classification, speech recognition, machine translation, and language modeling, it performs on par or better than well tuned SGD with momentum and Adam or AdamW. Additionally, NovoGrad (1) is robust to the choice of learning rate and weight initialization, (2) works well in a large batch setting, and (3) has two times smaller memory footprint than Adam.

preprint2020arXiv

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

With the recent success of the pre-training technique for NLP and image-linguistic tasks, some video-linguistic pre-training works are gradually developed to improve video-text related downstream tasks. However, most of the existing multimodal models are pre-trained for understanding tasks, leading to a pretrain-finetune discrepancy for generation tasks. This paper proposes UniVL: a Unified Video and Language pre-training model for both multimodal understanding and generation. It comprises four components, including two single-modal encoders, a cross encoder, and a decoder with the Transformer backbone. Five objectives, including video-text joint, conditioned masked language model (CMLM), conditioned masked frame model (CMFM), video-text alignment, and language reconstruction, are designed to train each of the components. We further develop two pre-training strategies, stage by stage pre-training (StagedP) and enhanced video representation (EnhancedV), to make the training process of the UniVL more effective. The pre-train is carried out on a sizeable instructional video dataset HowTo100M. Experimental results demonstrate that the UniVL can learn strong video-text representation and achieves state-of-the-art results on five downstream tasks.