Source author record

Jason Li

Jason Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms astro-ph.HE Computer Vision eess.AS Machine Learning Computation and Language astro-ph.EP astro-ph.GA Databases Distributed, Parallel, and Cluster Computing eess.IV math.QA math.RA Sound

Catalog footprint

What is connected

19works

14topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection

Multimodal deepfakes are proliferating on social media and threaten authenticity, information integrity, and digital forensics. Existing benchmarks are constrained by their single-modality scope, simplified manipulations, or unrealistic distributions, which limit their ability to assess real-world robustness. To address these limitations, we present Omni-Fake, a unified omni-dataset for comprehensive multimodal deepfake detection in social-media settings. It comprises Omni-Fake-Set, a large-scale, high-quality dataset with 1M+ samples, and Omni-Fake-OOD, an out-of-distribution benchmark with 200k+ samples intentionally excluded from training to evaluate generalization. Omni-Fake spans four modalities (image, audio, video, and audio-video talking head) and supports a joint detection-localization-explanation protocol. On top of Omni-Fake, we further propose Omni-Fake-R1, a reinforcement-learning-driven multimodal detector that adaptively integrates visual and auditory cues and outputs structured decisions, localization, and natural-language explanations. Extensive experiments show significant gains in detection accuracy, cross-modal generalization, and explainability over state-of-the-art baselines. Project page: https://tianxiao1201.github.io/omni-fake-project-page/

preprint2023arXiv

Near-Linear Time Approximations for Cut Problems via Fair Cuts

We introduce the notion of {\em fair cuts} as an approach to leverage approximate $(s,t)$-mincut (equivalently $(s,t)$-maxflow) algorithms in undirected graphs to obtain near-linear time approximation algorithms for several cut problems. Informally, for any $α\geq 1$, an $α$-fair $(s,t)$-cut is an $(s,t)$-cut such that there exists an $(s,t)$-flow that uses $1/α$ fraction of the capacity of \emph{every} edge in the cut. (So, any $α$-fair cut is also an $α$-approximate mincut, but not vice-versa.) We give an algorithm for $(1+ε)$-fair $(s,t)$-cut in $\tilde{O}(m)$-time, thereby matching the best runtime for $(1+ε)$-approximate $(s,t)$-mincut [Peng, SODA '16]. We then demonstrate the power of this approach by showing that this result almost immediately leads to several applications: - the first nearly-linear time $(1+ε)$-approximation algorithm that computes all-pairs maxflow values (by constructing an approximate Gomory-Hu tree). Prior to our work, such a result was not known even for the special case of Steiner mincut [Dinitz and Vainstein, STOC '94; Cole and Hariharan, STOC '03]; - the first almost-linear-work subpolynomial-depth parallel algorithms for computing $(1+ε)$-approximations for all-pairs maxflow values (again via an approximate Gomory-Hu tree) in unweighted graphs; - the first near-linear time expander decomposition algorithm that works even when the expansion parameter is polynomially small; this subsumes previous incomparable algorithms [Nanongkai and Saranurak, FOCS '17; Wulff-Nilsen, FOCS '17; Saranurak and Wang, SODA '19].

preprint2022arXiv

Adapting TTS models For New Speakers using Transfer Learning

Training neural text-to-speech (TTS) models for a new speaker typically requires several hours of high quality speech data. Prior works on voice cloning attempt to address this challenge by adapting pre-trained multi-speaker TTS models for a new voice, using a few minutes of speech data of the new speaker. However, publicly available large multi-speaker datasets are often noisy, thereby resulting in TTS models that are not suitable for use in products. We address this challenge by proposing transfer-learning guidelines for adapting high quality single-speaker TTS models for a new speaker, using only a few minutes of speech data. We conduct an extensive study using different amounts of data for a new speaker and evaluate the synthesized speech in terms of naturalness and voice/style similarity to the target speaker. We find that fine-tuning a single-speaker TTS model on just 30 minutes of data, can yield comparable performance to a model trained from scratch on more than 27 hours of data for both male and female target speakers.

preprint2022arXiv

Breaking the Cubic Barrier for All-Pairs Max-Flow: Gomory-Hu Tree in Nearly Quadratic Time

In 1961, Gomory and Hu showed that the All-Pairs Max-Flow problem of computing the max-flow between all $n\choose 2$ pairs of vertices in an undirected graph can be solved using only $n-1$ calls to any (single-pair) max-flow algorithm. Even assuming a linear-time max-flow algorithm, this yields a running time of $O(mn)$, which is $O(n^3)$ when $m = Θ(n^2)$. While subsequent work has improved this bound for various special graph classes, no subcubic-time algorithm has been obtained in the last 60 years for general graphs. We break this longstanding barrier by giving an $\tilde{O}(n^{2})$-time algorithm on general, weighted graphs. Combined with a popular complexity assumption, we establish a counter-intuitive separation: all-pairs max-flows are strictly easier to compute than all-pairs shortest-paths. Our algorithm produces a cut-equivalent tree, known as the Gomory-Hu tree, from which the max-flow value for any pair can be retrieved in near-constant time. For unweighted graphs, we refine our techniques further to produce a Gomory-Hu tree in the time of a poly-logarithmic number of calls to any max-flow algorithm. This shows an equivalence between the all-pairs and single-pair max-flow problems, and is optimal up to poly-logarithmic factors. Using the recently announced $m^{1+o(1)}$-time max-flow algorithm (Chen et al., March 2022), our Gomory-Hu tree algorithm for unweighted graphs also runs in $m^{1+o(1)}$-time.

preprint2022arXiv

Deterministic Min-cut in Poly-logarithmic Max-flows

We give a deterministic algorithm for finding the minimum (weight) cut of an undirected graph on $n$ vertices and $m$ edges using $\text{polylog}(n)$ calls to any maximum flow subroutine. Using the current best deterministic maximum flow algorithms, this yields an overall running time of $\tilde O(m \cdot \min(\sqrt{m}, n^{2/3}))$ for weighted graphs, and $m^{4/3+o(1)}$ for unweighted (multi)-graphs. This marks the first improvement for this problem since a running time bound of $\tilde O(mn)$ was established by several papers in the early 1990s. Our global minimum cut algorithm is obtained as a corollary of a minimum Steiner cut algorithm, where a minimum Steiner cut is a minimum (weight) set of edges whose removal disconnects at least one pair of vertices among a designated set of terminal vertices. The running time of our deterministic minimum Steiner cut algorithm matches that of the global minimum cut algorithm stated above. Using randomization, the running time improves to $m^{1+o(1)}$ because of a faster maximum flow subroutine; this improves the best known randomized algorithm for the minimum Steiner cut problem as well. Our main technical contribution is a new tool that we call *isolating cuts*. Given a set of vertices $R$, this entails finding cuts of minimum weight that separate (or isolate) each individual vertex $v\in R$ from the rest of the vertices $R\setminus \{v\}$. Naïvely, this can be done using $|R|$ maximum flow calls, but we show that just $O(\log |R|)$ suffice for finding isolating cuts for any set of vertices $R$. We call this the *isolating cut lemma*.

preprint2022arXiv

Edge Connectivity Augmentation in Near-Linear Time

We give an $\tilde{O}(m)$-time algorithm for the edge connectivity augmentation problem and the closely related edge splitting-off problem. This is optimal up to lower order terms and closes the long line of work on these problems.

preprint2022arXiv

Faster Parallel Algorithm for Approximate Shortest Path

We present the first $m\,\text{polylog}(n)$ work, $\text{polylog}(n)$ time algorithm in the PRAM model that computes $(1+ε)$-approximate single-source shortest paths on weighted, undirected graphs. This improves upon the breakthrough result of Cohen~[JACM'00] that achieves $O(m^{1+ε_0})$ work and $\text{polylog}(n)$ time. While most previous approaches, including Cohen's, leveraged the power of hopsets, our algorithm builds upon the recent developments in \emph{continuous optimization}, studying the shortest path problem from the lens of the closely-related \emph{minimum transshipment} problem. To obtain our algorithm, we demonstrate a series of near-linear work, polylogarithmic-time reductions between the problems of approximate shortest path, approximate transshipment, and $\ell_1$-embeddings, and establish a recursive algorithm that cycles through the three problems and reduces the graph size on each cycle. As a consequence, we also obtain faster parallel algorithms for approximate transshipment and $\ell_1$-embeddings with polylogarithmic distortion. The minimum transshipment algorithm in particular improves upon the previous best $m^{1+o(1)}$ work sequential algorithm of Sherman~[SODA'17]. To improve readability, the paper is almost entirely self-contained, save for several staple theorems in algorithms and combinatorics.

preprint2022arXiv

On the Fixed-Parameter Tractability of Capacitated Clustering

We study the complexity of the classic capacitated k-median and k-means problems parameterized by the number of centers, k. These problems are notoriously difficult since the best known approximation bound for high dimensional Euclidean space and general metric space is $Θ(\log k)$ and it remains a major open problem whether a constant factor exists. We show that there exists a $(3+ε)$-approximation algorithm for the capacitated k-median and a $(9+ε)$-approximation algorithm for the capacitated k-means problem in general metric spaces whose running times are $f(ε,k) n^{O(1)}$. For Euclidean inputs of arbitrary dimension, we give a $(1+ε)$-approximation algorithm for both problems with a similar running time. This is a significant improvement over the $(7+ε)$-approximation of Adamczyk et al. for k-median in general metric spaces and the $(69+ε)$-approximation of Xu et al. for Euclidean k-means.

preprint2020arXiv

A Deterministic Algorithm for Balanced Cut with Applications to Dynamic Connectivity, Flows, and Beyond

We consider the classical Minimum Balanced Cut problem: given a graph $G$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets. We present the first {\em deterministic, almost-linear time} approximation algorithm for this problem. Specifically, our algorithm, given an $n$-vertex $m$-edge graph $G$ and any parameter $1\leq r\leq O(\log n)$, computes a $(\log m)^{r^2}$-approximation for Minimum Balanced Cut on $G$, in time $O\left ( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right )$. In particular, we obtain a $(\log m)^{1/ε}$-approximation in time $m^{1+O(1/\sqrtε)}$ for any constant $ε$, and a $(\log m)^{f(m)}$-approximation in time $m^{1+o(1)}$, for any slowly growing function $m$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $G$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $n$-vertex graph is $n^{o(1)}$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to subpolynomial in $n$ factors, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs.

preprint2020arXiv

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Many data we collect today are in tabular form, with rows as records and columns as attributes associated with each record. Understanding the structural relationship in tabular data can greatly facilitate the data science process. Traditionally, much of this relational information is stored in table schema and maintained by its creators, usually domain experts. In this paper, we develop automated methods to uncover deep relationships in a single data table without expert or domain knowledge. Our method can decompose a data table into layers of smaller tables, revealing its deep structure. The key to our approach is a computationally lightweight forward addition algorithm that we developed to recursively extract the functional dependencies between table columns that are scalable to tables with many columns. With our solution, data scientists will be provided with automatically generated, data-driven insights when exploring new data sets.

preprint2020arXiv

Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition

In this paper, we demonstrate the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks. We start with a pre-trained English ASR model and show that transfer learning can be effectively and easily performed on: (1) different English accents, (2) different languages (German, Spanish and Russian) and (3) application-specific domains. Our experiments demonstrate that in all three cases, transfer learning from a good base model has higher accuracy than a model trained from scratch. It is preferred to fine-tune large models than small pre-trained models, even if the dataset for fine-tuning is small. Moreover, transfer learning significantly speeds up convergence for both very small and very large target datasets.

preprint2020arXiv

Cycle Text-To-Image GAN with BERT

We explore novel approaches to the task of image generation from their respective captions, building on state-of-the-art GAN architectures. Particularly, we baseline our models with the Attention-based GANs that learn attention mappings from words to image features. To better capture the features of the descriptions, we then built a novel cyclic design that learns an inverse function to maps the image back to original caption. Additionally, we incorporated recently developed BERT pretrained word embeddings as our initial text featurizer and observe a noticeable improvement in qualitative and quantitative performance compared to the Attention GAN baseline.

preprint2020arXiv

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay. In our experiments on neural networks for image classification, speech recognition, machine translation, and language modeling, it performs on par or better than well tuned SGD with momentum and Adam or AdamW. Additionally, NovoGrad (1) is robust to the choice of learning rate and weight initialization, (2) works well in a large batch setting, and (3) has two times smaller memory footprint than Adam.

preprint2020arXiv

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

With the recent success of the pre-training technique for NLP and image-linguistic tasks, some video-linguistic pre-training works are gradually developed to improve video-text related downstream tasks. However, most of the existing multimodal models are pre-trained for understanding tasks, leading to a pretrain-finetune discrepancy for generation tasks. This paper proposes UniVL: a Unified Video and Language pre-training model for both multimodal understanding and generation. It comprises four components, including two single-modal encoders, a cross encoder, and a decoder with the Transformer backbone. Five objectives, including video-text joint, conditioned masked language model (CMLM), conditioned masked frame model (CMFM), video-text alignment, and language reconstruction, are designed to train each of the components. We further develop two pre-training strategies, stage by stage pre-training (StagedP) and enhanced video representation (EnhancedV), to make the training process of the UniVL more effective. The pre-train is carried out on a sizeable instructional video dataset HowTo100M. Experimental results demonstrate that the UniVL can learn strong video-text representation and achieves state-of-the-art results on five downstream tasks.

preprint2013arXiv

Rotating Accretion Flows: From Infinity to the Black Hole

Accretion onto a supermassive black hole of a rotating inflow is a particularly difficult problem to study because of the wide range of length scales involved. There have been broadly utilized analytic and numerical treatments of the global properties of accretion flows, but detailed numerical simulations are required to address certain critical aspects. We use the ZEUS code to run hydrodynamical simulations of rotating, axisymmetric accretion flows with Bremsstrahlung cooling, considering solutions for which the centrifugal balance radius significantly exceeds the Schwarzschild radius, with and without viscous angular momentum transport. Infalling gas is followed from well beyond the Bondi radius down to the vicinity of the black hole. We produce a continuum of solutions with respect to the single parameter Mdot_Bondi/Mdot_Edd, and there is a sharp transition between two general classes of solutions at an Eddington ratio of Mdot_Bondi/Mdot_Edd ~ few x 10^(-2). Our high inflow solutions are very similar to the standard Shakura & Sunyaev (1973) results. But our low inflow results are to zeroth order the stationary Papaloizou and Pringle (1984) solution, which has no accretion. To next order in the small, assumed viscosity they show circulation, with disk and conical wind outflows almost balancing inflow. These solutions are characterized by hot, vertically extended disks, and net accretion proceeds at an extremely low rate, only of order alpha times the inflow rate. Our simulations have converged with respect to spatial resolution and temporal duration, and they do not depend strongly on our choice of boundary conditions.

preprint2012arXiv

Lower central series of a free associative algebra over the integers and finite fields

Consider the free algebra A_n generated over Q by n generators x_1, ..., x_n. Interesting objects attached to A = A_n are members of its lower central series, L_i = L_i(A), defined inductively by L_1 = A, L_{i+1} = [A,L_{i}], and their associated graded components B_i = B_i(A) defined as B_i=L_i/L_{i+1}. These quotients B_i, for i at least 2, as well as the reduced quotient \bar{B}_1=A/(L_2+A L_3), exhibit a rich geometric structure, as shown by Feigin and Shoikhet and later authors, (Dobrovolska-Kim-Ma,Dobrovolska-Etingof,Arbesfeld-Jordan,Bapat-Jordan). We study the same problem over the integers Z and finite fields F_p. New phenomena arise, namely, torsion in B_i over Z, and jumps in dimension over F_p. We describe the torsion in the reduced quotient RB_1 and B_2 geometrically in terms of the De Rham cohomology of Z^n. As a corollary we obtain a complete description of \bar{B}_1(A_n(Z)) and \bar{B}_1(A_n(F_p)), as well as of B_2(A_n(Z[1/2])) and B_2(A_n(F_p)), p>2. We also give theoretical and experimental results for B_i with i>2, formulating a number of conjectures and questions based on them. Finally, we discuss the supercase, when some of the generators are odd (fermionic) and some are even (bosonic), and provide some theoretical results and experimental data in this case.

preprint2012arXiv

On the Spin-Down of Intermittent Pulsars

Magnetospheres of pulsars are thought to be filled with plasma, and variations in plasma supply can affect both pulsar emission properties and spin-down rates. A number of recently discovered "intermittent" pulsars switch between two distinct states: an "on", radio-loud state, and an "off", radio-quiet state. Spin-down rates in the two states differ by a large factor, $\sim 1.5-2.5$, which is not easily understood in the context of current models. In this Letter we present self-consistent numerical solutions of "on" and "off" states of intermittent pulsar magnetospheres. We model the "on" state as a nearly ideal force-free magnetosphere with abundant magnetospheric plasma supply. The lack of radio emission in the "off" state is associated with plasma supply disruption that results in lower plasma density on the open field lines. We model the "off" state using nearly vacuum conditions on the open field lines and nearly ideal force-free conditions on the closed field lines, where plasma can remain trapped even in the absence of pair production. The toroidal advection of plasma in the closed zone in the "off" state causes spin-downs that are a factor of $\sim 2$ higher than vacuum values, and we naturally obtain a range of spin-down ratios between the "on" and "off" states, $\sim 1.2-2.9$, which corresponds to a likely range of pulsar inclination angles of $30{-}90^\circ$. We consider the implications of our model to a number of poorly understood but possibly related pulsar phenomena, including nulling, timing noise, and rotating radio transients.

preprint2011arXiv

Resistive Solutions for Pulsar Magnetospheres

The current state of the art in the modeling of pulsar magnetospheres invokes either the vacuum or force-free limits for the magnetospheric plasma. Neither of these limits can simultaneously account for both the plasma currents and the accelerating electric fields that are needed to explain the morphology and spectra of high-energy emission from pulsars. To better understand the structure of such magnetospheres, we combine accelerating fields and force-free solutions by considering models of magnetospheres filled with resistive plasma. We formulate Ohm's Law in the minimal velocity fluid frame and construct a family of resistive solutions that smoothly bridges the gap between the vacuum and the force-free magnetosphere solutions. The spin-down luminosity, open field line potential drop, and the fraction of open field lines all transition between the vacuum and force-free values as the plasma conductivity varies from zero to infinity. For fixed inclination angle, we find that the spin-down luminosity depends linearly on the open field line potential drop. We consider the implications of our resistive solutions for the spin down of intermittent pulsars and sub-pulse drift phenomena in radio pulsars.

preprint2010arXiv

Circulation and Dissipation on Hot Jupiters

Many global circulation models predict supersonic zonal winds and large vertical shears in the atmospheres of short-period jovian exoplanets. Using linear analysis and nonlinear local simulations, we investigate hydrodynamic dissipation mechanisms to balance the thermal acceleration of these winds. The adiabatic Richardson criterion remains a good guide to linear stability, although thermal diffusion allows some modes to violate it at very long wavelengths and very low growth rates. Nonlinearly, wind speeds saturate at Mach numbers $\approx 2$ and Richardson numbers $\lesssim 1/4$ for a broad range of plausible diffusivities and forcing strengths. Turbulence and vertical mixing, though accompanied by weak shocks, dominate the dissipation, which appears to be the outcome of a recurrent Kelvin-Helmholtz instability. An explicit shear viscosity, as well as thermal diffusivity, is added to ZEUS to capture dissipation outside of shocks. The wind speed is not monotonic nor single valued for shear viscosities larger than about $10^{-3}$ of the sound speed times the pressure scale height. Coarsening the numerical resolution can also increase the speed. Hence global simulations that are incapable of representing vertical turbulence and shocks, either because of reduced physics or because of limited resolution, may overestimate wind speeds. We recommend that such simulations include artificial dissipation terms to control the Mach and Richardson numbers and to capture mechanical dissipation as heat.

Jason Li

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection

Near-Linear Time Approximations for Cut Problems via Fair Cuts

Adapting TTS models For New Speakers using Transfer Learning

Breaking the Cubic Barrier for All-Pairs Max-Flow: Gomory-Hu Tree in Nearly Quadratic Time

Deterministic Min-cut in Poly-logarithmic Max-flows

Edge Connectivity Augmentation in Near-Linear Time

Faster Parallel Algorithm for Approximate Shortest Path

On the Fixed-Parameter Tractability of Capacitated Clustering

A Deterministic Algorithm for Balanced Cut with Applications to Dynamic Connectivity, Flows, and Beyond

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition

Cycle Text-To-Image GAN with BERT

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Rotating Accretion Flows: From Infinity to the Black Hole

Lower central series of a free associative algebra over the integers and finite fields

On the Spin-Down of Intermittent Pulsars

Resistive Solutions for Pulsar Magnetospheres

Circulation and Dissipation on Hot Jupiters