Researcher profile

Tianqi Chen

Tianqi Chen contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
11 works
0 followers
12 topics
4 close collaborators

Published work

11 published items

preprint | 2026 | arXiv

AnyCXR: Human Anatomy Segmentation of Chest X-ray at Any Acquisition Position using Multi-stage Domain Randomized Synthetic Data with Imperfect Annotations and Conditional Joint Annotation Regularization Learning

Robust anatomical segmentation of chest X-rays (CXRs) remains challenging due to the scarcity of comprehensive annotations and the substantial variability of real-world acquisition conditions. We propose AnyCXR, a unified framework that enables generalizable multi-organ segmentation across arbitrary CXR projection angles using only synthetic supervision. The method combines a Multi-stage Domain Randomization (MSDR) engine, which generates over 100,000 anatomically faithful and highly diverse synthetic radiographs from 3D CT volumes, with a Conditional Joint Annotation Regularization (CAR) learning strategy that leverages partial and imperfect labels by enforcing anatomical consistency in a latent space. Trained entirely on synthetic data, AnyCXR achieves strong zero-shot generalization on multiple real-world datasets, providing accurate delineation of 54 anatomical structures in PA, lateral, and oblique views. The resulting segmentation maps support downstream clinical tasks, including automated cardiothoracic ratio estimation, spine curvature assessment, and disease classification, where the incorporation of anatomical priors improves diagnostic performance. These results demonstrate that AnyCXR establishes a scalable and reliable foundation for anatomy-aware CXR analysis and offers a practical pathway toward reducing annotation burdens while improving robustness across diverse imaging conditions.
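The abstract lists cardiothoracic ratio estimation as one downstream use of the segmentation maps. As a rough illustration only (not the authors' pipeline), the ratio can be sketched as the widest horizontal extent of a heart mask divided by that of a thorax mask. The function names, the nested-list mask format, and the toy masks below are assumptions made for this example.

```python
def horizontal_extent(mask):
    """Width in pixels between the leftmost and rightmost foreground columns."""
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (max(cols) - min(cols) + 1) if cols else 0


def cardiothoracic_ratio(heart_mask, thorax_mask):
    """Ratio of cardiac width to thoracic width from two binary masks.

    Both masks are hypothetical 2D lists of 0/1 values on the same pixel grid.
    """
    thorax_width = horizontal_extent(thorax_mask)
    if thorax_width == 0:
        raise ValueError("thorax mask is empty")
    return horizontal_extent(heart_mask) / thorax_width


# Toy masks (assumed data): thorax spans 8 columns, heart spans 4.
thorax = [
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
]
heart = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
]
ctr = cardiothoracic_ratio(heart, thorax)
```

With real segmentation output, the masks would come from the predicted heart and thoracic structures on a PA view; which structures define the thoracic width is left open here.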

preprint | 2026 | arXiv

FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems

Recent advances show that large language models (LLMs) can act as autonomous agents capable of generating GPU kernels, but integrating these AI-generated kernels into real-world inference systems remains challenging. FlashInfer-Bench addresses this gap by establishing a standardized, closed-loop framework that connects kernel generation, benchmarking, and deployment. At its core, FlashInfer Trace provides a unified schema describing kernel definitions, workloads, implementations, and evaluations, enabling consistent communication between agents and systems. Built on real serving traces, FlashInfer-Bench includes a curated dataset, a robust correctness- and performance-aware benchmarking framework, a public leaderboard to track LLM agents' GPU programming capabilities, and a dynamic substitution mechanism (apply()) that seamlessly injects the best-performing kernels into production LLM engines such as SGLang and vLLM. Using FlashInfer-Bench, we further evaluate the performance and limitations of LLM agents, compare the trade-offs among different GPU programming languages, and provide insights for future agent design. FlashInfer-Bench thus establishes a practical, reproducible pathway for continuously improving AI-generated kernels and deploying them into large-scale LLM inference.

preprint | 2022 | arXiv

Bayesian network mediation analysis with application to brain functional connectome

The brain functional connectome, the collection of interconnected neural circuits along functional networks, is one of the most cutting-edge neuroimaging traits and has the potential to play a mediating role within the effect pathway between an exposure and an outcome. While existing mediation analytic approaches are capable of providing insight into complex processes, they mainly focus on a univariate mediator or mediator vector, without considering network-variate mediators. To fill this methodological gap and accomplish this exciting and urgent application, in this paper we propose an integrative mediation analysis under a Bayesian paradigm with networks entailing the mediation effect. To parameterize the network measurements, we introduce individually specified stochastic block models with unknown block allocation, and naturally bridge effect elements through the latent network mediators induced by the connectivity weights across network modules. To enable the identification of truly active mediating components, we simultaneously impose a feature selection across network mediators. We show the superiority of our model in estimating different effect components and selecting active mediating network structures. As a practical illustration of this approach's application to network neuroscience, we characterize the relationship between a therapeutic intervention and opioid abstinence as mediated by brain functional sub-networks.

preprint | 2022 | arXiv

SONAR: Joint Architecture and System Optimization Search

There is a growing need to deploy machine learning for different tasks on a wide array of new hardware platforms. Such deployment scenarios require tackling multiple challenges, including identifying a model architecture that can achieve a suitable predictive accuracy (architecture search), and finding an efficient implementation of the model to satisfy underlying hardware-specific systems constraints such as latency (system optimization search). Existing works treat architecture search and system optimization search as separate problems and solve them sequentially. In this paper, we instead propose to solve these problems jointly, and introduce a simple but effective baseline method called SONAR that interleaves these two search problems. SONAR aims to efficiently optimize for predictive accuracy and inference latency by applying early stopping to both search processes. Our experiments on multiple different hardware back-ends show that SONAR identifies nearly optimal architectures 30 times faster than a brute force approach.

preprint | 2022 | arXiv

Stack operation of tensor networks

The tensor network, as a factorization of tensors, aims at performing the operations that are common for normal tensors, such as addition, contraction and stacking. However, due to its non-unique network structure, only the tensor network contraction is so far well defined. In this paper, we propose a mathematically rigorous definition for the tensor network stack approach, which compresses a large number of tensor networks into a single one without changing their structures and configurations. We illustrate the main ideas with matrix-product-state-based machine learning as an example. Our results are compared with the for loop and the efficient coding method on both CPU and GPU.

preprint | 2022 | arXiv

The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution on ragged tensors, current deep learning frameworks generally use techniques such as padding and masking to make the data shapes uniform and then offload the computations to optimized kernels for dense tensor algebra. Such techniques can, however, lead to a lot of wasted computation and, therefore, a loss in performance. This paper presents CoRa, a tensor compiler that allows users to easily generate efficient code for ragged tensor operators targeting a wide range of CPUs and GPUs. Evaluating CoRa on a variety of operators on ragged tensors as well as on an encoder layer of the transformer model, we find that CoRa (i) performs competitively with hand-optimized implementations of the operators and the transformer encoder and (ii) achieves, over PyTorch, a 1.6X geomean speedup for the encoder on an Nvidia GPU and a 1.86X geomean speedup for the multi-head attention module used in transformers on an ARM CPU.
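To make the "wasted computation" from padding concrete, the following minimal sketch computes the fraction of elements that are pure padding when a batch of variable-length rows is padded to the longest row. It does not use or represent CoRa; the function name and the example lengths are assumptions chosen for illustration.

```python
def padding_waste(lengths):
    """Fraction of a padded batch that is padding rather than real data.

    `lengths` is a hypothetical list of per-row lengths in a ragged batch.
    """
    if not lengths:
        return 0.0
    padded_elements = max(lengths) * len(lengths)  # dense shape after padding
    if padded_elements == 0:
        return 0.0
    real_elements = sum(lengths)
    return (padded_elements - real_elements) / padded_elements


# Assumed example: four sequences padded to length 8.
lengths = [3, 7, 2, 8]
waste = padding_waste(lengths)  # 12 of 32 padded slots carry no data
```

For element-wise operators the wasted work scales with this fraction; for operators such as attention, whose cost grows faster than linearly in sequence length, the overhead can be larger.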

preprint | 2021 | arXiv

Cortex: A Compiler for Recursive Deep Learning Models

Optimizing deep learning models is generally performed in two steps: (i) high-level graph optimizations such as kernel fusion and (ii) low-level kernel optimizations such as those found in vendor libraries. This approach often leaves significant performance on the table, especially for the case of recursive deep learning models. In this paper, we present Cortex, a compiler-based approach to generate highly efficient code for recursive models for low-latency inference. Our compiler approach and low reliance on vendor libraries enable us to perform end-to-end optimizations, leading to up to 14X lower inference latencies over past work, across different backends.

preprint | 2021 | arXiv

Thermodynamic performance of a periodically driven harmonic oscillator correlated with the baths

We consider a harmonic oscillator under periodic driving and coupled to two harmonic-oscillator heat baths at different temperatures. We use the thermofield transformation with chain mapping for this setup, which allows us to study the unitary evolution of the system and the baths up to a time when the periodic steady state emerges in the system. We characterize this periodic steady state, and we show that, by tuning the system and the bath parameters, one can turn this system from an engine to an accelerator or even to a heater. The possibility to study the unitary evolution of the system and baths also allows us to evaluate the steady correlations that build between the system and the baths, and correlations that grow between the baths.

preprint | 2020 | arXiv

Effects of staggered Dzyaloshinskii-Moriya interactions in a quasi-two-dimensional Shastry-Sutherland model

Frustrated quantum spin systems exhibit exotic physics induced by an external magnetic field with anisotropic interactions. Here, we study the effect of non-uniform Dzyaloshinskii-Moriya (DM) interactions on a quasi-two-dimensional Shastry-Sutherland lattice using a matrix product states (MPS) algorithm. We first recover the magnetization plateau structure present in this geometry and then we show that both interdimer and intradimer DM interactions significantly modify the plateaux. The non-number-conserving intradimer interaction smoothens the shape of the magnetization curve, while the number-conserving interdimer interaction induces different small plateaux, which are signatures of the finite size of the system. Interestingly, the interdimer DM interaction induces chirality in the system. We thus characterize these chiral phases with particular emphasis on their robustness against intradimer DM interactions.

preprint | 2020 | arXiv

Steady state quantum transport through an anharmonic oscillator strongly coupled to two heat reservoirs

We investigate the transport properties of an anharmonic oscillator, modeled by a single-site Bose-Hubbard model, coupled to two different thermal baths using the numerically exact thermofield-based chain-mapping matrix product states (TCMPS) approach. We compare the effectiveness of TCMPS in probing the nonequilibrium dynamics of a strongly interacting system, irrespective of the system-bath coupling, against the global master equation approach in Gorini-Kossakowski-Sudarshan-Lindblad form. We discuss the effect of on-site interactions, temperature bias, and the system-bath couplings on the steady state transport properties. Lastly, we show evidence of non-Markovian dynamics by studying the non-monotonicity of the time evolution of the trace distance between two different initial states.
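The non-Markovianity check mentioned in the abstract rests on the trace distance between two evolving states: under Markovian dynamics it cannot increase, so any revival is a signature of memory effects. The sketch below is a simplified illustration for two-level (2x2) density matrices, not the bosonic system or the TCMPS method of the paper; the function names, tolerance, and sample values are assumptions.

```python
import math


def trace_distance_2x2(rho, sigma):
    """Trace distance T = (1/2) * ||rho - sigma||_1 for 2x2 density matrices.

    The difference of two unit-trace Hermitian matrices is traceless, with
    eigenvalues +/- sqrt(a^2 + |b|^2), so T equals sqrt(a^2 + |b|^2), where
    a and b are the (0,0) and (0,1) entries of rho - sigma.
    """
    a = (rho[0][0] - sigma[0][0]).real
    b = rho[0][1] - sigma[0][1]
    return math.sqrt(a * a + abs(b) ** 2)


def has_revival(distances, tol=1e-12):
    """True if the trace distance increases between any two consecutive times."""
    return any(later > earlier + tol
               for earlier, later in zip(distances, distances[1:]))


# Assumed example states: |0><0| and |+><+|.
ket0 = [[1.0, 0.0], [0.0, 0.0]]
plus = [[0.5, 0.5], [0.5, 0.5]]
d = trace_distance_2x2(ket0, plus)

# Assumed time series of trace distances showing one revival.
revival = has_revival([1.0, 0.8, 0.85, 0.6])
```

For larger Hilbert spaces the same quantity would be computed from the eigenvalues of the difference of the two density matrices, typically with a numerical linear algebra library.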

preprint | 2019 | arXiv

Skyrmion quantum spin Hall effect

The quantum spin Hall effect is conventionally thought to require a strong spin-orbit coupling, producing an effective spin-dependent magnetic field. However, spin currents can also be present without transport of spins, for example, in spin-waves or skyrmions. In this paper, we show that topological skyrmionic spin textures can be used to realize a quantum spin Hall effect. From basic arguments relating to the single-valuedness of the wave function, we deduce that loop integrals of the derivative of the Hamiltonian must have a spectrum that is integer multiples of $2\pi$. By relating this to the spin current, we form a new quantity called the quantized spin current which obeys a precise quantization rule. This allows us to derive a quantum spin Hall effect, which we illustrate with an example of a spin-1 Bose-Einstein condensate.