Researcher profile

Min Tang

Min Tang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
12 works
0 followers
12 topics
4 close collaborators

Published work

12 published items

preprint · 2026 · arXiv

LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) incorporates external knowledge into large language models (LLMs), improving their adaptability to downstream tasks and enabling information updates. Surprisingly, recent empirical evidence demonstrates that injecting noise into retrieved relevant documents paradoxically facilitates exploitation of external knowledge and improves generation quality. Although counterintuitive and challenging to apply in practice, this phenomenon enables granular control and rigorous analysis of how LLMs integrate external knowledge. Therefore, in this paper, we intervene on noise injection and establish a layer-specific functional demarcation within the LLM: shallow layers specialize in local context modeling, intermediate layers focus on integrating long-range external factual knowledge, and deeper layers primarily rely on parametric internal knowledge. Building on this insight, we propose Layer Fused Decoding (LFD), a simple decoding strategy that directly combines representations from an intermediate layer with final-layer decoding outputs to fully exploit the external factual knowledge. To identify the optimal intermediate layer, we introduce an internal knowledge score (IKS) criterion that selects the layer with the lowest IKS value in the latter half of layers. Experimental results across multiple benchmarks demonstrate that LFD helps RAG systems more effectively surface retrieved context knowledge with minimal cost.
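The layer selection and fusion steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the IKS is computed or how the two representations are combined, so the per-layer scores are taken as given, and `fuse_logits` with its `weight` parameter is a hypothetical weighted sum rather than the paper's rule.

```python
import math


def select_layer(iks_scores):
    """Pick the layer with the lowest IKS among the latter half of layers."""
    start = len(iks_scores) // 2
    return min(range(start, len(iks_scores)), key=lambda i: iks_scores[i])


def fuse_logits(intermediate_logits, final_logits, weight=0.5):
    """Hypothetical fusion: weighted sum of intermediate and final logits."""
    return [
        weight * a + (1.0 - weight) * b
        for a, b in zip(intermediate_logits, final_logits)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Six layers with made-up IKS values; only layers 3..5 are candidates.
chosen = select_layer([0.1, 0.9, 0.8, 0.3, 0.5, 0.4])
probs = softmax(fuse_logits([2.0, 0.0], [0.0, 2.0]))
```

Layer 0 has the smallest score in the example but is ignored, because the criterion in the abstract only considers the latter half of the layers.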

preprint · 2024 · arXiv

A fast offline/online forward solver for stationary transport equation with multiple inflow boundary conditions and varying coefficients

It is of great interest to solve the inverse problem of the stationary radiative transport equation (RTE) in optical tomography. The standard approach is to formulate the inverse problem as an optimization problem, but the bottleneck is that one has to solve the forward problem repeatedly, which is time-consuming. Due to the optical properties of biological tissue, in real applications, optically thin and thick regions coexist and are adjacent to each other, and the geometry can be complex. To use coarse meshes and save computational cost, the forward solver has to be asymptotic preserving across the interface (APAL). In this paper, we propose an offline/online solver for the RTE. The cost at the offline stage is comparable to classical methods, while the cost at the online stage is much lower. Two cases are considered. One is to solve the RTE with fixed scattering and absorption cross sections while the boundary conditions vary; the other is when the cross sections vary in a small domain and the boundary conditions change many times. The solver can be decomposed into offline/online stages in these two cases. One only needs to compute the offline stage once and update the online stage when the parameters vary. Our proposed solver is much cheaper when one needs to solve the RTE with multiple right-hand sides or when the cross sections vary in a small domain, and can thus accelerate the solution of inverse RTE problems. We illustrate the online/offline decomposition based on the Tailored Finite Point Method (TFPM), which is APAL on general quadrilateral meshes.
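The offline/online idea for multiple right-hand sides can be illustrated with a generic dense linear system: factor the matrix once, then reuse the factors for each new right-hand side. This is a stand-in under stated assumptions, not the TFPM-based solver of the paper; the function names `lu_factor` and `lu_solve` and the small test matrix are hypothetical.

```python
def lu_factor(a):
    """Offline stage: LU factorization with partial pivoting.

    Returns (lu, perm), where row k of lu comes from row perm[k] of a.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Online stage: forward and back substitution for one right-hand side."""
    n = len(lu)
    y = [b[p] for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Factor once (offline), then solve cheaply for several right-hand sides.
factors = lu_factor([[4.0, 1.0], [1.0, 3.0]])
solutions = [lu_solve(*factors, rhs) for rhs in ([1.0, 2.0], [4.0, 1.0])]
```

For an n-by-n dense system the factorization costs O(n^3) while each subsequent solve costs O(n^2), which mirrors the cost split between the two stages described in the abstract.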

preprint · 2024 · arXiv

Confined run-and-tumble model with boundary aggregation: long time behavior and convergence to the confined Fokker-Planck model

Motile micro-organisms such as E. coli, sperm, or some seaweed are usually modelled by self-propelled particles that move with the run-and-tumble process. Individual-based stochastic models are usually employed to model the aggregation phenomenon at the boundary, which is an active research field that has attracted many biologists and biophysicists. Self-propelled particles at the microscale have complex behaviors, while characteristics at the population level are more important for practical applications but rely on individual behaviors. Kinetic PDE models that describe the time evolution of the probability density distribution of the motile micro-organisms are widely used. However, how to impose appropriate boundary conditions that take into account the boundary aggregation phenomena is rarely studied. In this paper, we propose the boundary conditions for a 2D confined run-and-tumble model (CRTM) for self-propelled particle populations moving between two parallel plates with a run-and-tumble process. The proposed model satisfies the relative entropy inequality and thus long-time convergence. We establish the relation between CRTM and the confined Fokker-Planck model (CFPM) studied in [22]. We prove theoretically that when the tumble is highly forward peaked and frequent enough, CRTM converges asymptotically to the CFPM. A numerical comparison of the CRTM with aggregation and CFPM is given. The time evolution of both the deterministic PDE model and the individual-based stochastic simulations is displayed, and the two match each other well.

preprint · 2024 · arXiv

Reconstructing the kinetic chemotaxis kernel using macroscopic data: well-posedness and ill-posedness

Bacterial motion is steered by external stimuli (chemotaxis), and the motion described on the mesoscopic scale is uniquely determined by a parameter $K$ that models the velocity change response of the bacteria. This parameter is called the chemotaxis kernel. In a practical setting, it is inferred from experimental data. We deploy a PDE-constrained optimization framework to perform this reconstruction using velocity-averaged, localized data taken in the interior of the domain. The problem can be well-posed or ill-posed depending on the data preparation and the experimental setup. In particular, we propose one specific design that guarantees numerical reconstructability and local convergence. This design is adapted to the discretization of $K$ in space and decouples the reconstruction of local values of $K$ into smaller cell problems, opening up parallelization opportunities. Numerical evidence supports the theoretical findings.

preprint · 2022 · arXiv

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks

We present a novel mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction. Our approach is general and can handle cloth or obstacles represented by triangle meshes with arbitrary topologies. We use graph convolution to transform the cloth and object meshes into a latent space to reduce the non-linearity in the mesh space. Our network can predict the target 3D cloth mesh deformation based on the initial state of the cloth mesh template and the target obstacle mesh. Our approach can handle complex cloth meshes with up to 100K triangles and scenes with various objects corresponding to SMPL humans, non-SMPL humans or rigid bodies. In practice, our approach can be used to generate plausible cloth simulation at 30-45 fps on an NVIDIA GeForce RTX 3090 GPU. We highlight its benefits over prior learning-based methods and physically-based cloth simulators.

preprint · 2021 · arXiv

A Spatial-Temporal asymptotic preserving scheme for radiation magnetohydrodynamics in the equilibrium and non-equilibrium diffusion limit

The radiation magnetohydrodynamics (RMHD) system couples the ideal magnetohydrodynamics equations with a gray radiation transfer equation. The main challenge is that the radiation travels at the speed of light while the magnetohydrodynamics changes with the time scale of the fluid. The time scales of these two processes can vary dramatically. In order to use mesh sizes and time steps that are independent of the speed of light, asymptotic preserving (AP) schemes in both space and time are desired. In this paper, we develop an AP scheme in both space and time for the RMHD system. Two different scalings are considered. One results in an equilibrium diffusion limit system, while the other results in a non-equilibrium system. The main idea is to decompose the radiative intensity into three parts, each of which is treated differently with suitable combinations of explicit and implicit discretizations, guaranteeing a favorable stability condition and computational efficiency. The performance of the AP method is presented for both optically thin and thick regions, as well as for the radiative shock problem.

preprint · 2021 · arXiv

Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images

An accurate and automated tissue segmentation algorithm for retinal optical coherence tomography (OCT) images is crucial for the diagnosis of glaucoma. However, due to the presence of the optic disc, the anatomical structure of the peripapillary region of the retina is complicated and is challenging for segmentation. To address this issue, we developed a novel graph convolutional network (GCN)-assisted two-stage framework to simultaneously label the nine retinal layers and the optic disc. Specifically, a multi-scale global reasoning module is inserted between the encoder and decoder of a U-shape neural network to exploit anatomical prior knowledge and perform spatial reasoning. We conducted experiments on human peripapillary retinal OCT images. The Dice score of the proposed segmentation network is 0.820$\pm$0.001 and the pixel accuracy is 0.830$\pm$0.002, both of which outperform those from other state-of-the-art techniques.
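The two metrics quoted in this abstract can be computed as in the sketch below, which uses the standard per-class Dice definition, 2|P ∩ T| / (|P| + |T|), on flattened label maps. The abstract does not say how Dice is aggregated across the nine layers and the disc, so the per-label function and the toy label arrays here are assumptions.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one class label over flattened label maps."""
    pred_hits = sum(1 for p in pred if p == label)
    target_hits = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_hits + target_hits == 0:
        return 1.0  # label absent from both maps: treated as perfect agreement
    return 2.0 * overlap / (pred_hits + target_hits)


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the target label."""
    if len(pred) != len(target):
        raise ValueError("label maps must have the same number of pixels")
    return sum(1 for p, t in zip(pred, target) if p == t) / len(target)


# Toy 4-pixel example with three labels (hypothetical data).
predicted = [0, 1, 1, 2]
reference = [0, 1, 2, 2]
dice_label_1 = dice_score(predicted, reference, 1)
accuracy = pixel_accuracy(predicted, reference)
```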

preprint · 2020 · arXiv

Hierarchical Optimization Time Integration for CFL-rate MPM Stepping

We propose Hierarchical Optimization Time Integration (HOT) for efficient implicit time-stepping of the Material Point Method (MPM) irrespective of simulated materials and conditions. HOT is an MPM-specialized hierarchical optimization algorithm that solves nonlinear time step problems for large-scale MPM systems near the CFL-limit. HOT provides convergent simulations "out-of-the-box" across widely varying materials and computational resolutions without parameter tuning. As an implicit MPM time stepper accelerated by a custom-designed Galerkin multigrid wrapped in a quasi-Newton solver, HOT is both highly parallelizable and robustly convergent. As we show in our analysis, HOT maintains consistent and efficient performance even as we grow stiffness, increase deformation, and vary materials over a wide range of finite strain, elastodynamic and plastic examples. Through careful benchmark ablation studies, we compare the effectiveness of HOT against seemingly plausible alternative combinations of MPM with standard multigrid and other Newton-Krylov models. We show how these alternative designs result in severe issues and poor performance. In contrast, HOT outperforms the existing state-of-the-art, heavily optimized implicit MPM codes with an up to 10x performance speedup across a wide range of challenging benchmark test simulations.

preprint · 2020 · arXiv

On a subset sums problem of Chen and Wu

For a set $A$, let $P(A)$ be the set of all finite subset sums of $A$. We prove that if a sequence $B=\{11\leq b_1<b_2<\cdots\}$ satisfies $b_2=3b_1+5$, $b_3=3b_2+2$ and $b_{n+1}=3b_n+4b_{n-1}$ for all $n\geq 3$, then there is a sequence of positive integers $A=\{a_1<a_2<\cdots\}$ such that $P(A)=\mathbb{N}\setminus B$. This result shows that the answer to the problem of Chen and Wu [`The inverse problem on subset sums', European J. Combin. 34 (2013), 841-845] is negative.
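The two objects in this abstract are easy to compute for small cases. The sketch below generates the first terms of $B$ from the stated recurrence and enumerates $P(A)$ for a finite set. The function names are hypothetical, and including the empty sum 0 in $P(A)$ is an assumed convention that the abstract does not specify. The code does not construct the sequence $A$ whose existence the paper proves.

```python
def chen_wu_sequence(b1, count):
    """First `count` terms of B with b2 = 3*b1 + 5, b3 = 3*b2 + 2,
    and b_{n+1} = 3*b_n + 4*b_{n-1} for n >= 3."""
    if b1 < 11:
        raise ValueError("the abstract requires b1 >= 11")
    b2 = 3 * b1 + 5
    terms = [b1, b2, 3 * b2 + 2]
    while len(terms) < count:
        terms.append(3 * terms[-1] + 4 * terms[-2])
    return terms[:count]


def subset_sums(values):
    """All sums of finite subsets of `values` (empty sum 0 included by assumption)."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return sums


first_terms = chen_wu_sequence(11, 5)
sums_of_powers = subset_sums([1, 2, 4])
```

With $b_1 = 11$ the recurrence gives 11, 38, 116, 500, 1964, and the subset sums of {1, 2, 4} are exactly the integers 0 through 7.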

preprint · 2020 · arXiv

On an inverse problem in additive number theory

For a set $A$, let $P(A)$ be the set of all finite subset sums of $A$. In this paper, for a sequence of integers $B=\{1<b_1<b_2<\cdots\}$ and $3b_1+5\leq b_2\leq 6b_1+10$, we determine the critical value for $b_3$ such that there exists an infinite sequence $A$ of positive integers for which $P(A)=\mathbb{N}\setminus B$. This result shows that we partially solve the problem of Fang and Fang [`On an inverse problem in additive number theory', Acta Math. Hungar. 158 (2019), 36-39].

preprint · 2020 · arXiv

P-Cloth: Interactive Complex Cloth Simulation on Multi-GPU Systems using Dynamic Matrix Assembly and Pipelined Implicit Integrators

We present a novel parallel algorithm for cloth simulation that exploits multiple GPUs for fast computation and the handling of very high resolution meshes. To accelerate implicit integration, we describe new parallel algorithms for sparse matrix-vector multiplication (SpMV) and for dynamic matrix assembly on a multi-GPU workstation. Our algorithms use a novel work queue generation scheme for a fat-tree GPU interconnect topology. Furthermore, we present a novel collision handling scheme that uses spatial hashing for discrete and continuous collision detection along with a non-linear impact zone solver. Our parallel schemes can distribute the computation and storage overhead among multiple GPUs and enable us to perform almost interactive simulation on complex cloth meshes, which can hardly be handled on a single GPU due to memory limitations. We have evaluated the performance with two multi-GPU workstations (with 4 and 8 GPUs, respectively) on cloth meshes with 0.5-1.65M triangles. Our approach can reliably handle the collisions and generate vivid wrinkles and folds at 2-5 fps, which is significantly faster than prior cloth simulation systems. We observe almost linear speedups with respect to the number of GPUs.
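For readers unfamiliar with SpMV, the operation that the abstract parallelizes is shown below in its simplest serial form. The compressed sparse row (CSR) layout is an assumption made for illustration, since the abstract does not name a storage format, and this reference loop is not the paper's multi-GPU algorithm.

```python
def csr_spmv(values, col_indices, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values      : nonzero entries, row by row
    col_indices : column index of each nonzero entry
    row_ptr     : row r owns entries row_ptr[r] .. row_ptr[r + 1] - 1
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
result = csr_spmv(
    values=[2.0, 1.0, 3.0, 4.0, 5.0],
    col_indices=[0, 2, 1, 0, 2],
    row_ptr=[0, 2, 3, 5],
    x=[1.0, 2.0, 3.0],
)
```

Each output row depends only on its own slice of `values`, which is why the outer loop is a natural unit of work to distribute across devices.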