Source author record

Dongyang Li

Dongyang Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computation and Language, Computer Vision, cond-mat.mtrl-sci, cond-mat.soft, eess.IV, Machine Learning, math.NA, math.OC, nlin.CD, physics.app-ph, physics.flu-dyn

Catalog footprint

What is connected

6 works

12 topics

4 close collaborators

preprint · 2026 · arXiv

AMATA: Adaptive Multi-Agent Trajectory Alignment for Knowledge-Intensive Question Answering

Despite substantial advances in large language models (LLMs), generating factually consistent responses for knowledge-intensive question answering remains challenging. These difficulties are primarily due to hallucinations and the limitations of LLMs in bridging long-tail knowledge gaps. To address this, we propose AMATA, an Adaptive Multi-Agent Trajectory Alignment framework that dynamically integrates external knowledge to improve response interpretability and factual grounding. Our architecture leverages six specialized agents that collaboratively perform structured actions for complex question reasoning. We formalize multi-agent collaboration with external tools as a trajectory preference alignment problem, incorporating question-aware agent customization and inter-agent preference harmonization. AMATA introduces two principal innovations: (1) Intra-Trajectory Preference Learning, which learns objective-oriented preferences to prioritize critical agents, and (2) Inter-Agent Dependency Learning, which captures cross-agent tool dependencies through a novel dependency-aware direct preference optimization technique. Empirical results show that AMATA consistently outperforms baseline approaches, knowledge-augmented frameworks, and LLM-based trajectory systems on five established knowledge-intensive QA benchmarks. Further analysis demonstrates the efficiency of our method in reducing token consumption.
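
The abstract builds on direct preference optimization (DPO). As a minimal sketch, the block below computes the standard DPO loss for a single pair of preferred and rejected trajectories. It is not the dependency-aware variant the paper proposes, which the abstract does not specify. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) trajectory pair.

    Each argument is the summed log-probability of a trajectory under the
    trained policy (logp_*) or under a frozen reference policy (ref_logp_*).
    beta is a hypothetical scaling value chosen only for illustration.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # trajectory over the rejected one, relative to the reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The policy raised the chosen trajectory and lowered the rejected one
# relative to the reference, so the loss falls below log(2).
example_loss = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

When the policy and reference agree on both trajectories the margin is zero and the loss equals log 2; it decreases as the policy separates the preferred trajectory from the rejected one.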

preprint · 2026 · arXiv

Transformers Can Implement Preconditioned Richardson Iteration for In-Context Gaussian Kernel Regression

Mechanistic accounts of in-context learning (ICL) have identified iterative algorithms for linear regression and related linear prediction tasks, often using linear or ReLU attention variants. For nonlinear ICL, prior work has related softmax and kernelized attention to functional-gradient-type dynamics, but it remains unclear whether a standard transformer with softmax attention can implement a convergent solver with an end-to-end prediction-error guarantee. In this paper, we study in-context kernel ridge regression (KRR) with Gaussian kernels and show that a standard softmax-attention transformer can approximate the KRR predictor during its forward pass by implementing preconditioned Richardson iteration on the associated kernel linear system. Under bounded-data assumptions, we construct a single-head transformer with $O(\log(1/ε))$ blocks and MLP width $O(\sqrt{N/ε})$ that achieves $ε$-accurate prediction for prompts of length $N$. Our construction reveals a functional decomposition within the transformer architecture: softmax attention produces a row-normalized Gaussian-kernel operator needed for cross-token interactions, while ReLU MLP layers act locally to approximate the intra-token scalar arithmetic required by the update. Empirically, we train GPT-2-style transformers on Gaussian-process regression tasks to further test the preconditioned Richardson interpretation. Through linear probing, we compare the transformer's layer-wise predictions with the step-wise outputs of classical KRR solvers and find that its error profiles align most consistently with preconditioned Richardson iteration. Ablation studies further support this interpretation. Together, our theory and experiments identify preconditioned Richardson iteration as a concrete mechanism that softmax-attention transformers can realize for nonlinear in-context Gaussian-kernel regression.
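
To make the classical solver named in the abstract concrete, here is a minimal pure-Python sketch of preconditioned Richardson iteration on the kernel ridge regression system (K + ridge·I)α = y with a Gaussian kernel. It is not the paper's transformer construction. The diagonal row-sum preconditioner is an assumption, chosen to echo the row-normalized kernel operator the abstract mentions, and the ridge value, bandwidth, step count, and toy data are illustrative.

```python
import math
import random


def gaussian_kernel(x, z, bandwidth=1.0):
    """Gaussian kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def krr_system(X, ridge, bandwidth=1.0):
    """Build A = K + ridge * I for the training points X."""
    n = len(X)
    return [[gaussian_kernel(X[i], X[j], bandwidth) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def matvec(A, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def richardson_solve(A, y, num_steps=1000):
    """Preconditioned Richardson iteration: alpha <- alpha + P (y - A alpha).

    P is the inverse of the diagonal matrix of row sums of A. This choice is
    an assumption made for the sketch, not taken from the paper.
    """
    p = [1.0 / sum(row) for row in A]
    alpha = [0.0] * len(y)
    for _ in range(num_steps):
        residual = [yi - ri for yi, ri in zip(y, matvec(A, alpha))]
        alpha = [ai + pi * ri for ai, pi, ri in zip(alpha, p, residual)]
    return alpha


def krr_predict(X, alpha, x_query, bandwidth=1.0):
    """KRR prediction at x_query from the dual coefficients alpha."""
    return sum(a * gaussian_kernel(x_query, xi, bandwidth) for a, xi in zip(alpha, X))


# Toy 1-D regression problem (hypothetical data).
random.seed(0)
X = [[random.uniform(-1.0, 1.0)] for _ in range(20)]
y = [math.sin(3.0 * x[0]) for x in X]
A = krr_system(X, ridge=0.5)
alpha = richardson_solve(A, y)
max_residual = max(abs(yi - ri) for yi, ri in zip(y, matvec(A, alpha)))
prediction = krr_predict(X, alpha, [0.25])
```

Because A is symmetric positive definite with nonnegative entries, the row-sum preconditioner keeps the eigenvalues of the iteration matrix in [0, 1), so the residual shrinks geometrically without tuning a step size.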

preprint · 2022 · arXiv

HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction

Distant supervision assumes that any sentence containing the same entity pair reflects the same relation. Previous work on distantly supervised relation extraction (DSRE) generally focuses on sentence-level or bag-level de-noising techniques independently, neglecting explicit interaction across levels. In this paper, we propose a Hierarchical Contrastive Learning framework for distantly supervised Relation Extraction (HiCLRE) to reduce noisy sentences, which integrates global structural information and local fine-grained interaction. Specifically, we propose a three-level hierarchical learning framework that interacts across levels, generating de-noised context-aware representations by adapting the existing multi-head self-attention, named Multi-Granularity Recontextualization. Meanwhile, pseudo-positive samples are provided at the specific level for contrastive learning via a dynamic gradient-based data augmentation strategy, named Dynamic Gradient Adversarial Perturbation. Experiments demonstrate that HiCLRE significantly outperforms strong baselines on various mainstream DSRE datasets.

preprint · 2022 · arXiv

Re-laminarization of elastic turbulence

We report frictional drag reduction and a complete flow re-laminarization of elastic turbulence (ET) at vanishing inertia in a viscoelastic channel flow past an obstacle. We show that the intensity of the observed elastic waves and the wall-normal vorticity correlate well with the measured drag above the ET onset. Moreover, we find that the elastic wave frequency grows with the Weissenberg number, and at sufficiently high frequency it causes the elastic waves to decay, resulting in ET attenuation and drag reduction. This allows us to substantiate a physical mechanism, involving the interaction of elastic waves with wall-normal vorticity fluctuations, that leads to the drag reduction and re-laminarization phenomena at low Reynolds number.

preprint · 2022 · arXiv

Uncertainty-Guided Mutual Consistency Learning for Semi-Supervised Medical Image Segmentation

Medical image segmentation is a fundamental and critical step in many clinical approaches. Semi-supervised learning has been widely applied to medical image segmentation tasks since it alleviates the heavy burden of acquiring expert-examined annotations and takes advantage of unlabeled data, which is much easier to acquire. Although consistency learning has proven to be an effective approach by enforcing invariance of predictions under different distributions, existing approaches cannot make full use of region-level shape constraints and boundary-level distance information from unlabeled data. In this paper, we propose a novel uncertainty-guided mutual consistency learning framework to effectively exploit unlabeled data by integrating intra-task consistency learning from up-to-date predictions for self-ensembling and cross-task consistency learning from task-level regularization to exploit geometric shape information. The framework is guided by the estimated segmentation uncertainty of the models to select relatively certain predictions for consistency learning, so as to exploit more reliable information from unlabeled data. Experiments on two publicly available benchmark datasets showed that: 1) our proposed method achieves significant performance improvement by leveraging unlabeled data, with gains of up to 4.13% and 9.82% in Dice coefficient over the supervised baseline on left atrium segmentation and brain tumor segmentation, respectively; 2) compared with other semi-supervised segmentation methods, our proposed method achieves better segmentation performance under the same backbone network and task settings on both datasets, demonstrating the effectiveness and robustness of our method and its potential transferability to other medical image segmentation tasks.
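
The gains in this abstract are reported in Dice coefficient. As a small sketch of that metric only (not of the paper's framework), the function below computes Dice for two flat binary masks. The function name and the handling of two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for flat binary masks (lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count pixels marked as foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping foreground pixel out of 2 predicted and 1 true: 2*1/(2+1).
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```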

preprint · 2015 · arXiv

Variation in electron work function with temperature and its effect on the Young's modulus of metals

Properties of metals are fundamentally determined by their electron behavior, which is largely reflected by the electron work function ($φ$). Recent studies have demonstrated that many properties of metallic materials are directly related to $φ$, which may provide a simple but fundamental parameter for material design. Since material properties are affected by temperature, in this article a simple model is proposed to correlate the work function with temperature, expressed as $φ(T)=φ_{0}-γ\frac{(k_{B}T)^{2}}{φ_{0}}$, where $γ$ varies with the crystal structure. This $φ$-$T$ relationship helps determine and understand the dependence of metal properties on temperature on a feasible electronic basis. As a sample application, the established relationship is applied to determine the dependence of the Young's modulus of metals on temperature. The proposed relationship is consistent with experimental observations.
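
The temperature relation quoted in the abstract can be evaluated directly. The sketch below assumes energies in electronvolts and temperature in kelvin. The abstract gives no values for $φ_{0}$ or $γ$, so the numbers in the example are hypothetical placeholders, not data from the paper.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def work_function(temperature_k, phi0_ev, gamma):
    """Evaluate phi(T) = phi0 - gamma * (k_B * T)**2 / phi0, in eV."""
    thermal_energy = K_B_EV_PER_K * temperature_k  # k_B * T in eV
    return phi0_ev - gamma * thermal_energy ** 2 / phi0_ev


# Hypothetical inputs: phi0 = 4.5 eV and gamma = 100 are placeholders only.
phi_room = work_function(300.0, phi0_ev=4.5, gamma=100.0)
phi_hot = work_function(900.0, phi0_ev=4.5, gamma=100.0)
```

For positive $γ$ the correction is quadratic in temperature, so tripling the temperature multiplies the drop below $φ_{0}$ by nine.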
