Source author record

Jiaqi Zhang

Jiaqi Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning; Computer Vision; Distributed, Parallel, and Cluster Computing; math.OC; Systems and Control; eess.SP; eess.SY; Multiagent Systems; eess.IV; Artificial Intelligence; cond-mat.mtrl-sci; cs.CY; Information Theory; math.IT; Neurons and Cognition; physics.geo-ph; physics.optics

Catalog footprint

What is connected

18 works

17 topics

4 close collaborators

preprint, 2026, arXiv

Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering

Medical retrieval-augmented generation (RAG) systems typically operate on text chunks extracted from biomedical literature, discarding the rich visual content (tables, figures, structured layouts) of original document pages. We propose MED-VRAG, an iterative multimodal RAG framework that retrieves and reasons over PMC document page images instead of OCR'd text. The system pairs ColQwen2.5 patch-level page embeddings with a sharded MapReduce LLM filter, scaling to ~350K pages while keeping Stage-1 retrieval under 30 ms via an offline coarse-to-fine index (C=8 centroids per page, ANN over centroids, exact two-way scoring on the top-R shortlist). A vision-language model (VLM) then iteratively refines its query and accumulates evidence in a memory bank across up to 3 reasoning rounds, with a single iteration costing ~15.9 s and the full three-round pipeline ~47.8 s on 4xA100. Across four medical QA benchmarks (MedQA, MedMCQA, PubMedQA, MMLU-Med), MED-VRAG reaches 78.6% average accuracy. Under controlled comparison with the same Qwen2.5-VL-32B backbone, retrieval contributes a +5.8 point gain over the no-retrieval baseline; we also note a +1.8 point edge over MedRAG + GPT-4 (76.8%), with the caveat that this is a cross-paper rather than head-to-head comparison. Ablations isolate +1.0 from page-image vs text-chunk retrieval, +1.5 from iteration, and +1.0 from the memory bank.

preprint, 2026, arXiv

SoLAR: Error-Resilient Streamable Long-Horizon Free-Viewpoint Video Reconstruction with Anchor Activation and Latent Recalibration

Free-Viewpoint Video (FVV) has emerged as a cornerstone of next-generation immersive media systems and attracted widespread attention. Previous methods primarily focus on short video sequences and suffer from significant performance degradation when processing long-horizon free-viewpoint video (LFVV). Motivated by bit allocation theory, we analyze dynamic-anchor-based volumetric video representation within a rate-distortion optimization framework and propose SoLAR, which is the first error-resilient streamable FVV framework that maintains stable reconstruction quality on long sequences without requiring group-of-pictures partitioning. We propose Anchor Activation Dynamics (AAD), which enables dynamic anchors to model non-rigid transformations by dynamically activating informative anchors and suppressing redundant ones. Furthermore, we introduce Latent Discrepancy Aware Recalibration (LaDAR), a mechanism that identifies discrepancies between latent representations and recalibrates the correspondences encoded in the network, effectively mitigating error propagation in LFVV without compromising real-time performance or storage compactness. Extensive experiments demonstrate that SoLAR achieves state-of-the-art reconstruction performance while maintaining minimum storage overhead, which provides a new direction for LFVV reconstruction and advances the practical deployment of immersive systems. Demo free-viewpoint videos are provided in the supplementary material.

preprint, 2026, arXiv

Spark3R: Asymmetric Token Reduction Makes Fast Feed-Forward 3D Reconstruction

Feed-forward 3D reconstruction models based on Vision Transformers can directly estimate scene geometry and camera poses from a small set of input images, but scaling them to video inputs with hundreds or thousands of frames remains challenging due to the quadratic cost of global attention layers. Recent token-merging methods accelerate these models by compressing the token sequence within the global attention layers, but they apply a uniform reduction to query tokens and key-value tokens, ignoring their functionally distinct roles in 3D reconstruction. In this work, we identify a key property of feed-forward 3D reconstruction models: query tokens encode view-specific geometric requests and are sensitive to compression, while key-value tokens represent shared scene context and tolerate aggressive compression. Guided by this insight, we propose Spark3R, a training-free acceleration framework that decouples the compression of query tokens and key-value tokens by assigning distinct reduction factors, with intra-group token merging applied to query tokens and lightweight token pruning to key-value tokens. Additionally, Spark3R adaptively adjusts the key-value reduction factor across layers, further improving the quality-efficiency trade-off. As a plug-and-play framework requiring no retraining, Spark3R integrates directly into multiple pretrained feed-forward 3D reconstruction models, including VGGT, $π^3$, Depth-Anything-3, and VGGT-$Ω$, and achieves up to $28\times$ speedup on 1,000-frame inputs while maintaining competitive reconstruction quality.

preprint, 2022, arXiv

Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise

Electromyography signals can be used as training data by machine learning models to classify various gestures. We seek to produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience while comparing the effect of our feature extraction results on model accuracy to other more conventional methods such as the use of AR parameters on a sliding window across the channels of a signal. We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting where EMG classification is being conducted, as opposed to more complicated methods such as the use of the Fourier Transform. To augment our limited training data, we used a standard technique, known as jitter, where random noise is added to each observation in a channel wise manner. Once all datasets were produced using the above methods, we performed a grid search with Random Forest and XGBoost to ultimately create a high accuracy model. For human computer interface purposes, high accuracy classification of EMG signals is of particular importance to their functioning and given the difficulty and cost of amassing any sort of biomedical data in a high volume, it is valuable to have techniques that can work with a low amount of high-quality samples with less expensive feature extraction methods that can reliably be carried out in an online application.
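The jitter augmentation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names `jitter` and `augment`, the `max_sigma` bound, and the choice to draw one noise standard deviation per channel uniformly from [0, max_sigma] are all hypothetical readings of "random variance Gaussian noise".

```python
import random


def jitter(signal, max_sigma=0.05, rng=None):
    """Return a noisy copy of a multi-channel signal.

    signal: list of channels, each a list of float samples.
    Assumption: each channel gets its own noise level, drawn uniformly
    from [0, max_sigma]; zero-mean Gaussian noise with that standard
    deviation is then added to every sample of the channel.
    """
    if rng is None:
        rng = random.Random()
    noisy = []
    for channel in signal:
        sigma = rng.uniform(0.0, max_sigma)
        noisy.append([x + rng.gauss(0.0, sigma) for x in channel])
    return noisy


def augment(dataset, copies=3, max_sigma=0.05, seed=0):
    """Expand (signal, label) pairs with `copies` jittered variants each.

    The original pairs come first and are left untouched.
    """
    rng = random.Random(seed)
    out = list(dataset)
    for signal, label in dataset:
        for _ in range(copies):
            out.append((jitter(signal, max_sigma, rng), label))
    return out
```

With `copies=3`, a dataset of N labeled recordings grows to 4N, and every jittered copy keeps the label and shape of its source recording.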

preprint, 2022, arXiv

Distributed Adaptive Newton Methods with Global Superlinear Convergence

This paper considers the distributed optimization problem where each node of a peer-to-peer network minimizes a finite sum of objective functions by communicating with its neighboring nodes. In sharp contrast to the existing literature where the fastest distributed algorithms converge either with a global linear or a local superlinear rate, we propose a distributed adaptive Newton (DAN) algorithm with a global quadratic convergence rate. Our key idea lies in the design of a finite-time set-consensus method with Polyak's adaptive stepsize. Moreover, we introduce a low-rank matrix approximation (LA) technique to compress the innovation of the Hessian matrix so that each node only needs to transmit messages of dimension $\mathcal{O}(p)$ (where $p$ is the dimension of decision vectors) per iteration, which is essentially the same as that of first-order methods. Nevertheless, the resulting DAN-LA converges to an optimal solution with a global superlinear rate. Numerical experiments on logistic regression problems are conducted to validate their advantages over existing methods.

preprint, 2022, arXiv

Polarized deep diffractive neural network for classification, generation, multiplexing and de-multiplexing of orbital angular momentum modes

The multiplexing and de-multiplexing of orbital angular momentum (OAM) beams are critical issues in optical communication. Optical diffractive neural networks have been introduced to perform classification, generation, multiplexing and de-multiplexing of OAM beams. However, conventional diffractive neural networks cannot handle OAM modes with a varying spatial distribution of polarization directions. Herein, we propose a polarized optical deep diffractive neural network that is designed based on the concept of rectangular micro-structure meta-material. Our proposed polarized optical diffractive neural network is trained to classify, generate, multiplex and de-multiplex polarized OAM beams. The simulation results show that our network framework can successfully classify 14 kinds of orthogonally polarized vortex beams and de-multiplex the hybrid OAM beams into Gauss beams at two, three and four spatial positions respectively. Six polarized OAM beams with identical total intensity and eight cylinder vector beams with different topology charges have also been classified effectively. Additionally, results reveal that the network can generate hybrid OAM beams with high quality and multiplex two polarized linear beams into 8 kinds of cylinder vector beams.

preprint, 2022, arXiv

RunnerDNA: Interpretable indicators and model to characterize human activity pattern and individual difference

Human activity analysis based on sensor data plays a significant role in behavior sensing, human-machine interaction, health care, and so on. Current research focuses on recognizing human activity and posture at the activity pattern level, neglecting the effective fusion of multi-sensor data and the assessment of different movement styles at the individual level, which makes it challenging to distinguish individuals performing the same movement. In this study, the concept of RunnerDNA, consisting of five interpretable indicators (balance, stride, steering, stability, and amplitude), was proposed to describe human activity at the individual level. We collected smartphone multi-sensor data from 33 volunteers who engaged in physical activities such as walking, running, and bicycling and converted the data into the five indicators of RunnerDNA. The indicators were then used to build random forest models and recognize movement activities and the identity of users. The results show that the proposed model has high accuracy in identifying activities (accuracy of 0.679) and is also effective in predicting the identity of running users. Furthermore, the accuracy of the human activity recognition model is significantly improved by combining RunnerDNA with two motion feature indicators, velocity and acceleration. Results demonstrate that RunnerDNA is an effective way to describe an individual's physical activity and helps us understand individual differences in sports style; significant differences in balance and amplitude between men and women were also found.

preprint, 2021, arXiv

Asynchronous Networked Aggregative Games

We propose a fully asynchronous networked aggregative game (Asy-NAG) where each player minimizes a cost function that depends on its local action and the aggregate of all players' actions. In sharp contrast to the existing NAGs, each player in our Asy-NAG can compute an estimate of the aggregate action at any wall-clock time by only using (possibly stale) information from nearby players of a directed network. Such an asynchronous update does not require any coordination among players. Moreover, we design a novel distributed algorithm with an aggressive mechanism for each player to adaptively adjust the optimization stepsize per update. Particularly, the slow players in terms of updating their estimates smartly increase their stepsizes to catch up with the fast ones. Then, we develop an augmented system approach to address the asynchronicity and the information delays between players, and rigorously show the convergence to a Nash equilibrium of the Asy-NAG via a perturbed coordinate algorithm which is also of independent interest. Finally, we evaluate the performance of the distributed algorithm through numerical simulations.

preprint, 2021, arXiv

Fully Asynchronous Distributed Optimization with Linear Convergence in Directed Networks

We consider the distributed optimization problem, the goal of which is to minimize the sum of local objective functions over a directed network. Though it has been widely studied recently, most of the existing algorithms are designed for synchronized or randomly activated implementation, which may create deadlocks in practice. In sharp contrast, we propose a fully asynchronous push-pull gradient algorithm (APPG) where each node updates without waiting for any other node by using (possibly stale) information from neighbors. Thus, it is both deadlock-free and robust to any bounded communication delay. Moreover, we construct two novel augmented networks to theoretically evaluate its performance from the worst-case point of view and show that if local functions have Lipschitz-continuous gradients and their sum satisfies the Polyak-Łojasiewicz condition (convexity is not required), each node of APPG converges to the same optimal solution at a linear rate of $\mathcal{O}(λ^k)$, where $λ\in(0,1)$ and the virtual counter $k$ increases by one no matter which node updates. This largely elucidates its linear speedup efficiency and shows its advantage over the synchronous version. Finally, the performance of APPG is numerically validated via a logistic regression problem on the Covertype dataset.

preprint, 2021, arXiv

Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors. This is in sharp contrast to the gossip-based scheme where a pair of nodes concurrently update. Though the fully asynchronous setting involves a difficult multi-timescale decision problem, we design a novel stochastic average gradient (SAG) based distributed algorithm and develop a push-pull augmented graph approach to prove its exact convergence at a linear rate of $\mathcal{O}(c^k)$ where $c\in(0,1)$ and $k$ increases by one no matter which node updates. Finally, numerical experiments validate that our method speeds up linearly with respect to the number of nodes, and is robust to straggler nodes.

preprint, 2020, arXiv

Bayesian Filtering with Unknown Sensor Measurement Losses

This work studies the state estimation problem of a stochastic nonlinear system with unknown sensor measurement losses. If the estimator knows the sensor measurement losses of a linear Gaussian system, the minimum variance estimate is easily computed by the celebrated intermittent Kalman filter (IKF). However, this will no longer be the case when the measurement losses are unknown and/or the system is nonlinear or non-Gaussian. By exploiting the binary property of the measurement loss process and the IKF, we design three suboptimal filters for the state estimation, i.e., BKF-I, BKF-II and RBPF. The BKF-I is based on the MAP estimator of the measurement loss process and the BKF-II is derived by estimating the conditional loss probability. The RBPF is a particle filter based algorithm which marginalizes out the loss process to increase the efficiency of particles. All the proposed filters can be easily implemented in recursive forms. Finally, a linear system, a target tracking system and a quadrotor's path control problem are included to illustrate their effectiveness, and show the tradeoff between computational complexity and estimation accuracy of the proposed filters.
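For orientation, the known-loss baseline mentioned in this abstract, the intermittent Kalman filter, can be sketched for a scalar linear Gaussian system. This covers only the case where the estimator knows whether each measurement arrived; it does not implement the proposed BKF-I, BKF-II or RBPF. The function name `ikf_step` and the default model parameters `a`, `c`, `q`, `r` are assumptions chosen for illustration.

```python
def ikf_step(x_est, p_est, y, received, a=1.0, c=1.0, q=0.01, r=0.1):
    """One step of a scalar intermittent Kalman filter.

    Assumed model: x[k+1] = a*x[k] + w,  w ~ N(0, q)
                   y[k]   = c*x[k] + v,  v ~ N(0, r)
    `received` says whether y[k] actually arrived (known to the filter).
    Returns the updated state estimate and its error variance.
    """
    # Time update (prediction) always runs.
    x_pred = a * x_est
    p_pred = a * p_est * a + q

    # A lost measurement carries no information: keep the prediction.
    if not received:
        return x_pred, p_pred

    # Measurement update with the standard Kalman gain.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new
```

When a measurement is lost the error variance can only grow through the prediction step, which is why the filter needs the loss indicator; the abstract's setting is harder because that indicator is itself unknown.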

preprint, 2020, arXiv

DARWIN: A Highly Flexible Platform for Imaging Research in Radiology

To conduct a radiomics or deep learning research experiment, radiologists or physicians need to acquire the necessary programming skills, which can be frustrating and costly when they have limited coding experience. In this paper, we present DARWIN, a flexible research platform with a graphical user interface for medical imaging research. Our platform consists of a radiomics module and a deep learning module. The radiomics module can extract features of more than 1000 dimensions (first-, second-, and higher-order) and provides many draggable supervised and unsupervised machine learning models. Our deep learning module integrates state-of-the-art architectures for classification, detection, and segmentation tasks. It allows users to manually select hyperparameters, or choose an algorithm to automatically search for the best ones. DARWIN also offers the possibility for users to define a custom pipeline for their experiment. This flexibility enables radiologists to carry out various experiments easily.

preprint, 2020, arXiv

Decentralized Stochastic Gradient Tracking for Non-convex Empirical Risk Minimization

This paper studies a decentralized stochastic gradient tracking (DSGT) algorithm for non-convex empirical risk minimization problems over a peer-to-peer network of nodes, which is in sharp contrast to the existing DSGT only for convex problems. To ensure exact convergence and handle the variance among decentralized datasets, each node performs a stochastic gradient (SG) tracking step by using a mini-batch of samples, where the batch size is designed to be proportional to the size of the local dataset. We explicitly evaluate the convergence rate of DSGT with respect to the number of iterations in terms of algebraic connectivity of the network, mini-batch size, gradient variance, etc. Under certain conditions, we further show that DSGT has a network independence property in the sense that the network topology only affects the convergence rate up to a constant factor. Hence, the convergence rate of DSGT can be comparable to the centralized SGD method. Moreover, a linear speedup of DSGT with respect to the number of nodes is achievable for some scenarios. Numerical experiments for neural networks and logistic regression problems on CIFAR-10 finally illustrate the advantages of DSGT.
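The gradient tracking idea behind this abstract can be illustrated with a small deterministic sketch. It is not the paper's DSGT: exact local gradients replace mini-batch stochastic gradients, the local objectives are assumed to be the toy quadratics f_i(x) = 0.5 * (x - targets[i])**2, and the particular update order is one common form of gradient tracking. The names `gradient_tracking`, `targets` and `weights` are hypothetical.

```python
def gradient_tracking(targets, weights, step=0.1, iters=200):
    """Deterministic gradient tracking on scalar quadratics.

    Node i holds f_i(x) = 0.5 * (x - targets[i])**2, so the minimizer of
    the sum is the mean of `targets`. `weights` is assumed to be a doubly
    stochastic mixing matrix matching the network topology.
    """
    n = len(targets)

    def grad(i, x):
        return x - targets[i]

    x = [0.0] * n
    # Each tracker y_i starts at the local gradient and then follows
    # the network-average gradient.
    y = [grad(i, x[i]) for i in range(n)]

    for _ in range(iters):
        # Mix with neighbors, then step along the tracked gradient.
        x_new = [
            sum(weights[i][j] * x[j] for j in range(n)) - step * y[i]
            for i in range(n)
        ]
        # Mix trackers and add the change in the local gradient.
        y = [
            sum(weights[i][j] * y[j] for j in range(n))
            + grad(i, x_new[i]) - grad(i, x[i])
            for i in range(n)
        ]
        x = x_new
    return x
```

For example, with `targets = [1.0, 2.0, 6.0]` and a three-node mixing matrix with 0.5 on the diagonal and 0.25 elsewhere, every node's iterate approaches the mean of the targets, 3.0.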

preprint, 2020, arXiv

Defect segmentation: Mapping tunnel lining internal defects with ground penetrating radar data using a convolutional neural network

This research proposes a Ground Penetrating Radar (GPR) data processing method for non-destructive detection of tunnel lining internal defects, called defect segmentation. To perform this critical step of automatic tunnel lining detection, the method uses a CNN called SegNet combined with the Lovász softmax loss function to map the internal defect structure with GPR synthetic data, which improves the accuracy, automation and efficiency of defect detection. The novel method we present overcomes several difficulties of traditional GPR data interpretation, as demonstrated by an evaluation on both synthetic and real data. To verify the method on real data, a test model containing a known defect was designed and built, and GPR data was obtained and analyzed.

preprint, 2020, arXiv

Distributed Dual Gradient Tracking for Resource Allocation in Unbalanced Networks

This paper proposes a distributed dual gradient tracking algorithm (DDGT) to solve resource allocation problems over an unbalanced network, where each node in the network holds a private cost function and computes the optimal resource by interacting only with its neighboring nodes. Our key idea is the novel use of the distributed push-pull gradient algorithm (PPG) to solve the dual problem of the resource allocation problem. To study the convergence of the DDGT, we first establish the sublinear convergence rate of PPG for non-convex objective functions, which advances the existing results on PPG as they require the strong-convexity of objective functions. Then we show that the DDGT converges linearly for strongly convex and Lipschitz smooth cost functions, and sublinearly without the Lipschitz smoothness. Finally, experimental results suggest that DDGT outperforms existing algorithms.

preprint, 2019, arXiv

Flight Control for UAV Loitering Over a Ground Target with Unknown Maneuver

This paper proposes a flight controller for an unmanned aerial vehicle (UAV) to loiter over a ground moving target (GMT). We are concerned with the scenario in which the stochastically time-varying maneuver of the GMT is unknown to the UAV, which makes it challenging to estimate the GMT's motion state. Assuming that the state of the GMT is available, we first design a discrete-time Lyapunov vector field for the loitering guidance and then design a discrete-time integral sliding mode control (ISMC) to track the guidance commands. By modeling the maneuver process as a finite-state Markov chain, we propose a Rao-Blackwellised particle filter (RBPF), which requires only a small number of particles, to simultaneously estimate the motion state and the maneuver of the GMT with a camera or radar sensor. Then, we apply the principle of certainty equivalence to the ISMC and obtain the flight controller for completing the loitering task. Finally, the effectiveness and advantages of our controller are validated via simulations.

preprint, 2012, arXiv

Novel Photovoltaic Phenomenon in Manganite/ZnO Heterostructure

In this paper, we report a novel photovoltaic phenomenon in a low-cost manganite/ZnO p-n heterojunction grown on an ITO glass substrate by pulsed laser deposition (PLD) at a relatively low growth temperature. The heterostructure ITO/La0.62Ca0.29K0.09MnO3 (LCKMO)/ZnO/Al exhibits reproducible rectifying characteristics and consists of four parts: an aluminum top electrode, an ITO bottom electrode, and manganite and ZnO semiconductor films. Moreover, a photocurrent is generated under continuous laser (λ = 325 nm) irradiation. In this article, we investigate the influence of manganite and ZnO film thickness on the electrical and photoelectric characteristics of the heterostructure at room temperature. The maximum power conversion efficiency (PCE) is achieved not only when the LCKMO and ZnO layers are thin enough, but also when the full space charge layer is sufficient. We obtain the maximum PCE (0.0145%) when the thicknesses of the LCKMO and ZnO layers are 25 nm and 150 nm, respectively. Under this condition, the open-circuit voltage is 0.04 V, which can be attributed to internal photoemission.

preprint, 2010, arXiv

Impact of Mistiming on the Achievable Information Rate of Rake Receivers in DS-UWB Systems

In this paper, we investigate the impact of mistiming on the performance of Rake receivers in direct-sequence ultra-wideband (DS-UWB) systems from the perspective of the achievable information rate. A generalized expression for the performance degradation due to mistiming is derived. Monte Carlo simulations based on this expression are then conducted, which demonstrate that the performance loss has little relationship with the target achievable information rate, but varies significantly with the system bandwidth and the multipath diversity order, which reflects design trade-offs among the system timing requirement, the bandwidth and the implementation complexity. In addition, the performance degradations of Rake receivers with different multipath component selection schemes and combining techniques are compared. Among these receivers, the widely used maximal ratio combining (MRC) selective-Rake (S-Rake) suffers the largest performance loss in the presence of mistiming.
