Source author record

Yuqi Wang

Yuqi Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Machine Learning, cond-mat.mes-hall, cond-mat.quant-gas, cond-mat.str-el, Cryptography and Security, eess.SP, physics.optics, quant-ph, Robotics, Social and Information Networks

Catalog footprint

What is connected

14 works

12 topics

4 close collaborators

preprint · 2026 · arXiv

D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents

Graphical User Interface (GUI) agents aim to automate a wide spectrum of human tasks by emulating user interaction. Despite rapid advancements, current approaches are hindered by several critical challenges: a data bottleneck in end-to-end training, the high cost of delayed error detection, and the risk of contradictory guidance. Inspired by the human cognitive loop of Thinking, Alignment, and Reflection, we present D-Artemis, a novel deliberative framework. D-Artemis leverages a fine-grained, app-specific tip retrieval mechanism to inform its decision-making process. It also employs a proactive Pre-execution Alignment stage, where a Thought-Action Consistency (TAC) Check module and an Action Correction Agent (ACA) work in concert to mitigate the risk of execution failures. A post-execution Status Reflection Agent (SRA) completes the cognitive loop, enabling strategic learning from experience. Crucially, D-Artemis enhances the capabilities of general-purpose multimodal large language models (MLLMs) for GUI tasks without the need for training on complex trajectory datasets, demonstrating strong generalization. D-Artemis establishes new state-of-the-art (SOTA) results across both major benchmarks, achieving a 75.8% success rate on AndroidWorld and 96.8% on ScreenSpot-V2. Extensive ablation studies further demonstrate the significant contribution of each component to the framework.

preprint · 2026 · arXiv

Generation Navigator: A State-Aware Agentic Framework for Image Generation

Despite rapid advances in text-to-image generation, faithfully realizing user intent remains challenging, often requiring manual multi-turn trial and error. To automate this process, existing systems rely on either simple prompt rewriting or closed-loop agents driven by hand-crafted rules, rather than learning to adapt actions to the evolving generation process. In this paper, we reformulate image generation as a state-conditioned action-making problem and propose Generation Navigator, a multi-turn T2I agent that learns to dynamically steer the generation trajectory and output the next action. However, training this agent via reinforcement learning introduces a critical credit assignment challenge: naively rewarding a trajectory based solely on a single state assigns equal credit to all actions in the rollout, ignores the quality dynamics across turns, and fails to distinguish actions that improve the trajectory from those that degrade it or waste turns without progress. We resolve this with PRE-GRPO (Peak-Retention-Efficiency Group Relative Policy Optimization), a trajectory-level reinforcement learning objective that explicitly rewards discovering a high-quality image (Peak), avoiding subsequent quality degradation across turns (Retention), and minimizing unnecessary turns (Efficiency). Experiments show substantial improvements across benchmarks, reaching a WISE score of 0.90 and 79.06% reasoning accuracy on T2I-ReasonBench.

preprint · 2026 · arXiv

Image-to-Video Diffusion: From Foundations to Open Frontiers

Diffusion-based image-to-video (I2V) generation has become a central direction in generative models by turning a reference image, with optional conditions, into a temporally coherent video. Compared with broader video generation settings, this task places stricter demands on content consistency, identity preservation, and motion coherence. Although the literature grows rapidly, existing works mostly discuss I2V generation within broader topics and still lack a dedicated taxonomy together with a systematic analysis centered on this field. This work addresses that gap by treating diffusion I2V generation as a standalone subject. It first reviews the task formulation, model architectures, datasets, and evaluation metrics, and then organizes existing methods through a taxonomy based on architecture and training paradigm. It further distills four core designs, namely condition encoding, temporal modeling, noise prior design, and spatial-temporal upsampling, and discusses representative application scenarios together with major open challenges.

preprint · 2026 · arXiv

R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations

Relative spatial relations provide a compact representation of spatial structure and are fundamental to relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer such relations, but the inferred relations are often unreliable and are typically handled with post-hoc heuristics. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. Our key motivation is that multi-hop reasoning requires repeated reference-frame transformations, which accumulate errors in inferred relations and lead to semantic and metric drift. To mitigate this, we propose invariant spatial decomposition to break coupled relation chains, and consistent spatial imagination to promote self-consistency through an imagine-and-revise loop. We further introduce supportive spatial optimization to ease pose optimization via global-to-local coordinate re-parameterization. Extensive experiments across diverse scene types and instructions demonstrate that R$^3$L produces more physically feasible and semantically consistent layouts. Notably, our analysis shows that resolving frame-induced inconsistencies is crucial for reliable multi-hop relative spatial reasoning. The code is available at https://github.com/Neal2020GitHub/R3L.

preprint · 2022 · arXiv

A Robust Ensemble Model for Parasitic Egg Detection and Classification

Intestinal parasitic infections, a leading cause of morbidity worldwide, still lack a time-saving, high-sensitivity, and user-friendly examination method. The development of deep learning techniques reveals their broad application potential in biological imaging. In this paper, we apply several object detectors, such as YOLOv5 and variants of Cascade R-CNN, to automatically discriminate parasitic eggs in microscope images. Through specially designed optimization, including raw data augmentation, model ensembling, transfer learning, and test-time augmentation, our model achieves excellent performance on the challenge dataset. In addition, our model trained with added noise gains high robustness against polluted input, which further broadens its applicability in practice.

preprint · 2022 · arXiv

Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks

Representation is a core issue in artificial intelligence. Humans use discrete language to communicate and learn from each other, while machines use continuous features (such as vectors, matrices, or tensors in deep neural networks) to represent cognitive patterns. Discrete symbols are low-dimensional, decoupled, and have strong reasoning ability, while continuous features are high-dimensional, coupled, and have incredible abstracting capabilities. In recent years, deep learning has developed the idea of continuous representation to the extreme, using millions of parameters to achieve high accuracies. Although this is reasonable from the statistical perspective, it has other major problems, such as a lack of interpretability, poor generalization, and vulnerability to attack. Since both paradigms have strengths and weaknesses, a better choice is to seek reconciliation. In this paper, we make an initial attempt in this direction. Specifically, we propose to combine symbolism and connectionism principles by using neural networks to derive a discrete representation. This process is highly similar to human language, which is a natural combination of discrete symbols and neural systems, where the brain processes continuous signals and represents intelligence via discrete language. To mimic this functionality, we denote our approach as machine language. By designing an interactive environment and task, we demonstrate that machines can generate a spontaneous, flexible, and semantic language through cooperation. Moreover, through experiments we show that discrete language representation has several advantages over continuous feature representation in terms of interpretability, generalization, and robustness.

preprint · 2022 · arXiv

Fluctuation assisted collapses of Bose-Einstein condensates

We study the collapse dynamics of a Bose-Einstein condensate subjected to a sudden change of the scattering length to a negative value by adopting the self-consistent Gaussian state theory for mixed states. In contrast to the Gross-Pitaevskii and the Hartree-Fock-Bogoliubov approaches, both fluctuations and three-body loss are properly treated in our theory. We find a new type of collapse assisted by fluctuations, which amplify the attractive interaction between atoms. Moreover, calculations of the fluctuated atoms, the entropy, and the second-order correlation function show that the collapsed gas deviates significantly from a pure state.

preprint · 2022 · arXiv

Hermite-Gaussian-mode coherently composed states and deep learning based free-space optical communication link

In laser-based free-space optical communication, besides OAM beams, Hermite-Gaussian (HG) modes or HG-mode coherently composed states (HG-MCCS) can also be adopted as the information carrier to extend the channel capacity with a spatial-pattern-based encoding and decoding link. The light field of an HG-MCCS is mainly determined by three independent parameters: the indexes of the HG modes, the relative initial phases between two eigenmodes, and the scale coefficients of the eigenmodes. Together these yield a large number of effective coding modes at a low mode order. The beam intensity distributions of the HG-MCCSs have clearly distinguishable spatial characteristics and are propagation invariant, which makes them convenient to decode with a convolutional neural network (CNN) based image recognition method. We experimentally utilize HG-MCCS to realize a communication link including encoding, transmission under atmospheric turbulence (AT), and decoding based on CNN. With the index order of eigenmodes within six, 125 HG-MCCS are generated and used for information encoding, and the average recognition accuracy reaches 99.5% for non-AT conditions. For the transmission of 125-level color images, the error rate of the system is less than 1.8% even under the weak AT condition. Our work provides a useful basis for the future combination of dense data communication and artificial intelligence technology.

preprint · 2022 · arXiv

Hessian-Free Second-Order Adversarial Examples for Adversarial Learning

Recent studies show that deep neural networks (DNNs) are extremely vulnerable to elaborately designed adversarial examples. Adversarial learning with those adversarial examples has proved to be one of the most effective methods to defend against such attacks. At present, most existing methods for generating adversarial examples are based on first-order gradients, which can hardly further improve models' robustness, especially when facing second-order adversarial attacks. Compared with first-order gradients, second-order gradients provide a more accurate approximation of the loss landscape with respect to natural examples. Inspired by this, our work crafts second-order adversarial examples and uses them to train DNNs. Nevertheless, second-order optimization involves a time-consuming calculation of the Hessian inverse. We propose an approximation method that transforms the problem into an optimization in the Krylov subspace, which remarkably reduces the computational complexity and speeds up the training procedure. Extensive experiments conducted on the MNIST and CIFAR-10 datasets show that our adversarial learning with second-order adversarial examples outperforms other first-order methods and can improve model robustness against a wide range of attacks.

preprint · 2022 · arXiv

Neuro-Symbolic Learning: Principles and Applications in Ophthalmology

Neural networks have been rapidly expanding in recent years, with novel strategies and applications. However, challenges such as interpretability, explainability, robustness, safety, trust, and sensibility remain unsolved in neural network technologies, even though they must unavoidably be addressed for critical applications. Attempts have been made to overcome the challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. Thus, the notion of neuro-symbolic learning (NeSyL) emerged, which incorporates aspects of symbolic representation and brings common sense into neural networks. In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question-answering and reasoning, health informatics, and genomics, NeSyL has shown promising outcomes. This review presents a comprehensive survey of state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as ophthalmology, and, most importantly, future perspectives of this emerging field.

preprint · 2022 · arXiv

Portrait of locally driven quantum phase transition cascades in a molecular monolayer

Strongly interacting electrons in layered materials give rise to a plethora of emergent phenomena, such as unconventional superconductivity, heavy fermions, and spin textures with non-trivial topology. Similar effects can also be observed in bulk materials, but the advantage of two-dimensional (2D) systems is the combination of local accessibility by microscopic techniques and tuneability. In stacks of 2D materials, for example, the twist angle can be employed to tune their properties. However, while material choice and twist angle are global parameters, the full complexity and potential of such correlated 2D electronic lattices will only reveal itself when tuning their parameters becomes possible on the level of individual lattice sites. Here, we discover a lattice of strongly correlated electrons in a perfectly ordered 2D supramolecular network by driving this system through a cascade of quantum phase transitions using a movable atomically sharp electrostatic gate. As the gate field is increased, the molecular building blocks change from a Kondo-screened to a paramagnetic phase one by one, enabling us to reconstruct their complex interactions in detail. We anticipate that the supramolecular nature of the system will in the future allow quantum correlations to be engineered in arbitrary patterned structures.

preprint · 2020 · arXiv

Enhancing Rumor Detection in Social Media Using Dynamic Propagation Structures

Social media, such as Facebook and Twitter, has become one of the most important channels for information dissemination. However, these social media platforms are often misused to spread rumors, which has brought about severe social problems, and consequently there is an urgent need for automatic rumor detection techniques. Existing work on rumor detection concentrates more on the utilization of textual features, but the diffusion structure itself can provide critical propagation information for identifying rumors. Previous works that have considered structural information utilize only limited propagation structures. Moreover, little related research has considered the dynamic evolution of diffusion structures. To address these issues, in this paper, we propose a Neural Model using Dynamic Propagation Structures (NM-DPS) for rumor detection in social media. First, we propose a partition approach to model the dynamic evolution of the propagation structure and then use a temporal-attention-based neural model to learn a representation for the dynamic structure. Finally, we fuse the structure representation and content features into a unified framework for effective rumor detection. Experimental results on two real-world social media datasets demonstrate the salience of dynamic propagation structure information and the effectiveness of our proposed method in capturing the dynamic structure.

preprint · 2020 · arXiv

Symmetry mediated tunable molecular magnetism on a 2D material

The induction of unconventional superconductivity by twisting two layers of graphene by a small angle was groundbreaking, and has since attracted widespread attention to novel phenomena caused by lattice or angle mismatch between two-dimensional (2D) materials. While many studies address the influence of angle mismatch between layered 2D materials, the impact of the adsorption alignment on the physical properties of planar molecules on 2D substrates has not been studied in detail. Using scanning probe microscopy (SPM), we show that individual cobalt phthalocyanine (CoPc) molecules adsorbed on the layered superconductor 2H-NbSe2 drastically change their charge and spin state when the symmetry axes of the molecule and the substrate are twisted with respect to each other. The CoPc changes from an effective spin-1/2, as found in the gas phase, to a molecule with a non-magnetic ground state. On the latter we observe a singlet-triplet transition originating from an antiferromagnetic interaction between the central-ion spin and a distributed magnetic moment on the molecular ligands. Because the Ising superconductor 2H-NbSe2 lacks inversion symmetry and has large spin-orbit coupling, this intramolecular magnetic exchange has a significant non-collinear Dzyaloshinskii-Moriya (DM) contribution.

preprint · 2015 · arXiv

Symmetric Ternary Quantum Homomorphic Encryption Schemes Based on the Ternary Quantum One-Time Pad

Aiming at a ternary quantum logic circuit, four symmetric ternary quantum homomorphic encryption schemes, based on the ternary quantum one-time pad, were presented. First, for a one-qutrit rotation gate, a homomorphic quantum encryption scheme was constructed. Second, in view of the synthesis of a 3x3 general unitary transformation, another one-qutrit quantum homomorphic encryption scheme was proposed. Third, according to the one-qutrit scheme, the two-qutrit quantum homomorphic encryption scheme for the GCX(m') gate was constructed and was further generalized to the n-qutrit unitary matrix case. Finally, the security of these schemes was analyzed from two perspectives. It could be concluded that the attacker can correctly guess the encryption key with a maximum probability $p_k = 1/3^{3n}$, so the schemes can better protect the privacy of users' data. Moreover, these schemes can be well integrated into future quantum remote server architecture, and the computational security of the user's private quantum information can be well solved in a distributed computing environment.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint