Source author record

Dong Fang

Dong Fang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · Computer Vision · Information Theory · math.IT · Computation and Language · eess.IV

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory

Long-horizon conversational agents rely on memory systems with increasingly sophisticated retrieval mechanisms. However, retrieved fragments are typically fed to the language model as unstructured text, lacking the relational, temporal, and thematic structures essential for complex reasoning. To bridge this reasoning gap, we introduce GRAVITY (Generation-time Relational Anchoring Via Injected Topological MemorY), a plug-and-play structured memory module. GRAVITY extracts three complementary knowledge representations from raw conversational utterances: entity profiles grounded in relational graphs, temporal event tuples linked into causal traces, and cross-session topic summaries. At generation time, it injects these representations into the host system's prompt as structured anchoring contexts. This approach effectively synthesizes scattered evidence into a coherent, query-relevant context without requiring any architectural modifications to the host model. Extensive evaluations across five diverse memory systems on the LongMemEval and LoCoMo benchmarks demonstrate the efficacy of our approach. On average, GRAVITY improves LLM-judge accuracy by 7.5–10.1%. Gains are inversely correlated with baseline strength: the weakest host improves by 12.2% while the strongest still gains 3.8–5.7%. These findings establish structured context anchoring as a broadly effective, architecture-agnostic augmentation paradigm for long-horizon conversational memory.
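The generation-time injection step described in this abstract can be pictured with a small sketch. Everything below is an assumption made for illustration: the abstract does not give the prompt layout, so the class names (`EntityProfile`, `TemporalEvent`), the function `build_anchored_prompt`, and the bracketed section labels are hypothetical and are not taken from the GRAVITY paper.

```python
from dataclasses import dataclass


@dataclass
class EntityProfile:
    # Hypothetical container: an entity plus (relation, other entity) pairs.
    name: str
    relations: list[tuple[str, str]]


@dataclass
class TemporalEvent:
    # Hypothetical container: ISO-8601 date string plus a short description.
    timestamp: str
    description: str


def build_anchored_prompt(query, retrieved, entities, events, topic_summaries):
    """Place structured context ahead of the host system's retrieved text.

    The section labels and their order are illustrative assumptions.
    """
    lines = ["[Entity profiles]"]
    for entity in entities:
        rels = "; ".join(f"{rel} {other}" for rel, other in entity.relations)
        lines.append(f"- {entity.name}: {rels}")

    lines.append("[Event timeline]")
    # ISO-8601 strings sort chronologically when compared as text.
    for event in sorted(events, key=lambda e: e.timestamp):
        lines.append(f"- {event.timestamp}: {event.description}")

    lines.append("[Topic summaries]")
    lines.extend(f"- {summary}" for summary in topic_summaries)

    lines.append("[Retrieved fragments]")
    lines.extend(f"- {fragment}" for fragment in retrieved)

    lines.append("[Question]")
    lines.append(query)
    return "\n".join(lines)


prompt = build_anchored_prompt(
    query="When did Ana move?",
    retrieved=["Ana: the move went fine."],
    entities=[EntityProfile("Ana", [("lives in", "Lisbon")])],
    events=[
        TemporalEvent("2024-03-02", "Ana started a new job"),
        TemporalEvent("2024-01-15", "Ana moved to Lisbon"),
    ],
    topic_summaries=["Relocation: Ana planned and completed a move."],
)
```

The host model is untouched in this sketch; only the prompt string changes, which is the sense in which the abstract calls the module plug-and-play.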

preprint · 2026 · arXiv

Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs via Interpretable Bi-Causal Steering

Object hallucination critically undermines the reliability of Multimodal Large Language Models, often stemming from a fundamental failure in cognitive introspection, where models blindly trust linguistic priors over specific visual evidence. Existing mitigations remain limited: contrastive decoding approaches operate superficially without rectifying internal semantic misalignments, while current latent steering methods rely on static vectors that lack instance-specific precision. We introduce Vision-Language Introspection (VLI), a training-free inference framework that simulates a metacognitive self-correction process. VLI first performs Attributive Introspection to diagnose hallucination risks via probabilistic conflict detection and localize the causal visual anchors. It then employs Interpretable Bi-Causal Steering to actively modulate the inference process, dynamically isolating visual evidence from background noise while neutralizing blind confidence through adaptive calibration. VLI achieves state-of-the-art performance on advanced models, reducing object hallucination rates by 12.67% on MMHal-Bench and improving accuracy by 5.8% on POPE.

preprint · 2022 · arXiv

RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark

Ophthalmologists use fundus images to screen for and diagnose eye diseases. However, differences in equipment and among ophthalmologists introduce large variations in the quality of fundus images. Low-quality (LQ) degraded fundus images easily lead to uncertainty in clinical screening and generally increase the risk of misdiagnosis. Thus, real fundus image restoration is worth studying. Unfortunately, no real clinical benchmark has been explored for this task so far. In this paper, we investigate the real clinical fundus image restoration problem. First, we establish a clinical dataset, Real Fundus (RF), including 120 low- and high-quality (HQ) image pairs. Then we propose a novel Transformer-based Generative Adversarial Network (RFormer) to restore the real degradation of clinical fundus images. The key component in our network is the Window-based Self-Attention Block (WSAB), which captures non-local self-similarity and long-range dependencies. To produce more visually pleasing results, a Transformer-based discriminator is introduced. Extensive experiments on our clinical benchmark show that the proposed RFormer significantly outperforms the state-of-the-art (SOTA) methods. In addition, experiments on downstream tasks such as vessel segmentation and optic disc/cup detection demonstrate that our proposed RFormer benefits clinical fundus image analysis and applications. The dataset, code, and models are publicly available at https://github.com/dengzhuo-AI/Real-Fundus

preprint · 2016 · arXiv

Lattice Partition Multiple Access: A New Method of Downlink Non-orthogonal Multiuser Transmissions

In this paper, we propose a new downlink non-orthogonal multiuser superposition transmission scheme for future 5G cellular networks, which we refer to as the lattice partition multiple access (LPMA). In this proposed design, the base station transmits multilevel lattice codes for multiple users. Each user's code level corresponds to a distinct prime and is weighted by a product of all distinct primes of the other users excluding its own. Due to the structural property of lattice codes, each user can cancel out the interference from the other code levels by using the modulo lattice operation in a successive/parallel manner. LPMA can overcome the drawback of non-orthogonal multiple access (NOMA), which arises when users have similar channel conditions. We demonstrate that the proposed LPMA shows a clear throughput enhancement over the current NOMA scheme.
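The prime-weighting idea in this abstract can be illustrated with a toy integer analogue. This is a sketch under strong assumptions, not the LPMA scheme itself: it uses plain integers in place of multilevel lattice codes, ignores noise, power allocation, and the channel, and the function names `superpose` and `recover` are hypothetical. It only shows why weighting each level by the product of the other users' primes lets a modulo operation remove the other levels.

```python
from math import prod


def superpose(symbols, primes):
    """Sum the users' symbols, each weighted by the product of the OTHER primes."""
    total = 0
    for i, (symbol, p) in enumerate(zip(symbols, primes)):
        if not 0 <= symbol < p:
            raise ValueError("each symbol must lie in 0..p-1 for its prime p")
        weight = prod(q for j, q in enumerate(primes) if j != i)
        total += weight * symbol
    return total


def recover(received, user, primes):
    """Recover one user's symbol from the noiseless superposition.

    Every other level carries a factor of this user's prime, so it
    vanishes under reduction modulo that prime.
    """
    p = primes[user]
    weight = prod(q for j, q in enumerate(primes) if j != user)
    # weight is coprime to p, so it has a multiplicative inverse modulo p.
    return (received % p) * pow(weight, -1, p) % p


primes = [3, 5, 7]
symbols = [2, 4, 6]
x = superpose(symbols, primes)           # 35*2 + 21*4 + 15*6 = 244
decoded = [recover(x, i, primes) for i in range(len(primes))]
```

In this toy, each user decodes independently of the others, which loosely mirrors the "parallel" cancellation mentioned in the abstract. The three-argument `pow` with exponent `-1` requires Python 3.8 or later.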

preprint · 2012 · arXiv

Linear Physical-layer Network Coding in Galois Field for Rayleigh fading 2-Way Relay Channels

In this paper, we propose a novel linear physical-layer network coding (LPNC) scheme for Rayleigh fading 2-way relay channels (2-WRC). Rather than the simple modulo-2 (bit-XOR) operation, the relay directly maps the superimposed signal of the two users into the linear network-coded combination in GF(2^2) by multiplying the user data by a properly selected generator matrix. We derive the constellation-constrained capacities for LPNC and 5QAM denoise-and-forward (5QAM-DNF) [2] and further explicitly characterize the capacity difference between LPNC and 5QAM-DNF. Based on our analysis and simulation, we highlight that, without employing the irregular 5QAM mapping or sacrificing spectral efficiency, our LPNC in GF(2^2) is superior to the 5QAM-DNF scheme in the low SNR regime, while the two achieve equal performance in the moderate-to-high SNR regime.
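The algebra behind a linear network-coded combination in GF(2^2) can be sketched as follows. This is only an illustration of the field arithmetic, under stated assumptions: it does not model the superimposed received signal, Rayleigh fading, or how the paper selects its generator matrix. The scalar coefficients `a` and `b` stand in for that selection, and the helper names (`gf4_mul`, `relay_combine`, `recover_other`) are hypothetical.

```python
# GF(4) elements 0..3 encode polynomials over GF(2) modulo x^2 + x + 1:
# 0 -> 0, 1 -> 1, 2 -> x, 3 -> x + 1.


def gf4_add(a, b):
    """Addition in GF(4) is bitwise XOR; subtraction is the same operation."""
    return a ^ b


def gf4_mul(a, b):
    """Carry-less multiply, then reduce using x^2 = x + 1."""
    result = 0
    for bit in range(2):
        if (b >> bit) & 1:
            result ^= a << bit
    if result & 0b100:          # an x^2 term is present
        result ^= 0b111         # subtract x^2 + x + 1
    return result


def gf4_inv(a):
    """Multiplicative inverse by search over the three nonzero elements."""
    for candidate in (1, 2, 3):
        if gf4_mul(a, candidate) == 1:
            return candidate
    raise ZeroDivisionError("0 has no inverse in GF(4)")


def relay_combine(s1, s2, a, b):
    """Network-coded symbol a*s1 + b*s2 broadcast by the relay."""
    return gf4_add(gf4_mul(a, s1), gf4_mul(b, s2))


def recover_other(combined, own_symbol, own_coeff, other_coeff):
    """A user removes its own contribution and inverts the other coefficient."""
    residual = gf4_add(combined, gf4_mul(own_coeff, own_symbol))
    return gf4_mul(gf4_inv(other_coeff), residual)


coded = relay_combine(s1=2, s2=3, a=1, b=2)
s2_at_user1 = recover_other(coded, own_symbol=2, own_coeff=1, other_coeff=2)
s1_at_user2 = recover_other(coded, own_symbol=3, own_coeff=2, other_coeff=1)
```

With `a = b = 1` the combination reduces to bitwise XOR of the 2-bit symbols, the baseline the abstract contrasts against. Nonzero coefficients other than 1 give the additional linear mappings that GF(2^2) makes available.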