Source author record

Yiming Du

Yiming Du appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

eess.SP, Information Theory, math.IT, Computation and Language, Computer Vision, Software Engineering

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

Preprint, 2026, arXiv

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

Memory is essential for large vision-language models (LVLMs) to handle long, multimodal interactions, with two method directions providing this capability: long-context LVLMs and memory-augmented agents. However, no existing benchmark conducts a systematic comparison of the two on questions that genuinely require multimodal evidence. To close this gap, we introduce MEMLENS, a comprehensive benchmark for memory in multimodal multi-session conversations, comprising 789 questions across five memory abilities (information extraction, multi-session reasoning, temporal reasoning, knowledge update, and answer refusal) at four standard context lengths (32K-256K tokens) under a cross-modal token-counting scheme. An image-ablation study confirms that solving MEMLENS requires visual evidence: removing evidence images drops two frontier LVLMs below 2% accuracy on the 80.4% of questions whose evidence includes images. Evaluating 27 LVLMs and 7 memory-augmented agents, we find that long-context LVLMs achieve high short-context accuracy through direct visual grounding but degrade as conversations grow, whereas memory agents are length-stable but lose visual fidelity under storage-time compression. Multi-session reasoning caps most systems below 30%, and neither approach alone solves the task. These results motivate hybrid architectures that combine long-context attention with structured multimodal retrieval. Our code is available at https://github.com/xrenaf/MEMLENS.

Preprint, 2026, arXiv

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-of-the-art performance in software engineering (SWE) issue resolving. In contrast to prevalent methods that rely on complex training paradigms (e.g., mid-training, SFT, reinforcement learning, and their combinations), we explore how to push the limits of a lightweight SFT-only approach for SWE tasks. SWE-Lego comprises three core building blocks, with key findings summarized as follows: 1) the SWE-Lego dataset, a collection of 32k high-quality task instances and 18k validated trajectories, combining real and synthetic data to complement each other in both quality and quantity; 2) a refined SFT procedure with error masking and a difficulty-based curriculum, which demonstrably improves action quality and overall performance. Empirical results show that with these two building blocks alone, SFT can push SWE-Lego models to state-of-the-art performance among open-source models of comparable size on SWE-bench Verified: SWE-Lego-Qwen3-8B reaches 42.2%, and SWE-Lego-Qwen3-32B attains 52.6%. 3) We further evaluate and improve test-time scaling (TTS) built upon the SFT foundation. Based on a well-trained verifier, SWE-Lego models can be significantly boosted: for example, from 42.2% to 49.6% and from 52.6% to 58.8% under TTS@16 for the 8B and 32B models, respectively.
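The abstract does not say how the verifier is applied at test time. One common form of verifier-based test-time scaling is best-of-N selection: sample N candidate patches, score each with the verifier, and submit the top-scoring one. The sketch below illustrates that generic idea only. The names `select_best_of_n` and `toy_verifier` are hypothetical, and the toy scoring rule is a stand-in, not the SWE-Lego verifier.

```python
from typing import Callable, Sequence


def select_best_of_n(
    candidates: Sequence[str],
    verifier_score: Callable[[str], float],
) -> str:
    """Return the candidate with the highest verifier score.

    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=verifier_score)


def toy_verifier(patch: str) -> float:
    """Hypothetical stand-in for a learned verifier: counts the word 'fix'."""
    return float(patch.count("fix"))


# Under TTS@16 the list would hold 16 sampled patches; three suffice here.
patches = ["noop", "fix bug", "fix fix"]
best = select_best_of_n(patches, toy_verifier)  # "fix fix" (score 2.0)
```

In this scheme the accuracy gain depends entirely on how well the verifier ranks correct patches above incorrect ones, which is consistent with the abstract's emphasis on a well-trained verifier.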

Preprint, 2021, arXiv

Terahertz Multi-User Massive MIMO with Intelligent Reflecting Surface: Beam Training and Hybrid Beamforming

Terahertz (THz) communications open a new frontier for the wireless network thanks to their dramatically wider available bandwidth compared to the current microwave and forthcoming millimeter-wave communications. However, due to the short wavelength of THz waves, they also suffer from severe path attenuation and poor diffraction. To compensate for the THz-induced propagation loss, this paper proposes to combine two promising techniques, viz., massive multiple input multiple output (MIMO) and intelligent reflecting surface (IRS), in THz multi-user communications, considering their significant beamforming and aperture gains. Nonetheless, channel estimation and low-cost beamforming turn out to be two main obstacles to realizing this combination, due to the passivity of IRS for sending/receiving pilot signals and the large-scale use of expensive RF chains in massive MIMO. In view of these limitations, this paper first develops a cooperative beam training scheme to facilitate the channel estimation with IRS. In particular, we design two different hierarchical codebooks for the proposed training procedure, which are able to balance robustness against noise and searching complexity. Based on the training results, we further propose two cost-efficient hybrid beamforming (HB) designs for single-user and multi-user scenarios, respectively. Simulation results demonstrate that the proposed joint beam training and HB scheme achieves performance close to that of the optimal fully digital beamforming (FDB), even when the latter is implemented with perfect channel state information (CSI).

Preprint, 2020, arXiv

Channel Estimation and Transmission for Intelligent Reflecting Surface Assisted THz Communications

Intelligent reflecting surface (IRS) is envisioned as a promising technology to broaden signal coverage and enhance transmission in terahertz (THz) communications. Due to the passivity of IRS, channel measurement cannot be performed with traditional pilot-based methods, and the subsequent cooperative transmission design remains an open problem. This paper investigates channel estimation and transmission solutions for a massive multiple input multiple output (MIMO) IRS-assisted THz system. The channel estimation is realized by beam training, and the quantization error is analyzed to evaluate performance. In addition, a novel hierarchical search codebook design is proposed as a low-complexity basis of beam training. Building on these foundations, we propose a cooperative channel estimation procedure to acquire the channel knowledge. Finally, by leveraging the obtained channel information, the designs of IRS and transceivers are directly provided in closed form without reconstructing the full channel matrix or additional optimization. Simulation and numerical results are presented to illustrate the minimum signal-to-noise ratio (SNR) required for beam training and the efficacy of the proposed transmission solutions.
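To illustrate why a hierarchical codebook lowers beam-training complexity, the following is a minimal sketch of a generic binary-tree beam search. It is not the codebook design proposed in the paper. It assumes a noiseless, single-path, half-wavelength uniform linear array with no IRS, and it widens beams by activating only 2^k antennas at layer k. The function names and the spatial-frequency parameter `u_target` are hypothetical.

```python
import cmath
import math
from typing import Tuple


def beam_gain(u_target: float, u_center: float, n_active: int) -> float:
    """Magnitude of w^H a when only the first n_active antennas are used.

    u_target and u_center are spatial frequencies in [-1, 1); fewer active
    antennas give a wider beam.
    """
    total = sum(
        cmath.exp(1j * math.pi * m * (u_target - u_center))
        for m in range(n_active)
    )
    return abs(total) / math.sqrt(n_active)


def hierarchical_search(u_target: float, num_layers: int) -> Tuple[int, int]:
    """Return (index of the selected narrowest beam, measurements used)."""
    index = 0
    measurements = 0
    for layer in range(1, num_layers + 1):
        n_beams = 2 ** layer  # beams in this layer, also active antennas
        children = (2 * index, 2 * index + 1)
        gains = []
        for child in children:
            # Centre of the child's sector of width 2 / n_beams.
            center = -1.0 + (2 * child + 1) / n_beams
            gains.append(beam_gain(u_target, center, n_beams))
            measurements += 1
        # Descend into the child beam that received more power.
        index = children[0] if gains[0] >= gains[1] else children[1]
    return index, measurements


# 16 narrow beams (4 layers): the search takes 8 measurements instead of 16.
selected, used = hierarchical_search(u_target=0.3, num_layers=4)
```

Under these assumptions the search needs 2·log2(N) measurements for N narrow beams, compared with N for an exhaustive sweep. With noise, the wide beams in the upper layers have lower gain and are more error-prone, which is the robustness-versus-complexity trade-off that hierarchical codebook designs aim to balance.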