Source author record

Zijing Wang

Zijing Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Computation and Language, Information Theory, math.IT, eess.SP

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators


preprint, 2026, arXiv

DiM³: Bridging Multilingual and Multimodal Models via Direction- and Magnitude-Aware Merging

Towards more general and human-like intelligence, large language models should seamlessly integrate both multilingual and multimodal capabilities; however, extending an existing multimodal model to many languages typically requires expensive multilingual multimodal data construction and repeated end-to-end retraining. We study a training-free alternative: injecting multilingual capability into an existing multimodal model by composing residual updates in the shared language model backbone. The key challenge is that multilingual and multimodal updates are heterogeneous, reflecting different functional roles in the shared model. To address this, we propose Direction- and Magnitude-aware Multilingual Multimodal merging (DiM3), which selectively composes the two updates at each parameter dimension while preserving the original vision encoder and multimodal projector. Experiments on multilingual benchmarks in both text-only and vision-language settings, covering 57 languages across LLaVA- and Qwen-based backbones, show that DiM3 consistently outperforms existing merging baselines, substantially improves multilingual performance over the original multimodal model, and remains competitive with dedicated multilingual multimodal fine-tuning while largely retaining general multimodal ability. We further show that DiM3 can be directly applied to already trained multilingual multimodal models and still yield additional gains. Further interpretability analysis shows that DiM3 primarily reshapes intermediate-layer semantic representations, strengthening cross-lingual alignment under both text-only and multimodal inputs while preserving higher-layer task-sensitive structure. Our repository is on https://github.com/wzj1718/DiM3.
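To make "composing the two updates at each parameter dimension" concrete, here is a minimal sketch of an elementwise merge of two residual updates. The rule used here (add the updates where their signs agree, keep the larger-magnitude update where they conflict) is an assumption chosen for illustration, as is the function name `merge_updates`. It is not the DiM3 rule, which the abstract does not spell out.

```python
import numpy as np

def merge_updates(base, multilingual, multimodal):
    """Compose two residual updates elementwise (illustrative rule, not DiM3's)."""
    d_lang = multilingual - base   # multilingual residual update
    d_mm = multimodal - base       # multimodal residual update
    same_direction = np.sign(d_lang) == np.sign(d_mm)
    # Hypothetical rule: add agreeing updates, keep the larger one on conflict.
    larger = np.where(np.abs(d_lang) >= np.abs(d_mm), d_lang, d_mm)
    merged = np.where(same_direction, d_lang + d_mm, larger)
    return base + merged

# Toy parameter vectors standing in for one tensor of the shared backbone.
base = np.zeros(4)
multilingual = np.array([0.2, -0.1, 0.3, 0.0])
multimodal = np.array([0.1, 0.4, -0.5, 0.2])
merged = merge_updates(base, multilingual, multimodal)
```

In a real model the same operation would run over every tensor of the language model backbone, leaving the vision encoder and multimodal projector untouched, as the abstract describes.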

preprint, 2026, arXiv

High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning

As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA), which approximates full-parameter fine-tuning with a low-rank update, is widely used to reduce resource requirements. However, lowering the rank limits representational capacity relative to full-parameter fine-tuning. We present SMoA, a high-rank Structured MOdulation Adapter that uses fewer trainable parameters while maintaining a higher rank, thereby improving the model's representational capacity and offering improved performance potential. The core idea is to freeze the original pretrained weights and selectively amplify or suppress important features of the original weights across multiple subspaces. The subspace mechanism provides an efficient way to increase the capacity and complexity of a model. We conduct both theoretical analyses and empirical studies on various tasks. Experimental results show that SMoA outperforms LoRA and its variants on 10 tasks, with extensive ablation studies validating its effectiveness.
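The rank argument can be checked numerically. The sketch below compares a LoRA-style update with a simple multiplicative modulation of the frozen weight. The modulation shown (rescaling rows and columns of `W`) is a hypothetical stand-in chosen for illustration, not the SMoA construction, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2

# Frozen pretrained weight (a random stand-in for this sketch).
W = rng.standard_normal((d, d))

# LoRA-style update: product of two thin matrices, 2*d*r trainable parameters.
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, d))
lora_update = B @ A

# Hypothetical modulation-style update: rescale rows and columns of W,
# using only 2*d trainable parameters.
u = rng.standard_normal(d)
v = rng.standard_normal(d)
modulation_update = W * np.outer(u, v)

lora_rank = np.linalg.matrix_rank(lora_update)
modulation_rank = np.linalg.matrix_rank(modulation_update)
```

The LoRA-style update cannot exceed rank `r`, while the elementwise modulation inherits the rank of `W` despite having fewer trainable parameters (32 versus 64 here).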

preprint, 2026, arXiv

PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs

Multimodal Large Language Models (MLLMs) rely on strong linguistic reasoning inherited from their base language models. However, multimodal instruction fine-tuning paradoxically degrades this text-based reasoning capability, undermining multimodal performance. To address this issue, we propose a training-free framework to mitigate this degradation. Through layer-wise vision token masking, we reveal a common three-stage pattern in multimodal large language models: early-modal separation, mid-modal alignment, and late-modal degradation. By analyzing the behavior of MLLMs at different stages, we propose a plateau-guided model merging method that selectively injects base language model parameters into MLLMs. Experimental results based on five MLLMs on nine benchmarks demonstrate the effectiveness of our method. Attention-based analysis further reveals that merging shifts attention from diffuse, scattered patterns to focused localization on task-relevant visual regions. Our repository is on https://github.com/wzj1718/PlaM.
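Selective injection of base language model parameters can be pictured as a layer-wise blend. The sketch below is a minimal illustration under stated assumptions: the function `inject_base_layers`, the layer range, and the blend weight `alpha` are all hypothetical, and the abstract does not say how PlaM chooses layers or weights.

```python
import numpy as np

def inject_base_layers(mllm_layers, base_layers, start, end, alpha=0.5):
    """Blend base-LM weights into MLLM layers with index in [start, end)."""
    merged = []
    for i, (w_mm, w_base) in enumerate(zip(mllm_layers, base_layers)):
        if start <= i < end:
            # Selected layer: linear interpolation toward the base weights.
            merged.append((1.0 - alpha) * w_mm + alpha * w_base)
        else:
            # Unselected layer: keep the MLLM weights unchanged.
            merged.append(w_mm.copy())
    return merged

# Toy stand-ins: four layers, each a 2x2 weight matrix.
mllm_layers = [np.full((2, 2), 1.0) for _ in range(4)]
base_layers = [np.zeros((2, 2)) for _ in range(4)]
merged_layers = inject_base_layers(mllm_layers, base_layers, start=2, end=4, alpha=0.5)
```

Because the blend only reads existing weights, a procedure of this shape needs no gradient updates, which is consistent with the training-free framing in the abstract.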

preprint, 2022, arXiv

A Framework for Characterising the Value of Information in Hidden Markov Models

In this paper, a general framework is formalised to characterise the value of information (VoI) in hidden Markov models. Specifically, the VoI is defined as the mutual information between the current, unobserved status at the source and a sequence of observed measurements at the receiver, which can be interpreted as the reduction in the uncertainty of the current status given that we have noisy past observations of a hidden Markov process. We explore the VoI in the context of the noisy Ornstein-Uhlenbeck process and derive its closed-form expressions. Moreover, we investigate the effect of different sampling policies on VoI, deriving simplified expressions in different noise regimes and analysing statistical properties of the VoI in the worst case. We also study the optimal sampling policy to maximise the average information value under the sampling rate constraint. In simulations, the validity of theoretical results is verified, and the performance of VoI in Markov and hidden Markov models is also analysed. Numerical results further illustrate that the proposed VoI framework can support timely transmission in status update systems, and it can also capture the correlation properties of the underlying random process and the noise in the transmission environment.
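The definition in the abstract can be written out in symbols. The notation below is an assumption made for illustration: $X_t$ is the current status, $Y_{t_1}, \dots, Y_{t_n}$ are the measurements received at times $t_1 < \dots < t_n \le t$, and $\kappa$ is the mean-reversion rate of a stationary Ornstein-Uhlenbeck process. The second line is the standard noise-free Markov special case, given only as a sanity check; the paper's closed-form expressions for the noisy case are not reproduced here.

```latex
% VoI as mutual information between the current status and past observations
v(t) = I\left(X_t ;\, Y_{t_n}, \dots, Y_{t_1}\right)

% Noise-free special case (Y_{t_i} = X_{t_i}): by the Markov property only the
% latest sample matters, and for a stationary OU process the correlation
% between X_t and X_{t_n} is e^{-\kappa (t - t_n)}, so
v(t) = I\left(X_t ;\, X_{t_n}\right)
     = -\frac{1}{2} \log\left(1 - e^{-2\kappa (t - t_n)}\right)
```

In this special case the value decreases monotonically as the latest sample ages, which is the intuition linking VoI to timeliness in status update systems.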

preprint, 2020, arXiv

A Value of Information Framework for Latent Variable Models

In this paper, a general value of information (VoI) framework is formalised for latent variable models. In particular, the mutual information between the current status at the source node and the observed noisy measurements at the destination node is used to evaluate the information value, which gives the theoretical interpretation of the reduction in uncertainty in the current status given that we have measurements of the latent process. Moreover, the VoI expression for a hidden Markov model is obtained in this setting. Numerical results are provided to show the relationship between the VoI and the traditional age of information (AoI) metric, and the VoI of Markov and hidden Markov models are analysed for the particular case when the latent process is an Ornstein-Uhlenbeck process. While the contributions of this work are theoretical, the proposed VoI framework is general and useful in designing wireless systems that support timely, but noisy, status updates in the physical world.
