Source author record

Haobo Wang

Haobo Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, Artificial Intelligence, Computation and Language, Computer Vision

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

Adversarial Attacks Against MLLMs via Progressive Resolution Processing and Adaptive Feature Alignment

Adversarial perturbations can mislead Multimodal Large Language Models (MLLMs) into recognizing a benign image as a specific target object, posing serious risks in safety-critical scenarios such as autonomous driving and medical diagnosis. This makes transfer-based targeted attacks crucial for understanding and improving black-box MLLM robustness. Existing transfer-based targeted attack methods typically rely on the final global features of the surrogate encoder and anchor optimization to original-resolution target crops, which limits their transferability and robustness. To address these challenges, we propose Progressive Resolution Processing and Adaptive Feature Alignment (PRAF-Attack), a targeted transfer-based attack framework that integrates multi-scale global semantic guidance with robust intermediate-layer local alignment. Unlike prior methods that align only the surrogate encoder's final layer, we design an adaptive feature alignment strategy that leverages intermediate representations to enhance transferability. Specifically, we introduce an adaptive intermediate layer selection mechanism to identify transferable hierarchical features across surrogate ensembles via gradient consistency, along with an adaptive patch-level optimization strategy that preserves highly correlated local regions through efficient patch filtering. To overcome the reliance on fixed original-resolution target crops, we propose a progressive resolution processing strategy that gradually refines optimization from coarse to fine, enabling the attack to better exploit target information at multiple scales and achieve stronger transferability. We evaluate PRAF-Attack on a diverse suite of black-box MLLMs, including six open-source models and six closed-source commercial APIs. Compared with seven state-of-the-art targeted attack baselines, PRAF-Attack consistently achieves superior transferability.
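The abstract does not say how gradient consistency is measured, so the following is only a minimal sketch under stated assumptions: each surrogate model contributes one gradient with respect to the shared input image for every candidate layer, and consistency is taken to be the mean pairwise cosine similarity of those gradients. The function names, layer names, and toy gradients are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gradient_consistency(gradients):
    """Mean pairwise cosine similarity across surrogate gradients (needs >= 2)."""
    pairs = list(combinations(gradients, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def select_layer(per_layer_gradients):
    """Pick the layer whose surrogate gradients agree most (assumed criterion)."""
    return max(
        per_layer_gradients,
        key=lambda layer: gradient_consistency(per_layer_gradients[layer]),
    )


# Toy data: one flattened input gradient per surrogate, for two candidate layers.
toy_gradients = {
    "layer_4": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # surrogates disagree
    "layer_8": [[1.0, 0.2], [0.9, 0.1], [1.0, 0.0]],   # surrogates agree
}
chosen = select_layer(toy_gradients)
```

In this toy case the surrogates disagree at `layer_4` and agree at `layer_8`, so `layer_8` is selected. A real attack would obtain these gradients from an autodiff framework rather than from hand-written lists.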

Preprint, 2026, arXiv

Table as a Modality for Large Language Models

To extend the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to table reasoning tasks over widely deployed tabular data. Nevertheless, through a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short in coping with tabular data. More specifically, the current scheme often simply serializes the tabular data together with its meta information and feeds the result to the LLM. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which treats tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder, seamlessly integrated with a mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, demonstrate significant improvements in generalization, with an average relative gain of 42.65%.
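To make the contrast between serialization and a structured view concrete, here is a small sketch. The abstract does not specify how TAMO builds its hypergraph, so the construction below (cells as nodes, each row and each column as one hyperedge) is an assumption, and the helper names and example table are hypothetical.

```python
def serialize(header, rows):
    """Flatten a table into text, as a plain serialization baseline would."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def table_hypergraph(header, rows):
    """Build a toy hypergraph: nodes are cells, hyperedges are rows and columns.

    This construction is an illustrative assumption, not the paper's definition.
    """
    nodes = {
        (r, c): value
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
    }
    hyperedges = {}
    for r in range(len(rows)):
        hyperedges[("row", r)] = [(r, c) for c in range(len(header))]
    for c, name in enumerate(header):
        hyperedges[("col", name)] = [(r, c) for r in range(len(rows))]
    return nodes, hyperedges


header = ["city", "population"]
rows = [["Oslo", 700000], ["Bergen", 290000], ["Trondheim", 210000]]

text = serialize(header, rows)
nodes, hyperedges = table_hypergraph(header, rows)
```

In the serialized string, cells that share a column are linked only by their position in the text. In the hypergraph view, each cell belongs explicitly to one row hyperedge and one column hyperedge, which is the kind of structural information the abstract argues is lost.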

Preprint, 2016, arXiv

Using Dynamic Allocation of Write Voltage to Extend Flash Memory Lifetime

The read channel of a Flash memory cell degrades after repetitive program and erase (P/E) operations. This degradation is often modeled as a function of the number of P/E cycles. In contrast, this paper models the degradation as a function of the cumulative effect of the charge written to and erased from the cell. Based on this modeling approach, this paper dynamically allocates voltage, using lower-voltage write thresholds at the beginning of the device lifetime and increasing the thresholds as needed to maintain the mutual information of the read channel in the face of degradation. The paper introduces the technique in an idealized setting and then removes ideal assumptions about channel knowledge and available voltage resolution to conclude with a practical scheme with performance close to that of the idealized setting.
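The idea of raising the write voltage only as far as needed to hold a mutual-information target can be sketched with a deliberately simple toy model. Everything in it is an assumption made for illustration: a two-level cell with equiprobable levels at 0 and the write voltage, Gaussian read noise whose standard deviation stands in for accumulated wear, and a hard decision at the midpoint. The paper's own channel model and allocation rule are not reproduced here.

```python
import math
from statistics import NormalDist


def binary_entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def read_channel_mi(write_voltage, noise_sigma):
    """Mutual information (bits) of the toy two-level read channel.

    With a midpoint threshold the channel is binary symmetric, with
    crossover probability Q(write_voltage / (2 * noise_sigma)).
    """
    crossover = NormalDist().cdf(-write_voltage / (2 * noise_sigma))
    return 1.0 - binary_entropy(crossover)


def lowest_sufficient_voltage(candidates, noise_sigma, target_mi):
    """Return the smallest candidate voltage meeting the MI target, else None."""
    for voltage in sorted(candidates):
        if read_channel_mi(voltage, noise_sigma) >= target_mi:
            return voltage
    return None


# Hypothetical voltage grid and noise levels (arbitrary units).
candidates = [1.0, 1.5, 2.0, 2.5, 3.0]
fresh = lowest_sufficient_voltage(candidates, noise_sigma=0.2, target_mi=0.9)
worn = lowest_sufficient_voltage(candidates, noise_sigma=0.4, target_mi=0.9)
```

In this toy setting the fresh cell meets the target at the lowest candidate voltage, while the noisier, worn cell needs a higher one. That mirrors the abstract's description of starting with low write thresholds and increasing them as the channel degrades.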

Preprint, 2014, arXiv

Histogram-Based Flash Channel Estimation

Current-generation Flash devices experience significant read-channel degradation from damage to the oxide layer during program and erase operations. Information about the read-channel degradation drives advanced signal processing methods in Flash to mitigate its effect. In this context, channel estimation must be ongoing, since channel degradation evolves over time and as a function of the number of program/erase (P/E) cycles. This paper proposes a framework for ongoing model-based channel estimation using limited channel measurements (reads). The channel model characterizes degradation resulting from retention time and the amount of charge programmed and erased. For channel histogram measurements, selecting bins with approximately equal probability yields a good approximation to the original distribution using only ten bins (i.e., nine reads). With the channel model and binning strategy in place, this paper explores candidate numerical least squares algorithms and ultimately demonstrates the effectiveness of the Levenberg-Marquardt algorithm, which provides both speed and accuracy.
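The relationship between ten bins and nine reads can be illustrated with a short sketch. It assumes, purely for illustration, that the read-voltage distribution of a cell level is Gaussian; the paper's actual channel model is not given in the abstract, and the mean and standard deviation below are hypothetical. Each interior bin edge corresponds to one read threshold.

```python
from statistics import NormalDist


def equal_probability_thresholds(dist, num_bins):
    """Interior thresholds that split dist into num_bins equal-probability bins."""
    return [dist.inv_cdf(k / num_bins) for k in range(1, num_bins)]


def bin_probabilities(dist, thresholds):
    """Probability mass of each bin delimited by the sorted thresholds."""
    cdf_values = [0.0] + [dist.cdf(t) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]


# Hypothetical read-voltage distribution for one cell level (arbitrary units).
cell_level = NormalDist(mu=2.0, sigma=0.3)

thresholds = equal_probability_thresholds(cell_level, num_bins=10)
probabilities = bin_probabilities(cell_level, thresholds)
```

Ten bins need only nine interior thresholds because the two outermost bins extend to the ends of the voltage range. In practice the distribution is what is being estimated, so equal-probability edges can only be approximated from the current model, which is consistent with the abstract's "approximately equal-probability" wording.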