Source author record

Zhiwei Xu

Zhiwei Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SP; Machine Learning; Information Theory; math.IT; Artificial Intelligence; Computation and Language; Computer Vision; Cryptography and Security; cs.CY; Databases; Distributed, Parallel, and Cluster Computing; Hardware Architecture; math.ST; Methodology; Networking and Internet Architecture; Performance; physics.optics; Software Engineering; Statistics Theory

Catalog footprint

What is connected

15 works

19 topics

4 close collaborators


preprint · 2026 · arXiv

Adversarial Contrastive Learning for LLM Quantization Attacks

Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed a severe security risk: LLMs that are benign in full precision may exhibit malicious behaviors after quantization. In this paper, we propose Adversarial Contrastive Learning (ACL), a novel gradient-based quantization attack that achieves superior attack effectiveness by explicitly maximizing the gap between the probabilities of benign and harmful responses. ACL formulates the attack objective as a triplet-based contrastive loss and integrates it with a two-stage distributed fine-tuning strategy based on projected gradient descent to ensure stable and efficient optimization. Extensive experiments demonstrate ACL's effectiveness, with attack success rates of 86.00% for over-refusal, 97.69% for jailbreak, and 92.40% for advertisement injection, outperforming state-of-the-art methods by up to 44.67%, 18.84%, and 50.80%, respectively.
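The two ingredients named above, a contrastive objective on response probabilities and a projected-gradient step, can be illustrated with a minimal sketch. It assumes a simple hinge (margin) loss over one (prompt, target response, avoided response) triplet and a box projection that keeps every full-precision weight inside the interval that rounds to a fixed quantized level, a constraint of the kind commonly used in PGD-based quantization attacks. The function names, the margin, and the single-scale round-to-nearest quantizer are illustrative assumptions, not ACL's actual formulation, and the two-stage distributed schedule is not shown.

```python
from typing import List


def triplet_margin_loss(logp_target: float, logp_avoid: float, margin: float = 1.0) -> float:
    """Hinge loss on one (prompt, target response, avoided response) triplet.

    Zero once log p(target | prompt) exceeds log p(avoided | prompt) by at least `margin`,
    so minimizing it widens the probability gap between the two responses.
    """
    return max(0.0, margin - (logp_target - logp_avoid))


def project_to_quant_box(weights: List[float], levels: List[int], scale: float,
                         eps: float = 1e-6) -> List[float]:
    """PGD projection step: clamp each full-precision weight into the interval that still
    rounds to its fixed integer level, so the quantized model stays unchanged.

    Assumes round-to-nearest quantization q = round(w / scale) with one shared scale;
    `eps` shrinks the interval slightly so no weight sits exactly on a rounding tie.
    """
    projected = []
    for w, q in zip(weights, levels):
        lo = (q - 0.5 + eps) * scale
        hi = (q + 0.5 - eps) * scale
        projected.append(min(max(w, lo), hi))
    return projected


if __name__ == "__main__":
    scale = 0.1
    original = [0.21, -0.88, 0.04]
    levels = [round(w / scale) for w in original]        # quantized model to preserve
    updated = [0.30, -0.91, -0.20]                        # weights after a gradient step
    repaired = project_to_quant_box(updated, levels, scale)
    print(levels, [round(w / scale) for w in repaired])   # the two lists match
    print(triplet_margin_loss(-1.0, -3.0), triplet_margin_loss(-2.0, -1.5))
```

Clamping coordinate-wise is the exact Euclidean projection onto a box, which is why a plain `min`/`max` suffices here.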

preprint · 2026 · arXiv

Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates

The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving demand for specialized Language Processing Units (LPUs) tailored for LLM inference. To overcome the growing energy consumption of LLM inference systems, this paper proposes a Hardwired-Neurons Language Processing Unit (HNLPU), which physically hardwires LLM weight parameters into the computational fabric, improving computational efficiency by several orders of magnitude through extreme specialization. However, a significant challenge lies in the scale of modern LLMs. Straightforwardly hardwiring gpt-oss-120b would require fabricating photomask sets valued at over 6 billion dollars, which is economically impractical. To address this challenge, we propose the novel Metal-Embedding methodology. Instead of embedding weights in a 2D grid of silicon device cells, Metal-Embedding embeds weight parameters into the 3D topology of metal wires. This brings two benefits: (1) a 15x increase in density, and (2) 60 of the 70 photomask layers, including all EUV photomasks, are homogeneous across chips. In total, Metal-Embedding reduces the photomask cost by 112x, bringing the Non-Recurring Engineering (NRE) cost of HNLPU into an economically viable range. Experimental results show that HNLPU achieves 249,960 tokens/s (5,555x and 85x that of GPU and WSE, respectively), 36 tokens/J (1,047x and 283x that of GPU and WSE, respectively), a total die area of 13,232 mm², and an estimated NRE of $59.46M to $123.5M at the 5 nm technology node. Analysis shows that, assuming annual weight updates, HNLPU achieves a 41.7-80.4x improvement in cost-effectiveness and a 357x reduction in carbon footprint compared to OpenAI-scale H100 clusters.
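The headline figures can be cross-checked with back-of-envelope arithmetic. The sketch below assumes that the throughput and energy-efficiency numbers describe the same operating point (the abstract does not say so explicitly): tokens/s divided by tokens/J gives joules per second, i.e. watts, and dividing the HNLPU figures by the reported ratios gives the implied GPU and WSE baselines. These derived values are illustrations, not results reported by the paper.

```python
# Reported HNLPU figures (from the abstract).
HNLPU_TOKENS_PER_S = 249_960
HNLPU_TOKENS_PER_J = 36

# Reported HNLPU-to-baseline ratios for throughput and energy efficiency.
THROUGHPUT_RATIO = {"GPU": 5_555, "WSE": 85}
EFFICIENCY_RATIO = {"GPU": 1_047, "WSE": 283}


def implied_power_watts(tokens_per_s: float, tokens_per_j: float) -> float:
    """Average power implied by (tokens/s) / (tokens/J) = J/s = W."""
    return tokens_per_s / tokens_per_j


hnlpu_power = implied_power_watts(HNLPU_TOKENS_PER_S, HNLPU_TOKENS_PER_J)

# Baseline figures implied by dividing the HNLPU numbers by the reported ratios.
baselines = {
    name: {
        "tokens_per_s": HNLPU_TOKENS_PER_S / THROUGHPUT_RATIO[name],
        "tokens_per_j": HNLPU_TOKENS_PER_J / EFFICIENCY_RATIO[name],
    }
    for name in THROUGHPUT_RATIO
}

if __name__ == "__main__":
    print(f"HNLPU implied power: {hnlpu_power / 1e3:.2f} kW")
    for name, b in baselines.items():
        print(f"{name}: {b['tokens_per_s']:.1f} tokens/s, {b['tokens_per_j']:.4f} tokens/J")
```

Under this assumption, the HNLPU system would draw roughly 6.9 kW on average.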

preprint · 2026 · arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks or are not sufficiently difficult to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated hard benchmark composed of 89 tasks in computer terminal environments inspired by problems from real workflows. Each task features a unique environment, a human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65% on the benchmark and conduct an error analysis to identify areas for model and agent improvement. We publish the dataset and evaluation harness to assist developers and researchers in future work at https://www.tbench.ai/.
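The per-task structure described above (environment, human-written solution, verification tests) lends itself to a simple pass/fail data model. The following is a hypothetical sketch, not the Terminal-Bench harness or its API: the class and field names, the representation of the terminal's final state as file contents, and the rule that a task counts as solved only when all of its tests pass are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation of a terminal's final state: file path -> contents.
State = Dict[str, str]


@dataclass
class TerminalTask:
    """Hypothetical task record with the three parts the abstract lists."""
    task_id: str
    environment: str            # e.g. a container image name (assumed)
    reference_solution: str     # human-written solution script
    tests: List[Callable[[State], bool]] = field(default_factory=list)


def task_passes(task: TerminalTask, final_state: State) -> bool:
    """Count a task as solved only if it has tests and every one of them passes."""
    return bool(task.tests) and all(test(final_state) for test in task.tests)


def benchmark_score(tasks: List[TerminalTask], final_states: Dict[str, State]) -> float:
    """Percentage of tasks an agent solved, given the final state it left for each task."""
    solved = sum(task_passes(t, final_states.get(t.task_id, {})) for t in tasks)
    return 100.0 * solved / len(tasks)


if __name__ == "__main__":
    tasks = [
        TerminalTask("write-greeting", "ubuntu:22.04", "echo hi > out.txt",
                     [lambda s: s.get("out.txt") == "hi\n"]),
        TerminalTask("make-report", "python:3.12", "python report.py",
                     [lambda s: "report.csv" in s]),
    ]
    states = {"write-greeting": {"out.txt": "hi\n"}, "make-report": {}}
    print(benchmark_score(tasks, states))  # 50.0
```

Requiring at least one test guards against a task with an empty test list being scored as solved by default.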

preprint · 2022 · arXiv

Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies

Deep declarative networks and other recent related works have shown how to differentiate the solution map of a (continuous) parametrized optimization problem, opening up the possibility of embedding mathematical optimization problems into end-to-end learnable models. These differentiability results can lead to significant memory savings by providing an expression for computing the derivative without needing to unroll the steps of the forward-pass optimization procedure during the backward pass. However, the results typically require inverting a large Hessian matrix, which is computationally expensive when implemented naively. In this work, we study two applications of deep declarative networks, robust vector pooling and optimal transport, and show how problem structure can be exploited to obtain very efficient backward-pass computations in terms of both time and memory. Our ideas can be used as a guide for improving the computational performance of other novel deep declarative nodes.
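A scalar example shows how structure can remove the Hessian inversion. For an unconstrained declarative node y(x) = argmin_y f(x, y), the implicit function theorem gives dy/dx = -H^{-1}B, with H = ∂²f/∂y² and B = ∂²f/∂x∂y. The minimal sketch below assumes scalar robust pooling with a pseudo-Huber penalty (chosen for illustration; the paper's vector setting and penalty functions may differ). Here H is a scalar, so the gradient reduces to normalized curvature weights and needs neither a matrix inverse nor unrolling of the forward solver.

```python
import math
from typing import List

ALPHA = 1.0  # pseudo-Huber scale (assumed value)


def dphi(z: float, alpha: float = ALPHA) -> float:
    """First derivative of the pseudo-Huber penalty alpha^2 * (sqrt(1 + (z/alpha)^2) - 1)."""
    return z / math.sqrt(1.0 + (z / alpha) ** 2)


def d2phi(z: float, alpha: float = ALPHA) -> float:
    """Second derivative of the pseudo-Huber penalty; always positive, so the problem is convex."""
    return (1.0 + (z / alpha) ** 2) ** -1.5


def robust_pool(xs: List[float], iters: int = 200) -> float:
    """Forward pass: y = argmin_y sum_i phi(y - x_i), by bisection on the increasing derivative."""
    lo, hi = min(xs), max(xs)  # the derivative is <= 0 at min(xs) and >= 0 at max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(dphi(mid - x) for x in xs) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def robust_pool_grad(xs: List[float], y: float) -> List[float]:
    """Backward pass via implicit differentiation, dy/dx_i = -H^{-1} B_i.

    With H = sum_j phi''(y - x_j) (a scalar) and B_i = -phi''(y - x_i), this is a
    normalized weight per input; outliers get small weights.
    """
    weights = [d2phi(y - x) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    xs = [0.0, 0.2, 0.1, 5.0]  # three inliers and one outlier
    y = robust_pool(xs)
    print(y, robust_pool_grad(xs, y))
```

The gradients sum to one, reflecting that shifting every input by the same amount shifts the pooled value equally.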

preprint · 2022 · arXiv

Gridless Multisnapshot Variational Line Spectral Estimation from Coarsely Quantized Samples

Due to the increasing demand for low power consumption and higher sampling rates, low-resolution quantization for data acquisition has recently drawn great attention. Consequently, line spectral estimation (LSE) with multiple measurement vectors (MMVs) from coarsely quantized samples is of vital importance in cutting-edge array signal processing applications, such as range estimation and direction-of-arrival (DOA) estimation in millimeter-wave radar systems. In this paper, we combine low-complexity gridless variational line spectral estimation (VALSE) with expectation propagation (EP) and propose an MVALSE-EP algorithm to estimate frequencies from coarsely quantized samples. In addition, the Cramér-Rao bound (CRB) is derived as a performance benchmark for the proposed algorithm, and insights are provided into the effects of system parameters on estimation performance. It is shown that multiple snapshots benefit frequency estimation, especially in coarsely quantized scenarios. Numerical experiments, including on a real dataset, demonstrate the effectiveness of MVALSE-EP.
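The measurement setting can be sketched as follows. Each snapshot of a line spectrum passes through a low-resolution quantizer applied separately to the real and imaginary parts, and a naive frequency estimate averages periodograms across snapshots. This illustrates the problem only: the uniform mid-rise quantizer, the on-grid frequency, and the per-snapshot phases are assumptions. The averaged periodogram is a simple baseline swapped in for illustration and is not MVALSE-EP, which is gridless and models the quantization explicitly through EP.

```python
import cmath
import math
from typing import List


def quantize(v: float, bits: int, step: float) -> float:
    """Uniform mid-rise quantizer with 2**bits levels; inputs outside the range are clipped."""
    levels = 2 ** bits
    idx = math.floor(v / step)                         # quantization cell index
    idx = max(-levels // 2, min(levels // 2 - 1, idx))  # clip to the available cells
    return (idx + 0.5) * step                          # reconstruct at the cell midpoint


def quantize_complex(z: complex, bits: int, step: float) -> complex:
    """Quantize the in-phase and quadrature parts separately, as a low-resolution ADC would."""
    return complex(quantize(z.real, bits, step), quantize(z.imag, bits, step))


def averaged_periodogram_peak(snapshots: List[List[complex]]) -> int:
    """Return the DFT bin that maximizes the periodogram summed over all snapshots."""
    n = len(snapshots[0])
    best_bin, best_power = 0, -1.0
    for k in range(n):
        power = 0.0
        for y in snapshots:
            s = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            power += abs(s) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin


if __name__ == "__main__":
    n, k0 = 64, 5
    phases = [0.3, 1.1, 2.0]  # hypothetical per-snapshot phases
    snapshots = [
        [quantize_complex(cmath.exp(1j * (2 * math.pi * k0 * t / n + ph)), bits=2, step=0.5)
         for t in range(n)]
        for ph in phases
    ]
    print(averaged_periodogram_peak(snapshots))  # 5
```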

preprint · 2022 · arXiv

MDS-Net: A Multi-scale Depth Stratification Based Monocular 3D Object Detection Algorithm

Monocular 3D object detection is very challenging in autonomous driving due to the lack of depth information. This paper proposes a one-stage monocular 3D object detection algorithm based on multi-scale depth stratification, which uses an anchor-free method to detect 3D objects through per-pixel prediction. In the proposed MDS-Net, a novel depth-based stratification structure is developed to improve the network's depth prediction by establishing mathematical models between the depth of objects and their size in the image. A new angle loss function is then developed to further improve the accuracy of angle prediction and speed up training convergence. Finally, an optimized soft-NMS is applied in the post-processing stage to adjust the confidence of candidate boxes. Experiments on the KITTI benchmark show that MDS-Net outperforms existing monocular 3D detection methods in 3D detection and BEV detection tasks while meeting real-time requirements.
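The post-processing step builds on soft-NMS, which decays the scores of overlapping candidates instead of discarding them outright. For reference, here is a minimal sketch of standard Gaussian soft-NMS on axis-aligned 2D boxes. The abstract does not describe MDS-Net's optimizations to it, so they are not shown, and sigma = 0.5 and the 0.001 score cutoff are assumed defaults.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned, in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: List[Box], scores: List[float], sigma: float = 0.5,
             score_threshold: float = 0.001) -> List[Tuple[Box, float]]:
    """Gaussian soft-NMS: repeatedly keep the best box and decay the others' scores."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select and keep the highest-scoring remaining box.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))
        # Decay each other score by exp(-IoU^2 / sigma); drop boxes that become negligible.
        decayed = []
        for box, score in remaining:
            new_score = score * math.exp(-iou(best_box, box) ** 2 / sigma)
            if new_score >= score_threshold:
                decayed.append((box, new_score))
        remaining = decayed
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(soft_nms(boxes, [0.9, 0.8, 0.7]))
```

In this example, the heavily overlapping second box is kept but its score is reduced, while the distant third box keeps its original score.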

preprint · 2021 · arXiv

Newtonalized Orthogonal Matching Pursuit for Linear Frequency Modulated Pulse Frequency Agile Radar

The linear frequency modulated (LFM) frequency agile radar (FAR) can synthesize a wide signal bandwidth through coherent processing while keeping the bandwidth of each pulse narrow. In this way, high range resolution profiles (HRRPs) can be obtained without increasing the hardware system cost. Furthermore, frequency agility improves both robustness to jamming and spectral efficiency. Motivated by Newtonalized orthogonal matching pursuit (NOMP) for the line spectral estimation problem, NOMP for FAR, termed NOMP-FAR, is designed to process each coarse range bin to extract the HRRPs and velocities of multiple targets, including guidance on determining the oversampling factor and the stopping criterion. In addition, it is shown that a target causes false alarms in nearby coarse range bins, so a postprocessing algorithm is proposed to suppress these ghost targets. Numerical simulations demonstrate the effectiveness of NOMP-FAR.
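NOMP for line spectral estimation is built around a detect-then-refine step. It picks the peak of an oversampled DFT, then polishes that estimate with Newton steps on the matched-filter objective G(w) = |S(w)|²/N, where S(w) = Σ_n y[n] e^{-jwn}. The sketch below shows only this single-target step. It assumes a unit-amplitude complex exponential and an oversampling factor of 4, and it omits cyclic refinement over several targets, the stopping criterion, and all FAR-specific processing, so it is not NOMP-FAR itself.

```python
import cmath
import math
from typing import List


def _sums(y: List[complex], w: float):
    """Return S(w) = sum_n y[n] e^{-jwn} and its first two derivatives with respect to w."""
    s = ds = d2s = 0j
    for n, yn in enumerate(y):
        e = yn * cmath.exp(-1j * w * n)
        s += e
        ds += -1j * n * e
        d2s += -(n ** 2) * e
    return s, ds, d2s


def detect_and_refine(y: List[complex], oversampling: int = 4, newton_steps: int = 10) -> float:
    """Estimate one frequency: coarse oversampled-DFT search, then Newton refinement of G(w)."""
    n = len(y)
    grid = [2 * math.pi * k / (oversampling * n) for k in range(oversampling * n)]
    w = max(grid, key=lambda g: abs(_sums(y, g)[0]) ** 2)  # coarse detection on the grid
    for _ in range(newton_steps):
        s, ds, d2s = _sums(y, w)
        g1 = 2 * (s.conjugate() * ds).real / n                      # G'(w)
        g2 = 2 * (abs(ds) ** 2 + (s.conjugate() * d2s).real) / n    # G''(w)
        if g2 >= 0:  # take Newton steps only where G is locally concave
            break
        w -= g1 / g2
    return w % (2 * math.pi)


if __name__ == "__main__":
    true_w = 1.2345  # off-grid test frequency (assumed)
    y = [cmath.exp(1j * true_w * n) for n in range(32)]
    print(detect_and_refine(y))
```

Oversampling the coarse grid keeps the starting point close enough to the peak that Newton's method converges to it rather than to a sidelobe.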

preprint · 2020 · arXiv

One-bit LFMCW Radar: Spectrum Analysis and Target Detection

One-bit radar, which performs signal sampling and quantization with a one-bit ADC, is a promising technology for many civilian applications due to its low cost and low power consumption. In this paper, problems encountered by one-bit LFMCW radar are studied, and a two-stage target detection method termed the dimension-reduced generalized approximate message passing (DR-GAMP) approach is proposed. First, the spectrum of one-bit quantized signals in a scenario with multiple targets is analyzed. The analysis indicates that high-order harmonics may result in false alarms (FAs) and cannot be neglected. Second, based on the spectrum analysis, the DR-GAMP approach is proposed to carry out target detection. Specifically, linear preprocessing methods and target predetection are first adopted to reduce the dimension, and then the GAMP algorithm is used to suppress high-order harmonics and recover true targets. Finally, numerical simulations are conducted to evaluate the performance of one-bit LFMCW radar under typical parameters. It is shown that, compared to conventional radar applying linear processing methods, one-bit LFMCW radar has a performance gain of about 1.3 dB when the input signal-to-noise ratios (SNRs) of targets are low. In the presence of a strong target, it has a performance loss of about 1.0 dB.
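The harmonic effect identified in the spectrum analysis can be reproduced in a noiseless single-target illustration. Keeping only the sign of a real beat tone turns it into a square wave, whose spectrum has a strong line at three times the beat frequency that a detector could mistake for a second target. This sketch assumes real-valued samples, one on-grid target, and a naive DFT. It illustrates the spectrum analysis only, not DR-GAMP.

```python
import math
from typing import List


def dft_magnitudes(x: List[float]) -> List[float]:
    """Magnitude of the N-point DFT of a real sequence (naive O(N^2) evaluation)."""
    n = len(x)
    mags = []
    for k in range(n):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags


N, K = 64, 5   # record length and beat-frequency bin of a single target (assumed)
PHASE = 0.1    # small offset so no sample lands exactly on the zero threshold

beat = [math.cos(2 * math.pi * K * t / N + PHASE) for t in range(N)]
one_bit = [1.0 if v >= 0 else -1.0 for v in beat]   # one-bit ADC keeps only the sign

linear_spec = dft_magnitudes(beat)
one_bit_spec = dft_magnitudes(one_bit)

if __name__ == "__main__":
    # Relative level of the third-harmonic bin 3K: near zero before quantization,
    # roughly 0.33 (about -9.5 dB) after one-bit quantization.
    print(linear_spec[3 * K] / linear_spec[K], one_bit_spec[3 * K] / one_bit_spec[K])
```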

preprint · 2019 · arXiv

Multidimensional Variational Line Spectra Estimation

The fundamental multidimensional line spectral estimation problem is addressed using Bayesian methods. Motivated by the recently proposed variational line spectral estimation (VALSE) algorithm, multidimensional VALSE (MDVALSE) is developed. MDVALSE inherits the advantages of VALSE, such as automatically estimating the model order and noise variance and quantifying the uncertainty of frequency estimates. Compared to VALSE, the multidimensional frequencies of a single component are treated as a whole, and the probability density function is projected onto independent univariate von Mises distributions to perform tractable inference. In addition, the efficient fast Fourier transform (FFT) is adopted for initialization, which significantly reduces the computational complexity of MDVALSE. Numerical results demonstrate the effectiveness of MDVALSE compared to state-of-the-art methods.
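A Fourier-based initialization for a multidimensional line spectrum can be sketched by picking the peak of the 2D DFT magnitude and treating the resulting frequency pair as one unit, in the spirit of handling a component's frequencies as a whole. This is a hedged illustration: it uses two dimensions, no oversampling or zero-padding, and a naive DFT in place of an FFT. MDVALSE's actual initialization and variational updates are not shown.

```python
import cmath
import math
from typing import List, Tuple


def dft2_peak(y: List[List[complex]]) -> Tuple[float, float]:
    """Return the frequency pair (w1, w2) at the peak of the 2D DFT magnitude.

    A naive O((N1*N2)^2) DFT is used for clarity; an FFT yields the same values faster.
    """
    n1, n2 = len(y), len(y[0])
    best, best_mag = (0, 0), -1.0
    for k1 in range(n1):
        for k2 in range(n2):
            s = sum(
                y[a][b] * cmath.exp(-2j * math.pi * (k1 * a / n1 + k2 * b / n2))
                for a in range(n1)
                for b in range(n2)
            )
            if abs(s) > best_mag:
                best, best_mag = (k1, k2), abs(s)
    return 2 * math.pi * best[0] / n1, 2 * math.pi * best[1] / n2


if __name__ == "__main__":
    # Two 2D components (hypothetical): amplitude 1 at bins (3, 5), amplitude 0.5 at (6, 1).
    y = [[cmath.exp(2j * math.pi * (3 * a / 8 + 5 * b / 8))
          + 0.5 * cmath.exp(2j * math.pi * (6 * a / 8 + b / 8))
          for b in range(8)] for a in range(8)]
    print(dft2_peak(y))  # frequencies of the stronger component
```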

preprint · 2016 · arXiv

Hybrid Airy Plasmons with Dynamically Steerable Trajectories

With their intriguing properties of being diffraction-free, self-accelerating, and self-healing, Airy plasmons are promising for the trapping, transporting, and sorting of micro-objects, imaging, and chip-scale signal processing. However, high dissipative loss and the lack of dynamic steerability restrict the implementation of Airy plasmons in these applications. Here we reveal hybrid Airy plasmons for the first time, taking a hybrid graphene-based plasmonic waveguide in the terahertz (THz) domain as an example. Due to the coupling between an optical mode and a plasmonic mode, hybrid Airy plasmons can have long propagation lengths and effective transverse deflections, where the transverse waveguide confinement is governed by hybrid modes with moderate quality factors. Meanwhile, the propagation trajectories of hybrid Airy plasmons are dynamically steerable by changing the chemical potential of graphene. These hybrid Airy plasmons may promote the further discovery of non-diffracting beams alongside emerging developments in optical tweezers and tractor beams.
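For orientation, consider the textbook paraxial result for a 1D Airy beam with transverse profile Ai(x/x_0): its intensity peak follows a parabola in the propagation distance z. The version below writes that result with the propagation constant β = n_eff k_0 of the guided mode, where k_0 is the free-space wavenumber. It assumes the paraxial Airy result carries over to the hybrid mode, which is not derived here. Under that assumption, it suggests one plausible route to steerability: if changing the graphene chemical potential μ_c changes the effective index n_eff, the transverse deflection at fixed z scales as 1/n_eff². This is a hedged sketch, not the paper's analysis.

```latex
x_{\mathrm{peak}}(z) \approx \frac{z^{2}}{4\,\beta^{2}\,x_{0}^{3}},
\qquad \beta = n_{\mathrm{eff}}(\mu_c)\,k_{0}
\quad\Longrightarrow\quad
x_{\mathrm{peak}}(z) \propto \frac{1}{n_{\mathrm{eff}}^{2}(\mu_c)} \quad \text{at fixed } z \text{ and } x_{0}
```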

preprint · 2015 · arXiv

Beacon Node Placement for Minimal Localization Error

Beacon node placement, node-to-node measurement, and target node positioning are the three key steps of a localization process. However, compared with the other two steps, beacon node placement still lacks a comprehensive, systematic study in the research literature. To fill this gap, we address the Beacon Node Placement (BNP) problem, which deploys beacon nodes to minimize localization error. BNP is difficult because the localization error is determined by a complicated combination of factors: it differs greatly across environments, localization algorithms, and types of beacon node. In view of the hardness of BNP, we propose an approximate function that reduces the time cost of localization error calculation, and we prove its time complexity and error bound. Using this approximation, a sub-optimal distribution of beacon nodes can be found within an acceptable time. In experiments, we test our method and compare it with other node placement methods under various settings and environments. The experimental results show the feasibility and effectiveness of our method in practice.
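The idea of trading exactness for a cheap error estimate can be illustrated with a greedy placement loop over candidate sites. This is a hypothetical sketch. The error function is a stand-in (mean squared distance from each target to its nearest beacon), not the paper's approximate localization-error function. Greedy selection is swapped in as a generic way to find a sub-optimal placement quickly.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def proxy_error(beacons: List[Point], targets: List[Point]) -> float:
    """Hypothetical stand-in for localization error: mean squared distance from each
    target to its nearest beacon (NOT the paper's approximate error function)."""
    return sum(min(math.dist(t, b) ** 2 for b in beacons) for t in targets) / len(targets)


def greedy_placement(candidates: List[Point], targets: List[Point], k: int,
                     error_fn: Callable[[List[Point], List[Point]], float] = proxy_error
                     ) -> List[Point]:
    """Place k beacons (k <= len(candidates)) one at a time, each time adding the
    candidate that lowers error_fn the most."""
    chosen: List[Point] = []
    for _ in range(k):
        best = min((c for c in candidates if c not in chosen),
                   key=lambda c: error_fn(chosen + [c], targets))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    targets = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
    candidates = [(0.5, 0.5), (10.5, 10.5), (5.0, 5.0), (0.0, 0.0)]
    placed = greedy_placement(candidates, targets, k=2)
    print(placed, proxy_error(placed, targets))
```

In this example, greedy selection first takes the central site (5.0, 5.0) and ends with an error of 20.75. The best pair of sites, (0.5, 0.5) and (10.5, 10.5), would achieve 0.5, which shows the speed-versus-optimality trade-off.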

preprint · 2015 · arXiv

CIUV: Collaborating Information Against Unreliable Views

In many real-world applications, information about an object can be obtained from multiple sources. The sources may provide different points of view depending on their origin. As a consequence, conflicting pieces of information are inevitable, which gives rise to a crucial problem: how to find the truth among these conflicts. Many truth-finding methods have been proposed to resolve conflicts based on information trustworthiness (i.e., information that appears more often is considered more trustworthy) as well as source reliability. However, human involvement, i.e., information that may be falsified by people with malicious intent, is largely ignored in existing methods. The possible relationship between information origins and human participation has not yet been studied. To address this challenge, we propose Collaborating Information against Unreliable Views (CIUV), a method that accounts for human involvement when finding the truth. CIUV consists of three stages that interactively mitigate the impact of unreliable views, and it calculates the truth by weighting possible biases between sources. We theoretically analyze the error bound of CIUV and conduct extensive experiments on a real dataset. The experimental results show that CIUV is feasible and has the smallest error among the compared methods.
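The reliability-based truth finding that CIUV builds on can be sketched as an iterative weighted vote. The loop alternates between choosing each object's value by a reliability-weighted vote and re-estimating each source's reliability from its agreement with those choices. This is a generic baseline written for illustration, not CIUV's three-stage method. The smoothing constants in the reliability update are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Claims = Dict[str, Dict[Hashable, str]]  # source -> {object -> claimed value}


def truth_discovery(claims: Claims, iterations: int = 10
                    ) -> Tuple[Dict[Hashable, str], Dict[str, float]]:
    """Generic iterative weighted voting (not CIUV)."""
    reliability = {s: 1.0 for s in claims}  # start by trusting every source equally
    objects = {o for per_source in claims.values() for o in per_source}
    truth: Dict[Hashable, str] = {}
    for _ in range(iterations):
        # Step 1: each object's value is the one with the largest total source reliability.
        for o in objects:
            votes: Dict[str, float] = defaultdict(float)
            for s, per_source in claims.items():
                if o in per_source:
                    votes[per_source[o]] += reliability[s]
            truth[o] = max(votes, key=votes.get)
        # Step 2: a source's reliability is its smoothed rate of agreement with the current truth.
        for s, per_source in claims.items():
            agree = sum(truth[o] == v for o, v in per_source.items())
            reliability[s] = (agree + 1) / (len(per_source) + 2)
    return truth, reliability


if __name__ == "__main__":
    claims = {
        "A": {"x": "1", "y": "2", "z": "3"},
        "B": {"x": "1", "y": "2", "z": "3"},
        "C": {"x": "1", "y": "7", "z": "3"},
        "D": {"x": "9", "y": "5", "z": "4"},  # a consistently unreliable source
    }
    print(truth_discovery(claims))
```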

preprint · 2015 · arXiv

Founding Digital Currency on Imprecise Commodity

Current digital currency schemes provide instantaneous exchange of precise commodities, where "precise" means a buyer can verify the function of the commodity without error. However, imprecise commodities that contain errors, such as statistical data, are abundant in the digital world. Existing digital currency schemes do not offer a mechanism to help buyers make payment decisions based on the precision of a commodity, which may leave a buyer in a dilemma between having to buy and lacking confidence. In this paper, we design IDCS, a currency scheme for imprecise digital commodities. IDCS completes a trade through a three-stage handshake between a buyer and providers. We present an IDCS prototype implementation that assigns weights based on the trustworthiness of the providers and calculates a confidence level that helps the buyer judge the quality of an imprecise commodity. In experiments, we characterize the performance of the IDCS prototype under varying impact factors.

preprint · 2014 · arXiv

Performance Benefits of DataMPI: A Case Study with BigDataBench

Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics, and both are widely deployed at Internet companies. At the same time, high-performance data analysis requirements are leading academic and industrial communities to adopt state-of-the-art HPC technologies to solve Big Data problems. We recently proposed DataMPI, a key-value pair based communication library that extends MPI to support Hadoop/Spark-like Big Data computing jobs. In this paper, we use BigDataBench, a Big Data benchmark suite, to conduct comprehensive studies of the performance and resource utilization characteristics of Hadoop, Spark, and DataMPI. In our experiments, DataMPI achieves job execution time speedups of up to 55% and 39% compared with Hadoop and Spark, respectively. Most of the benefits come from the high-efficiency communication mechanisms in DataMPI. We also observe that DataMPI uses resources (CPU, memory, disk, and network I/O) more efficiently than the other two frameworks.

preprint · 2011 · arXiv

Testing for Parallelism Between Trends in Multiple Time Series

This paper considers the inference of trends in multiple, nonstationary time series. To test whether trends are parallel to each other, we use a parallelism index based on the L2-distances between nonparametric trend estimators and their average. A central limit theorem is obtained for the test statistic and the test's consistency is established. We propose a simulation-based approximation to the distribution of the test statistic, which significantly improves upon the normal approximation. The test is also applied to devise a clustering algorithm. Finally, the finite-sample properties of the test are assessed through simulations and the test methodology is illustrated with time series from Motorola cell phone activity in the United States.
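A simplified, discretized version of such a parallelism index can be sketched as follows. Each series is smoothed, each trend estimate is centered, and the squared L2 distances between the centered trends and their average are summed. Centering is an assumption made here so that vertically shifted (parallel) trends score zero. The Gaussian-kernel Nadaraya-Watson smoother, the bandwidth, and equally spaced time points are also assumptions. The paper's exact statistic, normalization, and central-limit or simulation-based calibration are not reproduced.

```python
import math
from typing import List


def nw_smooth(t: List[float], y: List[float], bandwidth: float) -> List[float]:
    """Nadaraya-Watson trend estimate at the sample times, using a Gaussian kernel."""
    fitted = []
    for t0 in t:
        w = [math.exp(-0.5 * ((ti - t0) / bandwidth) ** 2) for ti in t]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted


def parallelism_index(t: List[float], series: List[List[float]], bandwidth: float = 0.1) -> float:
    """Sum over series of the discretized squared L2 distance between each centered
    trend estimate and the average centered trend (zero for exactly parallel trends)."""
    trends = []
    for y in series:
        m = nw_smooth(t, y, bandwidth)
        mean = sum(m) / len(m)
        trends.append([v - mean for v in m])  # remove the vertical offset (assumption)
    avg = [sum(col) / len(trends) for col in zip(*trends)]
    dt = (t[-1] - t[0]) / (len(t) - 1)        # assumes equally spaced time points
    return sum(sum((a - b) ** 2 for a, b in zip(tr, avg)) * dt for tr in trends)


if __name__ == "__main__":
    t = [i / 49 for i in range(50)]
    base = [math.sin(2 * math.pi * x) for x in t]
    parallel = [[b + c for b in base] for c in (0.0, 1.0, -2.5)]
    diverging = [base, [b + 0.8 * x for b, x in zip(base, t)]]
    print(parallelism_index(t, parallel), parallelism_index(t, diverging))
```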
