Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
10 works
0 followers
12 topics
4 close collaborators


Published work

10 published items

Preprint · 2026 · arXiv

Adversarial Contrastive Learning for LLM Quantization Attacks

Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed a severe security risk: LLMs that are benign in full precision may exhibit malicious behaviors after quantization. In this paper, we propose Adversarial Contrastive Learning (ACL), a novel gradient-based quantization attack that achieves superior attack effectiveness by explicitly maximizing the gap between the probabilities of benign and harmful responses. ACL formulates the attack objective as a triplet-based contrastive loss and integrates it with a two-stage distributed fine-tuning strategy based on projected gradient descent to ensure stable and efficient optimization. Extensive experiments demonstrate ACL's effectiveness, with attack success rates of 86.00% for over-refusal, 97.69% for jailbreak, and 92.40% for advertisement injection, outperforming state-of-the-art methods by up to 44.67%, 18.84%, and 50.80%, respectively.
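
A minimal sketch of the two ingredients the abstract names: a triplet-style contrastive objective and the projection step of projected gradient descent. It assumes the objective is a hinge on sequence log-probabilities (the shared prompt acting as the implicit anchor) and that the feasible set is a per-weight box [lower, upper]; the function names and the margin are hypothetical, and this is not the paper's exact loss.

```python
import math

def sequence_logprob(token_probs):
    """Log-probability of one response: sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)

def triplet_margin_loss(logp_target, logp_avoid, margin=1.0):
    """Hinge-style triplet loss (hypothetical form): zero once the target
    response is at least `margin` nats more likely than the avoided one."""
    return max(0.0, margin - (logp_target - logp_avoid))

def project_to_box(weights, lower, upper):
    """PGD projection step: clip each weight into its own [lower_i, upper_i]."""
    return [min(max(w, lo), hi) for w, lo, hi in zip(weights, lower, upper)]

# Usage: a target response at -1.0 nats vs. an avoided one at -3.0 nats
# already satisfies a margin of 1.0, so the loss is zero.
print(triplet_margin_loss(-1.0, -3.0, margin=1.0))
print(project_to_box([0.5, -2.0, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```

In a PGD loop, each gradient step on the loss would be followed by project_to_box; the box simply stands in for whatever constraint set the attack maintains.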

Preprint · 2026 · arXiv

Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates

The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving demand for specialized Language Processing Units (LPUs) tailored for LLM inference. To curb the growing energy consumption of LLM inference systems, this paper proposes a Hardwired-Neurons Language Processing Unit (HNLPU), which physically hardwires LLM weight parameters into the computational fabric and, through this extreme specialization, improves computational efficiency by several orders of magnitude. However, the scale of modern LLMs remains a significant challenge: naively hardwiring gpt-oss-120b would require fabricating photomask sets valued at over 6 billion dollars, which is economically impractical. To address this challenge, we propose the novel Metal-Embedding methodology. Instead of embedding weights in a 2D grid of silicon device cells, Metal-Embedding embeds weight parameters into the 3D topology of metal wires. This brings two benefits: (1) a 15x increase in density, and (2) 60 of the 70 photomask layers, including all EUV photomasks, are homogeneous across chips. In total, Metal-Embedding reduces the photomask cost by 112x, bringing the Non-Recurring Engineering (NRE) cost of HNLPU into an economically viable range. Experimental results show that HNLPU achieves 249,960 tokens/s (5,555x and 85x that of GPU and WSE, respectively), 36 tokens/J (1,047x and 283x that of GPU and WSE, respectively), a total die area of 13,232 mm², and an estimated NRE of $59.46M to $123.5M at the 5 nm technology node. Analysis shows that HNLPU achieves a 41.7-80.4x improvement in cost-effectiveness and a 357x reduction in carbon footprint compared to OpenAI-scale H100 clusters, under an assumption of annual weight updates.
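
A back-of-envelope check of the headline figures, using only numbers quoted in the abstract. It assumes the 112x reduction applies to the over-$6B photomask figure and that the throughput ratios are simple quotients against a single GPU or WSE baseline; the abstract states neither explicitly, so treat the derived values as rough.

```python
# Figures quoted in the abstract.
straightforward_mask_cost = 6e9   # "over 6 billion dollars", so a lower bound
mask_cost_reduction = 112         # Metal-Embedding photomask cost reduction
hnlpu_tokens_per_s = 249_960
speedup_vs_gpu, speedup_vs_wse = 5_555, 85

# Derived values (assumption: plain ratios against one baseline each).
implied_mask_cost = straightforward_mask_cost / mask_cost_reduction
implied_gpu_tokens_per_s = hnlpu_tokens_per_s / speedup_vs_gpu
implied_wse_tokens_per_s = hnlpu_tokens_per_s / speedup_vs_wse

print(f"photomask cost after reduction: at least ${implied_mask_cost / 1e6:.1f}M")
print(f"implied GPU baseline: ~{implied_gpu_tokens_per_s:.0f} tokens/s")
print(f"implied WSE baseline: ~{implied_wse_tokens_per_s:.0f} tokens/s")
```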

Preprint · 2026 · arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks or are not difficult enough to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated, hard benchmark composed of 89 tasks in computer terminal environments, inspired by problems from real workflows. Each task features a unique environment, a human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65% on the benchmark, and we conduct an error analysis to identify areas for model and agent improvement. We publish the dataset and evaluation harness at https://www.tbench.ai/ to assist developers and researchers in future work.
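
A sketch of how per-task verification tests could roll up into a benchmark score. The record layout (TaskResult, test_outcomes) and the rule that a task counts only when every one of its tests passes are assumptions for illustration, not the published harness's API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical record layout)."""
    task_id: str
    test_outcomes: List[bool] = field(default_factory=list)

    @property
    def resolved(self) -> bool:
        # Assumed rule: the task must have tests, and all of them must pass.
        return bool(self.test_outcomes) and all(self.test_outcomes)

def benchmark_score(results: List[TaskResult]) -> float:
    """Percentage of tasks resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return 100.0 * sum(r.resolved for r in results) / len(results)

runs = [
    TaskResult("a", [True, True]),
    TaskResult("b", [True, False]),  # one failing test fails the task
    TaskResult("c", []),             # no tests run: not counted as solved
    TaskResult("d", [True]),
]
print(benchmark_score(runs))
```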

Preprint · 2022 · arXiv

Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies

Deep declarative networks and other recent related works have shown how to differentiate the solution map of a (continuous) parametrized optimization problem, opening up the possibility of embedding mathematical optimization problems into end-to-end learnable models. These differentiability results can lead to significant memory savings by providing an expression for computing the derivative without needing to unroll the steps of the forward-pass optimization procedure during the backward pass. However, the results typically require inverting a large Hessian matrix, which is computationally expensive when implemented naively. In this work we study two applications of deep declarative networks -- robust vector pooling and optimal transport -- and show how problem structure can be exploited to obtain very efficient backward pass computations in terms of both time and memory. Our ideas can be used as a guide for improving the computational performance of other novel deep declarative nodes.
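
A minimal scalar sketch of the idea, assuming a one-dimensional robust pooling node y*(x) = argmin_y Σ_i φ(y - x_i) with a pseudo-Huber penalty φ of width α = 1 (the paper treats vector pooling and other penalties). Differentiating the optimality condition Σ_i φ'(y* - x_i) = 0 gives dy*/dx_i = φ''(y* - x_i) / Σ_j φ''(y* - x_j), so the backward pass needs neither unrolling nor a matrix inverse; in this scalar case the "Hessian" is a single number.

```python
import math

ALPHA = 1.0  # pseudo-Huber width (assumed value)

def dphi(z):
    """First derivative of the pseudo-Huber penalty alpha^2 (sqrt(1 + (z/alpha)^2) - 1)."""
    return z / math.sqrt(1.0 + (z / ALPHA) ** 2)

def d2phi(z):
    """Second derivative of the pseudo-Huber penalty (always positive)."""
    return (1.0 + (z / ALPHA) ** 2) ** -1.5

def robust_pool(x, iters=200):
    """Forward pass: solve sum_i phi'(y - x_i) = 0 by bisection.
    The left-hand side is increasing in y, and the root lies in [min(x), max(x)]."""
    lo, hi = min(x), max(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(dphi(mid - xi) for xi in x) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def robust_pool_grad(x, y):
    """Backward pass by implicit differentiation:
    dy/dx_i = phi''(y - x_i) / sum_j phi''(y - x_j)."""
    weights = [d2phi(y - xi) for xi in x]
    total = sum(weights)
    return [w / total for w in weights]

x = [0.0, 0.4, 1.1, 5.0]  # 5.0 plays the role of an outlier
y = robust_pool(x)
print(y, robust_pool_grad(x, y))
```

Comparing robust_pool_grad against central finite differences of robust_pool is a quick sanity check; distant points receive small gradient weights because φ'' decays with |y* - x_i|.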

Preprint · 2022 · arXiv

Gridless Multisnapshot Variational Line Spectral Estimation from Coarsely Quantized Samples

Driven by the demand for low power consumption and higher sampling rates, low-resolution quantization for data acquisition has recently drawn great attention. Consequently, line spectral estimation (LSE) with multiple measurement vectors (MMVs) from coarsely quantized samples is of vital importance in cutting-edge array signal processing applications such as range estimation and direction-of-arrival (DOA) estimation in millimeter-wave radar systems. In this paper, we combine the low-complexity gridless variational line spectral estimation (VALSE) algorithm with expectation propagation (EP) and propose an MVALSE-EP algorithm to estimate frequencies from coarsely quantized samples. In addition, the Cramér-Rao bound (CRB) is derived as a performance benchmark for the proposed algorithm, and insights are provided into how system parameters affect estimation performance. It is shown that multiple snapshots benefit frequency estimation, especially in coarsely quantized scenarios. Numerical experiments, including on a real dataset, demonstrate the effectiveness of MVALSE-EP.
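
To make the measurement model concrete, a sketch of b-bit uniform quantization of I/Q samples across several snapshots, followed by an averaged periodogram on the DFT grid. The quantizer design (mid-rise, saturating at ±full_scale), the noise level and the snapshot count are assumptions chosen for brevity; this gridded baseline is not MVALSE-EP, which is gridless and Bayesian.

```python
import cmath
import math
import random

def quantize(x, bits, full_scale=1.0):
    """Uniform mid-rise quantizer with 2**bits levels spanning [-full_scale, full_scale]."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    idx = math.floor(x / step)
    idx = min(max(idx, -levels // 2), levels // 2 - 1)  # saturate at the ADC range
    return (idx + 0.5) * step

def quantize_complex(z, bits, full_scale=1.0):
    """Quantize the in-phase and quadrature parts with separate ADCs."""
    return complex(quantize(z.real, bits, full_scale), quantize(z.imag, bits, full_scale))

def averaged_periodogram(snapshots, grid_size):
    """Average |DFT|^2 over all snapshots on a uniform grid of grid_size frequencies."""
    n = len(snapshots[0])
    power = [0.0] * grid_size
    for y in snapshots:
        for k in range(grid_size):
            omega = 2 * math.pi * k / grid_size
            s = sum(y[t] * cmath.exp(-1j * omega * t) for t in range(n))
            power[k] += abs(s) ** 2 / len(snapshots)
    return power

# Example: one complex tone, 8 snapshots with random phases, 1-bit I/Q quantization.
random.seed(0)
n, num_snapshots, bits, true_bin = 32, 8, 1, 5
omega_true = 2 * math.pi * true_bin / n
snapshots = []
for _ in range(num_snapshots):
    phase = random.uniform(0.0, 2 * math.pi)
    clean = [cmath.exp(1j * (omega_true * t + phase)) for t in range(n)]
    noisy = [v + complex(random.gauss(0, 0.3), random.gauss(0, 0.3)) for v in clean]
    snapshots.append([quantize_complex(v, bits) for v in noisy])

power = averaged_periodogram(snapshots, grid_size=n)
peak_bin = max(range(n), key=power.__getitem__)
print(peak_bin, 2 * math.pi * peak_bin / n)
```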

Preprint · 2022 · arXiv

MDS-Net: A Multi-scale Depth Stratification Based Monocular 3D Object Detection Algorithm

Monocular 3D object detection is very challenging in autonomous driving due to the lack of depth information. This paper proposes a one-stage monocular 3D object detection algorithm based on multi-scale depth stratification, which uses an anchor-free method to detect 3D objects through per-pixel prediction. In the proposed MDS-Net, a novel depth-based stratification structure improves the network's depth prediction by establishing mathematical models that relate an object's depth to its size in the image. A new angle loss function is then developed to further improve the accuracy of angle prediction and speed up training convergence. Finally, an optimized soft-NMS is applied in the post-processing stage to adjust the confidence of candidate boxes. Experiments on the KITTI benchmark show that MDS-Net outperforms existing monocular 3D detection methods on the 3D detection and BEV detection tasks while meeting real-time requirements.
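
For reference, a sketch of standard Gaussian soft-NMS on 2D axis-aligned boxes, which decays the scores of overlapping candidates instead of deleting them outright. The abstract does not describe what its "optimized" variant changes, so sigma, the score threshold and the (x1, y1, x2, y2) box format here are assumptions.

```python
import math

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: returns (index, rescored confidence) in selection order."""
    remaining = {i: s for i, s in enumerate(scores)}
    kept = []
    while remaining:
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        kept.append((best, best_score))
        for i in list(remaining):
            # Decay instead of discard: heavier overlap means stronger decay.
            overlap = iou(boxes[best], boxes[i])
            remaining[i] *= math.exp(-(overlap ** 2) / sigma)
            if remaining[i] < score_threshold:
                del remaining[i]
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(soft_nms(boxes, [0.9, 0.8, 0.7]))
```

Unlike hard NMS, the second box survives with a reduced confidence, which is the behavior a post-processing stage that "adjusts the confidence of candidate boxes" relies on.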

Preprint · 2021 · arXiv

Newtonalized Orthogonal Matching Pursuit for Linear Frequency Modulated Pulse Frequency Agile Radar

The linear frequency modulated (LFM) frequency agile radar (FAR) can synthesize a wide signal bandwidth through coherent processing while keeping the bandwidth of each pulse narrow. In this way, high range resolution profiles (HRRPs) can be obtained without increasing hardware system cost. Furthermore, frequency agility improves both robustness to jamming and spectrum efficiency. Motivated by Newtonalized orthogonal matching pursuit (NOMP) for the line spectral estimation problem, a NOMP variant for FAR, termed NOMP-FAR, is designed to process each coarse range bin to extract the HRRPs and velocities of multiple targets, including guidelines for choosing the oversampling factor and the stopping criterion. In addition, it is shown that a target causes false alarms in nearby coarse range bins, so a postprocessing algorithm is proposed to suppress these ghost targets. Numerical simulations demonstrate the effectiveness of NOMP-FAR.
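
A single-tone sketch of the detect-then-refine pattern behind NOMP: pick the peak on an oversampled DFT grid, then apply Newton steps to G(ω) = |Σ_t y[t] e^{-jωt}|² to move off the grid. Full NOMP adds multiple sinusoids, cyclic refinement and a stopping criterion, and NOMP-FAR applies the idea per coarse range bin; none of that is shown, and the oversampling factor and step count are assumed values.

```python
import cmath
import math

def correlate(y, omega, order=0):
    """order-th derivative in omega of s(omega) = sum_t y[t] * exp(-1j * omega * t)."""
    return sum(((-1j * t) ** order) * yt * cmath.exp(-1j * omega * t)
               for t, yt in enumerate(y))

def detect_and_refine(y, oversampling=4, newton_steps=10):
    """Coarse DFT-grid detection followed by Newton refinement (one tone only)."""
    n = len(y)
    grid = oversampling * n
    # Coarse stage: best frequency on an oversampled DFT grid.
    omega = max((2 * math.pi * k / grid for k in range(grid)),
                key=lambda w: abs(correlate(y, w)) ** 2)
    # Refinement stage: Newton steps that maximize G(omega) = |s(omega)|^2.
    for _ in range(newton_steps):
        s, s1, s2 = (correlate(y, omega, k) for k in range(3))
        g1 = 2 * (s.conjugate() * s1).real                   # G'(omega)
        g2 = 2 * (abs(s1) ** 2 + (s.conjugate() * s2).real)  # G''(omega)
        if g2 >= 0:  # not locally concave: keep the current estimate
            break
        omega -= g1 / g2
    gain = correlate(y, omega) / n  # least-squares complex amplitude
    return omega % (2 * math.pi), gain

true_omega, true_gain = 1.2345, 0.8 * cmath.exp(0.3j)
y = [true_gain * cmath.exp(1j * true_omega * t) for t in range(32)]
print(detect_and_refine(y))
```

The grid alone can only land within half a grid spacing of the true frequency; the Newton stage is what removes that off-grid error.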

Preprint · 2020 · arXiv

One-bit LFMCW Radar: Spectrum Analysis and Target Detection

One-bit radar, which performs signal sampling and quantization with a one-bit ADC, is a promising technology for many civilian applications due to its low cost and low power consumption. In this paper, the problems encountered by one-bit LFMCW radar are studied, and a two-stage target detection method termed the dimension-reduced generalized approximate message passing (DR-GAMP) approach is proposed. First, the spectrum of one-bit quantized signals in a multi-target scenario is analyzed. The analysis indicates that high-order harmonics may cause false alarms (FAs) and cannot be neglected. Second, based on this spectrum analysis, the DR-GAMP approach is proposed to carry out target detection. Specifically, linear preprocessing and target predetection are first applied to reduce the dimension, and then the GAMP algorithm is used to suppress high-order harmonics and recover the true targets. Finally, numerical simulations evaluate the performance of one-bit LFMCW radar under typical parameters. Compared with conventional radar using linear processing, one-bit LFMCW radar shows a performance gain of about 1.3 dB when the input signal-to-noise ratios (SNRs) of the targets are low, and a performance loss of about 1.0 dB in the presence of a strong target.
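
A small numerical illustration of why the harmonics matter. Assuming a noise-free, real-valued beat tone that completes exactly 10 periods in 256 samples, sign quantization turns it into a sampled square wave whose spectrum gains odd harmonics; the third sits at roughly one third of the fundamental's amplitude and could be mistaken for a second target. This illustrates the effect only and is not the DR-GAMP method.

```python
import cmath
import math

def dft_magnitudes(x):
    """|X[k]| for k = 0..N-1, computed directly from the DFT definition."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

n, tone_bin, phase = 256, 10, 0.1  # 10 full periods of a real beat tone
analog = [math.cos(2 * math.pi * tone_bin * t / n + phase) for t in range(n)]
one_bit = [1.0 if v >= 0 else -1.0 for v in analog]  # one-bit ADC keeps only the sign

lin = dft_magnitudes(analog)
q = dft_magnitudes(one_bit)
print("linear  3rd/1st:", lin[3 * tone_bin] / lin[tone_bin])
print("one-bit 3rd/1st:", q[3 * tone_bin] / q[tone_bin])
print("one-bit 2nd/1st:", q[2 * tone_bin] / q[tone_bin])
```

The linear signal shows essentially no third harmonic, while the one-bit signal shows a 3rd/1st ratio close to 1/3 and no even harmonic, matching the Fourier series of a square wave.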

Preprint · 2019 · arXiv

Multidimensional Variational Line Spectra Estimation

The fundamental multidimensional line spectral estimation problem is addressed using Bayesian methods. Motivated by the recently proposed variational line spectral estimation (VALSE) algorithm, multidimensional VALSE (MDVALSE) is developed. MDVALSE inherits the advantages of VALSE, such as automatically estimating the model order and noise variance and quantifying the uncertainty of the frequency estimates. Unlike VALSE, MDVALSE treats the multidimensional frequencies of a single component as a whole, and the probability density function is projected onto independent univariate von Mises distributions to make inference tractable. In addition, the efficient fast Fourier transform (FFT) is used for initialization, which significantly reduces the computational complexity of MDVALSE. Numerical results demonstrate the effectiveness of MDVALSE compared with state-of-the-art methods.
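
A sketch of FFT-style initialization for one two-dimensional component: the frequency pair is read off jointly from the peak of an oversampled 2-D spectrum. To stay self-contained it uses a plain separable DFT (a practical implementation would call an FFT routine); the grid sizes are assumptions, and the variational updates of MDVALSE are not shown.

```python
import cmath
import math

def dft_2d_peak(y, grid1, grid2):
    """Return (omega1, omega2) at the largest |2-D DFT| value on a grid1 x grid2 grid."""
    m_len, n_len = len(y), len(y[0])
    # Stage 1: 1-D DFT along the second dimension for every row.
    rows = [[sum(y[m][n] * cmath.exp(-2j * math.pi * k2 * n / grid2) for n in range(n_len))
             for k2 in range(grid2)] for m in range(m_len)]
    # Stage 2: 1-D DFT along the first dimension, tracking the strongest bin pair.
    best, best_power = (0, 0), -1.0
    for k2 in range(grid2):
        for k1 in range(grid1):
            s = sum(rows[m][k2] * cmath.exp(-2j * math.pi * k1 * m / grid1)
                    for m in range(m_len))
            if abs(s) ** 2 > best_power:
                best, best_power = (k1, k2), abs(s) ** 2
    return 2 * math.pi * best[0] / grid1, 2 * math.pi * best[1] / grid2

w1, w2 = 0.9, 2.1  # off-grid test frequencies
y = [[cmath.exp(1j * (w1 * m + w2 * n)) for n in range(16)] for m in range(16)]
est = dft_2d_peak(y, 64, 64)
print(est)
```

The initial pair lands within half a grid spacing (π/64 here) of the true frequencies in each dimension; a refinement stage would then take over from that starting point.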

Preprint · 2014 · arXiv

Performance Benefits of DataMPI: A Case Study with BigDataBench

Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics, and both are widely deployed at Internet companies. At the same time, requirements for high-performance data analysis are leading academic and industrial communities to adopt state-of-the-art HPC technologies to solve Big Data problems. We recently proposed DataMPI, a key-value pair based communication library that extends MPI to support Hadoop/Spark-like Big Data computing jobs. In this paper, we use BigDataBench, a Big Data benchmark suite, to comprehensively characterize the performance and resource utilization of Hadoop, Spark, and DataMPI. In our experiments, DataMPI achieves up to 55% and 39% speedups in job execution time compared with Hadoop and Spark, respectively. Most of the benefits come from DataMPI's high-efficiency communication mechanisms. We also observe that DataMPI uses resources (CPU, memory, disk, and network I/O) more efficiently than the other two frameworks.