Source author record

Zhaozhuo Xu

Zhaozhuo Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

6works
4topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation

Adapting pre-trained large language models (LLMs) is crucial but challenging due to their enormous size. Parameter-efficient fine-tuning (PEFT) techniques typically employ additive adapters applied to frozen model weights. To further reduce memory usage, model weights are often compressed through quantization. However, existing PEFT methods often yield suboptimal model quality because they rely on restrictive assumptions, such as low-rank constraints on adapters to limit the number of trainable parameters. We find that sketching, a popular data compression technique, can serve as an efficient LLM adaptation strategy while avoiding the low-rank assumption. We introduce SketchTune, a compressive adaptation strategy that compresses LLM weights into compact fine-tunable sketches, integrating compression and adaptation into a unified framework. This integration eliminates the need for complex two-path computation in existing PEFT techniques, enabling faster and more memory-efficient training and inference. SketchTune is supported by mathematical insights into matrix classes that are better approximated using sketching rather than low-rank methods. Our extensive evaluations with Llama and Mistral models demonstrate that SketchTune outperforms leading PEFT methods across diverse tasks while using substantially smaller base models and comparable trainable parameters. As a highlight, SketchTune outperforms LoRA, DoRA, and S2FT on commonsense and math benchmarks using 2.6-3.5$\times$ smaller base models and exceeds LoftQ in accuracy by 14.48% on GSM8K with 7.3$\times$ fewer trainable parameters. Our code is available at https://github.com/LeanModels/SketchTune.

preprint2022arXiv

Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

In this paper, we study the problem of speeding up a type of optimization algorithms called Frank-Wolfe, a conditional gradient method. We develop and employ two novel inner product search data structures, improving the prior fastest algorithm in [Shrivastava, Song and Xu, NeurIPS 2021]. * The first data structure uses low-dimensional random projection to reduce the problem to a lower dimension, then uses efficient inner product data structure. It has preprocessing time $\tilde O(nd^{ω-1}+dn^{1+o(1)})$ and per iteration cost $\tilde O(d+n^ρ)$ for small constant $ρ$. * The second data structure leverages the recent development in adaptive inner product search data structure that can output estimations to all inner products. It has preprocessing time $\tilde O(nd)$ and per iteration cost $\tilde O(d+n)$. The first algorithm improves the state-of-the-art (with preprocessing time $\tilde O(d^2n^{1+o(1)})$ and per iteration cost $\tilde O(dn^ρ)$) in all cases, while the second one provides an even faster preprocessing time and is suitable when the number of iterations is small.

preprint2022arXiv

Proximity Graph Maintenance for Fast Online Nearest Neighbor Search

Approximate Nearest Neighbor (ANN) search is a fundamental technique for (e.g.,) the deployment of recommender systems. Recent studies bring proximity graph-based methods into practitioners' attention -- proximity graph-based methods outperform other solutions such as quantization, hashing, and tree-based ANN algorithm families. In current recommendation systems, data point insertions, deletions, and queries are streamed into the system in an online fashion as users and items change dynamically. As proximity graphs are constructed incrementally by inserting data points as new vertices into the graph, online insertions and queries are well-supported in proximity graph. However, a data point deletion incurs removing a vertex from the proximity graph index, while no proper graph index updating mechanisms are discussed in previous studies. To tackle the challenge, we propose an incremental proximity graph maintenance (IPGM) algorithm for online ANN. IPGM supports both vertex deletion and insertion on proximity graphs. Given a vertex deletion request, we thoroughly investigate solutions to update the connections of the vertex. The proposed updating scheme eliminates the performance drop in online ANN methods on proximity graphs, making the algorithm suitable for practical systems.

preprint2022arXiv

Speeding Up Sparsification using Inner Product Search Data Structures

We present a general framework that utilizes different efficient data structures to improve various sparsification problems involving an iterative process. We also provide insights and characterization for different iterative process, and answer that when should we use which data structures in what type of problem. We obtain improved running time for the following problems. * For constructing linear-sized spectral sparsifier (Batson, Spielman and Srivastava, 2012), all the existing deterministic algorithms require $Ω(d^4)$ time. In this work, we provide the first deterministic algorithm that breaks that barrier which runs in $O(d^{ω+1})$ time, where $ω$ is the exponent of matrix multiplication. * For one-sided Kadison-Singer-typed discrepancy problem, we give fast algorithms for both small and large number of iterations. * For experimental design problem, we speed up a key swapping process. In the heart of our work is the design of a variety of different inner product search data structures that have efficient initialization, query and update time, compatible to dimensionality reduction and robust against adaptive adversary.

preprint2021arXiv

Beyond Convolutions: A Novel Deep Learning Approach for Raw Seismic Data Ingestion

Traditional seismic processing workflows (SPW) are expensive, requiring over a year of human and computational effort. Deep learning (DL) based data-driven seismic workflows (DSPW) hold the potential to reduce these timelines to a few minutes. Raw seismic data (terabytes) and required subsurface prediction (gigabytes) are enormous. This large-scale, spatially irregular time-series data poses seismic data ingestion (SDI) as an unconventional yet fundamental problem in DSPW. Current DL research is limited to small-scale simplified synthetic datasets as they treat seismic data like images and process them with convolution networks. Real seismic data, however, is at least 5D. Applying 5D convolutions to this scale is computationally prohibitive. Moreover, raw seismic data is highly unstructured and hence inherently non-image like. We propose a fundamental shift to move away from convolutions and introduce SESDI: Set Embedding based SDI approach. SESDI first breaks down the mammoth task of large-scale prediction into an efficient compact auxiliary task. SESDI gracefully incorporates irregularities in data with its novel model architecture. We believe SESDI is the first successful demonstration of end-to-end learning on real seismic data. SESDI achieves SSIM of over 0.8 on velocity inversion task on real proprietary data from the Gulf of Mexico and outperforms the state-of-the-art U-Net model on synthetic datasets.

preprint2020arXiv

Climbing the WOL: Training for Cheaper Inference

Efficient inference for wide output layers (WOLs) is an essential yet challenging task in large scale machine learning. Most approaches reduce this problem to approximate maximum inner product search (MIPS), which relies heavily on the observation that for a given model, ground truth labels correspond to logits of highest value during full model inference. However, such an assumption is restrictive in practice. In this paper, we argue that approximate MIPS subroutines, despite having sub-linear computation time, are sub-optimal because they are tailored for retrieving large inner products with high recall instead of retrieving the correct labels. With WOL, the labels often have moderate inner products, which makes approximate MIPS more challenging. We propose an alternative problem formulation, called Label Superior Sampling (LSS), where the objective is to tailor the system to ensure retrieval of the correct label. Accordingly, we propose a novel learned hash approach, which is significantly more efficient and sufficient for high inference accuracy than MIPS baselines. Our extensive evaluation indicates that LSS can match or even outperform full inference accuracy with around 5x speed up and 87% energy reduction.