Source author record

Zhewei Wei

Zhewei Wei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Data Structures and Algorithms, Artificial Intelligence, Databases, Computation and Language, Computer Vision

Catalog footprint

What is connected

15 works

6 topics

4 close collaborators


Preprint · 2026 · arXiv

AeroSketch: Near-Optimal Time Matrix Sketch Framework for Persistent, Sliding Window, and Distributed Streams

Many real-world matrix datasets arrive as high-throughput vector streams, making it impractical to store or process them in their entirety. To enable real-time analytics under limited computational, memory, and communication resources, matrix sketching techniques have been developed over recent decades to provide compact approximations of such streaming data. Some algorithms have achieved optimal space and communication complexity. However, these approaches often require frequent time-consuming matrix factorization operations. In particular, under tight approximation error bounds, each matrix factorization computation incurs cubic time complexity, thereby limiting their update efficiency. In this paper, we introduce AeroSketch, a novel matrix sketching framework that leverages recent advances in randomized numerical linear algebra (RandNLA). AeroSketch achieves optimal communication and space costs while delivering near-optimal update time complexity (within logarithmic factors) across persistent, sliding window, and distributed streaming scenarios. Extensive experiments on both synthetic and real-world datasets demonstrate that AeroSketch consistently outperforms state-of-the-art methods in update throughput. In particular, under tight approximation error constraints, AeroSketch reduces the cubic time complexity to the quadratic level. Meanwhile, it maintains comparable approximation quality while retaining optimal communication and space costs.
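The abstract does not spell out AeroSketch's update rule, and the sketch below does not reproduce it. As a minimal illustration of the kind of RandNLA tool such frameworks build on, it implements CountSketch, a standard linear sketch swapped in here for illustration: each streamed row is added, with a pseudo-random sign, to one of `num_buckets` rows of a small matrix `B`, so that `B^T B` equals `A^T A` in expectation. Because the sketch is linear, sketches built at different sites with a shared seed can simply be summed, which is what distributed streams need. The class and function names, the SHA-256 hashing, and all parameters are hypothetical; sliding-window support is not shown.

```python
import hashlib


def bucket_and_sign(row_id: int, num_buckets: int, seed: int) -> tuple[int, int]:
    """Deterministically map a row id to (bucket, sign); identical at every site sharing `seed`."""
    digest = hashlib.sha256(f"{seed}:{row_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_buckets
    sign = 1 if digest[8] & 1 else -1
    return bucket, sign


class CountSketch:
    """Linear sketch B = S A of a stream of rows A, with B of shape num_buckets x dim."""

    def __init__(self, num_buckets: int, dim: int, seed: int = 0) -> None:
        self.num_buckets, self.dim, self.seed = num_buckets, dim, seed
        self.B = [[0.0] * dim for _ in range(num_buckets)]

    def update(self, row_id: int, row: list[float]) -> None:
        # O(dim) work per arriving row: add +/- row into one bucket, no factorization.
        bucket, sign = bucket_and_sign(row_id, self.num_buckets, self.seed)
        for j, x in enumerate(row):
            self.B[bucket][j] += sign * x

    def merge(self, other: "CountSketch") -> "CountSketch":
        # Linearity: sketch(A1) + sketch(A2) = sketch of A1 and A2 stacked (same seed).
        if (self.num_buckets, self.dim, self.seed) != (other.num_buckets, other.dim, other.seed):
            raise ValueError("sketches must share shape and seed to be merged")
        merged = CountSketch(self.num_buckets, self.dim, self.seed)
        merged.B = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.B, other.B)]
        return merged


# Usage: two sites sketch disjoint parts of a stream, then a coordinator merges them.
site1, site2 = CountSketch(4, 2, seed=7), CountSketch(4, 2, seed=7)
site1.update(0, [1.0, 2.0])
site2.update(1, [3.0, 4.0])
combined = site1.merge(site2)
```

Each update costs O(dim) time no matter how large the sketch is; the trade-off is that CountSketch's error guarantees are probabilistic rather than deterministic.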

Preprint · 2026 · arXiv

PageRank Centrality in Directed Graphs with Bounded In-Degree

We study the computational complexity of locally estimating a node's PageRank centrality in a directed graph $G$. For any node $t$, its PageRank centrality $π(t)$ is defined as the probability that a random walk in $G$, starting from a uniformly chosen node, terminates at $t$, where each step terminates with a constant probability $α\in(0,1)$. To obtain a multiplicative $\big(1\pm O(1)\big)$-approximation of $π(t)$ with probability $Ω(1)$, the previously best upper bound is $O(n^{1/2}\min\{ Δ_{in}^{1/2},Δ_{out}^{1/2},m^{1/4}\})$ from [Wang, Wei, Wen, Yang, STOC '24], where $n$ and $m$ denote the number of nodes and edges in $G$, and $Δ_{in}$ and $Δ_{out}$ upper bound the in-degrees and out-degrees of $G$, respectively. Using a refinement of the proof in the same paper, we establish a lower bound of $Ω(n^{1/2}\min\{Δ_{in}^{1/2}/n^γ,Δ_{out}^{1/2}/n^γ,m^{1/4}\})$, where $γ=\frac{1}{2}(2\max\{\log_{1/(1-α)}Δ_{in},1\}-1)^{-1}$. As $γ$ only depends on $Δ_{in}$ and $n^γ=O(1)$ for $Δ_{in}=Ω\left(n^{Ω(1)}\right)$, the known upper bound is tight if we only parameterize the complexity by $n$, $m$, and $Δ_{out}$. However, there remains a gap of $Ω(n^γ)$ when considering $Δ_{in}$, and this gap is large when $Δ_{in}$ is small. In the extreme case where $Δ_{in}\le1/(1-α)$, we have $γ=1/2$, leading to a gap of $Ω(n^{1/2})$ between the bounds $O(n^{1/2})$ and $Ω(1)$. In this paper, we present a new algorithm that achieves the above lower bound (up to logarithmic factors). The algorithm assumes that $n$ and the bounds $Δ_{in}$ and $Δ_{out}$ are known in advance. Our key technique is a novel randomized backwards propagation process that only propagates selectively based on Monte Carlo estimated PageRank scores.
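To make the definition of $π(t)$ concrete, the sketch below computes it two ways on a small graph: by accumulating the terminating-walk distribution step by step, and by the plain Monte Carlo estimator that simulates walks from uniformly chosen start nodes. This is only a baseline, not the paper's algorithm (which combines Monte Carlo estimates with a selective backward propagation); the function names are hypothetical, and every node is assumed to have at least one out-neighbor, since dangling nodes are not handled.

```python
import random


def pagerank_exact(out_nbrs: dict[int, list[int]], alpha: float, iters: int = 200) -> dict[int, float]:
    """pi(t) = sum_k alpha * Pr[walk is still running and at t after k steps], truncated at `iters`."""
    nodes = list(out_nbrs)
    dist = {v: 1.0 / len(nodes) for v in nodes}  # uniform start; mass of walks still running
    pi = {v: 0.0 for v in nodes}
    for _ in range(iters):
        for v in nodes:
            pi[v] += alpha * dist[v]  # walk terminates here with probability alpha
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            share = (1 - alpha) * dist[u] / len(out_nbrs[u])  # continue to a random out-neighbor
            for v in out_nbrs[u]:
                nxt[v] += share
        dist = nxt
    return pi


def pagerank_monte_carlo(out_nbrs: dict[int, list[int]], alpha: float, num_walks: int, seed: int = 0) -> dict[int, float]:
    """Fraction of simulated alpha-terminating walks (uniform start) that end at each node."""
    rng = random.Random(seed)
    nodes = list(out_nbrs)
    hits = {v: 0 for v in nodes}
    for _ in range(num_walks):
        v = rng.choice(nodes)
        while rng.random() >= alpha:  # continue with probability 1 - alpha
            v = rng.choice(out_nbrs[v])
        hits[v] += 1
    return {v: c / num_walks for v, c in hits.items()}
```

The Monte Carlo estimate needs many walks for nodes with small $π(t)$, which is why local estimation of a single node's score is the interesting regime the abstract studies.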

Preprint · 2026 · arXiv

Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management

Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting, in which a fixed autoregressive Transformer is coupled with a fixed context-management method to process inputs of different lengths step by step, and (ii) a scaling-family setting, in which a family of different models (with increasing context-window length or numerical precision) is used to handle different input lengths. Existing proofs of Transformer Turing-completeness are frequently established in setting (ii), whereas real-world LLM deployment and the standard notion of Turing-completeness correspond more naturally to setting (i). In this paper, we first formalize the fixed-system setting, thereby providing a concrete characterization of how real-world LLMs operate. We then argue that results proved in the scaling-family setting provide theoretically meaningful resource bounds but do not establish Turing-completeness, thereby clarifying a common misinterpretation of existing results. Finally, we show that different context-management methods can yield sharply different computational power, and we advocate the position that context management is a central component that critically determines the computational power of real-world autoregressive Transformers.

Preprint · 2024 · arXiv

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation

Vision Transformers (ViTs) have revolutionized the field of computer vision, yet their deployment on resource-constrained devices remains challenging due to high computational demands. To expedite pre-trained ViTs, token pruning and token merging approaches have been developed, which aim to reduce the number of tokens involved in the computation. However, these methods still have some limitations, such as image information loss from pruned tokens and inefficiency in the token-matching process. In this paper, we introduce a novel Graph-based Token Propagation (GTP) method to resolve the challenge of balancing model efficiency and information preservation for efficient ViTs. Inspired by graph summarization algorithms, GTP meticulously propagates less significant tokens' information to spatially and semantically connected tokens that are of greater importance. Consequently, the remaining few tokens serve as a summarization of the entire token graph, allowing the method to reduce computational complexity while preserving the essential information of eliminated tokens. Combined with an innovative token selection strategy, GTP can efficiently identify image tokens to be propagated. Extensive experiments have validated GTP's effectiveness, demonstrating both efficiency and performance improvements. Specifically, GTP decreases the computational complexity of both DeiT-S and DeiT-B by up to 26% with only a minimal 0.3% accuracy drop on ImageNet-1K without finetuning, and remarkably surpasses the state-of-the-art token merging method on various backbones at an even faster inference speed. The source code is available at https://github.com/Ackesnal/GTP-ViT.
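The abstract does not give GTP's graph construction, token-selection strategy, or exact propagation rule. The sketch below illustrates only the general idea under hypothetical choices: drop the `num_drop` tokens with the lowest importance score and add each dropped token's feature vector to the kept tokens it is connected to, split in proportion to edge weight. `propagate_tokens`, the `step` factor, and the proportional split are assumptions for illustration, not GTP's method.

```python
def propagate_tokens(features, importance, weights, num_drop, step=1.0):
    """Drop the least important tokens and push their features to connected kept tokens.

    features:   list of token feature vectors (lists of floats)
    importance: one score per token (higher means keep)
    weights:    weights[i][j] >= 0 is the edge weight between tokens i and j (0 = no edge)
    step:       hypothetical scale applied to propagated information
    """
    order = sorted(range(len(features)), key=lambda i: importance[i])
    dropped, kept = order[:num_drop], sorted(order[num_drop:])
    out = {j: list(features[j]) for j in kept}
    for i in dropped:
        total = sum(weights[i][j] for j in kept)
        if total == 0:
            continue  # isolated token: its information is simply discarded
        for j in kept:
            share = step * weights[i][j] / total
            if share:
                out[j] = [a + share * b for a, b in zip(out[j], features[i])]
    return [out[j] for j in kept]
```

Unlike plain pruning, the dropped token's information survives in its neighbors, while the number of tokens passed to later layers still shrinks.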

Preprint · 2022 · arXiv

BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation

Many representative graph neural networks, e.g., GPR-GNN and ChebNet, approximate graph convolutions with graph spectral filters. However, existing work either applies predefined filter weights or learns them without necessary constraints, which may lead to oversimplified or ill-posed filters. To overcome these issues, we propose BernNet, a novel graph neural network with theoretical support that provides a simple but effective scheme for designing and learning arbitrary graph spectral filters. In particular, for any filter over the normalized Laplacian spectrum of a graph, our BernNet estimates it by an order-$K$ Bernstein polynomial approximation and designs its spectral property by setting the coefficients of the Bernstein basis. Moreover, we can learn the coefficients (and the corresponding filter weights) based on observed graphs and their associated signals and thus obtain a BernNet specialized to the data. Our experiments demonstrate that BernNet can learn arbitrary spectral filters, including complicated band-rejection and comb filters, and it achieves superior performance in real-world graph modeling tasks. Code is available at https://github.com/ivam-he/BernNet.
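The Bernstein construction can be written out directly. Assuming the normalized Laplacian spectrum lies in $[0, 2]$, an order-$K$ filter in the form we understand BernNet to use is $h(λ) = \sum_{k=0}^{K} θ_k \binom{K}{k} 2^{-K} (2-λ)^{K-k} λ^k$, and choosing $θ_k = h(2k/K)$ gives the classical Bernstein approximation of a target filter $h$. The sketch below evaluates such a filter at a scalar eigenvalue only; it builds no graph and learns no coefficients, and the function names are hypothetical.

```python
from math import comb


def bernstein_filter(theta: list[float], lam: float) -> float:
    """h(lam) = sum_k theta_k * C(K, k) / 2^K * (2 - lam)^(K - k) * lam^k, for lam in [0, 2]."""
    K = len(theta) - 1
    return sum(
        t * comb(K, k) * (2 - lam) ** (K - k) * lam ** k / 2 ** K
        for k, t in enumerate(theta)
    )


def bernstein_coefficients(h, K: int) -> list[float]:
    """Classical Bernstein approximation of a target filter h: theta_k = h(2k / K)."""
    return [h(2 * k / K) for k in range(K + 1)]


# Usage: approximate a band-rejection response with an order-10 Bernstein filter.
band_reject = bernstein_coefficients(lambda x: 0.0 if 0.8 <= x <= 1.2 else 1.0, 10)
response = [bernstein_filter(band_reject, i / 10) for i in range(21)]
```

Because the Bernstein basis is nonnegative on $[0, 2]$ and sums to one, nonnegative coefficients always yield a nonnegative filter, which is one way this parameterization can keep learned filters well-posed.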

Preprint · 2022 · arXiv

Edge-based Local Push for Personalized PageRank

Personalized PageRank (PPR) is a popular node proximity metric in graph mining and network research. Given a graph $G=(V,E)$ and a source node $s \in V$, a single-source PPR (SSPPR) query asks for the PPR value $\pi(u)$ of each node $u$ with respect to $s$, which represents the relative importance of node $u$ in the context of the source node $s$. Among existing algorithms for SSPPR queries, LocalPush is a fundamental method that serves as a cornerstone for subsequent algorithms. In LocalPush, the push operation is a crucial primitive that distributes the probability at a node $u$ to all of $u$'s neighbors via the corresponding edges. Although this push operation works well on unweighted graphs, it can be rather inefficient on weighted graphs. In particular, on unbalanced weighted graphs, where only a few of these edges carry the majority of the total weight, the push operation still has to distribute insignificant probabilities along the edges that carry only minor weights, resulting in expensive overhead. To resolve this issue, we propose the EdgePush algorithm, a novel method for computing SSPPR queries on weighted graphs. EdgePush decomposes the aforementioned push operations into edge-based pushes, allowing the algorithm to operate at edge-level granularity. Hence, it can flexibly distribute the probabilities according to edge weights. Furthermore, EdgePush allows a fine-grained termination threshold for each individual edge, leading to a better complexity than LocalPush. Notably, we prove that EdgePush improves the theoretical query cost of LocalPush by up to an $O(n)$ factor when the graph's weights are unbalanced, in terms of both $\ell_1$-error and normalized additive error. Our experimental results demonstrate that EdgePush significantly outperforms state-of-the-art baselines in query efficiency on large motif-based and real-world weighted graphs.
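For reference, the node-level LocalPush baseline can be sketched as follows on a weighted graph. This is not EdgePush: the loop marked in the comments is the node-level push that touches every edge of `u`, including edges with negligible weight, which is the cost the abstract says EdgePush avoids by pushing per edge. The stopping rule `r[u] <= r_max * deg[u]`, the function name, and the assumption that every node has at least one edge are illustrative choices.

```python
from collections import deque


def local_push(adj, source, alpha, r_max):
    """Baseline LocalPush for single-source PPR on a weighted graph.

    adj[u] is a list of (v, weight) pairs; every node is assumed to have at least
    one edge. Returns (estimates p, residuals r); pushing stops once every node
    satisfies r[u] <= r_max * deg[u], where deg[u] is u's total edge weight.
    """
    deg = {u: sum(w for _, w in nbrs) for u, nbrs in adj.items()}
    p = {u: 0.0 for u in adj}
    r = {u: 0.0 for u in adj}
    r[source] = 1.0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if r[u] <= r_max * deg[u]:
            continue  # residual fell below the threshold; nothing to push
        mass, r[u] = r[u], 0.0
        p[u] += alpha * mass  # alpha fraction settles at u
        # Node-level push: visits ALL edges of u, even those with negligible weight.
        for v, w in adj[u]:
            was_below = r[v] <= r_max * deg[v]
            r[v] += (1 - alpha) * mass * w / deg[u]
            if was_below and r[v] > r_max * deg[v]:
                queue.append(v)
    return p, r
```

Each push moves `alpha * mass` into the estimate and spreads the rest along edges in proportion to weight, so `sum(p) + sum(r)` stays equal to 1 throughout.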

Preprint · 2022 · arXiv

Instant Graph Neural Networks for Dynamic Graphs

Graph Neural Networks (GNNs) have been widely used for modeling graph-structured data. With the development of numerous GNN variants, recent years have witnessed groundbreaking results in improving the scalability of GNNs to work on static graphs with millions of nodes. However, how to instantly represent continuous changes of large-scale dynamic graphs with GNNs is still an open problem. Existing dynamic GNNs focus on modeling the periodic evolution of graphs, often on a snapshot basis. Such methods suffer from two drawbacks: first, there is a substantial delay before changes in the graph are reflected in the graph representations, which reduces the model's accuracy; second, repeatedly calculating the representation matrix on the entire graph in each snapshot is highly time-consuming and severely limits scalability. In this paper, we propose Instant Graph Neural Network (InstantGNN), an incremental computation approach for the graph representation matrix of dynamic graphs. Designed for dynamic graphs under the edge-arrival model, our method avoids time-consuming, repetitive computations and allows instant updates of the representation and instant predictions. Both dynamic structures and dynamic attributes are supported. Upper bounds on the time complexity of these updates are also provided. Furthermore, our method provides an adaptive training strategy, which guides the model to retrain at the moments when it can make the greatest performance gains. We conduct extensive experiments on several real-world and synthetic datasets. Empirical results demonstrate that our model achieves state-of-the-art accuracy while being orders of magnitude more efficient than existing methods.

Preprint · 2022 · arXiv

Learning to be a Statistician: Learned Estimator for Number of Distinct Values

Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems, such as columnstore compression and data profiling. In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples. Such efficient estimation is critical for tasks where it is prohibitive to scan the data even once. Existing sample-based estimators typically rely on heuristics or assumptions and do not have robust performance across different datasets as the assumptions on data can easily break. On the other hand, deriving an estimator from a principled formulation such as maximum likelihood estimation is very challenging due to the complex structure of the formulation. We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator. To this end, we need to answer several questions: i) how to make the learned model workload agnostic; ii) how to obtain training data; iii) how to perform model training. We derive conditions of the learning framework under which the learned model is workload agnostic, in the sense that the model/estimator can be trained with synthetically generated training data, and then deployed into any data warehouse simply as, e.g., user-defined functions (UDFs), to offer efficient (within microseconds on CPU) and accurate NDV estimations for unseen tables and workloads. We compare the learned estimator with the state-of-the-art sample-based estimators on nine real-world datasets to demonstrate its superior estimation accuracy. We publish our code for training data generation, model training, and the learned estimator online for reproducibility.
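Sample-based NDV estimators typically work from the sample's frequency profile, where $f_j$ is the number of distinct values seen exactly $j$ times. As an example of the heuristic, hand-designed estimators the abstract contrasts with, the sketch below implements the Guaranteed-Error Estimator (GEE) of Charikar et al., $\hat{D} = \sqrt{N/n}\, f_1 + \sum_{j \ge 2} f_j$ for a table of $N$ rows and a sample of $n$ rows. It is not the learned estimator, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def frequency_profile(sample: list) -> Counter:
    """f[j] = number of distinct values that occur exactly j times in the sample."""
    return Counter(Counter(sample).values())


def gee_estimate(sample: list, table_size: int) -> float:
    """GEE: sqrt(N / n) * f_1 + sum_{j >= 2} f_j, with N table rows and n sampled rows."""
    f = frequency_profile(sample)
    n = len(sample)
    singletons = f.get(1, 0)  # values seen once likely stand for many unseen values
    repeated = sum(count for j, count in f.items() if j >= 2)
    return sqrt(table_size / n) * singletons + repeated


# Usage: a 4-row sample drawn from a 16-row column.
estimate = gee_estimate(["a", "a", "b", "c"], table_size=16)  # -> 5.0
```

The fixed scaling of $f_1$ is exactly the kind of built-in assumption that can fail on some data distributions, which motivates learning the mapping from profile to estimate instead.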

Preprint · 2022 · arXiv

Preformer: Predictive Transformer with Multi-Scale Segment-wise Correlations for Long-Term Time Series Forecasting

Transformer-based methods have shown great potential in long-term time series forecasting. However, most of these methods adopt the standard point-wise self-attention mechanism, which not only becomes intractable for long-term forecasting since its complexity increases quadratically with the length of time series, but also cannot explicitly capture the predictive dependencies from contexts since the corresponding key and value are transformed from the same point. This paper proposes a predictive Transformer-based model called Preformer. Preformer introduces a novel efficient Multi-Scale Segment-Correlation mechanism that divides time series into segments and utilizes segment-wise correlation-based attention for encoding time series. A multi-scale structure is developed to aggregate dependencies at different temporal scales and facilitate the selection of segment length. Preformer further designs a predictive paradigm for decoding, where the key and value come from two successive segments rather than the same segment. In this way, if a key segment has a high correlation score with the query segment, its successive segment contributes more to the prediction of the query segment. Extensive experiments demonstrate that our Preformer outperforms other Transformer-based methods.
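The predictive pairing of keys and values can be illustrated at a single scale. In the sketch below, the last segment acts as the query, each earlier segment acts as a key, and the value attached to a key is the segment that follows it, so segments resembling the query vote for what came next. Cosine similarity as the correlation score, the temperature `tau`, and the function names are assumptions; Preformer's multi-scale aggregation and learned projections are not shown.

```python
from math import exp, sqrt


def segments(series, seg_len):
    """Split a series into consecutive, non-overlapping segments of length seg_len."""
    return [series[i:i + seg_len] for i in range(0, len(series) - seg_len + 1, seg_len)]


def cosine(a, b):
    """Cosine similarity, used here as a stand-in segment correlation score."""
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def predict_next_segment(series, seg_len, tau=0.1):
    """Single-scale predictive segment attention.

    Query = last segment; key i = segment i; value i = segment i + 1, i.e. the
    segment that follows each key rather than the key itself.
    """
    segs = segments(series, seg_len)
    if len(segs) < 2:
        raise ValueError("need at least two full segments")
    query, keys, values = segs[-1], segs[:-1], segs[1:]
    scores = [cosine(query, k) / tau for k in keys]
    top = max(scores)
    weights = [exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[t] for w, v in zip(weights, values)) / total for t in range(seg_len)]
```

With point-wise attention the value would come from the matching point itself; pairing each key with its successor is what turns a similarity search into a forecast.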

Preprint · 2021 · arXiv

FlashP: An Analytical Pipeline for Real-time Forecasting of Time-Series Relational Data

Interactive response time is important in analytical pipelines for users to explore a sufficient number of possibilities and make informed business decisions. We consider a forecasting pipeline with large volumes of high-dimensional time series data. Real-time forecasting can be conducted in two steps. First, we specify the part of data to be focused on and the measure to be predicted by slicing, dicing, and aggregating the data. Second, a forecasting model is trained on the aggregated results to predict the trend of the specified measure. While a number of forecasting models are available, the first step is the performance bottleneck. A natural idea is to utilize sampling to obtain approximate aggregations in real time as the input to train the forecasting model. Our scalable real-time forecasting system FlashP (Flash Prediction) is built on this idea, with two major challenges to be resolved in this paper: first, how approximate aggregations affect the fitting of forecasting models and the forecasting results; and second, accordingly, which sampling algorithms should be used to obtain these approximate aggregations and how large the samples should be. We introduce a new sampling scheme, called GSW sampling, and analyze error bounds for estimating aggregations using GSW samples. We also show how to construct compact GSW samples in the presence of multiple measures to be analyzed. We conduct experiments to evaluate our solution and compare it with alternatives on real data.
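The abstract does not define GSW sampling, so the sketch below substitutes a standard technique to show how a sample can stand in for a full aggregation: weighted Poisson sampling (row $i$ kept independently with probability $p_i$) plus the Horvitz-Thompson estimator $\sum_{i \in S} x_i / p_i$, which is unbiased for a SUM when every $p_i > 0$. This is not GSW sampling, and the names and parameters are hypothetical.

```python
import random


def poisson_sample(weights: list[float], expected_size: float, seed: int = 0) -> list[tuple[int, float]]:
    """Keep row i independently with probability p_i = min(1, expected_size * w_i / sum(w)).

    Weights are assumed positive, so every row has a nonzero inclusion probability.
    """
    rng = random.Random(seed)
    total = sum(weights)
    sample = []
    for i, w in enumerate(weights):
        p = min(1.0, expected_size * w / total)
        if rng.random() < p:
            sample.append((i, p))  # remember p_i for reweighting
    return sample


def ht_sum(values: list[float], sample: list[tuple[int, float]]) -> float:
    """Horvitz-Thompson estimate of sum(values): each sampled row counts 1 / p_i times."""
    return sum(values[i] / p for i, p in sample)
```

A slice or dice predicate would simply restrict the sum to sampled rows that satisfy it; the resulting approximate aggregates would then be the input to the forecasting model.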

Preprint · 2020 · arXiv

Exact Single-Source SimRank Computation on Large Graphs

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-$k$ SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than $10^6$ nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-$k$ SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time.
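The naive iterative SimRank computation, on which the Power Method baseline rests, is short to write, and it also shows why it does not scale: it stores a score for every node pair, so memory is quadratic in the number of nodes. The sketch below uses the standard SimRank recurrence with decay factor `c`; `c = 0.6`, the iteration count, and the function name are illustrative assumptions, and this is not ExactSim.

```python
def simrank_power_method(in_nbrs: dict[int, list[int]], c: float = 0.6, iters: int = 30) -> dict[tuple[int, int], float]:
    """Naive iterative SimRank: s(a, a) = 1 and, for a != b,
    s(a, b) = c / (|I(a)| |I(b)|) * sum_{i in I(a), j in I(b)} s(i, j), where I(.) are in-neighbors."""
    nodes = list(in_nbrs)
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}  # O(n^2) memory
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0  # no in-neighbors: similarity is defined as 0
                else:
                    total = sum(s[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = new
    return s
```

On a graph with $10^6$ nodes the pair table alone would hold $10^{12}$ entries, consistent with the abstract's remark that the Power Method is infeasible at that scale.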

Preprint · 2020 · arXiv

Personalized PageRank to a Target Node, Revisited

Personalized PageRank (PPR) is a widely used node proximity measure in graph mining and network analysis. Given a source node $s$ and a target node $t$, the PPR value $π(s,t)$ represents the probability that a random walk from $s$ terminates at $t$, and thus indicates the bidirectional importance between $s$ and $t$. The majority of the existing work focuses on single-source queries, which ask for the PPR values between a given source node $s$ and every node $t \in V$. However, the single-source query only reflects the importance of each node $t$ with respect to $s$. In this paper, we consider the single-target PPR query, which measures the opposite direction of importance for PPR. Given a target node $t$, the single-target PPR query asks for the PPR value of every node $s\in V$ to the given target node $t$. We propose RBS, a novel algorithm that answers approximate single-target queries with optimal computational complexity. We show that RBS improves three concrete applications: heavy hitters PPR query, single-source SimRank computation, and scalable graph neural networks. We conduct experiments to demonstrate that RBS outperforms the state-of-the-art algorithms in terms of both efficiency and precision on real-world benchmark datasets.
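Single-target queries are classically answered by backward push, which moves residual mass from $t$ toward its in-neighbors. The sketch below shows only that classical deterministic backward push, not RBS. With the stopping rule `r[v] < r_max`, every estimate satisfies `p[s] <= π(s, t) < p[s] + r_max`. The function name and the assumption that every node has at least one out-neighbor are illustrative.

```python
from collections import deque


def backward_push(out_nbrs: dict[int, list[int]], target: int, alpha: float, r_max: float) -> dict[int, float]:
    """Classical backward push: estimates p[s] of pi(s, target) for every node s.

    Invariant: pi(s, t) = p[s] + sum_v pi(s, v) * r[v]; pushing stops when all r[v] < r_max.
    """
    in_nbrs = {v: [] for v in out_nbrs}
    for u, nbrs in out_nbrs.items():
        for v in nbrs:
            in_nbrs[v].append(u)
    p = {v: 0.0 for v in out_nbrs}
    r = {v: 0.0 for v in out_nbrs}
    r[target] = 1.0
    queue = deque([target])
    while queue:
        v = queue.popleft()
        if r[v] < r_max:
            continue  # residual already below the threshold
        mass, r[v] = r[v], 0.0
        p[v] += alpha * mass  # walks from v that stop immediately end at v
        for u in in_nbrs[v]:  # walks from u that step to v first
            was_below = r[u] < r_max
            r[u] += (1 - alpha) * mass / len(out_nbrs[u])
            if was_below and r[u] >= r_max:
                queue.append(u)
    return p
```

Backward push touches only the neighborhood of $t$ that carries non-negligible residual, which is why single-target queries can be answered without scanning the whole graph.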

Preprint · 2020 · arXiv

Simple and Deep Graph Convolutional Networks

Graph convolutional networks (GCNs) are a powerful deep learning approach for graph-structured data. Recently, GCNs and subsequent variants have shown superior performance in various application areas on real-world datasets. Despite their success, most current GCN models are shallow, due to the over-smoothing problem. In this paper, we study the problem of designing and analyzing deep graph convolutional networks. We propose GCNII, an extension of the vanilla GCN model with two simple yet effective techniques: initial residual and identity mapping. We provide theoretical and empirical evidence that the two techniques effectively relieve the problem of over-smoothing. Our experiments show that the deep GCNII model outperforms the state-of-the-art methods on various semi- and fully supervised tasks. Code is available at https://github.com/chennnM/GCNII .
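The two techniques can be written as a single layer update. The formula below is the GCNII layer as it is commonly stated and should be checked against the paper: $\tilde{\mathbf{P}}$ is the symmetrically normalized adjacency matrix with self-loops, $\alpha_\ell$ mixes in the initial representation $\mathbf{H}^{(0)}$ (initial residual), and $\beta_\ell$ keeps each layer's transform close to the identity (identity mapping), shrinking as depth $\ell$ grows for a hyperparameter $\lambda$.

```latex
\mathbf{H}^{(\ell+1)} = \sigma\Big( \big( (1-\alpha_\ell)\,\tilde{\mathbf{P}}\,\mathbf{H}^{(\ell)} + \alpha_\ell\,\mathbf{H}^{(0)} \big) \big( (1-\beta_\ell)\,\mathbf{I} + \beta_\ell\,\mathbf{W}^{(\ell)} \big) \Big),
\quad
\tilde{\mathbf{P}} = \tilde{\mathbf{D}}^{-1/2} (\mathbf{A} + \mathbf{I}) \tilde{\mathbf{D}}^{-1/2},
\quad
\beta_\ell = \log\!\left( \frac{\lambda}{\ell} + 1 \right)
```

Here $\tilde{\mathbf{D}}$ is the degree matrix of $\mathbf{A} + \mathbf{I}$. Intuitively, the $\alpha_\ell \mathbf{H}^{(0)}$ term keeps a fixed share of each node's own input in every layer, so repeated smoothing by $\tilde{\mathbf{P}}$ cannot wash it out entirely.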

Preprint · 2016 · arXiv

The Space Complexity of 2-Dimensional Approximate Range Counting

We study the problem of $2$-dimensional orthogonal range counting with additive error. Given a set $P$ of $n$ points drawn from an $n\times n$ grid and an error parameter $\varepsilon$, the goal is to build a data structure, such that for any orthogonal range $R$, it can return the number of points in $P\cap R$ with additive error $\varepsilon n$. A well-known solution for this problem is the $\varepsilon$-approximation, which is a subset $A\subseteq P$ that can estimate the number of points in $P\cap R$ with the number of points in $A\cap R$. It is known that an $\varepsilon$-approximation of size $O(\frac{1}{\varepsilon} \log^{2.5} \frac{1}{\varepsilon})$ exists for any $P$ with respect to orthogonal ranges, and the best lower bound is $Ω(\frac{1}{\varepsilon} \log \frac{1}{\varepsilon})$. The $\varepsilon$-approximation is a rather restricted data structure, as we are not allowed to store any information other than the coordinates of the points in $P$. In this paper, we explore what can be achieved without any restriction on the data structure. We first describe a simple data structure that uses $O(\frac{1}{\varepsilon}(\log^2\frac{1}{\varepsilon} + \log n))$ bits and answers queries with error $\varepsilon n$. We then prove a lower bound that any data structure that answers queries with error $\varepsilon n$ must use $Ω(\frac{1}{\varepsilon}(\log^2\frac{1}{\varepsilon} + \log n))$ bits. Our lower bound is information-theoretic: We show that there is a collection of $2^{Ω(n\log n)}$ point sets with large union combinatorial discrepancy, and thus are hard to distinguish unless we use $Ω(n\log n)$ bits.
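How an ε-approximation answers a query can be shown in a few lines: count the points of $A$ inside $R$ and rescale by $n/|A|$, since $|A \cap R|/|A|$ approximates $|P \cap R|/|P|$. The sketch below illustrates only that query step; constructing a small ε-approximation, and the paper's bit-efficient data structure, are not shown, and the function names are hypothetical.

```python
def count_in_range(points, rect):
    """Number of points (x, y) with x1 <= x <= x2 and y1 <= y <= y2, where rect = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = rect
    return sum(1 for x, y in points if x1 <= x <= x2 and y1 <= y <= y2)


def approx_count(subset, n, rect):
    """Estimate |P ∩ R| from an eps-approximation A of P (|P| = n) as |A ∩ R| * n / |A|."""
    return count_in_range(subset, rect) * n / len(subset)
```

The stored object is just a list of point coordinates, which is the restriction the abstract contrasts with an unrestricted, bit-level data structure.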

Preprint · 2012 · arXiv

Equivalence between Priority Queues and Sorting in External Memory

A priority queue is a fundamental data structure that maintains a dynamic ordered set of keys and supports the following basic operations: insertion of a key, deletion of a key, and finding the smallest key. The complexity of the priority queue is closely related to that of sorting: a priority queue can be used to implement a sorting algorithm trivially. Thorup (2007) proved that the converse is also true in the RAM model. In particular, he designed a priority queue that uses the sorting algorithm as a black box, such that the per-operation cost of the priority queue is asymptotically the same as the per-key cost of sorting. In this paper, we prove an analogous result in the external memory model, showing that priority queues are computationally equivalent to sorting in external memory, under some mild assumptions. The reduction opens a possible route to proving lower bounds for external sorting by proving a lower bound for priority queues.
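The trivial direction of the equivalence, sorting with a priority queue, takes $n$ insertions followed by $n$ delete-min operations. The sketch below uses Python's binary heap (`heapq`) in the RAM model; it does not model external memory, and the converse reduction (Thorup's result and the paper's external-memory analogue) is not shown. The function name is hypothetical.

```python
import heapq


def pq_sort(keys):
    """Sort using only priority-queue operations: n insertions, then n delete-mins."""
    pq = []
    for k in keys:
        heapq.heappush(pq, k)  # insertion
    return [heapq.heappop(pq) for _ in range(len(pq))]  # repeated delete-min
```

Any per-operation cost bound for the priority queue therefore carries over to a per-key cost bound for sorting, which is why a priority-queue lower bound would imply a sorting lower bound once the converse reduction holds.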
