Source author record

Wei Xing

Wei Xing appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Computer Vision, Networking and Internet Architecture, Hardware Architecture, physics.comp-ph

Catalog footprint

What is connected

9 works

5 topics

4 close collaborators

preprint 2026 arXiv

OpenACM: An Open-Source SRAM-Based Approximate CiM Compiler

The rise of data-intensive AI workloads has exacerbated the "memory wall" bottleneck. Digital Compute-in-Memory (DCiM) using SRAM offers a scalable solution, but its vast design space makes manual design impractical, creating a need for automated compilers. A key opportunity lies in approximate computing, which leverages the error tolerance of AI applications for significant energy savings. However, existing DCiM compilers focus on exact arithmetic, failing to exploit this optimization. This paper introduces OpenACM, the first open-source, accuracy-aware compiler for SRAM-based approximate DCiM architectures. OpenACM bridges the gap between application error tolerance and hardware automation. Its key contribution is an integrated library of accuracy-configurable multipliers (exact, tunable approximate, and logarithmic), enabling designers to make fine-grained accuracy-energy trade-offs. The compiler automates the generation of the DCiM architecture, integrating a transistor-level customizable SRAM macro with variation-aware characterization into a complete, open-source physical design flow based on OpenROAD and the FreePDK45 library. This ensures full reproducibility and accessibility, removing dependencies on proprietary tools. Experimental results on representative convolutional neural networks (CNNs) demonstrate that OpenACM achieves energy savings of up to 64% with negligible loss in application accuracy. The framework is available at https://github.com/ShenShan123/OpenACM.
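To make the accuracy-energy trade-off of a logarithmic multiplier concrete, the sketch below implements Mitchell's classic logarithmic approximation in plain Python. This is an illustration under assumptions: the abstract does not say which logarithmic scheme OpenACM uses, and the names `mitchell_multiply` and `worst_relative_error` are hypothetical, not taken from the OpenACM repository.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers using Mitchell's method.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, and
    log2(1 + x) is approximated by x, so the product needs only
    shifts and additions (no multiplier array).
    """
    if a == 0 or b == 0:
        return 0
    k1 = a.bit_length() - 1          # position of the leading one in a
    k2 = b.bit_length() - 1          # position of the leading one in b
    # Fractional parts x1 and x2, both scaled by 2**(k1 + k2) to stay in integers.
    f1 = (a - (1 << k1)) << k2
    f2 = (b - (1 << k2)) << k1
    s = f1 + f2                      # (x1 + x2) * 2**(k1 + k2)
    one = 1 << (k1 + k2)
    if s < one:
        return one + s               # 2**(k1 + k2) * (1 + x1 + x2)
    return 2 * s                     # 2**(k1 + k2 + 1) * (x1 + x2)


def worst_relative_error(bits: int = 8) -> float:
    """Exhaustively measure the worst relative error over unsigned operands."""
    worst = 0.0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            worst = max(worst, (exact - mitchell_multiply(a, b)) / exact)
    return worst
```

Mitchell's approximation never overestimates the product, and its relative error is bounded by 1/9 (about 11.1%), reached when both fractional parts equal 0.5, for example 3 x 3. An accuracy-aware flow would presumably weigh a bounded error of this kind against the energy saved by dropping the exact multiplier.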

preprint 2022 arXiv

AesUST: Towards Aesthetic-Enhanced Universal Style Transfer

Recent studies have shown remarkable success in universal style transfer, which transfers arbitrary visual styles to content images. However, existing approaches suffer from the aesthetic-unrealistic problem: they introduce disharmonious patterns and evident artifacts, making the results easy to distinguish from real paintings. To address this limitation, we propose AesUST, a novel Aesthetic-enhanced Universal Style Transfer approach that can generate aesthetically more realistic and pleasing results for arbitrary styles. Specifically, our approach introduces an aesthetic discriminator to learn the universal human-delightful aesthetic features from a large corpus of artist-created paintings. Then, the aesthetic features are incorporated to enhance the style transfer process via a novel Aesthetic-aware Style-Attention (AesSA) module. Such an AesSA module enables our AesUST to efficiently and flexibly integrate the style patterns according to the global aesthetic channel distribution of the style image and the local semantic spatial distribution of the content image. Moreover, we also develop a new two-stage transfer training strategy with two aesthetic regularizations to train our model more effectively, further improving stylization performance. Extensive experiments and user studies demonstrate that our approach synthesizes aesthetically more harmonious and realistic results than the state of the art, greatly narrowing the disparity with real artist-created paintings. Our code is available at https://github.com/EndyWon/AesUST.

preprint 2022 arXiv

DivSwapper: Towards Diversified Patch-based Arbitrary Style Transfer

Gram-based and patch-based approaches are two important research lines of style transfer. Recent diversified Gram-based methods have been able to produce multiple and diverse stylized outputs for the same content and style images. However, as another widespread research interest, the diversity of patch-based methods remains challenging due to the stereotyped style swapping process based on nearest patch matching. To resolve this dilemma, in this paper, we dive into the crux of existing patch-based methods and propose a universal and efficient module, termed DivSwapper, for diversified patch-based arbitrary style transfer. The key insight is to use an essential intuition that neural patches with higher activation values could contribute more to diversity. Our DivSwapper is plug-and-play and can be easily integrated into existing patch-based and Gram-based methods to generate diverse results for arbitrary styles. We conduct theoretical analyses and extensive experiments to demonstrate the effectiveness of our method, and compared with state-of-the-art algorithms, it shows superiority in diversity, quality, and efficiency.

preprint 2022 arXiv

Physics Informed Deep Kernel Learning

Deep kernel learning is a promising combination of deep neural networks and nonparametric function learning. However, as a data driven approach, the performance of deep kernel learning can still be restricted by scarce or insufficient data, especially in extrapolation tasks. To address these limitations, we propose Physics Informed Deep Kernel Learning (PI-DKL) that exploits physics knowledge represented by differential equations with latent sources. Specifically, we use the posterior function sample of the Gaussian process as the surrogate for the solution of the differential equation, and construct a generative component to integrate the equation in a principled Bayesian hybrid framework. For efficient and effective inference, we marginalize out the latent variables in the joint probability and derive a collapsed model evidence lower bound (ELBO), based on which we develop a stochastic model estimation algorithm. Our ELBO can be viewed as a nice, interpretable posterior regularization objective. On synthetic datasets and real-world applications, we show the advantage of our approach in both prediction accuracy and uncertainty quantification.

preprint 2020 arXiv

Diversified Arbitrary Style Transfer via Deep Feature Perturbation

Image style transfer is an underdetermined problem, where a large number of solutions can satisfy the same constraint (the content and style). Although there have been some efforts to improve the diversity of style transfer by introducing an alternative diversity loss, they have restricted generalization, limited diversity and poor scalability. In this paper, we tackle these limitations and propose a simple yet effective method for diversified arbitrary style transfer. The key idea of our method is an operation called deep feature perturbation (DFP), which uses an orthogonal random noise matrix to perturb the deep image feature maps while keeping the original style information unchanged. Our DFP operation can be easily integrated into many existing WCT (whitening and coloring transform)-based methods, and empower them to generate diverse results for arbitrary styles. Experimental results demonstrate that this learning-free and universal method can greatly increase the diversity while maintaining the quality of stylization.
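The reason an orthogonal noise matrix can change the features without disturbing their second-order statistics can be checked numerically. The sketch below is a minimal pure-Python illustration, assuming the perturbation is applied to whitened features F with F F^T = I (as in WCT-style pipelines); the helper names are hypothetical, and the toy features are synthetic rather than taken from any network.

```python
import random


def random_orthogonal(n, rng):
    """Build an n x n orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:                       # remove components along earlier rows
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:                      # skip (near-)degenerate draws
            rows.append([a / norm for a in v])
    return rows


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def gram(F):
    """Return F F^T, the channel-by-channel second-order statistics."""
    return matmul(F, [list(col) for col in zip(*F)])


def max_deviation_from_identity(M):
    """Largest absolute difference between M and the identity matrix."""
    return max(abs(M[i][j] - (1.0 if i == j else 0.0))
               for i in range(len(M)) for j in range(len(M)))


rng = random.Random(0)
# Synthetic "whitened" features: 3 channels x 8 positions with F F^T = I.
F = random_orthogonal(8, rng)[:3]
# Orthogonal noise matrix Z acting on the channel dimension.
Z = random_orthogonal(3, rng)
perturbed = matmul(Z, F)
```

Because (Z F)(Z F)^T = Z (F F^T) Z^T = Z Z^T = I, `gram(perturbed)` stays at the identity up to rounding even though `perturbed` differs from `F`. Under the stated assumption, a subsequent coloring step would therefore see the same statistics for every sampled Z, while the individual feature values, and hence the stylized output, vary.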

preprint 2020 arXiv

Multi-Fidelity High-Order Gaussian Processes for Physical Simulation

The key task of physical simulation is to solve partial differential equations (PDEs) on discretized domains, which is known to be costly. In particular, high-fidelity solutions are much more expensive than low-fidelity ones. To reduce the cost, we consider novel Gaussian process (GP) models that leverage simulation examples of different fidelities to predict high-dimensional PDE solution outputs. Existing GP methods are either not scalable to high-dimensional outputs or lack effective strategies to integrate multi-fidelity examples. To address these issues, we propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations both between the outputs and between the fidelities to enhance solution estimation, and scale to large numbers of outputs. Based on a novel nonlinear coregionalization model, MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights to capture the (nonlinear) relationships across the fidelities. To improve inference efficiency and quality, we use bases decomposition to largely reduce the model parameters, and layer-wise matrix Gaussian posteriors to capture the posterior dependency and to simplify the computation. Our stochastic variational learning algorithm successfully handles millions of outputs without extra sparse approximations. We show the advantages of our method in several typical applications.

preprint 2020 arXiv

Scalable Variational Gaussian Process Regression Networks

Gaussian process regression networks (GPRN) are powerful Bayesian models for multi-output regression, but their inference is intractable. To address this issue, existing methods use a fully factorized structure (or a mixture of such structures) over all the outputs and latent functions for posterior approximation, which, however, can miss the strong posterior dependencies among the latent variables and hurt the inference quality. In addition, the updates of the variational parameters are inefficient and can be prohibitively expensive for a large number of outputs. To overcome these limitations, we propose a scalable variational inference algorithm for GPRN, which not only captures the abundant posterior dependencies but also is much more efficient for massive outputs. We tensorize the output space and introduce tensor/matrix-normal variational posteriors to capture the posterior correlations and to reduce the parameters. We jointly optimize all the parameters and exploit the inherent Kronecker product structure in the variational model evidence lower bound to accelerate the computation. We demonstrate the advantages of our method in several real-world applications.
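The kind of saving that Kronecker structure offers can be seen in a standard identity: (A ⊗ B) vec(X) = vec(B X A^T), with vec stacking columns. The sketch below checks it in plain Python. It is a generic illustration, not the paper's algorithm; the abstract does not state which Kronecker identities the method relies on, and all function names here are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def vec(X):
    """Stack the columns of X into one flat list (column-major order)."""
    return [X[r][c] for c in range(len(X[0])) for r in range(len(X))]


def kron_matvec_naive(A, B, X):
    """Form A (x) B explicitly, then multiply it by vec(X)."""
    v = vec(X)
    return [sum(k * x for k, x in zip(row, v)) for row in kron(A, B)]


def kron_matvec_fast(A, B, X):
    """Same result via vec(B X A^T), never forming the Kronecker product."""
    return vec(matmul(matmul(B, X), transpose(A)))


A = [[1, 2], [3, 4]]                      # 2 x 2
B = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]     # 3 x 3
X = [[1, 2], [3, 4], [5, 6]]              # 3 x 2, so vec(X) has 6 entries
assert kron_matvec_naive(A, B, X) == kron_matvec_fast(A, B, X)
```

For an m x m matrix A and a p x p matrix B, the naive route touches an (mp) x (mp) matrix, while the factored route needs only two small matrix products. That gap is why Kronecker-structured covariances and variational posteriors can scale to large output spaces.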

preprint 2013 arXiv

Modeling and Performance Analysis of Pull-Based Live Streaming Schemes in Peer-to-Peer Network

In recent years, mesh-based Peer-to-Peer live streaming has become a promising way for service providers to offer high-quality live video streaming to Internet users. In this paper, we present a detailed study of the modeling and performance analysis of pull-based P2P streaming systems. We establish an analytical framework for pull-based streaming schemes in P2P networks, give accurate models of the chunk selection and peer selection strategies, and organize them into three categories: the chunk-first scheme, the peer-first scheme, and the epidemic scheme. Through numerical performance evaluation, we investigate the impact of important parameters such as the size of the neighbor set, the reply number, and the buffer size. For the peer-first and chunk-first schemes, we show that the pull-based schemes do not perform as well as the push-based schemes when peers are limited to replying to only one request in each time slot. When the reply number increases, the pull-based streaming schemes approach the optimal playout probability. As for the pull-based epidemic scheme, we find that it performs unexpectedly poorly, which differs significantly from the push-based epidemic scheme. We therefore propose a simple, efficient, and easily deployed push-pull scheme that can significantly improve the playout probability.

preprint 2012 arXiv

Extended Equal Service and Differentiated Service Models for Peer-to-Peer File Sharing

Peer-to-Peer (P2P) systems have proved to be the most effective and popular file sharing applications in recent years. Previous studies mainly focus on the equal service and the differentiated service strategies when peers have no initial data before their download. In an upload-constrained P2P file sharing system, we model both the equal service process and the differentiated service process when the peers' initial data distribution satisfies some special conditions, and also show how to minimize the time to get the file to any number of peers. The proposed models reveal the intrinsic relations among the initial data amount, the size of the peer set, and the minimum last finish time. Using the models, we can also provide an arbitrary degree of differentiated service to a certain number of peers. We believe that our analysis and theoretical results provide fundamental insights for studies on bandwidth allocation and data scheduling, and can serve as a helpful reference both for improving system performance and for building effective incentive mechanisms in P2P file sharing systems.
