Researcher profile

Xing Liu

Xing Liu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
16 works
0 followers
10 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Published work

16 published items

preprint · 2026 · arXiv

FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference

Distributed inference is a promising approach to running large language models (LLMs) at the network edge. It distributes the inference process across multiple devices so that the LLMs fit into device memory. Recent pipeline-based approaches can parallelize communication and computation, which helps reduce inference latency. However, this benefit diminishes when inference requests at the network edge are sparse, in which case the pipeline typically runs at low utilization. To enable efficient distributed LLM inference at the edge, we propose FlowSpec, a pipeline-parallel, tree-based speculative decoding framework. FlowSpec incorporates three key mechanisms to improve decoding efficiency: 1) score-based step-wise verification, which prioritizes more important draft tokens so that tokens are accepted earlier; 2) efficient draft management, which prunes invalid tokens while maintaining correct causal relationships during verification; and 3) dynamic draft expansion strategies, which supply high-quality speculative inputs. These techniques work in concert to enhance both pipeline utilization and speculative efficiency. We evaluate FlowSpec against baselines on a real-world testbed. Experimental results demonstrate that the proposed framework significantly improves inference speed across diverse models and configurations, achieving speedups of 1.37× to 1.73× over the baselines. Our code is publicly available at https://github.com/Leosang-lx/FlowSpec#.
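FlowSpec builds on speculative decoding, in which a cheap draft proposes several tokens and the full model checks them in a single pass. The minimal sketch below shows only the basic greedy acceptance rule for one linear draft: keep the longest prefix on which the draft agrees with the target model's own greedy choice, then take the target's token at the first mismatch. It does not implement FlowSpec's score-based step-wise verification, draft trees, or pipelining, and the helper `verify_greedy` is hypothetical.

```python
from typing import List, Sequence


def verify_greedy(draft: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Greedy verification of a linear draft (hypothetical helper).

    draft[i] is the i-th token proposed by the draft model.
    target_argmax[i] is the target model's greedy token at position i, computed
    in one parallel pass over the prefix plus draft[:i]; it has len(draft) + 1
    entries, the last one being the "bonus" position after the whole draft.
    Returns the tokens committed in this decoding step.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: List[int] = []
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            # First disagreement: commit the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every draft token matched, so the bonus token comes for free.
    accepted.append(target_argmax[len(draft)])
    return accepted


# Toy example: the draft agrees with the target on the first two tokens only.
print(verify_greedy([5, 9, 4, 7], [5, 9, 2, 8, 1]))  # [5, 9, 2]
```

Every step commits at least one token, so a poor draft costs little; a good draft commits several tokens for one target pass, which is the speedup that pipelined schemes try to keep flowing.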

preprint · 2026 · arXiv

HDRFace: Rethinking Face Restoration with High-Dimensional Representation

Face restoration under complex degradations remains an ill-posed inverse problem due to severe information loss. Although diffusion models benefit from strong generative priors, most methods condition only on low-quality inputs, which makes it difficult to recover identity-critical details under heavy degradations. In this work, we propose HDRFace, a High-Dimensional Representation conditioned Face restoration framework that injects semantically rich priors into the conditional flow without modifying the generative backbone. Our pipeline first obtains a structurally reliable intermediate restoration with an off-the-shelf restorer, then uses a pretrained high-dimensional feature encoder to extract fine-grained facial representations from both the low-quality input and the intermediate result, and injects them as additional conditions for generation. We further introduce SDFM, a Structure-Detail aware adaptive Fusion Mechanism that emphasizes global constraints during structure modeling and strengthens representation guidance during detail synthesis, balancing structural consistency and detail fidelity. To validate the generalization ability of our method, we implement the proposed framework on two generative models, SD V2.1-base and Qwen-Image, and consistently observe stable performance gains across both architectures.

preprint · 2026 · arXiv

LEO Constellations as a Decentralized GNSS Network: Optimizing PNT Corrections in Space

With the rapid expansion of low Earth orbit (LEO) constellations, thousands of satellites are now in operation, many equipped with onboard GNSS receivers capable of continuous orbit determination and time synchronization. This development is creating an unprecedented spaceborne GNSS network, offering new opportunities for network-driven precise LEO orbit and clock estimation. Yet, current onboard GNSS processing is largely standalone and often insufficient for high-precision applications, while centralized fusion is challenging due to computational bottlenecks and the lack of in-orbit infrastructure. In this work, we report a decentralized GNSS network over large-scale LEO constellations, where each satellite processes its own measurements while exchanging compact information with neighboring nodes to enable precise orbit and time determination. We model the moving constellation as a dynamic graph and tailor a momentum-accelerated gradient tracking (GT) method to ensure steady convergence despite topology changes. Numerical simulations with constellations containing hundreds of satellites show that the proposed method matches the accuracy of an ideal centralized benchmark, while substantially reducing communication burdens. Ultimately, this framework supports the development of autonomous and self-organizing space systems, enabling high-precision navigation with reduced dependence on continuous ground contact.
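The decentralized estimator rests on gradient tracking: every node averages its estimate with its neighbours' estimates and descends along a running estimate of the network-wide average gradient rather than its own local gradient alone. The minimal sketch below is plain gradient tracking (in the style of DIGing) on a fixed ring with scalar quadratic costs. It leaves out the momentum acceleration and the time-varying graph described above, and every name and number in it is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def ring_mixing_matrix(n: int) -> Matrix:
    """Doubly stochastic ring weights: keep half, give a quarter to each neighbour."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] += 0.5
        w[i][(i - 1) % n] += 0.25
        w[i][(i + 1) % n] += 0.25
    return w


def mix(w: Matrix, values: Sequence[float]) -> List[float]:
    """One round of neighbour averaging: returns W @ values."""
    return [sum(wij * vj for wij, vj in zip(row, values)) for row in w]


def gradient_tracking(a: Sequence[float], b: Sequence[float],
                      alpha: float = 0.02, iters: int = 3000) -> List[float]:
    """Minimise sum_i 0.5 * a[i] * (x - b[i])**2 over a ring of len(a) nodes.

    Node i only knows its own (a[i], b[i]) and exchanges values with its two neighbours.
    """
    n = len(a)
    w = ring_mixing_matrix(n)

    def local_grads(x: Sequence[float]) -> List[float]:
        return [a[i] * (x[i] - b[i]) for i in range(n)]

    x = [0.0] * n
    g = local_grads(x)
    y = list(g)  # tracker of the average gradient, initialised with local gradients
    for _ in range(iters):
        # Consensus step plus descent along the tracked gradient.
        x_new = [xm - alpha * yi for xm, yi in zip(mix(w, x), y)]
        g_new = local_grads(x_new)
        # Tracking step: mix the trackers, then add the change in local gradient.
        y = [ym + gn - go for ym, gn, go in zip(mix(w, y), g_new, g)]
        x, g = x_new, g_new
    return x


a = [1.0, 1.5, 2.0, 1.0, 2.5, 1.5]
b = [3.0, -1.0, 0.5, 2.0, 4.0, -2.0]
x_final = gradient_tracking(a, b)
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # centralised optimum
print(x_star, x_final)
```

Because the mixing matrix is doubly stochastic, the trackers always average to the true mean gradient, so every node should settle on the centralized optimum 11.5 / 9.5 ≈ 1.2105 rather than on its own local minimizer, while only ever exchanging two numbers per neighbour per iteration.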

preprint · 2022 · arXiv

Constrained Wrapped Least Squares: A Tool for High Accuracy GNSS Attitude Determination

Attitude determination is a popular application of Global Navigation Satellite Systems (GNSS), and many methods with different performance trade-offs have been developed for it. We develop a constrained wrapped least-squares (C-WLS) method for high-accuracy attitude determination. The approach is built on an optimization model that exploits prior information about the antenna array and the integer nature of the carrier-phase ambiguities in an innovative way. It adopts an efficient search strategy to estimate the vehicle's attitude parameters directly from ambiguous carrier-phase observations, without first fixing the carrier-phase ambiguities. The performance of the proposed method is evaluated through simulations and through experiments with data collected by multiple GNSS receivers. Both demonstrate excellent performance: the proposed method outperforms three prominent attitude determination algorithms, namely the ambiguity function method, the constrained LAMBDA method, and the multivariate constrained LAMBDA method.
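The core idea of working with ambiguous carrier phases directly can be illustrated in a much simpler setting. The sketch below estimates only the heading of one horizontal baseline of known length (the length plays the role of the antenna-array prior) by a grid search that minimizes wrapped carrier-phase residuals, so integer ambiguities never have to be fixed. It is a toy under stated assumptions (a single baseline, noise-free single-differenced phases, the GPS L1 wavelength, made-up satellite geometry and ambiguities), not the C-WLS algorithm itself.

```python
import math
from typing import List, Sequence

WAVELENGTH_L1 = 299_792_458.0 / 1575.42e6  # GPS L1 carrier wavelength (m)


def predicted_cycles(heading_deg: float, length_m: float,
                     az_deg: Sequence[float], el_deg: Sequence[float]) -> List[float]:
    """Projection of a horizontal baseline onto each line of sight, in carrier cycles."""
    psi = math.radians(heading_deg)
    return [length_m * math.cos(math.radians(el)) * math.cos(math.radians(az) - psi)
            / WAVELENGTH_L1 for az, el in zip(az_deg, el_deg)]


def wrap(cycles: float) -> float:
    """Drop the integer part of a phase difference, leaving a value in [-0.5, 0.5]."""
    return cycles - round(cycles)


def wrapped_ls_heading(phase_obs: Sequence[float], length_m: float,
                       az_deg: Sequence[float], el_deg: Sequence[float],
                       step_deg: float = 0.05) -> float:
    """Grid-search the heading that minimises the sum of squared wrapped residuals."""
    best_heading, best_cost = 0.0, math.inf
    for k in range(int(round(360.0 / step_deg))):
        heading = k * step_deg
        pred = predicted_cycles(heading, length_m, az_deg, el_deg)
        cost = sum(wrap(obs - p) ** 2 for obs, p in zip(phase_obs, pred))
        if cost < best_cost:
            best_heading, best_cost = heading, cost
    return best_heading


# Made-up geometry: 0.5 m baseline, six satellites, arbitrary integer ambiguities.
az = [10.0, 75.0, 140.0, 200.0, 260.0, 320.0]
el = [60.0, 35.0, 50.0, 20.0, 45.0, 70.0]
ambiguities = [3, -7, 12, 0, 5, -2]
phase_obs = [p + n for p, n in zip(predicted_cycles(37.3, 0.5, az, el), ambiguities)]
est = wrapped_ls_heading(phase_obs, 0.5, az, el)
print(f"estimated heading: {est:.2f} deg")
```

The wrap removes whatever integer offset each satellite carries, so the search should recover a heading of about 37.3 degrees without ever estimating the ambiguities. A 0.5 m baseline spans only about 2.6 cycles, which keeps an exhaustive grid cheap; full 3-D attitude with longer baselines is where an efficient search strategy becomes essential.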

preprint · 2022 · arXiv

Grassmann Stein Variational Gradient Descent

Stein variational gradient descent (SVGD) is a deterministic particle inference algorithm that provides an efficient alternative to Markov chain Monte Carlo. However, SVGD has been found to underestimate variance when the target distribution is high-dimensional. Recent work has advocated projecting both the score function and the data onto real lines to sidestep this issue, although this can severely overestimate the epistemic (model) uncertainty. In this work, we propose Grassmann Stein variational gradient descent (GSVGD) as an alternative approach, which permits projections onto subspaces of arbitrary dimension. Compared with other variants of SVGD that rely on dimensionality reduction, GSVGD updates the projectors simultaneously for the score function and the data, and the optimal projectors are determined through a coupled Grassmann-valued diffusion process which explores favourable subspaces. Both our theoretical and experimental results suggest that GSVGD enjoys efficient state-space exploration in high-dimensional problems that have an intrinsic low-dimensional structure.
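GSVGD modifies the plain SVGD update, in which each particle moves along a kernel-weighted average of the score (attraction toward high density) plus the gradient of the kernel (repulsion between particles). A minimal 1-D sketch of that baseline update follows, assuming an RBF kernel with a fixed bandwidth and a unit-variance Gaussian target with mean 2; it contains no Grassmann projections, and 1-D cannot show the high-dimensional variance collapse that GSVGD addresses.

```python
import math
from typing import Callable, List, Sequence


def svgd_step(xs: Sequence[float], score: Callable[[float], float],
              bandwidth: float = 1.0, step: float = 0.1) -> List[float]:
    """One SVGD update in 1-D with the RBF kernel k(x, y) = exp(-(x - y)**2 / bandwidth)."""
    n = len(xs)
    scores = [score(x) for x in xs]
    updated = []
    for xi in xs:
        phi = 0.0
        for xj, sj in zip(xs, scores):
            k = math.exp(-((xj - xi) ** 2) / bandwidth)
            # k * score pulls particles toward high density;
            # the kernel-gradient term 2 (xi - xj) / h * k pushes them apart.
            phi += k * sj + 2.0 * (xi - xj) / bandwidth * k
        updated.append(xi + step * phi / n)
    return updated


def gaussian_score(x: float, mean: float = 2.0) -> float:
    """Score d/dx log N(x; mean, 1)."""
    return -(x - mean)


particles = [-2.0 + 2.0 * i / 19 for i in range(20)]  # start far from the target
for _ in range(500):
    particles = svgd_step(particles, gaussian_score)
print(sum(particles) / len(particles))
```

Without the repulsive term every particle would collapse onto the mode at 2; with it, the particles should spread out around the target mean.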

preprint · 2022 · arXiv

Instantaneous GNSS Ambiguity Resolution and Attitude Determination via Riemannian Manifold Optimization

We present an ambiguity resolution method for Global Navigation Satellite System (GNSS)-based attitude determination. A GNSS attitude model with nonlinear constraints is used to rigorously incorporate a priori information. Given the characteristics of the employed nonlinear constraints, we formulate GNSS attitude determination as an optimization problem on a manifold. Then, Riemannian manifold optimization algorithms are utilized to aid ambiguity resolution based on a proposed decomposition of the objective function. The application of manifold geometry enables high-quality float solutions that are critical to reinforcing search-based integer ambiguity resolution in terms of efficiency, availability, and reliability. The proposed approach is characterized by a low computational complexity and a high probability of resolving the ambiguities correctly. The performance of the proposed ambiguity resolution method is tested through a series of simulations and real experiments. Comparisons with the principal benchmarks indicate the superiority of the proposed method as reflected by the high ambiguity resolution success rates.
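The manifold idea can be illustrated with the simplest constrained case: one baseline of known length whose direction u must stay on the unit sphere. The sketch below runs plain Riemannian gradient descent on a least-squares misfit, projecting each Euclidean gradient onto the tangent space and retracting back to the sphere by normalization. The design rows, baseline length, and noise-free data are made up, the ambiguities are assumed to be already removed, and none of the paper's objective decomposition or integer search is reproduced.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(u: Sequence[float]) -> Vector:
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]


def riemannian_gd_sphere(design: Sequence[Sequence[float]], obs: Sequence[float],
                         length: float, u0: Sequence[float],
                         step: float = 0.1, iters: int = 500) -> Vector:
    """Minimise ||length * design @ u - obs||^2 subject to ||u|| = 1."""
    u = normalize(u0)
    for _ in range(iters):
        residual = [length * dot(row, u) - y for row, y in zip(design, obs)]
        # Euclidean gradient: 2 * length * design^T @ residual
        grad = [2.0 * length * sum(row[j] * r for row, r in zip(design, residual))
                for j in range(len(u))]
        # Riemannian gradient: remove the component normal to the sphere.
        radial = dot(grad, u)
        rgrad = [g - radial * uj for g, uj in zip(grad, u)]
        # Retraction: step in the tangent direction, then renormalise.
        u = normalize([uj - step * gj for uj, gj in zip(u, rgrad)])
    return u


# Made-up line-of-sight rows, a 1.2 m baseline and noise-free "observations".
design = [[0.0, 0.0, 1.0], [0.7, 0.0, 0.714], [0.0, 0.7, 0.714],
          [-0.6, -0.3, 0.742], [0.3, -0.8, 0.52]]
u_true = [0.6, 0.0, 0.8]
obs = [1.2 * dot(row, u_true) for row in design]
u_hat = riemannian_gd_sphere(design, obs, 1.2, [0.55, 0.1, 0.8])
print(u_hat)
```

Every iterate satisfies the length constraint exactly, which is the property that lets a constrained float solution narrow the subsequent integer search.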

preprint · 2022 · arXiv

Machine Learning in Heterogeneous Porous Materials

The "Workshop on Machine learning in heterogeneous porous materials" brought together international scientific communities from applied mathematics, porous media, and materials science with experts in heterogeneous materials and machine learning (ML) to identify how ML can advance materials research. Within the scope of ML and materials research, the goals of the workshop were to discuss the state of the art in each community, promote crosstalk and accelerate multidisciplinary collaborative research, and identify challenges and opportunities. As a result, four topic areas were identified: (1) ML for predicting material properties and for discovering and designing novel materials; (2) ML in porous and fractured media and time-dependent phenomena; (3) multiscale modeling of heterogeneous porous materials via ML; and (4) discovery of material constitutive laws and new governing equations. This workshop was part of the AmeriMech Symposium series sponsored by the National Academies of Sciences, Engineering and Medicine and the U.S. National Committee on Theoretical and Applied Mechanics.

preprint · 2022 · arXiv

Strong approximation for fractional wave equation forced by fractional Brownian motion with Hurst parameter $H\in(0,\frac{1}{2})$

We consider the time discretization of the fractional stochastic wave equation driven by negatively correlated Gaussian noise. The main obstacles to designing and analyzing time discretizations of the stochastic wave equation come from approximating the stochastic convolution with respect to fractional Brownian motion. First, we discuss the smoothing properties of the stochastic convolution using integration by parts and the covariance function of fractional Brownian motion. We then obtain regularity estimates for the mild solution of the fractional stochastic wave equation. Next, we design a time discretization of the stochastic convolution by integration by parts. Combining the stochastic trigonometric method with this approximation of the stochastic convolution yields the time discretization of the stochastic wave equation, for which we derive error estimates. Under certain assumptions, the strong convergence rate of the proposed numerical scheme reaches $\frac{1}{2}+H$. Finally, numerical experiments illustrate the convergence rate and computational efficiency of the scheme.
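The negative correlation of the driving noise follows directly from the fBm covariance R_H(t, s) = ½(t^{2H} + s^{2H} - |t - s|^{2H}): two consecutive increments of equal length have correlation 2^{2H-1} - 1, which is negative for H < 1/2. The sketch below encodes that covariance, draws an exact fBm path on a uniform grid by Cholesky factorization, and evaluates the increment correlation. Cholesky sampling is an O(n³) reference method chosen here for clarity (an assumption, not the time discretization analysed in the paper), and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def fbm_cov(t: float, s: float, hurst: float) -> float:
    """Covariance of fractional Brownian motion: 0.5 (t^2H + s^2H - |t - s|^2H)."""
    two_h = 2.0 * hurst
    return 0.5 * (t ** two_h + s ** two_h - abs(t - s) ** two_h)


def cholesky(c: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T = c (c symmetric positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def sample_fbm(n_steps: int, horizon: float, hurst: float,
               rng: random.Random) -> Tuple[List[float], List[float]]:
    """One exact fBm path on a uniform grid via Cholesky factorisation (O(n^3))."""
    times = [horizon * (k + 1) / n_steps for k in range(n_steps)]  # B_H(0) = 0 added below
    low = cholesky([[fbm_cov(t, s, hurst) for s in times] for t in times])
    z = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    path = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n_steps)]
    return [0.0] + times, [0.0] + path


def increment_correlation(hurst: float) -> float:
    """Correlation of two consecutive, equal-length fBm increments: 2^(2H-1) - 1."""
    return 2.0 ** (2.0 * hurst - 1.0) - 1.0


times, path = sample_fbm(50, 1.0, 0.3, random.Random(0))
print(increment_correlation(0.3))  # negative because H < 1/2
```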

preprint · 2021 · arXiv

A New Dataset, Poisson GAN and AquaNet for Underwater Object Grabbing

To boost the object-grabbing capability of underwater robots for open-sea farming, we propose a new dataset (UDD) of 2,227 images in three categories (sea cucumber, sea urchin, and scallop). To the best of our knowledge, it is the first 4K HD dataset collected in a real open-sea farm. We also propose a novel Poisson-blending Generative Adversarial Network (Poisson GAN) and an efficient object detection network (AquaNet) to address two common issues in related datasets: class imbalance and large numbers of small objects, respectively. Specifically, Poisson GAN incorporates Poisson blending into its generator and employs a new loss, the Dual Restriction loss (DR loss), which supervises both implicit-space features and image-level features during training to generate more realistic images. With Poisson GAN, objects of minority classes such as sea cucumber or scallop can be added to an image naturally and annotated automatically, which increases the loss contribution of minority classes when training detectors and thereby counters class imbalance. AquaNet is a high-efficiency detector for detecting large numbers of small objects in cloudy underwater images. Within it, we design two efficient components, a depth-wise-convolution-based Multi-scale Contextual Features Fusion (MFF) block and a Multi-scale Blursampling (MBP) module, which reduce the number of network parameters to 1.3 million. Both components provide multi-scale features of small objects under a short backbone configuration without any loss of accuracy. In addition, we construct a large-scale augmented dataset (AUDD) and a pre-training dataset from UDD via Poisson GAN. Extensive experiments show the effectiveness of the proposed Poisson GAN, AquaNet, UDD, AUDD, and pre-training dataset.
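Poisson blending, which Poisson GAN incorporates into its generator, pastes an object so that its internal gradients are preserved while its border agrees with the background. On a single 1-D row of pixels this reduces to a tridiagonal linear system, solved below with the Thomas algorithm. This is a minimal sketch of the classical blending step only, on made-up pixel values; the function name is hypothetical, and neither the GAN nor the DR loss is modelled.

```python
from typing import List, Sequence


def poisson_blend_1d(source: Sequence[float], left: float, right: float) -> List[float]:
    """Keep the second differences of `source`, but pin the ends to `left` and `right`.

    Solves f[i-1] - 2 f[i] + f[i+1] = source[i-1] - 2 source[i] + source[i+1]
    for the interior points (needs len(source) >= 3) with the Thomas algorithm.
    """
    n = len(source)
    m = n - 2  # number of interior unknowns
    rhs = [source[i - 1] - 2.0 * source[i] + source[i + 1] for i in range(1, n - 1)]
    # Known boundary values move to the right-hand side.
    rhs[0] -= left
    rhs[-1] -= right
    # Forward sweep for the tridiagonal matrix with diagonal -2 and off-diagonals 1.
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = rhs[0] / -2.0
    for i in range(1, m):
        denom = -2.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
    # Back substitution.
    interior = [0.0] * m
    interior[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        interior[i] = d_prime[i] - c_prime[i] * interior[i + 1]
    return [left] + interior + [right]


source_patch = [5.0, 6.0, 8.0, 7.0, 5.0]  # pixel row from the pasted object
blended = poisson_blend_1d(source_patch, left=1.0, right=2.0)  # ends from the background
print([round(v, 6) for v in blended])  # [1.0, 2.25, 4.5, 3.75, 2.0]
```

Because adding a straight line leaves second differences unchanged, the 1-D result is the source row plus a linear ramp that corrects both ends; the 2-D case needs an iterative or sparse solver instead.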

preprint · 2021 · arXiv

Pushing the Envelope of Thin Crack Detection

In this study, we consider the problem of detecting cracks in images of concrete surfaces for automated inspection of infrastructure such as bridges. Overall accuracy is determined by how accurately thin cracks with sub-pixel widths can be detected. Our aim is to detect cracks close to the limit of thinness, if such a limit can be defined. Toward this end, we first propose a method for training a CNN on human-annotated labels so that it detects cracks more accurately than humans. To achieve this seemingly impossible goal, we intentionally lower the spatial resolution of the input images while maintaining that of their labels when training the CNN. This makes it possible to annotate cracks that are too thin for humans to detect, which we call super-human labels. We experimentally show that this makes it possible to detect cracks in an image at one-third the resolution of the images used for annotation with about the same accuracy. We additionally propose three methods for further improving the detection accuracy of thin cracks: i) P-pooling, to maintain small image structures during downsampling; ii) removal of short-segment cracks in a post-processing step, using a prior of crack shapes learned with the VAE-GAN framework; and iii) modeling the uncertainty of the prediction to better handle hard labels beyond the limit of CNNs' detection ability, which technically act as noisy labels. We experimentally examine the effectiveness of these methods.
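The trick of lowering input resolution while keeping label resolution amounts to a data-preparation step: the image is block-averaged by a factor of three (the one-third resolution mentioned above) while the crack mask stays at full resolution, so the network must predict a label map finer than its input. The helper names and the use of plain block averaging are assumptions; the downsampling filter used in the paper is not specified here.

```python
from typing import List, Tuple

Grid = List[List[float]]


def block_average(img: Grid, factor: int) -> Grid:
    """Downsample by averaging non-overlapping factor x factor blocks (edges are cropped)."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def make_training_pair(image: Grid, label: Grid, factor: int = 3) -> Tuple[Grid, Grid]:
    """Low-resolution network input, untouched full-resolution crack label."""
    return block_average(image, factor), label


# 6 x 6 toy image: bright concrete (1.0) with a one-pixel dark crack in column 2.
image = [[0.0 if c == 2 else 1.0 for c in range(6)] for _ in range(6)]
label = [[1.0 if c == 2 else 0.0 for c in range(6)] for _ in range(6)]
x_low, y_full = make_training_pair(image, label)
```

In the 2 × 2 input the crack survives only as a faint dip (6/9 of the background value), while the 6 × 6 label still marks it pixel-exactly; this gap is what lets the trained network resolve cracks thinner than an annotator could see at input resolution.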

preprint · 2021 · arXiv

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

The memory capacity of embedding tables in deep learning recommendation models (DLRMs) is increasing dramatically across the industry, from tens of GBs to TBs. Given this fast growth, novel solutions are urgently needed to enable fast and efficient DLRM innovation without exponentially increasing infrastructure capacity demands. In this paper, we demonstrate the promising potential of Tensor Train decomposition for DLRMs (TT-Rec), an important yet under-investigated context. We design and implement optimized kernels (TT-EmbeddingBag) to evaluate the proposed TT-Rec design. TT-EmbeddingBag is 3× faster than the state-of-the-art TT implementation. The performance of TT-Rec is further optimized with batched matrix multiplication and caching strategies for embedding-vector lookup operations. In addition, we analyze mathematically and empirically the effect of the weight-initialization distribution on DLRM accuracy, and propose initializing the tensor cores of TT-Rec following a sampled Gaussian distribution. We evaluate TT-Rec across three important design-space dimensions (memory capacity, accuracy, and timing performance) by training MLPerf-DLRM on Criteo's Kaggle and Terabyte data sets. TT-Rec achieves 117× and 112× model-size compression for Kaggle and Terabyte, respectively. This reduction in model size can come with no accuracy or training-time overhead compared with the uncompressed baseline.
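In a Tensor Train embedding table the full N × D matrix is never stored. With N = n₁n₂n₃ and D = d₁d₂d₃, each row is rebuilt on demand from small cores G_k of shape r_{k-1} × n_k × d_k × r_k (with r₀ = r₃ = 1). The sketch below shows a row lookup and the parameter count in plain Python; the core layout, index ordering, sizes, and function names are assumptions for illustration, not the TT-EmbeddingBag kernel.

```python
from typing import Dict, List, Sequence, Tuple

# core[a][i][j][b]: left rank a, row digit i, column digit j, right rank b
Core = List[List[List[List[float]]]]


def split_index(index: int, sizes: Sequence[int]) -> List[int]:
    """Mixed-radix digits of `index`, most significant first."""
    digits = []
    for size in reversed(sizes):
        digits.append(index % size)
        index //= size
    return digits[::-1]


def tt_row(cores: Sequence[Core], row: int, row_sizes: Sequence[int]) -> List[float]:
    """Rebuild one embedding row by contracting the TT cores left to right."""
    digits = split_index(row, row_sizes)
    # Maps a prefix of column digits to a vector over the current TT rank.
    partial: Dict[Tuple[int, ...], List[float]] = {(): [1.0]}
    for core, i in zip(cores, digits):
        col_size, next_rank = len(core[0][i]), len(core[0][i][0])
        extended: Dict[Tuple[int, ...], List[float]] = {}
        for prefix, vec in partial.items():
            for j in range(col_size):
                extended[prefix + (j,)] = [
                    sum(vec[a] * core[a][i][j][b] for a in range(len(vec)))
                    for b in range(next_rank)
                ]
        partial = extended
    # Sorted digit tuples give row-major column order; the last rank is 1.
    return [partial[key][0] for key in sorted(partial)]


def tt_param_count(row_sizes: Sequence[int], col_sizes: Sequence[int],
                   ranks: Sequence[int]) -> int:
    """Number of stored values; ranks = [r0, ..., rK] with r0 = rK = 1."""
    return sum(ranks[k] * row_sizes[k] * col_sizes[k] * ranks[k + 1]
               for k in range(len(row_sizes)))


# Hypothetical table: 100*100*100 = 1e6 rows, 4*4*4 = 64 columns, TT ranks 16.
print(tt_param_count([100, 100, 100], [4, 4, 4], [1, 16, 16, 1]), 1_000_000 * 64)  # 115200 64000000
```

Each lookup trades a memory read for a few small tensor contractions, which is why kernel-level optimizations such as batched multiplication and caching matter for keeping training time flat.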

preprint · 2020 · arXiv

Gyrokinetics investigations of an I-mode pedestal on Alcator C-Mod

Naturally stable to ELMs, and with widths larger than EPED predictions, I-mode pedestals are an excellent laboratory for investigating the role of drift micro-instabilities in pedestals, since they are not "limited" by MHD instabilities. We present a study based on gyrokinetic simulations (using GENE) to model fluctuations and heat transport in I-mode pedestals on C-Mod. We find the Weakly Coherent Mode observed in C-Mod I-modes to be an electrostatic Ion Temperature Gradient/Impurity density gradient (ITG/Impurity) driven mode. The ITG/Impurity mode matches the frequency and the impurity confinement time observed in I-mode. Nonlinear ETG simulations can match the experimental heat flux with profile adjustments well within experimental error bars. Simulations that vary the impurity level (Zeff) and the temperature and density profiles (within experimental error bars) are used to probe the sensitivity of fluctuations and transport.

preprint · 2020 · arXiv

Higher order approximation for stochastic wave equation

The infinitesimal generator (fractional Laplacian) of a process obtained by subordinating a killed Brownian motion captures the power-law attenuation of wave propagation. This paper studies numerical schemes for the stochastic wave equation with the fractional Laplacian as the spatial operator, driven by an infinite-dimensional Brownian motion or fractional Brownian motion (fBm). First, we establish the regularity of the mild solution of the stochastic fractional wave equation. Then a spectral Galerkin method is used for the spatial approximation, and the spatial convergence rate is improved by postprocessing the infinite-dimensional Gaussian noise. In time, when the time derivative of the mild solution is bounded in the mean-squared $L^p$-norm, we propose a modified stochastic trigonometric method that attains a higher strong convergence rate than existing results: the temporal convergence rate exceeds $1$. In particular, the method can achieve order $2$ in time, at the expense of requiring extra regularity of the mild solution. Numerical experiments confirm the theoretical error estimates.
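After the spectral Galerkin step, each mode of the wave equation evolves independently as u'' = -λ_k u plus noise, with frequency ω = √λ_k; for the spectral fractional Laplacian (-Δ)^α on (0, 1) with Dirichlet conditions, λ_k = (kπ)^{2α}. The sketch below applies the standard (unmodified) stochastic trigonometric step for additive Brownian noise to one mode: the linear part is propagated exactly by a rotation, and the noise increment ΔW enters through the same sine and cosine factors. The eigenvalue choice and the per-mode noise intensity `sigma` are assumptions, and neither the paper's modified method nor the fBm case is reproduced.

```python
import math
import random
from typing import Optional, Tuple


def trig_step(u: float, v: float, omega: float, tau: float, dw: float) -> Tuple[float, float]:
    """One stochastic trigonometric step for u'' = -omega^2 u + dW/dt (one spectral mode)."""
    c, s = math.cos(omega * tau), math.sin(omega * tau)
    # Exact rotation of (u, v) over tau, with the noise increment added to the velocity first.
    u_next = c * u + s / omega * (v + dw)
    v_next = -omega * s * u + c * (v + dw)
    return u_next, v_next


def simulate_mode(k: int, alpha: float, u0: float, v0: float, tau: float, n_steps: int,
                  sigma: float = 1.0, rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Evolve mode k; omega = sqrt(lambda_k) with lambda_k = (k pi)^(2 alpha) (assumed)."""
    omega = (k * math.pi) ** alpha
    u, v = u0, v0
    for _ in range(n_steps):
        # Brownian increment for this mode; rng=None switches the noise off.
        dw = sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0) if rng is not None else 0.0
        u, v = trig_step(u, v, omega, tau, dw)
    return u, v


u_end, v_end = simulate_mode(2, 0.75, 1.0, 0.5, tau=0.01, n_steps=100, rng=random.Random(1))
```

With the noise switched off the scheme reproduces the exact solution u(t) = cos(ωt)u₀ + sin(ωt)v₀/ω for any step size, so the temporal error comes only from how the stochastic integral is approximated.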

preprint · 2020 · arXiv

Numerical methods for the two-dimensional Fokker-Planck equation governing the probability density function of the tempered fractional Brownian motion

In this paper, we study numerical schemes for the two-dimensional Fokker-Planck equation governing the probability density function of the tempered fractional Brownian motion. The main challenge for the numerical schemes comes from the singularity in the time direction. When $0<H<0.5$, the change of variables $\partial \left(t^{2H}\right)=2Ht^{2H-1}\partial t$ avoids the numerical singularity at $t=0$, which naturally results in a nonuniform time discretization and greatly improves computational efficiency. For $0.5<H<1$, a time-span-dependent numerical scheme and a nonuniform time discretization are introduced to ensure both the effectiveness of the calculation and computational efficiency. By numerically solving the corresponding Fokker-Planck equation, we obtain the mean squared displacement of the stochastic process, which agrees with the characteristics of the tempered fractional Brownian motion.
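The substitution τ = t^{2H} suggests a simple way to build the nonuniform mesh for 0 < H < 0.5: take equal steps in τ and map them back through t = τ^{1/(2H)}. Since 1/(2H) > 1 in this range, the nodes cluster near t = 0, where the singularity sits. A minimal sketch follows; the function name is hypothetical, and the mesh actually used in the paper (and its separate treatment of 0.5 < H < 1) may differ.

```python
from typing import List


def graded_time_mesh(horizon: float, n_steps: int, hurst: float) -> List[float]:
    """Nodes uniform in tau = t**(2H), mapped back by t = tau**(1 / (2H)); for 0 < H < 0.5."""
    if not 0.0 < hurst < 0.5:
        raise ValueError("this mapping is meant for 0 < H < 0.5")
    tau_end = horizon ** (2.0 * hurst)
    return [(n * tau_end / n_steps) ** (1.0 / (2.0 * hurst)) for n in range(n_steps + 1)]


mesh = graded_time_mesh(1.0, 10, 0.25)  # here t_n = (n / 10) ** 2
steps = [later - earlier for earlier, later in zip(mesh, mesh[1:])]
print(round(steps[0], 4), round(steps[-1], 4))  # 0.01 0.19
```

The first step is nineteen times shorter than the last, so resolution is spent near t = 0 without refining the whole interval.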

preprint · 2020 · arXiv

Restoring Images with Unknown Degradation Factors by Recurrent Use of a Multi-branch Network

Convolutional neural networks have achieved unprecedented performance in image restoration for a variety of degradation factors. However, high-performance networks have been designed specifically for a single degradation factor. In this paper, we tackle a harder problem: restoring a clean image from a version degraded by an unknown factor, under the condition that it is one of the known factors. Toward this end, we design a network with multiple pairs of input and output branches and use it in a recurrent fashion, such that a different branch pair is used at each recurrent path. We reinforce the shared part of the network with improved components so that it can handle different degradation factors. We also propose a two-step training method for the network, consisting of multi-task learning and fine-tuning. The experimental results show that, on four degradation factors, the proposed network performs at least comparably to, and sometimes better than, the best dedicated network for each factor. We also test it on an even harder task in which the input image contains multiple degradation factors mixed with unknown ratios, and show that it outperforms the previous state-of-the-art method designed for that task.

preprint · 2019 · arXiv

Numerical approximation for fractional diffusion equation forced by a tempered fractional Gaussian noise

This paper discusses the fractional diffusion equation forced by a tempered fractional Gaussian noise. The fractional diffusion equation governs the probability density function of the subordinated killed Brownian motion. The tempered fractional Gaussian noise plays the role of a fluctuating external source with the property of localization. We first establish the regularity of the infinite-dimensional stochastic integral of the tempered fractional Brownian motion and then the regularity of the mild solution of the fractional stochastic diffusion equation. The spectral Galerkin method is used for the spatial approximation; the resulting system is then transformed into an equivalent form with better temporal regularity than the original. We then use the semi-implicit Euler scheme to discretize the time derivative. Using the temporal-spatial error-splitting technique, we obtain error estimates for the fully discrete scheme in the mean-squared $L^2$-norm. Extensive numerical experiments confirm the theoretical estimates.
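After spectral Galerkin projection, each mode obeys a scalar equation of the form du_k = -λ_k u_k dt + dξ_k, where ξ_k is the projected noise. A semi-implicit Euler step treats the drift implicitly and the noise increment explicitly: (1 + τλ_k) u_k^{n+1} = u_k^n + Δξ_k^n. The sketch below applies this to one mode; the eigenvalue λ_k = (kπ)^{2α} and the user-supplied noise increments are assumptions, and the paper's transformation to a more regular equivalent form is not reproduced.

```python
import math
from typing import Sequence


def semi_implicit_euler_mode(u0: float, lam: float, tau: float,
                             increments: Sequence[float]) -> float:
    """(1 + tau*lam) u_{n+1} = u_n + d_xi_n: implicit drift, explicit noise increment."""
    u = u0
    for d_xi in increments:
        u = (u + d_xi) / (1.0 + tau * lam)
    return u


lam_k = (10 * math.pi) ** (2 * 0.8)  # hypothetical mode k = 10, alpha = 0.8
u_end = semi_implicit_euler_mode(1.0, lam_k, tau=0.05, increments=[0.0] * 40)
```

The implicit treatment keeps every mode bounded for any step size, which matters because λ_k grows like k^{2α}; here τλ_k is above 12, whereas explicit Euler for this mode would be stable only for τλ_k ≤ 2.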