Source author record

Xiaoming Xu

Xiaoming Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · math.PR · Artificial Intelligence · Computer Science and Game Theory · cond-mat.stat-mech · Data Structures and Algorithms · Hardware Architecture · Machine Learning · Performance · physics.comp-ph · physics.soc-ph

Catalog footprint

What is connected

9 works

11 topics

4 close collaborators


Preprint · 2026 · arXiv

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions. First, we leverage the new FP4 Tensor Cores in Blackwell GPUs to accelerate attention computation. Our implementation achieves 1038 TOPS on RTX5090, a 5x speedup over the fastest FlashAttention on RTX5090. Experiments show that our FP4 attention can accelerate inference of various models in a plug-and-play way. Second, we pioneer the application of low-bit attention to training tasks. Existing low-bit attention works such as FlashAttention3 and SageAttention focus only on inference, yet the efficiency of training large models is also important. To explore whether low-bit attention can be effectively applied to training, we design an accurate and efficient 8-bit attention for both forward and backward propagation. Experiments indicate that 8-bit attention achieves lossless performance in fine-tuning tasks but exhibits slower convergence in pretraining tasks. The code is available at https://github.com/thu-ml/SageAttention.
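To make the microscaling idea concrete, here is a minimal Python sketch of block-wise FP4 quantization; it is not SageAttention3's kernel. It assumes the FP4 E2M1 format (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), a block size of 16, and a per-block scale kept in full precision for simplicity, whereas hardware microscaling formats also store that scale in a low-precision type. All function names are hypothetical.

```python
# FP4 (E2M1) can represent only these magnitudes, plus a sign bit.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_GRID[-1]


def round_to_fp4(x: float) -> float:
    """Round x to the nearest E2M1 value (ties go to the smaller magnitude)."""
    magnitude = min(FP4_E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return magnitude if x >= 0 else -magnitude


def quantize_microscaled(values, block=16):
    """Split values into blocks; each block shares one scale chosen so that its
    largest magnitude maps to FP4_MAX. Returns (fp4 codes, per-block scales)."""
    codes, scales = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / FP4_MAX if amax > 0 else 1.0  # assumption: full-precision scale
        scales.append(scale)
        codes.extend(round_to_fp4(v / scale) for v in chunk)
    return codes, scales


def dequantize_microscaled(codes, scales, block=16):
    """Multiply each FP4 code by the scale of the block it came from."""
    return [c * scales[i // block] for i, c in enumerate(codes)]


if __name__ == "__main__":
    # An outlier (40.0) in the second block does not coarsen the first block.
    x = [0.1 * i for i in range(16)] + [40.0] + [0.3] * 15
    codes, scales = quantize_microscaled(x)
    x_hat = dequantize_microscaled(codes, scales)
    print("scales:", scales)
    print("max abs error:", max(abs(a - b) for a, b in zip(x, x_hat)))
```

Because the largest gap in the E2M1 grid is 2 (between 4 and 6), the reconstruction error of every element is at most one block scale; small blocks keep that scale, and therefore the error, local to each group of values.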

Preprint · 2023 · arXiv

Semi-MAE: Masked Autoencoders for Semi-supervised Vision Transformers

Vision Transformer (ViT) suffers from data scarcity in semi-supervised learning (SSL). To alleviate this issue, and inspired by the masked autoencoder (MAE), a data-efficient self-supervised learner, we propose Semi-MAE, a pure ViT-based SSL framework with a parallel MAE branch that assists visual representation learning and makes the pseudo labels more accurate. The MAE branch is designed as an asymmetric architecture consisting of a lightweight decoder and a shared-weights encoder. We feed weakly augmented unlabeled data with a high masking ratio to the MAE branch, which reconstructs the missing pixels. Semi-MAE achieves 75.9% top-1 accuracy on ImageNet with 10% labels, surpassing the previous state of the art in semi-supervised image classification. In addition, extensive experiments demonstrate that Semi-MAE can be readily applied to other ViT models and masked image modeling methods.
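The MAE branch only sees a random subset of patches from each weakly augmented image. As a minimal sketch of that masking step, assuming a masking ratio of 0.75 and a grid of 196 patches (a 224×224 image cut into 16×16 patches), both borrowed from common MAE setups rather than stated in this abstract, random masking can be written as follows; the function name is hypothetical.

```python
import random


def random_masking(num_patches: int, mask_ratio: float, rng: random.Random):
    """Return (visible, masked) patch indices for one image.

    The visible indices would go to the shared-weights encoder; the
    lightweight decoder then reconstructs the pixels of the masked indices.
    """
    num_visible = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    visible = sorted(order[:num_visible])
    masked = sorted(order[num_visible:])
    return visible, masked


if __name__ == "__main__":
    visible, masked = random_masking(196, 0.75, random.Random(0))
    print(len(visible), len(masked))  # 49 147
```

A high ratio leaves the encoder with only a quarter of the patches, which keeps the extra branch cheap while forcing the reconstruction task to rely on learned visual structure.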

Preprint · 2023 · arXiv

YOLOv6 v3.0: A Full-Scale Reloading

The YOLO community has been in high spirits since our first two releases! With the arrival of Chinese New Year 2023, the Year of the Rabbit, we refurbish YOLOv6 with numerous novel enhancements to the network architecture and the training scheme. This release is identified as YOLOv6 v3.0. For a glimpse of performance, our YOLOv6-N hits 37.5% AP on the COCO dataset at a throughput of 1187 FPS tested with an NVIDIA Tesla T4 GPU. YOLOv6-S reaches 45.0% AP at 484 FPS, outperforming other mainstream detectors at the same scale (YOLOv5-S, YOLOv8-S, YOLOX-S and PPYOLOE-S). Meanwhile, YOLOv6-M/L achieve higher accuracy (50.0%/52.8% AP, respectively) than other detectors at similar inference speeds. Additionally, with an extended backbone and neck design, our YOLOv6-L6 achieves state-of-the-art accuracy among real-time detectors. Extensive experiments are carefully conducted to validate the effectiveness of each improved component. Our code is made available at https://github.com/meituan/YOLOv6.

Preprint · 2022 · arXiv

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

For years, the YOLO series has been the de facto industry-level standard for efficient object detection. The YOLO community has flourished, extending its use to a multitude of hardware platforms and scenarios. In this technical report, we strive to push its limits to the next level, with an unwavering focus on industrial application. Considering the diverse requirements for speed and accuracy in real environments, we extensively examine recent object detection advances from both industry and academia. Specifically, we draw heavily on ideas from recent network design, training strategies, testing techniques, quantization, and optimization methods. On top of this, we integrate our own thoughts and practice to build a suite of deployment-ready networks at various scales to accommodate diverse use cases. With the generous permission of the YOLO authors, we name it YOLOv6. We also warmly welcome users and contributors for further enhancement. For a glimpse of performance, our YOLOv6-N hits 35.9% AP on the COCO dataset at a throughput of 1234 FPS on an NVIDIA Tesla T4 GPU. YOLOv6-S reaches 43.5% AP at 495 FPS, outperforming other mainstream detectors at the same scale (YOLOv5-S, YOLOX-S, and PPYOLOE-S). Our quantized version of YOLOv6-S even brings a new state-of-the-art 43.3% AP at 869 FPS. Furthermore, YOLOv6-M/L achieve higher accuracy (49.5%/52.3% AP) than other detectors with similar inference speed. We carefully conducted experiments to validate the effectiveness of each component. Our code is made available at https://github.com/meituan/YOLOv6.
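The abstract mentions a quantized YOLOv6-S but does not describe the quantization scheme. As generic background only, and not necessarily the recipe used in this report, symmetric per-tensor INT8 quantization maps floats to integers in [-127, 127] through a single scale. The function names below are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = clamp(round(x / s), -127, 127),
    with s chosen so that the largest magnitude maps to 127."""
    amax = max(abs(v) for v in values)
    scale = amax / 127 if amax > 0 else 1.0  # avoid dividing by zero for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map integers back to approximate floats."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.02, -0.5, 1.27, 0.0, -1.0]
    q, s = quantize_int8(weights)
    print(q, s)
    print(dequantize_int8(q, s))
```

Each element is reconstructed to within half a quantization step (s / 2); practical deployments usually refine this with per-channel scales or calibration, which this sketch omits.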

Preprint · 2013 · arXiv

Anticipated backward doubly stochastic differential equations

In this paper, we study a new type of differential equation called anticipated backward doubly stochastic differential equations (anticipated BDSDEs). The coefficients of these BDSDEs depend on the future values of the solution $(Y, Z)$. We obtain an existence and uniqueness theorem and a comparison theorem for the solutions of these equations. As an application, we also establish a duality between anticipated BDSDEs and delayed doubly stochastic differential equations (delayed DSDEs).
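By analogy with anticipated BSDEs, a schematic anticipated BDSDE can be written as below. This is a sketch that assumes the Pardoux–Peng convention for BDSDEs (a backward Itô integral $d\overleftarrow{B}_t$ and a forward Itô integral $dW_t$ driven by two independent Brownian motions); the exact arguments of $g$ and the conditions on $\delta$ and $\zeta$ are those stated in the paper and may differ in detail.

$$
\begin{cases}
-dY_t = f(t, Y_t, Z_t, Y_{t+\delta(t)}, Z_{t+\zeta(t)})\,dt + g(t, Y_t, Z_t, Y_{t+\delta(t)}, Z_{t+\zeta(t)})\,d\overleftarrow{B}_t - Z_t\,dW_t, & t\in[0, T];\\
Y_t = \xi_t,\quad Z_t = \eta_t, & t\in[T, T+K].
\end{cases}
$$

The coefficients $f$ and $g$ read the future values $Y_{t+\delta(t)}$ and $Z_{t+\zeta(t)}$, which is why terminal data must be prescribed on the whole interval $[T, T+K]$ rather than only at $T$.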

Preprint · 2013 · arXiv

Constant-Competitive Prior-Free Auction with Ordered Bidders

A central problem in microeconomics is to design auctions with good revenue properties. In this setting, the bidders' valuations for the items are private knowledge, but they are drawn from publicly known prior distributions. The goal is to find a truthful auction (no bidder can gain in utility by misreporting her valuation) that maximizes the expected revenue. Naturally, the optimal auction is sensitive to the prior distributions. An intriguing question is how to design a truthful auction that is oblivious to these priors and yet obtains a constant fraction of the optimal revenue. Such auctions are called prior-free. Goldberg et al. presented a constant-approximate prior-free auction for the case where identical copies of an item are available in unlimited supply, bidders are unit-demand, and their valuations are drawn from i.i.d. distributions. The recent work of Leonardi et al. [STOC 2012] generalized this problem to non-i.i.d. bidders, assuming that the auctioneer knows the ordering of their reserve prices. Leonardi et al. proposed a prior-free auction that achieves an $O(\log^* n)$ approximation. We improve upon this result by giving the first prior-free auction with a constant approximation guarantee.
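The ordered-bidder auction from this paper is not reproduced here. As background for the prior-free setting with unlimited supply and unit-demand bidders, the sketch below implements the classic random sampling optimal price (RSOP) auction; treating it as representative of the Goldberg et al. line of work is an assumption, and the helper names are hypothetical.

```python
import random


def optimal_single_price(bids):
    """Best fixed price for unit-demand bids with unlimited supply: the candidate
    price p maximizing p * #{b >= p}. Returns 0.0 for no bids; ties go to the
    lower price."""
    best_price, best_revenue = 0.0, 0.0
    for p in sorted(set(bids)):
        revenue = p * sum(1 for b in bids if b >= p)
        if revenue > best_revenue:
            best_price, best_revenue = p, revenue
    return best_price


def rsop(bids, rng):
    """Random sampling optimal price auction (sketch).

    Bidders are split into two groups uniformly at random. Each group is
    offered the optimal single price computed from the *other* group, so a
    bidder's own bid never affects the price she faces (truthfulness).
    Returns a list of (wins, payment) pairs, one per bidder.
    """
    in_first = [rng.random() < 0.5 for _ in bids]
    first = [b for b, f in zip(bids, in_first) if f]
    second = [b for b, f in zip(bids, in_first) if not f]
    price_for_first = optimal_single_price(second)
    price_for_second = optimal_single_price(first)
    outcome = []
    for b, f in zip(bids, in_first):
        price = price_for_first if f else price_for_second
        # An empty opposite group yields price 0.0; in that case we sell nothing.
        wins = price > 0 and b >= price
        outcome.append((wins, price if wins else 0.0))
    return outcome


if __name__ == "__main__":
    bids = [1.0, 2.0, 3.0, 10.0, 7.0, 5.0]
    result = rsop(bids, random.Random(1))
    print("revenue:", sum(pay for _, pay in result))
```

No prior appears anywhere: each group's price is learned from the other group's bids, which is what makes such mechanisms prior-free while keeping every bidder's price independent of her own report.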

Preprint · 2013 · arXiv

Fully Coupled Forward-Backward Stochastic Functional Differential Equations and Applications to Quadratic Optimal Control

In this paper, we consider fully coupled forward-backward stochastic functional differential equations (FBSFDEs), with stochastic functional differential equations as the forward equations and generalized anticipated backward stochastic differential equations as the backward equations. We prove an existence and uniqueness theorem for FBSFDEs. As an application, we treat a quadratic optimal control problem for functional stochastic systems and obtain the explicit form of the optimal control by means of FBSFDEs.

Preprint · 2013 · arXiv

Percolation of a general network of networks

Percolation theory is an approach to studying the vulnerability of a system. We develop an analytical framework and analyze the percolation properties of a network composed of interdependent networks (NetONet). Typically, percolation of a single network shows that the damage in the network due to a failure is a continuous function of the fraction of failed nodes. In sharp contrast, in a NetONet, due to cascading failures, the percolation transition may be discontinuous, and even a single node failure may lead to an abrupt collapse of the system. We demonstrate our general framework for a NetONet composed of $n$ classic Erdős-Rényi (ER) networks, where each network depends on the same number $m$ of other networks, i.e., a random regular (RR) network of interdependent ER networks. In contrast to a treelike NetONet, in which the size of the largest connected cluster (mutual component) depends on $n$, the loops in the RR NetONet cause the largest connected cluster to depend only on $m$. We also analyze the extremely vulnerable feedback condition of coupling. In the case of ER networks, the NetONet exhibits only two phases, a second-order phase transition and collapse, and, unlike in the no-feedback condition, there is no first-order phase transition regime. In the case of a NetONet composed of RR networks, there exists a first-order phase transition when the coupling strength $q$ is large and a second-order phase transition when $q$ is small. Our results can help in designing robust interdependent systems.
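The cascade mechanism can be illustrated in its simplest instance: two ER networks with one-to-one dependency links (node i in A and node i in B depend on each other), rather than the paper's general NetONet of $n$ networks each coupled to $m$ others. The sketch below is a hypothetical Monte Carlo illustration, not the authors' analytical framework. It keeps a random fraction p of nodes and then alternates between taking the largest connected cluster in each network until no further node fails; all names and parameters are illustrative.

```python
import random
from collections import deque


def er_graph(n, k_avg, rng):
    """Erdős-Rényi graph G(n, p) with p = k_avg / (n - 1), as adjacency sets."""
    p = k_avg / (n - 1)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def giant_component(adj, alive):
    """Largest connected cluster of the subgraph induced by the node set `alive`."""
    seen, best = set(), set()
    for start in alive:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search restricted to alive nodes
            u = queue.popleft()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        if len(component) > len(best):
            best = component
    return best


def mutual_giant_component(adj_a, adj_b, alive):
    """Cascade of failures between A and B with one-to-one dependencies.
    Iterates until the surviving set is connected in both networks."""
    alive = set(alive)
    while True:
        in_a = giant_component(adj_a, alive)  # nodes cut off from A's giant cluster fail
        in_b = giant_component(adj_b, in_a)   # B keeps partners of survivors, then its own giant cluster
        if in_b == alive:
            return alive
        alive = in_b


def simulate(n=1000, k_avg=4.0, p=0.8, seed=0):
    """Fraction of nodes in the mutual giant component after keeping a
    random fraction p of nodes (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    adj_a = er_graph(n, k_avg, rng)
    adj_b = er_graph(n, k_avg, rng)
    survivors = [i for i in range(n) if rng.random() < p]
    return len(mutual_giant_component(adj_a, adj_b, survivors)) / n


if __name__ == "__main__":
    for p in (0.9, 0.7, 0.6, 0.5):
        print(p, simulate(p=p))
```

Sweeping p downward, the surviving fraction for two fully interdependent ER networks is expected to drop abruptly (a first-order transition) instead of shrinking continuously as in a single ER network; at finite n the jump is somewhat smeared out.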

Preprint · 2011 · arXiv

Necessary and sufficient condition for the comparison theorem of multidimensional anticipated backward stochastic differential equations

Anticipated backward stochastic differential equations, first studied in 2007, are equations of the following type:

$$
\begin{cases}
-dY_t = f(t, Y_t, Z_t, Y_{t+\delta(t)}, Z_{t+\zeta(t)})\,dt - Z_t\,dB_t, & t\in[0, T];\\
Y_t = \xi_t, & t\in[T, T+K];\\
Z_t = \eta_t, & t\in[T, T+K].
\end{cases}
$$

In this paper, we give a necessary and sufficient condition under which the comparison theorem holds for multidimensional anticipated backward stochastic differential equations with generators independent of the anticipated term of $Z$.
