Source author record

Xiaoyi Dong

Xiaoyi Dong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Computer Vision, astro-ph.GA, Machine Learning, astro-ph.HE, astro-ph.SR

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators

preprint, 2022, arXiv

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

We propose bootstrapped masked autoencoders (BootMAE), a new approach for vision BERT pretraining. BootMAE improves the original masked autoencoders (MAE) with two core designs: 1) momentum encoder that provides online feature as extra BERT prediction targets; 2) target-aware decoder that tries to reduce the pressure on the encoder to memorize target-specific information in BERT pretraining. The first design is motivated by the observation that using a pretrained MAE to extract the features as the BERT prediction target for masked tokens can achieve better pretraining performance. Therefore, we add a momentum encoder in parallel with the original MAE encoder, which bootstraps the pretraining performance by using its own representation as the BERT prediction target. In the second design, we introduce target-specific information (e.g., pixel values of unmasked patches) from the encoder directly to the decoder to reduce the pressure on the encoder of memorizing the target-specific information. Thus, the encoder focuses on semantic modeling, which is the goal of BERT pretraining, and does not need to waste its capacity in memorizing the information of unmasked tokens related to the prediction target. Through extensive experiments, our BootMAE achieves $84.2\%$ Top-1 accuracy on ImageNet-1K with ViT-B backbone, outperforming MAE by $+0.8\%$ under the same pre-training epochs. BootMAE also gets $+1.0$ mIoU improvements on semantic segmentation on ADE20K and $+1.3$ box AP, $+1.4$ mask AP improvement on object detection and segmentation on COCO dataset. Code is released at https://github.com/LightDXY/BootMAE.
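
The momentum encoder described above can be illustrated with a small sketch. It assumes the momentum encoder is updated as an exponential moving average (EMA) of the online encoder, which is a common choice for momentum encoders; the abstract does not state the update rule, so `ema_update`, the `decay` value, and the toy weight lists are all hypothetical.

```python
def ema_update(momentum_weights, online_weights, decay=0.999):
    """Blend online-encoder weights into the momentum encoder (assumed EMA rule)."""
    if len(momentum_weights) != len(online_weights):
        raise ValueError("weight lists must have the same length")
    return [decay * m + (1.0 - decay) * o
            for m, o in zip(momentum_weights, online_weights)]


# Toy weights: the momentum copy drifts slowly toward the online encoder.
online = [1.0, 2.0]
momentum = [0.0, 0.0]
for _ in range(3):
    momentum = ema_update(momentum, online, decay=0.9)
print(momentum)
```

With a decay close to 1, the momentum encoder changes slowly, which would keep the prediction targets stable while still letting them improve as training proceeds.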

preprint, 2022, arXiv

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute whereas local self-attention often limits the field of interactions of each token. To address this issue, we develop the Cross-Shaped Window self-attention mechanism for computing self-attention in the horizontal and vertical stripes in parallel that form a cross-shaped window, with each stripe obtained by splitting the input feature into stripes of equal width. We provide a mathematical analysis of the effect of the stripe width and vary the stripe width for different layers of the Transformer network which achieves strong modeling capability while limiting the computation cost. We also introduce Locally-enhanced Positional Encoding (LePE), which handles the local positional information better than existing encoding schemes. LePE naturally supports arbitrary input resolutions, and is thus especially effective and friendly for downstream tasks. Incorporated with these designs and a hierarchical structure, CSWin Transformer demonstrates competitive performance on common vision tasks. Specifically, it achieves 85.4% Top-1 accuracy on ImageNet-1K without any extra training data or label, 53.9 box AP and 46.4 mask AP on the COCO detection task, and 52.2 mIoU on the ADE20K semantic segmentation task, surpassing previous state-of-the-art Swin Transformer backbone by +1.2, +2.0, +1.4, and +2.0 respectively under the similar FLOPs setting. By further pretraining on the larger dataset ImageNet-21K, we achieve 87.5% Top-1 accuracy on ImageNet-1K and high segmentation performance on ADE20K with 55.7 mIoU. The code and models are available at https://github.com/microsoft/CSWin-Transformer.
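
The stripe partition described above can be sketched on a toy token grid. This is only an illustration of which tokens share a horizontal or vertical stripe with a given token; it is not the released implementation, it computes no attention, and the function name `stripe_members` and the grid sizes are hypothetical.

```python
def stripe_members(h, w, sw, i, j):
    """Return the tokens sharing a horizontal and a vertical stripe with token (i, j).

    The h x w grid is split into stripes of equal width sw (assumed to divide
    evenly here); the last stripe is clipped to the grid otherwise.
    """
    r0 = (i // sw) * sw  # first row of the horizontal stripe containing row i
    c0 = (j // sw) * sw  # first column of the vertical stripe containing column j
    horizontal = {(r, c) for r in range(r0, min(r0 + sw, h)) for c in range(w)}
    vertical = {(r, c) for r in range(h) for c in range(c0, min(c0 + sw, w))}
    return horizontal, vertical


horiz, vert = stripe_members(h=8, w=8, sw=2, i=3, j=5)
cross = horiz | vert  # union of both stripes: the cross-shaped region
print(len(horiz), len(vert), len(cross))
```

On this 8 x 8 grid the cross covers 28 of 64 tokens, which shows how the stripe width trades coverage against cost: a wider stripe reaches more tokens per layer but makes each attention computation larger.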

preprint, 2022, arXiv

Mobile-Former: Bridging MobileNet and Transformer

We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet at local processing and transformer at global interaction. And the bridge enables bidirectional fusion of local and global features. Different from recent works on vision transformer, the transformer in Mobile-Former contains very few tokens (e.g. 6 or fewer tokens) that are randomly initialized to learn global priors, resulting in low computational cost. Combining with the proposed light-weight cross attention to model the bridge, Mobile-Former is not only computationally efficient, but also has more representation power. It outperforms MobileNetV3 at low FLOP regime from 25M to 500M FLOPs on ImageNet classification. For instance, Mobile-Former achieves 77.9% top-1 accuracy at 294M FLOPs, gaining 1.3% over MobileNetV3 but saving 17% of computations. When transferring to object detection, Mobile-Former outperforms MobileNetV3 by 8.6 AP in RetinaNet framework. Furthermore, we build an efficient end-to-end detector by replacing backbone, encoder and decoder in DETR with Mobile-Former, which outperforms DETR by 1.1 AP but saves 52% of computational cost and 36% of parameters.

preprint, 2022, arXiv

Protecting Celebrities from DeepFake with Identity Consistency Transformer

In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detects a suspect face by finding identity inconsistency between inner and outer face regions. The Identity Consistency Transformer incorporates a consistency loss for identity consistency determination. We show that Identity Consistency Transformer exhibits superior generalization ability not only across different datasets but also across various types of image degradation forms found in real-world applications including deepfake videos. The Identity Consistency Transformer can be easily enhanced with additional identity information when such information is available, and for this reason it is especially well-suited for detecting face forgeries involving celebrities. Code will be released at https://github.com/LightDXY/ICT_DeepFake.

preprint, 2022, arXiv

Shape-invariant 3D Adversarial Point Clouds

Adversary and invisibility are two fundamental but conflicting characteristics of adversarial perturbations. Previous adversarial attacks on 3D point cloud recognition have often been criticized for their noticeable point outliers, since they just involve an "implicit constraint" like global distance loss in the time-consuming optimization to limit the generated noise. Since a point cloud is a highly structured data format, it is hard to properly constrain its perturbation with a simple loss or metric. In this paper, we propose a novel Point-Cloud Sensitivity Map to boost both the efficiency and imperceptibility of point perturbations. This map reveals the vulnerability of point cloud recognition models when encountering shape-invariant adversarial noises. These noises are designed along the shape surface with an "explicit constraint" instead of extra distance loss. Specifically, we first apply a reversible coordinate transformation on each point of the point cloud input, to reduce one degree of point freedom and limit its movement on the tangent plane. Then we calculate the best attacking direction with the gradients of the transformed point cloud obtained on the white-box model. Finally, we assign each point a non-negative score to construct the sensitivity map, which benefits both white-box adversarial invisibility and black-box query-efficiency extended in our work. Extensive evaluations prove that our method can achieve superior performance on various point cloud recognition models, with satisfying adversarial imperceptibility and strong resistance to different point cloud defense settings. Our code is available at: https://github.com/shikiw/SI-Adv.
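
The idea of limiting a point's movement to its tangent plane can be illustrated with a simpler stand-in technique. The sketch below removes the component of a gradient along the surface normal by plain vector projection; it is not the reversible coordinate transformation the abstract describes, and the function name, the gradient, and the normal are all hypothetical.

```python
def project_to_tangent(grad, normal):
    """Drop the part of grad that points along normal (orthogonal projection)."""
    norm_sq = sum(n * n for n in normal)
    if norm_sq == 0.0:
        raise ValueError("normal must be non-zero")
    scale = sum(g * n for g, n in zip(grad, normal)) / norm_sq
    return [g - scale * n for g, n in zip(grad, normal)]


# A point on a surface whose (assumed known) normal is the z axis.
step = project_to_tangent(grad=[1.0, 2.0, 3.0], normal=[0.0, 0.0, 1.0])
print(step)
```

The projected step has no component along the normal, so moving the point along it keeps the point on the local tangent plane and leaves it two degrees of freedom instead of three.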

preprint, 2016, arXiv

A survey of luminous high-redshift quasars with SDSS and WISE. II. the bright end of the quasar luminosity function at z ~ 5

This is the second paper in a series on a new luminous z ~ 5 quasar survey using optical and near-infrared colors. Here we present a new determination of the bright end of the quasar luminosity function (QLF) at z ~ 5. Combining our 45 new quasars with previously known quasars that satisfy our selections, we construct the largest uniform luminous z ~ 5 quasar sample to date, with 99 quasars in the range 4.7 <= z < 5.4 and -29 < M1450 <= -26.8, within the Sloan Digital Sky Survey (SDSS) footprint. We use a modified 1/Va method including flux limit correction to derive a binned QLF, and we model the parametric QLF using maximum likelihood estimation. With the faint-end slope of the QLF fixed as alpha = -2.03 from previous deeper samples, the best fit of our QLF gives a flatter bright-end slope beta = -3.58+/-0.24 and a fainter break magnitude M*1450 = -26.98+/-0.23 than previous studies at similar redshift. Combined with previous work at lower and higher redshifts, our result is consistent with a luminosity evolution and density evolution (LEDE) model. Using the best fit QLF, the contribution of quasars to the ionizing background at z ~ 5 is found to be 18% - 45% with a clumping factor C of 2 - 5. Our sample suggests an evolution of radio-loud fraction with optical luminosity but no obvious evolution with redshift.
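
The quoted slopes and break magnitude can be plugged into a parametric form to see their effect. The abstract does not give the functional form, so this sketch assumes the standard double power law in absolute magnitude; the normalization `phi_star` is not quoted in the abstract and is set to 1 here purely as a placeholder.

```python
def double_power_law(m, phi_star=1.0, m_star=-26.98, alpha=-2.03, beta=-3.58):
    """Assumed double power-law QLF, in units of phi_star.

    alpha is the faint-end slope, beta the bright-end slope and m_star the
    break magnitude, taken from the abstract; phi_star is a placeholder.
    """
    dm = m - m_star
    faint = 10.0 ** (0.4 * (alpha + 1.0) * dm)
    bright = 10.0 ** (0.4 * (beta + 1.0) * dm)
    return phi_star / (faint + bright)


for mag in (-26.0, -26.98, -28.0):
    print(mag, double_power_law(mag))
```

At the break magnitude both terms equal 1, so the function returns half of `phi_star`; brighter than the break, the steeper slope beta dominates and the density falls off quickly.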

preprint, 2016, arXiv

A survey of luminous high-redshift quasars with SDSS and WISE. I. target selection and optical spectroscopy

High-redshift quasars are important tracers of structure and evolution in the early universe. However, they are very rare and difficult to find when using color selection because of contamination from late-type dwarfs. High-redshift quasar surveys based on only optical colors suffer from incompleteness and low identification efficiency, especially at $z\gtrsim4.5$. We have developed a new method to select $4.7\lesssim z \lesssim 5.4$ quasars with both high efficiency and completeness by combining optical and mid-IR Wide-field Infrared Survey Explorer (WISE) photometric data, and are conducting a luminous $z\sim5$ quasar survey in the whole Sloan Digital Sky Survey (SDSS) footprint. We have spectroscopically observed 99 out of 110 candidates with $z$-band magnitudes brighter than 19.5 and 64 (64.6\%) of them are quasars with redshifts of $4.4\lesssim z \lesssim 5.5$ and absolute magnitudes of $-29\lesssim M_{1450} \lesssim -26.4$. In addition, we also observed 14 fainter candidates selected with the same criteria and identified 8 (57.1\%) of them as quasars with $4.7<z<5.4$. Among 72 newly identified quasars, 12 of them are at $5.2 < z < 5.7$, which leads to an increase of $\sim$36\% of the number of known quasars at this redshift range. More importantly, our identifications doubled the number of quasars with $M_{1450}<-27.5$ at $z>4.5$, which will set strong constraints on the bright end of the quasar luminosity function. We also expand our method to select quasars at $z\gtrsim5.7$. In this paper we report the discovery of four new luminous $z\gtrsim5.7$ quasars based on SDSS-WISE selection.

preprint, 2016, arXiv

Herschel observed Stripe 82 quasars and their host galaxies: connections between the AGN activity and the host galaxy star formation

In this work, we present a study of 207 quasars selected from the Sloan Digital Sky Survey quasar catalogs and the Herschel Stripe 82 survey. Quasars within this sample are high luminosity quasars with a mean bolometric luminosity of $10^{46.4}$ erg s$^{-1}$. The redshift range of this sample is within $z<4$, with a mean value of $1.5\pm0.78$. Because we only selected quasars that have been detected in all three Herschel-SPIRE bands, the quasar sample is complete yet highly biased. Based on the multi-wavelength photometric observation data, we conducted a spectral energy distribution (SED) fitting through UV to FIR. Parameters such as active galactic nucleus (AGN) luminosity, FIR luminosity, stellar mass, as well as many other AGN and galaxy properties are deduced from the SED fitting results. The mean star formation rate (SFR) of the sample is 419 $M_{\odot}$ yr$^{-1}$ and the mean gas mass is $\sim 10^{11.3}$ $M_{\odot}$. All these results point to an IR luminous quasar system. Compared with star formation main sequence (MS) galaxies, at least 80 out of 207 quasars are hosted by starburst galaxies. This supports the statement that luminous AGNs are more likely to be associated with major mergers. The SFR increases with the redshift up to $z=2$. It is correlated with the AGN bolometric luminosity, where $L_{\rm FIR} \propto L_{\rm Bol}^{0.46\pm0.03}$. The AGN bolometric luminosity is also correlated with the host galaxy mass and gas mass. Yet the correlation between $L_{\rm FIR}$ and $L_{\rm Bol}$ has a higher significance level, implying that the link between AGN accretion and the SFR is more primal. The $M_{\rm BH}/M_{\ast}$ ratio of our sample is 0.02, higher than the value 0.005 in the local Universe. It might indicate an evolutionary trend of the $M_{\rm BH} - M_{\ast}$ scaling relation.
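
As a rough reading of the quoted scaling $L_{\rm FIR} \propto L_{\rm Bol}^{0.46}$, the sketch below computes how much the FIR luminosity changes for a given change in bolometric luminosity. It uses only the central slope value and ignores the quoted uncertainty and any scatter; the function name is hypothetical.

```python
def fir_ratio(bol_ratio, slope=0.46):
    """Factor by which L_FIR changes when L_Bol changes by bol_ratio."""
    if bol_ratio <= 0.0:
        raise ValueError("luminosity ratio must be positive")
    return bol_ratio ** slope


# A tenfold increase in bolometric luminosity under the quoted slope.
print(fir_ratio(10.0))
```

Because the slope is below 1, a tenfold increase in bolometric luminosity corresponds to only about a threefold increase in FIR luminosity under this relation.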

preprint, 2015, arXiv

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope Quasar Survey: Quasar Properties from First Data Release

We present preliminary results of the quasar survey in Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) first data release (DR1), which includes pilot survey and the first year regular survey. There are 3921 quasars identified with reliability, among which 1180 are new quasars discovered in the survey. These quasars are at low to median redshifts, with highest z of 4.83. We compile emission line measurements around the Hα, Hβ, Mg II, and C IV regions for the new quasars. The continuum luminosities are inferred from SDSS photometric data with model fitting as the spectra in DR1 are non-flux-calibrated. We also compile the virial black hole mass estimates, and flags indicating the selection methods, broad absorption line quasars. The catalog and spectra for these quasars are available online. 28% of the 3921 quasars are selected with optical-infrared colours independently, indicating that the method is quite promising in completeness of quasar survey. LAMOST DR1 and the on-going quasar survey will provide valuable data in the studies of quasars.

preprint, 2014, arXiv

A feedback-driven bubble G24.136+00.436: a possible site of triggered star formation

We present a multi-wavelength study of the IR bubble G24.136+00.436. The J=1-0 observations of $^{12}$CO, $^{13}$CO and C$^{18}$O were carried out with the Purple Mountain Observatory 13.7 m telescope. Molecular gas with a velocity of 94.8 km s$^{-1}$ is found prominently in the southeast of the bubble, shaping as a shell with a total mass of $\sim2\times10^{4}$ $M_{\odot}$. It is likely assembled during the expansion of the bubble. The expanding shell consists of six dense cores. Their dense (a few of $10^{3}$ cm$^{-3}$) and massive (a few of $10^{3}$ $M_{\odot}$) characteristics coupled with the broad linewidths ($>$ 2.5 km s$^{-1}$) suggest they are promising sites of forming high-mass stars or clusters. This could be further consolidated by the detection of compact HII regions in Cores A and E. We tentatively identified and classified 63 candidate YSOs based on the \emph{Spitzer} and UKIDSS data. They are found to be dominantly distributed in regions with strong emission of molecular gas, indicative of active star formation especially in the shell. The HII region inside the bubble is mainly ionized by a $\sim$O8V star(s), of the dynamical age $\sim$1.6 Myr. The enhanced number of candidate YSOs and secondary star formation in the shell as well as time scales involved, indicate a possible scenario of triggering star formation, signified by the "collect and collapse" process.
