Source author record

Yong Guo

Yong Guo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, cond-mat.mes-hall, Machine Learning, physics.optics, eess.IV

Catalog footprint

What is connected

21 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

InsHuman: Towards Natural and Identity-Preserving Human Insertion

Human insertion aims to naturally place specific individuals into a target background. Although existing image editing models may have this ability, they often produce failure cases, including inappropriate human poses in the new background, an inconsistent number of people, and altered facial identity. Moreover, publicly available human datasets often lack full-body portraits and realistic physical interaction between humans and their background. To address these challenges, we propose InsHuman for natural and identity-preserving human insertion. Specifically, we propose Human-Background Adaptive Fusion (HBAF), which detects foreground humans to obtain a binary mask and applies region-aware weighting to align the human regions between predicted and ground-truth latents, ensuring the person's pose, count, and overall appearance are coherently adapted to the target background. We further propose Face-to-Face ID-Preserving (FFIP), which detects and matches faces between the generated image and the source image in terms of face recognition features to enforce identity consistency for each face. In addition, we propose a Bidirectional Data Pairing (BDP) strategy to construct BDP-InsHuman, a high-quality dataset with realistic human-background interactions. Experiments demonstrate that InsHuman achieves significant improvements in generating plausible images while keeping human identity unchanged.
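The region-aware weighting mentioned for HBAF can be pictured as a masked, weighted reconstruction loss. The following is a minimal sketch only: the abstract does not give the actual weighting scheme, so the function name `region_weighted_mse`, the `human_weight` parameter, and the normalization by total weight are all hypothetical choices made for illustration.

```python
from typing import List

Grid = List[List[float]]

def region_weighted_mse(pred: Grid, target: Grid, human_mask: List[List[int]],
                        human_weight: float = 2.0) -> float:
    """Weighted mean squared error between two latent grids.

    Cells where human_mask is 1 count human_weight times as much as
    background cells (assumed scheme, not taken from the paper).
    """
    weighted_error = 0.0
    total_weight = 0.0
    for p_row, t_row, m_row in zip(pred, target, human_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = human_weight if m else 1.0
            weighted_error += w * (p - t) ** 2
            total_weight += w
    return weighted_error / total_weight

# The same unit error costs more inside the human region than outside it.
zeros = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1, 0], [0, 0]]
loss_on_human = region_weighted_mse([[1.0, 0.0], [0.0, 0.0]], zeros, mask)
loss_on_background = region_weighted_mse([[0.0, 1.0], [0.0, 0.0]], zeros, mask)
```

With these toy inputs, an error placed on the masked cell is penalized more heavily than the same error on a background cell, which is the qualitative behavior the abstract describes.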

preprint · 2022 · arXiv

Improving Fine-tuning of Self-supervised Models with Contrastive Initialization

Self-supervised learning (SSL) has achieved remarkable performance in pretraining models that can be further used in downstream tasks via fine-tuning. However, these self-supervised models may not capture meaningful semantic information, since images belonging to the same class are always regarded as negative pairs in the contrastive loss. Consequently, images of the same class are often located far away from each other in the learned feature space, which inevitably hampers the fine-tuning process. To address this issue, we seek to provide a better initialization for the self-supervised models by enhancing the semantic information. To this end, we propose a Contrastive Initialization (COIN) method that breaks the standard fine-tuning pipeline by introducing an extra initialization stage before fine-tuning. Extensive experiments show that, with the enriched semantics, our COIN significantly outperforms existing methods without introducing extra training cost and sets a new state of the art on multiple downstream tasks.

preprint · 2022 · arXiv

Improving Robustness by Enhancing Weak Subnets

Despite their success, deep networks have been shown to be highly susceptible to perturbations, which often cause significant drops in accuracy. In this paper, we investigate model robustness on perturbed inputs by studying the performance of internal sub-networks (subnets). Interestingly, we observe that most subnets show particularly poor robustness against perturbations. More importantly, these weak subnets are correlated with the overall lack of robustness. To tackle this phenomenon, we propose a new training procedure that identifies and enhances weak subnets (EWS) to improve robustness. Specifically, we develop a search algorithm to find particularly weak subnets and explicitly strengthen them via knowledge distillation from the full network. We show that EWS greatly improves both robustness against corrupted images and accuracy on clean data. Being complementary to popular data augmentation methods, EWS consistently improves robustness when combined with these approaches. To highlight the flexibility of our approach, we also combine EWS with popular adversarial training methods, resulting in improved adversarial robustness.

preprint · 2021 · arXiv

Deep View Synthesis via Self-Consistent Generative Network

View synthesis aims to produce unseen views from a set of views captured by two or more cameras at different positions. This task is non-trivial since it is hard to conduct pixel-level matching among different views. To address this issue, most existing methods seek to exploit the geometric information to match pixels. However, when the distinct cameras have a large baseline (i.e., far away from each other), severe geometry distortion issues would occur and the geometric information may fail to provide useful guidance, resulting in very blurry synthesized images. To address the above issues, in this paper, we propose a novel deep generative model, called Self-Consistent Generative Network (SCGN), which synthesizes novel views from the given input views without explicitly exploiting the geometric information. The proposed SCGN model consists of two main components, i.e., a View Synthesis Network (VSN) and a View Decomposition Network (VDN), both employing an Encoder-Decoder structure. Here, the VDN seeks to reconstruct input views from the synthesized novel view to preserve the consistency of view synthesis. Thanks to VDN, SCGN is able to synthesize novel views without using any geometric rectification before encoding, making it easier for both training and applications. Finally, adversarial loss is introduced to improve the photo-realism of novel views. Both qualitative and quantitative comparisons against several state-of-the-art methods on two benchmark tasks demonstrated the superiority of our approach.

preprint · 2021 · arXiv

Pareto-Frontier-aware Neural Architecture Generation for Diverse Budgets

Designing feasible and effective architectures under diverse computation budgets incurred by different applications/devices is essential for deploying deep models in practice. Existing methods often perform an independent architecture search for each target budget, which is both very inefficient and unnecessary. Moreover, repeating independent searches inevitably ignores the knowledge shared among different search processes and hampers search performance. To address these issues, we seek to train a general architecture generator that automatically produces effective architectures for an arbitrary budget merely via model inference. To this end, we propose a Pareto-Frontier-aware Neural Architecture Generator (NAG) which takes an arbitrary budget as input and produces the Pareto optimal architecture for the target budget. We train NAG by learning the Pareto frontier (i.e., the set of Pareto optimal architectures) over model performance and computational cost (e.g., latency). Extensive experiments on three platforms (i.e., mobile, CPU, and GPU) show the superiority of the proposed method over existing NAS methods.
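To make the notion of a Pareto frontier over performance and cost concrete, here is a minimal sketch that filters a handful of candidates by dominance and then picks the most accurate one within a budget. It is not the NAG method itself, which is a learned generator; the candidate records, the names `pareto_frontier` and `best_under_budget`, and the accuracy and latency numbers are invented for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical record: (name, accuracy, latency in milliseconds)
Arch = Tuple[str, float, float]

def pareto_frontier(archs: List[Arch]) -> List[Arch]:
    """Return architectures not dominated on (higher accuracy, lower latency)."""
    frontier = []
    for name, acc, lat in archs:
        dominated = any(
            (a >= acc and l <= lat) and (a > acc or l < lat)
            for _, a, l in archs
        )
        if not dominated:
            frontier.append((name, acc, lat))
    # Sort by latency so the frontier reads from cheapest to most expensive.
    return sorted(frontier, key=lambda r: r[2])

def best_under_budget(archs: List[Arch], budget_ms: float) -> Optional[Arch]:
    """Most accurate Pareto-optimal architecture whose latency fits the budget."""
    feasible = [r for r in pareto_frontier(archs) if r[2] <= budget_ms]
    return max(feasible, key=lambda r: r[1]) if feasible else None

candidates = [("A", 0.70, 10.0), ("B", 0.75, 20.0),
              ("C", 0.72, 25.0), ("D", 0.80, 40.0)]
frontier_names = [name for name, _, _ in pareto_frontier(candidates)]
choice = best_under_budget(candidates, budget_ms=30.0)
```

Candidate C is dropped because B is both more accurate and faster. A budget of 30 ms selects B, and a budget below 10 ms yields no feasible architecture.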

preprint · 2021 · arXiv

Towards Accurate and Compact Architectures via Neural Architecture Transformer

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-designed/searched architecture may still contain many nonsignificant or redundant modules/operations. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost. To this end, we have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP) and seeks to replace the redundant operations with more efficient operations, such as skip or null connection. Note that NAT only considers a small number of possible transitions and thus comes with a limited search/transition space. As a result, such a small search space may hamper the performance of architecture optimization. To address this issue, we propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization. Specifically, we present a two-level transition rule to obtain valid transitions, i.e., allowing operations to have more efficient types (e.g., convolution->separable convolution) or smaller kernel sizes (e.g., 5x5->3x3). Note that different operations may have different valid transitions. We further propose a Binary-Masked Softmax (BMSoftmax) layer to omit the possible invalid transitions. Extensive experiments on several benchmark datasets show that the transformed architecture significantly outperforms both its original counterpart and the architectures optimized by existing methods.
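The idea of a softmax that omits invalid transitions can be sketched as follows. This is an illustrative reading of the name Binary-Masked Softmax, not the paper's definition: the function `binary_masked_softmax` and its signature are assumptions, and the sketch simply assigns zero probability to masked entries and renormalizes over the rest.

```python
import math
from typing import List

def binary_masked_softmax(logits: List[float], mask: List[int]) -> List[float]:
    """Softmax over the entries where mask is 1; masked entries get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    valid = [z for z, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("at least one transition must be valid")
    z_max = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(z - z_max) if m else 0.0 for z, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate transitions, of which the second is invalid for this operation.
probs = binary_masked_softmax([1.0, 2.0, 3.0], [1, 0, 1])
```

Because the mask is applied before normalization, the remaining probabilities still sum to one, so a policy sampling from `probs` can never pick an invalid transition.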

preprint · 2020 · arXiv

Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search

Neural architecture search (NAS) has become an important approach to automatically find effective architectures. To cover all possible good architectures, we need to search in an extremely large search space with billions of candidate architectures. More critically, given a large search space, we may face a very challenging issue of space explosion. However, due to the limitation of computational resources, we can only sample a very small proportion of the architectures, which provides insufficient information for the training. As a result, existing methods may often produce suboptimal architectures. To alleviate this issue, we propose a curriculum search method that starts from a small search space and gradually incorporates the learned knowledge to guide the search in a large space. With the proposed search strategy, our Curriculum Neural Architecture Search (CNAS) method significantly improves the search efficiency and finds better architectures than existing NAS methods. Extensive experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of the proposed method.

preprint · 2020 · arXiv

Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution

Deep neural networks have exhibited promising performance in image super-resolution (SR) by learning a nonlinear mapping function from low-resolution (LR) images to high-resolution (HR) images. However, there are two underlying limitations to existing SR methods. First, learning the mapping function from LR to HR images is typically an ill-posed problem, because there exist infinitely many HR images that can be downsampled to the same LR image. As a result, the space of the possible functions can be extremely large, which makes it hard to find a good solution. Second, the paired LR-HR data may be unavailable in real-world applications and the underlying degradation method is often unknown. For such a more general case, existing SR models often incur the adaptation problem and yield poor performance. To address the above issues, we propose a dual regression scheme by introducing an additional constraint on LR data to reduce the space of the possible functions. Specifically, besides the mapping from LR to HR images, we learn an additional dual regression mapping that estimates the down-sampling kernel and reconstructs LR images, which forms a closed loop to provide additional supervision. More critically, since the dual regression process does not depend on HR images, we can directly learn from LR images. In this sense, we can easily adapt SR models to real-world data, e.g., raw video frames from YouTube. Extensive experiments with paired training data and unpaired real-world data demonstrate our superiority over existing methods.
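The closed loop described above can be sketched as a loss with two terms: a primal term comparing the super-resolved output with the HR target, and a dual term comparing the re-downsampled output with the LR input. The sketch below uses 1-D signals and fixed stand-in mappings instead of learned networks; the L1 distance, the weight `lam`, and every function name are assumptions for illustration rather than details taken from the paper.

```python
from typing import Callable, List, Optional

Signal = List[float]

def l1(a: Signal, b: Signal) -> float:
    """Mean absolute difference between two equal-length signals."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def upsample_nearest(lr: Signal) -> Signal:
    """Stand-in primal mapping (LR -> HR): repeat each sample twice."""
    return [v for v in lr for _ in range(2)]

def downsample_mean(hr: Signal) -> Signal:
    """Stand-in dual mapping (HR -> LR): average neighbouring pairs."""
    return [(hr[i] + hr[i + 1]) / 2 for i in range(0, len(hr), 2)]

def dual_regression_loss(lr: Signal, hr: Optional[Signal],
                         primal: Callable[[Signal], Signal],
                         dual: Callable[[Signal], Signal],
                         lam: float = 0.1) -> float:
    """Primal loss plus lam times the dual loss; dual term only if hr is None."""
    sr = primal(lr)
    dual_term = l1(dual(sr), lr)  # needs no HR image
    if hr is None:                # unpaired data: only the closed-loop term
        return lam * dual_term
    return l1(sr, hr) + lam * dual_term

lr, hr = [1.0, 3.0], [1.0, 2.0, 3.0, 4.0]
paired = dual_regression_loss(lr, hr, upsample_nearest, downsample_mean)
unpaired = dual_regression_loss(lr, None, lambda x: [0.0] * 4, downsample_mean)
```

The unpaired call shows why the dual term matters for real-world data: a primal mapping that ignores its input is still penalized, even though no HR target is available.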

preprint · 2020 · arXiv

Disturbance-immune Weight Sharing for Neural Architecture Search

Neural architecture search (NAS) has gained increasing attention in the community of architecture design. One of the key factors behind the success lies in the training efficiency created by the weight sharing (WS) technique. However, WS-based NAS methods often suffer from a performance disturbance (PD) issue. That is, the training of subsequent architectures inevitably disturbs the performance of previously trained architectures due to the partially shared weights. This leads to inaccurate performance estimation for the previous architectures, which makes it hard to learn a good search strategy. To alleviate the performance disturbance issue, we propose a new disturbance-immune update strategy for model updating. Specifically, to preserve the knowledge learned by previous architectures, we constrain the training of subsequent architectures in an orthogonal space via orthogonal gradient descent. Equipped with this strategy, we propose a novel disturbance-immune training scheme for NAS. We theoretically analyze the effectiveness of our strategy in alleviating the PD risk. Extensive experiments on CIFAR-10 and ImageNet verify the superiority of our method.
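The core operation behind orthogonal gradient descent is projecting a new gradient onto the complement of a set of stored directions, so that the update does not move the weights along those directions. The following sketch shows that projection with plain Gram-Schmidt. Which vectors are stored for previously trained architectures is not stated in the abstract, so the stored directions here are arbitrary examples and the function names are hypothetical.

```python
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors: List[Vector], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of the stored directions."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis

def project_out(grad: Vector, basis: List[Vector]) -> Vector:
    """Remove from grad every component lying in the span of basis."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g

stored = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # example protected directions
safe_grad = project_out([3.0, 4.0, 5.0], orthonormal_basis(stored))
```

The projected gradient is orthogonal to every stored direction, so a step along it leaves the first-order behavior along those directions unchanged.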

preprint · 2020 · arXiv

Hierarchical Neural Architecture Search for Single Image Super-Resolution

Deep neural networks have exhibited promising performance in image super-resolution (SR). Most SR models follow a hierarchical architecture that contains both the cell-level design of computational blocks and the network-level design of the positions of upsampling blocks. However, designing SR models heavily relies on human expertise and is very labor-intensive. More critically, these SR models often contain a huge number of parameters and may not meet the requirements of computation resources in real-world applications. To address the above issues, we propose a Hierarchical Neural Architecture Search (HNAS) method to automatically design promising architectures with different requirements of computation cost. To this end, we design a hierarchical SR search space and propose a hierarchical controller for architecture search. Such a hierarchical controller is able to simultaneously find promising cell-level blocks and network-level positions of upsampling layers. Moreover, to design compact architectures with promising performance, we build a joint reward by considering both the performance and computation cost to guide the search process. Extensive experiments on five benchmark datasets demonstrate the superiority of our method over existing methods.

preprint · 2020 · arXiv

Improving Generative Adversarial Networks with Local Coordinate Coding

Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution (e.g., Gaussian noises). However, such prior distribution is often independent of real data and thus may lose semantic information (e.g., geometric structure or content in images) of data. In practice, the semantic information might be represented by some latent distribution learned from data. However, such latent distribution may incur difficulties in data sampling for GANs. In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data. First, we propose an LCC sampling method in LCCGAN to sample meaningful points from the latent manifold. With the LCC sampling method, we can exploit the local information on the latent manifold and thus produce new data with promising quality. Second, we propose an improved version, namely LCCGAN++, by introducing a higher-order term in the generator approximation. This term is able to achieve better approximation and thus further improve the performance. More critically, we derive the generalization bound for both LCCGAN and LCCGAN++ and prove that a low-dimensional input is sufficient to achieve good generalization performance. Extensive experiments on four benchmark datasets demonstrate the superiority of the proposed method over existing GANs.

preprint · 2020 · arXiv

Joint Wasserstein Distribution Matching

The joint distribution matching (JDM) problem, which aims to learn bidirectional mappings to match the joint distributions of two domains, occurs in many machine learning and computer vision applications. This problem, however, is very difficult due to two critical challenges: (i) it is often difficult to exploit sufficient information from the joint distribution to conduct the matching; (ii) the problem is hard to formulate and optimize. In this paper, relying on optimal transport theory, we propose to address the JDM problem by minimizing the Wasserstein distance of the joint distributions in two domains. However, the resultant optimization problem is still intractable. We then propose an important theorem to reduce the intractable problem to a simple optimization problem, and develop a novel method, called Joint Wasserstein Distribution Matching (JWDM), to solve it. In the experiments, we apply our method to unsupervised image translation and cross-domain video synthesis. Both qualitative and quantitative comparisons demonstrate the superior performance of our method over several state-of-the-art methods.

preprint · 2020 · arXiv

NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection). Based on MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods.

preprint · 2020 · arXiv

The Shallow End: Empowering Shallower Deep-Convolutional Networks through Auxiliary Outputs

Depth is one of the key factors behind the success of convolutional neural networks (CNNs). Since ResNet, we have been able to train very deep CNNs, as the gradient vanishing issue has been largely addressed by the introduction of skip connections. However, we observe that, when the depth is very large, the intermediate layers (especially shallow layers) may fail to receive sufficient supervision from the loss due to the severe transformation through a long backpropagation path. As a result, the representation power of intermediate layers can be very weak and the model becomes very redundant with limited performance. In this paper, we first investigate the supervision vanishing issue in existing backpropagation (BP) methods. We then propose to address it via an effective method, called Multi-way BP (MW-BP), which relies on multiple auxiliary losses added to the intermediate layers of the network. The proposed MW-BP method can be applied to most deep architectures, such as ResNet and MobileNet, with slight modifications. Our method often gives rise to much more compact models (denoted by "Mw+Architecture") than existing methods. For example, MwResNet-44 with 44 layers performs better than ResNet-110 with 110 layers on CIFAR-10 and CIFAR-100. More critically, the resultant models even outperform the light models obtained by state-of-the-art model compression methods. Last, our method inherently produces multiple compact models with different depths at the same time, which is helpful for model selection.

preprint · 2016 · arXiv

High-speed real-time single-pixel microscopy based on Fourier sampling

Single-pixel cameras based on the concepts of compressed sensing (CS) leverage the inherent structure of images to retrieve them with far fewer measurements and operate efficiently over a significantly broader spectral range than conventional silicon-based cameras. Recently, the photonic time-stretch (PTS) technique has facilitated the emergence of high-speed single-pixel cameras. A significant breakthrough in the imaging speed of single-pixel cameras enables observation of fast dynamic phenomena. However, according to CS theory, image reconstruction is an iterative process that consumes enormous amounts of computational time and cannot be performed in real time. To address this challenge, we propose a novel single-pixel imaging technique that can produce high-quality images through rapid acquisition of their effective spatial Fourier spectrum. We employ phase-shifting sinusoidal structured illumination instead of random illumination for spectrum acquisition and apply an inverse Fourier transform to the obtained spectrum for image restoration. We evaluate the performance of our prototype system by recognizing quick response (QR) codes and by flow cytometric screening of cells. A frame rate of 625 kHz and a compression ratio of 10% are experimentally demonstrated in accordance with the recognition rate of the QR code. An imaging flow cytometer enabling high-content screening with an unprecedented throughput of 100,000 cells/s is also demonstrated. For real-time imaging applications, the proposed single-pixel microscope can reduce the time required for image reconstruction by two orders of magnitude, and it can be widely applied in industrial quality control and label-free biomedical imaging.
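The acquisition principle, phase-shifted sinusoidal illumination followed by an inverse Fourier transform, can be simulated in a few lines. The sketch below assumes the common four-step phase-shifting scheme (phases 0, π/2, π, 3π/2), which the abstract does not specify, and it samples the full spectrum rather than a compressed subset. The pattern offset `a`, contrast `b`, and all function names are illustrative.

```python
import cmath
import math
from typing import List

Image = List[List[float]]

def measure(image: Image, fx: int, fy: int, phase: float,
            a: float = 0.5, b: float = 0.5) -> float:
    """Simulated single-pixel reading: total light under one sinusoidal pattern."""
    n_rows, n_cols = len(image), len(image[0])
    total = 0.0
    for y in range(n_rows):
        for x in range(n_cols):
            theta = 2 * math.pi * (fx * x / n_cols + fy * y / n_rows)
            total += image[y][x] * (a + b * math.cos(theta + phase))
    return total

def acquire_spectrum(image: Image, a: float = 0.5, b: float = 0.5):
    """Four phase-shifted readings per frequency give one Fourier coefficient."""
    n_rows, n_cols = len(image), len(image[0])
    spectrum = [[0j] * n_cols for _ in range(n_rows)]
    for fy in range(n_rows):
        for fx in range(n_cols):
            d = [measure(image, fx, fy, k * math.pi / 2, a, b) for k in range(4)]
            # The differences cancel the offset a and isolate cos and sin parts.
            spectrum[fy][fx] = ((d[0] - d[2]) + 1j * (d[1] - d[3])) / (2 * b)
    return spectrum

def inverse_dft2(spectrum) -> Image:
    """Direct (non-iterative) 2-D inverse discrete Fourier transform."""
    n_rows, n_cols = len(spectrum), len(spectrum[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for y in range(n_rows):
        for x in range(n_cols):
            acc = 0j
            for fy in range(n_rows):
                for fx in range(n_cols):
                    theta = 2 * math.pi * (fx * x / n_cols + fy * y / n_rows)
                    acc += spectrum[fy][fx] * cmath.exp(1j * theta)
            out[y][x] = (acc / (n_rows * n_cols)).real
    return out

scene = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
recovered = inverse_dft2(acquire_spectrum(scene))
```

Reconstruction here is a single inverse transform with no iterative solver, which is the property the abstract contrasts with CS-based recovery. In a compressed setting, only a subset of the frequencies would be measured and the rest set to zero.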

preprint · 2015 · arXiv

Fano Resonance in the Nonadiabatically Pumped Shot Noise of a Time-Dependent Quantum Well in 2DEG and Graphene

Interference between different quantum paths can generate Fano resonance. One example is transport through a quasibound state driven by a time-dependent scattering potential. Previously, it was found that Fano resonance occurs as a result of energy matching in one-dimensional systems. In this work, we demonstrate that when transverse motion is present, Fano resonance occurs precisely at the wavevector matching situation. Using Floquet scattering theory, we consider the transport properties of a nonadiabatic time-dependent well in both 2DEG and monolayer graphene structures. The dispersion of the quasibound state of a static quantum well is obtained with transverse motion present. We find that Fano resonance occurs when the wavevector in the transport direction of one of the Floquet sidebands is exactly identical to that of the quasibound state in the well at equilibrium and follows the dispersion pattern of the latter. To observe the Fano resonance phenomenon in the transmission spectrum, we also consider the pumped shot noise properties when time and spatial symmetry secures vanishing current in the considered configuration. Prominent Fano resonance is found in the differential pumped shot noise with respect to the reservoir Fermi energy.

preprint · 2013 · arXiv

Generation of a fully valley-polarized current in bulk graphene

The generation of a fully valley-polarized current (FVPC) in bulk graphene is a fundamental goal in valleytronics. To this end, we investigate valley-dependent transport through a strained graphene modulated by a finite magnetic superlattice. It is found that this device allows a coexistence of insulating transmission gap of one valley and metallic resonant band of the other. Accordingly, a substantial bulk FVPC appears in a wide range of edge orientation and temperature, which can be effectively tuned by structural parameters. A valley-resolved Hall configuration is designed to measure the valley polarization degree of the filtered current.

preprint · 2013 · arXiv

Negative differential resistances with back gate-controlled lowest operation windows in graphene double barrier resonant tunneling diodes

We theoretically investigate negative differential resistance (NDR) of massless and massive Dirac fermions in double barrier resonant tunneling diodes based on sufficiently short and wide graphene strips. The current-voltage characteristics calculated in a rotated pseudospin space show that the NDR feature appears only with appropriate structural parameters for the massless case, and that the peak-to-valley current ratio can be enhanced exponentially by a tunable band gap. Remarkably, the lowest NDR operation window is nearly structure-free and can be almost solely controlled by a back gate, which may have potential applications in NDR devices with the operation window as a crucial parameter.

preprint · 2012 · arXiv

Giant Goos-Hänchen Shift in Graphene Double-barrier Structures

We report giant Goos-Hänchen shifts [Goos and Hänchen, Ann. Phys. 436, 333 (1947)] for electron beams tunneling through graphene double barrier structures. We find that inside the transmission gap for the single barrier, the shift displays sharp peaks with magnitudes up to the order of electron beam width and rather small full-widths-at-half-maximum, which may be utilized to design valley and spin beam splitters with wide tunability and high energy resolution. We attribute the giant shifts to quasibound states in the structures. Moreover, an induced energy gap in the dispersion can increase the tunability and resolution of the splitters.

preprint · 2012 · arXiv

Two-dimensional group delay in graphene probed by spin precession measurements

We take graphene as an example to demonstrate that the currently widely adopted expression is only the scattering component of a true 2D group delay in the condensed matter context, in which the spatial Goos-Hänchen (GH) shift along an interface contributes an intrinsic component. We relate the dwell time to spin precession and derive a relation between the 2D group delay and dwell time, whereby we reveal for the first time that the group delay for 2D ballistic electronic systems can be directly observed by measuring a conductance difference in a weak-field spin precession experiment. This physical observable not only implies that the group delay is a relevant quantity even in the condensed matter context, but also provides experimental evidence for the intrinsic effect of the GH shift. Finally, we revisit the 2D Hartman effect, a central issue of the group delay, by analytically solving it via the vested relation and calculating the proposed observable at the Dirac point.

preprint · 2009 · arXiv

Enhanced spin injection efficiency in a four-terminal double quantum dot system

Within the scheme of quantum rate equations, we investigate the spin-resolved transport through a double quantum dot system with four ferromagnetic terminals. It is found that the injection efficiency of spin-polarized electrons can be significantly improved compared with the single-dot case. When the magnetization in one of the four ferromagnetic terminals is antiparallel to the other three, the polarization rate of the current through one dot can be greatly enhanced, accompanied by a drastic decrease of the current polarization rate through the other one. The mechanism is the exchange interaction between electrons in the two quantum dots, which can be a promising candidate for the improvement of the spin injection efficiency.
