Source author record

Chao Ma

Chao Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Catalog footprint

What is connected

47 works

28 topics

4 close collaborators


preprint | 2026 | arXiv

Unified Map Prior Encoder for Mapping and Planning

Online mapping and end-to-end (E2E) planning in autonomous driving remain largely sensor-centric, leaving rich map priors, including HD/SD vector maps, rasterized SD maps, and satellite imagery, underused because of heterogeneity, pose drift, and inconsistent availability at test time. We present UMPE, a Unified Map Prior Encoder that can ingest any subset of four priors and fuse them with BEV features for both mapping and planning. UMPE has two branches. The vector encoder pre-aligns HD/SD polylines with a frame-wise SE(2) correction, encodes points via multi-frequency sinusoidal features, and produces polyline tokens with confidence scores. BEV queries then apply cross-attention with confidence bias, followed by normalized channel-wise gating to avoid length imbalance and softly down-weight uncertain sources. The raster encoder shares a ResNet-18 backbone conditioned by FiLM with scaling and shift at every stage, performs SE(2) micro-alignment, and injects priors through zero-initialized residual fusion, so the network starts from a do-no-harm baseline and learns to add only useful prior evidence. A vector-then-raster fusion order reflects the inductive bias of geometry first, appearance second. On nuScenes mapping, UMPE lifts MapTRv2 from 61.5 to 67.4 mAP (+5.9) and MapQR from 66.4 to 71.7 mAP (+5.3). On Argoverse2, UMPE adds +4.1 mAP over strong baselines. UMPE is compositional: when trained with all priors, it outperforms single-prior models even when only one prior is available at test time, demonstrating powerset robustness. For E2E planning with the VAD backbone on nuScenes, UMPE reduces trajectory error from 0.72 to 0.42 m L2 on average (-0.30 m) and collision rate from 0.22% to 0.12% (-0.10%), surpassing recent prior-injection methods. These results show that a unified, alignment-aware treatment of heterogeneous map priors yields better mapping and better planning.
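The zero-initialized residual fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the UMPE implementation: the function names `fuse` and `zero_weight`, the plain linear projection, and the toy feature vectors are all assumptions made for illustration. It only shows why a zero-initialized projection leaves the baseline features unchanged before any training.

```python
def fuse(bev, prior, weight):
    """Residual fusion: add a linearly projected prior to a BEV feature vector.

    bev, prior: lists of floats of equal length (hypothetical feature vectors).
    weight: square matrix (list of rows) that projects the prior.
    """
    projected = [sum(w * p for w, p in zip(row, prior)) for row in weight]
    return [b + r for b, r in zip(bev, projected)]


def zero_weight(dim):
    # Zero initialization: the prior contributes nothing before training.
    return [[0.0] * dim for _ in range(dim)]


bev = [0.5, -1.0, 2.0]      # hypothetical BEV feature vector
prior = [3.0, 1.0, -4.0]    # hypothetical encoded map-prior feature

# At initialization the fused output equals the sensor-only baseline.
fused_at_init = fuse(bev, prior, zero_weight(3))

# Once the projection has non-zero weights, the prior starts to contribute.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fused_with_identity = fuse(bev, prior, identity)
```

With the zero matrix, `fused_at_init` is identical to `bev`, which is the do-no-harm starting point the abstract describes.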

preprint | 2022 | arXiv

AiATrack: Attention in Attention for Transformer Visual Tracking

Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose a streamlined Transformer tracking framework, dubbed AiATrack, by introducing efficient feature reuse and target-background embeddings to make full use of temporal references. Experiments show that our tracker achieves state-of-the-art performance on six tracking benchmarks while running at a real-time speed.

preprint | 2022 | arXiv

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

A quadratic approximation of neural network loss landscapes has been extensively used to study the optimization process of these networks. Though it usually holds in a very small neighborhood of the minimum, it cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implication on optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possess a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss shows several separate scales clearly. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [5] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay by simple examples. Finally, we study the origin of the multiscale structure and propose that the non-convexity of the models and the non-uniformity of training data are among the causes. By constructing a two-layer neural network problem we show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth and multiple separate scales.
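For context, the quadratic picture that the abstract argues is insufficient can be sketched in a few lines. This is not the paper's analysis; it only illustrates the standard fact that gradient descent on f(x) = 0.5 * a * x^2 contracts when the learning rate is below 2/a and diverges above it. The function name `gd_on_quadratic` and the chosen constants are illustrative assumptions.

```python
def gd_on_quadratic(curvature, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns |x| after `steps` updates.
    """
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x   # the gradient of f is curvature * x
    return abs(x)


curvature = 4.0                   # sharpness; the stability threshold is 2 / 4.0 = 0.5
stable = gd_on_quadratic(curvature, lr=0.4)     # update factor is -0.6, so |x| shrinks
unstable = gd_on_quadratic(curvature, lr=0.6)   # update factor is -1.4, so |x| grows
```

Under a purely quadratic model, the iterates either shrink or blow up depending on which side of 2/a the learning rate falls. The abstract attributes the Edge of Stability behavior to subquadratic growth of the loss near minima, which this quadratic model does not capture.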

preprint | 2022 | arXiv

BSODA: A Bipartite Scalable Framework for Online Disease Diagnosis

A growing number of people are seeking healthcare advice online. Usually, they diagnose their medical conditions based on the symptoms they are experiencing, which is also known as self-diagnosis. From the machine learning perspective, online disease diagnosis is a sequential feature (symptom) selection and classification problem. Reinforcement learning (RL) methods are the standard approaches to this type of task. Generally, they perform well when the feature space is small, but frequently become inefficient in tasks with a large number of features, such as self-diagnosis. To address the challenge, we propose a non-RL Bipartite Scalable framework for Online Disease diAgnosis, called BSODA. BSODA is composed of two cooperative branches that handle symptom-inquiry and disease-diagnosis, respectively. The inquiry branch determines which symptom to collect next by an information-theoretic reward. We employ a Product-of-Experts encoder to significantly improve the handling of partial observations of a large number of features. Besides, we propose several approximation methods to substantially reduce the computational cost of the reward to a level that is acceptable for online services. Additionally, we leverage the diagnosis model to estimate the reward more precisely. For the diagnosis branch, we use a knowledge-guided self-attention model to perform predictions. In particular, BSODA determines when to stop inquiry and output predictions using both the inquiry and diagnosis models. We demonstrate that BSODA outperforms the state-of-the-art methods on several public datasets. Moreover, we propose a novel evaluation method to test the transferability of symptom checking methods from synthetic to real-world tasks. Compared to existing RL baselines, BSODA scales more effectively to large search spaces.

preprint | 2022 | arXiv

Continual Learning for Blind Image Quality Assessment

The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, meanwhile introducing novel distortions to the processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, which fail to continually adapt to such subpopulation shift. Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets. However, this type of approach is not scalable to a large number of datasets, and makes it cumbersome to incorporate a newly created dataset. In this paper, we formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets, building on what was learned from previously seen data. We first identify five desiderata in the new setting with a measure to quantify the plasticity-stability trade-off. We then propose a simple yet effective method for learning BIQA models continually. Specifically, based on a shared backbone network, we add a prediction head for a new dataset, and enforce a regularizer to allow all prediction heads to evolve with new data while being resistant to catastrophic forgetting of old data. We compute the quality score by an adaptive weighted summation of estimates from all prediction heads. Extensive experiments demonstrate the promise of the proposed continual learning method in comparison to standard training techniques for BIQA. We made the code publicly available at https://github.com/zwx8981/BIQA_CL.

preprint | 2022 | arXiv

Correcting Convexity Bias in Function and Functional Estimate

A general framework with a series of different methods is proposed to improve the estimate of convex function (or functional) values when only noisy observations of the true input are available. Technically, our methods catch the bias introduced by the convexity and remove this bias from a baseline estimate. Theoretical analyses are conducted to show that the proposed methods can strictly reduce the expected estimate error under mild conditions. When applied, the methods require no specific knowledge about the problem except the convexity and the evaluation of the function. Therefore, they can serve as off-the-shelf tools to obtain good estimates for a wide range of problems, including optimization problems with random objective functions or constraints, and functionals of probability distributions such as the entropy and the Wasserstein distance. Numerical experiments on a wide variety of problems show that our methods can significantly improve the quality of the estimate compared with the baseline method.
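The bias that this abstract refers to follows from Jensen's inequality: for a convex function, evaluating it at a noisy input overestimates the true value on average. The sketch below illustrates only that bias, not the paper's correction methods. The two-point noise model, the function `plug_in_expectation`, and the choice f(x) = x^2 are assumptions picked so that the expectation can be computed exactly.

```python
def plug_in_expectation(f, x, s):
    """Exact expectation of f(x + e) when e is +s or -s with equal probability."""
    return 0.5 * (f(x + s) + f(x - s))


def square(x):
    # A simple convex function used for illustration.
    return x * x


x_true, s = 2.0, 0.5

# Expected value of the naive plug-in estimate minus the true value f(x_true).
bias = plug_in_expectation(square, x_true, s) - square(x_true)
```

For f(x) = x^2 the bias equals the noise variance s^2, here 0.25, even though the noise itself has zero mean.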

preprint | 2022 | arXiv

Coverage Analysis for Cellular-Connected Random 3D Mobile UAVs with Directional Antennas

This letter proposes an analytical framework to evaluate the coverage performance of a cellular-connected unmanned aerial vehicle (UAV) network in which UAV user equipments (UAV-UEs) are equipped with directional antennas and move according to a three-dimensional (3D) mobility model. The ground base stations (GBSs) equipped with practical down-tilted antennas are distributed according to a Poisson point process (PPP). With tools from stochastic geometry, we derive the handover probability and coverage probability of a random UAV-UE under the strongest average received signal strength (RSS) association strategy. The proposed analytical framework allows us to investigate the effect of UAV-UE antenna beamwidth, mobility speed, cell association, and vertical motions on both the handover probability and coverage probability. We conclude that the optimal UAV-UE antenna beamwidth decreases with the GBS density, and the omnidirectional antenna model is preferred in the sparse network scenario. Moreover, the superiority of the strongest average RSS association over the nearest association diminishes as the GBS density increases.

preprint | 2022 | arXiv

Deep End-to-end Causal Inference

Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment and policy making. However, research on causal discovery has evolved separately from inference methods, preventing straightforward combination of methods from both fields. In this work, we develop Deep End-to-end Causal Inference (DECI), a single flow-based non-linear additive noise model that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can recover the ground truth causal graph under standard causal discovery assumptions. Motivated by application impact, we extend this model to heterogeneous, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Our results show the competitive performance of DECI when compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on both synthetic datasets and causal machine learning benchmarks across data-types and levels of missingness.

preprint | 2022 | arXiv

Depth-Adapted CNNs for RGB-D Semantic Segmentation

Recent RGB-D semantic segmentation has motivated research interest thanks to the accessibility of complementary modalities from the input side. Existing works often adopt a two-stream architecture that processes photometric and geometric information in parallel, with few methods explicitly leveraging the contribution of depth cues to adjust the sampling position on RGB images. In this paper, we propose a novel framework to incorporate the depth information in the RGB convolutional neural network (CNN), termed Z-ACN (Depth-Adapted CNN). Specifically, our Z-ACN generates a 2D depth-adapted offset which is fully constrained by low-level features to guide the feature extraction on RGB images. With the generated offset, we introduce two intuitive and effective operations to replace basic CNN operators: depth-adapted convolution and depth-adapted average pooling. Extensive experiments on both indoor and outdoor semantic segmentation tasks demonstrate the effectiveness of our approach.

preprint | 2022 | arXiv

Direct Measurement of Topological Number by Quench Dynamics

The measurement of topological number is crucial in the research of topological systems. Recently, relations between the topological number and the dynamics have been established, but a direct method to read out the topological number via the dynamics is still lacking. In this work, we propose a new dynamical protocol to directly measure the topological number of an unknown system. Different from common quench operations, we change the Hamiltonian of the unknown system to another one with known topological properties. After the quench, different initial states result in different particle number distributions on the post-quench final Bloch bands. Such distributions depend on the wavefunction overlap between the initial Bloch state and the final Bloch state, which is a complex number depending on the momentum. We prove a theorem that when the momentum varies by $2π$, the phase of the wavefunction overlap changes by $Δnπ$, where $Δn$ is the topological number difference between the initial Bloch band and the final Bloch band. Based on this and the known topological number of the final Bloch band, we can directly deduce the topological number of the initial state from the particle number distribution and need not track the evolution of the system nor measure the spin texture. Two experimental schemes are also proposed. These schemes provide a convenient and robust measurement method and also deepen the understanding of the relation between topology and dynamics.

preprint | 2022 | arXiv

Exploring Frequency Adversarial Attacks for Face Forgery Detection

Various facial manipulation techniques have drawn serious public concerns in morality, security, and privacy. Although existing face forgery classifiers achieve promising performance on detecting fake images, these methods are vulnerable to adversarial examples with injected imperceptible perturbations on the pixels. Meanwhile, many face forgery detectors utilize the frequency diversity between real and fake faces as a crucial clue. In this paper, instead of injecting adversarial perturbations into the spatial domain, we propose a frequency adversarial attack method against face forgery detectors. Concretely, we apply discrete cosine transform (DCT) on the input images and introduce a fusion module to capture the salient region of adversary in the frequency domain. Compared with existing adversarial attacks (e.g. FGSM, PGD) in the spatial domain, our method is more imperceptible to human observers and does not degrade the visual quality of the original images. Moreover, inspired by the idea of meta-learning, we also propose a hybrid adversarial attack that performs attacks in both the spatial and frequency domains. Extensive experiments indicate that the proposed method fools not only the spatial-based detectors but also the state-of-the-art frequency-based detectors effectively. In addition, the proposed frequency attack enhances the transferability across face forgery detectors as black-box attacks.

preprint | 2022 | arXiv

Facial Geometric Detail Recovery via Implicit Representation

Learning a dense 3D model with fine-scale details from a single facial image is highly challenging and ill-posed. To address this problem, many approaches fit smooth geometries through facial prior while learning details as additional displacement maps or personalized basis. However, these techniques typically require vast datasets of paired multi-view data or 3D scans, whereas such datasets are scarce and expensive. To alleviate heavy data dependency, we present a robust texture-guided geometric detail recovery approach using only a single in-the-wild facial image. More specifically, our method combines high-quality texture completion with the powerful expressiveness of implicit surfaces. Initially, we inpaint occluded facial parts, generate complete textures, and build an accurate multi-view dataset of the same subject. In order to estimate the detailed geometry, we define an implicit signed distance function and employ a physically-based implicit renderer to reconstruct fine geometric details from the generated multi-view images. Our method not only recovers accurate facial details but also decomposes normals, albedos, and shading parts in a self-supervised way. Finally, we register the implicit shape details to a 3D Morphable Model template, which can be used in traditional modeling and rendering pipelines. Extensive experiments demonstrate that the proposed approach can reconstruct impressive facial details from a single image, especially when compared with state-of-the-art methods trained on large datasets.

preprint | 2022 | arXiv

H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual camera system, in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR) for reconstructing the H2-Stereo video effectively. We utilize a disparity network to transfer spatiotemporal information across views even in large disparity scenes, based on which, we propose disparity-guided flow-based warping for LSR-HFR view and complementary warping for HSR-LFR view. A multi-scale fusion method in feature domain is proposed to minimize occlusion-induced warping ghosts and holes in HSR-LFR view. The LIFnet is trained in an end-to-end manner using our collected high-quality Stereo Video dataset from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and camera-captured real data with large disparity. Ablation studies explore various aspects, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures and applications, of our system to fully understand its capability for potential applications.

preprint | 2022 | arXiv

PillarNet: Real-Time and High-Performance Pillar-based 3D Object Detection

Real-time and high-performance 3D object detection is of critical importance for autonomous driving. Recent top-performing 3D object detectors mainly rely on point-based or 3D voxel-based convolutions, which are both computationally inefficient for onboard deployment. In contrast, pillar-based methods use solely 2D convolutions, which consume less computation resources, but they lag far behind their voxel-based counterparts in detection accuracy. In this paper, by examining the primary performance gap between pillar- and voxel-based detectors, we develop a real-time and high-performance pillar-based detector, dubbed PillarNet. The proposed PillarNet consists of a powerful encoder network for effective pillar feature learning, a neck network for spatial-semantic feature fusion and the commonly used detection head. Using only 2D convolutions, PillarNet is flexible to an optional pillar size and compatible with classical 2D CNN backbones, such as VGGNet and ResNet. Additionally, PillarNet benefits from our designed orientation-decoupled IoU regression loss along with the IoU-aware prediction branch. Extensive experimental results on the large-scale nuScenes Dataset and Waymo Open Dataset demonstrate that the proposed PillarNet performs well over state-of-the-art 3D detectors in terms of effectiveness and efficiency. Code is available at https://github.com/agent-sgs/PillarNet.

preprint | 2022 | arXiv

Provably convergent quasistatic dynamics for mean-field two-player zero-sum games

In this paper, we study the problem of finding mixed Nash equilibrium for mean-field two-player zero-sum games. Solving this problem requires optimizing over two probability distributions. We consider a quasistatic Wasserstein gradient flow dynamics in which one probability distribution follows the Wasserstein gradient flow, while the other one is always at the equilibrium. Theoretical analysis is conducted on this dynamics, showing its convergence to the mixed Nash equilibrium under mild conditions. Inspired by the continuous dynamics of probability distributions, we derive a quasistatic Langevin gradient descent method with inner-outer iterations, and test the method on different problems, including training mixtures of GANs.

preprint | 2022 | arXiv

Removing Rain Streaks via Task Transfer Learning

Due to the difficulty in collecting paired real-world training data, image deraining is currently dominated by supervised learning with synthesized data generated by, e.g., Photoshop rendering. However, the generalization to real rainy scenes is usually limited due to the gap between synthetic and real-world data. In this paper, we first statistically explore why the supervised deraining models cannot generalize well to real rainy cases, and find a substantial difference between synthetic and real rainy data. Inspired by our studies, we propose to remove rain by learning favorable deraining representations from other connected tasks. In connected tasks, the label for real data can be easily obtained. Hence, our core idea is to learn representations from real data through task transfer to improve deraining generalization. We thus term our learning strategy task transfer learning. If there is more than one connected task, we propose to reduce model size by knowledge distillation. The pretrained models for the connected tasks are treated as teachers, and all their knowledge is distilled to a student network, so that we reduce the model size while preserving effective prior representations from all the connected tasks. Finally, the student network is fine-tuned with a minority of paired synthetic rainy data to guide the pretrained prior representations to remove rain. Extensive experiments demonstrate that the proposed task transfer learning strategy is surprisingly successful, compares favorably with state-of-the-art supervised learning methods, and apparently surpasses other semi-supervised deraining methods on synthetic data. In particular, it shows superior generalization over them to real-world scenes.

preprint | 2022 | arXiv

Temperature-induced dephasing in high-order harmonic generation from solids

High harmonic generation (HHG) in solid and gaseous targets has been proven to be a powerful avenue for the generation of attosecond pulses, whereas the influence of electron-phonon scattering on HHG is a critical outstanding problem. Here we first introduce a temperature-dependent lattice vibration model by characterizing the spacing fluctuation. Our results reveal that (i) structural disorder induced by lattice vibration does not lead to generation of even-order harmonics; (ii) dephasing of HHG occurs as the lattice temperature grows; (iii) an open-trajectory picture predicts the maximal photon energy in the temperature-dependent HHG spectra. Moreover, a formula assessing dephasing time with lattice temperature is proposed to identify the timescale of electron-phonon scattering. This work paves a way to study non-Born-Oppenheimer effects in solids driven by strong fields.

preprint | 2022 | arXiv

Wireless Real-Time Capacitance Readout Based on Perturbed Nonlinear Parity-Time Symmetry

In this article, we report a vector-network-analyzer-free and real-time LC wireless capacitance readout system based on perturbed nonlinear parity-time (PT) symmetry. The system is composed of two inductively coupled reader-sensor parallel RLC resonators with gain and loss respectively. By searching for the real mode that requires the minimum saturation gain, the steady-state frequency evolution as a function of the sensor capacitance perturbation is analytically deduced. The proposed system can work in different modes by setting different perturbation points. In particular, at the exceptional point of PT symmetry, the system exhibits high sensitivity. Experimental demonstrations revealed the viability of the proposed readout mechanism by measuring the steady-state frequency of the reader resonator in response to the change of the trimmer capacitor on the sensor side. Our findings could impact many emerging applications such as implantable medical devices for health monitoring and parameter detection in harsh environments and sealed food packages.

preprint | 2021 | arXiv

Deep learning-based GTV contouring modeling inter- and intra- observer variability in sarcomas

Background and purpose: The delineation of the gross tumor volume (GTV) is a critical step for radiation therapy treatment planning. The delineation procedure is typically performed manually, which exposes two major issues: cost and reproducibility. Delineation is a time-consuming process that is subject to inter- and intra-observer variability. While methods have been proposed to predict GTV contours, typical approaches ignore variability and therefore fail to utilize the valuable confidence information offered by multiple contours. Materials and methods: In this work we propose an automatic GTV contouring method for soft-tissue sarcomas from X-ray computed tomography (CT) images, using deep learning by integrating inter- and intra-observer variability in the learned model. Sixty-eight patients with soft tissue and bone sarcomas were considered in this evaluation, all of whom underwent pre-operative CT imaging used to perform GTV delineation. Four radiation oncologists and radiologists performed three contouring trials each for all patients. We quantify variability by defining confidence levels based on the frequency of inclusion of a given voxel into the GTV and use a deep convolutional neural network to learn GTV confidence maps. Results: Results were compared to confidence maps from the four readers as well as ground-truth consensus contours established jointly by all readers. The resulting continuous Dice score between predicted and true confidence maps was 87% and the Hausdorff distance was 14 mm. Conclusion: Results demonstrate the ability of the proposed method to predict accurate contours while utilizing variability and as such it can be used to improve clinical workflow.
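The confidence levels described above, based on how often a voxel is included in the GTV across contours, can be sketched as a per-voxel inclusion frequency. This is an illustrative reading of the abstract, not the authors' code: the function name `confidence_map`, the flat-list mask representation, and the four toy masks are assumptions (the study itself used four readers with three trials each).

```python
def confidence_map(masks):
    """Per-voxel fraction of contours that include the voxel.

    masks: list of equally sized flat lists of 0/1 values, one list per contour.
    """
    n = len(masks)
    # zip(*masks) groups the values of all contours voxel by voxel.
    return [sum(voxel_votes) / n for voxel_votes in zip(*masks)]


# Four toy binary contours over four voxels (hypothetical data).
masks = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
conf = confidence_map(masks)
```

A voxel inside every contour gets confidence 1.0, and a voxel inside none gets 0.0, so the map records observer agreement as a continuous target rather than a single binary mask.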

preprint | 2021 | arXiv

The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models: the random feature model, the two-layer neural network model and the residual network model. We prove that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate, up to some logarithmic terms, as long as the models are sufficiently over-parametrized.

47 published items

Unified Map Prior Encoder for Mapping and Planning

AiATrack: Attention in Attention for Transformer Visual Tracking

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

BSODA: A Bipartite Scalable Framework for Online Disease Diagnosis

Continual Learning for Blind Image Quality Assessment

Correcting Convexity Bias in Function and Functional Estimate

Coverage Analysis for Cellular-Connected Random 3D Mobile UAVs with Directional Antennas

Deep End-to-end Causal Inference

Depth-Adapted CNNs for RGB-D Semantic Segmentation

Direct Measurement of Topological Number by Quench Dynamics

Exploring Frequency Adversarial Attacks for Face Forgery Detection

Facial Geometric Detail Recovery via Implicit Representation

H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

PillarNet: Real-Time and High-Performance Pillar-based 3D Object Detection

Provably convergent quasistatic dynamics for mean-field two-player zero-sum games

Removing Rain Streaks via Task Transfer Learning

Temperature-induced dephasing in high-order harmonic generation from solids

Wireless Real-Time Capacitance Readout Based on Perturbed Nonlinear Parity-Time Symmetry

Deep learning-based GTV contouring modeling inter- and intra- observer variability in sarcomas

The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

A Note on Parallel Distinguishability of two Quantum Operations

A Priori Estimates of the Population Risk for Two-layer Neural Networks

Accelerating MRI Reconstruction on TPUs

Complexity Measures for Neural Networks with General Activation Functions Using Path-based Norms

Cross-Modality 3D Object Detection

Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks

DGL-KE: Training Knowledge Graph Embeddings at Scale

Long-lived and disorder-free charge transfer states enable endothermic charge separation in efficient non-fullerene organic solar cells

MR-Based PET Attenuation Correction using a Combined Ultrashort Echo Time/Multi-Echo Dixon Acquisition

Rethinking Image Deraining via Rain Streaks and Vapors

Robust Tracking against Adversarial Attacks

See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

The Slow Deterioration of the Generalization Error of the Random Feature Model

Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment

Unsupervised Deep Representation Learning for Real-Time Tracking

VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Anomalous Interlayer Exciton Diffusion in Twist-Angle-Dependent Moiré Potentials of WS$_2$-WSe$_2$ Heterobilayers

Real-Time Correlation Tracking via Joint Model Compression and Transfer

Deep Extreme Feature Extraction: New MVA Method for Searching Particles in High Energy Physics

Learning a No-Reference Quality Metric for Single-Image Super-Resolution

Superconducting dome and microstructure properties of Rb0.8Fe1.6+xSe2 superconductors

Strong Coupling of the Iron-Quadrupole and Anion-Dipole Polarizations in Ba(Fe$_{1-x}$Co$_x$)$_2$As$_2$

The role of particle interactions in a many-body model of Feshbach molecular formation in bosonic systems