Researcher profile

Di Huang

Di Huang contributes to research discovery and scholarly infrastructure.

Published work

31 published items

preprint, 2026, arXiv

PerCaM-Health: Personalized Dynamic Causal Graphs for Healthcare Reasoning

Personalized healthcare decisions require reasoning about how physiological and behavioral variables influence an individual patient over time. Existing temporal causal discovery methods are poorly matched to this setting: cohort-level models provide stable but non-personalized structures, while per-patient discovery is unreliable because individual trajectories are short, noisy, irregular, and non-stationary. This creates a fundamental gap between population-level causal modeling and the patient-specific, time-varying mechanisms needed for intervention reasoning. We introduce PerCaM-Health, a framework for learning personalized dynamic causal graphs from longitudinal health data. The framework learns a knowledge-guided population temporal graph, then conservatively adapts and evolves it using patient-specific temporal evidence and rolling-window updates, producing interpretable and auditable graph sequences. By coupling these graphs with temporal structural equations, the framework enables patient-level counterfactual queries, such as estimating short-horizon outcome changes under hypothetical behavioral interventions. Experiments on a semi-synthetic dynamic health benchmark show that PerCaM-Health improves graph recovery, dynamic edge tracking, and intervention direction accuracy compared to cohort-level, per-patient, and non-personalized temporal baselines. These results demonstrate that jointly modeling personalization and temporal evolution yields more reliable causal structure and intervention reasoning.

preprint, 2023, arXiv

A First Search for Solar $^8$B Neutrino in the PandaX-4T Experiment using Neutrino-Nucleus Coherent Scattering

A search for interactions from solar $^8$B neutrinos elastically scattering off xenon nuclei using PandaX-4T commissioning data is reported. The energy threshold of this search is further lowered compared with the previous search for dark matter, with various techniques utilized to suppress the background that emerges from data with the lowered threshold. A blind analysis is performed on the data with an effective exposure of 0.48 tonne$\cdot$year, and no significant excess of events is observed. Among results obtained using the neutrino-nucleus coherent scattering, our results give the best constraint on the solar $^8$B neutrino flux. We further provide a more stringent limit on the cross section between dark matter and nucleon in the mass range from 3 to 9 GeV/c$^2$.

preprint, 2023, arXiv

OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

We propose a new method for object pose estimation without CAD models. The previous feature-matching-based method OnePose has shown promising results under a one-shot setting which eliminates the need for CAD models or object-specific training. However, OnePose relies on detecting repeatable image keypoints and is thus prone to failure on low-textured objects. We propose a keypoint-free pose estimation pipeline to remove the need for repeatable keypoint detection. Built upon the detector-free feature matching method LoFTR, we devise a new keypoint-free SfM method to reconstruct a semi-dense point-cloud model for the object. Given a query image for object pose estimation, a 2D-3D matching network directly establishes 2D-3D correspondences between the query image and the reconstructed point-cloud model without first detecting keypoints in the image. Experiments show that the proposed pipeline outperforms existing one-shot CAD-model-free methods by a large margin and is comparable to CAD-model-based methods on LINEMOD even for low-textured objects. We also collect a new dataset composed of 80 sequences of 40 low-textured objects to facilitate future research on one-shot object pose estimation. The supplementary material, code and dataset are available on the project page: https://zju3dv.github.io/onepose_plus_plus/.

preprint, 2022, arXiv

A Search for the Cosmic Ray Boosted Sub-GeV Dark Matter at the PandaX-II Experiment

We report a novel search for cosmic ray boosted dark matter using the 100 tonne$\cdot$day full data set of the PandaX-II detector located at the China Jinping Underground Laboratory. With the extra energy gained from the cosmic rays, sub-GeV dark matter particles can produce visible recoil signals in the detector. The diurnal modulations in rate and energy spectrum are utilized to further enhance the signal sensitivity. Our result excludes the dark matter-nucleon elastic scattering cross section between 10$^{-31}$ cm$^{2}$ and 10$^{-28}$ cm$^{2}$ for dark matter masses from 0.1 MeV/$c^2$ to 0.1 GeV/$c^2$, with a large parameter space previously unexplored by experimental collaborations.

preprint, 2022, arXiv

A search for two-component Majorana dark matter in a simplified model using the full exposure data of PandaX-II experiment

In the two-component Majorana dark matter model, one dark matter particle can scatter off the target nuclei and turn into a slightly heavier component. In the framework of a simplified model with a vector boson mediator, both the tree-level and loop-level processes contribute to the signal in direct detection experiments. In this paper, we report the search results for such dark matter from the PandaX-II experiment, using the data of the full 100.7 tonne$\cdot$day exposure. No significant excess is observed, so strong constraints on the combined parameter space of mediator mass and dark matter mass are derived. With the complementary search results from collider experiments, a large range of parameter space can be excluded.

preprint, 2022, arXiv

Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape

3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior. However, previously reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry, which is attributed to insufficient ground-truth 3D shapes, unreliable training strategies and limited representation power of 3DMM. To alleviate this issue, this paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person. Specifically, given a 2D image as the input, we virtually render the image in several calibrated views to normalize pose variations while preserving the original image geometry. A many-to-one hourglass network serves as the encoder-decoder to fuse multiview features and generate vertex displacements as the fine-grained geometry. Besides, the neural network is trained by directly optimizing the visual effect, where two 3D shapes are compared by measuring the similarity between the multiview images rendered from the shapes. Finally, we propose to generate the ground-truth 3D shapes by registering RGB-D images followed by pose and shape augmentation, providing sufficient data for network training. Experiments on several challenging protocols demonstrate the superior reconstruction accuracy of our proposal on the face shape.

preprint, 2022, arXiv

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities with complementary cues for 3D object detection. However, it is quite difficult to sufficiently use them, due to large inter-modal discrepancies. To address this issue, we propose a novel framework, namely Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch, an Imageformer (IT) branch along with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploring multi-modal information for detection. Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach via hierarchical contrastive learning at both the point and object levels, significantly improving the accuracy only by augmenting point-clouds, which is free from complex generation of paired samples of the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves a new state-of-the-art, highlighting its effectiveness.

preprint, 2022, arXiv

Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint

Active learning is a promising alternative to alleviate the issue of high annotation cost in the computer vision tasks by consciously selecting more informative samples to label. Active learning for object detection is more challenging and existing efforts on it are relatively rare. In this paper, we propose a novel hybrid approach to address this problem, where the instance-level uncertainty and diversity are jointly considered in a bottom-up manner. To balance the computational complexity, the proposed approach is designed as a two-stage procedure. At the first stage, an Entropy-based Non-Maximum Suppression (ENMS) is presented to estimate the uncertainty of every image, which performs NMS according to the entropy in the feature space to remove predictions with redundant information gains. At the second stage, a diverse prototype (DivProto) strategy is explored to ensure the diversity across images by progressively converting it into the intra-class and inter-class diversities of the entropy-based class-specific prototypes. Extensive experiments are conducted on MS COCO and Pascal VOC, and the proposed approach achieves state of the art results and significantly outperforms the other counterparts, highlighting its superiority.

preprint, 2022, arXiv

ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations

Precise representations of 3D faces are beneficial to various computer vision and graphics applications. Due to the data discretization and model linearity, however, it remains challenging to capture accurate identity and expression clues in current studies. This paper presents a novel 3D morphable face model, namely ImFace, to learn a nonlinear and continuous space with implicit neural representations. It builds two explicitly disentangled deformation fields to model complex shapes associated with identities and expressions, respectively, and designs an improved learning strategy to extend embeddings of expressions to allow more diverse changes. We further introduce a Neural Blend-Field to learn sophisticated details by adaptively blending a series of local fields. In addition to ImFace, an effective preprocessing pipeline is proposed to address the issue of watertight input requirement in implicit representations, enabling them to work with common facial surfaces for the first time. Extensive experiments are performed to demonstrate the superiority of ImFace.

preprint, 2022, arXiv

Low Radioactive Material Screening and Background Control for the PandaX-4T Experiment

PandaX-4T is a ton-scale dark matter direct detection experiment using a dual-phase TPC technique at the China Jinping Underground Laboratory. Various ultra-low background technologies have been developed and applied to material screening for PandaX-4T, including HPGe gamma spectroscopy, ICP-MS, NAA, radon emanation measurement system, krypton assay station, and alpha detection system. Low background materials were selected to assemble the detector. Surface treatment procedures were investigated to further suppress radioactive background. Combining measured results and Monte Carlo simulation, the total material background rates of PandaX-4T in the energy region of 1-25 keV$_{\rm ee}$ are estimated to be (9.9 $\pm$ 1.9) $\times 10^{-3}$ mDRU for electron recoil and (2.8 $\pm$ 0.6) $\times 10^{-4}$ mDRU for nuclear recoil. In addition, $^{\rm nat}$Kr in the detector is estimated to be <8 ppt.

preprint, 2022, arXiv

Motion Sensitive Contrastive Learning for Self-supervised Video Representation

Contrastive learning has shown great potential in video representation learning. However, existing approaches fail to sufficiently exploit short-term motion dynamics, which are crucial to various down-stream video understanding tasks. In this paper, we propose Motion Sensitive Contrastive Learning (MSCL) that injects the motion information captured by optical flows into RGB frames to strengthen feature learning. To achieve this, in addition to clip-level global contrastive learning, we develop Local Motion Contrastive Learning (LMCL) with frame-level contrastive objectives across the two modalities. Moreover, we introduce Flow Rotation Augmentation (FRA) to generate extra motion-shuffled negative samples and Motion Differential Sampling (MDS) to accurately screen training samples. Extensive experiments on standard benchmarks validate the effectiveness of the proposed method. With the commonly-used 3D ResNet-18 as the backbone, we achieve the top-1 accuracies of 91.5% on UCF101 and 50.3% on Something-Something v2 for video classification, and a 65.6% Top-1 Recall on UCF101 for video retrieval, notably improving the state-of-the-art.

preprint, 2022, arXiv

Neural Program Synthesis with Query

Aiming to find a program satisfying the user intent given input-output examples, program synthesis has attracted increasing interest in the area of machine learning. Despite the promising performance of existing methods, most of their success comes from the privileged information of well-designed input-output examples. However, providing such input-output examples is unrealistic because it requires the users to have the ability to describe the underlying program with a few input-output examples under the training distribution. In this work, we propose a query-based framework that trains a query neural network to generate informative input-output examples automatically and interactively from a large query space. The quality of the query depends on the amount of the mutual information between the query and the corresponding program, which can guide the optimization of the query framework. To estimate the mutual information more accurately, we introduce the functional space (F-space) which models the relevance between the input-output examples and the programs in a differentiable way. We evaluate the effectiveness and generalization of the proposed query-based framework on the Karel task and the list processing task. Experimental results show that the query-based framework can generate informative input-output examples which achieve and even outperform well-designed input-output examples.

preprint, 2022, arXiv

Neutron-induced nuclear recoil background in the PandaX-4T experiment

Neutron-induced nuclear recoil background is critical to the dark matter searches in the PandaX-4T liquid xenon experiment. This paper studies the features of the neutron background in liquid xenon and evaluates its contribution to the single scattering nuclear recoil events through three methods. The first method is fully Monte Carlo simulation based. The last two are data-driven methods that also use the multiple scattering signals and high energy signals in the data, respectively. In the PandaX-4T commissioning data with an exposure of 0.63 tonne-year, all these methods give a consistent result that there are $1.15\pm0.57$ neutron-induced background events in the dark matter signal region within an approximate nuclear recoil energy window between 5 and 100 keV.

preprint, 2022, arXiv

Readout electronics and data acquisition system of PandaX-4T experiment

PandaX-4T is a dark matter direct detection experiment located in the China Jinping Underground Laboratory. The central apparatus is a dual-phase xenon detector containing 4 tons of liquid xenon in the sensitive volume, with about 500 photomultipliers instrumented in the top and the bottom of the detector. In this paper we present a completely new system of readout electronics and data acquisition in the PandaX-4T experiment. Compared to the one used in the previous PandaX dark matter experiments, the new system features triggerless readout and higher bandwidth. With triggerless readout, dark matter searches are not affected by the efficiency loss of external triggers. The system records single photoelectron signals of the dominant PMTs with an average efficiency of 96%, and achieves a bandwidth of more than 450 MB/s. The system has been used to successfully acquire data during the commissioning runs of PandaX-4T.

preprint, 2022, arXiv

STS: Surround-view Temporal Stereo for Multi-view 3D Detection

Learning accurate depth is essential to multi-view 3D object detection. Recent approaches mainly learn depth from monocular images, which confront inherent difficulties due to the ill-posed nature of monocular depth learning. Instead of using a sole monocular depth method, in this work, we propose a novel Surround-view Temporal Stereo (STS) technique that leverages the geometry correspondence between frames across time to facilitate accurate depth learning. Specifically, we regard the fields of view from all cameras around the ego vehicle as a unified view, namely surround-view, and conduct temporal stereo matching on it. The resulting geometrical correspondence between different frames from STS is utilized and combined with the monocular depth to yield final depth prediction. Comprehensive experiments on nuScenes show that STS greatly boosts 3D detection ability, notably for medium and long distance objects. On BEVDepth with ResNet-50 backbone, STS improves mAP and NDS by 2.6% and 1.4%, respectively. Consistent improvements are observed when using a larger backbone and a larger image resolution, demonstrating its effectiveness.

preprint, 2022, arXiv

Study of background from accidental coincidence signals in the PandaX-II experiment

The PandaX-II experiment employed a 580 kg liquid xenon detector to search for the interactions between dark matter particles and the target xenon atoms. The accidental coincidences of isolated signals result in a dangerous background which mimics the signature of the dark matter. We performed a detailed study on the accidental coincidence background in PandaX-II, including the possible origin of the isolated signals, the background level and the corresponding background suppression method. With a boosted-decision-tree algorithm, the accidental coincidence background is reduced by 70% in the dark matter signal region, thus improving the sensitivity of the dark matter search at PandaX-II.

preprint, 2022, arXiv

Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection

Domain adaptive object detection (DAOD) is a promising way to alleviate the performance drop of detectors in new scenes. Despite great effort in single-source domain adaptation, the more general task with multiple source domains remains underexplored, due to knowledge degradation during their combination. To address this issue, we propose a novel approach, namely target-relevant knowledge preservation (TRKP), to unsupervised multi-source DAOD. Specifically, TRKP adopts the teacher-student framework, where the multi-head teacher network is built to extract knowledge from labeled source domains and guide the student network to learn detectors in the unlabeled target domain. The teacher network is further equipped with an adversarial multi-source disentanglement (AMSD) module to preserve source domain-specific knowledge and simultaneously perform cross-domain alignment. Besides, a holistic target-relevant mining (HTRM) scheme is developed to re-weight the source images according to the source-target relevance. By this means, the teacher network is enforced to capture target-relevant knowledge, which helps reduce domain shift when guiding object detection in the target domain. Extensive experiments are conducted on various widely used benchmarks with new state-of-the-art scores reported, highlighting its effectiveness.

preprint, 2022, arXiv

UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone Imagery

This paper proposes a novel approach to object detection on drone imagery, namely Multi-Proxy Detection Network with Unified Foreground Packing (UFPMP-Det). To deal with the numerous instances of very small scales, different from the common solution that divides the high-resolution input image into quite a number of chips with low foreground ratios to perform detection on them each, the Unified Foreground Packing (UFP) module is designed, where the sub-regions given by a coarse detector are initially merged through clustering to suppress background and the resulting ones are subsequently packed into a mosaic for a single inference, thus significantly reducing overall time cost. Furthermore, to address the more serious confusion between inter-class similarities and intra-class variations of instances, which deteriorates detection performance but is rarely discussed, the Multi-Proxy Detection Network (MP-Det) is presented to model object distributions in a fine-grained manner by employing multiple proxy learning, and the proxies are enforced to be diverse by minimizing a Bag-of-Instance-Words (BoIW) guided optimal transport loss. By such means, UFPMP-Det largely promotes both the detection accuracy and efficiency. Extensive experiments are carried out on the widely used VisDrone and UAVDT datasets, and UFPMP-Det reports new state-of-the-art scores at a much higher speed, highlighting its advantages.

preprint, 2022, arXiv

Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles

Video Anomaly Detection (VAD) is an important topic in computer vision. Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task, i.e., spatio-temporal jigsaw puzzles, which is cast as a multi-label fine-grained classification problem. Our method exhibits several advantages over existing works: 1) the spatio-temporal jigsaw puzzles are decoupled in terms of spatial and temporal dimensions, responsible for capturing highly discriminative appearance and motion features, respectively; 2) full permutations are used to provide abundant jigsaw puzzles covering various difficulty levels, allowing the network to distinguish subtle spatio-temporal differences between normal and abnormal events; and 3) the pretext task is tackled in an end-to-end manner without relying on any pre-trained models. Our method outperforms state-of-the-art counterparts on three public benchmarks. Especially on ShanghaiTech Campus, the result is superior to reconstruction and prediction-based methods by a large margin.

preprint, 2021, arXiv

Dark Matter Search Results from the PandaX-4T Commissioning Run

We report the first dark matter search results using the commissioning data from PandaX-4T. Using a time projection chamber with a 3.7-tonne liquid xenon target and an exposure of 0.63 tonne$\cdot$year, 1058 candidate events are identified within an approximate nuclear recoil energy window between 5 and 100 keV. No significant excess over background is observed. Our data set a stringent limit to the dark matter-nucleon spin-independent interactions, with a lowest excluded cross section (90% C.L.) of $3.8\times10^{-47}$ cm$^2$ at a dark matter mass of 30 GeV/$c^2$.

preprint, 2021, arXiv

Internal Calibration of the PandaX-II Detector with Radon Gaseous Sources

We have developed a low-energy electron recoil (ER) calibration method with $^{220}$Rn for the PandaX-II detector. $^{220}$Rn, emanated from natural thorium compounds, was fed into the detector through the xenon purification system. From 2017 to 2019, we performed three dedicated calibration campaigns with different radon sources. We studied the detector response to $α$, $β$, and $γ$ particles with focus on low energy ER events. During the runs in 2017 and 2018, the amount of radioactivity of $^{222}$Rn was on the order of 1% of that of $^{220}$Rn and thorium particulate contamination was negligible, especially in 2018. We also measured the background contribution from $^{214}$Pb for the first time in PandaX-II with the help of a $^{222}$Rn injection. The calibration strategy with $^{220}$Rn and $^{222}$Rn will be implemented in the upcoming PandaX-4T experiment and can be useful for other xenon-based detectors as well.

preprint, 2021, arXiv

Light yield and field dependence measurement in PandaX-II dual-phase xenon detector

The dual-phase xenon time projection chamber (TPC) is one of the most sensitive detector technologies for dark matter direct search, where the energy deposition of an incoming particle can be converted into photons and electrons through xenon excitation and ionization. The detector response to signal energy deposition varies significantly with the electric field in liquid xenon. We study the detector's light yield and its dependence on the electric field in the PandaX-II dual-phase detector containing 580 kg liquid xenon in the sensitive volume. From our measurements, the light yield at electric fields from 0 V/cm to 317 V/cm is obtained for energy depositions up to 236 keV.

preprint, 2021, arXiv

Results of Dark Matter Search using the Full PandaX-II Exposure

We report the dark matter search results obtained using the full 132 ton$\cdot$day exposure of the PandaX-II experiment, including all data from March 2016 to August 2018. No significant excess of events is identified above the expected background. Upper limits are set on the spin-independent dark matter-nucleon interactions. The lowest 90% confidence level exclusion on the spin-independent cross section is $2.2\times 10^{-46}$ cm$^2$ at a WIMP mass of 30 GeV/$c^2$.

preprint, 2020, arXiv

Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers

The recently emerged quantization technique has been applied to inference of deep neural networks for fast and efficient execution. However, directly applying quantization in training can cause significant accuracy loss, and thus remains an open challenge.

preprint, 2020, arXiv

ArduCode: Predictive Framework for Automation Engineering

Automation engineering is the task of integrating, via software, various sensors, actuators, and controls for automating a real-world process. Today, automation engineering is supported by a suite of software tools including integrated development environments (IDE), hardware configurators, compilers, and runtimes. These tools focus on the automation code itself, but leave the automation engineer unassisted in their decision making. This can lead to increased time for software development because of imperfections in decision making leading to multiple iterations between software and hardware. To address this, this paper defines multiple challenges often faced in automation engineering and proposes solutions using machine learning to help engineers tackle such challenges. We show that machine learning can be leveraged to assist the automation engineer in classifying automation, finding similar code snippets, and reasoning about the hardware selection of sensors and actuators. We validate our architecture on two real datasets consisting of 2,927 Arduino projects, and 683 Programmable Logic Controller (PLC) projects. Our results show that paragraph embedding techniques can be utilized to classify automation using code snippets with precision close to human annotation, giving an F1-score of 72%. Further, we show that such embedding techniques can help us find similar code snippets with high accuracy. Finally, we use autoencoder models for hardware recommendation and achieve a p@3 of 0.79 and p@5 of 0.95.

preprint, 2020, arXiv

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels

Performing controlled experiments on noisy data is essential in understanding deep learning across noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting. This paper makes three contributions. First, we establish the first benchmark of controlled real-world label noise from the web. This new benchmark enables us to study the web label noise in a controlled setting for the first time. The second contribution is a simple but effective method to overcome both synthetic and real noisy labels. We show that our method achieves the best result on our dataset as well as on two public benchmarks (CIFAR and WebVision). Third, we conduct the largest study by far into understanding deep neural networks trained on noisy labels across different noise levels, noise types, network architectures, and training settings. The data and code are released at the following link: http://www.lujiang.info/cnlw.html

preprint, 2020, arXiv

Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation

Recent years have witnessed great progress in deep learning based object detection. However, due to the domain shift problem, applying off-the-shelf detectors to an unseen domain leads to significant performance drop. To address such an issue, this paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection. At the coarse-grained stage, different from the rough image-level or instance-level feature alignment used in the literature, foreground regions are extracted by adopting the attention mechanism, and aligned according to their marginal distributions via multi-layer adversarial learning in the common feature space. At the fine-grained stage, we conduct conditional distribution alignment of foregrounds by minimizing the distance of global prototypes with the same category but from different domains. Thanks to this coarse-to-fine feature adaptation, domain knowledge in foreground regions can be effectively transferred. Extensive experiments are carried out in various cross-domain detection scenarios. The results are state-of-the-art, which demonstrate the broad applicability and effectiveness of the proposed approach.

preprint, 2020, arXiv

DWM: A Decomposable Winograd Method for Convolution Acceleration

Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective on convolutions with a kernel size of 3x3 and a stride of 1, because it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3 and fails on convolutions with stride larger than 1. In this paper, we propose a novel Decomposable Winograd Method (DWM), which extends the original Winograd's minimal filtering algorithm beyond this limitation to a wide range of general convolutions. DWM decomposes kernels with large size or large stride into several small kernels with a stride of 1, to which the Winograd method is then applied, so that DWM can reduce the number of multiplications while keeping the numerical accuracy. It enables fast exploration of larger kernel sizes and larger stride values in CNNs for high performance and accuracy, and even the potential for new CNNs. Compared with the original Winograd, the proposed DWM is able to support all kinds of convolutions with a speedup of ~2, without affecting the numerical accuracy.
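As background for the abstract above, here is a minimal sketch of the classic one-dimensional Winograd F(2,3) step that DWM builds on. It computes two outputs of a 3-tap, stride-1 filter with four multiplications instead of six. It illustrates only the original minimal filtering algorithm, not the decomposition proposed in the paper, and the function names `winograd_f23` and `direct_f23` are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap, stride-1 filter using 4 multiplications.

    d: four consecutive input samples, g: three filter taps.
    Illustrative sketch of classic Winograd F(2,3); not the DWM method itself.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on g, so it can be precomputed once.
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # The four multiplications between transformed input and filter.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference computation of the same two outputs with 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    signal, taps = [1, 2, 3, 4], [2, 4, 6]
    print(winograd_f23(signal, taps))
    print(direct_f23(signal, taps))
```

Both functions return the same two values for the same inputs; the saving comes from trading multiplications for additions, which is the property the abstract says degrades for larger kernels and fails for strides above 1.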

preprint, 2020, arXiv

Improving Object Detection with Selective Self-supervised Self-training

We study how to leverage Web images to augment human-curated object detection datasets. Our approach is two-pronged. On the one hand, we retrieve Web images by image-to-image search, which incurs less domain shift from the curated data than other search methods. The Web images are diverse, supplying a wide variety of object poses, appearances, their interactions with the context, etc. On the other hand, we propose a novel learning method motivated by two parallel lines of work that explore unlabeled data for image classification: self-training and self-supervised learning. They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets. To tackle this challenge, we propose a selective net to rectify the supervision signals in Web images. It not only identifies positive bounding boxes but also creates a safe zone for mining hard negative boxes. We report state-of-the-art results on detecting backpacks and chairs from everyday scenes, along with other challenging object classes.

preprint, 2020, arXiv

Multi-Scale Positive Sample Refinement for Few-Shot Object Detection

Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances, and is useful when manual annotation is time-consuming or data acquisition is limited. Unlike previous attempts that exploit few-shot classification techniques to facilitate FSOD, this work highlights the necessity of handling the problem of scale variations, which is challenging due to the unique sample distribution. To this end, we propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD. It generates multi-scale positive samples as object pyramids and refines the prediction at various scales. We demonstrate its advantage by integrating it as an auxiliary branch to the popular architecture of Faster R-CNN with FPN, delivering a strong FSOD solution. Several experiments are conducted on PASCAL VOC and MS COCO, and the proposed approach achieves state of the art results and significantly outperforms other counterparts, which shows its effectiveness. Code is available at https://github.com/jiaxi-wu/MPSR.

preprint, 2019, arXiv

Searching for Neutrino-less Double Beta Decay of $^{136}$Xe with PandaX-II Liquid Xenon Detector

We report the Neutrino-less Double Beta Decay (NLDBD) search results from the PandaX-II dual-phase liquid xenon time projection chamber. The total live time used in this analysis is 403.1 days from June 2016 to August 2018. With NLDBD-optimized event selection criteria, we obtain a fiducial mass of 219 kg of natural xenon. The accumulated xenon exposure is 242 kg$\cdot$yr, or equivalently 22.2 kg$\cdot$yr of $^{136}$Xe exposure. At the region around the $^{136}$Xe decay Q-value of 2458 keV, the energy resolution of PandaX-II is 4.2%. We find no evidence of NLDBD in PandaX-II and establish a lower limit for the decay half-life of $2.4 \times 10^{23}$ yr at the 90% confidence level, which corresponds to an effective Majorana neutrino mass $m_{ββ} < (1.3 - 3.5)$ eV. This is the first NLDBD result reported from a dual-phase xenon experiment.