Source author record

Xinjiang Wang

Xinjiang Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Computer Vision, cond-mat.mtrl-sci, Machine Learning

Catalog footprint: 10 works, 3 topics, 4 close collaborators


Preprint, 2024, arXiv

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details because its training code is unavailable. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. Extensive experiments on the benchmarks mentioned above demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community; code and trained models are available at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.

Preprint, 2022, arXiv

Group R-CNN for Weakly Semi-supervised Object Detection with Points

We study the problem of weakly semi-supervised object detection with points (WSSOD-P), where the training data consists of a small set of fully annotated images with bounding boxes and a large set of weakly labeled images with only a single point annotated for each instance. The core of this task is to train a point-to-box regressor on well-labeled images that can predict credible bounding boxes for each point annotation. We challenge the prior belief that existing CNN-based detectors are not compatible with this task. Based on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN. Group R-CNN first uses instance-level proposal grouping to generate a group of proposals for each point annotation and can thus obtain a high recall rate. To better distinguish different instances and improve precision, we propose instance-level proposal assignment to replace the vanilla assignment strategy adopted in the original R-CNN methods. As naive instance-level assignment makes training difficult to converge, we propose instance-aware representation learning, which consists of instance-aware feature enhancement and instance-aware parameter generation, to overcome this issue. Comprehensive experiments on the MS-COCO benchmark demonstrate the effectiveness of our method. Specifically, Group R-CNN significantly outperforms the prior method Point DETR by 3.9 mAP with 5% well-labeled images, which is the most challenging scenario. The source code can be found at https://github.com/jshilong/GroupRCNN

Preprint, 2022, arXiv

Inorganic Crystal Structure Prototype Database based on Unsupervised Learning of Local Atomic Environments

Recognizing structure prototypes among the vast number of known inorganic crystal structures has been an important subject for materials science research and new materials design. Existing databases of inorganic crystal structure prototypes were mostly constructed by classifying materials according to their crystallographic space group. Herein, we employed a distinct strategy to construct an inorganic crystal structure prototype database, classifying materials by their local atomic environments (LAE) with an unsupervised machine learning method. Specifically, we applied hierarchical clustering to all experimentally known inorganic crystal structures to identify structure prototypes. The clustering criterion is the LAE, represented by state-of-the-art structure fingerprints: the improved bond-orientational order parameters and the smooth overlap of atomic positions. This allows us to build an LAE-based Inorganic Crystal Structure Prototype Database (LAE-ICSPD) containing 15,613 structure prototypes with defined stoichiometries. In addition, we have developed the Structure Prototype Generator Infrastructure (SPGI) package, a toolkit for structure prototype generation. The SPGI toolkit and LAE-ICSPD support investigating inorganic materials globally and accelerating data-driven materials discovery.
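The abstract does not state the linkage rule, distance metric, or cut-off used for the hierarchical clustering. As a minimal sketch, assume each structure is summarized by a fixed-length fingerprint vector (for example, concatenated bond-orientational order parameters or SOAP descriptors) and assume single linkage with Euclidean distance. Under those assumptions, cutting the dendrogram at a distance threshold groups together structures connected by chains of close pairs. The function names and the `cutoff` parameter are hypothetical and are not part of SPGI.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two fingerprint vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage_clusters(fingerprints: List[Sequence[float]],
                            cutoff: float) -> List[int]:
    """Cut a single-linkage dendrogram at `cutoff` (hypothetical helper).

    Two structures share a cluster iff a chain of pairwise distances
    <= cutoff connects them, which is exactly the single-linkage cut.
    A union-find over all pairs computes it with O(n^2) distance calls.
    """
    parent = list(range(len(fingerprints)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            if euclidean(fingerprints[i], fingerprints[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    # Relabel roots as consecutive cluster ids in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels))
            for i in range(len(fingerprints))]
```

For example, `single_linkage_clusters([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.05, 5.0), (0.2, 0.0)], cutoff=0.15)` puts the three points near the origin in one cluster and the two near (5, 5) in another, returning `[0, 0, 1, 1, 0]`. Single linkage is used here only because its threshold cut reduces to connected components. Other linkages, which the original work may have used, need a full agglomerative procedure.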

Preprint, 2022, arXiv

What Are Expected Queries in End-to-End Object Detection?

End-to-end object detection has progressed rapidly since the emergence of DETR. DETRs use a set of sparse queries in place of the dense candidate boxes used by most traditional detectors. In comparison, sparse queries cannot guarantee recall as high as dense priors provide. However, making queries dense is not trivial in current frameworks: it incurs a heavy computational cost and makes optimization difficult. As both sparse and dense queries are imperfect, what are the expected queries in end-to-end object detection? This paper shows that the expected queries should be Dense Distinct Queries (DDQ). Concretely, we introduce dense priors back into the framework to generate dense queries. A duplicate query removal pre-process is applied to these queries so that they are distinguishable from each other. The dense distinct queries are then iteratively processed to obtain the final sparse outputs. We show that DDQ is stronger, more robust, and converges faster. It obtains 44.5 AP on the MS COCO detection dataset with only 12 epochs. DDQ is also robust, as it outperforms previous methods on both object detection and instance segmentation tasks across various datasets. DDQ blends the advantages of traditional dense priors and recent end-to-end detectors. We hope it can serve as a new baseline and inspire researchers to revisit the complementarity between traditional methods and end-to-end detectors. The source code is publicly available at https://github.com/jshilong/DDQ.
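The abstract does not say how duplicate queries are removed. One common way to drop near-duplicate boxes is greedy, class-agnostic non-maximum suppression (NMS), and the sketch below assumes that form. The function names and the threshold `iou_thr=0.8` are hypothetical choices for illustration, not values taken from DDQ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def remove_duplicate_queries(boxes: List[Box], scores: List[float],
                             iou_thr: float = 0.8) -> List[int]:
    """Greedy class-agnostic NMS over dense queries (hypothetical helper).

    Visits queries from highest to lowest score and keeps a query only if
    its box overlaps every already-kept box by at most `iou_thr`.
    Returns the indices of the kept (distinct) queries, best first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep
```

With boxes `(0, 0, 10, 10)`, `(1, 1, 10, 10)` and `(20, 20, 30, 30)` scored 0.9, 0.8 and 0.7, the first two overlap with IoU 0.81, so the call returns `[0, 2]`. Raising `iou_thr` to 0.9 keeps all three. A high threshold removes only near-identical queries, which matches the stated goal of making the dense queries distinguishable rather than sparse.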

Preprint, 2021, arXiv

Understanding the wiring evolution in differentiable neural architecture search

Controversy exists over whether differentiable neural architecture search (NAS) methods discover wiring topology effectively. To understand how wiring topology evolves, we study the underlying mechanism of several existing differentiable NAS frameworks. Our investigation is motivated by three searching patterns observed in differentiable NAS: 1) they search by growing instead of pruning; 2) wider networks are preferred over deeper ones; 3) no edges are selected in bi-level optimization. To dissect these phenomena, we propose a unified view of the search algorithms of existing frameworks, recasting the global optimization as local cost minimization. Based on this reformulation, we conduct empirical and theoretical analyses, revealing implicit inductive biases in the cost assignment mechanism and evolution dynamics that cause the observed phenomena. These biases indicate strong discrimination towards certain topologies. To this end, we pose questions that future differentiable methods for neural wiring discovery need to confront, hoping to evoke discussion and a rethinking of how much bias has been enforced implicitly in existing NAS methods.

Preprint, 2020, arXiv

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

Although adaptive optimization algorithms such as Adam show fast convergence in many machine learning tasks, this paper identifies a problem with Adam by analyzing its performance on a simple non-convex synthetic problem, showing that Adam's fast convergence may lead the algorithm to local minima. To address this problem, we improve Adam by proposing a novel adaptive gradient descent algorithm named AdaX. Unlike Adam, whose exponential moving average quickly forgets past gradients, AdaX exponentially accumulates long-term gradient information during training to adaptively tune the learning rate. We prove the convergence of AdaX in both the convex and non-convex settings. Extensive experiments show that AdaX outperforms Adam on various computer vision and natural language processing tasks and can catch up with Stochastic Gradient Descent (SGD).
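The abstract describes AdaX only qualitatively. The sketch below assumes the update rule as we understand it from the AdaX paper. The first moment is Adam's moving average, m_t = β1·m_{t-1} + (1 − β1)·g_t. The second moment grows instead of decaying, v_t = (1 + β2)·v_{t-1} + β2·g_t², and is normalized by (1 + β2)^t − 1. Treat this form, the defaults (β1 = 0.9, β2 = 1e-4, eps) and the class name as assumptions. The code works on plain Python lists, not a framework optimizer.

```python
import math
from typing import List


class AdaX:
    """Minimal AdaX sketch on lists of floats (assumed update rule)."""

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 1e-4, eps: float = 1e-12) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        self.m: List[float] = []  # first moment (momentum)
        self.v: List[float] = []  # accumulated second moment

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return updated parameters given the current gradients."""
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        # v_t = sum_k beta2 * (1 + beta2)^(t - k) * g_k^2; those weights
        # sum to (1 + beta2)^t - 1, so dividing by it gives a weighted mean.
        norm = (1.0 + self.beta2) ** self.t - 1.0
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            # Unlike Adam's beta2 decay, old squared gradients are
            # amplified by (1 + beta2) each step rather than forgotten.
            self.v[i] = (1.0 + self.beta2) * self.v[i] + self.beta2 * g * g
            v_hat = self.v[i] / norm
            new_params.append(p - self.lr * self.m[i] / (math.sqrt(v_hat) + self.eps))
        return new_params
```

Take f(x) = x² with x = 1 and `lr=0.1`. The first call `AdaX(lr=0.1).step([1.0], [2.0])` moves x to about 0.99: v̂_1 equals g_1², so the step is lr · m_1 / |g_1| = 0.1 · 0.2 / 2. Because every past squared gradient keeps a share of v̂_t, a burst of large early gradients keeps the effective learning rate small for a long time. This is the long-term memory the abstract contrasts with Adam.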

Preprint, 2020, arXiv

How Does BN Increase Collapsed Neural Network Filters?

Improving the sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we reveal a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularization such as $L_1$. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and consequently reduces the network capacity. It becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is fine-tuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise, and that the sparsifying probability is proportional to the square of the learning rate and inversely proportional to the square of the BN scale parameter. To prevent these undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representational ability as BN while automatically making BN parameters trainable again when they saturate during training. With psBN, we can recover collapsed filters and improve model performance on various tasks, such as classification on CIFAR-10 and object detection on MS-COCO2017.
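The non-trainable region can be illustrated with a single channel. For a batch of N values standardized with the population variance, every normalized value satisfies |x̂| ≤ √(N − 1). So whenever β < −|γ|·√(N − 1), the ReLU input is negative for every sample, the channel outputs only zeros, and the gradients with respect to γ and β vanish. The hypothetical helper below illustrates that dead region only. It is not the paper's analysis and does not implement psBN.

```python
import math
from typing import List, Tuple


def bn_relu(x: List[float], gamma: float, beta: float,
            eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Batch-normalize one channel, apply ReLU, and return the outputs
    together with d(sum of outputs)/d(gamma) and d(sum of outputs)/d(beta)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)  # population variance
    x_hat = [(xi - mean) / math.sqrt(var + eps) for xi in x]
    pre = [gamma * xh + beta for xh in x_hat]
    out = [max(0.0, p) for p in pre]
    # ReLU passes gradient only where its input is positive; gamma and beta
    # act after normalization, so their gradients are x_hat and 1 there.
    active = [p > 0 for p in pre]
    d_gamma = sum(xh for xh, a in zip(x_hat, active) if a)
    d_beta = float(sum(active))
    return out, d_gamma, d_beta
```

`bn_relu([1.0, 2.0, 3.0, 4.0], gamma=0.1, beta=-1.0)` returns all-zero outputs and zero gradients, since here √(N − 1) ≈ 1.73 and β = −1 lies below −0.173. With `gamma=1.0, beta=0.0`, the two above-mean samples stay active. Once a channel enters the dead region, plain gradient descent receives no signal to move γ or β back out, which is the sense in which the region is non-trainable.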

Preprint, 2020, arXiv

Scale-Equalizing Pyramid Convolution for Object Detection

The feature pyramid has been an efficient way to extract features at different scales. Improvements to this method mainly focus on aggregating contextual information across levels, while seldom addressing the inter-level correlation within the feature pyramid. Early computer vision methods extracted scale-invariant features by locating feature extrema in both the spatial and scale dimensions. Inspired by this, this study proposes a convolution across pyramid levels, termed pyramid convolution, which is a modified 3-D convolution. Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules. From the viewpoint of 3-D convolution, an integrated batch normalization that collects statistics from the whole feature pyramid is naturally inserted after the pyramid convolution. Furthermore, we show that the naive pyramid convolution, together with the design of the RetinaNet head, is actually best suited to extracting features from a Gaussian pyramid, whose properties can hardly be satisfied by a feature pyramid. To alleviate this discrepancy, we build a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps. Being computationally efficient and compatible with the head design of most single-stage object detectors, the SEPC module brings significant performance improvements ($>4$ AP on the MS-COCO2017 dataset) in state-of-the-art one-stage object detectors, and a light version of SEPC still gains $\sim$3.5 AP with only around a 7% increase in inference time. The pyramid convolution also works well as a stand-alone module in two-stage object detectors, improving performance by $\sim$2 AP. The source code can be found at https://github.com/jshilong/SEPC.

Preprint, 2015, arXiv

First-principles study of anisotropic thermoelectric transport properties of IV-VI semiconductor compounds SnSe and SnS

We conduct comprehensive investigations of both the thermal and electrical transport properties of SnSe and SnS using first-principles calculations combined with Boltzmann transport theory. Owing to their distinct layered lattice structure, SnSe and SnS exhibit similarly anisotropic thermal and electrical behaviors. The cross-plane lattice thermal conductivity $κ_{L}$ is 40-60% lower than the in-plane values. Extremely low $κ_{L}$ is found for both materials because of their high anharmonicity. Our results suggest that nanostructuring would struggle to decrease $κ_{L}$ further because of the short mean free paths of the dominant phonon modes (1-30 nm at 300 K), whereas alloying would be efficient in reducing $κ_{L}$, considering that the relative $κ_{L}$ contribution of optical phonons ($\sim$65%) is remarkably large. On the electrical side, the anisotropic electrical conductivities are mainly due to the different effective masses of holes and electrons along the $a$, $b$ and $c$ axes. This leads to the highest optimal $ZT$ values along the $b$ axis and the lowest along the $a$ axis in both $p$-type materials. However, the $n$-type materials exhibit their highest $ZT$s along the $a$ axis, owing to an enhanced power factor as the chemical potential approaches the secondary band valley, which significantly increases the electron mobility and density of states. SnSe exhibits larger optimal $ZT$s than SnS for both $p$-type and $n$-type doping. For both materials, the peak $ZT$s of the $n$-type materials are much higher than those of the $p$-type ones along the same direction. The predicted highest $ZT$ values at 750 K are 1.0 in SnSe and 0.6 in SnS along the $b$ axis for $p$-type doping, while those for $n$-type doping reach 2.7 in SnSe and 1.5 in SnS along the $a$ axis, placing them among the best bulk thermoelectric materials for large-scale applications.
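For reference, the figure of merit quoted throughout this abstract is conventionally defined as follows. This is the standard definition, not one stated in the abstract. S is the Seebeck coefficient, σ the electrical conductivity, T the absolute temperature, and κ_e and κ_L the electronic and lattice thermal conductivities; S²σ is the power factor.

```latex
ZT = \frac{S^{2}\sigma T}{\kappa_{e} + \kappa_{L}},
\qquad
\mathrm{PF} = S^{2}\sigma
```

Both routes discussed above act on this ratio. Alloying lowers κ_L in the denominator, while moving the chemical potential toward the secondary band valley raises the power factor in the numerator.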

Preprint, 2015, arXiv

Thermal conductivity of graphene mediated by strain and size

Based on first-principles calculations and the full iterative solution of the linearized Boltzmann-Peierls transport equation for phonons within the three-phonon scattering framework, we characterize the lattice thermal conductivities $κ$ of strained and unstrained graphene. We find that $κ$ converges to 5450 W/m-K for infinite unstrained graphene, whereas for strained graphene $κ$ diverges with increasing system size at room temperature. The different $κ$ behaviors of these systems are further validated mathematically through phonon lifetime analysis. Flexural acoustic phonons are the dominant heat carriers in both unstrained and strained graphene over the temperature range considered. Ultralong mean free paths of flexural phonons cause finite-size effects on $κ$ for samples as large as 8 cm at room temperature. The calculated size-dependent and temperature-dependent $κ$ for finite samples agree well with experimental data, demonstrating the ability of the present approach to predict $κ$ of larger graphene samples. Tensile strain hardens the flexural modes and increases their lifetimes, causing an interesting dependence of $κ$ on sample size and strain due to the competition between boundary scattering and intrinsic phonon-phonon scattering. These findings shed light on the nature of thermal transport in two-dimensional materials and may guide predicting and engineering $κ$ of graphene by varying strain and size.
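The competition between boundary and intrinsic scattering is easiest to see in the simpler relaxation-time approximation (RTA), shown here only as an illustration. The abstract's results come from the full iterative solution, which does not reduce to this form. C_λ is the heat capacity of phonon mode λ, v_λ its group velocity, V the crystal volume (for a 2-D sheet, area times a nominal thickness), L the sample size, and Λ_λ the mean free path. The boundary term |v_λ|/L is one common simple choice; other works use different prefactors.

```latex
\kappa_{\alpha\alpha} = \frac{1}{V}\sum_{\lambda} C_{\lambda}\, v_{\lambda,\alpha}^{2}\, \tau_{\lambda},
\qquad
\frac{1}{\tau_{\lambda}} = \frac{1}{\tau_{\lambda}^{\text{ph-ph}}} + \frac{|\mathbf{v}_{\lambda}|}{L},
\qquad
\Lambda_{\lambda} = |\mathbf{v}_{\lambda}|\,\tau_{\lambda}
```

As L grows, the boundary term vanishes and κ is set by intrinsic scattering alone. Modes whose intrinsic mean free path exceeds L are cut off by the boundary, which is why ultralong flexural-phonon mean free paths keep κ size-dependent even for centimeter-scale samples.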
