Researcher profile

Xiaokang Chen

Xiaokang Chen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
5 topics
4 close collaborators

Published work

8 published items

preprint · 2026 · arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

preprint · 2022 · arXiv

Conditional DETR V2: Efficient Detection Transformer with Box Queries

In this paper, we are interested in Detection Transformer (DETR), an end-to-end object detection approach based on a transformer encoder-decoder architecture without hand-crafted postprocessing, such as NMS. Inspired by Conditional DETR, an improved DETR with fast training convergence that presented box queries (originally called spatial queries) for internal decoder layers, we reformulate the object query into the format of the box query, which is a composition of the embeddings of the reference point and the transformation of the box with respect to the reference point. This reformulation indicates the connection between the object query in DETR and the anchor box that is widely studied in Faster R-CNN. Furthermore, we learn the box queries from the image content, further improving the detection quality of Conditional DETR while retaining fast training convergence. In addition, we adopt the idea of axial self-attention to save the memory cost and accelerate the encoder. The resulting detector, called Conditional DETR V2, achieves better results than Conditional DETR, saves the memory cost and runs more efficiently. For example, for the DC5-ResNet-50 backbone, our approach achieves 44.8 AP with 16.4 FPS on the COCO val set and, compared to Conditional DETR, it runs 1.6× faster, saves 74% of the overall memory cost, and improves the AP score by 1.0.
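
The axial self-attention idea mentioned in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, queries, keys and values are simply the input features (no learned projections), and a single head is assumed. Attention runs along each row and then along each column, so every score matrix is W x W or H x H instead of (HW) x (HW).

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(seq):
    """Single-head self-attention over a list of d-dim vectors.
    Assumption for brevity: Q = K = V = seq (no learned projections)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out

def axial_self_attention(feat):
    """feat: H x W x d nested lists. Attend along rows, then along columns."""
    rows = [attend(row) for row in feat]               # H rows, each over W positions
    cols = [attend(list(col)) for col in zip(*rows)]   # W columns, each over H positions
    return [list(row) for row in zip(*cols)]           # back to H x W x d

def score_entries(h, w):
    """Number of attention scores: full 2D attention vs. row-plus-column attention."""
    full = (h * w) ** 2
    axial = h * w * w + w * h * h
    return full, axial
```

For a hypothetical 4 x 5 feature map, `score_entries(4, 5)` gives 400 scores for full attention against 180 for the axial variant; the gap widens as the map grows. The 74% memory saving quoted in the abstract is the paper's own measurement and is not derived from this sketch.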

preprint · 2022 · arXiv

Large discrepancy between observations and simulations: Implications for urban air quality in China

Chemical transport models (CTMs) have been widely used to provide instructions for the control of ozone (O3) pollution. However, we find large discrepancies between observation- and model-based urban O3 chemical regimes: volatile organic compound (VOC)-limited regimes over N. China and weak nitrogen oxides (NOx)-limited regimes over S. China in observations, in contrast to simulations with widespread distributions of strong NOx-limited regimes. The conflicting O3 evolutions are caused by underestimated urban NOx concentrations and the possible overestimation of biogenic VOC emissions. Reductions in NOx emissions, in response to regulations, have thus led to an unintended deterioration of O3 pollution over N. China provinces, for example, an increase in surface O3 by approximately 7 ppb over the Sichuan Basin (SCB) in 2014-2020. The NOx-induced urban O3 changes resulted in an increase in premature mortality by approximately 3000 cases in 2015-2020.

preprint · 2022 · arXiv

MaskGroup: Hierarchical Point Grouping and Masking for 3D Instance Segmentation

This paper studies the 3D instance segmentation problem, which has a variety of real-world applications such as robotics and augmented reality. Since the surroundings of 3D objects are highly complex, separating different objects is very difficult. To address this challenging problem, we propose a novel framework to group and refine the 3D instances. In practice, we first learn an offset vector for each point and shift it to its predicted instance center. To better group these points, we propose a Hierarchical Point Grouping algorithm to merge the centrally aggregated points progressively. All points are grouped into small clusters, which gradually undergo further clustering to merge into larger groups. These multi-scale groups are exploited for instance prediction, which is beneficial for predicting instances with different scales. In addition, a novel MaskScoreNet is developed to produce binary point masks of these groups for further refining the segmentation results. Extensive experiments conducted on the ScanNetV2 and S3DIS benchmarks demonstrate the effectiveness of the proposed method. For instance, our approach achieves a 66.4% mAP with the 0.5 IoU threshold on the ScanNetV2 test set, which is 1.9% higher than the state-of-the-art method.
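
The shift-then-group idea described in this abstract can be sketched as a toy example. This is not the MaskGroup implementation: the offsets are given rather than learned, grouping is plain union-find merging of points closer than a radius, and the function names and the radius schedule `(0.05, 0.2)` are assumptions made for illustration. Because the radii increase, each later grouping can only merge clusters from the earlier one, which gives a small-to-large hierarchy.

```python
import math

def group_within_radius(points, radius):
    """Label points so that any two points closer than `radius` share a label."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]

def hierarchical_grouping(points, offsets, radii=(0.05, 0.2)):
    """Shift each point by its offset, then group at each radius (fine to coarse)."""
    shifted = [tuple(p + o for p, o in zip(pt, off))
               for pt, off in zip(points, offsets)]
    return [group_within_radius(shifted, r) for r in radii]
```

With two hypothetical objects of two points each, whose offsets move the points to within 0.1 of their own object center, the fine radius keeps all four points apart and the coarse radius recovers the two objects.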

preprint · 2022 · arXiv

Point Scene Understanding via Disentangled Instance Mesh Reconstruction

Semantic scene reconstruction from point clouds is an essential and challenging task for 3D scene understanding. This task requires not only recognizing each instance in the scene, but also recovering their geometries based on the partially observed point cloud. Existing methods usually attempt to directly predict occupancy values of the complete object based on incomplete point cloud proposals from a detection-based backbone. However, this framework always fails to reconstruct high-fidelity meshes due to the obstruction of various detected false positive object proposals and the ambiguity of incomplete point observations for learning occupancy values of complete objects. To circumvent this hurdle, we propose a Disentangled Instance Mesh Reconstruction (DIMR) framework for effective point scene understanding. A segmentation-based backbone is applied to reduce false positive object proposals, which further benefits our exploration of the relationship between recognition and reconstruction. Based on the accurate proposals, we leverage a mesh-aware latent code space to disentangle the processes of shape completion and mesh generation, relieving the ambiguity caused by the incomplete point observations. Furthermore, with access to the CAD model pool at test time, our model can also be used to improve the reconstruction quality by performing mesh retrieval without extra training. We thoroughly evaluate the reconstructed mesh quality with multiple metrics, and demonstrate the superiority of our method on the challenging ScanNet dataset. Code is available at https://github.com/ashawkey/dimr.

preprint · 2020 · arXiv

3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior

The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation. Since the computational cost generally increases explosively with voxel resolution, most current state-of-the-art methods have to tailor their frameworks to a low-resolution representation at the cost of detail prediction. Thus, voxel resolution becomes one of the crucial difficulties that lead to the performance bottleneck. In this paper, we propose a new geometry-based strategy to embed depth information with a low-resolution voxel representation, which can still encode sufficient geometric information, e.g., room layout, object sizes and shapes, to infer the invisible areas of the scene with well-preserved structural details. To this end, we first propose a novel 3D sketch-aware feature embedding to explicitly encode geometric information effectively and efficiently. With the 3D sketch in hand, we further devise a simple yet effective semantic scene completion framework that incorporates a light-weight 3D Sketch Hallucination module to guide the inference of occupancy and the semantic labels via a semi-supervised structure prior learning strategy. We demonstrate that our proposed geometric embedding works better than the depth feature learning from habitual SSC frameworks. Our final model consistently surpasses state-of-the-art methods on three public benchmarks, while requiring only 3D volumes of 60 x 36 x 60 resolution for both input and output. The code and the supplementary material will be available at https://charlesCXK.github.io.
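
To make the resolution argument in this abstract concrete, the number of voxels can be computed for the stated 60 x 36 x 60 grid and for finer grids. The helper name and the scale factors 2 and 4 are illustrative assumptions, not settings taken from the abstract; the point is only that the voxel count grows with the cube of the per-axis scale.

```python
def voxel_count(dims, scale=1):
    """Number of voxels when each axis of `dims` is multiplied by `scale`."""
    x, y, z = dims
    return (x * scale) * (y * scale) * (z * scale)

base = (60, 36, 60)  # resolution stated in the abstract

for scale in (1, 2, 4):  # hypothetical per-axis refinement factors
    count = voxel_count(base, scale)
    ratio = count // voxel_count(base)
    print(f"scale x{scale}: {count:,} voxels ({ratio}x the base grid)")
```

Doubling the resolution along each axis multiplies the voxel count by 8, and quadrupling it multiplies the count by 64.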

preprint · 2020 · arXiv

Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion to obtain better feature representations and achieve more accurate segmentation. This, however, may not lead to satisfactory results as actual depth data are generally noisy, which might worsen the accuracy as the networks go deeper. In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately. The key to the proposed architecture is a novel Separation-and-Aggregation Gating operation that jointly filters and recalibrates both representations before cross-modality aggregation. Meanwhile, a Bi-direction Multi-step Propagation strategy is introduced, on the one hand, to help propagate and fuse information between the two modalities, and on the other hand, to preserve their specificity along the long-term propagation process. Besides, our proposed encoder can be easily injected into previous encoder-decoder structures to boost their performance on RGB-D semantic segmentation. Our model consistently outperforms state-of-the-art methods on challenging indoor and outdoor datasets. Code of this work is available at https://charlescxk.github.io/

preprint · 2020 · arXiv

Impacts of COVID-19 control measures on tropospheric NO2 over China, South Korea and Italy

Tropospheric nitrogen dioxide (NO2) concentrations are strongly affected by anthropogenic activities. Using space-based measurements of tropospheric NO2, here we investigate the responses of tropospheric NO2 to the 2019 novel coronavirus (COVID-19) over China, South Korea, and Italy. We find noticeable reductions of tropospheric NO2 columns due to the COVID-19 controls by more than 40% over E. China, South Korea, and N. Italy. The 40% reductions of tropospheric NO2 are coincident with intensive lockdown events as well as up to 20% reductions in anthropogenic nitrogen oxides (NOx) emissions. The perturbations in tropospheric NO2 diminished as the COVID-19 pandemic was mitigated, and finally disappeared within around 50-70 days after the start of control measures in all three nations, providing indications for the start, maximum, and mitigation of intensive controls. This work shows the significant influence of lockdown measures on the atmospheric environment, highlighting the importance of satellite observations for monitoring changes in anthropogenic activity.