Source author record

Xin Tan

Xin Tan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Vision, Computation and Language, cond-mat.mtrl-sci, eess.IV, Information Theory, Machine Learning, math.IT, Networking and Internet Architecture, physics.ins-det

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators


preprint, 2026, arXiv

World2Minecraft: Occupancy-Driven Simulated Scenes Construction

Embodied intelligence requires high-fidelity simulation environments to support perception and decision-making, yet existing platforms often suffer from data contamination and limited flexibility. To mitigate this, we propose World2Minecraft to convert real-world scenes into structured Minecraft environments based on 3D semantic occupancy prediction. In the reconstructed scenes, we can effortlessly perform downstream tasks such as Vision-Language Navigation (VLN). However, we observe that reconstruction quality heavily depends on accurate occupancy prediction, which remains limited by data scarcity and poor generalization in existing models. We introduce a low-cost, automated, and scalable data acquisition pipeline for creating customized occupancy datasets, and demonstrate its effectiveness through MinecraftOcc, a large-scale dataset featuring 100,165 images from 156 richly detailed indoor scenes. Extensive experiments show that our dataset provides a critical complement to existing datasets and poses a significant challenge to current SOTA methods. These findings contribute to improving occupancy prediction and highlight the value of World2Minecraft in providing a customizable and editable platform for personalized embodied AI research. Project page: https://world2minecraft.github.io/.

preprint, 2022, arXiv

Discourse Cohesion Evaluation for Document-Level Neural Machine Translation

It is well known that translations generated by an excellent document-level neural machine translation (NMT) model are consistent and coherent. However, existing sentence-level evaluation metrics like BLEU can hardly reflect the model's performance at the document level. To tackle this issue, we propose a Discourse Cohesion Evaluation Method (DCoEM) in this paper and contribute a new test suite that considers four cohesive manners (reference, conjunction, substitution, and lexical cohesion) to measure the cohesiveness of document translations. The evaluation results on recent document-level NMT systems show that our method is practical and essential in estimating translations at the document level.

preprint, 2022, arXiv

DMT: Dynamic Mutual Training for Semi-Supervised Learning

Recent semi-supervised learning methods use pseudo supervision as a core idea, especially self-training methods that generate pseudo labels. However, pseudo labels are unreliable. Self-training methods usually rely on a single model's prediction confidence to filter out low-confidence pseudo labels, thus retaining high-confidence errors and wasting many low-confidence correct labels. In this paper, we point out that it is difficult for a model to counter its own errors. Instead, leveraging inter-model disagreement between different models is key to locating pseudo label errors. With this new viewpoint, we propose mutual training between two different models by a dynamically re-weighted loss function, called Dynamic Mutual Training (DMT). We quantify inter-model disagreement by comparing predictions from two different models to dynamically re-weight loss in training, where a larger disagreement indicates a possible error and corresponds to a lower loss value. Extensive experiments show that DMT achieves state-of-the-art performance in both image classification and semantic segmentation. Our code is released at https://github.com/voldemortX/DST-CBC.
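The re-weighting idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the rule below (the learner's probability for the pseudo label, raised to a power gamma) is an assumption chosen for illustration, and the names reweighted_loss, labeler_logits, learner_logits and gamma are hypothetical rather than taken from the released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_loss(labeler_logits, learner_logits, gamma=2.0):
    """Return (loss, weight) for one unlabeled sample.

    The labeler model supplies the pseudo label; the learner model is
    trained on it. Assumed rule (not from the paper): the weight is the
    learner's probability for the pseudo label, raised to gamma, so
    stronger disagreement gives a smaller weight.
    """
    labeler_probs = softmax(labeler_logits)
    learner_probs = softmax(learner_logits)
    pseudo_label = max(range(len(labeler_probs)), key=labeler_probs.__getitem__)
    agreement = learner_probs[pseudo_label]
    weight = agreement ** gamma
    cross_entropy = -math.log(agreement)
    return weight * cross_entropy, weight


# Same pseudo label (class 0); the learner agrees in the first call only.
_, w_agree = reweighted_loss([4.0, 0.0, 0.0], [3.0, 0.0, 0.0])
_, w_disagree = reweighted_loss([4.0, 0.0, 0.0], [0.0, 3.0, 0.0])
print(w_agree > w_disagree)
```

In a gradient-based training loop the weight would normally be treated as a constant, so that the model cannot lower its loss simply by shrinking the weight.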

preprint, 2022, arXiv

Dual Windows Are Significant: Learning from Mediastinal Window and Focusing on Lung Window

Since the outbreak of the COVID-19 pandemic, several deep learning methods have been proposed to analyze chest Computed Tomography (CT) for diagnosis. In the current situation, disease course classification is significant for medical personnel deciding on treatment. Most previous deep-learning-based methods extract features observed from the lung window. However, it has been shown that some appearances related to diagnosis can be observed better from the mediastinal window than from the lung window, e.g., pulmonary consolidation occurs more often with severe symptoms. In this paper, we propose a novel Dual Window RCNN Network (DWRNet), which mainly learns the distinctive features from the successive mediastinal window. Regarding the features extracted from the lung window, we introduce the Lung Window Attention Block (LWA Block) to pay additional attention to them for enhancing the mediastinal-window features. Moreover, instead of picking specific slices out of the whole CT scan, we use a Recurrent CNN and analyze successive slices as videos. Experimental results show that the fused and representative features improve the predictions of disease course, reaching an accuracy of 90.57% against a baseline accuracy of 84.86%. Ablation studies demonstrate that combined dual window features are more efficient than lung-window features alone, while paying attention to lung-window features can improve the model's stability.

preprint, 2022, arXiv

Night-time Scene Parsing with a Large Real Dataset

Although huge progress has been made on scene analysis in recent years, most existing works assume the input images to be in day-time with good lighting conditions. In this work, we aim to address the night-time scene parsing (NTSP) problem, which has two main challenges: 1) labeled night-time data are scarce, and 2) over- and under-exposures may co-occur in the input night-time images and are not explicitly modeled in existing pipelines. To tackle the scarcity of night-time data, we collect a novel labeled dataset, named NightCity, of 4,297 real night-time images with ground truth pixel-level semantic annotations. To our knowledge, NightCity is the largest dataset for NTSP. In addition, we also propose an exposure-aware framework to address the NTSP problem through augmenting the segmentation process with explicitly learned exposure features. Extensive experiments show that training on NightCity can significantly improve NTSP performances and that our exposure-aware model outperforms the state-of-the-art methods, yielding top performances on our dataset as well as existing datasets.

preprint, 2021, arXiv

Boundary-Aware Geometric Encoding for Semantic Segmentation of Point Clouds

Boundary information plays a significant role in 2D image segmentation, but is usually ignored in 3D point cloud segmentation, where ambiguous features might be generated in feature extraction, leading to misclassification in the transition area between two objects. In this paper, we first propose a Boundary Prediction Module (BPM) to predict boundary points. Based on the predicted boundary, a boundary-aware Geometric Encoding Module (GEM) is designed to encode geometric information and aggregate features with discrimination in a neighborhood, so that the local features belonging to different categories will not be polluted by each other. To provide extra geometric information for boundary-aware GEM, we also propose a light-weight Geometric Convolution Operation (GCO), making the extracted features more distinguishing. Built upon the boundary-aware GEM, we build our network and test it on benchmarks such as ScanNet v2 and S3DIS. Results show our methods can significantly improve the baseline and achieve state-of-the-art performance. Code is available at https://github.com/JchenXu/BoundaryAwareGEM.

preprint, 2021, arXiv

Weakly-Supervised Saliency Detection via Salient Object Subitizing

Salient object detection aims at detecting the most visually distinct objects and producing the corresponding masks. As the cost of pixel-level annotations is high, image tags are usually used as weak supervision. However, an image tag can only be used to annotate one class of objects. In this paper, we introduce saliency subitizing as the weak supervision since it is class-agnostic. This allows the supervision to be aligned with the property of saliency detection, where the salient objects of an image could be from more than one class. To this end, we propose a model with two modules, Saliency Subitizing Module (SSM) and Saliency Updating Module (SUM). While SSM learns to generate the initial saliency masks using the subitizing information, without the need for any unsupervised methods or random seeds, SUM helps iteratively refine the generated saliency masks. We conduct extensive experiments on five benchmark datasets. The experimental results show that our method outperforms other weakly-supervised methods and even performs comparably to some fully-supervised methods.

preprint, 2020, arXiv

SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor

Besides local features, global information plays an essential role in semantic segmentation, while recent works usually fail to explicitly extract the meaningful global information and make full use of it. In this paper, we propose a SceneEncoder module to impose scene-aware guidance to enhance the effect of global information. The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene and directly guides the point-level semantic segmentation by filtering out categories not belonging to this scene. Additionally, to alleviate segmentation noise in local regions, we design a region similarity loss to propagate distinguishing features to their own neighboring points with the same label, leading to the enhancement of the distinguishing ability of point-wise features. We integrate our methods into several prevailing networks and conduct extensive experiments on benchmark datasets ScanNet and ShapeNet. Results show that our methods greatly improve the performance of baselines and achieve state-of-the-art performance.
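The category filtering described in this abstract can be pictured with a small sketch. It assumes the scene descriptor is a per-class presence score in [0, 1] that simply scales each point's class probabilities; the abstract does not confirm that form, and the function names filter_by_scene and predict_labels are hypothetical.

```python
def filter_by_scene(point_probs, scene_descriptor):
    """Scale each point's class probabilities by the scene descriptor.

    point_probs: list of per-point class-probability lists.
    scene_descriptor: assumed per-class presence scores in [0, 1].
    Classes the descriptor marks as absent are suppressed, and the
    remaining probabilities are renormalized to sum to 1.
    """
    filtered = []
    for probs in point_probs:
        scaled = [p * s for p, s in zip(probs, scene_descriptor)]
        total = sum(scaled)
        # Fall back to the unfiltered prediction if everything was suppressed.
        filtered.append([v / total for v in scaled] if total > 0 else list(probs))
    return filtered


def predict_labels(point_probs):
    """Pick the most probable class index for every point."""
    return [max(range(len(p)), key=p.__getitem__) for p in point_probs]


# Two points, three classes; the descriptor says class 0 is not in this scene.
points = [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1]]
descriptor = [0.0, 1.0, 1.0]
labels = predict_labels(filter_by_scene(points, descriptor))
print(labels)
```

In this toy input the first point would be labeled class 0 without the descriptor; after filtering it falls back to its next most likely class.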

preprint, 2015, arXiv

A Testbed of Magnetic Induction-based Communication System for Underground Applications

Wireless underground sensor networks (WUSNs) can enable many important applications such as intelligent agriculture, pipeline fault diagnosis, mine disaster rescue, concealed border patrol, and crude oil exploration, among others. The key challenge in realizing WUSNs is wireless communication in underground environments. Most existing wireless communication systems utilize the dipole antenna to transmit and receive propagating electromagnetic (EM) waves, which do not work well in underground environments due to the very high material absorption loss. The Magnetic Induction (MI) technique provides a promising alternative solution that could address this problem in underground environments. Although MI-based underground communication has been intensively investigated theoretically, to date, little effort has been made to develop a testbed for MI-based underground communication that can validate the theoretical results. In this paper, a testbed of an MI-based communication system is designed and implemented in an in-lab underground environment. The testbed realizes and tests not only the original MI mechanism that utilizes a single coil but also recently developed techniques that use the MI waveguide and the 3-directional (3D) MI coils. The experiments are conducted in an in-lab underground environment with reconfigurable environmental parameters such as soil composition and water content. This paper provides the principles and guidelines for developing the MI underground communications testbed, which is very complicated and time-consuming due to the new communication mechanism and the new wireless transmission medium.

preprint, 2015, arXiv

Increasing Indoor Spectrum Sharing Capacity using Smart Reflect-Array

The radio frequency (RF) spectrum becomes overly crowded in some indoor environments due to the high density of users and bandwidth demands. To accommodate the tremendous wireless data demands, efficient spectrum-sharing approaches are highly desired. To this end, this paper introduces a new spectrum sharing solution for indoor environments based on the usage of a reconfigurable reflect-array in the middle of the wireless channel. By optimally controlling the phase shift of each element on the reflect-array, the useful signals for each transmission pair can be enhanced while the interference can be canceled. As a result, multiple wireless users in the same room can access the same spectrum band at the same time without interfering with each other. Hence, the network capacity can be dramatically increased. To prove the feasibility of the proposed solution, an experimental testbed is first developed and evaluated. Then, the effects of the reflect-array on transport capacity of the indoor wireless networks are investigated. Through experiments, theoretical deduction, and simulations, this paper demonstrates that significantly higher spectrum-spatial efficiency can be achieved by using the smart reflect-array without any modification of the hardware and software in the users' devices.
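The signal-enhancement half of this idea can be illustrated with a toy co-phasing sketch. It assumes each reflect-array element contributes one complex path gain at the receiver and that the element's phase shift is applied as a multiplication by exp(j*shift); this simplified model, the example gains and the names cophasing_shifts and received_amplitude are illustrative assumptions, and the joint optimization that also cancels interference for other users is not shown.

```python
import cmath


def cophasing_shifts(path_gains):
    """Phase shift per element that rotates every reflected path to phase 0."""
    return [-cmath.phase(g) for g in path_gains]


def received_amplitude(path_gains, shifts):
    """Magnitude of the sum of the reflected paths after phase shifting."""
    return abs(sum(g * cmath.exp(1j * s) for g, s in zip(path_gains, shifts)))


# Hypothetical complex gains (magnitude, phase in radians) for three elements.
gains = [cmath.rect(1.0, 0.3), cmath.rect(0.8, 2.5), cmath.rect(0.6, -1.9)]

unaligned = received_amplitude(gains, [0.0, 0.0, 0.0])
aligned = received_amplitude(gains, cophasing_shifts(gains))
print(unaligned, aligned)
```

With co-phasing, the reflected paths add coherently, so the received amplitude equals the sum of the individual path magnitudes, which is the largest value this model allows.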

preprint, 2015, arXiv

Tetragonal Bismuth Bilayer: A Stable and Robust Quantum Spin Hall Insulator

Topological insulators (TIs) exhibit novel physics with great promise for new devices, but considerable challenges remain to identify TIs with high structural stability and large nontrivial band gap suitable for practical applications. Here we predict by first-principles calculations a two-dimensional (2D) TI, also known as a quantum spin Hall (QSH) insulator, in a tetragonal bismuth bilayer (TB-Bi) structure that is dynamically and thermally stable based on phonon calculations and finite-temperature molecular dynamics simulations. Density functional theory and tight-binding calculations reveal a band inversion among the Bi-p orbits driven by the strong intrinsic spin-orbit coupling, producing a large nontrivial band gap, which can be effectively tuned by moderate strains. The helical gapless edge states exhibit a linear dispersion with a high Fermi velocity comparable to that of graphene, and the QSH phase remains robust on a NaCl substrate. These remarkable properties place TB-Bi among the most promising 2D TIs for high-speed spintronic devices, and the present results provide insights into the intriguing QSH phenomenon in this new Bi structure and offer guidance for its implementation in potential applications.
