Source author record

Jin Xiao

Jin Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Artificial Intelligence, Computation and Language, cond-mat.mtrl-sci, cs.CY, Machine Learning, Quantitative Methods, Social and Information Networks, Software Engineering

Catalog footprint

What is connected

7 works

9 topics

4 close collaborators

Preprint, 2026, arXiv

Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis

In 3D point cloud understanding, the core challenge lies in accurately capturing discriminative features within complex neighborhoods, which directly affects the execution precision of downstream tasks such as embodied AI and autonomous driving. Existing methods explore feature correlation discrimination but are limited to point-level spatial distribution or channel responses, enabling only coarse-grained evaluation. For modern multi-scale point cloud networks, such coarse-grained metrics inevitably incur significant information loss in deeper layers. To address this, we propose PointCRA, a novel network with a channel-level metric-based enhancement mechanism. Our core idea is to introduce temporal trend variation as a new evaluation dimension to avoid the information loss caused by weight dimension collapse in existing spatial and channel attention mechanisms. On this basis, we construct a multi-level calibration framework guided by neighborhood homogeneity for weight calibration, and design a dedicated loss function to enhance channel discriminability. PointCRA leverages intrinsic feature priors to adaptively correct feature aggregation, offering interpretability with low parameter overhead. Our method is transferable, interpretable, and efficient. We validate the proposed method on diverse datasets and benchmark models, and further demonstrate its rationality through extensive analytical experiments. Our PointCRA achieves 77.5% mIoU on the S3DIS dataset, 90.4% OA on the ScanObjectNN dataset, and 87.4% instance mIoU on the ShapeNetPart dataset. The code and pretrained weights are publicly available on GitHub: https://github.com/AGENT9717/PointCRA

Preprint, 2024, arXiv

Can Large Language Models Understand Real-World Complex Instructions?

Large language models (LLMs) can understand human instructions, showing their potential for pragmatic applications beyond traditional NLP tasks. However, they still struggle with complex instructions, which can be either complex task descriptions that require multiple tasks and constraints, or complex input that contains long context, noise, heterogeneous information, and multi-turn format. Due to these features, LLMs often ignore semantic constraints from task descriptions, generate incorrect formats, violate length or sample count constraints, and are unfaithful to the input text. Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions, as they are close-ended and simple. To bridge this gap, we propose CELLO, a benchmark for systematically evaluating LLMs' ability to follow complex instructions. We design eight features for complex instructions and construct a comprehensive evaluation dataset from real-world scenarios. We also establish four criteria and develop corresponding metrics, as current ones are inadequate, biased, or too strict and coarse-grained. We compare the performance of representative Chinese-oriented and English-oriented models in following complex instructions through extensive experiments. Resources of CELLO are publicly available at https://github.com/Abbey4799/CELLO.

Preprint, 2022, arXiv

American Twitter Users Revealed Social Determinants-related Oral Health Disparities amid the COVID-19 Pandemic

Objectives: To assess self-reported population oral health conditions amid the COVID-19 pandemic using user reports on Twitter. Method and Material: We collected oral health-related tweets during the COVID-19 pandemic from 9,104 Twitter users across 26 states (with sufficient samples) in the United States between November 12, 2020 and June 14, 2021. We inferred user demographics by leveraging the visual information from the user profile images. Other characteristics including income, population density, poverty rate, health insurance coverage rate, community water fluoridation rate, and relative change in the number of daily confirmed COVID-19 cases were acquired or inferred based on retrieved information from user profiles. We performed logistic regression to examine whether discussions vary across user characteristics. Results: Overall, 26.70% of the Twitter users discuss wisdom tooth pain/jaw hurt, 23.86% tweet about dental service/cavity, 18.97% discuss chipped tooth/tooth break, 16.23% talk about dental pain, and the rest discuss tooth decay/gum bleeding. Women and younger adults (19-29) are more likely to talk about oral health problems. Health insurance coverage rate is the most significant predictor in logistic regression for topic prediction. Conclusion: Tweets inform social disparities in oral health during the pandemic. For instance, people from counties at a higher risk of COVID-19 talk more about tooth decay/gum bleeding and chipped tooth/tooth break. Older adults, who are vulnerable to COVID-19, are more likely to discuss dental pain. Topics of interest vary across user characteristics. Through the lens of social media, our findings may provide insights for oral health practitioners and policy makers.

Preprint, 2022, arXiv

An Expert System for Redesigning Software for Cloud Applications

Cloud-based software has many advantages. When services are divided into many independent components, they are easier to update. Also, during peak demand, it is easier to scale cloud services (just hire more CPUs). Hence, many organizations are partitioning their monolithic enterprise applications into cloud-based microservices. Recently there has been much work using machine learning to simplify this partitioning task. Despite much research, no single partitioning method can be recommended as generally useful. More specifically, those prior solutions are "brittle"; i.e., if they work well for one kind of goal in one dataset, then they can be sub-optimal when applied to many datasets and multiple goals. In order to find a generally useful partitioning method, we propose DEEPLY. This new algorithm extends the CO-GCN deep learning partition generator with (a) a novel loss function and (b) some hyper-parameter optimization. As shown by our experiments, DEEPLY generally outperforms prior work (including CO-GCN and others) across multiple datasets and goals. To the best of our knowledge, this is the first report in SE of such stable hyper-parameter optimization. To aid reuse of this work, DEEPLY is available online at https://bit.ly/2WhfFlB.

Preprint, 2022, arXiv

Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions

Non-coding RNA structure and function are essential to understanding various biological processes, such as cell signaling, gene expression, and post-transcriptional regulation. These are all among the core problems in the RNA field. With the rapid growth of sequencing technology, we have accumulated a massive amount of unannotated RNA sequences. On the other hand, expensive experimental observation yields only a limited amount of annotated data and 3D structures. Hence, it is still challenging to design computational methods for predicting their structures and functions. The lack of annotated data and systematic study causes inferior performance. To resolve the issue, we propose a novel RNA foundation model (RNA-FM) to take advantage of all the 23 million non-coding RNA sequences through self-supervised learning. Within this approach, we discover that the pre-trained RNA-FM could infer sequential and evolutionary information of non-coding RNAs without using any labels. Furthermore, we demonstrate RNA-FM's effectiveness by applying it to the downstream secondary/3D structure prediction, SARS-CoV-2 genome structure and evolution prediction, protein-RNA binding preference modeling, and gene expression regulation modeling. The comprehensive experiments show that the proposed method improves the RNA structural and functional modeling results significantly and consistently. Despite only being trained with unlabeled data, RNA-FM can serve as the foundational model for the field.

Preprint, 2020, arXiv

A Smartphone-based System for Real-time Early Childhood Caries Diagnosis

Early childhood caries (ECC) is the most common, yet preventable, chronic disease in children under the age of 6. Treatment of severe ECC is extremely expensive and unaffordable for socioeconomically disadvantaged families. The identification of ECC at an early stage usually requires expertise in the field, and hence the condition is often overlooked by parents. Therefore, early prevention strategies and easy-to-adopt diagnosis techniques are desired. In this study, we propose a multistage deep learning-based system for cavity detection. We create a dataset containing RGB oral images labeled manually by dental practitioners. We then investigate the effectiveness of different deep learning models on the dataset. Furthermore, we integrate the deep learning system into an easy-to-use mobile application that can diagnose ECC from an early stage and provide real-time results to untrained users.

Preprint, 2019, arXiv

A realistic dimension-independent approach for charged defect calculations in semiconductors

First-principles calculations of charged defects have become a cornerstone of research in semiconductors and insulators by providing insights into their fundamental physical properties. However, the current standard approach using the so-called jellium model has encountered both conceptual ambiguity and computational difficulty, especially for low-dimensional semiconducting materials. In this Communication, we propose a physical, straightforward, and dimension-independent universal model to calculate the formation energies of charged defects in both three-dimensional (3D) bulk and low-dimensional semiconductors. Within this model, the ionized electrons or holes are placed on the realistic host band-edge states instead of the virtual jellium state, which not only naturally keeps the supercell charge neutral but also has a clear physical meaning. This realistic model matches the accuracy of the traditional jellium model for most 3D semiconducting materials and, remarkably, for low-dimensional structures it is able to cure the divergence caused by the artificial long-range electrostatic energy introduced in the jellium model, and hence gives meaningful formation energies of defects in charged states and transition energy levels of the corresponding defects. Our realistic method, therefore, will have a significant impact on the study of defect physics in all low-dimensional systems, including quantum dots, nanowires, surfaces, interfaces, and 2D materials.
