Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
58works
0followers
23topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

58 published item(s)

preprint2026arXiv

A Universal and Robust Framework for Multiple Gas Recognition Based-on Spherical Normalization-Coupled Mahalanobis Algorithm

Electronic nose (E-nose) systems face two interconnected challenges in open-set gas recognition: feature distribution shift caused by signal drift and decision boundary failure induced by unknown gas interference. Existing methods predominantly rely on Euclidean distance or conventional classifiers, failing to account for anisotropic feature distributions and dynamic signal intensity variations. To address these issues, this study proposes the Spherical Normalization coupled Mahalanobis (SNM) module, a universal post-processing module for open-set gas recognition. First, it achieves geometric decoupling through cascaded batch and L2 normalization, projecting features onto a unit hypersphere to eliminate signal intensity fluctuations. Second, it utilizes Mahalanobis distance to construct adaptive ellipsoidal decision boundaries that conform to the anisotropic feature geometry. The architecture-agnostic SNM-Module seamlessly integrates with mainstream backbones including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Transformer. Experiments on the public Vergara dataset demonstrate that the Transformer+SNM configuration achieves near-theoretical-limit performance in discriminating among multiple target gases, with an AUROC of 0.9977 and an unknown gas detection rate of 99.57% at 5% false positive rate, significantly outperforming state-of-the-art methods with a 3.0% AUROC improvement and 91.0% standard deviation reduction compared to Class Anchor Clustering (CAC). The module maintains exceptional robustness across five sensor positions, with standard deviations below 0.0028. This work effectively addresses the critical challenge of simultaneously achieving high accuracy and high stability in open-set gas recognition, providing solid support for industrial E-nose deployment.

preprint2026arXiv

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey

Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time decision making attuned to environmental factors through the close integration of computational resources and data sources. Here we provide a comprehensive review of the current state of the art in edge deep learning, focusing on computer vision applications, in particular medical diagnostics. An overview of the foundational principles and technical advantages of edge deep learning is presented, emphasising the capacity of this technology to revolutionise a wide range of domains. Furthermore, we present a novel categorisation of edge hardware platforms based on performance and usage scenarios, facilitating platform selection and operational effectiveness. Following this, we dive into approaches to effectively implement deep neural networks on edge devices, encompassing methods such as lightweight design and model compression. Reviewing practical applications in the fields of computer vision in general and medical diagnostics in particular, we demonstrate the profound impact edge-deployed deep learning models can have in real-life situations. Finally, we provide an analysis of potential future directions and obstacles to the adoption of edge deep learning, with the intention to stimulate further investigations and advancements of intelligent edge deep learning solutions. This survey provides researchers and practitioners with a comprehensive reference shedding light on the critical role deep learning plays in the advancement of edge computing applications.

preprint2026arXiv

Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data

Autonomous navigation is crucial for both medical and industrial endoscopic robots, enabling safe and efficient exploration of narrow tubular environments without continuous human intervention, where avoiding contact with the inner walls has been a longstanding challenge for prior approaches. We present a follow-the-leader endoscopic robot based on a flexible continuum structure designed to minimize contact between the endoscope body and intestinal walls, thereby reducing patient discomfort. To achieve this objective, we propose a vision-based deep reinforcement learning framework guided by monocular depth estimation. A realistic intestinal simulation environment was constructed in \textit{NVIDIA Omniverse} to train and evaluate autonomous navigation strategies. Furthermore, thousands of synthetic intraluminal images were generated using NVIDIA Replicator to fine-tune the Depth Anything model, enabling dense three-dimensional perception of the intestinal environment with a single monocular camera. Subsequently, we introduce a geometry-aware reward and penalty mechanism to enable accurate lumen tracking. Compared with the original Depth Anything model, our method improves $δ_{1}$ depth accuracy by 39.2% and reduces the navigation J-index by 0.67 relative to the second-best method, demonstrating the robustness and effectiveness of the proposed approach.

preprint2026arXiv

SPECTRA: Revealing the Full Spectrum of User Preferences via Distributional LLM Inference

Large Language Models (LLMs) are increasingly used to understand user preferences, typically via the direct generation of ranked item lists. However, this end-to-end generative paradigm inherits the bias and opacity of autoregressive decoding, over-emphasizing frequent (head) preferences and obscure long-tail ones, thereby biasing personalization toward head preferences. To address this, we propose SPECTRA (Semantic Preference Extraction and Clustered TRAcking), which treats the LLM as an implicit probabilistic model by probing it to infer a probability distribution over interpretable preference clusters. In doing so, SPECTRA reframes user modeling from sequence generation with decoding heuristics to distributional inference, yielding explicit, cluster-level user preference representations. We evaluate SPECTRA on MovieLens, Yelp, and a large-scale short-video platform, demonstrating significant gains across three dimensions: SPECTRA achieves (i) distributional alignment, reducing Jensen-Shannon divergence to empirical distributions by 25% against strong baselines; (ii) long-tail exposure, reducing decoding-induced head concentration and increasing global exposure entropy by 30%; and (iii) downstream applications such as personalized ranking, translating these gains into a 40% NDCG boost on public datasets and a 7x improvement on ranking long-tail preferences against an industry-leading Transformer-based production baseline.

preprint2026arXiv

TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding

Urban mobility is naturally expressed both as trajectories in space and as natural-language descriptions of travel intent, constraints, and preferences. However, prior work rarely evaluates these two modalities together on the same real-world trajectories: trajectory modeling often stays geometry-centric, while language-centric mobility benchmarks frequently target route planning and tool use rather than fine-grained, verifiable alignment between text and the underlying route. We introduce TrajPrism, a multi-task benchmark for language-trajectory alignment that unifies (i) instruction-conditioned trajectory generation, (ii) language-driven semantic trajectory retrieval, and (iii) trajectory captioning, together with an evaluation protocol that measures trajectory fidelity, retrieval quality, and language groundedness. We construct TrajPrism by pairing real urban trajectories with judge-filtered language annotations generated under a four-dimensional travel-intent taxonomy. The benchmark contains 300K selected trajectories across Porto, San Francisco, and Beijing, yielding 2.1M task instances from three instruction variants, three retrieval queries, and one caption per trajectory. We further develop proof-of-concept models for each task: TrajAnchor for instruction-conditioned trajectory generation, TrajFuse for semantic trajectory retrieval, and TrajRap for trajectory captioning. These models instantiate the proposed tasks and show that geometry-only trajectory baselines leave a large gap on our protocol, especially where language is part of the input-output interface. We release TrajPrism with code and a reproducible annotation pipeline that is designed to be portable across cities, given compatible trajectory inputs and map resources.

preprint2026arXiv

When Rules Fall Short: Agent-Driven Discovery of Emerging Content Issues in Short Video Platforms

Trends on short-video platforms evolve at a rapid pace, with new content issues emerging every day that fall outside the coverage of existing annotation policies. However, traditional human-driven discovery of emerging issues is too slow, which leads to delayed updates of annotation policies and poses a major challenge for effective content governance. In this work, we propose an automatic issue discovery method based on multimodal LLM agents. Our approach automatically recalls short videos containing potential new issues and applies a two-stage clustering strategy to group them, with each cluster corresponding to a newly discovered issue. The agent then generates updated annotation policies from these clusters, thereby extending coverage to these emerging issues. Our agent has been deployed in the real system. Both offline and online experiments demonstrate that this agent-based method significantly improves the effectiveness of emerging-issue discovery (with an F1 score improvement of over 20%) and enhances the performance of subsequent issue governance (reducing the view count of problematic videos by approximately 15%). More importantly, compared to manual issue discovery, it greatly reduces time costs and substantially accelerates the iteration of annotation policies.

preprint2026arXiv

X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Inspired by the development of OpenClaw, there is a growing demand for mobile-based personal agents capable of handling complex and intuitive interactions. In this technical report, we introduce X-OmniClaw, a unified mobile agent designed for multimodal understanding and interaction in the Android ecosystem. This unified architecture of perception, memory, and action enables the agent to handle complex mobile tasks with high contextual awareness. Specifically, Omni Perception provides a unified multimodal ingress pipeline that integrates UI states, real-world visual contexts, and speech inputs, leveraging a temporal alignment module to decompose raw data into structured multimodal intent representations. Omni Memory leverages multimodal memory optimization to enhance personalized intelligence by integrating runtime working memory for task continuity with long-term personal memory distilled from local data, enabling highly context-aware and personalized interactions. Finally, Omni Action employs a hybrid grounding strategy that combines structural XML metadata with visual perception for robust interaction. Through Behavior Cloning and Trajectory Replay, the system captures user navigation as reusable skills, enabling precise direct-access execution. Demonstrations across diverse scenarios show that X-OmniClaw effectively enhances interaction efficiency and task reliability, providing a practical architectural blueprint for the next generation of mobile-native personal assistants.

preprint2024arXiv

SponTTS: modeling and transferring spontaneous style for TTS

Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like a smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the limitation of high-quality spontaneous data constrains spontaneous speech generation for speakers without spontaneous data. To address these problems, we propose SponTTS, a two-stage approach based on neural bottleneck (BN) features to model and transfer spontaneous style for TTS. In the first stage, we adopt a Conditional Variational Autoencoder (CVAE) to capture spontaneous prosody from a BN feature and involve the spontaneous phenomena by the constraint of spontaneous phenomena embedding prediction loss. Besides, we introduce a flow-based predictor to predict a latent spontaneous style representation from the text, which enriches the prosody and context-specific spontaneous phenomena during inference. In the second stage, we adopt a VITS-like module to transfer the spontaneous style learned in the first stage to the target speakers. Experiments demonstrate that SponTTS is effective in modeling spontaneous style and transferring the style to the target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity. The zero-shot spontaneous style TTS test further verifies the generalization and robustness of SponTTS in generating spontaneous speech for unseen speakers.

preprint2023arXiv

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.

preprint2023arXiv

TractGraphCNN: anatomically informed graph CNN for classification using diffusion MRI tractography

The structure and variability of the brain's connections can be investigated via prediction of non-imaging phenotypes using neural networks. However, known neuroanatomical relationships between input features are generally ignored in network design. We propose TractGraphCNN, a novel, anatomically informed graph CNN framework for machine learning tasks using diffusion MRI tractography. An EdgeConv module aggregates features from anatomically similar white matter connections indicated by graph edges, and an attention module enables interpretation of predictive white matter tracts. Results in a sex prediction testbed task demonstrate strong performance of TractGraphCNN in two large datasets (HCP and ABCD). Graphs informed by white matter geometry demonstrate higher performance than graphs informed by gray matter connectivity. Overall, the bilateral cingulum and left middle longitudinal fasciculus are consistently highly predictive of sex. This work shows the potential of incorporating anatomical information, especially known anatomical similarities between input features, to guide convolutions in neural networks.

preprint2022arXiv

A Model-Agnostic Causal Learning Framework for Recommendation using Search Data

Machine-learning based recommender systems(RSs) has become an effective means to help people automatically discover their interests. Existing models often represent the rich information for recommendation, such as items, users, and contexts, as embedding vectors and leverage them to predict users' feedback. In the view of causal analysis, the associations between these embedding vectors and users' feedback are a mixture of the causal part that describes why an item is preferred by a user, and the non-causal part that merely reflects the statistical dependencies between users and items, for example, the exposure mechanism, public opinions, display position, etc. However, existing RSs mostly ignored the striking differences between the causal parts and non-causal parts when using these embedding vectors. In this paper, we propose a model-agnostic framework named IV4Rec that can effectively decompose the embedding vectors into these two parts, hence enhancing recommendation results. Specifically, we jointly consider users' behaviors in search scenarios and recommendation scenarios. Adopting the concepts in causal analysis, we embed users' search behaviors as instrumental variables (IVs), to help decompose original embedding vectors in recommendation, i.e., treatments. IV4Rec then combines the two parts through deep neural networks and uses the combined results for recommendation. IV4Rec is model-agnostic and can be applied to a number of existing RSs such as DIN and NRHUB. Experimental results on both public and proprietary industrial datasets demonstrate that IV4Rec consistently enhances RSs and outperforms a framework that jointly considers search and recommendation.

preprint2022arXiv

A Wearable ECG Monitor for Deep Learning Based Real-Time Cardiovascular Disease Detection

Cardiovascular disease has become one of the most significant threats endangering human life and health. Recently, Electrocardiogram (ECG) monitoring has been transformed into remote cardiac monitoring by Holter surveillance. However, the widely used Holter can bring a great deal of discomfort and inconvenience to the individuals who carry them. We developed a new wireless ECG patch in this work and applied a deep learning framework based on the Convolutional Neural Network (CNN) and Long Short-term Memory (LSTM) models. However, we find that the models using the existing techniques are not able to differentiate two main heartbeat types (Supraventricular premature beat and Atrial fibrillation) in our newly obtained dataset, resulting in low accuracy of 58.0 %. We proposed a semi-supervised method to process the badly labelled data samples with using the confidence-level-based training. The experiment results conclude that the proposed method can approach an average accuracy of 90.2 %, i.e., 5.4 % higher than the accuracy of conventional ECG classification methods.

preprint2022arXiv

Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou

Customer Life Time Value (LTV) is the expected total revenue that a single user can bring to a business. It is widely used in a variety of business scenarios to make operational decisions when acquiring new customers. Modeling LTV is a challenging problem, due to its complex and mutable data distribution. Existing approaches either directly learn from posterior feature distributions or leverage statistical models that make strong assumption on prior distributions, both of which fail to capture those mutable distributions. In this paper, we propose a complete set of industrial-level LTV modeling solutions. Specifically, we introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans, which greatly improves model performance. We further introduce a Multi Distribution Multi Experts (MDME) module based on the Divide-and-Conquer idea, which transforms the severely imbalanced distribution modeling problem into a series of relatively balanced sub-distribution modeling problems hence greatly reduces the modeling complexity. In addition, a novel evaluation metric Mutual Gini is introduced to better measure the distribution difference between the estimated value and the ground-truth label based on the Lorenz Curve. The ODMN framework has been successfully deployed in many business scenarios of Kuaishou, and achieved great performance. Extensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to state-of-the-art baselines including ZILN and Two-Stage XGBoost models.

preprint2022arXiv

Computer-Aided Extraction of Select MRI Markers of Cerebral Small Vessel Disease: A Systematic Review

Cerebral small vessel disease (CSVD) is a major vascular contributor to cognitive impairment in ageing, including dementias. Imaging remains the most promising method for in vivo studies of CSVD. To replace the subjective and laborious visual rating approaches, emerging studies have applied state-of-the-art artificial intelligence to extract imaging biomarkers of CSVD from MRI scans. We aimed to summarise published computer-aided methods to examine three imaging biomarkers of CSVD, namely cerebral microbleeds (CMB), dilated perivascular spaces (PVS), and lacunes of presumed vascular origin. Seventy-one classical image processing, classical machine learning, and deep learning studies were identified. CMB and PVS have been better studied, compared to lacunes. While good performance metrics have been achieved in local test datasets, there have not been generalisable pipelines validated in different research or clinical cohorts. Transfer learning and weak supervision techniques have been applied to accommodate the limitations in training data. Future studies could consider pooling data from multiple sources to increase diversity, and validating the performance of the methods using both image processing metrics and associations with clinical measures.

preprint2022arXiv

Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement

Recent advances in unsupervised domain adaptation (UDA) techniques have witnessed great success in cross-domain computer vision tasks, enhancing the generalization ability of data-driven deep learning architectures by bridging the domain distribution gaps. For the UDA-based cross-domain object detection methods, the majority of them alleviate the domain bias by inducing the domain-invariant feature generation via adversarial learning strategy. However, their domain discriminators have limited classification ability due to the unstable adversarial training process. Therefore, the extracted features induced by them cannot be perfectly domain-invariant and still contain domain-private factors, bringing obstacles to further alleviate the cross-domain discrepancy. To tackle this issue, we design a Domain Disentanglement Faster-RCNN (DDF) to eliminate the source-specific information in the features for detection task learning. Our DDF method facilitates the feature disentanglement at the global and local stages, with a Global Triplet Disentanglement (GTD) module and an Instance Similarity Disentanglement (ISD) module, respectively. By outperforming state-of-the-art methods on four benchmark UDA object detection tasks, our DDF method is demonstrated to be effective with wide applicability.

preprint2022arXiv

Density Ratio Estimation via Infinitesimal Classification

Density ratio estimation (DRE) is a fundamental machine learning technique for comparing two probability distributions. However, existing methods struggle in high-dimensional settings, as it is difficult to accurately compare probability distributions based on finite samples. In this work we propose DRE-\infty, a divide-and-conquer approach to reduce DRE to a series of easier subproblems. Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions. We then estimate the instantaneous rate of change of the bridge distributions indexed by time (the "time score") -- a quantity defined analogously to data (Stein) scores -- with a novel time score matching objective. Crucially, the learned time scores can then be integrated to compute the desired density ratio. In addition, we show that traditional (Stein) scores can be used to obtain integration paths that connect regions of high density in both distributions, improving performance in practice. Empirically, we demonstrate that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.

preprint2022arXiv

Disentangling Long and Short-Term Interests for Recommendation

Modeling user's long-term and short-term interests is crucial for accurate recommendation. However, since there is no manually annotated label for user interests, existing approaches always follow the paradigm of entangling these two aspects, which may lead to inferior recommendation accuracy and interpretability. In this paper, to address it, we propose a Contrastive learning framework to disentangle Long and Short-term interests for Recommendation (CLSR) with self-supervision. Specifically, we first propose two separate encoders to independently capture user interests of different time scales. We then extract long-term and short-term interests proxies from the interaction sequences, which serve as pseudo labels for user interests. Then pairwise contrastive tasks are designed to supervise the similarity between interest representations and their corresponding interest proxies. Finally, since the importance of long-term and short-term interests is dynamically changing, we propose to adaptively aggregate them through an attention-based network for prediction. We conduct experiments on two large-scale real-world datasets for e-commerce and short-video recommendation. Empirical results show that our CLSR consistently outperforms all state-of-the-art models with significant improvements: GAUC is improved by over 0.01, and NDCG is improved by over 4%. Further counterfactual evaluations demonstrate that stronger disentanglement of long and short-term interests is successfully achieved by CLSR. The code and data are available at https://github.com/tsinghua-fib-lab/CLSR.

preprint2022arXiv

DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology Image Analysis

We present a novel weakly-supervised framework for classifying whole slide images (WSIs). WSIs, due to their gigapixel resolution, are commonly processed by patch-wise classification with patch-level labels. However, patch-level labels require precise annotations, which is expensive and usually unavailable on clinical data. With image-level labels only, patch-wise classification would be sub-optimal due to inconsistency between the patch appearance and image-level label. To address this issue, we posit that WSI analysis can be effectively conducted by integrating information at both high magnification (local) and low magnification (regional) levels. We auto-encode the visual signals in each patch into a latent embedding vector representing local information, and down-sample the raw WSI to hardware-acceptable thumbnails representing regional information. The WSI label is then predicted with a Dual-Stream Network (DSNet), which takes the transformed local patch embeddings and multi-scale thumbnail images as inputs and can be trained by the image-level label only. Experiments conducted on two large-scale public datasets demonstrate that our method outperforms all recent state-of-the-art weakly-supervised WSI classification methods.

preprint2022arXiv

Explainability in Graph Neural Networks: An Experimental Survey

Graph neural networks (GNNs) have been extensively developed for graph representation learning in various application domains. However, similar to all other neural networks models, GNNs suffer from the black-box problem as people cannot understand the mechanism underlying them. To solve this problem, several GNN explainability methods have been proposed to explain the decisions made by GNNs. In this survey, we give an overview of the state-of-the-art GNN explainability methods and how they are evaluated. Furthermore, we propose a new evaluation metric and conduct thorough experiments to compare GNN explainability methods on real world datasets. We also suggest future directions for GNN explainability.

preprint2022arXiv

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation

Predicting molecular conformations from molecular graphs is a fundamental problem in cheminformatics and drug discovery. Recently, significant progress has been achieved with machine learning approaches, especially with deep generative models. Inspired by the diffusion process in classical non-equilibrium thermodynamics where heated particles will diffuse from original states to a noise distribution, in this paper, we propose a novel generative model named GeoDiff for molecular conformation prediction. GeoDiff treats each atom as a particle and learns to directly reverse the diffusion process (i.e., transforming from a noise distribution to stable conformations) as a Markov chain. Modeling such a generation process is however very challenging as the likelihood of conformations should be roto-translational invariant. We theoretically show that Markov chains evolving with equivariant Markov kernels can induce an invariant distribution by design, and further propose building blocks for the Markov kernels to preserve the desirable equivariance property. The whole framework can be efficiently trained in an end-to-end fashion by optimizing a weighted variational lower bound to the (conditional) likelihood. Experiments on multiple benchmarks show that GeoDiff is superior or comparable to existing state-of-the-art approaches, especially on large molecules.

preprint2022arXiv

GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains

Cereal grains are a vital part of human diets and are important commodities for people's livelihood and international trade. Grain Appearance Inspection (GAI) serves as one of the crucial steps for the determination of grain quality and grain stratification for proper circulation, storage and food processing, etc. GAI is routinely performed manually by qualified inspectors with the aid of some hand tools. Automated GAI has the benefit of greatly assisting inspectors with their jobs but has been limited due to the lack of datasets and clear definitions of the tasks. In this paper we formulate GAI as three ubiquitous computer vision tasks: fine-grained recognition, domain adaptation and out-of-distribution recognition. We present a large-scale and publicly available cereal grains dataset called GrainSpace. Specifically, we construct three types of device prototypes for data acquisition, and a total of 5.25 million images determined by professional inspectors. The grain samples including wheat, maize and rice are collected from five countries and more than 30 regions. We also develop a comprehensive benchmark based on semi-supervised learning and self-supervised learning techniques. To the best of our knowledge, GrainSpace is the first publicly released dataset for cereal grain inspection.

preprint2022arXiv

HybridGNN: Learning Hybrid Representation in Multiplex Heterogeneous Networks

Recently, graph neural networks have shown the superiority of modeling the complex topological structures in heterogeneous network-based recommender systems. Due to the diverse interactions among nodes and abundant semantics emerging from diverse types of nodes and edges, there is a bursting research interest in learning expressive node representations in multiplex heterogeneous networks. One of the most important tasks in recommender systems is to predict the potential connection between two nodes under a specific edge type (i.e., relationship). Although existing studies utilize explicit metapaths to aggregate neighbors, practically they only consider intra-relationship metapaths and thus fail to leverage the potential uplift by inter-relationship information. Moreover, it is not always straightforward to exploit inter-relationship metapaths comprehensively under diverse relationships, especially with the increasing number of node and edge types. In addition, contributions of different relationships between two nodes are difficult to measure. To address the challenges, we propose HybridGNN, an end-to-end GNN model with hybrid aggregation flows and hierarchical attentions to fully utilize the heterogeneity in the multiplex scenarios. Specifically, HybridGNN applies a randomized inter-relationship exploration module to exploit the multiplexity property among different relationships. Then, our model leverages hybrid aggregation flows under intra-relationship metapaths and randomized exploration to learn the rich semantics. To explore the importance of different aggregation flow and take advantage of the multiplexity property, we bring forward a novel hierarchical attention module which leverages both metapath-level attention and relationship-level attention. Extensive experimental results suggest that HybridGNN achieves the best performance compared to several state-of-the-art baselines.

preprint2022arXiv

Leveraging Search History for Improving Person-Job Fit

As the core technique of online recruitment platforms, person-job fit can improve hiring efficiency by accurately matching job positions with qualified candidates. However, existing studies mainly focus on the recommendation scenario, while neglecting another important channel for linking positions with job seekers, i.e. search. Intuitively, search history contains rich user behavior in job seeking, reflecting important evidence for job intention of users. In this paper, we present a novel Search History enhanced Person-Job Fit model, named as SHPJF. To utilize both text content from jobs/resumes and search histories from users, we propose two components with different purposes. For text matching component, we design a BERT-based text encoder for capturing the semantic interaction between resumes and job descriptions. For intention modeling component, we design two kinds of intention modeling approaches based on the Transformer architecture, either based on the click sequence or query text sequence. To capture underlying job intentions, we further propose an intention clustering technique to identify and summarize the major intentions from search logs. Extensive experiments on a large real-world recruitment dataset have demonstrated the effectiveness of our approach.

preprint2022arXiv

Modeling Two-Way Selection Preference for Person-Job Fit

Person-job fit is the core technique of online recruitment platforms, which can improve the efficiency of recruitment by accurately matching the job positions with the job seekers. Existing works mainly focus on modeling the unidirectional process or overall matching. However, recruitment is a two-way selection process, which means that both candidate and employer involved in the interaction should meet the expectation of each other, instead of unilateral satisfaction. In this paper, we propose a dual-perspective graph representation learning approach to model directed interactions between candidates and jobs. To model the two-way selection preference from the dual-perspective of job seekers and employers, we incorporate two different nodes for each candidate (or job) and characterize both successful matching and failed matching via a unified dual-perspective interaction graph. To learn dual-perspective node representations effectively, we design an effective optimization algorithm, which involves a quadruple-based loss and a dual-perspective contrastive learning loss. Extensive experiments on three large real-world recruitment datasets have shown the effectiveness of our approach.

preprint2022arXiv

Multiple Sclerosis Lesion Analysis in Brain Magnetic Resonance Images: Techniques and Clinical Applications

Multiple sclerosis (MS) is a chronic inflammatory and degenerative disease of the central nervous system, characterized by the appearance of focal lesions in the white and gray matter that topographically correlate with an individual patient's neurological symptoms and signs. Magnetic resonance imaging (MRI) provides detailed in-vivo structural information, permitting the quantification and categorization of MS lesions that critically inform disease management. Traditionally, MS lesions have been manually annotated on 2D MRI slices, a process that is inefficient and prone to inter-/intra-observer errors. Recently, automated statistical imaging analysis techniques have been proposed to detect and segment MS lesions based on MRI voxel intensity. However, their effectiveness is limited by the heterogeneity of both MRI data acquisition techniques and the appearance of MS lesions. By learning complex lesion representations directly from images, deep learning techniques have achieved remarkable breakthroughs in the MS lesion segmentation task. Here, we provide a comprehensive review of state-of-the-art automatic statistical and deep-learning MS segmentation methods and discuss current and future clinical applications. Further, we review technical strategies, such as domain adaptation, to enhance MS lesion segmentation in real-world clinical settings.

preprint2022arXiv

Partial Graph Reasoning for Neural Network Regularization

Regularizers help deep neural networks prevent feature co-adaptations. Dropout, as a commonly used regularization technique, stochastically disables neuron activations during network optimization. However, such complete feature disposal can affect the feature representation and network understanding. Toward better descriptions of latent representations, we present DropGraph that learns a regularization function by constructing a stand-alone graph from the backbone features. DropGraph first samples stochastic spatial feature vectors and then incorporates graph reasoning methods to generate feature map distortions. This add-on graph regularizes the network during training and can be completely skipped during inference. We provide intuitions on the linkage between graph reasoning and Dropout with further discussions on how partial graph reasoning method reduces feature correlations. To this end, we extensively study the modeling of graph vertex dependencies and the utilization of the graph for distorting backbone feature maps. DropGraph was validated on 4 tasks with a total of 8 different datasets. The experimental results show that our method outperforms other state-of-the-art regularizers while leaving the base model structure unmodified during inference.

preprint2022arXiv

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Guided image synthesis enables everyday users to create and edit photo-realistic images with minimum effort. The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image. Existing GAN-based methods attempt to achieve such balance using either conditional GANs or GAN inversions, which are challenging and often require additional training data or loss functions for individual applications. To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). Given an input image with user guide of any type, SDEdit first adds noise to the input, then subsequently denoises the resulting image through the SDE prior to increase its realism. SDEdit does not require task-specific training or inversions and can naturally achieve the balance between realism and faithfulness. SDEdit significantly outperforms state-of-the-art GAN-based methods by up to 98.09% on realism and 91.72% on overall satisfaction scores, according to a human perception study, on multiple tasks, including stroke-based image synthesis and editing as well as image compositing.

preprint2022arXiv

Solving Inverse Problems in Medical Imaging with Score-Based Generative Models

Reconstructing medical images from partial measurements is an important inverse problem in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). Existing solutions based on machine learning typically train a model to directly map measurements to medical images, leveraging a training dataset of paired images and measurements. These measurements are typically synthesized from images using a fixed physical model of the measurement process, which hinders the generalization capability of models to unknown measurement processes. To address this issue, we propose a fully unsupervised technique for inverse problem solving, leveraging the recently introduced score-based generative models. Specifically, we first train a score-based generative model on medical images to capture their prior distribution. Given measurements and a physical model of the measurement process at test time, we introduce a sampling method to reconstruct an image consistent with both the prior and the observed measurements. Our method does not assume a fixed measurement process during training, and can thus be flexibly adapted to different measurement processes at test time. Empirically, we observe comparable or better performance to supervised learning techniques in several medical imaging tasks in CT and MRI, while demonstrating significantly better generalization to unknown measurement processes.

preprint2022arXiv

SupWMA: Consistent and Efficient Tractography Parcellation of Superficial White Matter with Deep Learning

White matter parcellation classifies tractography streamlines into clusters or anatomically meaningful tracts to enable quantification and visualization. Most parcellation methods focus on the deep white matter (DWM), while fewer methods address the superficial white matter (SWM) due to its complexity. We propose a deep-learning-based framework, Superficial White Matter Analysis (SupWMA), that performs an efficient and consistent parcellation of 198 SWM clusters from whole-brain tractography. A point-cloud-based network is modified for our SWM parcellation task, and supervised contrastive learning enables more discriminative representations between plausible streamlines and outliers. We perform evaluation on a large tractography dataset with ground truth labels and on three independently acquired testing datasets from individuals across ages and health conditions. Compared to several state-of-the-art methods, SupWMA obtains a highly consistent and accurate SWM parcellation result. In addition, the computational speed of SupWMA is much faster than other methods.

preprint2022arXiv

Toward Consistent and Efficient Map-based Visual-inertial Localization: Theory Framework and Filter Design

This paper focuses on designing a consistent and efficient filter for map-based visual-inertial localization. First, we propose a new Lie group with its algebra, based on which a novel invariant extended Kalman filter (invariant EKF) is designed. We theoretically prove that, when we do not consider the uncertainty of the map information, the proposed invariant EKF can naturally maintain the correct observability properties of the system. To consider the uncertainty of the map information, we introduce a Schmidt filter. With the Schmidt filter, the uncertainty of the map information can be taken into consideration to avoid over-confident estimation while the computation cost only increases linearly with the size of the map keyframes. In addition, we introduce an easily implemented observability-constrained technique because directly combining the invariant EKF with the Schmidt filter cannot maintain the correct observability properties of the system that considers the uncertainty of the map information. Finally, we validate our proposed system's high consistency, accuracy, and efficiency via extensive simulations and real-world experiments.

preprint2022arXiv

Towards Bi-directional Skip Connections in Encoder-Decoder Architectures and Beyond

U-Net, as an encoder-decoder architecture with forward skip connections, has achieved promising results in various medical image analysis tasks. Many recent approaches have also extended U-Net with more complex building blocks, which typically increase the number of network parameters considerably. Such complexity makes the inference stage highly inefficient for clinical applications. Towards an effective yet economic segmentation network design, in this work, we propose backward skip connections that bring decoded features back to the encoder. Our design can be jointly adopted with forward skip connections in any encoder-decoder architecture forming a recurrence structure without introducing extra parameters. With the backward skip connections, we propose a U-Net based network family, namely Bi-directional O-shape networks, which set new benchmarks on multiple public medical imaging segmentation datasets. On the other hand, with the most plain architecture (BiO-Net), network computations inevitably increase along with the pre-set recurrence time. We have thus studied the deficiency bottleneck of such recurrent design and propose a novel two-phase Neural Architecture Search (NAS) algorithm, namely BiX-NAS, to search for the best multi-scale bi-directional skip connections. The ineffective skip connections are then discarded to reduce computational costs and speed up network inference. The finally searched BiX-Net yields the least network complexity and outperforms other state-of-the-art counterparts by large margins. We evaluate our methods on both 2D and 3D segmentation tasks in a total of six datasets. Extensive ablation studies have also been conducted to provide a comprehensive analysis for our proposed methods.

preprint2022arXiv

White Matter Tracts are Point Clouds: Neuropsychological Score Prediction and Critical Region Localization via Geometric Deep Learning

White matter tract microstructure has been shown to influence neuropsychological scores of cognitive performance. However, prediction of these scores from white matter tract data has not been attempted. In this paper, we propose a deep-learning-based framework for neuropsychological score prediction using microstructure measurements estimated from diffusion magnetic resonance imaging (dMRI) tractography, focusing on predicting performance on a receptive vocabulary assessment task based on a critical fiber tract for language, the arcuate fasciculus (AF). We directly utilize information from all points in a fiber tract, without the need to average data along the fiber as is traditionally required by diffusion MRI tractometry methods. Specifically, we represent the AF as a point cloud with microstructure measurements at each point, enabling adoption of point-based neural networks. We improve prediction performance with the proposed Paired-Siamese Loss that utilizes information about differences between continuous neuropsychological scores. Finally, we propose a Critical Region Localization (CRL) algorithm to localize informative anatomical regions containing points with strong contributions to the prediction results. Our method is evaluated on data from 806 subjects from the Human Connectome Project dataset. Results demonstrate superior neuropsychological score prediction performance compared to baseline methods. We discover that critical regions in the AF are strikingly consistent across subjects, with the highest number of strongly contributing points located in frontal cortical regions (i.e., the rostral middle frontal, pars opercularis, and pars triangularis), which are strongly implicated as critical areas for language processes.

preprint2021arXiv

Anytime Sampling for Autoregressive Models via Ordered Autoencoding

Autoregressive models are widely used for tasks such as image and audio generation. The sampling process of these models, however, does not allow interruptions and cannot adapt to real-time computational resources. This challenge impedes the deployment of powerful autoregressive models, which involve a slow sampling process that is sequential in nature and typically scales linearly with respect to the data dimension. To address this difficulty, we propose a new family of autoregressive models that enables anytime sampling. Inspired by Principal Component Analysis, we learn a structured representation space where dimensions are ordered based on their importance with respect to reconstruction. Using an autoregressive model in this latent space, we trade off sample quality for computational efficiency by truncating the generation process before decoding into the original data space. Experimentally, we demonstrate in several image and audio generation tasks that sample quality degrades gracefully as we reduce the computational budget for sampling. The approach suffers almost no loss in sample quality (measured by FID) using only 60\% to 80\% of all latent dimensions for image data. Code is available at https://github.com/Newbeeer/Anytime-Auto-Regressive-Model .

preprint2021arXiv

CellTrack R-CNN: A Novel End-To-End Deep Neural Network for Cell Segmentation and Tracking in Microscopy Images

Cell segmentation and tracking in microscopy images are of great significance to new discoveries in biology and medicine. In this study, we propose a novel approach to combine cell segmentation and cell tracking into a unified end-to-end deep learning based framework, where cell detection and segmentation are performed with a current instance segmentation pipeline and cell tracking is implemented by integrating Siamese Network with the pipeline. Besides, tracking performance is improved by incorporating spatial information into the network and fusing spatial and visual prediction. Our approach was evaluated on the DeepCell benchmark dataset. Despite being simple and efficient, our method outperforms state-of-the-art algorithms in terms of both cell segmentation and cell tracking accuracies.

preprint2021arXiv

Crystal orientation-dependent oxidation of epitaxial TiN films with tunable plasmonics

Titanium nitride (TiN) is a paradigm of refractory transition metal nitrides with great potential in vast applications. Generally, the plasmonic performance of TiN can be tuned by oxidation, which was thought to be only temperature-, oxygen partial pressure-, and time-dependent. Regarding the role of crystallographic orientation in the oxidation and resultant optical properties of TiN films, little is known thus far. Here we reveal that both the oxidation resistance behavior and the plasmonic performance of epitaxial TiN films follow the order of (001) < (110) < (111). The effects of crystallographic orientation on the lattice constants, optical properties, and oxidation levels of epitaxial TiN films have been systematically studied by combined high-resolution X-ray diffraction, spectroscopic ellipsometry, X-ray absorption spectroscopy, and X-ray photoemission spectroscopy. To further understand the role of crystallographic orientation in the initial oxidation process of TiN films, density-functional-theory calculations are carried out, indicating the energy cost of oxidation is (001) < (110) < (111), consistent with the experiments. The superior endurance of the (111) orientation against mild oxidation can largely alleviate the previously stringent technical requirements for the growth of TiN films with high plasmonic performance. The crystallographic orientation can also offer an effective controlling parameter to design TiN-based plasmonic devices with desired peculiarity, e.g., superior chemical stability against mild oxidation or large optical tunability upon oxidation.

preprint2021arXiv

Extremely Small BERT Models from Mixed-Vocabulary Training

Pretrained language models like BERT have achieved good results on NLP tasks, but are impractical on resource-limited devices due to memory footprint. A large fraction of this footprint comes from the input embeddings with large input vocabulary and embedding dimensions. Existing knowledge distillation methods used for model compression cannot be directly applied to train student models with reduced vocabulary sizes. To this end, we propose a distillation method to align the teacher and student embeddings via mixed-vocabulary training. Our method compresses BERT-LARGE to a task-agnostic model with smaller vocabulary and hidden dimensions, which is an order of magnitude smaller than other distilled BERT models and offers a better size-accuracy trade-off on language understanding benchmarks as well as a practical dialogue task.

preprint2021arXiv

How to Train Your Energy-Based Models

Energy-Based Models (EBMs), also known as non-normalized probabilistic models, specify probability density or mass functions up to an unknown normalizing constant. Unlike most other probabilistic models, EBMs do not place a restriction on the tractability of the normalizing constant, thus are more flexible to parameterize and can model a more expressive family of probability distributions. However, the unknown normalizing constant of EBMs makes training particularly difficult. Our goal is to provide a friendly introduction to modern approaches for EBM training. We start by explaining maximum likelihood training with Markov chain Monte Carlo (MCMC), and proceed to elaborate on MCMC-free approaches, including Score Matching (SM) and Noise Constrastive Estimation (NCE). We highlight theoretical connections among these three approaches, and end with a brief survey on alternative training methods, which are still under active research. Our tutorial is targeted at an audience with basic understanding of generative models who want to apply EBMs or start a research project in this direction.

preprint2021arXiv

Panoptic Feature Fusion Net: A Novel Instance Segmentation Paradigm for Biomedical and Biological Images

Instance segmentation is an important task for biomedical and biological image analysis. Due to the complicated background components, the high variability of object appearances, numerous overlapping objects, and ambiguous object boundaries, this task still remains challenging. Recently, deep learning based methods have been widely employed to solve these problems and can be categorized into proposal-free and proposal-based methods. However, both proposal-free and proposal-based methods suffer from information loss, as they focus on either global-level semantic or local-level instance features. To tackle this issue, we present a Panoptic Feature Fusion Net (PFFNet) that unifies the semantic and instance features in this work. Specifically, our proposed PFFNet contains a residual attention feature fusion mechanism to incorporate the instance prediction with the semantic features, in order to facilitate the semantic contextual information learning in the instance branch. Then, a mask quality sub-branch is designed to align the confidence score of each object with the quality of the mask prediction. Furthermore, a consistency regularization mechanism is designed between the semantic segmentation tasks in the semantic and instance branches, for the robust learning of both tasks. Extensive experiments demonstrate the effectiveness of our proposed PFFNet, which outperforms several state-of-the-art methods on various biomedical and biological datasets.

preprint2021arXiv

Score-Based Generative Modeling through Stochastic Differential Equations

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.

preprint2021arXiv

Single Neuron Segmentation using Graph-based Global Reasoning with Auxiliary Skeleton Loss from 3D Optical Microscope Images

One of the critical steps in improving accurate single neuron reconstruction from three-dimensional (3D) optical microscope images is the neuronal structure segmentation. However, they are always hard to segment due to the lack in quality. Despite a series of attempts to apply convolutional neural networks (CNNs) on this task, noise and disconnected gaps are still challenging to alleviate with the neglect of the non-local features of graph-like tubular neural structures. Hence, we present an end-to-end segmentation network by jointly considering the local appearance and the global geometry traits through graph reasoning and a skeleton-based auxiliary loss. The evaluation results on the Janelia dataset from the BigNeuron project demonstrate that our proposed method exceeds the counterpart algorithms in performance.

preprint2021arXiv

The effect of viscosity and resistivity on Rayleigh-Taylor instability induced mixing in magnetized high energy density plasmas

This work numerically investigates the role of viscosity and resistivity on Rayleigh-Taylor instabilities in magnetized high-energy-density (HED) plasmas for a high Atwood number and high plasma beta regimes surveying across plasma beta and magnetic Prandtl numbers. The numerical simulations are performed using the visco-resistive magnetohydrodynamic (MHD) equations. Results presented here show that the inclusion of self-consistent viscosity and resistivity in the system drastically changes the growth of the Rayleigh-Taylor instability (RTI) as well as modifies its internal structure at smaller scales. It is seen here that the viscosity has a stabilizing effect on the RTI. Moreover, the viscosity inhibits the development of small scale structures and also modifies the morphology of the tip of the RTI spikes. On the other hand, the resistivity reduces the magnetic field stabilization supporting the development of small scale structures. The morphology of the RTI spikes is seen to be unaffected by the presence of resistivity in the system. An additional novelty of this work is in the disparate viscosity and resistivity profiles that may exist in HED plasmas and their impact on RTI growth, morphology, and the resulting turbulence spectra. Furthermore, this work shows that the dynamics of the magnetic field is independent of viscosity and likewise the resistivity does not affect the dissipation of enstrophy and kinetic energy. In addition, power-law scalings of enstrophy, kinetic energy, and magnetic field energy are provided in both injection range and inertial sub-range which could be useful for understanding RTI induced turbulent mixing in HED laboratory and astrophysical plasmas and could aid in the interpretation of observations of RTI-induced turbulence spectra.

preprint2020arXiv

Arbitrarily Strong Utility-Privacy Tradeoff in Multi-Agent Systems

Each agent in a network makes a local observation that is linearly related to a set of public and private parameters. The agents send their observations to a fusion center to allow it to estimate the public parameters. To prevent leakage of the private parameters, each agent first sanitizes its local observation using a local privacy mechanism before transmitting it to the fusion center. We investigate the utility-privacy tradeoff in terms of the Cramér-Rao lower bounds for estimating the public and private parameters. We study the class of privacy mechanisms given by linear compression and noise perturbation, and derive necessary and sufficient conditions for achieving arbitrarily strong utility-privacy tradeoff in a multi-agent system for both the cases where prior information is available and unavailable, respectively. We also provide a method to find the maximum estimation privacy achievable without compromising the utility and propose an alternating algorithm to optimize the utility-privacy tradeoff in the case where arbitrarily strong utility-privacy tradeoff is not achievable.

preprint2020arXiv

BiO-Net: Learning Recurrent Bi-directional Connections for Encoder-Decoder Architecture

U-Net has become one of the state-of-the-art deep learning-based approaches for modern computer vision tasks such as semantic segmentation, super resolution, image denoising, and inpainting. Previous extensions of U-Net have focused mainly on the modification of its existing building blocks or the development of new functional modules for performance gains. As a result, these variants usually lead to an unneglectable increase in model complexity. To tackle this issue in such U-Net variants, in this paper, we present a novel Bi-directional O-shape network (BiO-Net) that reuses the building blocks in a recurrent manner without introducing any extra parameters. Our proposed bi-directional skip connections can be directly adopted into any encoder-decoder architecture to further enhance its capabilities in various task domains. We evaluated our method on various medical image analysis tasks and the results show that our BiO-Net significantly outperforms the vanilla U-Net as well as other state-of-the-art methods. Our code is available at https://github.com/tiangexiang/BiO-Net.

preprint2020arXiv

BrePartition: Optimized High-Dimensional kNN Search with Bregman Distances

Bregman distances (also known as Bregman divergences) are widely used in machine learning, speech recognition and signal processing, and kNN searches with Bregman distances have become increasingly important with the rapid advances of multimedia applications. Data in multimedia applications such as images and videos are commonly transformed into space of hundreds of dimensions. Such high-dimensional space has posed significant challenges for existing kNN search algorithms with Bregman distances, which could only handle data of medium dimensionality (typically less than 100). This paper addresses the urgent problem of high-dimensional kNN search with Bregman distances. We propose a novel partition-filter-refinement framework. Specifically, we propose an optimized dimensionality partitioning scheme to solve several non-trivial issues. First, an effective bound from each partitioned subspace to obtain exact kNN results is derived. Second, we conduct an in-depth analysis of the optimized number of partitions and devise an effective strategy for partitioning. Third, we design an efficient integrated index structure for all the subspaces together to accelerate the search processing. Moreover, we extend our exact solution to an approximate version by a trade-off between the accuracy and efficiency. Experimental results on four real-world datasets and two synthetic datasets show the clear advantage of our method in comparison to state-of-the-art algorithms.

preprint2020arXiv

Creating and manipulating interfacial spin with giant magnetic response in 4$f$ antiferromagnets

Creating and manipulating spin polarization in low-dimensional electron systems (such as two-dimensional electron gases) is fundamentally essential for spintronic applications, which is yet a challenge to date. In this work, we establish the metamagnetic phase diagram of 4$f$ antiferromagnetic TbScO$_3$ and reveal its giant magnetic response to sub-tesla magnetic field, which has not been reported thus far. Utilizing this giant magnetic response, we demonstrate that the spin polarization of two-dimensional electron gas in SrTiO$_3$/LaTiO$_3$/TbScO$_3$ heterostructure can be manipulated successfully in aid of interfacial 3\textit{d}-4\textit{f} exchange interaction. Remarkably, the hysteretic magnetoresistances of two-dimensional electron gas at the SrTiO$_3$/LaTiO$_3$ interface are entirely determined by the metamagnetic phase transitions of the underlying TbScO$_3$ substrate. Our results pave a novel route to engineer the spin polarization of low-dimensional electron systems in 4$f$ antiferromagnet-based heterostructures.

preprint2020arXiv

Design of an MISO-SWIPT-Aided Code-Index Modulated Multi-Carrier M-DCSK System for e-Health IoT

Code index modulated multi-carrier M-ary differential chaos shift keying (CIM-MC-M-DCSK) system not only inherits low-power and low-complexity advantages of the conventional DCSK system, but also significantly increases the transmission rate. This feature is of particular importance to Internet of Things (IoT) with trillions of low-cost devices. In particular, for e-health IoT applications, an efficient transmission scheme is designed to solve the challenge of the limited battery capacity for numerous user equipments served by one base station. In this paper, a new multiple-input-single-output simultaneous wireless information and power transfer (MISO-SWIPT) scheme for CIM-MC-M-DCSK system is proposed by utilizing orthogonal characteristic of chaotic signals with different initial values. The proposed system adopts power splitting mode, which is very promising for simultaneously providing energy and transmitting information of the user equipments without any external power supply. In particular, the new system can achieve desirable anti-multipath-fading capability without using channel estimator. Moreover, the analytical bit-error-rate expression of the proposed system is derived over multipath Rayleigh fading channels. Furthermore, the spectral efficiency and energy efficiency of the proposed system are analyzed. Simulation results not only validate the analytical expressions, but also demonstrate the superiority of the proposed system.

preprint2020arXiv

Detecting Problem Statements in Peer Assessments

Effective peer assessment requires students to be attentive to the deficiencies in the work they rate. Thus, their reviews should identify problems. But what ways are there to check that they do? We attempt to automate the process of deciding whether a review comment detects a problem. We use over 18,000 review comments that were labeled by the reviewees as either detecting or not detecting a problem with the work. We deploy several traditional machine-learning models, as well as neural-network models using GloVe and BERT embeddings. We find that the best performer is the Hierarchical Attention Network classifier, followed by the Bidirectional Gated Recurrent Units (GRU) Attention and Capsule model with scores of 93.1% and 90.5% respectively. The best non-neural network model was the support vector machine with a score of 89.71%. This is followed by the Stochastic Gradient Descent model and the Logistic Regression model with 89.70% and 88.98%.

preprint2020arXiv

Efficient Graph Generation with Graph Recurrent Attention Networks

We propose a new family of efficient and expressive deep generative models of graphs, called Graph Recurrent Attention Networks (GRANs). Our model generates graphs one block of nodes and associated edges at a time. The block size and sampling stride allow us to trade off sample quality for efficiency. Compared to previous RNN-based graph generative models, our framework better captures the auto-regressive conditioning between the already-generated and to-be-generated parts of the graph using Graph Neural Networks (GNNs) with attention. This not only reduces the dependency on node ordering but also bypasses the long-term bottleneck caused by the sequential nature of RNNs. Moreover, we parameterize the output distribution per block using a mixture of Bernoulli, which captures the correlations among generated edges within the block. Finally, we propose to handle node orderings in generation by marginalizing over a family of canonical orderings. On standard benchmarks, we achieve state-of-the-art time efficiency and sample quality compared to previous models. Additionally, we show our model is capable of generating large graphs of up to 5K nodes with good quality. To the best of our knowledge, GRAN is the first deep graph generative model that can scale to this size. Our code is released at: https://github.com/lrjconan/GRAN.

preprint2020arXiv

Gaussianization Flows

Iterative Gaussianization is a fixed-point iteration procedure that can transform any continuous random vector into a Gaussian one. Based on iterative Gaussianization, we propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation. We demonstrate that these models, named Gaussianization flows, are universal approximators for continuous probability distributions under some regularity conditions. Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation. Experimentally, we show that Gaussianization flows achieve better or comparable performance on several tabular datasets compared to other efficiently invertible flow models such as Real NVP, Glow and FFJORD. In particular, Gaussianization flows are easier to initialize, demonstrate better robustness with respect to different transformations of the training data, and generalize better on small training sets.

preprint2020arXiv

Gleason Score Prediction using Deep Learning in Tissue Microarray Image

Prostate cancer (PCa) is one of the most common cancers in men around the world. The most accurate method to evaluate lesion levels of PCa is microscopic inspection of stained biopsy tissue and estimate the Gleason score of tissue microarray (TMA) image by expert pathologists. However, it is time-consuming for pathologists to identify the cellular and glandular patterns for Gleason grading in large TMA images. We used Gleason2019 Challenge dataset to build a convolutional neural network (CNN) model to segment TMA images to regions of different Gleason grades and predict the Gleason score according to the grading segmentation. We used a pre-trained model of prostate segmentation to increase the accuracy of the Gleason grade segmentation. The model achieved a mean Dice of 75.6% on the test cohort and ranked 4th in the Gleason2019 Challenge with a score of 0.778 combined of Cohen&#39;s kappa and the f1-score.

preprint2020arXiv

Improving 3D Object Detection through Progressive Population Based Augmentation

Data augmentation has been widely adopted for object detection in 3D point clouds. However, all previous related efforts have focused on manually designing specific data augmentation methods for individual architectures. In this work, we present the first attempt to automate the design of data augmentation policies for 3D object detection. We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations. On the KITTI 3D detection test set, PPBA improves the StarNet detector by substantial margins on the moderate difficulty category of cars, pedestrians, and cyclists, outperforming all current state-of-the-art single-stage detection models. Additional experiments on the Waymo Open Dataset indicate that PPBA continues to effectively improve the StarNet and PointPillars detectors on a 20x larger dataset compared to KITTI. The magnitude of the improvements may be comparable to advances in 3D perception architectures and the gains come without an incurred cost at inference time. In subsequent experiments, we find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.

preprint2020arXiv

Learning to Recommend with Multiple Cascading Behaviors

Most existing recommender systems leverage user behavior data of one type only, such as the purchase behavior in E-commerce that is directly related to the business KPI (Key Performance Indicator) of conversion rate. Besides the key behavioral data, we argue that other forms of user behaviors also provide valuable signal, such as views, clicks, adding a product to shop carts and so on. They should be taken into account properly to provide quality recommendation for users. In this work, we contribute a new solution named NMTR (short for Neural Multi-Task Recommendation) for learning recommender systems from user multi-behavior data. We develop a neural network model to capture the complicated and multi-type interactions between users and items. In particular, our model accounts for the cascading relationship among different types of behaviors (e.g., a user must click on a product before purchasing it). To fully exploit the signal in the data of multiple types of behaviors, we perform a joint optimization based on the multi-task learning framework, where the optimization on a behavior is treated as a task. Extensive experiments on two real-world datasets demonstrate that NMTR significantly outperforms state-of-the-art recommender systems that are designed to learn from both single-behavior data and multi-behavior data. Further analysis shows that modeling multiple behaviors is particularly useful for providing recommendation for sparse users that have very few interactions.

preprint2020arXiv

Permutation Invariant Graph Generation via Score-Based Generative Modeling

Learning generative models for graph-structured data is challenging because graphs are discrete, combinatorial, and the underlying data distribution is invariant to the ordering of nodes. However, most of the existing generative models for graphs are not invariant to the chosen ordering, which might lead to an undesirable bias in the learned distribution. To address this difficulty, we propose a permutation invariant approach to modeling graphs, using the recent framework of score-based generative modeling. In particular, we design a permutation equivariant, multi-channel graph neural network to model the gradient of the data distribution at the input graph (a.k.a., the score function). This permutation equivariant model of gradients implicitly defines a permutation invariant distribution for graphs. We train this graph neural network with score matching and sample from it with annealed Langevin dynamics. In our experiments, we first demonstrate the capacity of this new architecture in learning discrete graph algorithms. For graph generation, we find that our learning approach achieves better or comparable results to existing models on benchmark datasets.

preprint2020arXiv

Shape-Oriented Convolution Neural Network for Point Cloud Analysis

Point cloud is a principal data structure adopted for 3D geometric information encoding. Unlike other conventional visual data, such as images and videos, these irregular points describe the complex shape features of 3D objects, which makes shape feature learning an essential component of point cloud analysis. To this end, a shape-oriented message passing scheme dubbed ShapeConv is proposed to focus on the representation learning of the underlying shape formed by each local neighboring point. Despite this intra-shape relationship learning, ShapeConv is also designed to incorporate the contextual effects from the inter-shape relationship through capturing the long-ranged dependencies between local underlying shapes. This shape-oriented operator is stacked into our hierarchical learning architecture, namely Shape-Oriented Convolutional Neural Network (SOCNN), developed for point cloud analysis. Extensive experiments have been performed to evaluate its significance in the tasks of point cloud classification and part segmentation.

preprint2020arXiv

Toward Predicting Success and Failure in CS2: A Mixed-Method Analysis

Factors driving success and failure in CS1 are the subject of much study but less so for CS2. This paper investigates the transition from CS1 to CS2 in search of leading indicators of success in CS2. Both CS1 and CS2 at the University of North Carolina Wilmington (UNCW) are taught in Python with annual enrollments of 300 and 150 respectively. In this paper, we report on the following research questions: 1) Are CS1 grades indicators of CS2 grades? 2) Does a quantitative relationship exist between CS2 course grade and a modified version of the SCS1 concept inventory? 3) What are the most challenging aspects of CS2, and how well does CS1 prepare students for CS2 from the student&#39;s perspective? We provide a quantitative analysis of 2300 CS1 and CS2 course grades from 2013--2019. In Spring 2019, we administered a modified version of the SCS1 concept inventory to 44 students in the first week of CS2. Further, 69 students completed an exit questionnaire at the conclusion of CS2 to gain qualitative student feedback on their challenges in CS2 and on how well CS1 prepared them for CS2. We find that 56% of students&#39; grades were lower in CS2 than CS1, 18% improved their grades, and 26% earned the same grade. Of the changes, 62% were within one grade point. We find a statistically significant correlation between the modified SCS1 score and CS2 grade points. Students identify linked lists and class/object concepts among the most challenging. Student feedback on CS2 challenges and the adequacy of their CS1 preparations identify possible avenues for improving the CS1-CS2 transition.

preprint2020arXiv

Training Deep Energy-Based Models with f-Divergence Minimization

Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging because of the intractable partition function. They are typically trained via maximum likelihood, using contrastive divergence to approximate the gradient of the KL divergence between data and model distribution. While KL divergence has many desirable properties, other f-divergences have shown advantages in training implicit density generative models such as generative adversarial networks. In this paper, we propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence. We introduce a corresponding optimization algorithm and prove its local convergence property with non-linear dynamical systems theory. Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.

preprint2020arXiv

Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-weighting

Unsupervised domain adaptation (UDA) for nuclei instance segmentation is important for digital pathology, as it alleviates the burden of labor-intensive annotation and domain shift across datasets. In this work, we propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images, by learning from fluorescence microscopy images. More specifically, we first propose a nuclei inpainting mechanism to remove the auxiliary generated objects in the synthesized images. Secondly, a semantic branch with a domain discriminator is designed to achieve panoptic-level domain adaptation. Thirdly, in order to avoid the influence of the source-biased features, we propose a task re-weighting mechanism to dynamically add trade-off weights for the task-specific loss functions. Experimental results on three datasets indicate that our proposed method outperforms state-of-the-art UDA methods significantly, and demonstrates a similar performance as fully supervised methods.

preprint2019arXiv

Unconventional crystal field splitting in non-centrosymmetric BaTiO$_3$ thin films

Understanding the crystal field splitting and orbital polarization in non-centrosymmetric systems such as ferroelectric materials is fundamentally important. In this study, taking BaTiO$_3$ (BTO) as a representative material we investigate titanium crystal field splitting and orbital polarization in non-centrosymmetric TiO$_6$ octahedra with resonant X-ray linear dichroism at Ti $L_{2,3}$-edge. The high-quality BaTiO$_3$ thin films were deposited on DyScO$_3$ (110) single crystal substrates in a layer-by-layer way by pulsed laser deposition. The reflection high-energy electron diffraction (RHEED) and element specific X-ray absorption spectroscopy (XAS) were performed to characterize the structural and electronic properties of the films. In sharp contrast to conventional crystal field splitting and orbital configuration ($d_{xz}$/$d_{yz}$ $<$ $d_{xy}$ $<$ $d_{3z^2-r^2}$ $<$ $d_{x^2-y^2}$ or $d_{xy}$ $<$ $d_{xz}$/$d_{yz}$ $<$ $d_{x^2-y^2}$ $<$ $d_{3z^2-r^2}$) according to Jahn-Teller effect, it is revealed that $d_{xz}$, $d_{yz}$, and $d_{xy}$ orbitals are nearly degenerate, whereas $d_{3z^2-r^2}$ and $d_{x^2-y^2}$ orbitals are split with an energy gap $\sim$ 100 meV in the epitaxial BTO films. The unexpected degenerate states $d_{xz}$/$d_{yz}$/$d_{xy}$ are coupled to Ti-O displacements resulting from competition between polar and Jahn-Teller distortions in non-centrosymmetric TiO$_6$ octhedra of BTO films. Our results provide a route to manipulate orbital degree of freedom by switching electric polarization in ferroelectric materials.