Source author record

Yang Song

Yang Song appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

91 works

37 topics

4 close collaborators

preprint · 2026 · arXiv

A Universal and Robust Framework for Multiple Gas Recognition Based-on Spherical Normalization-Coupled Mahalanobis Algorithm

Electronic nose (E-nose) systems face two interconnected challenges in open-set gas recognition: feature distribution shift caused by signal drift and decision boundary failure induced by unknown gas interference. Existing methods predominantly rely on Euclidean distance or conventional classifiers, failing to account for anisotropic feature distributions and dynamic signal intensity variations. To address these issues, this study proposes the Spherical Normalization coupled Mahalanobis (SNM) module, a universal post-processing module for open-set gas recognition. First, it achieves geometric decoupling through cascaded batch and L2 normalization, projecting features onto a unit hypersphere to eliminate signal intensity fluctuations. Second, it utilizes Mahalanobis distance to construct adaptive ellipsoidal decision boundaries that conform to the anisotropic feature geometry. The architecture-agnostic SNM-Module seamlessly integrates with mainstream backbones including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Transformer. Experiments on the public Vergara dataset demonstrate that the Transformer+SNM configuration achieves near-theoretical-limit performance in discriminating among multiple target gases, with an AUROC of 0.9977 and an unknown gas detection rate of 99.57% at 5% false positive rate, significantly outperforming state-of-the-art methods with a 3.0% AUROC improvement and 91.0% standard deviation reduction compared to Class Anchor Clustering (CAC). The module maintains exceptional robustness across five sensor positions, with standard deviations below 0.0028. This work effectively addresses the critical challenge of simultaneously achieving high accuracy and high stability in open-set gas recognition, providing solid support for industrial E-nose deployment.
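The two steps described in this abstract (cascaded batch and L2 normalization, then Mahalanobis-distance boundaries) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the ridge term, the 95% quantile threshold, and the toy feature centers are all assumptions introduced here, and the random features stand in for the output of a real backbone.

```python
import numpy as np

def spherical_normalize(x, mean, std, eps=1e-8):
    """Standardize with stored batch statistics, then L2-normalize each row."""
    z = (x - mean) / (std + eps)
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def fit_class_stats(feats, labels, ridge=1e-3):
    """Per-class mean and ridge-regularized inverse covariance (ridge is an assumption)."""
    stats = {}
    for c in np.unique(labels):
        fc = feats[labels == c]
        cov = np.cov(fc, rowvar=False) + ridge * np.eye(fc.shape[1])
        stats[int(c)] = (fc.mean(axis=0), np.linalg.inv(cov))
    return stats

def nearest_class(feats, stats):
    """Squared Mahalanobis distance to the closest class, plus that class label."""
    classes = sorted(stats)
    dists = []
    for c in classes:
        diff = feats - stats[c][0]
        dists.append(np.einsum("ij,jk,ik->i", diff, stats[c][1], diff))
    dists = np.stack(dists, axis=1)
    return dists.min(axis=1), np.array(classes)[dists.argmin(axis=1)]

# Synthetic stand-in for backbone features: three known gases, one unknown.
rng = np.random.default_rng(0)
centers = np.array([[4.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 4.0]])

def sample(center, n):
    """Draw n noisy feature vectors around a hypothetical class center."""
    return center + 0.1 * rng.standard_normal((n, 3))

train_x = np.vstack([sample(c, 200) for c in centers])
train_y = np.repeat(np.arange(3), 200)
mean, std = train_x.mean(axis=0), train_x.std(axis=0)

train_f = spherical_normalize(train_x, mean, std)
stats = fit_class_stats(train_f, train_y)
train_d, _ = nearest_class(train_f, stats)
threshold = np.quantile(train_d, 0.95)  # accepts about 95% of known training samples

test_x = np.vstack([sample(c, 50) for c in centers])
test_y = np.repeat(np.arange(3), 50)
known_d, known_pred = nearest_class(spherical_normalize(test_x, mean, std), stats)

unknown_x = sample(np.array([4.0, 4.0, 1.0]), 50)  # a direction no known class occupies
unknown_d, _ = nearest_class(spherical_normalize(unknown_x, mean, std), stats)

print("known accuracy:", (known_pred == test_y).mean())
print("unknown rejected:", (unknown_d > threshold).mean())
```

A sample is flagged as an unknown gas when its distance to every class exceeds the threshold. Because each class keeps its own covariance, the acceptance region is an ellipsoid shaped by that class's spread rather than a sphere of fixed radius.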

preprint · 2026 · arXiv

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey

Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time decision making attuned to environmental factors through the close integration of computational resources and data sources. Here we provide a comprehensive review of the current state of the art in edge deep learning, focusing on computer vision applications, in particular medical diagnostics. An overview of the foundational principles and technical advantages of edge deep learning is presented, emphasising the capacity of this technology to revolutionise a wide range of domains. Furthermore, we present a novel categorisation of edge hardware platforms based on performance and usage scenarios, facilitating platform selection and operational effectiveness. Following this, we dive into approaches to effectively implement deep neural networks on edge devices, encompassing methods such as lightweight design and model compression. Reviewing practical applications in the fields of computer vision in general and medical diagnostics in particular, we demonstrate the profound impact edge-deployed deep learning models can have in real-life situations. Finally, we provide an analysis of potential future directions and obstacles to the adoption of edge deep learning, with the intention to stimulate further investigations and advancements of intelligent edge deep learning solutions. This survey provides researchers and practitioners with a comprehensive reference shedding light on the critical role deep learning plays in the advancement of edge computing applications.

preprint · 2026 · arXiv

Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data

Autonomous navigation is crucial for both medical and industrial endoscopic robots, enabling safe and efficient exploration of narrow tubular environments without continuous human intervention. Avoiding contact with the inner walls has been a longstanding challenge for prior approaches. We present a follow-the-leader endoscopic robot based on a flexible continuum structure designed to minimize contact between the endoscope body and intestinal walls, thereby reducing patient discomfort. To achieve this objective, we propose a vision-based deep reinforcement learning framework guided by monocular depth estimation. A realistic intestinal simulation environment was constructed in NVIDIA Omniverse to train and evaluate autonomous navigation strategies. Furthermore, thousands of synthetic intraluminal images were generated using NVIDIA Replicator to fine-tune the Depth Anything model, enabling dense three-dimensional perception of the intestinal environment with a single monocular camera. Subsequently, we introduce a geometry-aware reward and penalty mechanism to enable accurate lumen tracking. Compared with the original Depth Anything model, our method improves δ1 depth accuracy by 39.2% and reduces the navigation J-index by 0.67 relative to the second-best method, demonstrating the robustness and effectiveness of the proposed approach.
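For readers unfamiliar with the depth metric quoted here, the sketch below assumes the abstract uses the conventional definition from the monocular depth estimation literature: the fraction of valid pixels whose ratio max(pred/gt, gt/pred) falls below 1.25. The function name and the sample values are illustrative only.

```python
def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) below the threshold."""
    # Keep only pixels where both depths are positive (valid measurements).
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)

# Hypothetical depths in metres: two predictions fall inside the 1.25 band.
gt_depth = [1.0, 2.0, 4.0, 8.0]
pred_depth = [1.1, 2.6, 3.9, 5.0]
print(delta1_accuracy(pred_depth, gt_depth))
```

The metric is symmetric in over- and under-estimation, so a prediction of 5.0 against a true depth of 8.0 is penalized the same way as 8.0 against 5.0.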

preprint · 2026 · arXiv

SPECTRA: Revealing the Full Spectrum of User Preferences via Distributional LLM Inference

Large Language Models (LLMs) are increasingly used to understand user preferences, typically via the direct generation of ranked item lists. However, this end-to-end generative paradigm inherits the bias and opacity of autoregressive decoding, over-emphasizing frequent (head) preferences and obscuring long-tail ones, thereby biasing personalization toward head preferences. To address this, we propose SPECTRA (Semantic Preference Extraction and Clustered TRAcking), which treats the LLM as an implicit probabilistic model by probing it to infer a probability distribution over interpretable preference clusters. In doing so, SPECTRA reframes user modeling from sequence generation with decoding heuristics to distributional inference, yielding explicit, cluster-level user preference representations. We evaluate SPECTRA on MovieLens, Yelp, and a large-scale short-video platform, demonstrating significant gains across three dimensions: SPECTRA achieves (i) distributional alignment, reducing Jensen-Shannon divergence to empirical distributions by 25% against strong baselines; (ii) long-tail exposure, reducing decoding-induced head concentration and increasing global exposure entropy by 30%; and (iii) downstream applications such as personalized ranking, translating these gains into a 40% NDCG boost on public datasets and a 7x improvement on ranking long-tail preferences against an industry-leading Transformer-based production baseline.
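The two distributional measures named in this abstract have standard definitions, sketched below in base-2 logarithms. The cluster distributions are made-up numbers for illustration, and nothing here reflects how SPECTRA itself obtains its distributions.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Hypothetical preference distributions over four clusters.
empirical = [0.40, 0.30, 0.20, 0.10]
head_heavy = [0.70, 0.10, 0.10, 0.10]  # concentrates mass on the head cluster
uniform = [0.25, 0.25, 0.25, 0.25]

print(js_divergence(empirical, head_heavy))
print(entropy(head_heavy), entropy(uniform))
```

Lower Jensen-Shannon divergence means the inferred distribution sits closer to the empirical one, while higher entropy means exposure is spread more evenly instead of piling onto head items.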

preprint · 2026 · arXiv

TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding

Urban mobility is naturally expressed both as trajectories in space and as natural-language descriptions of travel intent, constraints, and preferences. However, prior work rarely evaluates these two modalities together on the same real-world trajectories: trajectory modeling often stays geometry-centric, while language-centric mobility benchmarks frequently target route planning and tool use rather than fine-grained, verifiable alignment between text and the underlying route. We introduce TrajPrism, a multi-task benchmark for language-trajectory alignment that unifies (i) instruction-conditioned trajectory generation, (ii) language-driven semantic trajectory retrieval, and (iii) trajectory captioning, together with an evaluation protocol that measures trajectory fidelity, retrieval quality, and language groundedness. We construct TrajPrism by pairing real urban trajectories with judge-filtered language annotations generated under a four-dimensional travel-intent taxonomy. The benchmark contains 300K selected trajectories across Porto, San Francisco, and Beijing, yielding 2.1M task instances from three instruction variants, three retrieval queries, and one caption per trajectory. We further develop proof-of-concept models for each task: TrajAnchor for instruction-conditioned trajectory generation, TrajFuse for semantic trajectory retrieval, and TrajRap for trajectory captioning. These models instantiate the proposed tasks and show that geometry-only trajectory baselines leave a large gap on our protocol, especially where language is part of the input-output interface. We release TrajPrism with code and a reproducible annotation pipeline that is designed to be portable across cities, given compatible trajectory inputs and map resources.

preprint · 2026 · arXiv

When Rules Fall Short: Agent-Driven Discovery of Emerging Content Issues in Short Video Platforms

Trends on short-video platforms evolve at a rapid pace, with new content issues emerging every day that fall outside the coverage of existing annotation policies. However, traditional human-driven discovery of emerging issues is too slow, which leads to delayed updates of annotation policies and poses a major challenge for effective content governance. In this work, we propose an automatic issue discovery method based on multimodal LLM agents. Our approach automatically recalls short videos containing potential new issues and applies a two-stage clustering strategy to group them, with each cluster corresponding to a newly discovered issue. The agent then generates updated annotation policies from these clusters, thereby extending coverage to these emerging issues. Our agent has been deployed in the real system. Both offline and online experiments demonstrate that this agent-based method significantly improves the effectiveness of emerging-issue discovery (with an F1 score improvement of over 20%) and enhances the performance of subsequent issue governance (reducing the view count of problematic videos by approximately 15%). More importantly, compared to manual issue discovery, it greatly reduces time costs and substantially accelerates the iteration of annotation policies.

preprint · 2026 · arXiv

X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Inspired by the development of OpenClaw, there is a growing demand for mobile-based personal agents capable of handling complex and intuitive interactions. In this technical report, we introduce X-OmniClaw, a unified mobile agent designed for multimodal understanding and interaction in the Android ecosystem. This unified architecture of perception, memory, and action enables the agent to handle complex mobile tasks with high contextual awareness. Specifically, Omni Perception provides a unified multimodal ingress pipeline that integrates UI states, real-world visual contexts, and speech inputs, leveraging a temporal alignment module to decompose raw data into structured multimodal intent representations. Omni Memory leverages multimodal memory optimization to enhance personalized intelligence by integrating runtime working memory for task continuity with long-term personal memory distilled from local data, enabling highly context-aware and personalized interactions. Finally, Omni Action employs a hybrid grounding strategy that combines structural XML metadata with visual perception for robust interaction. Through Behavior Cloning and Trajectory Replay, the system captures user navigation as reusable skills, enabling precise direct-access execution. Demonstrations across diverse scenarios show that X-OmniClaw effectively enhances interaction efficiency and task reliability, providing a practical architectural blueprint for the next generation of mobile-native personal assistants.

preprint · 2024 · arXiv

SponTTS: modeling and transferring spontaneous style for TTS

Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like a smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the limitation of high-quality spontaneous data constrains spontaneous speech generation for speakers without spontaneous data. To address these problems, we propose SponTTS, a two-stage approach based on neural bottleneck (BN) features to model and transfer spontaneous style for TTS. In the first stage, we adopt a Conditional Variational Autoencoder (CVAE) to capture spontaneous prosody from a BN feature and involve the spontaneous phenomena by the constraint of spontaneous phenomena embedding prediction loss. Besides, we introduce a flow-based predictor to predict a latent spontaneous style representation from the text, which enriches the prosody and context-specific spontaneous phenomena during inference. In the second stage, we adopt a VITS-like module to transfer the spontaneous style learned in the first stage to the target speakers. Experiments demonstrate that SponTTS is effective in modeling spontaneous style and transferring the style to the target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity. The zero-shot spontaneous style TTS test further verifies the generalization and robustness of SponTTS in generating spontaneous speech for unseen speakers.

preprint · 2023 · arXiv

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

Interviews are regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice mock interviews with each other. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.

preprint · 2023 · arXiv

TractGraphCNN: anatomically informed graph CNN for classification using diffusion MRI tractography

The structure and variability of the brain's connections can be investigated via prediction of non-imaging phenotypes using neural networks. However, known neuroanatomical relationships between input features are generally ignored in network design. We propose TractGraphCNN, a novel, anatomically informed graph CNN framework for machine learning tasks using diffusion MRI tractography. An EdgeConv module aggregates features from anatomically similar white matter connections indicated by graph edges, and an attention module enables interpretation of predictive white matter tracts. Results in a sex prediction testbed task demonstrate strong performance of TractGraphCNN in two large datasets (HCP and ABCD). Graphs informed by white matter geometry demonstrate higher performance than graphs informed by gray matter connectivity. Overall, the bilateral cingulum and left middle longitudinal fasciculus are consistently highly predictive of sex. This work shows the potential of incorporating anatomical information, especially known anatomical similarities between input features, to guide convolutions in neural networks.

preprint · 2022 · arXiv

A Model-Agnostic Causal Learning Framework for Recommendation using Search Data

Machine-learning-based recommender systems (RSs) have become an effective means to help people automatically discover their interests. Existing models often represent the rich information for recommendation, such as items, users, and contexts, as embedding vectors and leverage them to predict users' feedback. In the view of causal analysis, the associations between these embedding vectors and users' feedback are a mixture of the causal part that describes why an item is preferred by a user, and the non-causal part that merely reflects the statistical dependencies between users and items, for example, the exposure mechanism, public opinions, display position, etc. However, existing RSs mostly ignore the striking differences between the causal parts and non-causal parts when using these embedding vectors. In this paper, we propose a model-agnostic framework named IV4Rec that can effectively decompose the embedding vectors into these two parts, hence enhancing recommendation results. Specifically, we jointly consider users' behaviors in search scenarios and recommendation scenarios. Adopting the concepts in causal analysis, we embed users' search behaviors as instrumental variables (IVs), to help decompose original embedding vectors in recommendation, i.e., treatments. IV4Rec then combines the two parts through deep neural networks and uses the combined results for recommendation. IV4Rec is model-agnostic and can be applied to a number of existing RSs such as DIN and NRHUB. Experimental results on both public and proprietary industrial datasets demonstrate that IV4Rec consistently enhances RSs and outperforms a framework that jointly considers search and recommendation.

preprint · 2022 · arXiv

A Wearable ECG Monitor for Deep Learning Based Real-Time Cardiovascular Disease Detection

Cardiovascular disease has become one of the most significant threats endangering human life and health. Recently, Electrocardiogram (ECG) monitoring has been transformed into remote cardiac monitoring by Holter surveillance. However, the widely used Holter can bring a great deal of discomfort and inconvenience to the individuals who carry it. We developed a new wireless ECG patch in this work and applied a deep learning framework based on the Convolutional Neural Network (CNN) and Long Short-term Memory (LSTM) models. However, we find that the models using the existing techniques are not able to differentiate two main heartbeat types (Supraventricular premature beat and Atrial fibrillation) in our newly obtained dataset, resulting in a low accuracy of 58.0%. We proposed a semi-supervised method to process the badly labelled data samples using confidence-level-based training. The experimental results show that the proposed method can approach an average accuracy of 90.2%, i.e., 5.4% higher than the accuracy of conventional ECG classification methods.

preprint · 2022 · arXiv

Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou

Customer Life Time Value (LTV) is the expected total revenue that a single user can bring to a business. It is widely used in a variety of business scenarios to make operational decisions when acquiring new customers. Modeling LTV is a challenging problem, due to its complex and mutable data distribution. Existing approaches either directly learn from posterior feature distributions or leverage statistical models that make strong assumptions on prior distributions, both of which fail to capture those mutable distributions. In this paper, we propose a complete set of industrial-level LTV modeling solutions. Specifically, we introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans, which greatly improves model performance. We further introduce a Multi Distribution Multi Experts (MDME) module based on the Divide-and-Conquer idea, which transforms the severely imbalanced distribution modeling problem into a series of relatively balanced sub-distribution modeling problems and hence greatly reduces the modeling complexity. In addition, a novel evaluation metric, Mutual Gini, is introduced to better measure the distribution difference between the estimated value and the ground-truth label based on the Lorenz Curve. The ODMN framework has been successfully deployed in many business scenarios of Kuaishou, and achieved great performance. Extensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to state-of-the-art baselines including ZILN and Two-Stage XGBoost models.

preprint · 2022 · arXiv

Computer-Aided Extraction of Select MRI Markers of Cerebral Small Vessel Disease: A Systematic Review

Cerebral small vessel disease (CSVD) is a major vascular contributor to cognitive impairment in ageing, including dementias. Imaging remains the most promising method for in vivo studies of CSVD. To replace the subjective and laborious visual rating approaches, emerging studies have applied state-of-the-art artificial intelligence to extract imaging biomarkers of CSVD from MRI scans. We aimed to summarise published computer-aided methods to examine three imaging biomarkers of CSVD, namely cerebral microbleeds (CMB), dilated perivascular spaces (PVS), and lacunes of presumed vascular origin. Seventy-one classical image processing, classical machine learning, and deep learning studies were identified. CMB and PVS have been better studied, compared to lacunes. While good performance metrics have been achieved in local test datasets, there have not been generalisable pipelines validated in different research or clinical cohorts. Transfer learning and weak supervision techniques have been applied to accommodate the limitations in training data. Future studies could consider pooling data from multiple sources to increase diversity, and validating the performance of the methods using both image processing metrics and associations with clinical measures.

preprint · 2022 · arXiv

Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement

Recent advances in unsupervised domain adaptation (UDA) techniques have witnessed great success in cross-domain computer vision tasks, enhancing the generalization ability of data-driven deep learning architectures by bridging the domain distribution gaps. Most UDA-based cross-domain object detection methods alleviate the domain bias by inducing domain-invariant feature generation via an adversarial learning strategy. However, their domain discriminators have limited classification ability due to the unstable adversarial training process. Therefore, the features they extract cannot be perfectly domain-invariant and still contain domain-private factors, which hinders further reduction of the cross-domain discrepancy. To tackle this issue, we design a Domain Disentanglement Faster-RCNN (DDF) to eliminate the source-specific information in the features for detection task learning. Our DDF method facilitates the feature disentanglement at the global and local stages, with a Global Triplet Disentanglement (GTD) module and an Instance Similarity Disentanglement (ISD) module, respectively. By outperforming state-of-the-art methods on four benchmark UDA object detection tasks, our DDF method is demonstrated to be effective with wide applicability.

preprint · 2022 · arXiv

Density Ratio Estimation via Infinitesimal Classification

Density ratio estimation (DRE) is a fundamental machine learning technique for comparing two probability distributions. However, existing methods struggle in high-dimensional settings, as it is difficult to accurately compare probability distributions based on finite samples. In this work we propose DRE-∞, a divide-and-conquer approach to reduce DRE to a series of easier subproblems. Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions. We then estimate the instantaneous rate of change of the bridge distributions indexed by time (the "time score") -- a quantity defined analogously to data (Stein) scores -- with a novel time score matching objective. Crucially, the learned time scores can then be integrated to compute the desired density ratio. In addition, we show that traditional (Stein) scores can be used to obtain integration paths that connect regions of high density in both distributions, improving performance in practice. Empirically, we demonstrate that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
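The claim that time scores can be integrated to recover a density ratio can be checked on a toy case where everything is known in closed form. The sketch below is not the paper's method, which learns the time score with a neural network. Here the bridge is assumed to be the Gaussian family p_t = N(t·MU, 1) for t in [0, 1], the time score is written analytically, and the orientation of the ratio (p_1 over p_0) is a choice made for this example.

```python
import math

MU = 2.0  # hypothetical mean of the endpoint distribution p_1 = N(MU, 1)

def time_score(x, t):
    """d/dt log p_t(x) for the assumed bridge p_t = N(t * MU, 1)."""
    return (x - t * MU) * MU

def log_ratio_by_integration(x, steps=1000):
    """Midpoint-rule integral of the time score over t in [0, 1]."""
    h = 1.0 / steps
    return sum(time_score(x, (k + 0.5) * h) for k in range(steps)) * h

def log_ratio_exact(x):
    """Closed form of log p_1(x) - log p_0(x) for the two unit-variance Gaussians."""
    return MU * x - MU ** 2 / 2

for x in (-1.0, 0.5, 3.0):
    print(x, log_ratio_by_integration(x), log_ratio_exact(x))
```

The integral of the time score from t = 0 to t = 1 equals log p_1(x) − log p_0(x) by the fundamental theorem of calculus, which is why the numerical and exact columns agree.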

preprint · 2022 · arXiv

Disentangling Long and Short-Term Interests for Recommendation

Modeling users' long-term and short-term interests is crucial for accurate recommendation. However, since there is no manually annotated label for user interests, existing approaches always follow the paradigm of entangling these two aspects, which may lead to inferior recommendation accuracy and interpretability. In this paper, to address this, we propose a Contrastive learning framework to disentangle Long and Short-term interests for Recommendation (CLSR) with self-supervision. Specifically, we first propose two separate encoders to independently capture user interests of different time scales. We then extract long-term and short-term interest proxies from the interaction sequences, which serve as pseudo labels for user interests. Then pairwise contrastive tasks are designed to supervise the similarity between interest representations and their corresponding interest proxies. Finally, since the importance of long-term and short-term interests is dynamically changing, we propose to adaptively aggregate them through an attention-based network for prediction. We conduct experiments on two large-scale real-world datasets for e-commerce and short-video recommendation. Empirical results show that our CLSR consistently outperforms all state-of-the-art models with significant improvements: GAUC is improved by over 0.01, and NDCG is improved by over 4%. Further counterfactual evaluations demonstrate that stronger disentanglement of long and short-term interests is successfully achieved by CLSR. The code and data are available at https://github.com/tsinghua-fib-lab/CLSR.

preprint · 2022 · arXiv

DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology Image Analysis

We present a novel weakly-supervised framework for classifying whole slide images (WSIs). WSIs, due to their gigapixel resolution, are commonly processed by patch-wise classification with patch-level labels. However, patch-level labels require precise annotations, which are expensive and usually unavailable for clinical data. With image-level labels only, patch-wise classification would be sub-optimal due to inconsistency between the patch appearance and image-level label. To address this issue, we posit that WSI analysis can be effectively conducted by integrating information at both high magnification (local) and low magnification (regional) levels. We auto-encode the visual signals in each patch into a latent embedding vector representing local information, and down-sample the raw WSI to hardware-acceptable thumbnails representing regional information. The WSI label is then predicted with a Dual-Stream Network (DSNet), which takes the transformed local patch embeddings and multi-scale thumbnail images as inputs and can be trained by the image-level label only. Experiments conducted on two large-scale public datasets demonstrate that our method outperforms all recent state-of-the-art weakly-supervised WSI classification methods.

preprint · 2022 · arXiv

Explainability in Graph Neural Networks: An Experimental Survey

Graph neural networks (GNNs) have been extensively developed for graph representation learning in various application domains. However, like other neural network models, GNNs suffer from the black-box problem, as people cannot understand the mechanism underlying them. To solve this problem, several GNN explainability methods have been proposed to explain the decisions made by GNNs. In this survey, we give an overview of the state-of-the-art GNN explainability methods and how they are evaluated. Furthermore, we propose a new evaluation metric and conduct thorough experiments to compare GNN explainability methods on real-world datasets. We also suggest future directions for GNN explainability.

preprint · 2022 · arXiv

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation

Predicting molecular conformations from molecular graphs is a fundamental problem in cheminformatics and drug discovery. Recently, significant progress has been achieved with machine learning approaches, especially with deep generative models. Inspired by the diffusion process in classical non-equilibrium thermodynamics where heated particles will diffuse from original states to a noise distribution, in this paper, we propose a novel generative model named GeoDiff for molecular conformation prediction. GeoDiff treats each atom as a particle and learns to directly reverse the diffusion process (i.e., transforming from a noise distribution to stable conformations) as a Markov chain. Modeling such a generation process is however very challenging as the likelihood of conformations should be roto-translational invariant. We theoretically show that Markov chains evolving with equivariant Markov kernels can induce an invariant distribution by design, and further propose building blocks for the Markov kernels to preserve the desirable equivariance property. The whole framework can be efficiently trained in an end-to-end fashion by optimizing a weighted variational lower bound to the (conditional) likelihood. Experiments on multiple benchmarks show that GeoDiff is superior or comparable to existing state-of-the-art approaches, especially on large molecules.

preprint · 2022 · arXiv

GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains

Cereal grains are a vital part of human diets and are important commodities for people's livelihood and international trade. Grain Appearance Inspection (GAI) serves as one of the crucial steps for the determination of grain quality and grain stratification for proper circulation, storage and food processing, etc. GAI is routinely performed manually by qualified inspectors with the aid of some hand tools. Automated GAI has the benefit of greatly assisting inspectors with their jobs but has been limited due to the lack of datasets and clear definitions of the tasks. In this paper we formulate GAI as three ubiquitous computer vision tasks: fine-grained recognition, domain adaptation and out-of-distribution recognition. We present a large-scale and publicly available cereal grains dataset called GrainSpace. Specifically, we construct three types of device prototypes for data acquisition, and collect a total of 5.25 million images annotated by professional inspectors. The grain samples, including wheat, maize and rice, are collected from five countries and more than 30 regions. We also develop a comprehensive benchmark based on semi-supervised learning and self-supervised learning techniques. To the best of our knowledge, GrainSpace is the first publicly released dataset for cereal grain inspection.

preprint · 2022 · arXiv

HybridGNN: Learning Hybrid Representation in Multiplex Heterogeneous Networks

Recently, graph neural networks have shown their superiority in modeling the complex topological structures in heterogeneous network-based recommender systems. Due to the diverse interactions among nodes and abundant semantics emerging from diverse types of nodes and edges, there is surging research interest in learning expressive node representations in multiplex heterogeneous networks. One of the most important tasks in recommender systems is to predict the potential connection between two nodes under a specific edge type (i.e., relationship). Although existing studies utilize explicit metapaths to aggregate neighbors, in practice they only consider intra-relationship metapaths and thus fail to leverage the potential uplift from inter-relationship information. Moreover, it is not always straightforward to exploit inter-relationship metapaths comprehensively under diverse relationships, especially with the increasing number of node and edge types. In addition, contributions of different relationships between two nodes are difficult to measure. To address these challenges, we propose HybridGNN, an end-to-end GNN model with hybrid aggregation flows and hierarchical attentions to fully utilize the heterogeneity in the multiplex scenarios. Specifically, HybridGNN applies a randomized inter-relationship exploration module to exploit the multiplexity property among different relationships. Then, our model leverages hybrid aggregation flows under intra-relationship metapaths and randomized exploration to learn the rich semantics. To explore the importance of different aggregation flows and take advantage of the multiplexity property, we bring forward a novel hierarchical attention module which leverages both metapath-level attention and relationship-level attention. Extensive experimental results suggest that HybridGNN achieves the best performance compared to several state-of-the-art baselines.

preprint · 2022 · arXiv

Leveraging Search History for Improving Person-Job Fit

As the core technique of online recruitment platforms, person-job fit can improve hiring efficiency by accurately matching job positions with qualified candidates. However, existing studies mainly focus on the recommendation scenario, while neglecting another important channel for linking positions with job seekers, i.e., search. Intuitively, search history contains rich user behavior in job seeking, providing important evidence of users' job intentions. In this paper, we present a novel Search History enhanced Person-Job Fit model, named SHPJF. To utilize both text content from jobs/resumes and search histories from users, we propose two components with different purposes. For the text matching component, we design a BERT-based text encoder for capturing the semantic interaction between resumes and job descriptions. For the intention modeling component, we design two kinds of intention modeling approaches based on the Transformer architecture, using either the click sequence or the query text sequence. To capture underlying job intentions, we further propose an intention clustering technique to identify and summarize the major intentions from search logs. Extensive experiments on a large real-world recruitment dataset have demonstrated the effectiveness of our approach.

preprint · 2022 · arXiv

Modeling Two-Way Selection Preference for Person-Job Fit

Person-job fit is the core technique of online recruitment platforms, which can improve the efficiency of recruitment by accurately matching job positions with job seekers. Existing works mainly focus on modeling the unidirectional process or overall matching. However, recruitment is a two-way selection process, which means that both the candidate and the employer involved in the interaction should meet each other's expectations, rather than only one side being satisfied. In this paper, we propose a dual-perspective graph representation learning approach to model directed interactions between candidates and jobs. To model the two-way selection preference from the dual perspective of job seekers and employers, we incorporate two different nodes for each candidate (or job) and characterize both successful matching and failed matching via a unified dual-perspective interaction graph. To learn dual-perspective node representations effectively, we design an optimization algorithm, which involves a quadruple-based loss and a dual-perspective contrastive learning loss. Extensive experiments on three large real-world recruitment datasets have shown the effectiveness of our approach.

preprint · 2022 · arXiv

Multiple Sclerosis Lesion Analysis in Brain Magnetic Resonance Images: Techniques and Clinical Applications

Multiple sclerosis (MS) is a chronic inflammatory and degenerative disease of the central nervous system, characterized by the appearance of focal lesions in the white and gray matter that topographically correlate with an individual patient's neurological symptoms and signs. Magnetic resonance imaging (MRI) provides detailed in-vivo structural information, permitting the quantification and categorization of MS lesions that critically inform disease management. Traditionally, MS lesions have been manually annotated on 2D MRI slices, a process that is inefficient and prone to inter-/intra-observer errors. Recently, automated statistical imaging analysis techniques have been proposed to detect and segment MS lesions based on MRI voxel intensity. However, their effectiveness is limited by the heterogeneity of both MRI data acquisition techniques and the appearance of MS lesions. By learning complex lesion representations directly from images, deep learning techniques have achieved remarkable breakthroughs in the MS lesion segmentation task. Here, we provide a comprehensive review of state-of-the-art automatic statistical and deep-learning MS segmentation methods and discuss current and future clinical applications. Further, we review technical strategies, such as domain adaptation, to enhance MS lesion segmentation in real-world clinical settings.

preprint · 2022 · arXiv

Partial Graph Reasoning for Neural Network Regularization

Regularizers help deep neural networks prevent feature co-adaptations. Dropout, as a commonly used regularization technique, stochastically disables neuron activations during network optimization. However, such complete feature disposal can affect the feature representation and network understanding. Toward better descriptions of latent representations, we present DropGraph, which learns a regularization function by constructing a stand-alone graph from the backbone features. DropGraph first samples stochastic spatial feature vectors and then incorporates graph reasoning methods to generate feature map distortions. This add-on graph regularizes the network during training and can be completely skipped during inference. We provide intuitions on the linkage between graph reasoning and Dropout, with further discussions on how the partial graph reasoning method reduces feature correlations. To this end, we extensively study the modeling of graph vertex dependencies and the utilization of the graph for distorting backbone feature maps. DropGraph was validated on 4 tasks with a total of 8 different datasets. The experimental results show that our method outperforms other state-of-the-art regularizers while leaving the base model structure unmodified during inference.

preprint · 2022 · arXiv

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Guided image synthesis enables everyday users to create and edit photo-realistic images with minimum effort. The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image. Existing GAN-based methods attempt to achieve such balance using either conditional GANs or GAN inversions, which are challenging and often require additional training data or loss functions for individual applications. To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). Given an input image with a user guide of any type, SDEdit first adds noise to the input, then denoises the resulting image through the SDE prior to increase its realism. SDEdit does not require task-specific training or inversions and can naturally achieve the balance between realism and faithfulness. SDEdit significantly outperforms state-of-the-art GAN-based methods by up to 98.09% on realism and 91.72% on overall satisfaction scores, according to a human perception study, on multiple tasks, including stroke-based image synthesis and editing as well as image compositing.
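The "add noise, then denoise" procedure in this abstract can be illustrated with a one-dimensional toy, sketched below under strong assumptions. The "data" distribution is a Gaussian N(MU, S^2), so the score of the noised distribution is known in closed form and no neural network is needed. The noise schedule sigma(t) = SIGMA_MAX * t, the constants, and the function name `sdedit_toy` are all hypothetical, and the reverse-time update is a plain Euler-Maruyama step rather than the authors' sampler.

```python
import math
import random

MU, S = 0.0, 0.5      # toy 1-D "realistic data" distribution N(MU, S^2) (assumption)
SIGMA_MAX = 2.0       # noise level reached at t = 1 (assumption)

def sigma(t):
    """Hypothetical linear noise schedule."""
    return SIGMA_MAX * t

def score(x, t):
    """Exact score of the noised toy distribution N(MU, S^2 + sigma(t)^2)."""
    return -(x - MU) / (S ** 2 + sigma(t) ** 2)

def sdedit_toy(guide, t0, n_steps=200, rng=random):
    """Perturb `guide` to noise level sigma(t0), then integrate the reverse SDE to t = 0."""
    # Step 1: add noise to the user guide.
    x = guide + sigma(t0) * rng.gauss(0.0, 1.0)
    # Step 2: denoise with reverse-time Euler-Maruyama steps from t0 down to 0.
    dt = t0 / n_steps
    for i in range(n_steps):
        t = t0 - i * dt
        g2 = 2.0 * SIGMA_MAX ** 2 * t   # g(t)^2 = d(sigma(t)^2)/dt for this schedule
        x = x + g2 * score(x, t) * dt + math.sqrt(g2 * dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
strong = [sdedit_toy(3.0, 1.0, rng=rng) for _ in range(300)]  # large t0: more "realism"
weak = [sdedit_toy(3.0, 0.1, rng=rng) for _ in range(300)]    # small t0: more "faithfulness"
print(sum(strong) / len(strong), sum(weak) / len(weak))
```

In this toy, a large starting time `t0` pulls samples toward the data distribution around `MU`, while a small `t0` leaves them close to the guide value. This mirrors the trade-off between realism and faithfulness described in the abstract.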

preprint · 2022 · arXiv

Solving Inverse Problems in Medical Imaging with Score-Based Generative Models

Reconstructing medical images from partial measurements is an important inverse problem in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). Existing solutions based on machine learning typically train a model to directly map measurements to medical images, leveraging a training dataset of paired images and measurements. These measurements are typically synthesized from images using a fixed physical model of the measurement process, which hinders the generalization capability of models to unknown measurement processes. To address this issue, we propose a fully unsupervised technique for inverse problem solving, leveraging the recently introduced score-based generative models. Specifically, we first train a score-based generative model on medical images to capture their prior distribution. Given measurements and a physical model of the measurement process at test time, we introduce a sampling method to reconstruct an image consistent with both the prior and the observed measurements. Our method does not assume a fixed measurement process during training, and can thus be flexibly adapted to different measurement processes at test time. Empirically, we observe comparable or better performance to supervised learning techniques in several medical imaging tasks in CT and MRI, while demonstrating significantly better generalization to unknown measurement processes.

preprint · 2022 · arXiv

SupWMA: Consistent and Efficient Tractography Parcellation of Superficial White Matter with Deep Learning

White matter parcellation classifies tractography streamlines into clusters or anatomically meaningful tracts to enable quantification and visualization. Most parcellation methods focus on the deep white matter (DWM), while fewer methods address the superficial white matter (SWM) due to its complexity. We propose a deep-learning-based framework, Superficial White Matter Analysis (SupWMA), that performs an efficient and consistent parcellation of 198 SWM clusters from whole-brain tractography. A point-cloud-based network is modified for our SWM parcellation task, and supervised contrastive learning enables more discriminative representations between plausible streamlines and outliers. We perform evaluation on a large tractography dataset with ground truth labels and on three independently acquired testing datasets from individuals across ages and health conditions. Compared to several state-of-the-art methods, SupWMA obtains a highly consistent and accurate SWM parcellation result. In addition, the computational speed of SupWMA is much faster than other methods.

preprint · 2022 · arXiv

Toward Consistent and Efficient Map-based Visual-inertial Localization: Theory Framework and Filter Design

This paper focuses on designing a consistent and efficient filter for map-based visual-inertial localization. First, we propose a new Lie group with its algebra, based on which a novel invariant extended Kalman filter (invariant EKF) is designed. We theoretically prove that, when we do not consider the uncertainty of the map information, the proposed invariant EKF can naturally maintain the correct observability properties of the system. To consider the uncertainty of the map information, we introduce a Schmidt filter. With the Schmidt filter, the uncertainty of the map information can be taken into consideration to avoid over-confident estimation while the computation cost only increases linearly with the size of the map keyframes. In addition, we introduce an easily implemented observability-constrained technique because directly combining the invariant EKF with the Schmidt filter cannot maintain the correct observability properties of the system that considers the uncertainty of the map information. Finally, we validate our proposed system's high consistency, accuracy, and efficiency via extensive simulations and real-world experiments.

preprint · 2022 · arXiv

Towards Bi-directional Skip Connections in Encoder-Decoder Architectures and Beyond

U-Net, as an encoder-decoder architecture with forward skip connections, has achieved promising results in various medical image analysis tasks. Many recent approaches have also extended U-Net with more complex building blocks, which typically increase the number of network parameters considerably. Such complexity makes the inference stage highly inefficient for clinical applications. Towards an effective yet economical segmentation network design, in this work, we propose backward skip connections that bring decoded features back to the encoder. Our design can be jointly adopted with forward skip connections in any encoder-decoder architecture, forming a recurrence structure without introducing extra parameters. With the backward skip connections, we propose a U-Net based network family, namely Bi-directional O-shape networks, which set new benchmarks on multiple public medical imaging segmentation datasets. On the other hand, with the plainest architecture (BiO-Net), network computations inevitably increase along with the pre-set recurrence time. We have thus studied the deficiency bottleneck of such recurrent design and propose a novel two-phase Neural Architecture Search (NAS) algorithm, namely BiX-NAS, to search for the best multi-scale bi-directional skip connections. The ineffective skip connections are then discarded to reduce computational costs and speed up network inference. The final searched BiX-Net yields the least network complexity and outperforms other state-of-the-art counterparts by large margins. We evaluate our methods on both 2D and 3D segmentation tasks in a total of six datasets. Extensive ablation studies have also been conducted to provide a comprehensive analysis for our proposed methods.

preprint · 2022 · arXiv

White Matter Tracts are Point Clouds: Neuropsychological Score Prediction and Critical Region Localization via Geometric Deep Learning

White matter tract microstructure has been shown to influence neuropsychological scores of cognitive performance. However, prediction of these scores from white matter tract data has not been attempted. In this paper, we propose a deep-learning-based framework for neuropsychological score prediction using microstructure measurements estimated from diffusion magnetic resonance imaging (dMRI) tractography, focusing on predicting performance on a receptive vocabulary assessment task based on a critical fiber tract for language, the arcuate fasciculus (AF). We directly utilize information from all points in a fiber tract, without the need to average data along the fiber as is traditionally required by diffusion MRI tractometry methods. Specifically, we represent the AF as a point cloud with microstructure measurements at each point, enabling adoption of point-based neural networks. We improve prediction performance with the proposed Paired-Siamese Loss that utilizes information about differences between continuous neuropsychological scores. Finally, we propose a Critical Region Localization (CRL) algorithm to localize informative anatomical regions containing points with strong contributions to the prediction results. Our method is evaluated on data from 806 subjects from the Human Connectome Project dataset. Results demonstrate superior neuropsychological score prediction performance compared to baseline methods. We discover that critical regions in the AF are strikingly consistent across subjects, with the highest number of strongly contributing points located in frontal cortical regions (i.e., the rostral middle frontal, pars opercularis, and pars triangularis), which are strongly implicated as critical areas for language processes.

preprint · 2021 · arXiv

Anytime Sampling for Autoregressive Models via Ordered Autoencoding

Autoregressive models are widely used for tasks such as image and audio generation. The sampling process of these models, however, does not allow interruptions and cannot adapt to real-time computational resources. This challenge impedes the deployment of powerful autoregressive models, which involve a slow sampling process that is sequential in nature and typically scales linearly with respect to the data dimension. To address this difficulty, we propose a new family of autoregressive models that enables anytime sampling. Inspired by Principal Component Analysis, we learn a structured representation space where dimensions are ordered based on their importance with respect to reconstruction. Using an autoregressive model in this latent space, we trade off sample quality for computational efficiency by truncating the generation process before decoding into the original data space. Experimentally, we demonstrate in several image and audio generation tasks that sample quality degrades gracefully as we reduce the computational budget for sampling. The approach suffers almost no loss in sample quality (measured by FID) using only 60% to 80% of all latent dimensions for image data. Code is available at https://github.com/Newbeeer/Anytime-Auto-Regressive-Model .

preprint · 2021 · arXiv

CellTrack R-CNN: A Novel End-To-End Deep Neural Network for Cell Segmentation and Tracking in Microscopy Images

Cell segmentation and tracking in microscopy images are of great significance to new discoveries in biology and medicine. In this study, we propose a novel approach to combine cell segmentation and cell tracking into a unified end-to-end deep learning based framework, where cell detection and segmentation are performed with a current instance segmentation pipeline and cell tracking is implemented by integrating Siamese Network with the pipeline. Besides, tracking performance is improved by incorporating spatial information into the network and fusing spatial and visual prediction. Our approach was evaluated on the DeepCell benchmark dataset. Despite being simple and efficient, our method outperforms state-of-the-art algorithms in terms of both cell segmentation and cell tracking accuracies.

preprint · 2021 · arXiv

Crystal orientation-dependent oxidation of epitaxial TiN films with tunable plasmonics

Titanium nitride (TiN) is a paradigm of refractory transition metal nitrides with great potential in vast applications. Generally, the plasmonic performance of TiN can be tuned by oxidation, which was thought to be only temperature-, oxygen partial pressure-, and time-dependent. Regarding the role of crystallographic orientation in the oxidation and resultant optical properties of TiN films, little is known thus far. Here we reveal that both the oxidation resistance behavior and the plasmonic performance of epitaxial TiN films follow the order of (001) < (110) < (111). The effects of crystallographic orientation on the lattice constants, optical properties, and oxidation levels of epitaxial TiN films have been systematically studied by combined high-resolution X-ray diffraction, spectroscopic ellipsometry, X-ray absorption spectroscopy, and X-ray photoemission spectroscopy. To further understand the role of crystallographic orientation in the initial oxidation process of TiN films, density-functional-theory calculations are carried out, indicating the energy cost of oxidation is (001) < (110) < (111), consistent with the experiments. The superior endurance of the (111) orientation against mild oxidation can largely alleviate the previously stringent technical requirements for the growth of TiN films with high plasmonic performance. The crystallographic orientation can also offer an effective controlling parameter to design TiN-based plasmonic devices with desired peculiarity, e.g., superior chemical stability against mild oxidation or large optical tunability upon oxidation.

preprint · 2021 · arXiv

Extremely Small BERT Models from Mixed-Vocabulary Training

Pretrained language models like BERT have achieved good results on NLP tasks, but are impractical on resource-limited devices due to memory footprint. A large fraction of this footprint comes from the input embeddings with large input vocabulary and embedding dimensions. Existing knowledge distillation methods used for model compression cannot be directly applied to train student models with reduced vocabulary sizes. To this end, we propose a distillation method to align the teacher and student embeddings via mixed-vocabulary training. Our method compresses BERT-LARGE to a task-agnostic model with smaller vocabulary and hidden dimensions, which is an order of magnitude smaller than other distilled BERT models and offers a better size-accuracy trade-off on language understanding benchmarks as well as a practical dialogue task.

preprint · 2021 · arXiv

How to Train Your Energy-Based Models

Energy-Based Models (EBMs), also known as non-normalized probabilistic models, specify probability density or mass functions up to an unknown normalizing constant. Unlike most other probabilistic models, EBMs do not place a restriction on the tractability of the normalizing constant, and thus are more flexible to parameterize and can model a more expressive family of probability distributions. However, the unknown normalizing constant of EBMs makes training particularly difficult. Our goal is to provide a friendly introduction to modern approaches for EBM training. We start by explaining maximum likelihood training with Markov chain Monte Carlo (MCMC), and proceed to elaborate on MCMC-free approaches, including Score Matching (SM) and Noise Contrastive Estimation (NCE). We highlight theoretical connections among these three approaches, and end with a brief survey on alternative training methods, which are still under active research. Our tutorial is targeted at an audience with a basic understanding of generative models who want to apply EBMs or start a research project in this direction.
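For readers who want the "unknown normalizing constant" written out, the standard form of an EBM is sketched below. The symbols $E_\theta$ (energy function), $Z_\theta$ (normalizing constant) and $p_\theta$ (model density) are notation assumed here, since the abstract itself does not fix any.

```latex
\begin{align}
  p_\theta(\mathbf{x}) &= \frac{\exp\!\big(-E_\theta(\mathbf{x})\big)}{Z_\theta},
  &
  Z_\theta &= \int \exp\!\big(-E_\theta(\mathbf{x})\big)\,\mathrm{d}\mathbf{x}, \\
  \nabla_{\mathbf{x}} \log p_\theta(\mathbf{x})
  &= -\nabla_{\mathbf{x}} E_\theta(\mathbf{x}) - \nabla_{\mathbf{x}} \log Z_\theta
   = -\nabla_{\mathbf{x}} E_\theta(\mathbf{x}).
\end{align}
```

The second line shows why a score-based objective can avoid the intractable constant: $Z_\theta$ does not depend on $\mathbf{x}$, so it vanishes from the gradient of the log-density with respect to $\mathbf{x}$.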

preprint · 2021 · arXiv

Panoptic Feature Fusion Net: A Novel Instance Segmentation Paradigm for Biomedical and Biological Images

Instance segmentation is an important task for biomedical and biological image analysis. Due to the complicated background components, the high variability of object appearances, numerous overlapping objects, and ambiguous object boundaries, this task still remains challenging. Recently, deep learning based methods have been widely employed to solve these problems and can be categorized into proposal-free and proposal-based methods. However, both proposal-free and proposal-based methods suffer from information loss, as they focus on either global-level semantic or local-level instance features. To tackle this issue, we present a Panoptic Feature Fusion Net (PFFNet) that unifies the semantic and instance features in this work. Specifically, our proposed PFFNet contains a residual attention feature fusion mechanism to incorporate the instance prediction with the semantic features, in order to facilitate the semantic contextual information learning in the instance branch. Then, a mask quality sub-branch is designed to align the confidence score of each object with the quality of the mask prediction. Furthermore, a consistency regularization mechanism is designed between the semantic segmentation tasks in the semantic and instance branches, for the robust learning of both tasks. Extensive experiments demonstrate the effectiveness of our proposed PFFNet, which outperforms several state-of-the-art methods on various biomedical and biological datasets.

preprint · 2021 · arXiv

Score-Based Generative Modeling through Stochastic Differential Equations

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
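The three objects named in this abstract (the forward SDE, the reverse-time SDE, and the equivalent ODE) are commonly written as follows. The notation is an assumption added for illustration, since the abstract gives no symbols: $\mathbf{f}$ is the drift, $g$ the diffusion coefficient, $\mathbf{w}$ and $\bar{\mathbf{w}}$ are forward- and reverse-time Wiener processes, and $p_t$ is the marginal density of the perturbed data at time $t$.

```latex
\begin{align}
  \text{forward SDE:}\quad
  \mathrm{d}\mathbf{x} &= \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}, \\
  \text{reverse-time SDE:}\quad
  \mathrm{d}\mathbf{x} &= \big[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\big]\,\mathrm{d}t
                          + g(t)\,\mathrm{d}\bar{\mathbf{w}}, \\
  \text{probability flow ODE:}\quad
  \mathrm{d}\mathbf{x} &= \big[\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\big]\,\mathrm{d}t.
\end{align}
```

In both reverse-time equations the only unknown is the score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$, which is the quantity the abstract says is estimated with a neural network.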

preprint · 2021 · arXiv

Single Neuron Segmentation using Graph-based Global Reasoning with Auxiliary Skeleton Loss from 3D Optical Microscope Images

One of the critical steps in improving the accuracy of single neuron reconstruction from three-dimensional (3D) optical microscope images is neuronal structure segmentation. However, these structures are often hard to segment because of poor image quality. Despite a series of attempts to apply convolutional neural networks (CNNs) to this task, noise and disconnected gaps remain challenging to alleviate because the non-local features of graph-like tubular neural structures are neglected. Hence, we present an end-to-end segmentation network that jointly considers the local appearance and the global geometry traits through graph reasoning and a skeleton-based auxiliary loss. The evaluation results on the Janelia dataset from the BigNeuron project demonstrate that our proposed method exceeds the counterpart algorithms in performance.

preprint · 2021 · arXiv

The effect of viscosity and resistivity on Rayleigh-Taylor instability induced mixing in magnetized high energy density plasmas

This work numerically investigates the role of viscosity and resistivity in Rayleigh-Taylor instabilities in magnetized high-energy-density (HED) plasmas in the high-Atwood-number, high-plasma-beta regime, surveying a range of plasma beta and magnetic Prandtl numbers. The numerical simulations are performed using the visco-resistive magnetohydrodynamic (MHD) equations. Results presented here show that the inclusion of self-consistent viscosity and resistivity in the system drastically changes the growth of the Rayleigh-Taylor instability (RTI) and modifies its internal structure at smaller scales. It is seen here that the viscosity has a stabilizing effect on the RTI. Moreover, the viscosity inhibits the development of small scale structures and also modifies the morphology of the tip of the RTI spikes. On the other hand, the resistivity reduces the magnetic field stabilization, supporting the development of small scale structures. The morphology of the RTI spikes is seen to be unaffected by the presence of resistivity in the system. An additional novelty of this work is in the disparate viscosity and resistivity profiles that may exist in HED plasmas and their impact on RTI growth, morphology, and the resulting turbulence spectra. Furthermore, this work shows that the dynamics of the magnetic field is independent of viscosity and likewise the resistivity does not affect the dissipation of enstrophy and kinetic energy. In addition, power-law scalings of enstrophy, kinetic energy, and magnetic field energy are provided in both the injection range and the inertial sub-range, which could be useful for understanding RTI induced turbulent mixing in HED laboratory and astrophysical plasmas and could aid in the interpretation of observations of RTI-induced turbulence spectra.

preprint · 2020 · arXiv

Arbitrarily Strong Utility-Privacy Tradeoff in Multi-Agent Systems

Each agent in a network makes a local observation that is linearly related to a set of public and private parameters. The agents send their observations to a fusion center to allow it to estimate the public parameters. To prevent leakage of the private parameters, each agent first sanitizes its local observation using a local privacy mechanism before transmitting it to the fusion center. We investigate the utility-privacy tradeoff in terms of the Cramér-Rao lower bounds for estimating the public and private parameters. We study the class of privacy mechanisms given by linear compression and noise perturbation, and derive necessary and sufficient conditions for achieving arbitrarily strong utility-privacy tradeoff in a multi-agent system for both the cases where prior information is available and unavailable, respectively. We also provide a method to find the maximum estimation privacy achievable without compromising the utility and propose an alternating algorithm to optimize the utility-privacy tradeoff in the case where arbitrarily strong utility-privacy tradeoff is not achievable.

preprint · 2020 · arXiv

BiO-Net: Learning Recurrent Bi-directional Connections for Encoder-Decoder Architecture

U-Net has become one of the state-of-the-art deep learning-based approaches for modern computer vision tasks such as semantic segmentation, super resolution, image denoising, and inpainting. Previous extensions of U-Net have focused mainly on the modification of its existing building blocks or the development of new functional modules for performance gains. As a result, these variants usually lead to a non-negligible increase in model complexity. To tackle this issue in such U-Net variants, in this paper, we present a novel Bi-directional O-shape network (BiO-Net) that reuses the building blocks in a recurrent manner without introducing any extra parameters. Our proposed bi-directional skip connections can be directly adopted into any encoder-decoder architecture to further enhance its capabilities in various task domains. We evaluated our method on various medical image analysis tasks and the results show that our BiO-Net significantly outperforms the vanilla U-Net as well as other state-of-the-art methods. Our code is available at https://github.com/tiangexiang/BiO-Net.

preprint · 2020 · arXiv

BrePartition: Optimized High-Dimensional kNN Search with Bregman Distances

Bregman distances (also known as Bregman divergences) are widely used in machine learning, speech recognition and signal processing, and kNN searches with Bregman distances have become increasingly important with the rapid advances of multimedia applications. Data in multimedia applications such as images and videos are commonly transformed into spaces of hundreds of dimensions. Such high-dimensional spaces pose significant challenges for existing kNN search algorithms with Bregman distances, which can only handle data of medium dimensionality (typically less than 100). This paper addresses the urgent problem of high-dimensional kNN search with Bregman distances. We propose a novel partition-filter-refinement framework. Specifically, we propose an optimized dimensionality partitioning scheme to solve several non-trivial issues. First, an effective bound from each partitioned subspace to obtain exact kNN results is derived. Second, we conduct an in-depth analysis of the optimized number of partitions and devise an effective strategy for partitioning. Third, we design an efficient integrated index structure for all the subspaces together to accelerate the search processing. Moreover, we extend our exact solution to an approximate version by a trade-off between accuracy and efficiency. Experimental results on four real-world datasets and two synthetic datasets show the clear advantage of our method in comparison to state-of-the-art algorithms.
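For context, a Bregman distance is defined from a convex generator function f as D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>. The sketch below computes it for two common generators and runs a brute-force kNN search, which is the baseline that index structures such as the one in this paper aim to beat. It is not the BrePartition algorithm. The function names, the sample points, and the choice to measure D_f(point, query) rather than D_f(query, point) are assumptions made for the example (Bregman distances are generally asymmetric).

```python
import math

def bregman(f, grad_f, x, y):
    """Bregman distance D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_f(y), x, y))
    return f(x) - f(y) - inner

# Generator f(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x):
    return sum(a * a for a in x)

def sq_norm_grad(x):
    return [2.0 * a for a in x]

# Generator f(x) = sum x_i log x_i yields the (generalized) KL divergence.
def neg_entropy(x):
    return sum(a * math.log(a) for a in x)

def neg_entropy_grad(x):
    return [math.log(a) + 1.0 for a in x]

def knn_bruteforce(f, grad_f, data, query, k):
    """Indices of the k points with the smallest D_f(point, query); linear scan."""
    order = sorted(range(len(data)), key=lambda i: bregman(f, grad_f, data[i], query))
    return order[:k]

points = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]   # made-up sample data
nearest = knn_bruteforce(sq_norm, sq_norm_grad, points, [0.9, 0.9], k=2)
print(nearest)
```

The linear scan evaluates the distance to every point, so its cost grows with both dataset size and dimensionality. Reducing that cost is the motivation for a partition-filter-refinement approach like the one the abstract describes.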

preprint2020arXiv

Creating and manipulating interfacial spin with giant magnetic response in 4$f$ antiferromagnets

Creating and manipulating spin polarization in low-dimensional electron systems (such as two-dimensional electron gases) is fundamentally essential for spintronic applications, yet it remains a challenge to date. In this work, we establish the metamagnetic phase diagram of 4$f$ antiferromagnetic TbScO$_3$ and reveal its giant magnetic response to sub-tesla magnetic field, which has not been reported thus far. Utilizing this giant magnetic response, we demonstrate that the spin polarization of two-dimensional electron gas in SrTiO$_3$/LaTiO$_3$/TbScO$_3$ heterostructure can be manipulated successfully with the aid of the interfacial 3$d$-4$f$ exchange interaction. Remarkably, the hysteretic magnetoresistances of two-dimensional electron gas at the SrTiO$_3$/LaTiO$_3$ interface are entirely determined by the metamagnetic phase transitions of the underlying TbScO$_3$ substrate. Our results pave a novel route to engineer the spin polarization of low-dimensional electron systems in 4$f$ antiferromagnet-based heterostructures.

preprint2020arXiv

Design of an MISO-SWIPT-Aided Code-Index Modulated Multi-Carrier M-DCSK System for e-Health IoT

Code index modulated multi-carrier M-ary differential chaos shift keying (CIM-MC-M-DCSK) system not only inherits low-power and low-complexity advantages of the conventional DCSK system, but also significantly increases the transmission rate. This feature is of particular importance to Internet of Things (IoT) with trillions of low-cost devices. In particular, for e-health IoT applications, an efficient transmission scheme is designed to solve the challenge of the limited battery capacity for numerous user equipments served by one base station. In this paper, a new multiple-input-single-output simultaneous wireless information and power transfer (MISO-SWIPT) scheme for CIM-MC-M-DCSK system is proposed by utilizing orthogonal characteristic of chaotic signals with different initial values. The proposed system adopts power splitting mode, which is very promising for simultaneously providing energy and transmitting information of the user equipments without any external power supply. In particular, the new system can achieve desirable anti-multipath-fading capability without using channel estimator. Moreover, the analytical bit-error-rate expression of the proposed system is derived over multipath Rayleigh fading channels. Furthermore, the spectral efficiency and energy efficiency of the proposed system are analyzed. Simulation results not only validate the analytical expressions, but also demonstrate the superiority of the proposed system.

preprint2020arXiv

Detecting Problem Statements in Peer Assessments

Effective peer assessment requires students to be attentive to the deficiencies in the work they rate. Thus, their reviews should identify problems. But what ways are there to check that they do? We attempt to automate the process of deciding whether a review comment detects a problem. We use over 18,000 review comments that were labeled by the reviewees as either detecting or not detecting a problem with the work. We deploy several traditional machine-learning models, as well as neural-network models using GloVe and BERT embeddings. We find that the best performer is the Hierarchical Attention Network classifier, followed by the Bidirectional Gated Recurrent Units (GRU) Attention and Capsule model with scores of 93.1% and 90.5% respectively. The best non-neural network model was the support vector machine with a score of 89.71%. This is followed by the Stochastic Gradient Descent model and the Logistic Regression model with 89.70% and 88.98%.

preprint2020arXiv

Efficient Graph Generation with Graph Recurrent Attention Networks

We propose a new family of efficient and expressive deep generative models of graphs, called Graph Recurrent Attention Networks (GRANs). Our model generates graphs one block of nodes and associated edges at a time. The block size and sampling stride allow us to trade off sample quality for efficiency. Compared to previous RNN-based graph generative models, our framework better captures the auto-regressive conditioning between the already-generated and to-be-generated parts of the graph using Graph Neural Networks (GNNs) with attention. This not only reduces the dependency on node ordering but also bypasses the long-term bottleneck caused by the sequential nature of RNNs. Moreover, we parameterize the output distribution per block using a mixture of Bernoulli, which captures the correlations among generated edges within the block. Finally, we propose to handle node orderings in generation by marginalizing over a family of canonical orderings. On standard benchmarks, we achieve state-of-the-art time efficiency and sample quality compared to previous models. Additionally, we show our model is capable of generating large graphs of up to 5K nodes with good quality. To the best of our knowledge, GRAN is the first deep graph generative model that can scale to this size. Our code is released at: https://github.com/lrjconan/GRAN.

preprint2020arXiv

Gaussianization Flows

Iterative Gaussianization is a fixed-point iteration procedure that can transform any continuous random vector into a Gaussian one. Based on iterative Gaussianization, we propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation. We demonstrate that these models, named Gaussianization flows, are universal approximators for continuous probability distributions under some regularity conditions. Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation. Experimentally, we show that Gaussianization flows achieve better or comparable performance on several tabular datasets compared to other efficiently invertible flow models such as Real NVP, Glow and FFJORD. In particular, Gaussianization flows are easier to initialize, demonstrate better robustness with respect to different transformations of the training data, and generalize better on small training sets.

preprint2020arXiv

Gleason Score Prediction using Deep Learning in Tissue Microarray Image

Prostate cancer (PCa) is one of the most common cancers in men around the world. The most accurate method to evaluate lesion levels of PCa is microscopic inspection of stained biopsy tissue, with expert pathologists estimating the Gleason score of tissue microarray (TMA) images. However, it is time-consuming for pathologists to identify the cellular and glandular patterns for Gleason grading in large TMA images. We used the Gleason2019 Challenge dataset to build a convolutional neural network (CNN) model that segments TMA images into regions of different Gleason grades and predicts the Gleason score according to the grading segmentation. We used a pre-trained model of prostate segmentation to increase the accuracy of the Gleason grade segmentation. The model achieved a mean Dice of 75.6% on the test cohort and ranked 4th in the Gleason2019 Challenge with a score of 0.778, a combination of Cohen's kappa and the F1-score.
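As a reminder of what the Dice figure in this abstract measures, here is a minimal sketch of the standard Dice coefficient for two binary masks, 2|A ∩ B| / (|A| + |B|). The flat 0/1 list representation, the function name, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the challenge's own evaluation code may differ.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(dice_coefficient(predicted, reference))
```

For a multi-grade segmentation, a mean Dice would typically average this value over the grade classes, each treated as its own binary mask.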

preprint2020arXiv

Improving 3D Object Detection through Progressive Population Based Augmentation

Data augmentation has been widely adopted for object detection in 3D point clouds. However, all previous related efforts have focused on manually designing specific data augmentation methods for individual architectures. In this work, we present the first attempt to automate the design of data augmentation policies for 3D object detection. We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations. On the KITTI 3D detection test set, PPBA improves the StarNet detector by substantial margins on the moderate difficulty category of cars, pedestrians, and cyclists, outperforming all current state-of-the-art single-stage detection models. Additional experiments on the Waymo Open Dataset indicate that PPBA continues to effectively improve the StarNet and PointPillars detectors on a 20x larger dataset compared to KITTI. The magnitude of the improvements may be comparable to advances in 3D perception architectures and the gains come without an incurred cost at inference time. In subsequent experiments, we find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.

preprint2020arXiv

Learning to Recommend with Multiple Cascading Behaviors

Most existing recommender systems leverage user behavior data of one type only, such as the purchase behavior in E-commerce that is directly related to the business KPI (Key Performance Indicator) of conversion rate. Besides the key behavioral data, we argue that other forms of user behaviors also provide valuable signal, such as views, clicks, adding a product to shop carts and so on. They should be taken into account properly to provide quality recommendation for users. In this work, we contribute a new solution named NMTR (short for Neural Multi-Task Recommendation) for learning recommender systems from user multi-behavior data. We develop a neural network model to capture the complicated and multi-type interactions between users and items. In particular, our model accounts for the cascading relationship among different types of behaviors (e.g., a user must click on a product before purchasing it). To fully exploit the signal in the data of multiple types of behaviors, we perform a joint optimization based on the multi-task learning framework, where the optimization on a behavior is treated as a task. Extensive experiments on two real-world datasets demonstrate that NMTR significantly outperforms state-of-the-art recommender systems that are designed to learn from both single-behavior data and multi-behavior data. Further analysis shows that modeling multiple behaviors is particularly useful for providing recommendation for sparse users that have very few interactions.

preprint2020arXiv

Permutation Invariant Graph Generation via Score-Based Generative Modeling

Learning generative models for graph-structured data is challenging because graphs are discrete, combinatorial, and the underlying data distribution is invariant to the ordering of nodes. However, most of the existing generative models for graphs are not invariant to the chosen ordering, which might lead to an undesirable bias in the learned distribution. To address this difficulty, we propose a permutation invariant approach to modeling graphs, using the recent framework of score-based generative modeling. In particular, we design a permutation equivariant, multi-channel graph neural network to model the gradient of the data distribution at the input graph (a.k.a., the score function). This permutation equivariant model of gradients implicitly defines a permutation invariant distribution for graphs. We train this graph neural network with score matching and sample from it with annealed Langevin dynamics. In our experiments, we first demonstrate the capacity of this new architecture in learning discrete graph algorithms. For graph generation, we find that our learning approach achieves better or comparable results to existing models on benchmark datasets.

preprint2020arXiv

Shape-Oriented Convolution Neural Network for Point Cloud Analysis

The point cloud is a principal data structure adopted for 3D geometric information encoding. Unlike other conventional visual data, such as images and videos, these irregular points describe the complex shape features of 3D objects, which makes shape feature learning an essential component of point cloud analysis. To this end, a shape-oriented message passing scheme dubbed ShapeConv is proposed to focus on the representation learning of the underlying shape formed by each local neighboring point. Besides this intra-shape relationship learning, ShapeConv is also designed to incorporate the contextual effects from the inter-shape relationship by capturing the long-range dependencies between local underlying shapes. This shape-oriented operator is stacked into our hierarchical learning architecture, namely Shape-Oriented Convolutional Neural Network (SOCNN), developed for point cloud analysis. Extensive experiments have been performed to evaluate its significance in the tasks of point cloud classification and part segmentation.

preprint2020arXiv

Toward Predicting Success and Failure in CS2: A Mixed-Method Analysis

Factors driving success and failure in CS1 are the subject of much study but less so for CS2. This paper investigates the transition from CS1 to CS2 in search of leading indicators of success in CS2. Both CS1 and CS2 at the University of North Carolina Wilmington (UNCW) are taught in Python with annual enrollments of 300 and 150 respectively. In this paper, we report on the following research questions: 1) Are CS1 grades indicators of CS2 grades? 2) Does a quantitative relationship exist between CS2 course grade and a modified version of the SCS1 concept inventory? 3) What are the most challenging aspects of CS2, and how well does CS1 prepare students for CS2 from the student's perspective? We provide a quantitative analysis of 2300 CS1 and CS2 course grades from 2013--2019. In Spring 2019, we administered a modified version of the SCS1 concept inventory to 44 students in the first week of CS2. Further, 69 students completed an exit questionnaire at the conclusion of CS2 to gain qualitative student feedback on their challenges in CS2 and on how well CS1 prepared them for CS2. We find that 56% of students' grades were lower in CS2 than CS1, 18% improved their grades, and 26% earned the same grade. Of the changes, 62% were within one grade point. We find a statistically significant correlation between the modified SCS1 score and CS2 grade points. Students identify linked lists and class/object concepts among the most challenging. Student feedback on CS2 challenges and the adequacy of their CS1 preparations identify possible avenues for improving the CS1-CS2 transition.

preprint2020arXiv

Training Deep Energy-Based Models with f-Divergence Minimization

Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging because of the intractable partition function. They are typically trained via maximum likelihood, using contrastive divergence to approximate the gradient of the KL divergence between data and model distribution. While KL divergence has many desirable properties, other f-divergences have shown advantages in training implicit density generative models such as generative adversarial networks. In this paper, we propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence. We introduce a corresponding optimization algorithm and prove its local convergence property with non-linear dynamical systems theory. Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.

preprint2020arXiv

Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-weighting

Unsupervised domain adaptation (UDA) for nuclei instance segmentation is important for digital pathology, as it alleviates the burden of labor-intensive annotation and domain shift across datasets. In this work, we propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images, by learning from fluorescence microscopy images. More specifically, we first propose a nuclei inpainting mechanism to remove the auxiliary generated objects in the synthesized images. Secondly, a semantic branch with a domain discriminator is designed to achieve panoptic-level domain adaptation. Thirdly, in order to avoid the influence of the source-biased features, we propose a task re-weighting mechanism to dynamically add trade-off weights for the task-specific loss functions. Experimental results on three datasets indicate that our proposed method outperforms state-of-the-art UDA methods significantly, and demonstrates a similar performance as fully supervised methods.

preprint2019arXiv

Unconventional crystal field splitting in non-centrosymmetric BaTiO$_3$ thin films

Understanding the crystal field splitting and orbital polarization in non-centrosymmetric systems such as ferroelectric materials is fundamentally important. In this study, taking BaTiO$_3$ (BTO) as a representative material, we investigate titanium crystal field splitting and orbital polarization in non-centrosymmetric TiO$_6$ octahedra with resonant X-ray linear dichroism at the Ti $L_{2,3}$-edge. The high-quality BaTiO$_3$ thin films were deposited on DyScO$_3$ (110) single crystal substrates in a layer-by-layer way by pulsed laser deposition. Reflection high-energy electron diffraction (RHEED) and element-specific X-ray absorption spectroscopy (XAS) were performed to characterize the structural and electronic properties of the films. In sharp contrast to conventional crystal field splitting and orbital configuration ($d_{xz}$/$d_{yz}$ $<$ $d_{xy}$ $<$ $d_{3z^2-r^2}$ $<$ $d_{x^2-y^2}$ or $d_{xy}$ $<$ $d_{xz}$/$d_{yz}$ $<$ $d_{x^2-y^2}$ $<$ $d_{3z^2-r^2}$) according to the Jahn-Teller effect, it is revealed that $d_{xz}$, $d_{yz}$, and $d_{xy}$ orbitals are nearly degenerate, whereas $d_{3z^2-r^2}$ and $d_{x^2-y^2}$ orbitals are split with an energy gap $\sim$ 100 meV in the epitaxial BTO films. The unexpected degenerate states $d_{xz}$/$d_{yz}$/$d_{xy}$ are coupled to Ti-O displacements resulting from competition between polar and Jahn-Teller distortions in non-centrosymmetric TiO$_6$ octahedra of BTO films. Our results provide a route to manipulate the orbital degree of freedom by switching electric polarization in ferroelectric materials.

preprint2016arXiv

Canonical correlation analysis of high-dimensional data with very small sample support

This paper is concerned with the analysis of correlation between two high-dimensional data sets when there are only few correlated signal components but the number of samples is very small, possibly much smaller than the dimensions of the data. In such a scenario, a principal component analysis (PCA) rank-reduction preprocessing step is commonly performed before applying canonical correlation analysis (CCA). We present simple, yet very effective approaches to the joint model-order selection of the number of dimensions that should be retained through the PCA step and the number of correlated signals. These approaches are based on reduced-rank versions of the Bartlett-Lawley hypothesis test and the minimum description length information-theoretic criterion. Simulation results show that the techniques perform well for very small sample sizes even in colored noise.

preprint2016arXiv

CHSalign: A Web Server That Builds upon Junction-Explorer and RNAJAG for Pairwise Alignment of RNA Secondary Structures with Coaxial Helical Stacking

RNA junctions are important structural elements of RNA molecules. They are formed when three or more helices come together in three-dimensional space. Recent studies have focused on the annotation and prediction of coaxial helical stacking (CHS) motifs within junctions. Here we exploit such predictions to develop an efficient alignment tool to handle RNA secondary structures with CHS motifs. Specifically, we build upon our Junction-Explorer software for predicting coaxial stacking and RNAJAG for modelling junction topologies as tree graphs to incorporate constrained tree matching and dynamic programming algorithms into a new method, called CHSalign, for aligning the secondary structures of RNA molecules containing CHS motifs. Thus, CHSalign is intended to be an efficient alignment tool for RNAs containing similar junctions. Experimental results based on thousands of alignments demonstrate that CHSalign can align two RNA secondary structures containing CHS motifs more accurately than other RNA secondary structure alignment tools. CHSalign yields a high score when aligning two RNA secondary structures with similar CHS motifs or helical arrangement patterns, and a low score otherwise. This new method has been implemented in a web server, and the program is also made freely available, at http://bioinformatics.njit.edu/CHSalign/.

preprint2016arXiv

Derivative Delay Embedding: Online Modeling of Streaming Time Series

The staggering amount of streaming time series coming from the real world calls for more efficient and effective online modeling solutions. For time series modeling, most existing works make unrealistic assumptions, such as that the input data is of fixed length or well aligned, which requires extra effort on segmentation or normalization of the raw streaming data. Although some works claim their approaches to be invariant to data length and misalignment, they are too time-consuming to model a streaming time series in an online manner. We propose a novel and more practical online modeling and classification scheme, DDE-MGM, which does not make any assumptions on the time series while maintaining high efficiency and state-of-the-art performance. The derivative delay embedding (DDE) is developed to incrementally transform time series to the embedding space, where the intrinsic characteristics of the data are preserved as recursive patterns regardless of the stream length and misalignment. Then, a non-parametric Markov geographic model (MGM) is proposed to both model and classify the pattern in an online manner. Experimental results demonstrate the effectiveness and superior classification accuracy of the proposed DDE-MGM in an online setting as compared to the state-of-the-art.

preprint2016arXiv

Fast control of semiconductor qubits beyond the rotating-wave approximation

We present a theoretical study of single-qubit operations by oscillatory fields on various semiconductor platforms. We explicitly show how to perform faster gate operations by going beyond the universally-used rotating wave approximation (RWA) regime, while using only two sinusoidal pulses. No complicated pulse shaping or optimal control sequences are required. We first show for specific published experiments how much error is currently incurred by implementing pulses designed using standard RWA. We then show that an even modest increase in gate speed would cause problems in using RWA for gate design in the singlet-triplet (ST) and resonant-exchange (RX) qubits. We discuss the extent to which analytically keeping higher orders in the perturbation theory would address the problem. More strikingly, we give a new prescription for gating with strong coupling far beyond the RWA regime. We perform numerical calculations for the phases and the durations of two consecutive pulses to realize the key Hadamard and $\frac{\pi}{8}$ gates with coupling strengths up to several times the qubit splitting. Working in this manifestly non-RWA regime, the gate operation speeds up by two to three orders of magnitude.

preprint2016arXiv

Improving the Robustness of Deep Neural Networks via Stability Training

In this paper we address the issue of output instability of deep neural networks: small perturbations in the visual input can significantly distort the feature embeddings and output of a neural network. Such instability affects many deep architectures with state-of-the-art performance on a wide range of computer vision tasks. We present a general stability training method to stabilize deep networks against small input distortions that result from various types of common image processing, such as compression, rescaling, and cropping. We validate our method by stabilizing the state-of-the-art Inception architecture against these types of distortions. In addition, we demonstrate that our stabilized model gives robust state-of-the-art performance on large-scale near-duplicate detection, similar-image ranking, and classification on noisy datasets.

preprint2016arXiv

Kernel Bayesian Inference with Posterior Regularization

We propose a vector-valued regression problem whose solution is equivalent to the reproducing kernel Hilbert space (RKHS) embedding of the Bayesian posterior distribution. This equivalence provides a new understanding of kernel Bayesian inference. Moreover, the optimization problem induces a new regularization for the posterior embedding estimator, which is faster and has comparable performance to the squared regularization in kernel Bayes' rule. This regularization coincides with a former thresholding approach used in kernel POMDPs whose consistency remains to be established. Our theoretical work solves this open problem and provides consistency analysis in regression settings. Based on our optimizational formulation, we propose a flexible Bayesian posterior regularization framework which for the first time enables us to put regularization at the distribution level. We apply this method to nonparametric state-space filtering tasks with extremely nonlinear dynamics and show performance gains over all other baselines.

preprint2016arXiv

Training Deep Neural Networks via Direct Loss Minimization

Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application. In this paper we propose a direct loss minimization approach to train deep neural networks, which provably minimizes the application-specific loss function. This is often non-trivial, since these functions are neither smooth nor decomposable and thus are not amenable to optimization with standard gradient-based methods. We demonstrate the effectiveness of our approach in the context of maximizing average precision for ranking problems. Towards this goal, we develop a novel dynamic programming algorithm that can efficiently compute the weight updates. Our approach proves superior to a variety of baselines in the context of action classification and object detection, especially in the presence of label noise.

preprint2015arXiv

Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

Bayesian matrix completion has been studied based on a low-rank matrix factorization formulation with promising results. However, little work has been done on Bayesian matrix completion based on the more direct spectral regularization formulation. We fill this gap by presenting a novel Bayesian matrix completion method based on spectral regularization. In order to circumvent the difficulties of dealing with the orthonormality constraints of singular vectors, we derive a new equivalent form with relaxed constraints, which then leads us to design an adaptive version of spectral regularization feasible for Bayesian inference. Our Bayesian method requires no parameter tuning and can infer the number of latent factors automatically. Experiments on synthetic and real datasets demonstrate encouraging results on rank recovery and collaborative filtering, with notably good results for very sparse matrices.

preprint2015arXiv

Exact Tail Asymptotics --- Revisit of a Retrial Queue with Two Input Streams and Two Orbits

We revisit a single-server retrial queue with two independent Poisson streams (corresponding to two types of customers) and two orbits. The size of each orbit is infinite. The exponential server (with a rate independent of the type of customers) can hold at most one customer at a time and there is no waiting room. Upon arrival, if a type $i$ customer $(i=1,2)$ finds a busy server, it will join the type $i$ orbit. After an exponential time with a constant (retrial) rate $\mu_i$, a type $i$ customer attempts to get service. This model has been recently studied by Avrachenkov, Nain and Yechiali~\cite{ANY2014} by solving a Riemann-Hilbert boundary value problem. One may notice that this model is not a random walk in the quarter plane. Instead, it can be viewed as a random walk in the quarter plane modulated by a two-state Markov chain, or a two-dimensional quasi-birth-and-death (QBD) process. The special structure of this chain allows us to deal with the fundamental form corresponding to one state of the chain at a time, and therefore it can be studied through a boundary value problem. Inspired by this fact, in this paper, we focus on the tail asymptotic behaviour of the stationary joint probability distribution of the two orbits with either an idle or busy server by using the kernel method, a different approach that does not require a full determination of the unknown generating function. To take advantage of existing literature results on the kernel method, we identify a censored random walk, which is a usual walk in the quarter plane. This technique can also be used for other random walks modulated by a finite-state Markov chain with a similar structure property.

preprint2015arXiv

Joint Channel Direction Information Quantization For Spatially Correlated 3D MIMO Channels

This paper proposes a codebook for jointly quantizing channel direction information (CDI) of spatially correlated three-dimensional (3D) multi-input-multi-output (MIMO) channels. To reduce the dimension for quantizing the CDI of large antenna arrays, we introduce a special structure to the codewords by using Tucker decomposition to exploit the unique features of 3D MIMO channels. Specifically, the codeword consists of four low-dimensional parts, each targeting a different type of information: statistical CDIs in the horizontal direction and in the vertical direction, statistical power coupling, and instantaneous CDI. The proposed codebook avoids the redundancy caused by existing independent CDI quantization. Analytical results provide a sufficient condition on 3D MIMO channels to show that the proposed codebook can achieve the same quantization performance as the well-known rotated codebook applied to the global channel CDI, but with a significant reduction in the required statistical channel information. Simulation results validate our analysis and demonstrate that the proposed joint CDI quantization provides substantial performance gain over independent CDI quantization.

preprint2015arXiv

Low-Complexity Channel Reconstruction Methods Based on SVD-ZF Precoding in Massive 3D-MIMO Systems

In this paper, we study low-complexity channel reconstruction methods for downlink precoding in massive multiple-input multiple-output (MIMO) systems. When the user is allocated fewer streams than the number of its antennas, the base station (BS) or user usually utilizes the singular value decomposition (SVD) to get the effective channels, whose dimension is equal to the number of streams. This process is called channel reconstruction and is done at the BS in time division duplex (TDD) mode. However, as the number of BS antennas increases, the computational burden of SVD becomes prohibitive. Here, we propose a series of novel low-complexity channel reconstruction methods for downlink precoding in the 3D spatial channel model. We consider different correlations between elevation and azimuth antennas, and divide the large-dimensional matrix SVD into two kinds of small-size matrix SVD. The simulation results show that the proposed methods require less than 10% of the floating-point computation of the traditional SVD zero-forcing (SVD-ZF) precoding method, while keeping nearly the same performance of 1 Gbps.

preprint2015arXiv

Polarization analysis of excitons in monolayer and bilayer transition-metal dichalcogenides

The polarization analysis of optical transitions in monolayer and bilayer transition-metal dichalcogenides provides invaluable information on the spin and valley (pseudospin) degrees of freedom. To explain optical properties of a given monolayer transition-metal dichalcogenide, one should consider (i) the order of its spin-split conduction bands, (ii) whether intervalley scattering is prone to phonon bottleneck, (iii) and whether valley mixing by electron-hole exchange can take place. Using these principles, we present a consistent physical picture that elucidates a variety of features in the optical spectra of these materials. We explain the differences between optical transitions in monolayer MoSe$_2$ and monolayer WSe$_2$, finding that indirect excitons in the latter correspond to several low-energy optical transitions that so far were attributed to excitons bound to impurities. A possible mechanism that can explain the vanishing polarization in MoSe$_2$ is discussed. Finally, we consider the effect of an out-of-plane electric field, showing that it can reduce the initial polarization of bright excitons due to a Rashba-type coupling with dark excitons.

preprint2015arXiv

S-index: Towards Better Metrics for Quantifying Research Impact

The ongoing growth in the volume of scientific literature available today precludes researchers from efficiently discerning the relevant from irrelevant content. Researchers are constantly interested in impactful papers, authors and venues in their respective fields. Moreover, they are interested in the so-called recent "rising stars" of these contexts which may lead to attractive directions for future work, collaborations or impactful publication venues. In this work, we address the problem of quantifying research impact in each of these contexts, in order to better direct attention of researchers and streamline the processes of comparison, ranking and evaluation of contribution. Specifically, we begin by outlining intuitive underlying assumptions that impact quantification methods should obey and evaluate when current state-of-the-art methods fail to satisfy these properties. To this end, we introduce the s-index metric which quantifies research impact through influence propagation over a heterogeneous citation network. s-index is tailored from these intuitive assumptions and offers a number of desirable qualities including robustness, natural temporality and straightforward extensibility from the paper impact to broader author and venue impact contexts. We evaluate its effectiveness on the publicly available Microsoft Academic Search citation graph with over 119 million papers and 1 billion citation edges with 103 million and 21 thousand associated authors and venues respectively.

preprint2015arXiv

Success factors for Crowdfunding founders and funders

Crowdfunding has become one of the effective ways for entrepreneurs to raise funding, especially in creative industries. Individuals as well as organizations are paying more attention to the emergence of new crowdfunding platforms. In the Netherlands, the government is also trying to help artists access financial resources through crowdfunding platforms. This research aims to discover the success factors for crowdfunding projects on crowdfunding platforms from the perspectives of both founders and funders. We designed our own website for founders and funders to observe crowdfunding behaviors. Our research will contribute to the understanding of crowdfunding success factors related to issues of trust and decision making, and provide practical recommendations for practitioners and researchers.

preprint2015arXiv

The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine

The Internet is growing rapidly. As of December 2014, the number of websites worldwide had exceeded 1 billion, and all kinds of information resources are integrated together on the Internet. The search engine has therefore become a necessary tool for users to retrieve useful information from vast amounts of web data. Generally speaking, a complete search engine includes a crawler system, an index building system, a sorting system and a retrieval system. At present there are many open source search engine implementations, such as Lucene, Solr, Katta, Elasticsearch and Solandra. The crawler system and the sorting system are indispensable for any kind of search engine. To guarantee efficiency, the former needs to keep vast amounts of crawled data up to date, and the latter needs to build the index on newly crawled web pages in real time and calculate their corresponding PageRank values. A crawler system and a sorting system running on a single machine are unlikely to accomplish such huge computation tasks, which is why distributed cluster technology comes to the fore. In this paper, we use the Hadoop MapReduce computing framework to implement a distributed crawler system, and use GraphLite, a distributed synchronous graph-computing framework, to compute the PageRank values of newly crawled web pages in real time.
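As a point of reference for the PageRank computation mentioned in this abstract, the following is a minimal single-machine sketch using plain power iteration. It is not the paper's GraphLite implementation: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance and the toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a small in-memory graph.

    `links` maps each page to the list of pages it links to.
    All parameter defaults are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its rank to its targets.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share

        # Stop once the total change in rank is below the tolerance.
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_graph).items()):
        print(page, round(score, 4))
```

In a synchronous graph-computing framework the same update would typically be expressed per vertex, with each vertex sending its rank share along its out-edges in every superstep; the loop above only mirrors that arithmetic on one machine.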

preprint2014arXiv

CFD Simulation of the NREL Phase VI Rotor

The simulation of the turbulent and potentially separating flow around a rotating, twisted, and tapered airfoil is a challenging task for CFD. This paper describes CFD simulations of the NREL Phase VI turbine that was experimentally characterized in the 24.4 m x 36.6 m NREL/NASA Ames wind tunnel (Hand et al., 2001). All computations in this article are performed on the experimental base configuration of 0° yaw angle, 3° tip pitch angle, and a rotation rate of 72 rpm. The significance of specific mesh resolution regions to the accuracy of the CFD prediction is discussed. The ability of CFD to capture bulk quantities, such as the shaft torque, and the detailed flow characteristics, such as the surface pressure distributions, is explored for different inlet wind speeds. Finally, the significant three-dimensionality of the boundary layer flow is demonstrated.

preprint2014arXiv

Donor-driven spin relaxation in multi-valley semiconductors

We present a theory for spin relaxation of electrons due to scattering off the central-cell potential of impurities in silicon. Taking into account the multivalley nature of the conduction band and the violation of translation symmetry, the spin-flip amplitude is dominated by this short-range impurity scattering after which the electron is transferred to a valley on a different axis in $k$-space (the so-called $f$-process). These $f$-processes dominate the spin relaxation at all temperatures, with scattering off the impurity central cell dominating at low temperatures and scattering with $\Sigma$-axis phonons dominating at elevated temperatures. To the best of our knowledge, the theory is the first to explain and accurately quantify the empirically found dependence of spin relaxation on the impurity identity. Accordingly, the new formalism fills a longstanding gap in the spin relaxation theory of $n$-type silicon, and it is valuable for characterization of silicon-based spintronic devices.

preprint2014arXiv

Exact tail asymptotics for a discrete-time preemptive priority queue

In this paper, we consider a discrete-time preemptive priority queue with different service rates for two classes of customers, one with high priority and the other with low priority. This model corresponds to the classical preemptive priority queueing system with two classes of independent Poisson customers and a single exponential server. Because customers can arrive and depart at the same time in a discrete-time queue, the model considered in this paper is more complicated. In this model, we focus on the characterization of exact tail asymptotics for the joint stationary distribution of the queue length of the two classes of customers, for the two boundary distributions and for the two marginal distributions, respectively. By using generating functions and the kernel method, we obtain an explicit expression for the exact tail asymptotics along the low-priority queue direction, as well as along the high-priority queue direction.

preprint2014arXiv

Exchange-driven magnetoresistance in silicon facilitated by electrical spin injection

We use electrical spin injection to probe exchange interactions in phosphorus-doped silicon (Si:P). The detection is enabled by a magnetoresistance effect that demonstrates the efficiency of exchange in imprinting spin information from the magnetic lead onto the localized moments in the Si:P region. A unique Lorentzian-shaped signal existing only at low temperatures ($\lesssim 25$ K) is observed experimentally and analyzed theoretically in electrical Hanle effect measurements. It stems from spin-dependent scattering of electrons by neutral impurities in Si:P. The shape of this signal is not directly related to spin relaxation but to exchange interaction between spin-polarized electrons that are localized on adjacent impurities.

preprint2014arXiv

Impurity-assisted tunneling magnetoresistance under weak magnetic field

Injection of spins into semiconductors is essential for the integration of the spin functionality into conventional electronics. Insulating layers are often inserted between ferromagnetic metals and semiconductors for obtaining an efficient spin injection, and it is therefore crucial to distinguish between signatures of electrical spin injection and impurity-driven effects in the tunnel barrier. Here we demonstrate an impurity-assisted tunneling magnetoresistance effect in nonmagnetic-insulator-nonmagnetic and ferromagnetic-insulator-nonmagnetic tunnel barriers. In both cases, the effect reflects on/off switching of the tunneling current through impurity channels by the external magnetic field. The reported effect, which is universal for any impurity-assisted tunneling process, finally clarifies the controversy of a widely used technique that employs the same ferromagnetic electrode to inject and detect spin accumulation.

preprint2014arXiv

Learning Fine-grained Image Similarity with Deep Ranking

Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class image differences. This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is proposed to learn the model with distributed asynchronous stochastic gradient descent. Extensive experiments show that the proposed algorithm outperforms models based on hand-crafted visual features and deep classification models.
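To make the idea of learning from triplets concrete, the sketch below shows a common hinge-style ranking loss over a (query, positive, negative) triplet. The abstract does not state the exact loss, so the squared Euclidean distance, the margin of 1.0 and the function names `squared_distance` and `triplet_hinge_loss` are assumptions for illustration, and the short lists stand in for embeddings that a network would produce.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_hinge_loss(query, positive, negative, margin=1.0):
    """Hinge ranking loss for one triplet of embeddings.

    The loss is zero when the positive is closer to the query than the
    negative by at least `margin`; otherwise it grows linearly with the
    violation. The margin value is an illustrative assumption.
    """
    violation = (margin
                 + squared_distance(query, positive)
                 - squared_distance(query, negative))
    return max(0.0, violation)


if __name__ == "__main__":
    q, p, n = [0.0, 0.0], [0.0, 1.0], [3.0, 0.0]
    print(triplet_hinge_loss(q, p, n))  # correctly ordered triplet
    print(triplet_hinge_loss(q, n, p))  # roles swapped, ordering violated
```

A triplet that already satisfies the margin contributes no gradient, which is one reason the choice of which triplets to sample matters for training efficiency.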

preprint2014arXiv

Magnetic-Field-Modulated Resonant Tunneling in Ferromagnetic-Insulator-Nonmagnetic junctions

We present a theory for resonance-tunneling magnetoresistance (MR) in Ferromagnetic-Insulator-Nonmagnetic junctions. The theory sheds light on many of the recent electrical spin injection experiments, suggesting that this MR effect rather than spin accumulation in the nonmagnetic channel corresponds to the electrically detected signal. We quantify the dependence of the tunnel current on the magnetic field by quantum rate equations derived from the Anderson impurity model, with the important addition of impurity spin interactions. Considering the on-site Coulomb correlation, the MR effect is caused by competition between the field, spin interactions and coupling to the magnetic lead. By extending the theory, we present a basis for operation of novel nm-size memories.

preprint2014arXiv

Sprinklers: A Randomized Variable-Size Striping Approach to Reordering-Free Load-Balanced Switching

Internet traffic continues to grow exponentially, calling for switches that can scale well in both size and speed. While load-balanced switches can achieve such scalability, they suffer from a fundamental packet reordering problem. Existing proposals either suffer from poor worst-case packet delays or require sophisticated matching mechanisms. In this paper, we propose a new family of stable load-balanced switches called "Sprinklers" that has implementation cost and performance comparable to those of the baseline load-balanced switch, yet can guarantee packet ordering. The main idea is to force all packets within the same virtual output queue (VOQ) to traverse the same "fat path" through the switch, so that packet reordering cannot occur. At the core of Sprinklers are two key innovations: a randomized way to determine the "fat path" for each VOQ, and a way to determine its "fatness" roughly in proportion to the rate of the VOQ. These innovations enable Sprinklers to achieve near-perfect load-balancing under arbitrary admissible traffic. Proving this property rigorously using novel worst-case large deviation techniques is another key contribution of this work.

preprint2014arXiv

Theoretical Details of Tunnel Magnetoresistance via inelastic hopping at regime $gμB\ll k_B T \ll eV$

A detailed theoretical derivation is given for the tunnel magnetoresistance via phonon-assisted hopping through an impurity chain under a small magnetic field and a large bias window. This derivation provides a rigorous basis for the physical picture of the Pauli blockade switch proposed in our previous paper (arXiv:1404.0633). This picture captures the competition between the external magnetic field and internal spin interactions in the tunnel barrier, and relies critically on the strong on-site Coulomb correlation at the impurities. The master equations are obtained by deriving the equations of motion for the Green functions at the impurity sites in a slave-boson representation, and utilizing the so-called Langreth theorem to finally express the spin-dependent density matrix in terms of the equilibrium distributions of the contact electrons and of the phonon reservoir.

preprint2013arXiv

Probing the dynamic interference in molecular high-order harmonic generation through Bohmian trajectories

Using the Bohmian trajectory method, we investigate the dynamic interference in the diatomic molecular high-order harmonic generation process. It is demonstrated that the main characteristics of the molecular harmonic spectrum can be well reproduced by only two Bohmian trajectories, which are located respectively at the two ions. This is because these two localized trajectories can receive and store the whole collision information coming from all of the other recollision trajectories. Therefore, the amplitudes and frequencies of these two trajectories represent the intensity and frequency distribution of the harmonic generation. Moreover, the interference between these two trajectories shows a dip in the harmonic spectrum, which indicates the molecular structure information.

preprint2013arXiv

Strain effects on the spin-orbit induced band structure splittings in monolayer MoS2 and graphene

The strain effects on the spin-orbit induced splitting of the valence band maximum and conduction band minimum in monolayer MoS2 and the gap in graphene are calculated using first-principles calculations. The dependence of these splittings on the various symmetry types of strain is described by means of an effective Hamiltonian based on the method of invariants and the parameters in the model are extracted by fitting to the theory. These splittings are related to acoustic phonon deformation potentials, or electron-phonon coupling matrix elements which enter the spin-dependent scattering theory of conduction in these materials.

preprint2013arXiv

Transport Theory of Monolayer Transition-Metal Dichalcogenides through Symmetry

We present a theory that elucidates the major momentum and spin relaxation processes for electrons, holes and hot excitons in monolayer transition-metal dichalcogenides. We expand on spin flips induced by flexural phonons and show that the spin relaxation is ultrafast for electrons in free-standing membranes while being mitigated in supported membranes. This behavior due to interaction with flexural phonons is universal in two-dimensional membranes that respect mirror symmetry and it leads to a counterintuitive inverse relation between mobility and spin relaxation.

preprint2012arXiv

Analysis of phonon-induced spin relaxation processes in silicon

We study all of the leading-order contributions to spin relaxation of conduction electrons in silicon due to the electron-phonon interaction. Using group theory, the $k\cdot p$ perturbation method and the rigid-ion model, we derive an extensive set of matrix element expressions for all of the important spin-flip transitions in the multi-valley conduction band. The scattering angle has an explicit dependence on the electron wavevectors, phonon polarization, valley position and spin orientation of the electron. Comparison of the derived analytical expressions with results of empirical pseudopotential and adiabatic bond charge models shows excellent agreement.

preprint2012arXiv

Intrinsic Spin Lifetime of Conduction Electrons in Germanium

We investigate the intrinsic spin relaxation of conduction electrons in germanium due to electron-phonon scattering. We derive intravalley and intervalley spin-flip matrix elements for a general spin orientation and quantify the resulting anisotropy in spin relaxation. The form of the intravalley spin-flip matrix element is derived from the eigenstates of a compact spin-dependent $\mathbf{k}\cdot\mathbf{p}$ Hamiltonian in the vicinity of the $L$ point (where thermal electrons are populated in Ge). Spin lifetimes from analytical integrations of the intravalley and intervalley matrix elements show excellent agreement with independent results from elaborate numerical methods.

preprint2011arXiv

Controlling the spin orientation of photoexcited electrons by symmetry breaking

We study reflection of optically spin-oriented hot electrons as a means to probe the semiconductor crystal symmetry and its intimate relation with the spin-orbit coupling. The symmetry breaking by reflection manifests itself by tipping the net-spin vector of the photoexcited electrons out of the light propagation direction. The tipping angle and the pointing direction of the net-spin vector are set by the crystal-induced spin precession, momentum alignment and spin-momentum correlation of the initial photoexcited electron population. We examine non-magnetic semiconductor heterostructures and semiconductor/ferromagnet systems and show the unique signatures of these effects.

preprint2011arXiv

Reconfigurable nanoelectronics using graphene based spintronic logic gates

This paper presents a novel design concept for spintronic nanoelectronics that emphasizes a seamless integration of spin-based memory and logic circuits. The building blocks are magneto-logic gates based on a hybrid graphene/ferromagnet material system. We use network search engines as a technology demonstration vehicle and present a spin-based circuit design with smaller area, faster speed, and lower energy consumption than the state-of-the-art CMOS counterparts. This design can also be applied in applications such as data compression, coding and image recognition. In the proposed scheme, over 100 spin-based logic operations are carried out before any need for a spin-charge conversion. Consequently, supporting CMOS electronics requires little power consumption. The spintronic-CMOS integrated system can be implemented on a single 3-D chip. These nonvolatile logic circuits hold potential for a paradigm shift in computing applications.

preprint2011arXiv

Silicon spin communication

Recent experimental breakthroughs have demonstrated that the electron spin in silicon can be reliably injected and detected as well as transferred over distances exceeding 1 mm. We propose an on-chip communication paradigm which is based on modulating spin polarization of a constant current in silicon wires. We provide figures of merit for this scheme by studying spin relaxation and drift-diffusion models in silicon.

preprint2009arXiv

Spin transport theory in ferromagnet/semiconductor systems with non-collinear magnetization configurations

We present a comprehensive theory of spin transport in a non-degenerate semiconductor that is in contact with multiple ferromagnetic terminals. The spin dynamics in the semiconductor is studied during a perturbation of a general, non-collinear magnetization configuration and a method is shown to identify the various configurations from current signals. The conventional Landauer-Büttiker description for spin transport across Schottky contacts is generalized by the use of a non-linearized I-V relation, and it is extended by taking into account non-coherent transport mechanisms. The theory is used to analyze a three terminal lateral structure where a significant difference in the spin accumulation profile is found when comparing the results of this model with the conventional model.

91 published item(s)

A Universal and Robust Framework for Multiple Gas Recognition Based-on Spherical Normalization-Coupled Mahalanobis Algorithm

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey

Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data

SPECTRA: Revealing the Full Spectrum of User Preferences via Distributional LLM Inference

TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding

When Rules Fall Short: Agent-Driven Discovery of Emerging Content Issues in Short Video Platforms

X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

SponTTS: modeling and transferring spontaneous style for TTS

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

TractGraphCNN: anatomically informed graph CNN for classification using diffusion MRI tractography

A Model-Agnostic Causal Learning Framework for Recommendation using Search Data

A Wearable ECG Monitor for Deep Learning Based Real-Time Cardiovascular Disease Detection

Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou

Computer-Aided Extraction of Select MRI Markers of Cerebral Small Vessel Disease: A Systematic Review

Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement

Density Ratio Estimation via Infinitesimal Classification

Disentangling Long and Short-Term Interests for Recommendation

DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology Image Analysis

Explainability in Graph Neural Networks: An Experimental Survey

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation

GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains

HybridGNN: Learning Hybrid Representation in Multiplex Heterogeneous Networks

Leveraging Search History for Improving Person-Job Fit

Modeling Two-Way Selection Preference for Person-Job Fit

Multiple Sclerosis Lesion Analysis in Brain Magnetic Resonance Images: Techniques and Clinical Applications

Partial Graph Reasoning for Neural Network Regularization

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Solving Inverse Problems in Medical Imaging with Score-Based Generative Models

SupWMA: Consistent and Efficient Tractography Parcellation of Superficial White Matter with Deep Learning

Toward Consistent and Efficient Map-based Visual-inertial Localization: Theory Framework and Filter Design

Towards Bi-directional Skip Connections in Encoder-Decoder Architectures and Beyond

White Matter Tracts are Point Clouds: Neuropsychological Score Prediction and Critical Region Localization via Geometric Deep Learning

Anytime Sampling for Autoregressive Models via Ordered Autoencoding

CellTrack R-CNN: A Novel End-To-End Deep Neural Network for Cell Segmentation and Tracking in Microscopy Images

Crystal orientation-dependent oxidation of epitaxial TiN films with tunable plasmonics

Extremely Small BERT Models from Mixed-Vocabulary Training

How to Train Your Energy-Based Models

Panoptic Feature Fusion Net: A Novel Instance Segmentation Paradigm for Biomedical and Biological Images

Score-Based Generative Modeling through Stochastic Differential Equations

Single Neuron Segmentation using Graph-based Global Reasoning with Auxiliary Skeleton Loss from 3D Optical Microscope Images

The effect of viscosity and resistivity on Rayleigh-Taylor instability induced mixing in magnetized high energy density plasmas

Arbitrarily Strong Utility-Privacy Tradeoff in Multi-Agent Systems

BiO-Net: Learning Recurrent Bi-directional Connections for Encoder-Decoder Architecture

BrePartition: Optimized High-Dimensional kNN Search with Bregman Distances

Creating and manipulating interfacial spin with giant magnetic response in 4$f$ antiferromagnets

Design of an MISO-SWIPT-Aided Code-Index Modulated Multi-Carrier M-DCSK System for e-Health IoT

Detecting Problem Statements in Peer Assessments

Efficient Graph Generation with Graph Recurrent Attention Networks

Gaussianization Flows

Gleason Score Prediction using Deep Learning in Tissue Microarray Image

Improving 3D Object Detection through Progressive Population Based Augmentation

Learning to Recommend with Multiple Cascading Behaviors

Permutation Invariant Graph Generation via Score-Based Generative Modeling

Shape-Oriented Convolution Neural Network for Point Cloud Analysis

Toward Predicting Success and Failure in CS2: A Mixed-Method Analysis

Training Deep Energy-Based Models with f-Divergence Minimization

Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-weighting

Unconventional crystal field splitting in non-centrosymmetric BaTiO$_3$ thin films

Canonical correlation analysis of high-dimensional data with very small sample support

CHSalign: A Web Server That Builds upon Junction-Explorer and RNAJAG for Pairwise Alignment of RNA Secondary Structures with Coaxial Helical Stacking

Derivative Delay Embedding: Online Modeling of Streaming Time Series

Fast control of semiconductor qubits beyond the rotating-wave approximation

Improving the Robustness of Deep Neural Networks via Stability Training

Kernel Bayesian Inference with Posterior Regularization

Training Deep Neural Networks via Direct Loss Minimization

Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

Exact Tail Asymptotics --- Revisit of a Retrial Queue with Two Input Streams and Two Orbits

Joint Channel Direction Information Quantization For Spatially Correlated 3D MIMO Channels

Low-Complexity Channel Reconstruction Methods Based on SVD-ZF Precoding in Massive 3D-MIMO Systems

Polarization analysis of excitons in monolayer and bilayer transition-metal dichalcogenides

S-index: Towards Better Metrics for Quantifying Research Impact

Success factors for Crowdfunding founders and funders

The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine

CFD Simulation of the NREL Phase VI Rotor