Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
18works
0followers
18topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

18 published item(s)

preprint2026arXiv

Causal Invariance Learning via Efficient Nonconvex Optimization

Identifying the causal relationship among variables from observational data is an important yet challenging task. This work focuses on identifying the direct causes of an outcome and estimating their magnitude, i.e., learning the causal outcome model. Data from multiple environments provide valuable opportunities to uncover causality by exploiting the invariance principle that the causal outcome model holds across heterogeneous environments. Based on the invariance principle, we propose the Negative Weighted Distributionally Robust Optimization (NegDRO) framework to learn an invariant prediction model. NegDRO minimizes the worst-case combination of risks across multiple environments and enforces invariance by allowing potential negative weights. Under the additive interventions regime, we establish three major contributions: (i) On the statistical side, we provide sufficient and nearly necessary identification conditions under which the invariant prediction model coincides with the causal outcome model; (ii) On the optimization side, despite the nonconvexity of NegDRO, we establish its benign optimization landscape, where all stationary points lie close to the true causal outcome model; (iii) On the computational side, we develop a gradient-based algorithm that provably converges to the causal outcome model, with non-asymptotic convergence rates in both sample size and gradient-descent iterations. In particular, our method avoids exhaustive combinatorial searches over exponentially many subsets of covariates found in the literature, ensuring scalability even when the dimension of the covariates is large. To our knowledge, this is the first causal invariance learning method that finds the approximate global optimality for a nonconvex optimization problem efficiently.

preprint2026arXiv

EmoTransCap: Dataset and Pipeline for Emotion Transition-Aware Speech Captioning in Discourses

Emotion perception and adaptive expression are fundamental capabilities in human-agent interaction. While recent advances in speech emotion captioning (SEC) have improved fine-grained emotional modeling, existing systems remain limited to static, single-emotion characterization within isolated sentences, neglecting dynamic emotional transitions at the discourse level. To address this gap, we propose Emotion Transition-Aware Speech Captioning (EmoTransCap), a paradigm that integrates temporal emotion dynamics with discourse-level speech description. To construct a dataset rich in emotion transitions while enabling scalable expansion, we design an automated pipeline for dataset creation. This is the first large-scale dataset explicitly designed to capture discourse-level emotion transitions. To generate semantically rich descriptions, we incorporate acoustic attributes and temporal cues from discourse-level speech. Our Multi-Task Emotion Transition Recognition (MTETR) model performs joint emotion transition detection and diarization. Leveraging the semantic analysis capabilities of LLMs, we produce two annotation versions: descriptive and instruction-oriented. These data and annotations offer a valuable resource for advancing emotion perception and emotional expressiveness. The dataset enables speech captions that capture emotional transitions, facilitating temporal-dynamic and fine-grained emotion understanding. We also introduce a controllable, transition-aware emotional speech synthesis system at the discourse level, enhancing anthropomorphic emotional expressiveness and supporting emotionally intelligent conversational agents.

preprint2023arXiv

CRYPTEXT: Database and Interactive Toolkit of Human-Written Text Perturbations in the Wild

User-generated textual contents on the Internet are often noisy, erroneous, and not in correct forms in grammar. In fact, some online users choose to express their opinions online through carefully perturbed texts, especially in controversial topics (e.g., politics, vaccine mandate) or abusive contexts (e.g., cyberbullying, hate-speech). However, to the best of our knowledge, there is no framework that explores these online ``human-written" perturbations (as opposed to algorithm-generated perturbations). Therefore, we introduce an interactive system called CRYPTEXT. CRYPTEXT is a data-intensive application that provides the users with a database and several tools to extract and interact with human-written perturbations. Specifically, CRYPTEXT helps look up, perturb, and normalize (i.e., de-perturb) texts. CRYPTEXT also provides an interactive interface to monitor and analyze text perturbations online. A short demo video is available at: https://youtu.be/8WT3G8xjIoI

preprint2023arXiv

Deep-Reinforcement-Learning-based Path Planning for Industrial Robots using Distance Sensors as Observation

Industrial robots are widely used in various manufacturing environments due to their efficiency in doing repetitive tasks such as assembly or welding. A common problem for these applications is to reach a destination without colliding with obstacles or other robot arms. Commonly used sampling-based path planning approaches such as RRT require long computation times, especially in complex environments. Furthermore, the environment in which they are employed needs to be known beforehand. When utilizing the approaches in new environments, a tedious engineering effort in setting hyperparameters needs to be conducted, which is time- and cost-intensive. On the other hand, Deep Reinforcement Learning has shown remarkable results in dealing with unknown environments, generalizing new problem instances, and solving motion planning problems efficiently. On that account, this paper proposes a Deep-Reinforcement-Learning-based motion planner for robotic manipulators. We evaluated our model against state-of-the-art sampling-based planners in several experiments. The results show the superiority of our planner in terms of path length and execution time.

preprint2022arXiv

Advancing Deep Residual Learning by Solving the Crux of Degradation in Spiking Neural Networks

Despite the rapid progress of neuromorphic computing, the inadequate depth and the resulting insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have been evidenced as an important approach for training deep neural networks, but rarely did previous work assess their applicability to the characteristics of spike-based communication and spatiotemporal dynamics. This negligence leads to impeded information flow and the accompanying degradation problem. In this paper, we identify the crux and then propose a novel residual block for SNNs, which is able to significantly extend the depth of directly trained SNNs, e.g., up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing any slight degradation problem. We validate the effectiveness of our methods on both frame-based and neuromorphic datasets, and our SRM-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, the first time in the domain of directly trained SNNs. The great energy efficiency is estimated and the resulting networks need on average only one spike per neuron for classifying an input sample. We believe our powerful and scalable modeling will provide a strong support for further exploration of SNNs.

preprint2022arXiv

BSDGAN: Balancing Sensor Data Generative Adversarial Networks for Human Activity Recognition

The development of IoT technology enables a variety of sensors can be integrated into mobile devices. Human Activity Recognition (HAR) based on sensor data has become an active research topic in the field of machine learning and ubiquitous computing. However, due to the inconsistent frequency of human activities, the amount of data for each activity in the human activity dataset is imbalanced. Considering the limited sensor resources and the high cost of manually labeled sensor data, human activity recognition is facing the challenge of highly imbalanced activity datasets. In this paper, we propose Balancing Sensor Data Generative Adversarial Networks (BSDGAN) to generate sensor data for minority human activities. The proposed BSDGAN consists of a generator model and a discriminator model. Considering the extreme imbalance of human activity dataset, an autoencoder is employed to initialize the training process of BSDGAN, ensure the data features of each activity can be learned. The generated activity data is combined with the original dataset to balance the amount of activity data across human activity classes. We deployed multiple human activity recognition models on two publicly available imbalanced human activity datasets, WISDM and UNIMIB. Experimental results show that the proposed BSDGAN can effectively capture the data features of real human activity sensor data, and generate realistic synthetic sensor data. Meanwhile, the balanced activity dataset can effectively help the activity recognition model to improve the recognition accuracy.

preprint2022arXiv

MetaCon: Unified Predictive Segments System with Trillion Concept Meta-Learning

Accurate understanding of users in terms of predicative segments play an essential role in the day to day operation of modern internet enterprises. Nevertheless, there are significant challenges that limit the quality of data, especially on long tail predictive tasks. In this work, we present MetaCon, our unified predicative segments system with scalable, trillion concepts meta learning that addresses these challenges. It builds on top of a flat concept representation that summarizes entities' heterogeneous digital footprint, jointly considers the entire spectrum of predicative tasks as a single learning task, and leverages principled meta learning approach with efficient first order meta-optimization procedure under a provable performance guarantee in order to solve the learning task. Experiments on both proprietary production datasets and public structured learning tasks demonstrate that MetaCon can lead to substantial improvements over state of the art recommendation and ranking approaches.

preprint2022arXiv

MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset

Text-to-Speech (TTS) synthesis for low-resource languages is an attractive research issue in academia and industry nowadays. Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide. However, there is a relative lack of open-source datasets for Mongolian TTS. Therefore, we make public an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for the benefit of related researchers. In this work, we prepare the transcription from various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 hours of speeches in Mongolian, resulting 30 hours in total. Furthermore, we build the baseline system based on the state-of-the-art FastSpeech2 model and HiFi-GAN vocoder. The experimental results suggest that the constructed MnTTS2 dataset is sufficient to build robust multi-speaker TTS models for real-world applications. The MnTTS2 dataset, training recipe, and pretrained models are released at: \url{https://github.com/ssmlkl/MnTTS2}

preprint2022arXiv

Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense

We proposes a novel algorithm, ANTHRO, that inductively extracts over 600K human-written text perturbations in the wild and leverages them for realistic adversarial attack. Unlike existing character-based attacks which often deductively hypothesize a set of manipulation strategies, our work is grounded on actual observations from real-world texts. We find that adversarial texts generated by ANTHRO achieve the best trade-off between (1) attack success rate, (2) semantic preservation of the original text, and (3) stealthiness--i.e. indistinguishable from human writings hence harder to be flagged as suspicious. Specifically, our attacks accomplished around 83% and 91% attack success rates on BERT and RoBERTa, respectively. Moreover, it outperformed the TextBugger baseline with an increase of 50% and 40% in terms of semantic preservation and stealthiness when evaluated by both layperson and professional human workers. ANTHRO can further enhance a BERT classifier's performance in understanding different variations of human-written toxic texts via adversarial training when compared to the Perspective API.

preprint2022arXiv

Rethinking Pretraining as a Bridge from ANNs to SNNs

Spiking neural networks (SNNs) are known as a typical kind of brain-inspired models with their unique features of rich neuronal dynamics, diverse coding schemes and low power consumption properties. How to obtain a high-accuracy model has always been the main challenge in the field of SNN. Currently, there are two mainstream methods, i.e., obtaining a converted SNN through converting a well-trained Artificial Neural Network (ANN) to its SNN counterpart or training an SNN directly. However, the inference time of a converted SNN is too long, while SNN training is generally very costly and inefficient. In this work, a new SNN training paradigm is proposed by combining the concepts of the two different training methods with the help of the pretrain technique and BP-based deep SNN training mechanism. We believe that the proposed paradigm is a more efficient pipeline for training SNNs. The pipeline includes pipeS for static data transfer tasks and pipeD for dynamic data transfer tasks. SOTA results are obtained in a large-scale event-driven dataset ES-ImageNet. For training acceleration, we achieve the same (or higher) best accuracy as similar LIF-SNNs using 1/10 training time on ImageNet-1K and 2/5 training time on ES-ImageNet and also provide a time-accuracy benchmark for a new dataset ES-UCF101. These experimental results reveal the similarity of the functions of parameters between ANNs and SNNs and also demonstrate the various potential applications of this SNN training pipeline.

preprint2022arXiv

SuperCone: Unified User Segmentation over Heterogeneous Experts via Concept Meta-learning

We study the problem of user segmentation: given a set of users and one or more predefined groups or segments, assign users to their corresponding segments. As an example, for a segment indicating particular interest in a certain area of sports or entertainment, the task will be to predict whether each single user will belong to the segment. However, there may exist numerous long tail prediction tasks that suffer from data availability and may be of heterogeneous nature, which make it hard to capture using single off the shelf model architectures. In this work, we present SuperCone, our unified predicative segments system that addresses the above challenges. It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints, and uniformly models each of the prediction task using an approach called "super learning ", that is, combining prediction models with diverse architectures or learning method that are not compatible with each other. Following this, we provide an end to end approach that learns to flexibly attend to best suited heterogeneous experts adaptively, while at the same time incorporating deep representations of the input concepts that augments the above experts. Experiments show that SuperCone significantly outperform state-of-the-art recommendation and ranking algorithms on a wide range of predicative segment tasks and public structured data learning benchmarks.

preprint2021arXiv

What's in a Name? -- Gender Classification of Names with Character Based Machine Learning Models

Gender information is no longer a mandatory input when registering for an account at many leading Internet companies. However, prediction of demographic information such as gender and age remains an important task, especially in intervention of unintentional gender/age bias in recommender systems. Therefore it is necessary to infer the gender of those users who did not to provide this information during registration. We consider the problem of predicting the gender of registered users based on their declared name. By analyzing the first names of 100M+ users, we found that genders can be very effectively classified using the composition of the name strings. We propose a number of character based machine learning models, and demonstrate that our models are able to infer the gender of users with much higher accuracy than baseline models. Moreover, we show that using the last names in addition to the first names improves classification performance further.

preprint2020arXiv

A Deep Structural Model for Analyzing Correlated Multivariate Time Series

Multivariate time series are routinely encountered in real-world applications, and in many cases, these time series are strongly correlated. In this paper, we present a deep learning structural time series model which can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. The trend is learned via a 1D and 2D temporal CNN and LSTM hierarchical neural net. The CNN-LSTM architecture can (i) seamlessly leverage the dependency among multiple correlated time series in a natural way, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize the long-term sequential pattern. The seasonality component is approximated via a non-liner function of a set of Fourier terms, and the event components are learned by a simple linear function of regressor encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices for corporate stocks in the same category.

preprint2020arXiv

Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs By Comparing Image Representations

In deep learning era, pretrained models play an important role in medical image analysis, in which ImageNet pretraining has been widely adopted as the best way. However, it is undeniable that there exists an obvious domain gap between natural images and medical images. To bridge this gap, we propose a new pretraining method which learns from 700k radiographs given no manual annotations. We call our method as Comparing to Learn (C2L) because it learns robust features by comparing different image representations. To verify the effectiveness of C2L, we conduct comprehensive ablation studies and evaluate it on different tasks and datasets. The experimental results on radiographs show that C2L can outperform ImageNet pretraining and previous state-of-the-art approaches significantly. Code and models are available.

preprint2020arXiv

Comprehensive SNN Compression Using ADMM Optimization and Activity Regularization

As well known, the huge memory and compute costs of both artificial neural networks (ANNs) and spiking neural networks (SNNs) greatly hinder their deployment on edge devices with high efficiency. Model compression has been proposed as a promising technique to improve the running efficiency via parameter and operation reduction. Whereas, this technique is mainly practiced in ANNs rather than SNNs. It is interesting to answer how much an SNN model can be compressed without compromising its functionality, where two challenges should be addressed: i) the accuracy of SNNs is usually sensitive to model compression, which requires an accurate compression methodology; ii) the computation of SNNs is event-driven rather than static, which produces an extra compression dimension on dynamic spikes. To this end, we realize a comprehensive SNN compression through three steps. First, we formulate the connection pruning and weight quantization as a constrained optimization problem. Second, we combine spatio-temporal backpropagation (STBP) and alternating direction method of multipliers (ADMM) to solve the problem with minimum accuracy loss. Third, we further propose activity regularization to reduce the spike events for fewer active operations. These methods can be applied in either a single way for moderate compression or a joint way for aggressive compression. We define several quantitative metrics to evaluation the compression performance for SNNs. Our methodology is validated in pattern recognition tasks over MNIST, N-MNIST, CIFAR10, and CIFAR100 datasets, where extensive comparisons, analyses, and insights are provided. To our best knowledge, this is the first work that studies SNN compression in a comprehensive manner by exploiting all compressible components and achieves better results.

preprint2020arXiv

Embedding Task Knowledge into 3D Neural Networks via Self-supervised Learning

Deep learning highly relies on the amount of annotated data. However, annotating medical images is extremely laborious and expensive. To this end, self-supervised learning (SSL), as a potential solution for deficient annotated data, attracts increasing attentions from the community. However, SSL approaches often design a proxy task that is not necessarily related to target task. In this paper, we propose a novel SSL approach for 3D medical image classification, namely Task-related Contrastive Prediction Coding (TCPC), which embeds task knowledge into training 3D neural networks. The proposed TCPC first locates the initial candidate lesions via supervoxel estimation using simple linear iterative clustering. Then, we extract features from the sub-volume cropped around potential lesion areas, and construct a calibrated contrastive predictive coding scheme for self-supervised learning. Extensive experiments are conducted on public and private datasets. The experimental results demonstrate the effectiveness of embedding lesion-related prior-knowledge into neural networks for 3D medical image classification.

preprint2020arXiv

Large-scale Gender/Age Prediction of Tumblr Users

Tumblr, as a leading content provider and social media, attracts 371 million monthly visits, 280 million blogs and 53.3 million daily posts. The popularity of Tumblr provides great opportunities for advertisers to promote their products through sponsored posts. However, it is a challenging task to target specific demographic groups for ads, since Tumblr does not require user information like gender and ages during their registration. Hence, to promote ad targeting, it is essential to predict user's demography using rich content such as posts, images and social connections. In this paper, we propose graph based and deep learning models for age and gender predictions, which take into account user activities and content features. For graph based models, we come up with two approaches, network embedding and label propagation, to generate connection features as well as directly infer user's demography. For deep learning models, we leverage convolutional neural network (CNN) and multilayer perceptron (MLP) to prediction users' age and gender. Experimental results on real Tumblr daily dataset, with hundreds of millions of active users and billions of following relations, demonstrate that our approaches significantly outperform the baseline model, by improving the accuracy relatively by 81% for age, and the AUC and accuracy by 5\% for gender.

preprint2020arXiv

Multi-Modality Generative Adversarial Networks with Tumor Consistency Loss for Brain MR Image Synthesis

Magnetic Resonance (MR) images of different modalities can provide complementary information for clinical diagnosis, but whole modalities are often costly to access. Most existing methods only focus on synthesizing missing images between two modalities, which limits their robustness and efficiency when multiple modalities are missing. To address this problem, we propose a multi-modality generative adversarial network (MGAN) to synthesize three high-quality MR modalities (FLAIR, T1 and T1ce) from one MR modality T2 simultaneously. The experimental results show that the quality of the synthesized images by our proposed methods is better than the one synthesized by the baseline model, pix2pix. Besides, for MR brain image synthesis, it is important to preserve the critical tumor information in the generated modalities, so we further introduce a multi-modality tumor consistency loss to MGAN, called TC-MGAN. We use the synthesized modalities by TC-MGAN to boost the tumor segmentation accuracy, and the results demonstrate its effectiveness.