Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
30works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

30 published item(s)

preprint2026arXiv

EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require constructing full-length inputs: we preserve the original short context as an intact first segment and append a brief terminal prompt as a second segment, assigning it positional indices near the target context length. This two-segment construction introduces both local and long-range relative distances within a short physical sequence while maintaining the semantic continuity of the training text--a property absent in chunk-based simulation approaches that split contiguous context. We provide a theoretical analysis grounded in Rotary Position Embedding and the Bernstein inequality, showing that position interpolation induces a rigorous smoothness constraint over the attention function, with shared Transformer parameters further suppressing unstable extrapolation to unobserved intermediate distances. Applied to LLaMA-family models extending the context window from 8K to 64K, EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, surpassing LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while requiring substantially less computation. These results demonstrate that long-context generalization can be induced from sparse positional supervision, challenging the prevailing assumption that dense long-sequence training is necessary for reliable context-window extension. The code is available at https://github.com/clx1415926/EndPrompt.

preprint2026arXiv

Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey

This survey examines evaluation methods for large language model (LLM)-based agents in multi-turn conversational settings. Using a PRISMA-inspired framework, we systematically reviewed nearly 250 scholarly sources, capturing the state of the art from various venues of publication, and establishing a solid foundation for our analysis. Our study offers a structured approach by developing two interrelated taxonomy systems: one that defines \emph{what to evaluate} and another that explains \emph{how to evaluate}. The first taxonomy identifies key components of LLM-based agents for multi-turn conversations and their evaluation dimensions, including task completion, response quality, user experience, memory and context retention, as well as planning and tool integration. These components ensure that the performance of conversational agents is assessed in a holistic and meaningful manner. The second taxonomy system focuses on the evaluation methodologies. It categorizes approaches into annotation-based evaluations, automated metrics, hybrid strategies that combine human assessments with quantitative measures, and self-judging methods utilizing LLMs. This framework not only captures traditional metrics derived from language understanding, such as BLEU and ROUGE scores, but also incorporates advanced techniques that reflect the dynamic, interactive nature of multi-turn dialogues.

preprint2026arXiv

FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding

Deploying large language models (LLMs) in mobile and edge computing environments is constrained by limited on-device resources, scarce wireless bandwidth, and frequent model evolution. Although edge-cloud collaborative inference with speculative decoding (SD) can reduce end-to-end latency by executing a lightweight draft model at the edge and verifying it with a cloud-side target model, existing frameworks fundamentally rely on tight coupling between the two models. Consequently, repeated model synchronization introduces excessive communication overhead, increasing end-to-end latency, and ultimately limiting the scalability of SD in edge environments. To address these limitations, we propose FlexSpec, a communication-efficient collaborative inference framework tailored for evolving edge-cloud systems. The core design of FlexSpec is a shared-backbone architecture that allows a single and static edge-side draft model to remain compatible with a large family of evolving cloud-side target models. By decoupling edge deployment from cloud-side model updates, FlexSpec eliminates the need for edge-side retraining or repeated model downloads, substantially reducing communication and maintenance costs. Furthermore, to accommodate time-varying wireless conditions and heterogeneous device constraints, we develop a channel-aware adaptive speculation mechanism that dynamically adjusts the speculative draft length based on real-time channel state information and device energy budgets. Extensive experiments demonstrate that FlexSpec achieves superior performance compared to conventional SD approaches in terms of inference efficiency.

preprint2026arXiv

Measuring Maximum Activations in Open Large Language Models

The dynamic range of activations is a first-order constraint for low-bit quantization, activation scaling, and stable LLM inference. Prior work characterized outlier features and massive activations on pre-2024 LLaMA-style models, and the downstream activation-quantization stack inherits that picture without revisiting it for the post-LLaMA open-model boom. We ask the deployment-oriented question: how large can activations get in modern open LLMs, and how does this magnitude vary across families, generations, and training stages? Under a unified pipeline (5,000-sample multi-domain corpus, family-specific tokenization, identical hooks across embeddings, hidden states, attention, MLP/MoE, SwiGLU gates, and final norm), we measure global and layerwise maxima on 27 checkpoints from 8 open families spanning dense, MoE, vision-language, intermediate-training, and instruction-tuned variants. We find that (i) global maxima span over nearly four orders of magnitude at comparable parameter counts, with Qwen3.5 and MoE checkpoints in the 10^2 to 10^3 range and Gemma3-27B-it reaching ~7 x 10^5; (ii) cross-family and cross-generation comparisons break simple monotonic scaling; and (iii) MoE checkpoints exhibit 14.0-23.4x lower peaks than matched-scale dense counterparts, while the residual stream carries the global maximum in 22/24 checkpoints. A lightweight INT-8 sanity check shows that measured maxima co-vary with low-bit reconstruction error via activation-scale selection. We conclude that maximum activation magnitude is a model property tied to family, architecture, and training stage - not a simple byproduct of size - and should be measured and reported alongside any open-weight release before low-bit deployment. The code is publicly available at https://github.com/clx1415926/Max_act_llm.

preprint2023arXiv

MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model

Diffusion probabilistic model (DPM) recently becomes one of the hottest topic in computer vision. Its image generation application such as Imagen, Latent Diffusion Models and Stable Diffusion have shown impressive generation capabilities, which aroused extensive discussion in the community. Many recent studies also found it is useful in many other vision tasks, like image deblurring, super-resolution and anomaly detection. Inspired by the success of DPM, we propose the first DPM based model toward general medical image segmentation tasks, which we named MedSegDiff. In order to enhance the step-wise regional attention in DPM for the medical image segmentation, we propose dynamic conditional encoding, which establishes the state-adaptive conditions for each sampling step. We further propose Feature Frequency Parser (FF-Parser), to eliminate the negative effect of high-frequency noise component in this process. We verify MedSegDiff on three medical segmentation tasks with different image modalities, which are optic cup segmentation over fundus images, brain tumor segmentation over MRI images and thyroid nodule segmentation over ultrasound images. The experimental results show that MedSegDiff outperforms state-of-the-art (SOTA) methods with considerable performance gap, indicating the generalization and effectiveness of the proposed model. Our code is released at https://github.com/WuJunde/MedSegDiff.

preprint2023arXiv

Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution

Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism. Though one could lower the complexity of Transformers by inducing the sparsity in point-wise self-attentions for LTTF, the limited information utilization prohibits the model from exploring the complex dependencies comprehensively. To this end, we propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects: (i) an encoder-decoder architecture incorporating a linear complexity without sacrificing information utilization is proposed on top of sliding-window attention and Stationary and Instant Recurrent Network (SIRN); (ii) a module derived from the normalizing flow is devised to further improve the information utilization by inferring the outputs with the latent variables in SIRN directly; (iii) the inter-series correlation and temporal dynamics in time-series data are modeled explicitly to fuel the downstream self-attention mechanism. Extensive experiments on seven real-world datasets demonstrate that Conformer outperforms the state-of-the-art methods on LTTF and generates reliable prediction results with uncertainty quantification.

preprint2022arXiv

A Comparative Survey of Deep Active Learning

While deep learning (DL) is data-hungry and usually relies on extensive labeled data to deliver good performance, Active Learning (AL) reduces labeling costs by selecting a small proportion of samples from unlabeled data for labeling and training. Therefore, Deep Active Learning (DAL) has risen as a feasible solution for maximizing model performance under a limited labeling cost/budget in recent years. Although abundant methods of DAL have been developed and various literature reviews conducted, the performance evaluation of DAL methods under fair comparison settings is not yet available. Our work intends to fill this gap. In this work, We construct a DAL toolkit, DeepAL+, by re-implementing 19 highly-cited DAL methods. We survey and categorize DAL-related works and construct comparative experiments across frequently used datasets and DAL algorithms. Additionally, we explore some factors (e.g., batch size, number of epochs in the training process) that influence the efficacy of DAL, which provides better references for researchers to design their DAL experiments or carry out DAL-related applications.

preprint2022arXiv

A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications

To understand human behaviors, action recognition based on videos is a common approach. Compared with image-based action recognition, videos provide much more information. Reducing the ambiguity of actions and in the last decade, many works focused on datasets, novel models and learning approaches have improved video action recognition to a higher level. However, there are challenges and unsolved problems, in particular in sports analytics where data collection and labeling are more sophisticated, requiring sport professionals to annotate data. In addition, the actions could be extremely fast and it becomes difficult to recognize them. Moreover, in team sports like football and basketball, one action could involve multiple players, and to correctly recognize them, we need to analyse all players, which is relatively complicated. In this paper, we present a survey on video action recognition for sports analytics. We introduce more than ten types of sports, including team sports, such as football, basketball, volleyball, hockey and individual sports, such as figure skating, gymnastics, table tennis, tennis, diving and badminton. Then we compare numerous existing frameworks for sports analysis to present status quo of video action recognition in both team sports and individual sports. Finally, we discuss the challenges and unsolved problems in this area and to facilitate sports analytics, we develop a toolbox using PaddlePaddle, which supports football, basketball, table tennis and figure skating action recognition.

preprint2022arXiv

AA-Forecast: Anomaly-Aware Forecast for Extreme Events

Time series models often deal with extreme events and anomalies, both prevalent in real-world datasets. Such models often need to provide careful probabilistic forecasting, which is vital in risk management for extreme events such as hurricanes and pandemics. However, it is challenging to automatically detect and learn to use extreme events and anomalies for large-scale datasets, which often require manual effort. Hence, we propose an anomaly-aware forecast framework that leverages the previously seen effects of anomalies to improve its prediction accuracy during and after the presence of extreme events. Specifically, the framework automatically extracts anomalies and incorporates them through an attention mechanism to increase its accuracy for future extreme events. Moreover, the framework employs a dynamic uncertainty optimization algorithm that reduces the uncertainty of forecasts in an online manner. The proposed framework demonstrated consistent superior accuracy with less uncertainty on three datasets with different varieties of anomalies over the current prediction models.

preprint2022arXiv

AutoGCL: Automated Graph Contrastive Learning via Learnable View Generators

Contrastive learning has been widely applied to graph representation learning, where the view generators play a vital role in generating effective contrastive samples. Most of the existing contrastive learning methods employ pre-defined view generation methods, e.g., node drop or edge perturbation, which usually cannot adapt to input data or preserve the original semantic structures well. To address this issue, we propose a novel framework named Automated Graph Contrastive Learning (AutoGCL) in this paper. Specifically, AutoGCL employs a set of learnable graph view generators orchestrated by an auto augmentation strategy, where every graph view generator learns a probability distribution of graphs conditioned by the input. While the graph view generators in AutoGCL preserve the most representative structures of the original graph in generation of every contrastive sample, the auto augmentation learns policies to introduce adequate augmentation variances in the whole contrastive learning procedure. Furthermore, AutoGCL adopts a joint training strategy to train the learnable view generators, the graph encoder, and the classifier in an end-to-end manner, resulting in topological heterogeneity yet semantic similarity in the generation of contrastive samples. Extensive experiments on semi-supervised learning, unsupervised learning, and transfer learning demonstrate the superiority of our AutoGCL framework over the state-of-the-arts in graph contrastive learning. In addition, the visualization results further confirm that the learnable view generators can deliver more compact and semantically meaningful contrastive samples compared against the existing view generation methods.

preprint2022arXiv

Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models

While fine-tuning pre-trained networks has become a popular way to train image segmentation models, such backbone networks for image segmentation are frequently pre-trained using image classification source datasets, e.g., ImageNet. Though image classification datasets could provide the backbone networks with rich visual features and discriminative ability, they are incapable of fully pre-training the target model (i.e., backbone+segmentation modules) in an end-to-end manner. The segmentation modules are left to random initialization in the fine-tuning process due to the lack of segmentation labels in classification datasets. In our work, we propose a method that leverages Pseudo Semantic Segmentation Labels (PSSL), to enable the end-to-end pre-training for image segmentation models based on classification datasets. PSSL was inspired by the observation that the explanation results of classification models, obtained through explanation algorithms such as CAM, SmoothGrad and LIME, would be close to the pixel clusters of visual objects. Specifically, PSSL is obtained for each image by interpreting the classification results and aggregating an ensemble of explanations queried from multiple classifiers to lower the bias caused by single models. With PSSL for every image of ImageNet, the proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse. Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models, i.e., PSPNet-ResNet50, DeepLabV3-ResNet50, and OCRNet-HRNetW18, on a number of segmentation tasks, such as CamVid, VOC-A, VOC-C, ADE20K, and CityScapes, with significant improvements. The source code is availabel at https://github.com/PaddlePaddle/PaddleSeg.

preprint2022arXiv

Face.evoLVe: A High-Performance Face Recognition Library

In this paper, we develop face.evoLVe -- a comprehensive library that collects and implements a wide range of popular deep learning-based methods for face recognition. First of all, face.evoLVe is composed of key components that cover the full process of face analytics, including face alignment, data processing, various backbones, losses, and alternatives with bags of tricks for improving performance. Later, face.evoLVe supports multi-GPU training on top of different deep learning platforms, such as PyTorch and PaddlePaddle, which facilitates researchers to work on both large-scale datasets with millions of images and low-shot counterparts with limited well-annotated data. More importantly, along with face.evoLVe, images before & after alignment in the common benchmark datasets are released with source codes and trained models provided. All these efforts lower the technical burdens in reproducing the existing methods for comparison, while users of our library could focus on developing advanced approaches more efficiently. Last but not least, face.evoLVe is well designed and vibrantly evolving, so that new face recognition approaches can be easily plugged into our framework. Note that we have used face.evoLVe to participate in a number of face recognition competitions and secured the first place. The version that supports PyTorch is publicly available at https://github.com/ZhaoJ9014/face.evoLVe.PyTorch and the PaddlePaddle version is available at https://github.com/ZhaoJ9014/face.evoLVe.PyTorch/tree/master/paddle. Face.evoLVe has been widely used for face analytics, receiving 2.4K stars and 622 forks.

preprint2022arXiv

From Distributed Machine Learning to Federated Learning: A Survey

In recent years, data and computing resources are typically distributed in the devices of end users, various regions or organizations. Because of laws or regulations, the distributed data and computing resources cannot be directly shared among different regions or organizations for machine learning tasks. Federated learning emerges as an efficient approach to exploit distributed data and computing resources, so as to collaboratively train machine learning models, while obeying the laws and regulations and ensuring data security and data privacy. In this paper, we provide a comprehensive survey of existing works for federated learning. We propose a functional architecture of federated learning systems and a taxonomy of related techniques. Furthermore, we present the distributed training, data communication, and security of FL systems. Finally, we analyze their limitations and propose future research directions.

preprint2022arXiv

From Personalized Medicine to Population Health: A Survey of mHealth Sensing Techniques

Mobile Sensing Apps have been widely used as a practical approach to collect behavioral and health-related information from individuals and provide timely intervention to promote health and well-beings, such as mental health and chronic cares. As the objectives of mobile sensing could be either \emph{(a) personalized medicine for individuals} or \emph{(b) public health for populations}, in this work we review the design of these mobile sensing apps, and propose to categorize the design of these apps/systems in two paradigms -- \emph{(i) Personal Sensing} and \emph{(ii) Crowd Sensing} paradigms. While both sensing paradigms might incorporate with common ubiquitous sensing technologies, such as wearable sensors, mobility monitoring, mobile data offloading, and/or cloud-based data analytics to collect and process sensing data from individuals, we present a novel taxonomy system with two major components that can specify and classify apps/systems from aspects of the life-cycle of mHealth Sensing: \emph{(1) Sensing Task Creation \& Participation}, \emph{(2) Health Surveillance \& Data Collection}, and \emph{(3) Data Analysis \& Knowledge Discovery}. With respect to different goals of the two paradigms, this work systematically reviews this field, and summarizes the design of typical apps/systems in the view of the configurations and interactions between these two components. In addition to summarization, the proposed taxonomy system also helps figure out the potential directions of mobile sensing for health from both personalized medicines and population health perspectives.

preprint2022arXiv

Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction results of deep models. In recent years, many interpretation tools have been proposed to explain or reveal how deep models make decisions. In this paper, we review this line of research and try to make a comprehensive survey. Specifically, we first introduce and clarify two basic concepts -- interpretations and interpretability -- that people usually get confused about. To address the research efforts in interpretations, we elaborate the designs of a number of interpretation algorithms, from different perspectives, by proposing a new taxonomy. Then, to understand the interpretation results, we also survey the performance metrics for evaluating interpretation algorithms. Further, we summarize the current works in evaluating models' interpretability using "trustworthy" interpretation algorithms. Finally, we review and discuss the connections between deep models' interpretations and other factors, such as adversarial robustness and learning from interpretations, and we introduce several open-source libraries for interpretation algorithms and evaluation approaches.

preprint2022arXiv

Parameter-Free Style Projection for Arbitrary Style Transfer

Arbitrary image style transfer is a challenging task which aims to stylize a content image conditioned on arbitrary style images. In this task the feature-level content-style transformation plays a vital role for proper fusion of features. Existing feature transformation algorithms often suffer from loss of content or style details, non-natural stroke patterns, and unstable training. To mitigate these issues, this paper proposes a new feature-level style transformation technique, named Style Projection, for parameter-free, fast, and effective content-style transformation. This paper further presents a real-time feed-forward model to leverage Style Projection for arbitrary image style transfer, which includes a regularization term for matching the semantics between input contents and stylized outputs. Extensive qualitative analysis, quantitative evaluation, and user study have demonstrated the effectiveness and efficiency of the proposed methods.

preprint2022arXiv

Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

Pool-based Active Learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from oracle/annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains some data samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is a challenging task due to the natural conflict between AL sampling strategies and OOD sample detection. AL selects data that are hard to be classified by the current basic classifier (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., high entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem, and thus we utilize Pareto optimization based on two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy), and (2) the confidence of not being an OOD sample. Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.

preprint2022arXiv

Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement

Fine-tuning deep neural networks pre-trained on large scale datasets is one of the most practical transfer learning paradigm given limited quantity of training samples. To obtain better generalization, using the starting point as the reference (SPAR), either through weights or features, has been successfully applied to transfer learning as a regularizer. However, due to the domain discrepancy between the source and target task, there exists obvious risk of negative transfer in a straightforward manner of knowledge preserving. In this paper, we propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED), where the relevant knowledge with respect to the target task is disentangled from the original source model and used as a regularizer during fine-tuning the target model. Specifically, we design two alternative methods, maximizing the Maximum Mean Discrepancy (Max-MMD) and minimizing the mutual information (Min-MI), for the representation disentanglement. Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average. TRED also outperforms related state-of-the-art transfer learning regularizers such as L2-SP, AT, DELTA, and BSS.

preprint2021arXiv

C-Watcher: A Framework for Early Detection of High-Risk Neighborhoods Ahead of COVID-19 Outbreak

The novel coronavirus disease (COVID-19) has crushed daily routines and is still rampaging through the world. Existing solution for nonpharmaceutical interventions usually needs to timely and precisely select a subset of residential urban areas for containment or even quarantine, where the spatial distribution of confirmed cases has been considered as a key criterion for the subset selection. While such containment measure has successfully stopped or slowed down the spread of COVID-19 in some countries, it is criticized for being inefficient or ineffective, as the statistics of confirmed cases are usually time-delayed and coarse-grained. To tackle the issues, we propose C-Watcher, a novel data-driven framework that aims at screening every neighborhood in a target city and predicting infection risks, prior to the spread of COVID-19 from epicenters to the city. In terms of design, C-Watcher collects large-scale long-term human mobility data from Baidu Maps, then characterizes every residential neighborhood in the city using a set of features based on urban mobility patterns. Furthermore, to transfer the firsthand knowledge (witted in epicenters) to the target city before local outbreaks, we adopt a novel adversarial encoder framework to learn "city-invariant" representations from the mobility-related features for precise early detection of high-risk neighborhoods, even before any confirmed cases known, in the target city. We carried out extensive experiments on C-Watcher using the real-data records in the early stage of COVID-19 outbreaks, where the results demonstrate the efficiency and effectiveness of C-Watcher for early detection of high-risk neighborhoods from a large number of cities.

preprint2020arXiv

An Investigation of Containment Measures Against the COVID-19 Pandemic in Mainland China

As the recent COVID-19 outbreak rapidly expands all over the world, various containment measures have been carried out to fight against the COVID-19 pandemic. In Mainland China, the containment measures consist of three types, i.e., Wuhan travel ban, intra-city quarantine and isolation, and inter-city travel restriction. In order to carry out the measures, local economy and information acquisition play an important role. In this paper, we investigate the correlation of local economy and the information acquisition on the execution of containment measures to fight against the COVID-19 pandemic in Mainland China. First, we use a parsimonious model, i.e., SIR-X model, to estimate the parameters, which represent the execution of intra-city quarantine and isolation in major cities of Mainland China. In order to understand the execution of intra-city quarantine and isolation, we analyze the correlation between the representative parameters including local economy, mobility, and information acquisition. To this end, we collect the data of Gross Domestic Product (GDP), the inflows from Wuhan and outflows, and the COVID-19 related search frequency from a widely-used Web mapping service, i.e., Baidu Maps, and Web search engine, i.e., Baidu Search Engine, in Mainland China. Based on the analysis, we confirm the strong correlation between the local economy and the execution of information acquisition in major cities of Mainland China. We further evidence that, although the cities with high GDP per capita attracts bigger inflows from Wuhan, people are more likely to conduct the quarantine measure and to reduce going out to other cities. Finally, the correlation analysis using search data shows that well-informed individuals are likely to carry out containment measures.

preprint2020arXiv

COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization

Softening labels of training datasets with respect to data representations has been frequently used to improve the training of deep neural networks (DNNs). While such a practice has been studied as a way to leverage privileged information about the distribution of the data, a well-trained learner with soft classification outputs should be first obtained as a prior to generate such privileged information. To solve such chicken-egg problem, we propose COLAM framework that Co-Learns DNNs and soft labels through Alternating Minimization of two objectives - (a) the training loss subject to soft labels and (b) the objective to learn improved soft labels - in one end-to-end training procedure. We performed extensive experiments to compare our proposed method with a series of baselines. The experiment results show that COLAM achieves improved performance on many tasks with better testing classification accuracy. We also provide both qualitative and quantitative analyses that explain why COLAM works well.

preprint2020arXiv

Curriculum Audiovisual Learning

Associating sound and its producer in complex audiovisual scene is a challenging task, especially when we are lack of annotated training data. In this paper, we present a flexible audiovisual model that introduces a soft-clustering module as the audio and visual content detector, and regards the pervasive property of audiovisual concurrency as the latent supervision for inferring the correlation among detected contents. To ease the difficulty of audiovisual learning, we propose a novel curriculum learning strategy that trains the model from simple to complex scene. We show that such ordered learning procedure rewards the model the merits of easy training and fast convergence. Meanwhile, our audiovisual model can also provide effective unimodal representation and cross-modal alignment performance. We further deploy the well-trained model into practical audiovisual sound localization and separation task. We show that our localization model significantly outperforms existing methods, based on which we show comparable performance in sound separation without referring external visual supervision. Our video demo can be found at https://youtu.be/kuClfGG0cFU.

preprint2020arXiv

DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks

Transfer learning through fine-tuning a pre-trained neural network with an extremely large dataset, such as ImageNet, can significantly accelerate training while the accuracy is frequently bottlenecked by the limited dataset size of the new target task. To solve the problem, some regularization methods, constraining the outer layer weights of the target network using the starting point as references (SPAR), have been studied. In this paper, we propose a novel regularized transfer learning framework DELTA, namely DEep Learning Transfer using Feature Map with Attention. Instead of constraining the weights of neural network, DELTA aims to preserve the outer layer outputs of the target network. Specifically, in addition to minimizing the empirical loss, DELTA intends to align the outer layer outputs of two networks, through constraining a subset of feature maps that are precisely selected by attention that has been learned in an supervised learning manner. We evaluate DELTA with the state-of-the-art algorithms, including L2 and L2-SP. The experiment results show that our proposed method outperforms these baselines with higher accuracy for new tasks.

preprint2020arXiv

Generating Person Images with Appearance-aware Pose Stylizer

Generation of high-quality person images is challenging, due to the sophisticated entanglements among image factors, e.g., appearance, pose, foreground, background, local details, global structures, etc. In this paper, we present a novel end-to-end framework to generate realistic person images based on given person poses and appearances. The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively. The framework is highly flexible and controllable by effectively decoupling various complex person image factors in the encoding phase, followed by re-coupling them in the decoding phase. In addition, we present a new normalization method named adaptive patch normalization, which enables region-specific normalization and shows a good performance when adopted in person image generation model. Experiments on two benchmark datasets show that our method is capable of generating visually appealing and realistic-looking results using arbitrary image and pose inputs.

preprint2020arXiv

On the Noisy Gradient Descent that Generalizes as SGD

The gradient noise of SGD is considered to play a central role in the observed strong generalization abilities of deep learning. While past studies confirm that the magnitude and the covariance structure of gradient noise are critical for regularization, it remains unclear whether or not the class of noise distributions is important. In this work we provide negative results by showing that noises in classes different from the SGD noise can also effectively regularize gradient descent. Our finding is based on a novel observation on the structure of the SGD noise: it is the multiplication of the gradient matrix and a sampling noise that arises from the mini-batch sampling procedure. Moreover, the sampling noises unify two kinds of gradient regularizing noises that belong to the Gaussian class: the one using (scaled) Fisher as covariance and the one using the gradient covariance of SGD as covariance. Finally, thanks to the flexibility of choosing noise class, an algorithm is proposed to perform noisy gradient descent that generalizes well, the variant of which even benefits large batch SGD training without hurting generalization.

preprint2020arXiv

Quantifying the Economic Impact of COVID-19 in Mainland China Using Human Mobility Data

To contain the pandemic of coronavirus (COVID-19) in Mainland China, the authorities have put in place a series of measures, including quarantines, social distancing, and travel restrictions. While these strategies have effectively dealt with the critical situations of outbreaks, the combination of the pandemic and mobility controls has slowed China's economic growth, resulting in the first quarterly decline of Gross Domestic Product (GDP) since GDP began to be calculated, in 1992. To characterize the potential shrinkage of the domestic economy, from the perspective of mobility, we propose two new economic indicators: the New Venues Created (NVC) and the Volumes of Visits to Venue (V^3), as the complementary measures to domestic investments and consumption activities, using the data of Baidu Maps. The historical records of these two indicators demonstrated strong correlations with the past figures of Chinese GDP, while the status quo has dramatically changed this year, due to the pandemic. We hereby presented a quantitative analysis to project the impact of the pandemic on economies, using the recent trends of NVC and V^3. We found that the most affected sectors would be travel-dependent businesses, such as hotels, educational institutes, and public transportation, while the sectors that are mandatory to human life, such as workplaces, residential areas, restaurants, and shopping sites, have been recovering rapidly. Analysis at the provincial level showed that the self-sufficient and self-sustainable economic regions, with internal supplies, production, and consumption, have recovered faster than those regions relying on global supply chains.

preprint2020arXiv

RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr

Fine-tuning the deep convolution neural network(CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task. While the accuracy could be largely improved even when the training dataset is small, the transfer learning outcome is usually constrained by the pre-trained model with close CNN weights (Liu et al., 2019), as the backpropagation here brings smaller updates to deeper CNN layers. In this work, we propose RIFLE - a simple yet effective strategy that deepens backpropagation in transfer learning settings, through periodically Re-Initializing the Fully-connected LayEr with random scratch during the fine-tuning procedure. RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning, while the effects of randomization can be easily converged throughout the overall learning procedure. The experiments show that the use of RIFLE significantly improves deep transfer learning accuracy on a wide range of datasets, out-performing known tricks for the similar purpose, such as Dropout, DropConnect, StochasticDepth, Disturb Label and Cyclic Learning Rate, under the same settings with 0.5% -2% higher testing accuracy. Empirical cases and ablation studies further indicate RIFLE brings meaningful updates to deep CNN layers with accuracy improved.

preprint2020arXiv

The Weather Impacts the Outbreak of COVID-19 in Mainland China

Recent literature has suggested that climate conditions have considerably significant influences on the transmission of coronavirus COVID-19. However, there is a lack of comprehensive study that investigates the relationships between multiple weather factors and the development of COVID-19 pandemic while excluding the impact of social factors. In this paper, we study the relationships between six main weather factors and the infection statistics of COVID-19 on 250 cities in Mainland China. Our correlation analysis using weather and infection statistics indicates that all the studied weather factors are correlated with the spread of COVID-19, where precipitation shows the strongest correlation. We also build a weather-aware predictive model that forecasts the number of infected cases should there be a second wave of the outbreak in Mainland China. Our predicted results show that cities located in different geographical areas are likely to be challenged with the second wave of COVID-19 at very different time periods and the severity of the outbreak varies to a large degree, in correspondence with the varying weather conditions.

preprint2020arXiv

Ultrafast Photorealistic Style Transfer via Neural Architecture Search

The key challenge in photorealistic style transfer is that an algorithm should faithfully transfer the style of a reference photo to a content photo while the generated image should look like one captured by a camera. Although several photorealistic style transfer algorithms have been proposed, they need to rely on post- and/or pre-processing to make the generated images look photorealistic. If we disable the additional processing, these algorithms would fail to produce plausible photorealistic stylization in terms of detail preservation and photorealism. In this work, we propose an effective solution to these issues. Our method consists of a construction step (C-step) to build a photorealistic stylization network and a pruning step (P-step) for acceleration. In the C-step, we propose a dense auto-encoder named PhotoNet based on a carefully designed pre-analysis. PhotoNet integrates a feature aggregation module (BFA) and instance normalized skip links (INSL). To generate faithful stylization, we introduce multiple style transfer modules in the decoder and INSLs. PhotoNet significantly outperforms existing algorithms in terms of both efficiency and effectiveness. In the P-step, we adopt a neural architecture search method to accelerate PhotoNet. We propose an automatic network pruning framework in the manner of teacher-student learning for photorealistic stylization. The network architecture named PhotoNAS resulted from the search achieves significant acceleration over PhotoNet while keeping the stylization effects almost intact. We conduct extensive experiments on both image and video transfer. The results show that our method can produce favorable results while achieving 20-30 times acceleration in comparison with the existing state-of-the-art approaches. It is worth noting that the proposed algorithm accomplishes better performance without any pre- or post-processing.

preprint2020arXiv

XMixup: Efficient Transfer Learning with Auxiliary Samples by Cross-domain Mixup

Transferring knowledge from large source datasets is an effective way to fine-tune the deep neural networks of the target task with a small sample size. A great number of algorithms have been proposed to facilitate deep transfer learning, and these techniques could be generally categorized into two groups - Regularized Learning of the target task using models that have been pre-trained from source datasets, and Multitask Learning with both source and target datasets to train a shared backbone neural network. In this work, we aim to improve the multitask paradigm for deep transfer learning via Cross-domain Mixup (XMixup). While the existing multitask learning algorithms need to run backpropagation over both the source and target datasets and usually consume a higher gradient complexity, XMixup transfers the knowledge from source to target tasks more efficiently: for every class of the target task, XMixup selects the auxiliary samples from the source dataset and augments training samples via the simple mixup strategy. We evaluate XMixup over six real world transfer learning datasets. Experiment results show that XMixup improves the accuracy by 1.9% on average. Compared with other state-of-the-art transfer learning approaches, XMixup costs much less training time while still obtains higher accuracy.