Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
33works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

33 published item(s)

preprint2026arXiv

Attesting Model Lineage by Consisted Knowledge Evolution with Fine-Tuning Trajectory

The fine-tuning technique in deep learning gives rise to an emerging lineage relationship among models. This lineage provides a promising perspective for addressing security concerns such as unauthorized model redistribution and false claim of model provenance, which are particularly pressing in \textcolor{blue}{open-weight model} libraries where robust lineage verification mechanisms are often lacking. Existing approaches to model lineage detection primarily rely on static architectural similarities, which are insufficient to capture the dynamic evolution of knowledge that underlies true lineage relationships. Drawing inspiration from the genetic mechanism of human evolution, we tackle the problem of model lineage attestation by verifying the joint trajectory of knowledge evolution and parameter modification. To this end, we propose a novel model lineage attestation framework. In our framework, model editing is first leveraged to quantify parameter-level changes introduced by fine-tuning. Subsequently, we introduce a novel knowledge vectorization mechanism that refines the evolved knowledge within the edited models into compact representations by the assistance of probe samples. The probing strategies are adapted to different types of model families. These embeddings serve as the foundation for verifying the arithmetic consistency of knowledge relationships across models, thereby enabling robust attestation of model lineage. Extensive experimental evaluations demonstrate the effectiveness and resilience of our approach in a variety of adversarial scenarios in the real world. Our method consistently achieves reliable lineage verification across a broad spectrum of model types, including classifiers, diffusion models, and large language models.

preprint2026arXiv

Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters

Vision Large Language Models (VLLMs) have achieved remarkable success in modern text-rich visual understanding. However, their perceptual robustness in the face of the continuous morphological evolution of historical writing systems remains largely unexplored. Existing ancient text datasets typically focus on isolated historical periods, failing to capture the systematic visual distribution shifts spanning thousands of years. To bridge this gap and empower Digital Humanities, we introduce Chronicles-OCR, the first comprehensive benchmark specifically designed to evaluate the cross-temporal visual perception capabilities of VLLMs across the complete evolutionary trajectory of Chinese characters, known as the Seven Chinese Scripts. Curated in collaboration with top-tier institutional domain experts, the dataset comprises 2,800 strictly balanced images encompassing highly diverse physical media, ranging from tortoise shells to paper-based calligraphy. To accommodate the drastic morphological and topological variations across different historical stages, we propose a novel Stage-Adaptive Annotation Paradigm. Based on this, Chronicles-OCR formulates four rigorous quantitative tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, Chronicles-OCR provides an authoritative platform to expose the limitations of current VLLMs, paving the way for robust, evolution-aware historical text perception. Chronicles-OCR is publicly available at https://github.com/VirtualLUOUCAS/Chronicles-OCR.

preprint2026arXiv

Co-Evolving Policy Distillation

RLVR and OPD have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost, while the pipeline of first training experts and then performing OPD, though avoiding divergence, fails to fully absorb teacher capabilities due to large behavioral pattern gaps between teacher and student. We propose Co-Evolving Policy Distillation (CoPD), which encourages parallel training of experts and introduces OPD during each expert's ongoing RLVR training rather than after complete expert training, with experts serving as mutual teachers (making OPD bidirectional) to co-evolve. This enables more consistent behavioral patterns among experts while maintaining sufficient complementary knowledge throughout. Experiments validate that CoPD achieves all-in-one integration of text, image, and video reasoning capabilities, significantly outperforming strong baselines such as mixed RLVR and MOPD, and even surpassing domain-specific experts. The model parallel training pattern offered by CoPD may inspire a novel training scaling paradigm.

preprint2026arXiv

MiA-Signature: Approximating Global Activation for Long-Context Understanding

A growing body of work in cognitive science suggests that reportable conscious access is associated with \emph{global ignition} over distributed memory systems, while such activation is only partially accessible as individuals cannot directly access or enumerate all activated contents. This tension suggests a plausible mechanism that cognition may rely on a compact representation that approximates the global influence of activation on downstream processing. Inspired by this idea, we introduce the concept of \textbf{Mindscape Activation Signature (MiA-Signature)}, a compressed representation of the global activation pattern induced by a query. In LLM systems, this is instantiated via submodular-based selection of high-level concepts that cover the activated context space, optionally refined through lightweight iterative updates using working memory. The resulting MiA-Signature serves as a conditioning signal that approximates the effect of the full activation state while remaining computationally tractable. Integrating MiA-Signatures into both RAG and agentic systems yields consistent performance gains across multiple long-context understanding tasks.

preprint2026arXiv

RumorSphere: A Framework for Million-scale Agent-based Dynamic Simulation of Rumor Propagation

Rumor propagation modeling is critical for understanding and mitigating misinformation. Existing approaches combining rule-based regular agents with LLM-driven core agents provide a promising paradigm for large-scale rumor simulation. However, overlooking the dynamic nature of core agents and the importance of network topology on rumor spread significantly undermines the simulation performance. To address these issues, we present RumorSphere, a dynamic and hierarchical resonance framework for effective rumor simulation at the million-agent scale. Considering the dynamic role of core agents in rumor evolution, we propose a multi-agent dynamic interaction strategy based on the information cocoon theory, which adaptively identifies and activates critical core agents at conflict boundaries using LLMs, effectively supporting simulations with millions of agents. In addition, we design a hierarchical resonance network that integrates opinion leaders and localized community structures, enabling more realistic modeling of explosive rumor spread in real-world scenarios. Experiments on real-world datasets show that RumorSphere outperforms state-of-the-art methods, reducing simulation bias by an average of 26.5%.

preprint2024arXiv

FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning

Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the communication rounds. However, communicating Hessian matrices is often unfeasible due to their quadratic communication complexity. In this paper, we introduce a novel approach to tackle this issue while still achieving fast convergence rates. Our proposed method, named as Federated Newton Sketch methods (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian. To enhance communication efficiency, we reduce the sketch size to match the effective dimension of the Hessian matrix. We provide convergence analysis based on statistical learning for the federated Newton sketch approaches. Specifically, our approaches reach super-linear convergence rates w.r.t. the communication rounds for the first time. We validate the effectiveness of our algorithms through various experiments, which coincide with our theoretical findings.

preprint2022arXiv

Dirichlet type extensions of Euler sums

In this paper, we study the alternating Euler $T$-sums and $§$-sums, which are infinite series involving (alternating) odd harmonic numbers, and have similar forms and close relations to the Dirichlet beta functions. By using the method of residue computations, we establish the explicit formulas for the (alternating) linear and quadratic Euler $T$-sums and $§$-sums, from which, the parity theorems of Hoffman's double and triple $t$-values and Kaneko-Tsumura's double and triple $T$-values are further obtained. As supplements, we also show that the linear $T$-sums and $§$-sums are expressible in terms of colored multiple zeta values. Some interesting consequences and illustrative examples are presented.

preprint2022arXiv

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Real-world data often follows a long-tailed distribution, which makes the performance of existing classification algorithms degrade heavily. A key issue is that samples in tail categories fail to depict their intra-class diversity. Humans can imagine a sample in new poses, scenes, and view angles with their prior knowledge even if it is the first time to see this category. Inspired by this, we propose a novel reasoning-based implicit semantic data augmentation method to borrow transformation directions from other classes. Since the covariance matrix of each category represents the feature transformation directions, we can sample new directions from similar categories to generate definitely different instances. Specifically, the long-tailed distributed data is first adopted to train a backbone and a classifier. Then, a covariance matrix for each category is estimated, and a knowledge graph is constructed to store the relations of any two categories. Finally, tail samples are adaptively enhanced via propagating information from all the similar categories in the knowledge graph. Experimental results on CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018 have demonstrated the effectiveness of our proposed method compared with the state-of-the-art methods.

preprint2022arXiv

Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which can better preserve the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream tasks. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.

preprint2022arXiv

Neutral Utterances are Also Causes: Enhancing Conversational Causal Emotion Entailment with Social Commonsense Knowledge

Conversational Causal Emotion Entailment aims to detect causal utterances for a non-neutral targeted utterance from a conversation. In this work, we build conversations as graphs to overcome implicit contextual modelling of the original entailment style. Following the previous work, we further introduce the emotion information into graphs. Emotion information can markedly promote the detection of causal utterances whose emotion is the same as the targeted utterance. However, it is still hard to detect causal utterances with different emotions, especially neutral ones. The reason is that models are limited in reasoning causal clues and passing them between utterances. To alleviate this problem, we introduce social commonsense knowledge (CSK) and propose a Knowledge Enhanced Conversation graph (KEC). KEC propagates the CSK between two utterances. As not all CSK is emotionally suitable for utterances, we therefore propose a sentiment-realized knowledge selecting strategy to filter CSK. To process KEC, we further construct the Knowledge Enhanced Directed Acyclic Graph networks. Experimental results show that our method outperforms baselines and infers more causes with different emotions from the targeted utterance.

preprint2022arXiv

Stability and Generalization of Differentially Private Minimax Problems

In the field of machine learning, many problems can be formulated as the minimax problem, including reinforcement learning, generative adversarial networks, to just name a few. So the minimax problem has attracted a huge amount of attentions from researchers in recent decades. However, there is relatively little work on studying the privacy of the general minimax paradigm. In this paper, we focus on the privacy of the general minimax setting, combining differential privacy together with minimax optimization paradigm. Besides, via algorithmic stability theory, we theoretically analyze the high probability generalization performance of the differentially private minimax algorithm under the strongly-convex-strongly-concave condition. To the best of our knowledge, this is the first time to analyze the generalization performance of general minimax paradigm, taking differential privacy into account.

preprint2022arXiv

Towards Practical Differential Privacy in Data Analysis: Understanding the Effect of Epsilon on Utility in Private ERM

In this paper, we focus our attention on private Empirical Risk Minimization (ERM), which is one of the most commonly used data analysis method. We take the first step towards solving the above problem by theoretically exploring the effect of epsilon (the parameter of differential privacy that determines the strength of privacy guarantee) on utility of the learning model. We trace the change of utility with modification of epsilon and reveal an established relationship between epsilon and utility. We then formalize this relationship and propose a practical approach for estimating the utility under an arbitrary value of epsilon. Both theoretical analysis and experimental results demonstrate high estimation accuracy and broad applicability of our approach in practical applications. As providing algorithms with strong utility guarantees that also give privacy when possible becomes more and more accepted, our approach would have high practical value and may be likely to be adopted by companies and organizations that would like to preserve privacy but are unwilling to compromise on utility.

preprint2022arXiv

TPSNet: Reverse Thinking of Thin Plate Splines for Arbitrary Shape Scene Text Representation

The research focus of scene text detection and recognition has shifted to arbitrary shape text in recent years, where the text shape representation is a fundamental problem. An ideal representation should be compact, complete, efficient, and reusable for subsequent recognition in our opinion. However, previous representations have flaws in one or more aspects. Thin-Plate-Spline (TPS) transformation has achieved great success in scene text recognition. Inspired by this, we reversely think of its usage and sophisticatedly take TPS as an exquisite representation for arbitrary shape text representation. The TPS representation is compact, complete, and efficient. With the predicted TPS parameters, the detected text region can be directly rectified to a near-horizontal one to assist the subsequent recognition. To further exploit the potential of the TPS representation, the Border Alignment Loss is proposed. Based on these designs, we implement the text detector TPSNet, which can be extended to a text spotter conveniently. Extensive evaluation and ablation of several public benchmarks demonstrate the effectiveness and superiority of the proposed method for text representation and spotting. Particularly, TPSNet achieves the detection F-Measure improvement of 4.4\% (78.4\% vs. 74.0\%) on Art dataset and the end-to-end spotting F-Measure improvement of 5.0\% (78.5\% vs. 73.5\%) on Total-Text, which are large margins with no bells and whistles.

preprint2022arXiv

UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection

Recent scene text detection methods are almost based on deep learning and data-driven. Synthetic data is commonly adopted for pre-training due to expensive annotation cost. However, there are obvious domain discrepancies between synthetic data and real-world data. It may lead to sub-optimal performance to directly adopt the model initialized by synthetic data in the fine-tuning stage. In this paper, we propose a new training paradigm for scene text detection, which introduces an \textbf{UN}supervised \textbf{I}ntermediate \textbf{T}raining \textbf{S}tage (UNITS) that builds a buffer path to real-world data and can alleviate the gap between the pre-training stage and fine-tuning stage. Three training strategies are further explored to perceive information from real-world data in an unsupervised way. With UNITS, scene text detectors are improved without introducing any parameters and computations during inference. Extensive experimental results show consistent performance improvements on three public datasets.

preprint2021arXiv

Rescuing Deep Hashing from Dead Bits Problem

Deep hashing methods have shown great retrieval accuracy and efficiency in large-scale image retrieval. How to optimize discrete hash bits is always the focus in deep hashing methods. A common strategy in these methods is to adopt an activation function, e.g. $\operatorname{sigmoid}(\cdot)$ or $\operatorname{tanh}(\cdot)$, and minimize a quantization loss to approximate discrete values. However, this paradigm may make more and more hash bits stuck into the wrong saturated area of the activation functions and never escaped. We call this problem "Dead Bits Problem~(DBP)". Besides, the existing quantization loss will aggravate DBP as well. In this paper, we propose a simple but effective gradient amplifier which acts before activation functions to alleviate DBP. Moreover, we devise an error-aware quantization loss to further alleviate DBP. It avoids the negative effect of quantization loss based on the similarity between two images. The proposed gradient amplifier and error-aware quantization loss are compatible with a variety of deep hashing methods. Experimental results on three datasets demonstrate the efficiency of the proposed gradient amplifier and the error-aware quantization loss.

preprint2020arXiv

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

Emotion Recognition in Conversation (ERC) is a more challenging task than conventional text emotion recognition. It can be regarded as a personalized and interactive emotion recognition task, which is supposed to consider not only the semantic information of text but also the influences from speakers. The current method models speakers' interactions by building a relation between every two speakers. However, this fine-grained but complicated modeling is computationally expensive, hard to extend, and can only consider local context. To address this problem, we simplify the complicated modeling to a binary version: Intra-Speaker and Inter-Speaker dependencies, without identifying every unique speaker for the targeted speaker. To better achieve the simplified interaction modeling of speakers in Transformer, which shows excellent ability to settle long-distance dependency, we design three types of masks and respectively utilize them in three independent Transformer blocks. The designed masks respectively model the conventional context modeling, Intra-Speaker dependency, and Inter-Speaker dependency. Furthermore, different speaker-aware information extracted by Transformer blocks diversely contributes to the prediction, and therefore we utilize the attention mechanism to automatically weight them. Experiments on two ERC datasets indicate that our model is efficacious to achieve better performance.

preprint2020arXiv

Convolutional Spectral Kernel Learning

Recently, non-stationary spectral kernels have drawn much attention, owing to its powerful feature representation ability in revealing long-range correlations and input-dependent characteristics. However, non-stationary spectral kernels are still shallow models, thus they are deficient to learn both hierarchical features and local interdependence. In this paper, to obtain hierarchical and local knowledge, we build an interpretable convolutional spectral kernel network (\texttt{CSKN}) based on the inverse Fourier transform, where we introduce deep architectures and convolutional filters into non-stationary spectral kernel representations. Moreover, based on Rademacher complexity, we derive the generalization error bounds and introduce two regularizers to improve the performance. Combining the regularizers and recent advancements on random initialization, we finally complete the learning framework of \texttt{CSKN}. Extensive experiments results on real-world datasets validate the effectiveness of the learning framework and coincide with our theoretical findings.

preprint2020arXiv

Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification

Deep neural networks are highly effective when a large number of labeled samples are available but fail with few-shot classification tasks. Recently, meta-learning methods have received much attention, which train a meta-learner on massive additional tasks to gain the knowledge to instruct the few-shot classification. Usually, the training tasks are randomly sampled and performed indiscriminately, often making the meta-learner stuck into a bad local optimum. Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better. Inspired by this idea, we propose an easy-to-hard expert meta-training strategy to arrange the training tasks properly, where easy tasks are preferred in the first phase, then, hard tasks are emphasized in the second phase. A task hardness aware module is designed and integrated into the training procedure to estimate the hardness of a task based on the distinguishability of its categories. In addition, we explore multiple hardness measurements including the semantic relation, the pairwise Euclidean distance, the Hausdorff distance, and the Hilbert-Schmidt independence criterion. Experimental results on the miniImageNet and tieredImageNetSketch datasets show that the meta-learners can obtain better results with our expert training strategy.

preprint2020arXiv

Exploring Relations in Untrimmed Videos for Self-Supervised Learning

Existing video self-supervised learning methods mainly rely on trimmed videos for model training. However, trimmed datasets are manually annotated from untrimmed videos. In this sense, these methods are not really self-supervised. In this paper, we propose a novel self-supervised method, referred to as Exploring Relations in Untrimmed Videos (ERUV), which can be straightforwardly applied to untrimmed videos (real unlabeled) to learn spatio-temporal features. ERUV first generates single-shot videos by shot change detection. Then a designed sampling strategy is used to model relations for video clips. The strategy is saved as our self-supervision signals. Finally, the network learns representations by predicting the category of relations between the video clips. ERUV is able to compare the differences and similarities of videos, which is also an essential procedure for action and video related tasks. We validate our learned models with action recognition and video retrieval tasks with three kinds of 3D CNNs. Experimental results show that ERUV is able to learn richer representations and it outperforms state-of-the-art self-supervised methods with significant margins.

preprint2020arXiv

FC2RN: A Fully Convolutional Corner Refinement Network for Accurate Multi-Oriented Scene Text Detection

Recent scene text detection works mainly focus on curve text detection. However, in real applications, the curve texts are more scarce than the multi-oriented ones. Accurate detection of multi-oriented text with large variations of scales, orientations, and aspect ratios is of great significance. Among the multi-oriented detection methods, direct regression for the geometry of scene text shares a simple yet powerful pipeline and gets popular in academic and industrial communities, but it may produce imperfect detections, especially for long texts due to the limitation of the receptive field. In this work, we aim to improve this while keeping the pipeline simple. A fully convolutional corner refinement network (FC2RN) is proposed for accurate multi-oriented text detection, in which an initial corner prediction and a refined corner prediction are obtained at one pass. With a novel quadrilateral RoI convolution operation tailed for multi-oriented scene text, the initial quadrilateral prediction is encoded into the feature maps which can be further used to predict offset between the initial prediction and the ground-truth as well as output a refined confidence score. Experimental results on four public datasets including MSRA-TD500, ICDAR2017-RCTW, ICDAR2015, and COCO-Text demonstrate that FC2RN can outperform the state-of-the-art methods. The ablation study shows the effectiveness of corner refinement and scoring for accurate text localization.

preprint2020arXiv

Input Perturbation: A New Paradigm between Central and Local Differential Privacy

Traditionally, there are two models on differential privacy: the central model and the local model. The central model focuses on the machine learning model and the local model focuses on the training data. In this paper, we study the \textit{input perturbation} method in differentially private empirical risk minimization (DP-ERM), preserving privacy of the central model. By adding noise to the original training data and training with the `perturbed data', we achieve ($ε$,$δ$)-differential privacy on the final model, along with some kind of privacy on the original data. We observe that there is an interesting connection between the local model and the central model: the perturbation on the original data causes the perturbation on the gradient, and finally the model parameters. This observation means that our method builds a bridge between local and central model, protecting the data, the gradient and the model simultaneously, which is more superior than previous central methods. Detailed theoretical analysis and experiments show that our method achieves almost the same (or even better) performance as some of the best previous central methods with more protections on privacy, which is an attractive result. Moreover, we extend our method to a more general case: the loss function satisfies the Polyak-Lojasiewicz condition, which is more general than strong convexity, the constraint on the loss function in most previous work.

preprint2020arXiv

Keyphrase Prediction With Pre-trained Language Model

Recently, generative methods have been widely used in keyphrase prediction, thanks to their capability to produce both present keyphrases that appear in the source text and absent keyphrases that do not match any source text. However, the absent keyphrases are generated at the cost of the performance on present keyphrase prediction, since previous works mainly use generative models that rely on the copying mechanism and select words step by step. Besides, the extractive model that directly extracts a text span is more suitable for predicting the present keyphrase. Considering the different characteristics of extractive and generative methods, we propose to divide the keyphrase prediction into two subtasks, i.e., present keyphrase extraction (PKE) and absent keyphrase generation (AKG), to fully exploit their respective advantages. On this basis, a joint inference framework is proposed to make the most of BERT in two subtasks. For PKE, we tackle this task as a sequence labeling problem with the pre-trained language model BERT. For AKG, we introduce a Transformer-based architecture, which fully integrates the present keyphrase knowledge learned from PKE by the fine-tuned BERT. The experimental results show that our approach can achieve state-of-the-art results on both tasks on benchmark datasets.

preprint2020arXiv

Nearly Optimal Clustering Risk Bounds for Kernel K-Means

In this paper, we study the statistical properties of kernel $k$-means and obtain a nearly optimal excess clustering risk bound, substantially improving the state-of-art bounds in the existing clustering risk analyses. We further analyze the statistical effect of computational approximations of the Nyström kernel $k$-means, and prove that it achieves the same statistical accuracy as the exact kernel $k$-means considering only $Ω(\sqrt{nk})$ Nyström landmark points. To the best of our knowledge, such sharp excess clustering risk bounds for kernel (or approximate kernel) $k$-means have never been proposed before.

preprint2020arXiv

Neural Architecture Optimization with Graph VAE

Due to their high computational efficiency on a continuous space, gradient optimization methods have shown great potential in the neural architecture search (NAS) domain. The mapping of network representation from the discrete space to a latent space is the key to discovering novel architectures, however, existing gradient-based methods fail to fully characterize the networks. In this paper, we propose an efficient NAS approach to optimize network architectures in a continuous space, where the latent space is built upon variational autoencoder (VAE) and graph neural networks (GNN). The framework jointly learns four components: the encoder, the performance predictor, the complexity predictor and the decoder in an end-to-end manner. The encoder and the decoder belong to a graph VAE, mapping architectures between continuous representations and network architectures. The predictors are two regression models, fitting the performance and computational complexity, respectively. Those predictors ensure the discovered architectures characterize both excellent performance and high computational efficiency. Extensive experiments demonstrate our framework not only generates appropriate continuous representations but also discovers powerful neural architectures.

preprint2020arXiv

On the Optimal Minimum Distance of Fractional Repetition Codes

Fractional repetition (FR) codes are a class of repair efficient erasure codes that can recover a failed storage node with both optimal repair bandwidth and complexity. In this paper, we study the minimum distance of FR codes, which is the smallest number of nodes whose failure leads to the unrecoverable loss of the stored file. We consider upper bounds on the minimum distance and present several families of explicit FR codes attaining these bounds. The optimal constructions are derived from regular graphs and combinatorial designs, respectively.

preprint2020arXiv

Progressive Cluster Purification for Unsupervised Feature Learning

In unsupervised feature learning, sample specificity based methods ignore the inter-class information, which deteriorates the discriminative capability of representation models. Clustering based methods are error-prone to explore the complete class boundary information due to the inevitable class inconsistent samples in each cluster. In this work, we propose a novel clustering based method, which, by iteratively excluding class inconsistent samples during progressive cluster formation, alleviates the impact of noise samples in a simple-yet-effective manner. Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training, while the sizes of clusters continuously expand consistently with the growth of model representation capability. With a well-designed cluster purification mechanism, it further purifies clusters by filtering noise samples which facilitate the subsequent feature learning by utilizing the refined clusters as pseudo-labels. Experiments on commonly used benchmarks demonstrate that the proposed PCP improves baseline method with significant margins. Our code will be available at https://github.com/zhangyifei0115/PCP.

preprint2020arXiv

SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Scene text recognition is a hot research topic in computer vision. Recently, many recognition methods based on the encoder-decoder framework have been proposed, and they can handle scene texts of perspective distortion and curve shape. Nevertheless, they still face lots of challenges like image blur, uneven illumination, and incomplete characters. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts. The semantic information is used both in the encoder module for supervision and in the decoder module for initializing. In particular, the state-of-the art ASTER method is integrated into the proposed framework as an exemplar. Extensive experiments demonstrate that the proposed framework is more robust for low-quality text images, and achieves state-of-the-art results on several benchmark datasets.

preprint2020arXiv

Self-Training for Domain Adaptive Scene Text Detection

Though deep learning based scene text detection has achieved great progress, well-trained detectors suffer from severe performance degradation for different domains. In general, a tremendous amount of data is indispensable to train the detector in the target domain. However, data collection and annotation are expensive and time-consuming. To address this problem, we propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images. To reduce the noise of hard examples, a novel text mining module is implemented based on the fusion of detection and tracking results. Then, an image-to-video generation method is designed for the tasks that videos are unavailable and only images can be used. Experimental results on standard benchmarks, including ICDAR2015, MSRA-TD500, ICDAR2017 MLT, demonstrate the effectiveness of our self-training method. The simple Mask R-CNN adapted with self-training and fine-tuned on real data can achieve comparable or even superior results with the state-of-the-art methods.

preprint2020arXiv

Theoretical Analysis of Divide-and-Conquer ERM: Beyond Square Loss and RKHS

Theoretical analysis of the divide-and-conquer based distributed learning with least square loss in the reproducing kernel Hilbert space (RKHS) have recently been explored within the framework of learning theory. However, the studies on learning theory for general loss functions and hypothesis spaces remain limited. To fill the gap, we study the risk performance of distributed empirical risk minimization (ERM) for general loss functions and hypothesis spaces. The main contributions are two-fold. First, we derive two tight risk bounds under certain basic assumptions on the hypothesis space, as well as the smoothness, Lipschitz continuity, strong convexity of the loss function. Second, we further develop a more general risk bound for distributed ERM without the restriction of strong convexity.

preprint2020arXiv

Two Variants of Euler Sums

For positive integers $p_1,p_2,\ldots,p_k,q$ with $q>1$, we define the Euler $T$-sum $T_{p_1p_2\cdots p_k,q}$ as the sum of those terms of the usual infinite series for the classical Euler sum $S_{p_1p_2\cdots p_k,q}$ with odd denominators. Like the Euler sums, the Euler $T$-sums can be evaluated according to the Contour integral and residue theorem. Using this fact, we obtain explicit formulas for Euler $T$-sums with repeated arguments analogous to those known for Euler sums. Euler $T$-sums can be written as rational linear combinations of the Hoffman $t$-values. Using known results for Hoffman $t$-values, we obtain some examples of Euler $T$-sums in terms of (alternating) multiple zeta values. Moreover, we prove an explicit formula of triple $t$-values in terms of zeta values, double zeta values and double $t$-values. We also define alternating Euler $T$-sums and prove some results about them by the Contour integral and residue theorem. Furthermore, we define another Euler type $T$-sums and find many interesting results. In particular, we give an explicit formulas of triple Kaneko-Tsumura $T$-values of even weight in terms of single and the double $T$-values. Finally, we prove a duality formula of Kaneko-Tsumura's conjecture.

preprint2020arXiv

Two-Level Residual Distillation based Triple Network for Incremental Object Detection

Modern object detection methods based on convolutional neural network suffer from severe catastrophic forgetting in learning new classes without original data. Due to time consumption, storage burden and privacy of old data, it is inadvisable to train the model from scratch with both old and new data when new object classes emerge after the model trained. In this paper, we propose a novel incremental object detector based on Faster R-CNN to continuously learn from new object classes without using old data. It is a triple network where an old model and a residual model as assistants for helping the incremental model learning on new classes without forgetting the previous learned knowledge. To better maintain the discrimination of features between old and new classes, the residual model is jointly trained on new classes in the incremental learning procedure. In addition, a corresponding distillation scheme is designed to guide the training process, which consists of a two-level residual distillation loss and a joint classification distillation loss. Extensive experiments on VOC2007 and COCO are conducted, and the results demonstrate that the proposed method can effectively learn to incrementally detect objects of new classes, and the problem of catastrophic forgetting is mitigated in this context.

preprint2020arXiv

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates "blanks" by withholding video clips and then creates "options" by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with "options" and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.