Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
32 works
0 followers
16 topics
4 close collaborators

Published work

32 published items

preprint, 2026, arXiv

1.1 kW, 100 Hz room-temperature diode-pumped nanosecond laser by water immersion cooling

We report a room-temperature diode-pumped solid-state laser by water immersion cooling, which delivers a pulse energy of 11 J at the repetition rate of 100 Hz and the pulse duration of 7 ns, while the beam quality factor is 2.6 times the diffraction limit. To the best of our knowledge, this represents the highest performance achieved for room-temperature nanosecond lasers operating above 100 Hz, which demonstrates the great potential of room-temperature immersion-cooled nanosecond active mirror lasers.
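
As a quick consistency check between the title and the abstract, the average power follows from pulse energy times repetition rate. The peak-power figure is only a rough estimate that assumes a flat-top pulse, which the abstract does not state.

```latex
P_{\mathrm{avg}} = E_{\mathrm{pulse}} \, f_{\mathrm{rep}} = 11\,\mathrm{J} \times 100\,\mathrm{Hz} = 1.1\,\mathrm{kW},
\qquad
P_{\mathrm{peak}} \approx \frac{E_{\mathrm{pulse}}}{\tau} = \frac{11\,\mathrm{J}}{7\,\mathrm{ns}} \approx 1.6\,\mathrm{GW}.
```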

preprint, 2026, arXiv

A finite-termination algorithm for testing copositivity over the positive semidefinite cone

This paper proposes an efficient algorithm for testing copositivity of homogeneous polynomials over the positive semidefinite cone. The algorithm is based on a novel matrix optimization reformulation and requires solving a hierarchy of semidefinite programs. Notably, it always terminates in finitely many iterations. If a homogeneous polynomial is copositive over the positive semidefinite cone, the algorithm provides a certificate; otherwise, it returns a vector that refutes copositivity. Building on a similar idea, we further propose an algorithm to test copositivity over the direct product of the positive semidefinite cone and the nonnegative orthant. Preliminary numerical experiments demonstrate the effectiveness of the proposed methods.

preprint, 2026, arXiv

Agentic Recommender System with Hierarchical Belief-State Memory

Memory-augmented LLM agents have advanced personalized recommendation, yet existing approaches universally adopt flat memory representations that conflate ephemeral signals with stable preferences, and none provides a complete lifecycle governing how memory should evolve. We propose MARS (Memory-Augmented Agentic Recommender System), a framework that treats recommendation as a partially observable problem and maintains a structured belief state that progressively abstracts noisy behavioral observations into a compact estimate of user preferences. MARS organizes this belief state into three tiers: event memory buffers raw signals, preference memory maintains fine-grained mutable chunks with explicit strength and evidence tracking, and profile memory distills all preferences into a coherent natural language narrative. A complete lifecycle of six operations (extraction, reinforcement, weakening, consolidation, forgetting, and resynthesis) is adaptively scheduled by an LLM-based planner rather than fixed-interval heuristics. Experiments on four InstructRec benchmark domains show that MARS achieves state-of-the-art performance with average improvements of 26.4% in HR@1 and 10.3% in NDCG@10 over the strongest baselines, with further gains from agentic scheduling in evolving settings.
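
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of HR@k and NDCG@k under the common leave-one-out assumption of a single relevant item per user. The function names and the toy ranking are illustrative, and the paper's exact evaluation protocol may differ.

```python
import math


def hit_rate_at_k(ranked_items, target, k):
    """Return 1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with one relevant item: 1 / log2(rank + 1), rank counted from 1.

    With a single relevant item the ideal DCG is 1, so NDCG equals DCG.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)


# Hypothetical ranking in which the held-out item sits at position 2.
ranking = ["item_c", "item_a", "item_b"]
hr1 = hit_rate_at_k(ranking, "item_a", 1)
ndcg10 = ndcg_at_k(ranking, "item_a", 10)
print(hr1, ndcg10)
```

HR@1 only rewards an exact top-position hit, while NDCG@10 gives partial credit that decays logarithmically with rank, which is why the two numbers can move by different amounts.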

preprint, 2026, arXiv

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

On-policy self-distillation, where a student is pulled toward a copy of itself conditioned on privileged context (e.g., a verified solution or feedback), offers a promising direction for advancing reasoning capability without a stronger external teacher. Yet in math reasoning the gains are inconsistent, even when the same approach succeeds elsewhere. A pointwise mutual information analysis traces the failure to the privileged context itself: it inflates the teacher's confidence on tokens already implied by the solution (structural connectives, verifiable claims) and deflates it on deliberation tokens ("Wait", "Let", "Maybe") that drive multi-step search. We propose Anti-Self-Distillation (AntiSD), which ascends a divergence between student and teacher rather than descending it: this reverses the per-token sign and yields a naturally bounded advantage in one step. An entropy-triggered gate disables the term once the teacher entropy collapses, completing a drop-in replacement for default self-distillation. Across five models from 4B to 30B parameters on math reasoning benchmarks, AntiSD reaches the GRPO baseline's accuracy in 2 to 10x fewer training steps and improves final accuracy by up to 11.5 points. AntiSD opens a path to scalable self-improvement, where a language model bootstraps its own reasoning through its training signal.

preprint, 2026, arXiv

CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering

Although Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on downstream tasks, they frequently produce content that deviates from visual information, leading to object hallucination. To tackle this, recent works mostly depend on expensive manual annotations and training costs, or on decoding strategies that significantly increase inference time. In this work, we observe that LVLMs' attention to visual information is significantly enhanced when answering caption queries compared to non-caption queries. Inspired by this phenomenon, we propose Caption-guided Visual Attention Steering (CAST), a training-free, plug-and-play hallucination mitigation method that leverages the attention activation pattern corresponding to caption queries to enhance LVLMs' visual perception capability. Specifically, we use probing techniques to identify attention heads that are highly sensitive to caption queries and estimate optimized steering directions for their outputs. This steering strengthens LVLMs' fine-grained visual perception capabilities, thereby effectively mitigating object hallucination. CAST reduced object hallucination by an average of 6.03% across five widely used LVLMs and five benchmarks including both discriminative and generative tasks, demonstrating state-of-the-art performance while adding little inference cost and preserving other foundational capabilities.

preprint, 2026, arXiv

Counting and Entropy Bounds for Structure-Avoiding Spatially-Coupled LDPC Constructions

Designing large coupling memory quasi-cyclic spatially-coupled LDPC (QC-SC-LDPC) codes with low error floors requires eliminating specific harmful substructures (e.g., short cycles) induced by edge spreading and lifting. Building on our work [r15] that introduced a Clique Lovász Local Lemma (CLLL)-based design principle and a Moser-Tardos (MT)-type constructive approach, this work quantifies the size and structure of the feasible design space. Using the quantitative CLLL, we derive explicit lower bounds on the number of feasible edge-spreading and lifting assignments satisfying a given family of structure-avoidance constraints, and further obtain bounds on the number of non-equivalent solutions under row/column permutations. Moreover, via Rényi entropy bounds for the MT distribution, we provide a computable lower bound on the number of distinct solutions that the MT algorithm can output, giving a concrete diversity guarantee for randomized constructions. Specializations for eliminating 4-cycles yield closed-form bounds as functions of system parameters, offering a principled way to select the memory and lifting degree and to estimate the remaining search space.

preprint, 2026, arXiv

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm

Layer normalization (LN) is a fundamental component in modern deep learning, but its per-sample centering and scaling introduce non-negligible inference overhead. RMSNorm improves efficiency by removing the centering operation, yet this may discard benefits associated with centering. This paper proposes a framework to determine whether an LN in an arbitrary DNN can be replaced by RMSNorm without changing the model function. The key idea is to fold LN's centering operation into upstream general linear layers by enforcing zero-mean outputs through the column-centered constraint (CCC) and column-based weight centering (CBWC). We extend the analysis to arbitrary DNNs, define such LNs as foldable LNs, and develop a graph-based detection algorithm. Our analysis shows that many LNs in widely used architectures are foldable, enabling exact inference-time conversion and end-to-end acceleration of 2% to 12% without changing model predictions. Experiments across multiple task families further show that, when exact equivalence is partially broken in practical training settings, our method remains competitive with vanilla LN while improving efficiency.
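
The folding idea can be illustrated with a small sketch: if the columns of the upstream weight matrix (and the bias) are centered across output features, the linear layer's output already has zero mean, so LN and RMSNorm give the same result. This is a toy illustration in plain Python, not the authors' implementation. The helper names, the (out, in) weight layout, and the omission of the learnable scale and shift are assumptions made here.

```python
import math


def layer_norm(v, eps=1e-5):
    """LN without affine parameters: subtract the mean, divide by the std."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def rms_norm(v, eps=1e-5):
    """RMSNorm without affine parameters: divide by the root mean square."""
    mean_sq = sum(a * a for a in v) / len(v)
    return [a / math.sqrt(mean_sq + eps) for a in v]


def linear(W, b, x):
    """y = W x + b, with W stored as a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def center_columns(W, b):
    """Subtract each column's mean over output features, and center the bias."""
    n_out = len(W)
    col_means = [sum(row[j] for row in W) / n_out for j in range(len(W[0]))]
    W_c = [[row[j] - col_means[j] for j in range(len(row))] for row in W]
    b_mean = sum(b) / n_out
    return W_c, [bi - b_mean for bi in b]


# Arbitrary toy weights: 4 output features, 3 input features.
W = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [-0.2, 0.8, 0.1], [0.9, -0.4, 1.1]]
b = [0.1, -0.3, 0.7, 0.2]
x = [1.0, -2.0, 0.5]

W_c, b_c = center_columns(W, b)
ln_out = layer_norm(linear(W, b, x))         # original layer followed by LN
rms_out = rms_norm(linear(W_c, b_c, x))      # centered layer followed by RMSNorm
max_gap = max(abs(p - q) for p, q in zip(ln_out, rms_out))
print(max_gap)  # tiny, limited only by floating-point rounding
```

The centering is applied once to the weights, so the per-sample mean subtraction disappears from inference, which is where the efficiency gain described in the abstract would come from.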

preprint, 2026, arXiv

Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management

Effective memory management is essential for large language model agents to navigate long-horizon tasks. Recent research has explored using Reinforcement Learning to develop specialized memory manager agents. However, existing approaches rely on final task performance as the primary reward, which results in severe reward sparsity and ineffective credit assignment, providing insufficient guidance for individual memory operations. To this end, we propose Fine-Mem, a unified framework designed for fine-grained feedback alignment. First, we introduce a Chunk-level Step Reward to provide immediate step-level supervision via auxiliary chunk-specific question answering tasks. Second, we devise Evidence-Anchored Reward Attribution to redistribute global rewards by anchoring credit to key memory operations, based on the specific memory items utilized as evidence in reasoning. Together, these components enable stable policy optimization and align local memory operations with the long-term utility of memory. Experiments on Memalpha and MemoryAgentBench demonstrate that Fine-Mem consistently outperforms strong baselines, achieving superior success rates across various sub-tasks. Further analysis reveals its adaptability and strong generalization capabilities across diverse model configurations and backbones.

preprint, 2026, arXiv

From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation

On-policy self-distillation has emerged as a promising paradigm for post-training language models, in which the model conditions on environment feedback to serve as its own teacher, providing dense token-level rewards without external teacher models or step-level annotations. Despite its empirical success, what this reward actually measures and what kind of credit it assigns remain unclear. Under a posterior-compatibility interpretation of feedback conditioning, standard in the implicit-reward literature, we show that the self-distillation token reward is a Bayesian filtering increment whose trajectory sum is exactly the pointwise mutual information between the response and the feedback given the input. This pMI can be raised by input-specific reasoning or by input-generic shortcuts, so we further decompose the teacher log-probability along the input axis. Based on this analysis, we propose CREDIT (Contrastive REward from DIsTillation), which isolates the input-specific component with a batch-contrastive baseline. At the sequence level, CREDIT is a teacher-side surrogate for a contrastive pMI objective that also penalizes responses remaining likely under unrelated inputs. Across coding, scientific reasoning, and tool-use benchmarks on two model families, CREDIT delivers the strongest aggregate performance at negligible additional compute.
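
The telescoping claim can be written out as follows. The notation is an assumption made here: \pi is the model, x the input, f the feedback, y = (y_1, ..., y_T) the response, and the per-token reward is taken to be the log-ratio of the feedback-conditioned and unconditioned token probabilities. The last equality reads the conditioned model as a posterior under the same joint distribution, which is the posterior-compatibility interpretation the abstract mentions.

```latex
r_t = \log \pi(y_t \mid x, f, y_{<t}) - \log \pi(y_t \mid x, y_{<t}),
\qquad
\sum_{t=1}^{T} r_t
= \log \frac{\pi(y \mid x, f)}{\pi(y \mid x)}
= \operatorname{pMI}(y; f \mid x).
```

The middle step is just the chain rule applied to both sequence probabilities, so the sum of token rewards depends only on the full response, not on how credit is split across tokens.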

preprint, 2026, arXiv

Joint Beamforming and Position Optimization for Fluid RIS-aided ISAC Systems

A fluid reconfigurable intelligent surface (fRIS)-aided integrated sensing and communication (ISAC) system is proposed to enhance multi-target sensing and multi-user communication. Unlike the conventional RIS, the fRIS employs movable elements with adjustable positions, offering additional spatial degrees of freedom. In this system, a joint optimization problem is formulated to minimize sensing beampattern mismatch and symbol estimation error. An algorithm based on alternating minimization is devised to handle the resultant non-convex problem, where the subproblems are solved via augmented Lagrangian method, quadratic programming, semidefinite relaxation, and majorization-minimization. A key challenge is that the element positions affect both incident and reflective channels, leading to the high-order composite objective functions. As a remedy, the high-order terms are transformed into linear and linear-difference forms by exploiting the structural characteristics of fRIS and the channels. Numerical results demonstrate the superiority of the proposed scheme over conventional RIS-aided ISAC and other benchmarks.

preprint, 2026, arXiv

Lagrange multiplier expressions for matrix polynomial optimization and tight relaxations

This paper studies matrix constrained polynomial optimization. We investigate how to get explicit expressions for Lagrange multiplier matrices from the first order optimality conditions. The existence of these expressions can be shown under the nondegeneracy condition. Using Lagrange multiplier matrix expressions, we propose a strengthened Moment-SOS hierarchy for solving matrix polynomial optimization. Under some general assumptions, we show that this strengthened hierarchy is tight, or equivalently, it has finite convergence. We also study how to detect tightness and how to extract optimizers. Numerical experiments are provided to show the efficiency of the strengthened hierarchy.

preprint, 2026, arXiv

Microwave vortex beam lasing via photonic time crystals

Microwave lasing carrying orbital angular momentum (OAM) holds significant potential for advanced applications in fields such as high-capacity communications, precision sensing, and radar imaging. However, conventional approaches to masers fail to produce emission with embedded OAM. The recent emergence of photonic time crystals (PTCs), artificially structured media with periodically varying electromagnetic properties in time, offers a paradigm shift toward resonance-free lasing without the need for gain media. Yet, pioneering PTC designs have been based on three-dimensional bulk structures, which lack a surface-emitting configuration, and do not possess the capability to modulate OAM, thus hindering the realization of surface-emitted PTC masing that carries OAM. Here, we report the first experimental demonstration of non-resonant, gain medium-free, and surface-emitted microwave vortex beam lasing carrying OAM using ring-shaped PTCs. By developing a multiplier-driven time-varying metamaterial that achieves over 100% equivalent permittivity modulation depth, we establish momentum bandgaps (k gaps) with sufficient bandwidth to overcome intrinsic losses and enable self-sustained coherent microwave amplification. Furthermore, space-time modulation induces non-reciprocity between clockwise and counterclockwise k gap modes within the circularly symmetric PTC structure, facilitating the selective generation of microwave lasing carrying OAM, a capability beyond the reach of conventional maser technologies. Our work bridges PTC physics with coherent OAM-carrying microwave emission, establishing a transformative platform for next-generation wireless communications, advanced sensing systems, and OAM-based technologies.

preprint, 2026, arXiv

Weighted least squares estimation by multivariate-dependent weights for linear regression models

Multivariate linear regression models often face the problem of heteroscedasticity caused by multiple explanatory variables. The weighted least squares estimation with univariate-dependent weights has limitations in constructing weight functions. Therefore, this paper proposes a multivariate dependent weighted least squares estimation method. By constructing a linear combination of explanatory variables and maximizing their Spearman rank correlation coefficient with the absolute residual value, combined with maximum likelihood method to depict heteroscedasticity, it can comprehensively reflect the trend of variance changes in the random error and improve the accuracy of the model. This paper demonstrates that the optimal linear combination exponent estimator for heteroscedastic volatility obtained by our algorithm possesses consistency and asymptotic normality. In the simulation experiment, three scenarios of heteroscedasticity were designed, and the comparison showed that the proposed method was superior to the univariate-dependent weighting method in parameter estimation and model prediction. In the real data applications, the proposed method was applied to two real-world datasets about consumer spending in China and housing prices in Boston. From the perspectives of MAE, RSE, cross-validation, and fitting performance, its accuracy and stability were verified in terms of model prediction, interval estimation, and generalization ability. Additionally, the proposed method demonstrated relative advantages in fitting data with large fluctuations. This study provides an effective new approach for dealing with heteroscedasticity in multivariate linear regression.

preprint, 2022, arXiv

Bi-level Doubly Variational Learning for Energy-based Latent Variable Models

Energy-based latent variable models (EBLVMs) are more expressive than conventional energy-based models. However, their potential on visual tasks is limited by a training process based on maximum likelihood estimation that requires sampling from two intractable distributions. In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs. Particularly, we lead a decoupled EBLVM consisting of a marginal energy-based distribution and a structural posterior to handle the difficulties when learning deep EBLVMs on images. By choosing a symmetric KL divergence in the lower level of our framework, a compact BiDVL for visual tasks can be obtained. Our model achieves impressive image generation performance over related works. It also demonstrates the significant capacity of testing image reconstruction and out-of-distribution detection.

preprint, 2022, arXiv

Delving into the Estimation Shift of Batch Normalization in a Network

Batch normalization (BN) is a milestone technique in deep learning. It normalizes the activation using mini-batch statistics during training but the estimated population statistics during inference. This paper focuses on investigating the estimation of population statistics. We define the estimation shift magnitude of BN to quantitatively measure the difference between its estimated population statistics and expected ones. Our primary observation is that the estimation shift can be accumulated due to the stack of BN in a network, which has detrimental effects on the test performance. We further find a batch-free normalization (BFN) can block such an accumulation of estimation shift. These observations motivate our design of XBNBlock that replaces one BN with BFN in the bottleneck block of residual-style networks. Experiments on the ImageNet and COCO benchmarks show that XBNBlock consistently improves the performance of different architectures, including ResNet and ResNeXt, by a significant margin and seems to be more robust to distribution shift.

preprint, 2022, arXiv

Generalized truncated moment problems with unbounded sets

This paper studies generalized truncated moment problems with unbounded sets. First, we study geometric properties of the truncated moment cone and its dual cone of nonnegative polynomials. By the technique of homogenization, we give a convergent hierarchy of Moment-SOS relaxations for approximating these cones. With them, we give a Moment-SOS method for solving generalized truncated moment problems with unbounded sets. Finitely atomic representing measures, or certificates for their nonexistence, can be obtained by the proposed method. Numerical experiments and applications are also given.

preprint, 2022, arXiv

MDM: Molecular Diffusion Model for 3D Molecule Generation

Molecule generation, especially generating 3D molecular geometries from scratch (i.e., 3D de novo generation), has become a fundamental task in drug design. Existing diffusion-based 3D molecule generation methods could suffer from unsatisfactory performances, especially when generating large molecules. At the same time, the generated molecules lack enough diversity. This paper proposes a novel diffusion model to address those two challenges. First, interatomic relations are not in molecules' 3D point cloud representations. Thus, it is difficult for existing generative models to capture the potential interatomic forces and abundant local constraints. To tackle this challenge, we propose to augment the potential interatomic forces and further involve dual equivariant encoders to encode interatomic forces of different strengths. Second, existing diffusion-based models essentially shift elements in geometry along the gradient of data density. Such a process lacks enough exploration in the intermediate steps of the Langevin dynamics. To address this issue, we introduce a distributional controlling variable in each diffusion/reverse step to enforce thorough explorations and further improve generation diversity. Extensive experiments on multiple benchmarks demonstrate that the proposed model significantly outperforms existing methods for both unconditional and conditional generation tasks. We also conduct case studies to help understand the physicochemical properties of the generated molecules.

preprint, 2022, arXiv

Moiré quasi-bound states in the continuum

The novel physics of twisted bilayer graphene has motivated extensive studies of magic-angle flat bands hosted by moiré structures in electronic, photonic and acoustic systems. On the other hand, bound states in the continuum (BICs) have also attracted great attention in recent years because of their potential applications in the field of designing superior optical devices. Here, we combine these two independent concepts to construct a new optical state in a twisted bilayer photonic crystal slab, which we call a moiré quasi-BIC, and numerically demonstrate that such an exotic optical state possesses dual characteristics of moiré flat bands and quasi-BICs. To illustrate the mechanism for the formation of moiré flat bands, we develop an effective model at the center of the Brillouin zone and show that moiré flat bands could be fulfilled by balancing the interlayer coupling strength and the twist angle around the band edge above the light line. Moreover, by decreasing the twist angle of moiré photonic crystal slabs with flat bands, it is shown that the moiré flat-band mode at the Brillouin center gradually approaches a perfect BIC, where the total radiation loss from all diffraction channels is significantly suppressed. To clarify the advantage of moiré quasi-BICs, enhanced second-harmonic generation (SHG) is numerically proven with a wide-angle optical source. The efficiency of SHG assisted by designed moiré quasi-BICs can be greatly improved compared with that based on dispersive quasi-BICs with similar quality factors.

preprint, 2022, arXiv

Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective

Despite recent stereo matching networks achieving impressive performance given sufficient training data, they suffer from domain shifts and generalize poorly to unseen domains. We argue that maintaining feature consistency between matching pixels is a vital factor for promoting the generalization capability of stereo matching networks, which has not been adequately considered. Here we address this issue by proposing a simple pixel-wise contrastive learning across the viewpoints. The stereo contrastive feature loss function explicitly constrains the consistency between learned features of matching pixel pairs which are observations of the same 3D points. A stereo selective whitening loss is further introduced to better preserve the stereo feature consistency across domains, which decorrelates stereo features from stereo viewpoint-specific style information. Counter-intuitively, the generalization of feature consistency between two viewpoints in the same scene translates to the generalization of stereo matching performance to unseen domains. Our method is generic in nature as it can be easily embedded into existing stereo networks and does not require access to the samples in the target domain. When trained on synthetic data and generalized to four real-world testing sets, our method achieves superior performance over several state-of-the-art networks.

preprint, 2022, arXiv

Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models

Embedding-based neural topic models could explicitly represent words and topics by embedding them to a homogeneous feature space, which shows higher interpretability. However, there are no explicit constraints for the training of embeddings, leading to a larger optimization space. Also, a clear description of the changes in embeddings and the impact on model performance is still lacking. In this paper, we propose an embedding regularized neural topic model, which applies the specially designed training constraints on word embedding and topic embedding to reduce the optimization space of parameters. To reveal the changes and roles of embeddings, we introduce uniformity into the embedding-based neural topic model as the evaluation metric of embedding space. On this basis, we describe how embeddings tend to change during training via the changes in the uniformity of embeddings. Furthermore, we demonstrate the impact of changes in embeddings in embedding-based neural topic models through ablation studies. The results of experiments on two mainstream datasets indicate that our model significantly outperforms baseline models in terms of the harmony between topic quality and document modeling. This work is the first attempt to exploit uniformity to explore changes in embeddings of embedding-based neural topic models and their impact on model performance to the best of our knowledge.

preprint, 2021, arXiv

EGFI: Drug-Drug Interaction Extraction and Generation with Fusion of Enriched Entity and Sentence Information

The rapid growth in literature accumulates diverse yet comprehensive biomedical knowledge, such as drug interactions, that remains to be mined. However, it is difficult to extract the heterogeneous knowledge to retrieve or even discover the latest and novel knowledge in an efficient manner. To address such a problem, we propose EGFI for extracting and consolidating drug interactions from large-scale medical literature text data. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI encompasses the language model BioBERT which has been comprehensively pre-trained on a biomedical corpus. In particular, we propose the multi-head attention mechanism and packed BiGRU to fuse multiple semantic information for rigorous context modeling. In the generation part, EGFI utilizes another pre-trained language model BioGPT-2 where the generation sentences are selected based on filtering rules. We evaluated the classification part on "DDIs 2013" dataset and "DTIs" dataset, achieving F1 scores of 0.842 and 0.720 respectively. Moreover, we applied the classification part to distinguish high-quality generated sentences and verified with the existing ground truth to confirm the filtered sentences. The generated sentences that are not recorded in DrugBank and DDIs 2013 dataset also demonstrate the potential of EGFI to identify novel drug relationships.

preprint, 2021, arXiv

Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification

Chest X-rays are an important and accessible clinical imaging tool for the detection of many thoracic diseases. Over the past decade, deep learning, with a focus on the convolutional neural network (CNN), has become the most powerful computer-aided diagnosis technology for improving disease identification performance. However, training an effective and robust deep CNN usually requires a large amount of data with high annotation quality. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. Thus, existing public chest X-ray datasets usually adopt language pattern based methods to automatically mine labels from reports. However, this results in label uncertainty and inconsistency. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods from two perspectives to improve a single model's disease identification performance, rather than focusing on an ensemble of models. MODL integrates multiple models to obtain a soft label distribution for optimizing the single target model, which can reduce the effects of original label uncertainty. Moreover, KNNS aims to enhance the robustness of the target model to provide consistent predictions on images with similar medical findings. Extensive experiments on the public NIH Chest X-ray and CheXpert datasets show that our model achieves consistent improvements over the state-of-the-art methods.

preprint, 2020, arXiv

An Efficient Agreement Mechanism in CapsNets By Pairwise Product

Capsule networks (CapsNets) are capable of modeling visual hierarchical relationships, which is achieved by the "routing-by-agreement" mechanism. This paper proposes a pairwise agreement mechanism to build capsules, inspired by the feature interactions of factorization machines (FMs). The proposed method has a much lower computational complexity. We further propose a new CapsNet architecture that combines the strengths of residual networks in representing low-level visual features and CapsNets in modeling the relationships of parts to wholes. We conduct comprehensive experiments to compare the routing algorithms, including dynamic routing, EM routing, and our proposed FM agreement, based on both architectures of original CapsNet and our proposed one, and the results show that our method achieves both excellent performance and efficiency under a variety of situations.

preprint, 2020, arXiv

An Investigation into the Stochasticity of Batch Whitening

Batch Normalization (BN) is extensively employed in various network architectures by performing standardization within mini-batches. A full understanding of the process has been a central target in the deep learning communities. Unlike existing works, which usually only analyze the standardization operation, this paper investigates the more general Batch Whitening (BW). Our work originates from the observation that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and training Generative Adversarial Networks (GANs). We attribute this phenomenon to the stochasticity that BW introduces. We quantitatively investigate the stochasticity of different whitening transformations and show that it correlates well with the optimization behaviors during training. We also investigate how stochasticity relates to the estimation of population statistics during inference. Based on our analysis, we provide a framework for designing and comparing BW algorithms in different scenarios. Our proposed BW algorithm improves the residual networks by a significant margin on ImageNet classification. Besides, we show that the stochasticity of BW can improve the GAN's performance, though at the cost of training stability.

preprint2020arXiv

Controllable Orthogonalization in Training DNNs

Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1 and reduce redundancy in representation. This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI), to learn a layer-wise orthogonal weight matrix in DNNs. ONI works by iteratively stretching the singular values of a weight matrix towards 1. This property enables it to control the orthogonality of a weight matrix by its number of iterations. We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction. We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization (SN), and further outperforms SN by providing controllable orthogonality.
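
The idea of stretching singular values towards 1, with the iteration count as a control knob, can be sketched with the generic Newton-Schulz iteration Y <- 0.5 * (3I - Y Y^T) Y. This is a minimal illustration under assumptions and not necessarily the exact update used by ONI: the Frobenius-norm scaling, the function names, and the toy matrix are all hypothetical choices, and the matrix is assumed to have full row rank.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def newton_orthogonalize(weight, num_iters):
    """Push the singular values of `weight` towards 1.

    Assumption: dividing by the Frobenius norm puts every singular value
    in (0, 1], where the Newton-Schulz iteration converges.
    """
    fro = sum(x * x for row in weight for x in row) ** 0.5
    y = [[x / fro for x in row] for row in weight]
    n = len(y)
    for _ in range(num_iters):
        gram = matmul(y, transpose(y))
        # Build 3I - Y Y^T; each singular value s then maps to 0.5 * s * (3 - s^2).
        m = [[(3.0 if i == j else 0.0) - gram[i][j] for j in range(n)]
             for i in range(n)]
        y = [[0.5 * x for x in row] for row in matmul(m, y)]
    return y


def orthogonality_gap(y):
    """Largest absolute entry of Y Y^T - I (0 means orthonormal rows)."""
    gram = matmul(y, transpose(y))
    return max(abs(gram[i][j] - (1.0 if i == j else 0.0))
               for i in range(len(gram)) for j in range(len(gram)))


# Hypothetical 2x2 weight matrix.
weight = [[2.0, 1.0], [0.0, 1.0]]
gap_1 = orthogonality_gap(newton_orthogonalize(weight, 1))
gap_5 = orthogonality_gap(newton_orthogonalize(weight, 5))
gap_10 = orthogonality_gap(newton_orthogonalize(weight, 10))
```

In this sketch, stopping after a few iterations leaves the matrix only approximately orthogonal, while running longer drives the gap towards zero, which mirrors the controllability described in the abstract.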

preprint2020arXiv

Convolutional Neural Network Training with Distributed K-FAC

Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale. We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. We use residual neural networks (ResNet) applied to the CIFAR-10 and ImageNet-1k datasets to evaluate the correctness and scalability of our K-FAC gradient preconditioner. With ResNet-50 on the ImageNet-1k dataset, our distributed K-FAC implementation converges to the 75.9% MLPerf baseline in 18-25% less time than does the classic stochastic gradient descent (SGD) optimizer across scales on a GPU cluster.
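
As a rough illustration of what a K-FAC gradient preconditioner computes for a single layer: the layer's Fisher block is approximated by a Kronecker product of an activation factor A and a gradient factor G, so applying its inverse to the gradient reduces to G^{-1} (grad W) A^{-1}. The sketch below assumes 2x2 factors and simple additive damping for brevity; the names and values are hypothetical and it does not reflect the distributed or inverse-free design studied in the paper.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Closed-form inverse; this toy example only handles 2x2 factors."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add_damping(m, damping):
    """Add `damping` to the diagonal to keep the factor well conditioned."""
    return [[m[i][j] + (damping if i == j else 0.0) for j in range(len(m))]
            for i in range(len(m))]


def kfac_precondition(grad, act_factor, grad_factor, damping):
    """Return G^{-1} @ grad @ A^{-1} for one layer's weight gradient."""
    g_inv = inverse_2x2(add_damping(grad_factor, damping))
    a_inv = inverse_2x2(add_damping(act_factor, damping))
    return matmul(matmul(g_inv, grad), a_inv)


# Hypothetical factors: A from layer inputs, G from output gradients.
act_factor = [[2.0, 0.0], [0.0, 0.5]]
grad_factor = [[4.0, 0.0], [0.0, 1.0]]
grad = [[8.0, 2.0], [2.0, 1.0]]
update = kfac_precondition(grad, act_factor, grad_factor, damping=0.0)
```

Working with the two small factors instead of the full Fisher block is what makes the approach tractable per layer; the abstract's layer-wise distribution strategies concern how such factor computations are spread across processors.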

preprint2020arXiv

Invertible Zero-Shot Recognition Flows

Deep generative models have been successfully applied to Zero-Shot Learning (ZSL) recently. However, the underlying drawbacks of GANs and VAEs (e.g., the hardness of training with ZSL-oriented regularizers and the limited generation quality) hinder the existing generative ZSL models from fully bypassing the seen-unseen bias. To tackle the above limitations, for the first time, this work incorporates a new family of generative models (i.e., flow-based models) into ZSL. The proposed Invertible Zero-shot Flow (IZF) learns factorized data embeddings (i.e., the semantic factors and the non-semantic ones) with the forward pass of an invertible flow network, while the reverse pass generates data samples. This procedure theoretically extends conventional generative flows to a factorized conditional scheme. To explicitly solve the bias problem, our model enlarges the seen-unseen distributional discrepancy based on negative sample-based distance measurement. Notably, IZF works flexibly with either a naive Bayesian classifier or a held-out trainable one for zero-shot recognition. Experiments on widely-adopted ZSL benchmarks demonstrate the significant performance gain of IZF over existing methods, in both classic and generalized settings.

preprint2020arXiv

Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs

Conditioning analysis uncovers the landscape of an optimization objective by exploring the spectrum of its curvature matrix. This has been well explored theoretically for linear models. We extend this analysis to deep neural networks (DNNs) in order to investigate their learning dynamics. To this end, we propose layer-wise conditioning analysis, which explores the optimization landscape with respect to each layer independently. Such an analysis is theoretically supported under mild assumptions that approximately hold in practice. Based on our analysis, we show that batch normalization (BN) can stabilize the training, but sometimes results in the false impression of a local minimum, which has detrimental effects on the learning. In addition, we experimentally observe that BN can improve the layer-wise conditioning of the optimization problem. Finally, we find that the last linear layer of a very deep residual network displays ill-conditioned behavior. We solve this problem by adding only one BN layer before the last linear layer, which achieves improved performance over the original and pre-activation residual networks.

preprint2020arXiv

The Medical Scribe: Corpus Development and Model Performance Analyses

There is a growing interest in creating tools to assist in clinical note generation using the audio of provider-patient encounters. Motivated by this goal and with the help of providers and medical scribes, we developed an annotation scheme to extract relevant clinical concepts. We used this annotation scheme to label a corpus of about 6k clinical encounters. This was used to train a state-of-the-art tagging model. We report ontologies, labeling results, model performances, and detailed analyses of the results. Our results show that the entities related to medications can be extracted with a relatively high accuracy of 0.90 F-score, followed by symptoms at 0.72 F-score, and conditions at 0.57 F-score. In our task, we not only identify where the symptoms are mentioned but also map them to canonical forms as they appear in the clinical notes. Of the different types of errors, in about 19-38% of the cases, we find that the model output was correct, and about 17-32% of the errors do not impact the clinical note. Taken together, the models developed in this work are more useful than the F-scores reflect, making it a promising approach for practical applications.

preprint2020arXiv

Topic Detection and Summarization of User Reviews

A massive number of reviews are generated daily on various platforms. It is impossible for people to read through so many reviews and obtain useful information. Automatically summarizing customer reviews is thus important for identifying and extracting the essential information and helping users grasp the gist of the data. However, as customer reviews are typically short, informal, and multifaceted, it is extremely challenging to generate topic-wise summaries. While several studies aim to solve this issue, they are heuristic methods developed using only customer reviews. Unlike existing methods, we propose an effective new summarization method that analyzes both reviews and summaries. To do that, we first segment reviews and summaries into individual sentiments. As the sentiments are typically short, we combine sentiments talking about the same aspect into a single document and apply a topic modeling method to identify hidden topics among customer reviews and summaries. Sentiment analysis is employed to distinguish positive and negative opinions within each detected topic. A classifier is also introduced to distinguish the writing pattern of summaries from that of customer reviews. Finally, sentiments are selected to generate the summary based on their topic relevance, sentiment analysis score, and writing pattern. To test our method, a new dataset comprising product reviews and summaries for 1028 products was collected from Amazon and CNET. Experimental results show the effectiveness of our method compared with other methods.

preprint2020arXiv

Toward a Full MHD Jet Model of Spinning Black Holes--II: Kinematics and Application to the M87 Jet

In this paper, we investigate the magnetohydrodynamical structure of a jet powered by a spinning black hole, where electromagnetic fields and fluid motion are governed by the Grad-Shafranov equation and the Bernoulli equation, respectively. Assuming steady and axisymmetric jet structure, the global solution is uniquely determined with prescribed plasma loading into the jet and the poloidal shape of the outmost magnetic field line. We apply this model to the jet in the center of nearby radio galaxy M87, and we find it can naturally explain the slow flow acceleration and the flow velocity stratification within $10^5$ gravitational radii from the central black hole. In particular, we find the extremal black hole spin is disfavored by the flow velocity measurements, if the plasma loading to the jet is dominated by the electron/positron pair production at the jet base.

preprint2020arXiv

Unbiased Scene Graph Generation via Rich and Fair Semantic Extraction

Extracting graph representations of visual scenes in images is a challenging task in computer vision. Although there has been encouraging progress in scene graph generation in the past decade, we surprisingly find that the performance of existing approaches is largely limited by strong biases, which mainly stem from (1) unconsciously assuming relations with certain semantic properties such as symmetry and (2) imbalanced annotations over different relations. To alleviate the negative effects of these biases, we propose a new and simple architecture named Rich and Fair semantic extraction network (RiFa for short), which not only captures rich semantic properties of the relations but also fairly predicts relations with different scales of annotations. Using pseudo-siamese networks, RiFa embeds the subject and object separately to distinguish their semantic differences while preserving their underlying semantic properties. Then, it predicts subject-object relations based on both the visual and semantic features of entities within a certain contextual area, and fairly ranks the relation predictions for those with few annotations. Experiments on the popular Visual Genome dataset show that RiFa achieves state-of-the-art performance under several challenging settings of the scene graph task. In particular, it performs significantly better at capturing different semantic properties of relations, and obtains the best overall per-relation performance.