Trust snapshot

Trust 21 - Emerging · Verification L1 · Unclaimed author
29 works · 0 followers · 23 topics · 4 close collaborators

Published work

29 published item(s)

Preprint · 2026 · arXiv

AesRM: Improving Video Aesthetics with Expert-Level Feedback

Despite rapid advances in photorealistic video generation, real-world applications such as filmmaking require video aesthetics, e.g., harmonious colors and cinematic lighting, beyond visual fidelity. Prior work on visual aesthetics largely focuses on images, often reducing aesthetics to coarse definitions, e.g., visual pleasure, without a rigorous and systematic evaluation. To improve video aesthetics, we propose a hierarchical rubric that decomposes video aesthetics into three core dimensions, Visual Aesthetics (VA), Visual Fidelity (VF), and Visual Plausibility (VP), with 15 fine-grained criteria, e.g., shot composition. This framework enables a large-scale expert-annotated preference dataset and an evaluation benchmark, AesVideo-Bench, containing about 2500 video pairs with expert annotations on VA, VF, and VP. We then build a family of Video Aesthetic Reward Models (AesRM): AesRM-Base, which directly predicts pairwise preferences on these dimensions to provide efficient post-training rewards, and AesRM-CoT, which additionally generates CoT aligned with all 15 criteria to improve assessment interpretability. Specifically, we train AesRM with a three-stage progressive scheme: (1) Atomic Aesthetic Capability Learning, which strengthens AesRM's recognition of fundamental aesthetic concepts, e.g., accurately identifying centered composition; (2) Cold-Start, aligning the model with structured reasoning protocols; and (3) GRPO, further improving evaluation accuracy. To enhance AesRM-CoT, we additionally propose self-consistency-based CoT synthesis to improve CoT quality and design CoT-based process rewards during GRPO. Extensive experiments show AesRM outperforms baselines on multiple aesthetics benchmarks and is more robust, with lower position bias. Finally, we align Wan2.2 with AesRM and observe clear aesthetic gains over existing aesthetic reward models.
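Stage (3) relies on GRPO, which replaces a learned critic with a group baseline: each sampled response is scored against the other responses drawn for the same input. Below is a minimal sketch of that group-normalized advantage. It assumes one scalar reward per response. The function name, the epsilon, and the use of the population standard deviation (implementations differ) are illustrative choices, and AesRM's CoT-based process rewards are not modeled.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group.

    In GRPO a group is the set of responses sampled for one input, so the
    group mean acts as the baseline instead of a learned value function.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical example: four sampled judgments for one video pair,
# rewarded 1.0 when the predicted preference is correct.
adv = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
print([round(a, 3) for a in adv])
```

Responses that beat their group average receive positive advantages and the rest negative. A group in which every response earns the same reward contributes no gradient signal.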

Preprint · 2026 · arXiv

Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift

Establishing almost sure convergence rates for stochastic approximation and reinforcement learning under Markovian noise is a fundamental theoretical challenge. We make progress towards this challenge for a class of stochastic approximation algorithms whose expected updates are contractive, a setting that arises in many reinforcement learning algorithms such as $Q$-learning and linear temporal difference learning. Specifically, for a power-law learning rate $O(n^{-η})$ with $η\in (1/2, 1)$, we obtain an almost sure convergence rate arbitrarily close to $o(n^{1 - 2η})$. For a harmonic learning rate $O(n^{-1})$, we obtain an almost sure convergence rate arbitrarily close to $o(n^{-1})$, which we argue is a strong result because it is close to the optimal rate $O(n^{-1}\log\log n)$ given by the law of the iterated logarithm (for a special case of i.i.d. noise). Key to our analysis is a novel Lyapunov drift construction that applies a Poisson-equation based correction for Markovian noise to the well-established Moreau-envelope smoothing for the contractive mapping.
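The following minimal scalar sketch makes the setting concrete: contractive stochastic approximation with a power-law learning rate $n^{-η}$ and Markovian noise. The contraction F, the two-state noise chain, and all parameter values are illustrative assumptions. The code shows the iteration being analyzed, not the Poisson-Moreau drift argument itself.

```python
import random

def stochastic_approximation(eta=0.75, n_steps=200_000, stay_prob=0.9, seed=0):
    """Iterate x_{n+1} = x_n + alpha_n * (F(x_n) + xi_n - x_n), alpha_n = n^(-eta).

    F(x) = 0.5 * x + 1 is a contraction with fixed point x* = 2 (illustrative).
    xi_n is Markovian noise: a two-state chain on {+1, -1} that keeps its state
    with probability stay_prob, so its stationary mean is 0 but consecutive
    samples are correlated.
    """
    rng = random.Random(seed)
    x, xi = 0.0, 1.0
    for n in range(1, n_steps + 1):
        alpha = n ** (-eta)  # power-law learning rate with eta in (1/2, 1)
        x += alpha * (0.5 * x + 1.0 + xi - x)
        if rng.random() > stay_prob:  # switch noise state with probability 1 - stay_prob
            xi = -xi
    return x

print(stochastic_approximation())  # should land near the fixed point 2.0
```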

Preprint · 2026 · arXiv

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Recent video generative models have greatly improved the realism of AI-generated videos, yet their outputs still exhibit artifacts such as temporal inconsistencies, structural distortions, and semantic incoherence. While Multimodal Large Language Models (MLLMs) show strong visual understanding capabilities, their ability to perceive and reason about such artifacts remains unclear. Existing benchmarks often lack systematic evaluation of artifact-aware perception and fine-grained diagnostic reasoning, especially across diverse AI-generated video domains beyond photorealistic content. To address this gap, we introduce Artifact-Bench, a comprehensive benchmark for evaluating MLLMs on AI-generated video artifact detection and analysis. We first establish a three-level hierarchical taxonomy of realism artifacts, covering photorealistic, animated, and CG-style videos. Based on this taxonomy, Artifact-Bench defines three complementary tasks: real vs. AI-generated video classification, pairwise realism comparison, and fine-grained artifact identification. Experiments on 19 leading MLLMs reveal substantial limitations in artifact perception and reasoning, with many models approaching random or even below-random performance in challenging settings. We further observe significant misalignment between MLLM judgments and human perceptual preferences, highlighting their limited reliability as general evaluators for AI-generated video realism.

Preprint · 2026 · arXiv

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning

In-context reinforcement learning (ICRL) studies agents that, after pretraining, adapt to new tasks by conditioning on additional context without parameter updates. Existing theoretical analyses of ICRL largely rely on linear attention, which replaces the softmax function in standard attention with an identity mapping. This paper provides the first theoretical understanding of ICRL without the unrealistic linear-attention simplification; instead, we consider the standard softmax attention used in practice. We show that, with certain parameters, the layerwise forward pass of a Transformer with softmax attention is equivalent to iterative updates of a weighted softmax temporal difference (TD) learning algorithm. Weighted softmax TD is a new RL algorithm that performs policy evaluation in kernel space and includes both linear TD and tabular TD as special cases. We also prove that, under a certain contraction condition and with the parameters identified above, the policy evaluation error decays as the number of layers grows. Finally, we prove that these parameters are a global minimizer of a pretraining loss, explaining their emergence in our numerical experiments.

Preprint · 2026 · arXiv

Controlling Decision Drift in Multimodal Sentiment Analysis with Missing Modalities

Multimodal sentiment analysis relies on textual, acoustic, and visual signals, yet real-world data often suffer from missing modalities and quality imbalance. Existing methods generate features for missing modalities from the available ones, but differences in expression mechanisms and sentiment dynamics across modalities may cause the generated features to deviate from the true distributions and mislead prediction. In addition, unreliable modalities may dominate fusion, resulting in representation shift across modality combinations and unstable sentiment representations. To address these challenges, we propose a two-level reference alignment framework that introduces stable references at the feature-representation and sentiment-decision levels to improve robustness when modalities are missing. First-level reference alignment leverages complete-modality samples to constrain representations and align different modality combinations into a shared sentiment space. Second-level reference alignment enforces cross-modal consistency at the decision level by suppressing unreliable modalities through prototype retrieval and voting. As a result, the framework maintains stable and reliable sentiment predictions under diverse missing-modality patterns. Experiments on CMU-MOSI and CMU-MOSEI show consistent improvements across various missing-modality settings. Under full-modality input, the proposed method achieves state-of-the-art performance, with accuracy (ACC) of 86.28% and 85.88% and F1 of 86.24% and 85.86%.

Preprint · 2026 · arXiv

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought

In-context reinforcement learning (ICRL) refers to the ability of RL agents to adapt to new tasks at inference time, without parameter updates, by conditioning on additional context. Recent empirical studies further demonstrate that Chain-of-Thought (CoT) generation can amplify this ICRL capability. This paper is the first to provide a theoretical understanding of how CoT interacts with ICRL. We conduct our analysis in a policy evaluation setup with a linear Transformer. We prove that, with specific Transformer parameters, the CoT generation process is equivalent to repeatedly executing temporal difference learning updates. Additionally, we provide a finite-sample convergence analysis showing that the policy evaluation error decreases geometrically with CoT length and eventually saturates at a statistical floor determined by the context length. We also prove that the desired Transformer parameters are a global minimizer of the pretraining loss, providing a theoretical explanation for the empirical emergence of those parameters.
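The TD learning update that the CoT steps are shown to emulate can be illustrated in its simplest tabular form. This is a minimal sketch of tabular TD(0) on a hypothetical two-state deterministic chain, not the linear-Transformer construction from the paper. The chain, rewards, discount, and step size are illustrative assumptions.

```python
def td0_evaluate(successor, reward, gamma=0.9, alpha=0.1, n_steps=10_000, start=0):
    """Tabular TD(0) policy evaluation along one trajectory of a deterministic chain.

    successor[s] is the next state after s; reward[s] is the reward received on leaving s.
    Each step applies V[s] += alpha * (reward[s] + gamma * V[next] - V[s]).
    """
    values = [0.0] * len(successor)
    s = start
    for _ in range(n_steps):
        nxt = successor[s]
        td_error = reward[s] + gamma * values[nxt] - values[s]
        values[s] += alpha * td_error
        s = nxt
    return values

# Two-state cycle 0 -> 1 -> 0, with reward 1 only when leaving state 0.
V = td0_evaluate(successor=[1, 0], reward=[1.0, 0.0])
print(V)  # approaches the exact values V0 = 1 / (1 - 0.81), V1 = 0.9 * V0
```

Each update shrinks the error by a factor of at most 1 - alpha * (1 - gamma), so the error decays geometrically with the number of updates. This mirrors, in a much simpler setting, the geometric decay with CoT length described above.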

Preprint · 2026 · arXiv

COPRA: Conditional Parameter Adaptation with Reinforcement Learning for Video Anomaly Detection

Vision-language models (VLMs) have shown strong performance in video anomaly detection (VAD) while providing interpretable predictions. However, existing VLM-based VAD methods suffer from a fundamental mismatch between training and inference in both data distribution and model configuration. First, most approaches rely on static post-training adaptation, limiting generalization under distribution shifts such as unseen environments or anomaly types. Second, they train VLMs on sparse frames from long videos, but perform inference on densely sampled short segments, creating inconsistencies between training and testing. To address these limitations, we propose COPRA, a conditional parameter adaptation framework for VLM-based VAD. Instead of fixed prompts or shared parameter updates, COPRA generates input-specific parameter updates to dynamically adapt a frozen VLM for each video segment during both training and inference. Experiments show strong performance on standard VAD benchmarks, consistently outperforming static baselines in both in-domain and cross-domain settings. Moreover, COPRA generalizes beyond VAD to unseen tasks such as multiple-choice Video Question Answering and Dense Captioning. These results highlight COPRA as an effective weight-space generation framework for scalable, adaptive, and context-aware video understanding. The code will be released at https://github.com/THE-MALT-LAB/COPRA

Preprint · 2026 · arXiv

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

Recent image editing models have achieved remarkable progress in instruction following, multimodal understanding, and complex visual editing. However, existing benchmarks often fail to faithfully reflect human judgment, especially for strong frontier models, due to limited task difficulty and coarse-grained evaluation protocols. In parallel, reward models have become increasingly important for RL-based image editing optimization, yet existing reward model benchmarks still rely on unrealistic evaluation settings that deviate from practical RL scenarios. These limitations hinder reliable assessment of both image editing models and reward models. To address these challenges, we introduce Edit-Compass and EditReward-Compass, a unified evaluation suite for image editing and reward modeling. Edit-Compass contains 2,388 carefully annotated instances spanning six progressively challenging task categories, covering capabilities such as world knowledge reasoning, visual reasoning, and multi-image editing. Beyond broad task coverage, Edit-Compass adopts a fine-grained multidimensional evaluation framework based on structured reasoning and carefully designed scoring rubrics. In parallel, EditReward-Compass contains 2,251 preference pairs that simulate realistic reward modeling scenarios during RL optimization.

Preprint · 2026 · arXiv

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, and GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

Preprint · 2026 · arXiv

MathlibPR: Pull Request Merge-Readiness Benchmark for Formal Mathematical Libraries

The ecosystem of Lean and Mathlib has become the de facto standard for large language model (LLM) assisted formal reasoning, with remarkable successes in recent years. Those successes, however, consume Mathlib only as a dependency and do not contribute to it directly. Meanwhile, the growth of Mathlib has recently been bottlenecked by the review process, which requires human reviewers to judge whether proposed pull requests (PRs) follow Mathlib's conventions and are worth integrating as part of a shared mathematical infrastructure. This leads to our central question: can LLMs help review Mathlib PRs? To this end, we introduce MathlibPR, a benchmark built from real Mathlib4 PR histories. We further propose a staged evaluation protocol and use it to evaluate both LLM models (e.g., DeepSeek, Qwen, Goedel, and Kimina) and LLM agents (e.g., Codex and Claude Code). Surprisingly, both LLM models and LLM agents struggle to distinguish merge-ready PRs from build-passing PRs that were revised or never merged. By turning Mathlib PR histories into a supervised signal, MathlibPR provides a step toward reviewer assistants and reward models that could help evaluate PRs and steer LLMs toward producing merge-ready Mathlib contributions.

Preprint · 2026 · arXiv

Offline Two-Player Zero-Sum Markov Games with KL Regularization

We study the problem of learning Nash equilibria in offline two-player zero-sum Markov games. While existing approaches often rely on explicit pessimism to address distribution shift, we show that KL regularization alone suffices to stabilize learning and guarantee convergence. We first introduce Regularized Offline Sequential Equilibrium (ROSE), a theoretical framework that achieves a fast $\widetilde{\mathcal{O}}(1/n)$ convergence rate under unilateral concentrability, improving over the standard $\widetilde{\mathcal{O}}(1/\sqrt{n})$ rates in unregularized settings. We then propose Sequential Offline Self-play Mirror Descent (SOS-MD), a practical model-free algorithm based on least-squares value estimation and iterative self-play updates. We prove that the last iterate of SOS-MD attains the same $\widetilde{\mathcal{O}}(1/n)$ statistical rate up to a vanishing optimization error of order $\widetilde{\mathcal{O}}(1/\sqrt{T})$ in the number of self-play iterations $T$.
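The core claim, that KL regularization alone stabilizes self-play, can be seen in a single-state setting. The minimal sketch below assumes a normal-form zero-sum game, a uniform reference policy, and a magnetic-mirror-descent-style closed-form update. This is not SOS-MD, which operates on Markov games with least-squares value estimates from offline data. The step size `eta`, regularization strength `tau`, and function names are illustrative.

```python
import numpy as np

def kl_regularized_step(policy, q, ref, eta, tau):
    """One mirror-descent step with a KL pull toward a reference policy.

    Closed-form maximizer of <p, q> - tau * KL(p || ref) - (1 / eta) * KL(p || policy):
    log p = (log policy + eta * tau * log ref + eta * q) / (1 + eta * tau) + const.
    """
    logits = (np.log(policy) + eta * tau * np.log(ref) + eta * q) / (1.0 + eta * tau)
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()

def regularized_self_play(A, x, y, eta=0.1, tau=1.0, n_iters=2000):
    """Simultaneous KL-regularized self-play on the zero-sum game x^T A y.

    The row player maximizes x^T A y; the column player minimizes it.
    Both are pulled toward uniform reference policies.
    """
    ref_x = np.full(A.shape[0], 1.0 / A.shape[0])
    ref_y = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(n_iters):
        q_x = A @ y       # row player's expected payoff per action
        q_y = -A.T @ x    # column player's payoff per action (it minimizes x^T A y)
        x, y = (kl_regularized_step(x, q_x, ref_x, eta, tau),
                kl_regularized_step(y, q_y, ref_y, eta, tau))
    return x, y

# Rock-paper-scissors, started away from equilibrium.
rps = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
x, y = regularized_self_play(rps, np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.2, 0.6]))
print(np.round(x, 4), np.round(y, 4))
```

With a uniform reference, the KL-regularized equilibrium of rock-paper-scissors coincides with its Nash equilibrium, so the last iterates of both policies settle at (1/3, 1/3, 1/3). In general, the regularized equilibrium is biased toward the reference, and `tau` controls the size of that bias.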

Preprint · 2026 · arXiv

OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control

Transparent decision-making is essential for traffic signal control (TSC) systems to earn public trust. However, traditional reinforcement learning-based TSC methods function as black boxes with limited interpretability. Although large language models (LLMs) can provide natural language reasoning, reinforcement finetuning for TSC remains unstable because feedback is sparse and delayed, while most actions produce only marginal changes in congestion metrics. We introduce OracleTSC, which stabilizes LLM-based TSC through two mechanisms: (1) a reward hurdle mechanism that filters weak learning signals by subtracting a calibrated threshold from environmental rewards, and (2) uncertainty regularization that maximizes the probability of the selected response to encourage consistent decisions across sampled outputs. Experiments on the LibSignal benchmark show that OracleTSC enables a compact LLaMA3-8B model to substantially improve traffic efficiency, achieving a 75% reduction in travel time and a 67% decrease in queue length compared with the pretrained baseline while preserving interpretability through natural language explanations. OracleTSC also demonstrates strong cross-intersection generalization: a policy trained on one intersection transfers to a structurally different intersection with 17% lower travel time and 39% lower queue length without additional finetuning. These results suggest that uncertainty-aware reward shaping can improve the stability and effectiveness of reinforcement fine-tuning for TSC.

Preprint · 2026 · arXiv

Teacher-Guided Policy Optimization for LLM Distillation

The convergence of reinforcement learning and imitation learning has positioned Reverse KL (RKL) as a promising paradigm for on-policy LLM distillation, aiming to unify exploration with teacher supervision. However, we identify a critical limitation: when the student and teacher distributions diverge significantly, standard RKL often fails to yield meaningful improvement due to uninformative negative feedback. To address this inefficiency, we propose Teacher-Guided Policy Optimization (TGPO), an on-policy algorithm that incorporates dense directional guidance by leveraging teacher predictions conditioned on the student's rollout. Because TGPO remains on-policy, the algorithm integrates seamlessly with existing RLVR frameworks without requiring additional data annotation. Experiments on complex reasoning benchmarks demonstrate that TGPO significantly outperforms standard baselines and is robust to different teachers.
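Reverse KL distillation scores tokens that the student itself samples. The minimal sketch below works over a single next-token distribution and assumes a toy two-token vocabulary. It illustrates only the standard reverse-KL signal, not TGPO's teacher-guided correction. The per-token signal is one scalar for the sampled token. When the teacher assigns that token low probability, the signal is negative but says nothing about which other token the student should prefer.

```python
import math

def reverse_kl(student, teacher):
    """KL(student || teacher) for categorical distributions over the same vocabulary."""
    return sum(p * math.log(p / q) for p, q in zip(student, teacher) if p > 0)

def token_signal(student, teacher, token):
    """Per-token on-policy signal log pi_teacher(token) - log pi_student(token).

    Averaging it over tokens sampled from the student gives -KL(student || teacher).
    """
    return math.log(teacher[token]) - math.log(student[token])

student = [0.5, 0.5]
teacher = [0.9, 0.1]
print(reverse_kl(student, teacher))       # ln(5/3), about 0.511
print(token_signal(student, teacher, 1))  # negative: the teacher disfavors token 1
```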

Preprint · 2022 · arXiv

Generalizing to New Domains by Mapping Natural Language to Lifted LTL

Recent work on using natural language to specify commands to robots has grounded that language to LTL. However, mapping natural language task specifications to LTL task specifications using language models requires probability distributions over a finite vocabulary. Existing state-of-the-art methods have extended this finite vocabulary to include unseen terms from the input sequence to improve output generalization. However, these methods cannot generate novel out-of-vocabulary atomic propositions. To overcome this, we introduce an intermediate contextual query representation, which can be learned from single positive task specification examples and associates a contextual query with an LTL template. We demonstrate that this intermediate representation allows generalization over unseen object references, assuming accurate groundings are available. We compare our method of mapping natural language task specifications to intermediate contextual queries against state-of-the-art CopyNet models capable of translating natural language to LTL, by evaluating whether each outputs correct LTL for manipulation and navigation task specifications, and show that our method outperforms the CopyNet model on unseen object references. We demonstrate that the grounded LTL our method outputs can be used for planning in a simulated OO-MDP environment. Finally, we discuss some common failure modes encountered when translating natural language task specifications to grounded LTL.

Preprint · 2022 · arXiv

Neural Architecture Searching for Facial Attributes-based Depression Recognition

Recent studies show that depression can be partially reflected in human facial attributes. Since facial attributes have different data structures and carry different information, existing approaches fail to specifically consider the optimal way to extract depression-related features from each of them, or to investigate the best fusion strategy. In this paper, we propose to extend the Neural Architecture Search (NAS) technique to design an optimal model for depression recognition based on multiple facial attributes, which can be efficiently and robustly implemented on a small dataset. Our approach first conducts a warm-up step for the feature extractor of each facial attribute, aiming to largely reduce the search space and provide a customized architecture, where each feature extractor can be either a Convolutional Neural Network (CNN) or a Graph Neural Network (GNN). Then, we conduct an end-to-end architecture search for all feature extractors and the fusion network, allowing the complementary depression cues to be optimally combined with less redundancy. The experimental results on the AVEC 2016 dataset show that the model found by our approach achieves breakthrough performance, with 27% and 30% RMSE and MAE improvements over the existing state of the art. In light of these findings, this paper provides solid evidence and a strong baseline for applying NAS to mental health analysis based on time-series data.

Preprint · 2022 · arXiv

SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection

Domain Adaptive Object Detection (DAOD) leverages a labeled domain to learn an object detector that generalizes to a novel domain free of annotations. Recent advances align class-conditional distributions by narrowing down cross-domain prototypes (class centers). Despite their success, these methods ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching. Specifically, we design a Graph-embedded Semantic Completion module (GSC) that completes mismatched semantics by generating hallucination graph nodes in missing categories. Then, we establish cross-image graphs to model class-conditional distributions and learn a graph-guided memory bank for better semantic completion in turn. After representing the source and target data as graphs, we reformulate the adaptation as a graph matching problem, i.e., finding well-matched node pairs across graphs to reduce the domain gap, which is solved with a novel Bipartite Graph Matching adaptor (BGM). In a nutshell, we utilize graph nodes to establish semantic-aware node affinity and leverage graph edges as quadratic constraints in a structure-aware matching loss, achieving fine-grained adaptation with node-to-node graph matching. Extensive experiments verify that SIGMA significantly outperforms existing works. Our code is available at https://github.com/CityU-AIM-Group/SIGMA.

Preprint · 2022 · arXiv

Towards Robust Adaptive Object Detection under Noisy Annotations

Domain Adaptive Object Detection (DAOD) models a joint distribution of images and labels from an annotated source domain and learns a domain-invariant transformation to estimate the target labels from the given target domain images. Existing methods assume that the source domain labels are completely clean, yet large-scale datasets often contain error-prone annotations due to instance ambiguity, which may lead to a biased source distribution and, in practice, severely degrade the performance of the domain adaptive detector. In this paper, we make the first effort to formulate noisy DAOD and propose a Noise Latent Transferability Exploration (NLTE) framework to address this issue. It features 1) Potential Instance Mining (PIM), which leverages eligible proposals to recapture mis-annotated instances from the background; 2) a Morphable Graph Relation Module (MGRM), which models the adaptation feasibility and transition probability of noisy samples with relation matrices; and 3) Entropy-Aware Gradient Reconcilement (EAGR), which incorporates semantic information into the discrimination process and enforces the gradients provided by noisy and clean samples to be consistent, toward learning domain-invariant representations. A thorough evaluation on benchmark DAOD datasets with noisy source annotations validates the effectiveness of NLTE. In particular, NLTE improves the mAP by 8.4% under 60% corrupted annotations and even approaches the ideal upper bound of training on a clean source dataset.

Preprint · 2022 · arXiv

WE model: A Machine Learning Model Based on Data-Driven Movie Derivatives Market Prediction

The mature development and extension of the industry chain have diversified the income structure of the film industry. Beyond the box office, the income of the traditional film industry also includes movie merchandising, advertising, home entertainment, book sales, and more, and movie merchandising can even become more profitable than the box office. Therefore, market analysis and forecasting methods for multi-feature merchandising across multiple film types are particularly important. Traditional market research is time-consuming and labour-intensive, which restricts its practical value, so more effective predictive analysis techniques are needed. With the rapid development of machine learning and big data, many machine learning algorithms for regression and classification have been proposed and widely used in product design and industry analysis. This paper proposes the WE model, a high-precision movie merchandising prediction model based on machine learning. The model integrates three machine learning algorithms to predict the movie merchandising market, learning the relationship between the merchandising market and movie features by analyzing the main feature information of movies. In testing, the prediction accuracy for the merchandising market reaches 72.5%, and the model achieved a strong market control effect.

Preprint · 2021 · arXiv

Epitaxial growth and magnetic characterization of EuSe thin films with various crystalline orientations

We report different growth modes and the corresponding magnetic properties of thin EuSe films grown by molecular beam epitaxy on BaF2, Pb1-xEuxSe, GaAs, and Bi2Se3 substrates. We show that EuSe grows predominantly in the (001) orientation on GaAs(111) and Bi2Se3, but along the (111) crystallographic direction on BaF2(111) and Pb1-xEuxSe(111). High-resolution transmission electron microscopy measurements reveal an abrupt and highly crystalline interface for both (001) and (111) EuSe films. In agreement with previous studies, the ordered magnetic phases include antiferromagnetic, ferrimagnetic, and ferromagnetic phases. In contrast to previous studies, we found strong hysteresis for the antiferromagnetic-ferrimagnetic transition. The ability to grow epitaxial films of EuSe on Bi2Se3 and of Bi2Se3 on EuSe enables further investigation of interfacial exchange interactions between the various phases of an insulating metamagnetic material and a topological insulator.

Preprint · 2021 · arXiv

Fermi level tuning and band alignment in Mn doped InAs/GaSb

InAs/GaSb hosts a broken-gap band alignment that has been shown to generate helical topological edge states. With the introduction of Mn, the structure has been predicted to host a quantized anomalous Hall effect (QAHE). Here, we show that dilute Mn doping of the InAs layer in InAs/GaSb allows tuning of the Fermi level and introduces paramagnetism, but also has a non-trivial impact on the band alignment of the system. The measurement of Shubnikov-de Haas oscillations, cyclotron resonance, and a non-linear Hall effect in Mn-doped samples indicates the coexistence of a high-mobility two-dimensional electron gas and a hole gas. Conversely, in undoped InAs/GaSb, purely n-type transport is observed. We hypothesize that Mn acceptor levels can pin the Fermi energy near the valence band edge of InAs, far from the interface, which introduces strong band bending to preserve the band offset at the InAs/GaSb interface. The realization of the QAHE in this structure will thus require careful control of the band alignment to preserve its topological insulating character.

Preprint · 2021 · arXiv

Observation of coexisting weak localization and superconducting fluctuations in strained Sn1-xInxTe thin films

Topological superconductors have attracted tremendous excitement as they are predicted to host Majorana zero modes that can be utilized for topological quantum computing. Candidate topological superconductor Sn1-xInxTe thin films (0<x<0.3) grown by molecular beam epitaxy and strained in the (111) plane are shown to host three coexisting quantum effects: localization, antilocalization and superconducting fluctuations above the critical temperature Tc. An analysis of the normal state magnetoresistance reveals these effects. Weak localization is consistently observed in superconducting samples, indicating that superconductivity originates dominantly from trivial valence band states that may be strongly spin-orbit split. A large enhancement of the conductivity is observed above Tc, indicating that quantum coherent quasiparticle effects coexist with superconducting fluctuations. Our results motivate a re-examination of the debated pairing symmetry of this material when subjected to quantum confinement and lattice strain.

Preprint · 2020 · arXiv

A Simple Phase Retrieval Algorithm from a Single Shot Interferogram

Traditional phase-shifting interferometry cannot measure time-varying phase distributions, but single-shot techniques can address this problem, and many efforts have been made on phase retrieval from a single-shot interferogram. In this paper, a simple and effective method that requires no complex computation is presented. The interference fringe is converted to a phase distribution with a look-up table, which is then divided into different regions according to the parity of each pixel. Pixels in the same region have the same parity, which determines the wrapped phase. Additionally, the light spot displacement of a local wavefront is obtained to resolve the global sign ambiguity. Theoretical simulation results indicate that the peak-to-valley (PV) wavefront error is 0.00054λ and the rms error is 0.000125λ, much better than the results from the Fast Fourier Transform method. We also apply the method to an experimentally measured interferogram. Our algorithm is simple, highly precise, and effective for both open and closed interferometer fringes, which will be valuable for real-time monitoring of the shape of optical elements during processing.
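The look-up-table step can be sketched under the standard two-beam fringe model I = A + B cos φ, assuming the background A and modulation B are known. The model, the table size, and the function names are illustrative assumptions, not the paper's implementation. The table returns only |φ| in [0, π]. This is the sign ambiguity that the parity regions and the spot-displacement measurement are used to resolve.

```python
import numpy as np

def build_phase_lut(n_entries=4096):
    """Tabulate normalized intensity cos(phi) for phi sampled on [0, pi]."""
    phi = np.linspace(0.0, np.pi, n_entries)
    return phi, np.cos(phi)

def intensity_to_phase(intensity, background, modulation, lut):
    """Map fringe intensity I = background + modulation * cos(phi) to |phi| in [0, pi].

    The sign of phi cannot be recovered from a single intensity value.
    """
    phi, cos_phi = lut
    c = np.clip((intensity - background) / modulation, -1.0, 1.0)
    # np.interp needs increasing sample points, and cos is decreasing on [0, pi].
    return np.interp(c, cos_phi[::-1], phi[::-1])

lut = build_phase_lut()
true_phase = np.array([0.3, -0.3, 2.0])
intensity = 1.0 + 0.5 * np.cos(true_phase)
recovered = intensity_to_phase(intensity, 1.0, 0.5, lut)
print(np.round(recovered, 4))  # the sign of -0.3 is lost
```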

Preprint · 2020 · arXiv

Anomalous critical point behavior in dilute magnetic semiconductor (Ca,Na)(Zn,Mn)2Sb2

In this paper, we report the successful synthesis and magnetic properties of (Ca,Na)(Zn,Mn)2Sb2, a new ferromagnetic dilute magnetic semiconductor (DMS). In this DMS material, the concentration of magnetic moments can be controlled independently of the concentration of the electric charge carriers that mediate the magnetic interactions between these moments. This feature allows us to separately investigate the effects of carriers and of spins on the ferromagnetic properties of this new DMS alloy, particularly its critical ferromagnetic behavior. We use the modified Arrott plot technique to establish the critical exponents β, γ, and δ of this alloy. We find that at low Mn concentrations (< 10 at.%), the alloy is governed by short-range 3D-Ising behavior, with experimental values of β, γ, and δ very close to the theoretical 3D-Ising values of 0.325, 1.24, and 4.815. However, as the Mn concentration increases, this DMS material exhibits mixed-phase behavior, with γ retaining its 3D-Ising character but β crossing over to longer-range mean-field behavior.
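For reference, the modified Arrott plot is commonly based on the Arrott-Noakes equation of state, in which $T_1$ and $M_1$ are material constants. Trial values of $\beta$ and $\gamma$ are typically adjusted until the isotherms of $M^{1/\beta}$ versus $(H/M)^{1/\gamma}$ become parallel straight lines. The three exponents are linked by the Widom scaling relation, and the quoted 3D-Ising values satisfy it:

$$\left(\frac{H}{M}\right)^{1/\gamma} = \frac{T - T_C}{T_1} + \left(\frac{M}{M_1}\right)^{1/\beta}, \qquad \delta = 1 + \frac{\gamma}{\beta} = 1 + \frac{1.24}{0.325} \approx 4.815$$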

preprint2020arXiv

Deep least-squares methods: an unsupervised learning-based numerical method for solving elliptic PDEs

This paper studies an unsupervised, deep-learning-based numerical approach for solving partial differential equations (PDEs). The approach uses a deep neural network, through its compositional construction, to approximate solutions of PDEs, and it employs least-squares functionals as loss functions to determine the network's parameters. A partial differential equation admits various least-squares functionals. This paper focuses on the so-called first-order system least-squares (FOSLS) functional studied in [3], which is based on a first-order system reformulation of scalar second-order elliptic PDEs. Numerical results for second-order elliptic PDEs in one dimension are presented.
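To illustrate what a FOSLS loss measures, here is a minimal sketch under stated assumptions. It uses the model problem −u″ = f on (0, 1) with u(0) = u(1) = 0, which the abstract does not specify. It introduces the flux σ = −u′, which gives the first-order system σ + u′ = 0 and σ′ = f. The functional is the sum of the squared L² residuals of both equations. The midpoint quadrature, the finite-difference derivatives, and the boundary penalty are assumptions. In the deep least-squares method, u and σ would be neural networks trained to minimize this quantity. Here, the functional is only evaluated for hand-picked candidates.

```python
import math


def derivative(func, x, h=1e-5):
    """Central finite-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def fosls_loss(u, sigma, f, n=200, penalty=1.0):
    """Discrete FOSLS functional for -u'' = f on (0, 1), u(0) = u(1) = 0.

    First-order system: sigma + u' = 0 and sigma' = f.
    Integrals use the midpoint rule on n cells; the Dirichlet conditions are
    enforced by a squared-penalty term (an assumption for this sketch).
    """
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        r1 = sigma(x) + derivative(u, x)    # residual of sigma = -u'
        r2 = derivative(sigma, x) - f(x)    # residual of sigma' = f
        total += (r1 * r1 + r2 * r2) / n
    total += penalty * (u(0.0) ** 2 + u(1.0) ** 2)
    return total


f = lambda x: math.pi ** 2 * math.sin(math.pi * x)
u_exact = lambda x: math.sin(math.pi * x)
sigma_exact = lambda x: -math.pi * math.cos(math.pi * x)
u_wrong = lambda x: x * (1.0 - x)          # satisfies the BCs but not the PDE
sigma_wrong = lambda x: -(1.0 - 2.0 * x)   # equals -u_wrong'

print(fosls_loss(u_exact, sigma_exact, f))
print(fosls_loss(u_wrong, sigma_wrong, f))
```

The exact pair drives the functional to roughly zero. The wrong pair satisfies σ + u′ = 0 exactly, but it is still penalized through the σ′ = f residual.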

preprint2020arXiv

Resolution enhancement and realistic speckle recovery with generative adversarial modeling of micro-optical coherence tomography

A resolution enhancement technique for optical coherence tomography (OCT) based on Generative Adversarial Networks (GANs) was developed and investigated. GANs have previously been used for resolution enhancement of photographs and optical microscopy images, and we adapted and improved this technique for OCT image generation. Conditional GANs (cGANs) were trained on a novel set of ultrahigh-resolution spectral-domain OCT volumes, termed micro-OCT, as the high-resolution ground truth (~1 μm isotropic resolution). The ground truth was paired with a low-resolution image obtained by synthetically degrading the resolution 4× along either the axial or the lateral axis (1-D) or along both (2-D). Cross-sectional image (B-scan) volumes obtained from in vivo imaging of human labial (lip) tissue and mouse skin were used in separate feasibility experiments. The accuracy of resolution enhancement compared with the ground truth was quantified with human perceptual accuracy tests performed by an OCT expert. The GAN loss in the optimization objective, noise injection in both the generator and discriminator models, and multi-scale discrimination were found to be important for achieving realistic speckle appearance in the generated OCT images. The utility of high-resolution speckle recovery was illustrated by an example of micro-OCT imaging of blood vessels in lip tissue. Qualitative examples of applying the models to image data from outside the training data distribution, namely human retina and mouse bladder, were also demonstrated, suggesting potential for cross-domain transferability. This preliminary study suggests that deep learning generative models trained on OCT images from high-performance prototype systems may be able to enhance lower-resolution data from mainstream/commercial systems, thereby bringing cutting-edge technology into widespread use at low cost.
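The paired-data construction, in which a high-resolution B-scan is degraded 4× along one axis or both, can be sketched as follows. The abstract does not describe the degradation procedure. This sketch uses block averaging followed by sample replication as a simple stand-in, so each degraded image keeps the ground truth's shape for pixel-aligned pairing. A physically faithful simulation would differ. For example, axial resolution in spectral-domain OCT is set by the source bandwidth. The function names and the row/column axis convention are hypothetical.

```python
def degrade_axis(image, factor, axis):
    """Lower resolution along one axis by block averaging, then replicate each
    averaged value back so the output has the same shape as the input.

    image: list of rows (axis 0 = rows, e.g. depth; axis 1 = columns, e.g. lateral).
    Assumes the size along `axis` is divisible by `factor`.
    """
    if axis == 1:
        out = []
        for row in image:
            new_row = []
            for start in range(0, len(row), factor):
                block = row[start:start + factor]
                mean = sum(block) / factor
                new_row.extend([mean] * factor)
            out.append(new_row)
        return out
    # axis 0: transpose, degrade along what are now rows, transpose back
    transposed = [list(col) for col in zip(*image)]
    degraded = degrade_axis(transposed, factor, axis=1)
    return [list(row) for row in zip(*degraded)]


def degrade(image, factor=4, axes=(0,)):
    """1-D degradation with one axis in `axes`, 2-D degradation with both."""
    for axis in axes:
        image = degrade_axis(image, factor, axis)
    return image


ground_truth = [[r * 4 + c for c in range(4)] for r in range(4)]
low_res_1d = degrade(ground_truth, 4, axes=(1,))
low_res_2d = degrade(ground_truth, 4, axes=(0, 1))
print(low_res_1d)
print(low_res_2d)
```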

preprint2020arXiv

Review of data analysis in vision inspection of power lines with an in-depth discussion of deep learning technology

The widespread popularity of unmanned aerial vehicles enables an immense amount of power line inspection data to be collected. How to use these massive inspection data, especially visible-light images, to maintain the reliability, safety, and sustainability of power transmission is a pressing issue. To date, substantial work has been done on the analysis of power line inspection data. To provide a comprehensive overview for researchers interested in developing deep-learning-based analysis systems for power line inspection data, this paper thoroughly reviews the current literature and identifies challenges for future research. Following the typical procedure of inspection data analysis, we divide current work in this area into component detection and fault diagnosis. For each aspect, we summarize the techniques and methodologies adopted in the literature, along with useful information such as dataset descriptions and method performance. Further, we present an in-depth discussion of existing deep-learning-based analysis methods for power line inspection. Finally, we conclude with several future research trends in this area, such as data quality problems, small object detection, embedded applications, and evaluation baselines.

preprint2020arXiv

Showing Your Work Doesn't Always Work

In natural language processing, a recently popular line of work explores how best to report the experimental results of neural networks. One exemplar publication, titled "Show Your Work: Improved Reporting of Experimental Results," advocates reporting the expected validation effectiveness of the best-tuned model as a function of the computational budget. In the present work, we critically examine this paper. With respect to statistical generalizability, we find unspoken pitfalls and caveats in this approach. We show analytically that their estimator is biased and relies on error-prone assumptions. We find that the estimator favors negative errors and yields poor bootstrapped confidence intervals. We derive an unbiased alternative and support our claims with empirical evidence from statistical simulations. Our codebase is at http://github.com/castorini/meanmax.
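The contrast between the two estimators can be sketched as follows. The first function computes the expected maximum of n draws from the empirical distribution of N observed validation scores. Because this samples with replacement, a single score can be "drawn" repeatedly, which pulls the estimate down. The second function is the standard U-statistic: it averages the maximum over all size-n subsets of the observed scores, which is unbiased for the expected maximum of n i.i.d. runs. The estimator derived in the paper may differ in form. The repository at the URL above is the authoritative reference, and the function names here are hypothetical.

```python
from math import comb


def expected_max_with_replacement(scores, n):
    """Expected max of n draws from the empirical distribution of `scores`
    (sampling with replacement): sum_i v_(i) * [(i/N)^n - ((i-1)/N)^n]."""
    v = sorted(scores)
    N = len(v)
    return sum(v[i - 1] * ((i / N) ** n - ((i - 1) / N) ** n)
               for i in range(1, N + 1))


def expected_max_unbiased(scores, n):
    """U-statistic estimate of E[max of n i.i.d. runs]: the average of the
    maximum over all size-n subsets of the observed scores (1 <= n <= N)."""
    v = sorted(scores)
    N = len(v)
    if not 1 <= n <= N:
        raise ValueError("n must be between 1 and len(scores)")
    # v[i-1] is the maximum of exactly comb(i-1, n-1) size-n subsets
    return sum(v[i - 1] * comb(i - 1, n - 1) for i in range(n, N + 1)) / comb(N, n)


scores = [0.1, 0.2, 0.3, 0.4]
for n in (1, 2, 4):
    print(n, expected_max_with_replacement(scores, n), expected_max_unbiased(scores, n))
```

Both estimators reduce to the sample mean at n = 1. At n = N, the U-statistic returns the observed maximum, while the with-replacement estimate falls below it.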

preprint2020arXiv

TanhExp: A Smooth Activation Function with High Convergence Speed for Lightweight Neural Networks

Lightweight or mobile neural networks used for real-time computer vision tasks contain fewer parameters than normal networks, which limits their performance. In this work, we propose a novel activation function, the Tanh Exponential Activation Function (TanhExp), which can significantly improve the performance of these networks on image classification tasks. TanhExp is defined as f(x) = x·tanh(e^x). We demonstrate the simplicity, efficiency, and robustness of TanhExp on various datasets and network models, and TanhExp outperforms its counterparts in both convergence speed and accuracy. Its behaviour also remains stable when noise is added or the dataset is altered. We show that, without increasing the size of the network, TanhExp can enhance the capacity of lightweight neural networks with only a few training epochs and no extra parameters.
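Here is a minimal scalar sketch of the activation and its derivative. The derivative, f′(x) = tanh(eˣ) + x·eˣ·sech²(eˣ), follows from the product and chain rules and is not quoted from the paper. Clamping the exponent at 20 is an implementation choice for this sketch: it avoids overflow in `math.exp`, and tanh(e²⁰) already rounds to 1.0 in double precision. A framework implementation would operate on tensors instead.

```python
import math


def tanhexp(x):
    """TanhExp activation: f(x) = x * tanh(exp(x))."""
    # Clamp: tanh(exp(20)) == 1.0 in float64, so larger x change nothing.
    return x * math.tanh(math.exp(min(x, 20.0)))


def tanhexp_grad(x):
    """Derivative: tanh(e^x) + x * e^x * sech^2(e^x), with sech^2 = 1 - tanh^2."""
    ex = math.exp(min(x, 20.0))
    t = math.tanh(ex)
    return t + x * ex * (1.0 - t * t)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, tanhexp(x), tanhexp_grad(x))
```

The formula makes the shape easy to see. For large positive x, tanh(eˣ) approaches 1, so the function behaves like the identity. For very negative x, it decays toward 0, similar to other smooth ReLU-like activations.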

preprint2019arXiv

Voltage-induced high-speed DW motion in a synthetic antiferromagnet

Voltage-induced motion of a magnetic domain wall (DW) has potential for developing novel devices with ultralow dissipation. However, the speed of voltage-induced DW motion (VIDWM) in a single ferromagnetic layer is usually very low. In this work, we propose high-speed VIDWM in a synthetic antiferromagnet (SAF). The velocity of the coupled DWs in the SAF is significantly higher than that of their counterpart in a single ferromagnetic layer. Strong interlayer antiferromagnetic exchange coupling plays a critical role in the high DW velocity, since it inhibits tilting of the DW plane in the presence of a strong Dzyaloshinskii-Moriya interaction. Walker breakdown of the DW motion is also inhibited, because the strong interlayer antiferromagnetic coupling stabilizes the moment orientation. Theoretically, the voltage-induced gradient of magnetic anisotropy is shown to be equivalent to an effective magnetic field that drives the DW.