Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
35 works
0 followers
33 topics
4 close collaborators

Published work

35 published items

preprint 2026 arXiv

From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Large language models (LLMs) are rapidly changing how researchers in materials science and chemistry discover, organize, and act on scientific knowledge. This paper analyzes a broad set of community-developed LLM applications in an effort to identify emerging patterns in how these systems can be used across the scientific research lifecycle. We organize the projects into two complementary categories: Knowledge Infrastructure, systems that structure, retrieve, synthesize, and validate scientific information; and Action Systems, systems that execute, coordinate, or automate scientific work across computational and experimental environments. The submissions reveal a shift from single-purpose LLM tools toward integrated, multi-agent workflows that combine retrieval, reasoning, tool use, and domain-specific validation. Prominent themes include retrieval-augmented generation as grounding infrastructure, persistent structured knowledge representations, multimodal and multilingual scientific inputs, and early progress toward laboratory-integrated closed-loop systems. Together, these results suggest that LLMs are evolving from general-purpose assistants into composable infrastructure for scientific reasoning and action. This work provides a community snapshot of that transition and a practical taxonomy for understanding emerging LLM-enabled workflows in materials science and chemistry.

preprint 2023 arXiv

PixArt-$α$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions. This paper introduces PIXART-$α$, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost, as shown in Figure 1 and 2. To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. As a result, PIXART-$α$'s training speed markedly surpasses existing large-scale T2I models, e.g., PIXART-$α$ only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly \$300,000 (\$26,000 vs. \$320,000) and reducing 90% CO2 emissions. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%. Extensive experiments demonstrate that PIXART-$α$ excels in image quality, artistry, and semantic control. We hope PIXART-$α$ will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.
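
The cost figures quoted in this abstract can be checked with a few lines of arithmetic. The snippet below is only a sanity check on the numbers as stated (675 vs. 6,250 A100 GPU days, \$26,000 vs. \$320,000); the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract above.
pixart_gpu_days = 675
sd15_gpu_days = 6250
pixart_cost_usd = 26_000
sd15_cost_usd = 320_000

# Share of Stable Diffusion v1.5 training time used by PixArt-alpha.
time_fraction = pixart_gpu_days / sd15_gpu_days
# Absolute saving in dollars.
cost_saving_usd = sd15_cost_usd - pixart_cost_usd

print(f"training time: {time_fraction:.1%} of Stable Diffusion v1.5")
print(f"cost saving: ${cost_saving_usd:,}")
```

The ratio reproduces the stated 10.8%, and the difference of \$294,000 is consistent with "nearly \$300,000".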

preprint 2022 arXiv

A direct imaging method for the exterior and interior inverse scattering problems

This paper is concerned with the inverse acoustic scattering problems by an obstacle or a cavity with a sound-soft or a sound-hard boundary. A direct imaging method relying on the boundary conditions is proposed for reconstructing the shape of the obstacle or cavity. First, the scattered fields are approximated by the Fourier-Bessel functions with the measurements on a closed curve. Then, the indicator functions are established by the superpositions of the total fields or their derivatives to the incident point sources. We prove that the indicator functions vanish only on the boundary of the obstacle or cavity. Numerical examples are also included to demonstrate the effectiveness of the method.

preprint 2022 arXiv

DoF of a Cooperative X-Channel with an Application to Distributed Computing

We consider a cooperative X-channel with $\sf K$ transmitters (Txs) and $\sf K$ receivers (Rxs) where Txs and Rxs are gathered into groups of size $\sf r$ respectively. Txs belonging to the same group cooperate to jointly transmit a message to each of the $\sf K- \sf r$ Rxs in all other groups, and each Rx individually decodes all its intended messages. By introducing a new interference alignment (IA) scheme, we prove that when $\sf K/\sf r$ is an integer the sum Degrees of Freedom (SDoF) of this channel is lower bounded by $2\sf r$ if $\sf K/\sf r \in \{2,3\}$ and by $\frac{\sf K(\sf K-\sf r)-\sf r^2}{2\sf K-3\sf r}$ if $\sf K/\sf r \geq 4$. We also prove that the SDoF is upper bounded by $\frac{\sf K(\sf K-\sf r)}{2\sf K-3\sf r}$. The proposed IA scheme finds application in a wireless distributed MapReduce framework, where it improves the normalized data delivery time (NDT) compared to the state of the art.
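
The two bounds in this abstract are easy to evaluate for concrete values. The sketch below simply encodes the stated formulas with exact rational arithmetic; the function name `sdof_bounds` is hypothetical, and restricting the upper bound to integer $\sf K/\sf r \geq 2$ is an assumption made here for simplicity, not a claim about the paper.

```python
from fractions import Fraction

def sdof_bounds(K: int, r: int) -> tuple[Fraction, Fraction]:
    """Return (lower, upper) bounds on the sum DoF as stated in the abstract."""
    # Assumption of this sketch: only handle the case where K/r is an integer >= 2.
    if r <= 0 or K % r != 0 or K // r < 2:
        raise ValueError("the stated bounds need K/r to be an integer >= 2")
    groups = K // r
    upper = Fraction(K * (K - r), 2 * K - 3 * r)
    if groups in (2, 3):
        lower = Fraction(2 * r)
    else:
        lower = Fraction(K * (K - r) - r * r, 2 * K - 3 * r)
    return lower, upper

for K, r in [(2, 1), (6, 2), (4, 1), (8, 2)]:
    lower, upper = sdof_bounds(K, r)
    print(f"K={K}, r={r}: {lower} <= SDoF <= {upper}")
```

Substituting $\sf K = 2\sf r$ or $\sf K = 3\sf r$ into the upper bound gives $2\sf r$, so the two bounds coincide in those cases; for $\sf K/\sf r \geq 4$ they differ by $\frac{\sf r^2}{2\sf K-3\sf r}$.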

preprint 2022 arXiv

Erasure conversion for fault-tolerant quantum computing in alkaline earth Rydberg atom arrays

Executing quantum algorithms on error-corrected logical qubits is a critical step for scalable quantum computing, but the requisite numbers of qubits and physical error rates are demanding for current experimental hardware. Recently, the development of error correcting codes tailored to particular physical noise models has helped relax these requirements. In this work, we propose a qubit encoding and gate protocol for ${}^{171}$Yb neutral atom qubits that converts the dominant physical errors into erasures, that is, errors in known locations. The key idea is to encode qubits in a metastable electronic level, such that gate errors predominantly result in transitions to disjoint subspaces whose populations can be continuously monitored via fluorescence. We estimate that 98% of errors can be converted into erasures. We quantify the benefit of this approach via circuit-level simulations of the surface code, finding a threshold increase from 0.937% to 4.15%. We also observe a larger code distance near the threshold, leading to a faster decrease in the logical error rate for the same number of physical qubits, which is important for near-term implementations. Erasure conversion should benefit any error correcting code, and may also be applied to design new gates and encodings in other qubit platforms.

preprint 2022 arXiv

Game Theoretic Consequences of Resident Matching

The resident matching algorithm, Gale-Shapley, currently used by SF Match and the National Residency Match Program (NRMP), has been in use for over 50 years without fundamental alteration. The algorithm is a 'stable-marriage' method that favors applicant outcomes. However, in these 50 years, there has been a big shift in the supply and demand of applicants and programs. These changes along with the way the Match is implemented have induced a costly race among applicants to apply and interview at as many programs as possible. Meanwhile programs also incur high costs as they maximize their probability of matching by interviewing as many candidates as possible.
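
For readers unfamiliar with the algorithm named in this abstract, here is a minimal textbook sketch of applicant-proposing deferred acceptance (Gale-Shapley). It assumes one seat per program and complete preference lists, which is a simplification for illustration and not a description of how the Match is actually implemented; all names in the example are made up.

```python
def gale_shapley(applicant_prefs, program_prefs):
    """Applicant-proposing deferred acceptance with one seat per program.

    Both arguments map a name to a full preference list, most preferred first.
    Returns a dict mapping each matched program to its applicant.
    """
    # rank[p][a] = position of applicant a in program p's list (lower is better)
    rank = {p: {a: i for i, a in enumerate(prefs)}
            for p, prefs in program_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}  # index of next program to try
    held = {}                                      # program -> tentatively held applicant
    free = list(applicant_prefs)

    while free:
        a = free.pop()
        if next_choice[a] >= len(applicant_prefs[a]):
            continue                               # a has exhausted their list
        p = applicant_prefs[a][next_choice[a]]
        next_choice[a] += 1
        current = held.get(p)
        if current is None:
            held[p] = a                            # seat was empty
        elif rank[p][a] < rank[p][current]:
            held[p] = a                            # program prefers the new applicant
            free.append(current)                   # displaced applicant proposes again
        else:
            free.append(a)                         # rejected; try the next program
    return held

# Hypothetical two-applicant, two-program example.
applicants = {"ann": ["mercy", "city"], "bob": ["mercy", "city"]}
programs = {"mercy": ["bob", "ann"], "city": ["ann", "bob"]}
match = gale_shapley(applicants, programs)
print(match)
```

In this toy instance both applicants rank the same program first, so one of them is rejected and falls back to a second choice.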

preprint 2022 arXiv

Geometric Policy Iteration for Markov Decision Processes

Recently discovered polyhedral structures of the value function for finite state-action discounted Markov decision processes (MDP) shed light on understanding the success of reinforcement learning. We investigate the value function polytope in greater detail and characterize the polytope boundary using a hyperplane arrangement. We further show that the value space is a union of finitely many cells of the same hyperplane arrangement and relate it to the polytope of the classical linear programming formulation for MDPs. Inspired by these geometric properties, we propose a new algorithm, Geometric Policy Iteration (GPI), to solve discounted MDPs. GPI updates the policy of a single state by switching to an action that is mapped to the boundary of the value function polytope, followed by an immediate update of the value function. This new update rule aims at a faster value improvement without compromising computational efficiency. Moreover, our algorithm allows asynchronous updates of state values which is more flexible and advantageous compared to traditional policy iteration when the state set is large. We prove that the complexity of GPI achieves the best known bound $\mathcal{O}\left(\frac{|\mathcal{A}|}{1 - γ}\log \frac{1}{1-γ}\right)$ of policy iteration and empirically demonstrate the strength of GPI on MDPs of various sizes.

preprint 2022 arXiv

Lesion Localization in OCT by Semi-Supervised Object Detection

Over 300 million people worldwide are affected by various retinal diseases. By noninvasive Optical Coherence Tomography (OCT) scans, a number of abnormal structural changes in the retina, namely retinal lesions, can be identified. Automated lesion localization in OCT is thus important for detecting retinal diseases at their early stage. To conquer the lack of manual annotation for deep supervised learning, this paper presents a first study on utilizing semi-supervised object detection (SSOD) for lesion localization in OCT images. To that end, we develop a taxonomy to provide a unified and structured viewpoint of the current SSOD methods, and consequently identify key modules in these methods. To evaluate the influence of these modules in the new task, we build OCT-SS, a new dataset consisting of over 1k expert-labeled OCT B-scan images and over 13k unlabeled B-scans. Extensive experiments on OCT-SS identify Unbiased Teacher (UnT) as the best current SSOD method for lesion localization. Moreover, we improve over this strong baseline, with mAP increased from 49.34 to 50.86.

preprint 2022 arXiv

Multi-view Point Cloud Registration based on Evolutionary Multitasking with Bi-Channel Knowledge Sharing Mechanism

Multi-view point cloud registration is fundamental in 3D reconstruction. Since there are close connections between point clouds captured from different viewpoints, registration performance can be enhanced if these connections are harnessed properly. Therefore, this paper models the registration problem as multi-task optimization, and proposes a novel bi-channel knowledge sharing mechanism for effective and efficient problem solving. The modeling of multi-view point cloud registration as multi-task optimization is twofold. By simultaneously considering the local accuracy of two point clouds as well as the global consistency posed by all the point clouds involved, a fitness function with an adaptive threshold is derived. Also, a framework of the co-evolutionary search process is defined for the concurrent optimization of multiple fitness functions belonging to related tasks. The proposed bi-channel knowledge sharing mechanism is used to enhance solution quality and convergence speed. The intra-task knowledge sharing introduces aiding tasks that are much simpler to solve, and useful information is shared across aiding tasks and the original tasks, accelerating the search process. The inter-task knowledge sharing explores commonalities buried among the original tasks, aiming to prevent tasks from getting stuck in local optima. Comprehensive experiments conducted on model object as well as scene point clouds show the efficacy of the proposed method.

preprint 2022 arXiv

Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state. We propose a new algorithm UCRL2-VTR, which can be seen as an extension of the UCRL2 algorithm with linear function approximation. We show that UCRL2-VTR with Bernstein-type bonus can achieve a regret of $\tilde{O}(d\sqrt{DT})$, where $d$ is the dimension of the feature mapping, $T$ is the horizon, and $\sqrt{D}$ is the diameter of the MDP. We also prove a matching lower bound $\tilde{Ω}(d\sqrt{DT})$, which suggests that the proposed UCRL2-VTR is minimax optimal up to logarithmic factors. To the best of our knowledge, our algorithm is the first nearly minimax optimal RL algorithm with function approximation in the infinite-horizon average-reward setting.

preprint 2022 arXiv

Optimizing Video Prediction via Video Frame Interpolation

Video prediction is an extrapolation task that predicts future frames given past frames, and video frame interpolation is an interpolation task that estimates intermediate frames between two frames. We have witnessed the tremendous advancement of video frame interpolation, but the general video prediction in the wild is still an open question. Inspired by the photo-realistic results of video frame interpolation, we present a new optimization framework for video prediction via video frame interpolation, in which we solve an extrapolation problem based on an interpolation model. Our video prediction framework is based on optimization with a pretrained differentiable video frame interpolation module without the need for a training dataset, and thus there is no domain gap issue between training and test data. Also, our approach does not need any additional information such as semantic or instance maps, which makes our framework applicable to any video. Extensive experiments on the Cityscapes, KITTI, DAVIS, Middlebury, and Vimeo90K datasets show that our video prediction results are robust in general scenarios, and our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.

preprint 2022 arXiv

Real-time Online Multi-Object Tracking in Compressed Domain

Recent online Multi-Object Tracking (MOT) methods have achieved desirable tracking performance. However, the tracking speed of most existing methods is rather slow. Inspired by the fact that adjacent frames are highly relevant and redundant, we divide the frames into key and non-key frames and track objects in the compressed domain. For the key frames, the RGB images are restored for detection and data association. To make data association more reliable, an appearance Convolutional Neural Network (CNN) which can be jointly trained with the detector is proposed. For the non-key frames, the objects are directly propagated by a tracking CNN based on the motion information provided in the compressed domain. Compared with the state-of-the-art online MOT methods, our tracker is about 6x faster while maintaining a comparable tracking performance.

preprint 2022 arXiv

Riesz transform associated with the fractional Fourier transform and applications in image edge detection

The fractional Hilbert transform was introduced by Zayed [30, Zayed, 1998] and has been widely used in signal processing. In view of its connection with the fractional Fourier transform, Chen, the first, second and fourth authors of this paper in [6, Chen et al., 2021] studied the fractional Hilbert transform and other fractional multiplier operators on the real line. The present paper is concerned with a natural extension of the fractional Hilbert transform to higher dimensions: this extension is the fractional Riesz transform which is defined by multiplication with a suitable chirp function on the fractional Fourier transform side. In addition to a thorough study of the fractional Riesz transforms, in this work we also investigate the boundedness of singular integral operators with chirp functions on rotation invariant spaces, chirp Hardy spaces and their relation to chirp BMO spaces, as well as applications of the theory of fractional multipliers in partial differential equations. Through numerical simulation, we provide physical and geometric interpretations of high-dimensional fractional multipliers. Finally, we present an application of the fractional Riesz transforms in edge detection which verifies a hypothesis insinuated in [26, Xu et al., 2016]. In fact our numerical implementation confirms that amplitude, phase, and direction information can be simultaneously extracted by controlling the order of the fractional Riesz transform.

preprint 2022 arXiv

Stellar Atmospheric Parameters of M-type Stars from LAMOST DR8

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) Low Resolution Spectroscopic Survey (LRS) provides massive spectroscopic data of M-type stars, and the derived stellar parameters could bring vital help to various studies. We adopt the ULySS package to perform $χ^2$ minimization with model spectra generated from the MILES interpolator, and determine the stellar atmospheric parameters for the M-type stars from LAMOST LRS Data Release (DR) 8. Comparison with the stellar parameters from APOGEE Stellar Parameter and Chemical Abundance Pipeline (ASPCAP) suggests that most of our results have good consistency. For M dwarfs, we achieve dispersions better than 74 K, 0.19 dex and 0.16 dex for $T_{\rm eff}$, $\log{g}$ and [Fe/H], while for M giants, the internal uncertainties are 58 K, 0.32 dex and 0.26 dex, respectively. Compared to ASPCAP we also find a systematic underestimation of $Δ{T_{\rm eff}} =$ $-$176 K for M dwarfs, and a systematic overestimation of $Δ{\log{g}} =$ 0.30 dex for M giants. However, such differences are less significant when we make comparison with common stars from other literature, which indicates that systematic biases exist in the difference of ASPCAP and other measurements. A catalog of 763,136 spectra corresponding to 616,314 M-type stars with derived stellar parameters is presented. We determine the stellar parameters for stars with $T_{\rm eff}$ higher than 2,900 K, with $\log{g}$ from -0.24 dex to 5.9 dex. The typical precisions are 45 K, 0.25 dex and 0.22 dex, for $T_{\rm eff}$, $\log{g}$ and [Fe/H], respectively, which are estimated from the duplicate observations of the same stars.

preprint 2022 arXiv

The backward Euler-Maruyama method for invariant measures of stochastic differential equations with super-linear coefficients

The backward Euler-Maruyama (BEM) method is employed to approximate the invariant measure of stochastic differential equations, where both the drift and the diffusion coefficient are allowed to grow super-linearly. The existence and uniqueness of the invariant measure of the numerical solution generated by the BEM method are proved and the convergence of the numerical invariant measure to the underlying one is shown. Simulations are provided to illustrate the theoretical results and demonstrate the application of our results in the area of system control.

preprint 2022 arXiv

The China Trade Shock and the ESG Performances of US firms

How does import competition from China affect engagement on ESG initiatives by US corporates? On the one hand, reduced profitability due to import competition and lagging ESG performance of Chinese exporters can disincentivize US firms from putting more resources into ESG initiatives. On the other hand, the shift from labor-intensive production to capital/technology-intensive production along with offshoring may improve the US company's ESG performance. Moreover, US companies have incentives to actively pursue more ESG engagement to differentiate themselves from Chinese imports. Exploiting a trade policy in which the US Congress granted China Permanent Normal Trade Relations and the resulting change in expected tariff rates on Chinese imports, we find that greater import competition from China leads to an increase in the US company's ESG performance. The improvement primarily stems from "doing more positives" and from more involvement in environmental initiatives. Indirect and direct evidence shows that the improvement is not driven by the change in production process or offshoring, but is consistent with product differentiation. Our results suggest that the trade shock from China has a significant impact on the US company's ESG performance.

preprint 2022 arXiv

The Galerkin analysis for the random periodic solution of semilinear stochastic evolution equations

In this paper we study the numerical method for approximating the random periodic solution of semilinear stochastic evolution equations. The main challenge lies in proving a convergence over an infinite time horizon while simulating infinite-dimensional objects. We first show the existence and uniqueness of the random periodic solution to the equation as the limit of the pull-back flows of the equation, and observe that its mild form is well-defined in the intersection of a family of decreasing Hilbert spaces. Then we propose a Galerkin-type exponential integrator scheme and establish its convergence rate of the strong error to the mild solution, where the order of convergence directly depends on the space (among the family of Hilbert spaces) in which the initial point lives. We finally conclude with the best order of convergence that is arbitrarily close to 0.5.

preprint 2022 arXiv

Towards Understanding Mixture of Experts in Deep Learning

The Mixture-of-Experts (MoE) layer, a sparsely-activated model controlled by a router, has achieved great success in deep learning. However, the understanding of such architecture remains elusive. In this paper, we formally study how the MoE layer improves the performance of neural network learning and why the mixture model will not collapse into a single model. Our empirical results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE. To further understand this, we consider a challenging classification problem with intrinsic cluster structures, which is hard to learn using a single expert. Yet with the MoE layer, by choosing the experts as two-layer nonlinear convolutional neural networks (CNNs), we show that the problem can be learned successfully. Furthermore, our theory shows that the router can learn the cluster-center features, which helps divide the input complex problem into simpler linear classification sub-problems that individual experts can conquer. To our knowledge, this is the first result towards formally understanding the mechanism of the MoE layer for deep learning.

preprint 2022 arXiv

Turning Mathematics Problems into Games: Reinforcement Learning and Gröbner bases together solve Integer Feasibility Problems

Can agents be trained to answer difficult mathematical questions by playing a game? We consider the integer feasibility problem, a challenge of deciding whether a system of linear equations and inequalities has a solution with integer values. This is a famous NP-complete problem with applications in many areas of Mathematics and Computer Science. Our paper describes a novel algebraic reinforcement learning framework that allows an agent to play a game equivalent to the integer feasibility problem. We explain how to transform the integer feasibility problem into a game over a set of arrays with fixed margin sums. The game starts with an initial state (an array), and by applying a legal move that leaves the margins unchanged, we aim to eventually reach a winning state with zeros in specific positions. To win the game the player must find a path between the initial state and a final terminal winning state if one exists. Finding such a winning state is equivalent to solving the integer feasibility problem. The key algebraic ingredient is a Gröbner basis of the toric ideal for the underlying axial transportation polyhedron. The Gröbner basis can be seen as a set of connecting moves (actions) of the game. We then propose a novel RL approach that trains an agent to predict moves in continuous space to cope with the large size of action space. The continuous move is then projected onto the set of legal moves so that the path always leads to valid states. As a proof of concept we demonstrate in experiments that our agent can play well the simplest version of our game for 2-way tables. Our work highlights the potential to train agents to solve non-trivial mathematical queries through contemporary machine learning methods used to train agents to play games.
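
The game described in this abstract is played on arrays whose margin sums never change. As a rough illustration for 2-way tables, the sketch below uses the familiar ±1 "swap" moves on a 2x2 sub-block, which leave every row and column sum unchanged. Treating these swaps as the legal moves is an assumption made here for illustration; the helper names `margins` and `apply_move`, and the example table, are hypothetical and not taken from the paper.

```python
def margins(table):
    """Row sums and column sums of a 2-way table given as a list of rows."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols

def apply_move(table, i, j, k, l):
    """Add +1 at (i, j) and (k, l), and -1 at (i, l) and (k, j).

    Returns the new table, or None if the move would create a negative
    entry (treated as an illegal state in this sketch).
    """
    if i == k or j == l:
        raise ValueError("a basic move needs two distinct rows and columns")
    new = [list(row) for row in table]
    new[i][j] += 1
    new[k][l] += 1
    new[i][l] -= 1
    new[k][j] -= 1
    if any(x < 0 for row in new for x in row):
        return None
    return new

start = [[1, 2], [3, 1]]
after = apply_move(start, 0, 1, 1, 0)   # pushes the diagonal entries to zero
print(after, margins(after) == margins(start))
```

Here a single move reaches a table with zeros on the diagonal while the margins stay fixed, which is the kind of "winning state with zeros in specific positions" the abstract refers to.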

preprint 2021 arXiv

Modelling Paralinguistic Properties in Conversational Speech to Detect Bipolar Disorder and Borderline Personality Disorder

Bipolar disorder (BD) and borderline personality disorder (BPD) are two chronic mental health conditions that clinicians find challenging to distinguish based on clinical interviews, due to their overlapping symptoms. In this work, we investigate the automatic detection of these two conditions by modelling both verbal and non-verbal cues in a set of interviews. We propose a new approach of modelling short-term features with visibility-signature transform, and compare it with widely used high-level statistical functions. We demonstrate the superior performance of our proposed signature-based model. Furthermore, we show the role of different sets of features in characterising BD and BPD.

preprint 2021 arXiv

Towards fast weak adversarial training to solve high dimensional parabolic partial differential equations using XNODE-WAN

Due to the curse of dimensionality, solving high dimensional parabolic partial differential equations (PDEs) has been a challenging problem for decades. Recently, a weak adversarial network (WAN) proposed in (Y.Zang et al., 2020) offered a flexible and computationally efficient approach to tackle this problem defined on arbitrary domains by leveraging the weak solution. WAN reformulates the PDE problem as a generative adversarial network, where the weak solution (primal network) and the test function (adversarial network) are parameterized by the multi-layer deep neural networks (DNNs). However, it is not yet clear whether DNNs are the most effective model for the parabolic PDE solutions as they do not take into account the fundamentally different roles played by time and spatial variables in the solution. To reinforce the difference, we design a novel so-called XNODE model for the primal network, which is built on the neural ODE (NODE) model with additional spatial dependency to incorporate the a priori information of the PDEs and serve as a universal and effective approximation to the solution. The proposed hybrid method (XNODE-WAN), by integrating the XNODE model within the WAN framework, leads to significant improvement in the performance and efficiency of training. Numerical results show that our method can reduce the training time to a fraction of that of the WAN model.

preprint 2020 arXiv

A Multilevel Monte Carlo Estimator for Matrix Multiplication

Inspired by the latest developments in multilevel Monte Carlo (MLMC) methods and randomised sketching for linear algebra problems we propose a MLMC estimator for real-time processing of matrix structured random data. Our algorithm is particularly effective in handling high-dimensional inner products and matrix multiplication, in applications of image analysis and large-scale supervised learning.
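
To give a feel for the randomised-sketching idea this abstract builds on, the following is a plain single-level Monte Carlo estimator of a matrix product, obtained by sampling outer products of columns of A with rows of B. It is not the multilevel estimator proposed in the paper; uniform sampling and the function name `sampled_matmul` are assumptions made for this sketch.

```python
import random

def sampled_matmul(A, B, num_samples, rng=None):
    """Unbiased Monte Carlo estimate of A @ B from sampled outer products.

    A is m x n and B is n x p, both given as lists of rows. Index k is drawn
    uniformly from range(n), and each sampled outer product A[:, k] B[k, :]
    is rescaled by n so that the average has expectation A @ B.
    """
    rng = rng or random.Random()
    m, n, p = len(A), len(B), len(B[0])
    est = [[0.0] * p for _ in range(m)]
    for _ in range(num_samples):
        k = rng.randrange(n)
        for i in range(m):
            for j in range(p):
                est[i][j] += n * A[i][k] * B[k][j] / num_samples
    return est

A = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
B = [[1.0, 2.0], [0.0, 1.0], [4.0, 0.0]]
print(sampled_matmul(A, B, num_samples=2000, rng=random.Random(0)))
```

The estimate fluctuates around the exact product and tightens as `num_samples` grows; a multilevel scheme would combine several such estimators at different accuracy levels to reduce the overall cost.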

preprint 2020 arXiv

Cross-modality Person re-identification with Shared-Specific Feature Transfer

Cross-modality person re-identification (cm-ReID) is a challenging but key technology for intelligent video analysis. Existing works mainly focus on learning common representation by embedding different modalities into the same feature space. However, only learning the common characteristics means great information loss, lowering the upper bound of feature distinctiveness. In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the re-identification performance. We model the affinities of different modality samples according to the shared features and then transfer both shared and specific features among and across modalities. We also propose a complementary feature learning strategy including modality adaption, project adversarial learning and reconstruction enhancement to learn discriminative and complementary shared and specific features of each modality, respectively. The entire cm-SSFT algorithm can be trained in an end-to-end manner. We conducted comprehensive experiments to validate the superiority of the overall algorithm and the effectiveness of each component. The proposed algorithm significantly outperforms state-of-the-art methods by 22.5% and 19.3% mAP on the two mainstream benchmark datasets SYSU-MM01 and RegDB, respectively.

preprint 2020 arXiv

Deriving information from missing data: implications for mood prediction

The availability of mobile technologies has enabled the efficient collection of prospective longitudinal, ecologically valid self-reported mood data from psychiatric patients. These data streams have potential for improving the efficiency and accuracy of psychiatric diagnosis as well as predicting future mood states, enabling earlier intervention. However, missing responses are common in such datasets and there is little consensus as to how this should be dealt with in practice. A signature-based method was used to capture different elements of self-reported mood alongside missing data to both classify diagnostic group and predict future mood in patients with bipolar disorder, borderline personality disorder and healthy controls. The missing-response-incorporated signature-based method achieves roughly 66% correct diagnosis, with F1 scores for the three clinical groups of 59% (bipolar disorder), 75% (healthy control) and 61% (borderline personality disorder), respectively. This was significantly more efficient than the naive model which excluded missing data. Accuracies of predicting subsequent mood states and scores were also improved by inclusion of missing responses. The signature method provided an effective approach to the analysis of prospectively collected mood data where missing data was common and should be considered as an approach in other similar datasets.

preprint 2020 arXiv

Fractional Fourier transforms on $L^p$ and applications

This paper is devoted to the $L^p(\mathbb R)$ theory of the fractional Fourier transform (FRFT) for $1\le p < 2$. In view of the special structure of the FRFT, we study FRFT properties of $L^1$ functions, via the introduction of a suitable chirp operator. However, in the $L^1(\mathbb{R})$ setting, problems of convergence arise even when basic manipulations of functions are performed. We overcome such issues and study the FRFT inversion problem via approximation by suitable means, such as the fractional Gauss and Abel means. We also obtain the regularity of fractional convolution and results on pointwise convergence of FRFT means. Finally we discuss $L^p$ multiplier results and a Littlewood-Paley theorem associated with FRFT.

preprint2020arXiv

Future Video Synthesis with Object Motion Prediction

We present an approach to predict future video frames given a sequence of continuous video frames in the past. Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics by decoupling the background scene and moving objects. The future appearance of the scene components is predicted by non-rigid deformation of the background and affine transformation of the moving objects. The anticipated appearances are combined to create a plausible future video. With this procedure, our method exhibits far fewer tearing or distortion artifacts than other approaches. Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state-of-the-art in terms of visual quality and accuracy.

preprint2020arXiv

H-VGRAE: A Hierarchical Stochastic Spatial-Temporal Embedding Method for Robust Anomaly Detection in Dynamic Networks

Detecting anomalous edges and nodes in dynamic networks is critical in various areas, such as social media and computer networks. Recent approaches leverage network embedding techniques to learn how to generate node representations for normal training samples and to detect anomalies that deviate from normal patterns. However, most existing network embedding approaches learn deterministic node representations, which are sensitive to fluctuations of the topology and attributes due to the high flexibility and stochasticity of dynamic networks. In this paper, a stochastic neural network named Hierarchical Variational Graph Recurrent Autoencoder (H-VGRAE) is proposed to detect anomalies in dynamic networks using learned robust node representations in the form of random variables. H-VGRAE is a semi-supervised model that captures normal patterns in the training set by maximizing the likelihood of the adjacency matrix and node attributes via variational inference. Compared with existing methods, H-VGRAE has three main advantages: 1) H-VGRAE learns robust node representations through stochasticity modeling and the extraction of multi-scale spatial-temporal features; 2) H-VGRAE can be extended to a deeper structure as the scale of the dynamic network increases; 3) anomalous edges and nodes can be located and interpreted from a probabilistic perspective. Extensive experiments on four real-world datasets demonstrate that H-VGRAE outperforms state-of-the-art competitors on anomaly detection in dynamic networks.

preprint2020arXiv

Rethinking Classification and Localization for Object Detection

Two head structures (i.e., fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks. However, there is a lack of understanding of how these two head structures work for these two tasks. To address this issue, we perform a thorough analysis and find an interesting fact: the two head structures have opposite preferences towards the two tasks. Specifically, the fully connected head (fc-head) is more suitable for the classification task, while the convolution head (conv-head) is more suitable for the localization task. Furthermore, we examine the output feature maps of both heads and find that fc-head has more spatial sensitivity than conv-head. Thus, fc-head is better able to distinguish a complete object from part of an object, but is not robust when regressing the whole object. Based upon these findings, we propose a Double-Head method, which has a fully connected head focusing on classification and a convolution head for bounding box regression. Without bells and whistles, our method gains +3.5 and +2.8 AP on the MS COCO dataset over Feature Pyramid Network (FPN) baselines with ResNet-50 and ResNet-101 backbones, respectively.

preprint2020arXiv

Semi-implicit Taylor schemes for stiff rough differential equations

We study a class of semi-implicit Taylor-type numerical methods that are easy to implement and designed to solve multidimensional stochastic differential equations driven by a general rough noise, e.g. a fractional Brownian motion. In the multiplicative noise case, the equation is understood as a rough differential equation in the sense of T. Lyons. We focus on equations for which the drift coefficient may be unbounded and satisfies a one-sided Lipschitz condition only. We prove well-posedness of the methods, provide a full analysis, and deduce their convergence rate. Numerical experiments show that our schemes are particularly useful in the case of stiff rough stochastic differential equations driven by a fractional Brownian motion.

preprint2020arXiv

Stellar Spectral Interpolation using Machine Learning

Theoretical stellar spectra rely on model stellar atmospheres computed based on our understanding of the physical laws at play in the stellar interiors. These models, coupled with atomic and molecular line databases, are used to generate theoretical stellar spectral libraries (SSLs) comprising stellar spectra over a regular grid of atmospheric parameters (temperature, surface gravity, abundances) at any desired resolution. Another class of SSLs is referred to as empirical spectral libraries; these contain observed spectra at limited resolution. SSLs play an essential role in deriving the properties of stars and stellar populations. Both theoretical and empirical libraries suffer from limited coverage of the parameter space. This limitation is overcome to some extent by generating spectra for specific sets of atmospheric parameters by interpolating within the grid of available parameter space. In this work, we present a method for spectral interpolation in the optical region using machine learning algorithms that are generic, easily adaptable to any SSL without much change in the model parameters, and computationally inexpensive. We use two machine learning techniques, Random Forest (RF) and Artificial Neural Networks (ANN), and train the models on the MILES library. We apply the trained models to spectra from the CFLIB for testing and show that the performance of the two models is comparable. We show that both models achieve better accuracy than the existing methods of polynomial-based interpolation and Gaussian radial basis function (RBF) interpolation.

preprint2020arXiv

Transferring Inter-Class Correlation

The Teacher-Student (T-S) framework is widely used in classification tasks, where the performance of one neural network (the student) can be improved by transferring knowledge from another trained neural network (the teacher). Since the transferred knowledge depends on the capacities and structures of the teacher and student networks, how to define efficient knowledge remains an open question. To address this issue, we design a novel form of transferable knowledge, the Self-Attention based Inter-Class Correlation (ICC) map in the output layer, and propose our T-S framework, Inter-Class Correlation Transfer (ICCT).

preprint2019arXiv

A Sketched Finite Element Method for Elliptic Models

We consider a sketched implementation of the finite element method for elliptic partial differential equations on high-dimensional models. Motivated by applications in real-time simulation and prediction, we propose an algorithm that involves projecting the finite element solution onto a low-dimensional subspace and sketching the reduced equations using randomised sampling. We show that a sampling distribution based on the leverage scores of a tall matrix associated with the discrete Laplacian operator can achieve nearly optimal performance and a significant speedup. We derive an expression for the complexity of the algorithm in terms of the number of samples that are necessary to meet an error tolerance specification with high probability, and an upper bound for the distance between the sketched and the high-dimensional solutions. Our analysis shows that the projection not only reduces the dimension of the problem but also regularises the reduced system against sketching error. Our numerical simulations suggest speed improvements of two orders of magnitude in exchange for a small loss in the accuracy of the prediction.

preprint2019arXiv

Distributed Learning of Decentralized Control Policies for Articulated Mobile Robots

State-of-the-art distributed algorithms for reinforcement learning rely on multiple independent agents, which simultaneously learn in parallel environments while asynchronously updating a common, shared policy. Moreover, decentralized control architectures (e.g., CPGs) can coordinate spatially distributed portions of an articulated robot to achieve system-level objectives. In this work, we investigate the relationship between distributed learning and decentralized control by learning decentralized control policies for the locomotion of articulated robots in challenging environments. To this end, we present an approach that leverages the structure of the asynchronous advantage actor-critic (A3C) algorithm to provide a natural means of learning decentralized control policies on a single articulated robot. Our primary contribution shows that individual agents in the A3C algorithm can be defined by independently controlled portions of the robot's body, thus enabling distributed learning on a single robot for efficient hardware implementation. We present results of closed-loop locomotion in unstructured terrains on a snake and a hexapod robot, using decentralized controllers learned offline and online respectively. Preprint of the paper submitted to the IEEE Transactions on Robotics (T-RO) journal in October 2018, and accepted for publication as a regular paper in May 2019.

preprint2019arXiv

InversionNet: A Real-Time and Accurate Full Waveform Inversion with CNNs and continuous CRFs

Full-waveform inversion problems are usually formulated as optimization problems, where the forward-wave propagation operator $f$ maps the subsurface velocity structures to seismic signals. The existing computational methods for solving full-waveform inversion are not only computationally expensive, but also yield low-resolution results because of the ill-posedness and cycle-skipping issues of full-waveform inversion. To resolve those issues, we employ machine-learning techniques to solve the full-waveform inversion. Specifically, we focus on applying the convolutional neural network (CNN) to directly derive the inversion operator $f^{-1}$ so that the velocity structure can be obtained without knowing the forward operator $f$. We build a convolutional neural network with an encoder-decoder structure to model the correspondence from seismic data to subsurface velocity structures. Furthermore, we employ the conditional random field (CRF) on top of the CNN to generate structural predictions by modeling the interactions between different locations on the velocity model. Our numerical examples using synthetic seismic reflection data show that the proposed CNN-CRF model significantly improves the accuracy of the velocity inversion while reducing the computational time.