Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
41 works · 0 followers · 20 topics · 4 close collaborators

Published work

41 published items

preprint · 2026 · arXiv

A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification

Multispectral (MS) and panchromatic (PAN) images describe the same land surface, so besides their own respective advantages, they also share a lot of similar information. To separate this shared information from their respective advantages and to reduce feature redundancy in the fusion stage, this paper introduces a diff-attention aware state space fusion model (DAS2F-Model) for multimodal remote sensing image classification. Based on the selective state space model, a cross-modal diff-attention module (CMDA-Module) is designed to extract and separate the common features and the respective dominant features of MS and PAN images. Within this module, a space-preserving visual Mamba (SPVM) retains image spatial features and captures local features by optimizing the input to visual Mamba. Because the features entering the fusion stage differ greatly in semantics after feature separation, and simple fusion operations struggle to integrate such significantly different features effectively, an attention-aware linear fusion module (AALF-Module) is proposed. It performs pixel-wise linear fusion by calculating influence coefficients, which allows features with large semantic differences to be fused while keeping the feature size unchanged. Empirical evaluations indicate that the presented method achieves better results than alternative approaches. The relevant code can be found at: https://github.com/AVKSKVL/DAS-F-Model
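The pixel-wise linear fusion step can be pictured with a small sketch. It assumes the influence coefficient at each pixel is a sigmoid of a weighted sum of the two feature values; the parameters `w_ms`, `w_pan` and `bias` are hypothetical scalars standing in for whatever learned attention the AALF-Module actually uses, and nested lists stand in for feature tensors. It only illustrates why a per-pixel convex combination keeps the feature size unchanged.

```python
import math
from typing import List

Grid = List[List[float]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def pixelwise_linear_fusion(f_ms: Grid, f_pan: Grid,
                            w_ms: float, w_pan: float, bias: float) -> Grid:
    """Fuse two same-sized feature maps pixel by pixel.

    Hypothetical coefficient rule: a = sigmoid(w_ms * ms + w_pan * pan + bias).
    """
    if len(f_ms) != len(f_pan) or any(len(r1) != len(r2) for r1, r2 in zip(f_ms, f_pan)):
        raise ValueError("feature maps must have the same shape")
    fused: Grid = []
    for row_ms, row_pan in zip(f_ms, f_pan):
        fused_row = []
        for ms, pan in zip(row_ms, row_pan):
            a = sigmoid(w_ms * ms + w_pan * pan + bias)  # per-pixel weight in (0, 1)
            fused_row.append(a * ms + (1.0 - a) * pan)   # convex combination, same size
        fused.append(fused_row)
    return fused
```

With all three parameters at zero every coefficient is 0.5, so the output is simply the element-wise mean of the two maps.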

preprint · 2026 · arXiv

AIMS: An Adaptive Integration of Multi-Sensor Measurements for Quadrupedal Robot Localization

This paper addresses the problem of accurate localization for quadrupedal robots operating in narrow tunnel-like environments. Due to the long and homogeneous characteristics of such scenarios, LiDAR measurements often provide weak geometric constraints, making traditional sensor fusion methods susceptible to accumulated motion estimation errors. To address these challenges, we propose AIMS, an adaptive LiDAR-IMU-leg odometry fusion method for robust quadrupedal robot localization in degenerate environments. The proposed method is formulated within an error-state Kalman filtering framework, where LiDAR and leg odometry measurements are integrated with IMU-based state prediction, and measurement noise covariance matrices are adaptively adjusted based on online degeneracy-aware reliability assessment. Experimental results obtained in narrow corridor environments demonstrate that the proposed method improves localization accuracy and robustness compared with state-of-the-art approaches.
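As a rough illustration of degeneracy-aware covariance adaptation, the sketch below performs a single scalar Kalman measurement update in which the measurement noise variance is inflated as a reliability score drops. It is a simplification, not the AIMS filter: it works on one scalar state rather than an error state, and both the `reliability` score in (0, 1] and the inflation rule `R = base_r / reliability` are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class ScalarState:
    mean: float  # estimated state (e.g. one position coordinate)
    var: float   # estimate variance P


def adaptive_update(state: ScalarState, z: float, base_r: float,
                    reliability: float, min_reliability: float = 1e-3) -> ScalarState:
    """One Kalman measurement update with a reliability-scaled noise variance.

    reliability = 1 means a well-constrained measurement; values near 0
    (e.g. LiDAR in a degenerate corridor) inflate R so the update trusts it less.
    """
    r = base_r / max(reliability, min_reliability)  # hypothetical inflation rule
    gain = state.var / (state.var + r)              # scalar Kalman gain
    mean = state.mean + gain * (z - state.mean)     # correct with the innovation
    var = (1.0 - gain) * state.var                  # shrink the uncertainty
    return ScalarState(mean, var)
```

With `reliability=1.0` this is a standard Kalman step; lowering it toward zero shrinks the gain, so a measurement taken in a degenerate segment moves the estimate less.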

preprint · 2026 · arXiv

TEA: Temporal Adaptive Satellite Image Semantic Segmentation

Crop mapping based on satellite image time series (SITS) holds substantial economic value in agricultural production settings, and parcel segmentation is an essential step in it. Existing approaches have achieved notable advances in SITS segmentation with predetermined sequence lengths. However, we found that these approaches overlook the generalization capability of models across scenarios with varying temporal lengths, leading to markedly poor segmentation results in such cases. To address this issue, we propose TEA, a TEmporal Adaptive SITS semantic segmentation method that enhances the model's resilience under varying sequence lengths. We introduce a teacher model that encapsulates global sequence knowledge to guide a student model with adaptive temporal input lengths. Specifically, the teacher shapes the student's feature space from three perspectives (intermediate embeddings, prototypes, and soft labels) to realize knowledge transfer, while dynamically aggregating the student model to mitigate knowledge forgetting. Finally, we introduce full-sequence reconstruction as an auxiliary task to further enhance the quality of representations across inputs of varying temporal lengths. Through extensive experiments, we demonstrate that our method brings remarkable improvements across inputs of different temporal lengths on common benchmarks. Our code will be publicly available.
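The soft-label perspective of teacher-to-student transfer is commonly implemented as a temperature-scaled KL divergence between teacher and student class distributions. The sketch below shows that widely used generic distillation loss, not TEA's exact objective; the default temperature of 2.0 and the T² scaling are conventional assumptions, and the prototype and embedding terms are omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_kd_loss(teacher_logits: Sequence[float],
                       student_logits: Sequence[float],
                       temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft labels from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.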

preprint · 2026 · arXiv

UIKA: Fast Universal Head Avatar from Pose-Free Images

We present UIKA, a feed-forward animatable Gaussian head model built from an arbitrary number of unposed inputs, including a single image, multi-view captures, and smartphone-captured videos. Unlike traditional avatar methods, which require a studio-level multi-view capture system and reconstruct a person-specific model through a lengthy optimization process, we rethink the task through the lenses of model representation, network design, and data preparation. First, we introduce a UV-guided avatar modeling strategy, in which each input image is associated with a pixel-wise facial correspondence estimate. This correspondence allows us to reproject each valid pixel color from screen space to UV space, which is independent of camera pose and character expression. Furthermore, we design learnable UV tokens to which the attention mechanism can be applied at both the screen and UV levels. The learned UV tokens can be decoded into canonical Gaussian attributes using aggregated UV information from all input views. To train our large avatar model, we additionally prepare a large-scale, identity-rich synthetic training dataset. Our method significantly outperforms existing approaches in both monocular and multi-view settings. More details are available on our project page: https://zijian-wu.github.io/uika-page/

preprint · 2026 · arXiv

VIP: Visual-guided Prompt Evolution for Efficient Dense Vision-Language Inference

Pursuing training-free open-vocabulary semantic segmentation in an efficient and generalizable manner remains challenging due to the deep-seated spatial bias in CLIP. To overcome the limitations of existing solutions, this work moves beyond the CLIP-based paradigm and harnesses the recent spatially-aware dino.txt framework to facilitate more efficient and high-quality dense prediction. While dino.txt exhibits robust spatial awareness, we find that the semantic ambiguity of text queries gives rise to severe mismatches within its dense cross-modal interactions. To address this, we introduce Visual-guided Prompt evolution (VIP) to rectify the semantic expressiveness of text queries in dino.txt, unleashing its potential for fine-grained object perception. To this end, VIP integrates alias expansion with a visual-guided distillation mechanism to mine valuable semantic cues, which are robustly aggregated in a saliency-aware manner to yield a high-fidelity prediction. Extensive evaluations demonstrate that VIP: 1. surpasses the leading methods by 1.4%-8.4% average mIoU, 2. generalizes well to diverse challenging domains, and 3. requires marginal inference time and memory overhead.

preprint · 2024 · arXiv

Interference of Two-Dimensional Bose-Einstein Condensates in Micro-Gravity

We investigate the interference of two-dimensional Bose-Einstein condensates in micro-gravity, which is influenced by the interaction strength, initial momentum, gravitational potential, and phase difference. We demonstrate that the Earth's gravitational potential can change the density and phase distributions of the condensate's wave function. As time evolves, a portion of the gravitational potential energy of the microscopic particles can be converted into kinetic energy, which changes the motion of the particles and leads to variations in the density and phase distributions of the wave function. Nevertheless, the influence of the Earth's gravity on the wave function can be eliminated in a micro-gravity environment, as confirmed by many micro-gravity cold-atom experiments. Our results present the influences of gravity and other parameters on the interference of Bose-Einstein condensates, which helps reveal the intrinsic nature of the related theoretical predictions and experimental phenomena. Furthermore, our work builds a bridge between the related physical phenomena and our physical intuition about Bose-Einstein condensates in a micro-gravity environment.

preprint · 2022 · arXiv

A Dynamic Response Recovery Framework Using Ambient Synchrophasor Data

Wide-area dynamic studies are of paramount importance to ensure the stability and reliability of power grids. The rising deployment of synchrophasor and other sensing technologies has made data-driven modeling and analysis possible using synchronized, fast-rate dynamic measurements. This paper presents a general model-free framework for inferring grid dynamic responses from the ubiquitous ambient data collected during normal grid operations. Building upon the second-order dynamic model, we establish the connection between the cross-correlation of various types of angle, frequency, and line flow data at any two locations and their corresponding dynamic responses. These theoretical results enable a fully data-driven framework for estimating the dynamic responses from real-time ambient data. Numerical results using the WSCC 9-bus system and a synthetic 2000-bus Texas system demonstrate the effectiveness of the proposed approaches for dynamic modeling of realistic power systems.
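The framework rests on sample cross-correlations between pairs of ambient measurements. A minimal sketch of the biased empirical cross-correlation for non-negative lags follows; the function name and the normalization by N are assumptions, and the step that maps these correlations to dynamic responses through the second-order model is not reproduced here.

```python
from typing import List, Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> List[float]:
    """Biased sample cross-correlation for lags k = 0..max_lag:

    R_xy(k) = 1/N * sum_t (x[t] - mean_x) * (y[t + k] - mean_y)
    """
    if len(x) != len(y):
        raise ValueError("x and y must have equal length")
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must lie in [0, len(x))")
    mx = sum(x) / n
    my = sum(y) / n
    return [
        sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]
```

For example, if `y` is a copy of `x` delayed by two samples, the returned sequence peaks at lag 2.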

preprint · 2022 · arXiv

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

Large-scale datasets have played indispensable roles in the recent success of face generation/editing and have significantly facilitated advances in emerging research fields. However, the academic community still lacks a video dataset with diverse facial attribute annotations, which is crucial for research on face-related videos. In this work, we propose a large-scale, high-quality, and diverse video dataset with rich facial attribute annotations, named the High-Quality Celebrity Video Dataset (CelebV-HQ). CelebV-HQ contains 35,666 video clips with a resolution of at least 512x512, involving 15,653 identities. All clips are labeled manually with 83 facial attributes, covering appearance, action, and emotion. We conduct a comprehensive analysis in terms of age, ethnicity, brightness stability, motion smoothness, head pose diversity, and data quality to demonstrate the diversity and temporal coherence of CelebV-HQ. In addition, its versatility and potential are validated on two representative tasks, i.e., unconditional video generation and video facial attribute editing. Furthermore, we envision the future potential of CelebV-HQ, as well as the new opportunities and challenges it would bring to related research directions. Data, code, and models are publicly available. Project page: https://celebv-hq.github.io.

preprint · 2022 · arXiv

Contrastive Laplacian Eigenmaps

Graph contrastive learning attracts/disperses node representations for similar/dissimilar node pairs under some notion of similarity. It may be combined with a low-dimensional embedding of nodes to preserve intrinsic and structural properties of a graph. In this paper, we extend the celebrated Laplacian Eigenmaps with contrastive learning, and call them COntrastive Laplacian EigenmapS (COLES). Starting from a GAN-inspired contrastive formulation, we show that the Jensen-Shannon divergence underlying many contrastive graph embedding models fails under disjoint positive and negative distributions, which may naturally emerge during sampling in the contrastive setting. In contrast, we demonstrate analytically that COLES essentially minimizes a surrogate of Wasserstein distance, which is known to cope well under disjoint distributions. Moreover, we show that the loss of COLES belongs to the family of so-called block-contrastive losses, previously shown to be superior compared to pair-wise losses typically used by contrastive methods. We show on popular benchmarks/backbones that COLES offers favourable accuracy/scalability compared to DeepWalk, GCN, Graph2Gauss, DGI and GRACE baselines.

preprint · 2022 · arXiv

Detailed Facial Geometry Recovery from Multi-View Images by Learning an Implicit Function

Recovering detailed facial geometry from a set of calibrated multi-view images is valuable for its wide range of applications. Traditional multi-view stereo (MVS) methods adopt an optimization-based scheme to regularize the matching cost. Recently, learning-based methods have integrated all of these steps into an end-to-end neural network and shown superior efficiency. In this paper, we propose a novel architecture to recover extremely detailed 3D faces within dozens of seconds. Unlike previous learning-based methods that regularize the cost volume via 3D CNN, we propose to learn an implicit function for regressing the matching cost. By fitting a 3D morphable model from multi-view images, the features of multiple images are extracted and aggregated in the mesh-attached UV space, which makes the implicit function more effective in recovering detailed facial shape. Our method outperforms SOTA learning-based MVS in accuracy by a large margin on the FaceScape dataset. The code and data are released at https://github.com/zhuhao-nju/mvfr.

preprint · 2022 · arXiv

Distributionally Robust Chance-Constrained Optimal Transmission Switching for Renewable Integration

Increasing integration of renewable generation poses significant challenges to ensuring robustness guarantees in real-time energy system decision-making. This work aims to develop a robust optimal transmission switching (OTS) framework that can effectively relieve grid congestion and mitigate renewable curtailment. We formulate a two-stage distributionally robust chance-constrained (DRCC) problem that assures limited constraint violations for any uncertainty distribution within an ambiguity set. Here, the second-stage recourse variables are represented as linear functions of uncertainty, yielding an equivalent reformulation involving linear constraints only. We utilize moment-based (mean-mean absolute deviation) and distance-based (infinity-Wasserstein distance) ambiguity sets that lead to scalable mixed-integer linear program (MILP) formulations. Numerical experiments on the IEEE 14-bus and 118-bus systems have demonstrated the performance improvements of the proposed DRCC-OTS approaches in terms of guaranteed constraint violations and reduced renewable curtailment. In particular, the computational efficiency of the moment-based MILP approach, which is scenario-free with fixed problem dimensions, has been confirmed, making it suitable for real-time grid operations.

preprint · 2022 · arXiv

Dynamics near Couette flow for the $β$-plane equation

In this paper, we study stationary structures near the planar Couette flow in Sobolev spaces on a channel $\mathbb{T}\times[-1,1]$, and the asymptotic behavior of Couette flow in Gevrey spaces on $\mathbb{T}\times\mathbb{R}$, for the $β$-plane equation. Let $T>0$ be the horizontal period of the channel and $α={2π\over T}$ be the wave number. We obtain a sharp region $O$ in the whole $(α,β)$ half-plane such that non-parallel steadily traveling waves do not exist for $(α,β)\in O$ and such traveling waves exist for $(α,β)$ in the remaining regions, near Couette flow for $H^{\geq5}$ velocity perturbation. The borderlines between the region $O$ and the remaining regions are determined by two curves of the principal eigenvalues of singular Rayleigh-Kuo operators. Our results reveal that there exists $β_*>0$ such that if $|β|\leq β_*$, then non-parallel traveling waves do not exist for any $T>0$, while if $|β|>β_*$, then there exists a critical period $T_β>0$ so that such traveling waves exist for $T\in \left[T_β,\infty\right)$ and do not exist for $T\in \left(0,T_β\right)$, near Couette flow for $H^{\geq5}$ velocity perturbation. These contrasting dynamics play an important role in studying the long-time dynamics near Couette flow with Coriolis effects. Moreover, for any $β\neq0$ and $T>0$, there exist no non-parallel traveling waves with speeds converging in $(-1,1)$ near Couette flow for $H^{\geq5}$ velocity perturbation. In contrast, we construct non-shear stationary solutions near Couette flow for $H^{<{5\over2}}$ velocity perturbation, which generalizes Theorem 1 in [22], although the construction is more difficult due to the $β$ term. Finally, we prove nonlinear inviscid damping for Couette flow in some Gevrey spaces by extending the method of [4] to the $β$-plane equation on $\mathbb{T}\times\mathbb{R}$.

preprint · 2022 · arXiv

Efficient Representation for Electric Vehicle Charging Station Operations using Reinforcement Learning

Effectively operating electric vehicle charging stations (EVCS) is crucial for enabling the rapid transition to electrified transportation. When this problem is solved using reinforcement learning (RL), the dimension of the state/action spaces scales with the number of EVs and is thus very large and time-varying. This dimensionality issue affects the efficiency and convergence properties of generic RL algorithms. We develop aggregation schemes based on the urgency of EV charging, namely the laxity value. A least-laxity first (LLF) rule is adopted to consider only the total charging power of the EVCS, which ensures the feasibility of individual EV schedules. In addition, we propose an equivalent state aggregation that is guaranteed to attain the same optimal policy. Based on the proposed representation, a policy gradient method is used to find the best parameters for the linear Gaussian policy. Numerical results have validated the performance improvement of the proposed representation approaches in attaining higher rewards and more effective policies as compared to an existing approximation-based approach.
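The least-laxity-first rule can be sketched for a single time step. Laxity is taken here as the time left before departure minus the time needed to finish charging at the maximum rate; the `EV` fields, the kWh/hour/kW units, and the per-step cap `energy_needed / dt` are assumptions made for illustration, not the paper's exact formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EV:
    energy_needed: float  # kWh still to deliver (assumed unit)
    time_left: float      # hours until departure (assumed unit)
    max_rate: float       # kW charger limit (assumed unit)


def laxity(ev: EV) -> float:
    """Slack time: hours left minus hours needed at the full charging rate."""
    return ev.time_left - ev.energy_needed / ev.max_rate


def llf_dispatch(evs: List[EV], total_power: float, dt: float = 1.0) -> List[float]:
    """Split total_power over one step of length dt, serving least-laxity EVs first."""
    rates = [0.0] * len(evs)
    remaining = total_power
    for i in sorted(range(len(evs)), key=lambda j: laxity(evs[j])):
        cap = min(evs[i].max_rate, evs[i].energy_needed / dt)  # never exceed the remaining need
        rates[i] = min(cap, remaining)
        remaining -= rates[i]
        if remaining <= 0:
            break
    return rates
```

With three EVs of laxity 2.5, 0 and 9 hours and 5 kW available, the zero-laxity EV is served first at its 3 kW limit and the remaining 2 kW go to the next most urgent one.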

preprint · 2022 · arXiv

Ekar: An Explainable Method for Knowledge Aware Recommendation

This paper studies recommender systems with knowledge graphs, which can effectively address the problems of data sparsity and cold start. Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations. Though these methods have been shown to be quite effective, they lack good explanations, which are critical to recommender systems. In this paper, we take a different route and propose generating recommendations by finding meaningful paths from users to items. Specifically, we formulate the problem as a sequential decision process, where the target user is defined as the initial state, and the edges on the graphs are defined as actions. We shape the rewards according to existing state-of-the-art methods and then train a policy function with policy gradient methods. Experimental results on three real-world datasets show that our proposed method not only provides effective recommendations but also offers good explanations.

preprint · 2022 · arXiv

Ensemble Spectral Prediction (ESP) Model for Metabolite Annotation

A key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities. Currently, only a small fraction of measurements can be assigned identities. Two complementary computational approaches have emerged to address the annotation problem: mapping candidate molecules to spectra, and mapping query spectra to molecular candidates. In essence, the candidate molecule whose spectrum best explains the query spectrum is recommended as the target molecule. Despite candidate ranking being fundamental in both approaches, no prior work has used rank learning tasks to determine the target molecule. We propose a novel machine learning model, Ensemble Spectral Prediction (ESP), for metabolite annotation. ESP takes advantage of prior neural network-based annotation models that use multilayer perceptron (MLP) networks and Graph Neural Networks (GNNs). Based on the ranking results of the MLP- and GNN-based models, ESP learns a weighting for the outputs of the MLP and GNN spectral predictors to generate a spectral prediction for a query molecule. Importantly, training data is stratified by molecular formula to provide candidate sets during model training. Further, the baseline MLP and GNN models are enhanced by considering peak dependencies through a multi-head attention mechanism and multi-tasking on spectral topic distributions. ESP improves average rank by 41% and 30% over the MLP and GNN baselines, respectively, demonstrating a remarkable performance gain over state-of-the-art neural network approaches. We show that annotation performance, for ESP and other models, is a strong function of the number of molecules in the candidate set and their similarity to the target molecule.
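The ensemble idea, blending two predicted spectra and ranking candidates against the measured query, can be sketched as follows. Here the blend is a single fixed scalar weight `w` and the score is cosine similarity; both are assumptions for illustration, since ESP learns its weighting and the abstract does not specify the scoring function.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(query: Sequence[float],
                    mlp_pred: Dict[str, List[float]],
                    gnn_pred: Dict[str, List[float]],
                    w: float) -> List[str]:
    """Blend each candidate's predicted spectra as w*MLP + (1-w)*GNN (hypothetical fixed w),
    then rank candidates by cosine similarity to the measured query spectrum."""
    scores = {}
    for name in mlp_pred:
        blended = [w * m + (1.0 - w) * g for m, g in zip(mlp_pred[name], gnn_pred[name])]
        scores[name] = cosine(query, blended)
    return sorted(scores, key=scores.get, reverse=True)  # best match first
```

The rank of the true molecule, the quantity behind an average-rank metric, is then `ranking.index(target) + 1`.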

preprint · 2022 · arXiv

Epipolar Focus Spectrum: A Novel Light Field Representation and Application in Dense-view Reconstruction

Existing light field representations, such as epipolar plane images (EPIs) and sub-aperture images, do not consider the structural characteristics across views, so they usually require additional disparity and spatial structure cues for follow-up tasks. Besides, they have difficulty dealing with occlusions or scenes with large disparity. To this end, this paper proposes a novel Epipolar Focus Spectrum (EFS) representation by rearranging the EPI spectrum. Different from the classical EPI representation, where an EPI line corresponds to a specific depth, there is a one-to-one mapping from an EFS line to a view. Accordingly, compared to a sparsely-sampled light field, a densely-sampled one with the same field of view (FoV) leads to a more compact distribution of such linear structures in the double-cone-shaped region with the identical opening angle in its corresponding EFS. Hence the EFS representation is invariant to the scene depth. To demonstrate its effectiveness, we develop a trainable EFS-based pipeline for light field reconstruction, where a dense light field can be reconstructed by compensating the "missing EFS lines" given a sparse light field, yielding promising results with cross-view consistency, especially in the presence of severe occlusion and large disparity. Experimental results on both synthetic and real-world datasets demonstrate the validity and superiority of the proposed method over SOTA methods.

preprint · 2022 · arXiv

Graph-adaptive Rectified Linear Unit for Graph Neural Networks

Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data. The key to GNNs is the neural message-passing paradigm with two stages: aggregation and update. The current design of GNNs considers topology information in the aggregation stage. However, in the update stage, all nodes share the same update function. This identical update function treats each node embedding as an i.i.d. random variable and thus ignores the implicit relationships between neighborhoods, which limits the capacity of GNNs. The update function is usually implemented as a linear transformation followed by a non-linear activation function. To make the update function topology-aware, we inject topological information into the non-linear activation function and propose the Graph-adaptive Rectified Linear Unit (GReLU), a new parametric activation function that incorporates neighborhood information in a novel and efficient way. The parameters of GReLU are obtained from a hyperfunction based on both node features and the corresponding adjacency matrix. To reduce the risk of overfitting and the computational cost, we decompose the hyperfunction into two independent components for nodes and features, respectively. We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective with different GNN backbones and various downstream tasks.

preprint · 2022 · arXiv

Inferring power system dynamics from synchrophasor data using Gaussian processes

Synchrophasor data provide unprecedented opportunities for inferring power system dynamics, such as estimating voltage angles, frequencies, and accelerations along with power injection at all buses. In line with this goal, this work puts forth a novel framework for learning dynamics after small-signal disturbances by leveraging Gaussian processes (GPs). We extend results on learning of a linear time-invariant system using GPs to the multi-input multi-output setup. This is accomplished by decomposing power system swing dynamics into a set of single-input single-output linear systems with narrow frequency pass bands. The proposed learning technique captures time derivatives in continuous time, accommodates data streams sampled at different rates, and can cope with missing data and heterogeneous levels of accuracy. While Kalman filter-based approaches require knowing all system inputs, the proposed framework handles readings of system inputs, outputs, their derivatives, and combinations thereof collected from an arbitrary subset of buses. Relying on minimal system information, it further provides uncertainty quantification in addition to point estimates of system dynamics. Numerical tests verify that this technique can infer dynamics at non-metered buses, impute and predict synchrophasors, and locate faults under linear and non-linear system models under ambient and fault disturbances.

preprint · 2022 · arXiv

Migrating Face Swap to Mobile Devices: A lightweight Framework and A Supervised Training Solution

Existing face swap methods rely heavily on large-scale networks for adequate capacity to generate visually plausible results, which inhibits their application on resource-constrained platforms. In this work, we propose MobileFSGAN, a novel lightweight GAN for face swap that can run on mobile devices with far fewer parameters while achieving competitive performance. A lightweight encoder-decoder structure is designed especially for image synthesis tasks; it is only 10.2MB and can run on mobile devices in real time. To tackle the instability of training such a small network, we construct the FSTriplets dataset using facial attribute editing techniques. FSTriplets provides source-target-result training triplets, yielding pixel-level labels and thus, for the first time, making the training process supervised. We also design multi-scale gradient losses for efficient back-propagation, resulting in faster and better convergence. Experimental results show that our model achieves performance comparable to state-of-the-art methods while significantly reducing the number of network parameters. The code and the dataset have been released.

preprint · 2022 · arXiv

Model-free Learning for Risk-constrained Linear Quadratic Regulator with Structured Feedback in Networked Systems

We develop a model-free learning algorithm for the infinite-horizon linear quadratic regulator (LQR) problem. Specifically, (risk) constraints and structured feedback are considered, in order to reduce the state deviation while allowing for a sparse communication graph in practice. By reformulating the dual problem as a nonconvex-concave minimax problem, we adopt the gradient descent max-oracle (GDmax) and, for the model-free setting, the stochastic (S)GDmax using zero-order policy gradient. By bounding the Lipschitz and smoothness constants of the LQR cost using specifically defined sublevel sets, we can design the stepsize and related parameters to establish convergence to a stationary point (with high probability). Numerical tests in a networked microgrid control problem have validated the convergence of our proposed SGDmax algorithm while demonstrating the effectiveness of risk constraints. The SGDmax algorithm has attained a satisfactory optimality gap compared to the classical LQR control, especially for the full feedback case.
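Zero-order policy gradient methods estimate gradients from cost evaluations alone. The sketch below shows a standard two-point estimator with directions drawn uniformly from the unit sphere. It is applied to a generic function `f` of a flattened parameter vector; in the LQR setting `f` would be a (risk-constrained) cost estimated from rollouts, which is assumed rather than implemented here. The smoothing radius and sample count are arbitrary illustrative choices, and the max-oracle step of (S)GDmax is not shown.

```python
import math
import random
from typing import Callable, List, Sequence


def zero_order_gradient(f: Callable[[Sequence[float]], float], x: Sequence[float],
                        radius: float, num_samples: int, rng: random.Random) -> List[float]:
    """Two-point zeroth-order gradient estimate using only evaluations of f.

    Averages d * (f(x + r*u) - f(x - r*u)) / (2*r) * u over directions u drawn
    uniformly from the unit sphere in R^d (E[u u^T] = I/d, hence the factor d).
    """
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]  # normalized Gaussian = uniform direction on the sphere
        f_plus = f([xi + radius * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - radius * ui for xi, ui in zip(x, u)])
        scale = d * (f_plus - f_minus) / (2.0 * radius)
        for i in range(d):
            grad[i] += scale * u[i] / num_samples
    return grad
```

For the quadratic `f(v) = v[0]**2 + v[1]**2` at `x = [1.0, -2.0]`, averaging a few thousand samples gives an estimate close to the true gradient `[2.0, -4.0]`.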

preprint · 2022 · arXiv

MoFaNeRF: Morphable Facial Neural Radiance Field

We propose a parametric model that maps free-view images into a vector space of coded facial shape, expression and appearance with a neural radiance field, namely Morphable Facial NeRF. Specifically, MoFaNeRF takes the coded facial shape, expression and appearance along with space coordinate and view direction as input to an MLP, and outputs the radiance of the space point for photo-realistic image synthesis. Compared with conventional 3D morphable models (3DMM), MoFaNeRF shows superiority in directly synthesizing photo-realistic facial details even for eyes, mouths, and beards. Also, continuous face morphing can be easily achieved by interpolating the input shape, expression and appearance codes. By introducing identity-specific modulation and a texture encoder, our model synthesizes accurate photometric details and shows strong representation ability. Our model shows strong ability in multiple applications, including image-based fitting, random generation, face rigging, face editing, and novel view synthesis. Experiments show that our method achieves higher representation ability than previous parametric models, and achieves competitive performance in several applications. To the best of our knowledge, our work is the first facial parametric model built upon a neural radiance field that can be used in fitting, generation and manipulation. The code and data are available at https://github.com/zhuhao-nju/mofanerf.

preprint · 2022 · arXiv

On the Sample Complexity of Decentralized Linear Quadratic Regulator with Partially Nested Information Structure

We study the problem of control policy design for decentralized state-feedback linear quadratic control with a partially nested information structure, when the system model is unknown. We propose a model-based learning solution, which consists of two steps. First, we estimate the unknown system model from a single system trajectory of finite length, using least squares estimation. Next, based on the estimated system model, we design a control policy that satisfies the desired information structure. We show that the suboptimality gap between our control policy and the optimal decentralized control policy (designed using accurate knowledge of the system model) scales linearly with the estimation error of the system model. Using this result, we provide an end-to-end sample complexity result for learning decentralized controllers for a linear quadratic control problem with a partially nested information structure.
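The first step, least-squares identification from a single trajectory, can be illustrated in the scalar case. The sketch assumes the hypothetical model `x[t+1] = a*x[t] + b*u[t]` with scalar state and input and solves the 2x2 normal equations directly; the paper's setting involves matrices, process noise and a structured controller design step that is not reproduced here.

```python
from typing import Sequence, Tuple


def estimate_scalar_system(xs: Sequence[float], us: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t+1] ~ a*x[t] + b*u[t] from one trajectory (hypothetical scalar model).

    Solves the normal equations [Sxx Sxu; Sxu Suu] [a, b]^T = [Sxy, Suy]^T by Cramer's rule.
    """
    if len(xs) != len(us) + 1:
        raise ValueError("need len(xs) == len(us) + 1")
    sxx = sxu = suu = sxy = suy = 0.0
    for t, u in enumerate(us):
        x, x_next = xs[t], xs[t + 1]
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data not informative enough (singular normal equations)")
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b
```

On noise-free data generated with `a = 0.9` and `b = 0.5` and a sufficiently varied input, the function recovers these values up to rounding.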

preprint · 2022 · arXiv

Reinforcement Learning Based Optimal Battery Control Under Cycle-based Degradation Cost

Battery energy storage systems are providing an increasing level of benefits to power grid operations by decreasing resource uncertainty and supporting frequency regulation. Thus, it is crucial to obtain the optimal policy for the battery to efficiently provide these grid services while accounting for its degradation cost. To solve the optimal battery control (OBC) problem using powerful reinforcement learning (RL) algorithms, this paper develops a new representation of the cycle-based battery degradation model according to the rainflow algorithm. As the latter depends on the full trajectory, existing work has had to rely on a linearized approximation to convert it into instantaneous terms for the Markov Decision Process (MDP) based formulation. We propose a new MDP form by introducing additional state variables that can easily keep track of past switching points for determining the cycle depth. The proposed degradation model allows the adoption of the powerful deep Q-network (DQN) based RL algorithm to efficiently search for the OBC policy. Numerical tests using real market data demonstrate the performance improvements of the proposed cycle-based degradation model in enhancing battery operations while mitigating degradation, as compared to earlier work using the linearized approximation.
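Rainflow counting, which the cycle-based degradation model builds on, extracts cycle depths from a state-of-charge trajectory. The sketch below is a standard three-point rainflow counter in the style of ASTM E1049, operating offline on the full trajectory; it illustrates why the cost depends on past switching points, and it is not the paper's MDP reformulation with additional state variables. Mapping cycle depths to a degradation cost is left out.

```python
from typing import List, Sequence, Tuple


def reversals(series: Sequence[float]) -> List[float]:
    """Keep only the turning points (plus the endpoints) of a non-empty series."""
    pts = [series[0]]
    for x in series[1:]:
        if x == pts[-1]:
            continue  # skip flat segments
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x  # still moving in the same direction: extend the run
        else:
            pts.append(x)
    return pts


def rainflow_cycles(series: Sequence[float]) -> List[Tuple[float, float]]:
    """Three-point rainflow counting; returns (cycle depth, count) with count 1.0 or 0.5."""
    stack: List[float] = []
    cycles: List[Tuple[float, float]] = []
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])  # most recent range
            y = abs(stack[-2] - stack[-3])  # previous range
            if x < y:
                break
            if len(stack) == 3:
                cycles.append((y, 0.5))  # range touches the starting point: half cycle
                stack.pop(0)
            else:
                cycles.append((y, 1.0))  # closed full cycle
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    # ranges left on the stack are counted as half cycles
    cycles.extend((abs(b - a), 0.5) for a, b in zip(stack, stack[1:]))
    return cycles
```

For the trajectory `[0, 1, 0.5, 1, 0]` it reports one full cycle of depth 0.5 and two half cycles of depth 1.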

preprint2022arXiv

Risk-Aware Learning for Scalable Voltage Optimization in Distribution Grids

Real-time coordination of distributed energy resources (DERs) is crucial for regulating the voltage profile in distribution grids. By capitalizing on a scalable neural network (NN) architecture, one can attain decentralized DER decisions to address the lack of real-time communications. This paper develops an advanced learning-enabled DER coordination scheme by accounting for the potential risks associated with reactive power prediction and voltage deviation. Such risks are quantified by the conditional value-at-risk (CVaR) using only the worst-case samples, and we propose a mini-batch selection algorithm to address the training speed issue in minimizing the CVaR-regularized loss. Numerical tests using real-world data on the IEEE 123-bus test case have demonstrated the computational and safety improvements of the proposed risk-aware learning algorithm for decentralized DER decision-making, especially in terms of reducing feeder voltage violations.
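For reference, a common empirical estimator of CVaR averages the worst $(1-\alpha)$ fraction of sample losses. The Python sketch below assumes that estimator and a hypothetical objective of the form mean loss plus $\lambda$ times CVaR; the paper's exact CVaR-regularized loss and its mini-batch selection algorithm are not reproduced.

```python
import math
import numpy as np

def empirical_cvar(losses, alpha):
    """Average of the worst (1 - alpha) fraction of sample losses (empirical CVaR_alpha)."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(1, math.ceil((1.0 - alpha) * losses.size - 1e-9))  # number of tail samples
    return float(losses[-k:].mean())

def cvar_regularized_loss(losses, alpha, lam):
    """Hypothetical objective: mean loss plus lam times the CVaR of the same batch."""
    losses = np.asarray(losses, dtype=float)
    return float(losses.mean() + lam * empirical_cvar(losses, alpha))

batch = np.arange(1, 11)          # toy per-sample losses 1..10
tail = empirical_cvar(batch, alpha=0.8)
total = cvar_regularized_loss(batch, alpha=0.8, lam=1.0)
```

Only the $k$ tail samples enter the CVaR term, so its gradient comes from a small subset of each batch, which suggests why the choice of mini-batches affects training speed.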

preprint2022arXiv

Scalable Learning for Optimal Load Shedding Under Power Grid Emergency Operations

Effective and timely responses to unexpected contingencies are crucial for enhancing the resilience of power grids. Given the fast, complex process of cascading propagation, corrective actions such as optimal load shedding (OLS) are difficult to attain in large-scale networks due to computational complexity and communication latency. This work puts forth an innovative learning-for-OLS approach by constructing the optimal decision rules of load shedding under a variety of potential contingency scenarios through offline neural network (NN) training. Notably, the proposed NN-based OLS decisions are fully decentralized, enabling individual load centers to quickly react to the specific contingency using readily available local measurements. Numerical studies on the IEEE 14-bus system have demonstrated the effectiveness of our scalable OLS design for real-time responses to severe grid emergency events.

preprint2022arXiv

SSGCNet: A Sparse Spectra Graph Convolutional Network for Epileptic EEG Signal Classification

In this article, we propose a sparse spectra graph convolutional network (SSGCNet) for solving epileptic EEG signal classification problems. The aim is to achieve a lightweight deep learning model without losing classification accuracy. We propose a weighted neighborhood field graph (WNFG) to represent EEG signals, which reduces the redundant edges between graph nodes. WNFG has lower time complexity and memory usage than conventional solutions. Using this graph representation, the sequential graph convolutional network is built on a combination of a sparse weight pruning technique and the alternating direction method of multipliers (ADMM). Our approach reduces computational complexity without affecting classification accuracy. We also present convergence results for the proposed approach. The performance of the approach is illustrated on public and real clinical datasets. Compared with the existing literature, our WNFG of EEG signals achieves up to a 10-fold reduction in redundant edges, and our approach achieves up to 97-fold model pruning without loss of classification accuracy.
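Combining weight pruning with ADMM can be illustrated by the generic ADMM splitting for a cardinality constraint: a loss step with a proximal term, a projection that keeps the $k$ largest-magnitude weights, and a dual update. This minimal Python sketch applies it to a toy linear least-squares model rather than a graph convolutional network; the data, `k`, and `rho` are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def project_topk(v, k):
    """Euclidean projection onto {w : at most k nonzeros}: keep the k largest magnitudes."""
    z = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    z[keep] = v[keep]
    return z

def admm_prune(X, y, k, rho=20.0, iters=300):
    """ADMM for min_w 0.5*||X w - y||^2 subject to ||w||_0 <= k (toy linear model)."""
    d = X.shape[1]
    z = np.zeros(d)
    u = np.zeros(d)                                         # scaled dual variable
    lhs = X.T @ X + rho * np.eye(d)
    for _ in range(iters):
        w = np.linalg.solve(lhs, X.T @ y + rho * (z - u))   # loss step + proximal term
        z = project_topk(w + u, k)                          # pruning (projection) step
        u = u + w - z                                       # dual update
    return z

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[[1, 4, 7]] = [5.0, -4.0, 3.0]
y = X @ w_true                                              # noiseless toy targets
w_sparse = admm_prune(X, y, k=3)
```

Setting `rho` on the order of the loss curvature (here, the eigenvalues of $X^\top X$) tends to speed convergence; a much smaller value is typically far slower on this toy problem.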

preprint2022arXiv

The number of traveling wave families in a running water with Coriolis force

In this paper, we study the number of traveling wave families near a shear flow under the influence of the Coriolis force, where the traveling speeds lie outside the range of the flow $u$. Under the $β$-plane approximation, if the flow $u$ has a critical point at which $u$ attains its minimal (resp. maximal) value, then a unique transitional $β$ value exists in the positive (resp. negative) half-line such that the number of traveling wave families near the shear flow changes suddenly from finite to infinite when $β$ passes through it. On the other hand, if $u$ has no such critical points, then the number is always finite for positive (resp. negative) $β$ values. This holds for general shear flows under mild technical assumptions, and unconditionally for a large class of shear flows, including the cosine jet $u(y) = \frac{1+\cos(πy)}{2}$ (i.e., the sinus profile) and analytic monotone flows. The sudden change in the number of traveling wave families indicates that the long-time dynamics around the shear flow are much richer than in the non-rotating case, where no such traveling wave families exist.

preprint2021arXiv

Deep Anti-aliasing of Whole Focal Stack Using Slice Spectrum

The paper aims at removing the aliasing effects of the whole focal stack generated from a sparsely sampled 4D light field, while keeping consistency across all the focal layers. We first explore the structural characteristics embedded in the focal stack slice and its corresponding frequency-domain representation, i.e., the Focal Stack Spectrum (FSS). We observe that the energy distribution of the FSS always resides within the same triangular area under different angular sampling rates; additionally, the continuity of the Point Spread Function (PSF) is intrinsically maintained in the FSS. Based on these two observations, we propose a learning-based FSS reconstruction approach for one-time aliasing removal over the whole focal stack. Moreover, a novel conjugate-symmetric loss function is proposed for the optimization. Compared to previous works, our method avoids explicit depth estimation and can handle challenging large-disparity scenarios. Experimental results on both synthetic and real light field datasets show the superiority of the proposed approach for different scenes and various angular sampling rates.

preprint2021arXiv

Efficient Identification of Bus Split Events Using Synchrophasor Data

Accurate grid topology information is of paramount importance for routine power system operations, while the growing availability of synchrophasor data offers the opportunity to identify topology changes in real time. Identification of bus split events, where the substation becomes electrically disconnected, is becoming increasingly important for maintaining the security of power systems. This paper aims to provide an efficient modeling and monitoring framework for bus split events by using a concise bus-branch representation. Linear sensitivity analysis is first performed to quickly evaluate the grid-wide impact of such events. Furthermore, the synchrophasor-data-enabled identification problem is formulated by matching the changes in bus phase angles (and possibly line flows). To address the resulting bilinear terms involving the binary connectivity variables, the McCormick relaxation technique is leveraged to attain an equivalent mixed-integer linear program reformulation that can be solved efficiently. Numerical studies on the IEEE 14-bus and 300-bus systems demonstrate the validity and efficiency of the proposed identification algorithm towards real-time implementation.
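The McCormick step yields an equivalent reformulation, rather than only a relaxation, because one factor of each product is binary. For $w = z\,x$ with $z \in \{0,1\}$ and $x \in [\underline{x}, \overline{x}]$, the four linear McCormick inequalities pin $w$ to exactly $z x$. The minimal Python check below uses `x` for a generic bounded continuous variable; the specific angle or flow variables and bounds in the paper are not reproduced, and the numbers are illustrative.

```python
def mccormick_interval(x, z, x_lo, x_hi):
    """Feasible range of w under the McCormick envelope of w = z * x,
    with x in [x_lo, x_hi] and z in [0, 1]."""
    lo = max(x_lo * z, x - x_hi * (1 - z))   # w >= x_lo*z,  w >= x - x_hi*(1 - z)
    hi = min(x_hi * z, x - x_lo * (1 - z))   # w <= x_hi*z,  w <= x - x_lo*(1 - z)
    return lo, hi

x_lo, x_hi = -1.0, 2.0
# Binary z: the envelope collapses to the single point w = z * x.
exact = [mccormick_interval(x, z, x_lo, x_hi) for z in (0, 1) for x in (-1.0, -0.3, 0.5, 2.0)]
# Fractional z (as in an LP relaxation): only an interval, here (-0.5, 1.0).
relaxed = mccormick_interval(0.5, 0.5, x_lo, x_hi)
```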

preprint2020arXiv

Additively Homomorphical Encryption based Deep Neural Network for Asymmetrically Collaborative Machine Learning

The financial sector presents many opportunities to apply various machine learning techniques. Centralized machine learning creates a constraint that limits further applications in the finance sector. Data privacy is a fundamental challenge for a variety of finance and insurance applications that rely on learning a model across different sections. In this paper, we define a new practical scheme of collaborative machine learning in which one party owns the data but another party owns only the labels, and term this Asymmetrically Collaborative Machine Learning. For this scheme, we propose a novel privacy-preserving architecture where two parties can collaboratively train a deep learning model efficiently while preserving the privacy of each party's data. More specifically, we decompose the forward propagation and backpropagation of the neural network into four different steps and propose a novel protocol to handle information leakage in these steps. Our extensive experiments on different datasets demonstrate not only stable training without accuracy loss, but also a speedup of more than 100 times compared with the state-of-the-art system.

preprint2020arXiv

Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

Talking face generation aims to synthesize a face video with precise lip synchronization and a smooth transition of facial motion over the entire video from a given speech clip and facial image. Most existing methods focus either on disentangling the information in a single image or on learning temporal information between frames. However, cross-modality coherence between audio and video information has not been well addressed during synthesis. In this paper, we propose a novel arbitrary talking face generation framework that discovers audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA) block that selectively focuses on the lip area of the input image during training to further enhance lip synchronization. Experimental results on the benchmark LRW and GRID datasets surpass state-of-the-art methods on prevalent metrics, with robust high-resolution synthesis across gender and pose variations.

preprint2020arXiv

Deep Audio-Visual Learning: A Survey

Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities either to improve the performance of previously considered single-modality tasks or to address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods as well as the remaining challenges of each subfield are further discussed. Finally, we summarize the commonly used datasets and performance metrics.

preprint2020arXiv

Dynamics of continuous maps induced on the space of probability measures

For a continuous self-map $f$ on a compact interval $I$ and the induced map $\hat f$ on the space $\mathcal{M}(I)$ of probability measures, we obtain a sharp condition to guarantee that $(I,f)$ is transitive if and only if $(\mathcal{M}(I),\hat f)$ is transitive. We also show that the sensitivity of $(I,f)$ is equivalent to that of $(\mathcal{M}(I),\hat f)$. We prove that $(\mathcal{M}(I),\hat f)$ must have infinite topological entropy for any transitive system $(I,f)$, while there exists a transitive non-autonomous system $(I,f_{0,\infty})$ such that $(\mathcal{M}(I),\hat f_{0,\infty})$ has zero topological entropy, where $f_{0,\infty}=\{f_n\}_{n=0}^\infty$ is a sequence of continuous self-maps on $I$. For a continuous self-map $f$ on a general compact metric space $X$, we show that chain transitivity of $(X, f)$ implies chain mixing of $(\mathcal{M}(X),\hat f)$, and we provide two counterexamples to demonstrate that the converse is not true. We confirm that shadowing of $(X,f)$ is not inherited by $(\mathcal{M}(X),\hat f)$ in general. For a non-autonomous system $(X,f_{0,\infty})$, we prove that if $(\mathcal{M}(X),\hat{f}_{0,\infty})$ is weak mixing of order $n$, then so is $(X,f_{0,\infty})$ for any $n\geq2$; while there exists $(X,f_{0,\infty})$ such that it is weak mixing of order $2$ but $(\mathcal{M} (X),\hat{f}_{0,\infty})$ is not. We then prove that Li-Yorke chaos (resp., distributional chaos) of $(X,f_{0,\infty})$ carries over to $(\mathcal{M}(X),\hat f_{0,\infty})$, and give an example to show that $(X,f)$ and $(\mathcal{M}(X),\hat f)$ may have no Li-Yorke pair simultaneously. We also prove that if $f_n$ is surjective for all $n\geq 0$, then chain mixing of $(\mathcal{M}(X),\hat f_{0,\infty})$ always holds true, and shadowing of $(\mathcal{M}(X),\hat f_{0,\infty})$ implies mixing of $(X, f_{0,\infty})$.
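For readers outside topological dynamics, the induced map is the pushforward of measures. A short statement of the standard definition, assuming $\mathcal{M}(X)$ carries the usual weak* topology:

```latex
\hat f(\mu)(A) = \mu\bigl(f^{-1}(A)\bigr), \qquad \mu \in \mathcal{M}(X),\ A \subseteq X \text{ Borel},
\qquad \text{so that} \qquad \hat f(\delta_x) = \delta_{f(x)} .
```

Since $x \mapsto \delta_x$ embeds $X$ into $\mathcal{M}(X)$ and intertwines $f$ with $\hat f$, the system $(X,f)$ sits inside $(\mathcal{M}(X),\hat f)$ as a subsystem; for a non-autonomous system the induced sequence is $\hat f_{0,\infty} = \{\hat f_n\}_{n=0}^\infty$.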

preprint2020arXiv

FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction

In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and propose a novel algorithm that is able to predict elaborate riggable 3D face models from a single image input. The FaceScape dataset provides 18,760 textured 3D faces, captured from 938 subjects, each with 20 specific expressions. The 3D models contain pore-level facial geometry that is also processed to be topologically uniform. These fine 3D facial models can be represented as a 3D morphable model for rough shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn the expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our 3D face prediction system from a single image input. Unlike previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. The unprecedented dataset and code will be released to the public for research purposes.

preprint2020arXiv

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

Policy gradient (PG) methods are a widely used reinforcement learning methodology in many applications such as video games, autonomous driving, and robotics. In spite of their empirical success, a rigorous understanding of the global convergence of PG methods is lacking in the literature. In this work, we close the gap by viewing PG methods from a nonconvex optimization perspective. In particular, we propose a new variant of PG methods for infinite-horizon problems that uses a random rollout horizon for the Monte-Carlo estimation of the policy gradient. This method yields an unbiased estimate of the policy gradient with bounded variance, which enables tools from nonconvex optimization to be applied to establish global convergence. Employing this perspective, we first recover the convergence results with rates to stationary-point policies in the literature. More interestingly, motivated by advances in nonconvex optimization, we modify the proposed PG method by introducing periodically enlarged stepsizes. The modified algorithm is shown to escape saddle points under mild assumptions on the reward and the policy parameterization. Under a further strict-saddle assumption, this result establishes convergence to essentially locally optimal policies of the underlying problem, and thus bridges the gap in the existing literature on the convergence of PG methods. Results from experiments on the inverted pendulum are then provided to corroborate our theory: by slightly reshaping the reward function to satisfy our assumption, unfavorable saddle points can be avoided and better limit points can be attained. Intriguingly, this empirical finding justifies the benefit of reward reshaping from a nonconvex optimization perspective.
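The random rollout horizon can be illustrated with the standard geometric-horizon trick: if the rollout length $T$ satisfies $P(T \ge t) = \gamma^t$, then the undiscounted sum $\sum_{t=0}^{T} r_t$ has expectation $\sum_t P(T \ge t)\, r_t = \sum_t \gamma^t r_t$, the discounted return. The Python sketch below checks this numerically for a deterministic reward sequence. The reward function is a hypothetical stand-in, and the paper's full gradient estimator is not reproduced.

```python
import numpy as np

def random_horizon_return(reward, gamma, rng):
    """Undiscounted sum r_0 + ... + r_T with T drawn so that P(T >= t) = gamma**t."""
    T = rng.geometric(1.0 - gamma) - 1          # numpy's geometric is supported on {1, 2, ...}
    return sum(reward(t) for t in range(T + 1))

gamma = 0.9
reward = lambda t: 0.5 ** t                     # hypothetical deterministic reward sequence
exact = sum(gamma ** t * reward(t) for t in range(2000))   # discounted return (truncated)
rng = np.random.default_rng(0)
estimate = np.mean([random_horizon_return(reward, gamma, rng) for _ in range(50_000)])
```

Here `exact` equals $1/(1 - 0.45) \approx 1.818$, and the Monte-Carlo mean of the random-horizon returns matches it up to sampling error.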

preprint2020arXiv

SAPIEN: A SimulAted Part-based Interactive ENvironment

Building home assistant robots has long been a pursuit for vision and robotics researchers. To achieve this task, a simulated environment with physically realistic simulation, sufficient articulated objects, and transferability to the real robot is indispensable. Existing environments achieve these requirements for robotics simulation with different levels of simplification and focus. We take one step further in constructing an environment that supports household tasks for training robot learning algorithms. Our work, SAPIEN, is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects. SAPIEN enables various robotic vision and interaction tasks that require detailed part-level understanding. We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition, and demonstrate robotic interaction tasks using heuristic approaches and reinforcement learning algorithms. We hope that SAPIEN can open many research directions yet to be explored, including learning cognition through interaction, part motion discovery, and construction of robotics-ready simulated game environments.

preprint2020arXiv

Self-Supervised Human Depth Estimation from Monocular Videos

Previous methods for estimating detailed human depth often require supervised training with 'ground truth' depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. The self-supervised learning is achieved by minimizing a photo-consistency loss, which is evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve for this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning of the shape details. Experiments demonstrate that our method generalizes better and performs much better on in-the-wild data.

preprint2020arXiv

The Return of Lexical Dependencies: Neural Lexicalized PCFGs

In this paper we demonstrate that context-free grammar (CFG) based methods for grammar induction benefit from modeling lexical dependencies. This contrasts with the most popular current methods for grammar induction, which focus on discovering either constituents or dependencies. Previous approaches to marrying these two disparate syntactic formalisms (e.g., lexicalized PCFGs) have been plagued by sparsity, making them unsuitable for unsupervised grammar induction. However, in this work, we present novel neural models of lexicalized PCFGs that allow us to overcome sparsity problems and effectively induce both constituents and dependencies within a single model. Experiments demonstrate that this unified framework yields stronger results on both representations than modeling either formalism alone. Code is available at https://github.com/neulab/neural-lpcfg.

preprint2020arXiv

Transmission Switching Under Wind Uncertainty Using Linear Decision Rules

Increasing penetration of wind and other renewable generation poses significant challenges to power system operations and reliability. This paper considers the real-time optimal transmission switching (OTS) problem for determining the generation dispatch and network topology while accounting for uncertain energy resources. To efficiently solve the resulting two-stage stochastic program, we propose a tractable linear decision rule (LDR) based approximate solution that eliminates the uncertainty variables and leads to a fixed number of constraints. The proposed LDR approach can guarantee feasibility and significantly reduces computational complexity compared with existing approaches, whose complexity grows with the number of randomly generated uncertainty samples. Numerical studies on IEEE test cases demonstrate the high approximation accuracy of the proposed LDR solution and its computational efficiency for real-time OTS implementation.
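A linear decision rule makes the second-stage recourse an affine function of the uncertainty. In the common participation-factor form assumed below (an illustration, not necessarily the paper's exact rule), each generator responds to the wind deviation $\Delta w$ as $p_g = p_g^0 - \alpha_g \Delta w$. Power balance then holds for every realization as soon as $\sum_g p_g^0 + \hat w = D$ and $\sum_g \alpha_g = 1$, which replaces per-sample constraints with a fixed set. The dispatch values and factors are hypothetical, and switching variables and line limits are omitted.

```python
import numpy as np

def ldr_dispatch(p0, alpha, delta_w):
    """Affine recourse p_g = p0_g - alpha_g * delta_w for every generator and wind deviation."""
    delta_w = np.atleast_1d(delta_w)
    return p0[None, :] - np.outer(delta_w, alpha)   # shape (num_scenarios, num_generators)

demand, w_forecast = 10.0, 3.0
p0 = np.array([4.0, 2.0, 1.0])       # hypothetical base dispatch: sum(p0) + w_forecast == demand
alpha = np.array([0.5, 0.3, 0.2])    # hypothetical participation factors summing to 1

rng = np.random.default_rng(0)
dws = rng.normal(0.0, 1.0, size=1000)                 # sampled wind deviations
dispatch = ldr_dispatch(p0, alpha, dws)
mismatch = dispatch.sum(axis=1) + (w_forecast + dws) - demand
max_mismatch = float(np.abs(mismatch).max())          # balance holds for every sample
```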

preprint2019arXiv

Barotropic instability of shear flows

We consider barotropic instability of shear flows for incompressible fluids with Coriolis effects. For a class of shear flows, we develop a new method to find the sharp stability conditions. We study the flow with the sinus profile in detail and obtain the sharp stability boundary in the whole parameter space, which corrects previous results in the fluid literature. Our new results are confirmed by more accurate numerical computation. The addition of the Coriolis force is found to bring fundamental changes to the stability of shear flows. Moreover, we study dynamical behaviors near the shear flows, including the bifurcation of nontrivial traveling wave solutions and linear inviscid damping. The first ingredient of our proof is a careful classification of the neutral modes. The second is to write the linearized fluid equation in Hamiltonian form and then use an instability index theory for general Hamiltonian PDEs. The last is to study the singular and non-resonant neutral modes using Sturm-Liouville theory and hypergeometric functions.

preprint2019arXiv

Dynamic Distribution State Estimation Using Synchrophasor Data

The increasing deployment of distribution-level phasor measurement units (PMUs) calls for dynamic distribution state estimation (DDSE) approaches that tap into high-rate measurements to maintain a comprehensive view of the distribution-system state in real time. Accordingly, this paper explores the development of a fast algorithmic framework by casting the DDSE task within the time-varying optimization realm. The formulation involves a time-varying robustified least-squares approach, and it naturally models optimal trajectories for the estimated states as measurements stream in. The formulation is based on a linear surrogate of the AC power-flow equations, and it includes an element of robustness with respect to measurement outliers. The paper then leverages a first-order prediction-correction method to achieve simple online updates that can provably track the state variables from heterogeneous measurements. This online algorithm is computationally efficient, as it relies on the Hessian of the cost function without computing a matrix inverse. Convergence and bounds on the estimation errors of the proposed algorithm can be established analytically.
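The prediction-correction idea can be sketched for a plain time-varying least-squares cost $f(x;t) = \tfrac12\lVert Hx - z(t)\rVert^2$, leaving out the robustification and the power-flow model. The prediction step approximately solves the linearized optimality condition with a few gradient steps (Hessian-vector products only, no matrix inverse), using a backward difference of the measurements for the time derivative; the correction step runs gradient descent once the new measurements arrive. The matrix `H`, the step counts `P` and `C`, and the step size are assumptions for illustration.

```python
import numpy as np

def track(H, z_fn, h, steps, P, C, x0):
    """Prediction-correction tracking of x*(t) = argmin_x 0.5*||H x - z(t)||^2."""
    Hxx = H.T @ H                                   # Hessian (constant for least squares)
    alpha = 1.0 / np.linalg.eigvalsh(Hxx).max()     # step size 1/L
    x = x0.astype(float).copy()
    z_prev = z_fn(0.0)
    for k in range(steps):
        t = k * h
        z = z_fn(t)
        g = H.T @ (H @ x - z)                       # gradient at (x_k, t_k)
        g_t = -H.T @ (z - z_prev) / h               # backward-difference time derivative of the gradient
        # Prediction: P gradient steps on the model g + Hxx (y - x) + h * g_t = 0.
        y = x.copy()
        for _ in range(P):
            y = y - alpha * (g + Hxx @ (y - x) + h * g_t)
        # Correction: C gradient steps on f(.; t_{k+1}) once new measurements arrive.
        z_next = z_fn(t + h)
        for _ in range(C):
            y = y - alpha * (H.T @ (H @ y - z_next))
        x, z_prev = y, z
    return x

rng = np.random.default_rng(1)
H = rng.standard_normal((30, 4))                    # hypothetical linear measurement model
x_a, x_b = rng.standard_normal(4), rng.standard_normal(4)
x_star = lambda t: x_a + t * x_b                    # true state drifting linearly in time
z_fn = lambda t: H @ x_star(t)                      # noiseless measurements
h, steps = 0.1, 300
x_pc = track(H, z_fn, h, steps, P=5, C=5, x0=np.zeros(4))
x_c = track(H, z_fn, h, steps, P=0, C=5, x0=np.zeros(4))   # correction-only baseline
err_pc = np.linalg.norm(x_pc - x_star(steps * h))
err_c = np.linalg.norm(x_c - x_star(steps * h))
```

In this toy run the state drifts linearly, so the backward difference is exact and the prediction step lowers the steady tracking error compared with correction-only updates using the same number of correction steps.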