Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
32 works
0 followers
23 topics
4 close collaborators

Published work

32 published item(s)

preprint 2026 arXiv

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poorly understood. In this work, we argue that OPD's efficiency stems from a form of "foresight": it establishes a stable update trajectory toward the final model early in training. This foresight manifests in two aspects. First, at the Module-Allocation Level, OPD identifies regions with low marginal utility and concentrates updates on modules that are more critical to reasoning. Second, at the Update-Direction Level, OPD exhibits stronger low-rank concentration, with its dominant subspaces aligning closely with the final update subspace early in training. Building on these findings, we propose EffOPD, a plug-and-play acceleration method that speeds up OPD by adaptively selecting an extrapolation step size and moving along the current update direction. EffOPD requires no additional trainable modules or complex hyperparameter tuning, and achieves an average training acceleration of $3\times$ while maintaining comparable final performance. Overall, our findings provide a parameter-dynamics perspective for understanding the efficiency of OPD and offer practical insights for designing more efficient post-training methods for large language models.
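The abstract describes EffOPD only at a high level, so the following is a minimal sketch of the general idea of extrapolating along the current update direction with an adaptively chosen step size, not the authors' algorithm. The function names, the candidate step sizes, and the rule of picking the candidate with the lowest loss are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def extrapolate(theta_prev: Sequence[float], theta_curr: Sequence[float],
                alpha: float) -> Vector:
    """Step from theta_curr along the latest update direction (theta_curr - theta_prev)."""
    return [c + alpha * (c - p) for p, c in zip(theta_prev, theta_curr)]


def pick_step(theta_prev: Sequence[float], theta_curr: Sequence[float],
              loss: Callable[[Vector], float],
              candidates: Sequence[float] = (0.0, 1.0, 2.0, 4.0)) -> Tuple[float, Vector]:
    """Hypothetical selection rule: try each candidate step size and keep the
    extrapolated parameters with the lowest loss. alpha = 0 keeps theta_curr."""
    best_alpha = 0.0
    best_theta = list(theta_curr)
    best_loss = loss(best_theta)
    for alpha in candidates:
        trial = extrapolate(theta_prev, theta_curr, alpha)
        trial_loss = loss(trial)
        if trial_loss < best_loss:
            best_alpha, best_theta, best_loss = alpha, trial, trial_loss
    return best_alpha, best_theta


# Toy usage: a quadratic loss stands in for a held-out evaluation signal.
toy_loss = lambda th: sum((t - 1.0) ** 2 for t in th)
alpha, theta = pick_step([0.0, 0.0], [0.5, 0.5], toy_loss)
```

In this toy setting the extrapolated point replaces several ordinary updates along the same direction, which is the intuition behind the reported speed-up; the real method's step-size criterion may differ.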

preprint 2024 arXiv

Degenerate bifurcations of two-fold doubly-connected uniformly rotating vortex patches

In this paper, we obtain families of two-fold doubly-connected uniformly rotating vortex patches of the 2-D incompressible Euler equations emanating from some specific annuli. The main difficulty comes from the strong degeneracy of the problem: the kernel of the linearization is not one-dimensional, and the transversality condition does not hold. To overcome this, we carry out a detailed analysis of the nonlinear functional, and the bifurcation curves are obtained by perturbing real algebraic varieties defined by truncated polynomials. In addition, our result partially answers a problem proposed by Hmidi and Mateu (Adv. Math. 302 (2016), 799-850).

preprint 2022 arXiv

Laboratory development of a heterodyne interferometric system for translation and tilt measurement of the proof mass in the space gravitational wave detection

Laser heterodyne interferometry plays a key role in monitoring and controlling the proof mass in space gravitational wave detection by measuring its motion in multiple degrees of freedom. This paper presents the laboratory development of a polarization-multiplexing heterodyne interferometer (PMHI) using quadrant photodetectors (QPD), intended for measuring the translation and tilt of a proof mass. The system has a symmetric design and can be expanded to five-degree-of-freedom measurements based on polarization multiplexing and differential wavefront sensing (DWS). The ground-simulated experimental results demonstrate that measurement noise levels of 3 pm/Hz$^{1/2}$ and 2 nrad/Hz$^{1/2}$ at 1 Hz have been achieved for translation and tilt, respectively. For the current system, the tilt-to-length error is dominated by geometric misalignment, and the resulting coupling is at the micrometer level within a tilt range of 1000 μrad.

preprint 2022 arXiv

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Recent developments in speech processing, such as speech recognition and speaker diarization, have inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Specifically, two typical tasks, speaker diarization and multi-speaker automatic speech recognition, have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle to the advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session is composed of 2-4 speakers with different speaker overlap ratios, recorded in rooms of different sizes. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction to the AliMeeting dataset, challenge rules, evaluation methods and baseline systems.

preprint 2022 arXiv

Noncontact measurement method of linear and angular displacement based on dual-beam feedback interferometric system

This study describes a unique optical approach for the noncontact measurement of linear and angular displacement. Compared to previous methods, the sensor system presented here, based on dual-beam phase-modulated feedback interferometry, provides higher sensitivity for non-cooperative targets and a wider angle measurement range. The linear and angular displacements are calculated by tracing the phase changes of the differential beams. The performance of the proposed method is evaluated by testing a prototype system. According to the experimental data, the prototype has a stability of 35 nm and 0.15" over 1 hour, with a resolution of 1 nm and 0.02", respectively. The linearity is $5.58\times10^{-6}$ in the range of 100 mm and $1.34\times10^{-4}$ in the range of 360°, indicating that the proposed method may possess considerable potential for high-precision metrological applications.

preprint 2022 arXiv

Polarization effects on fluorescence emission of zebrafish neurons using light-sheet microscopy

Light-sheet fluorescence microscopy (LSFM) makes use of a thin plane of light to optically section and image transparent tissues or organisms in vivo, which has the advantages of fast imaging speed and low phototoxicity. In this paper, we have employed light-sheet microscopy to investigate the polarization effects on fluorescence emission of zebrafish neurons by modifying the electric oscillation orientation of the excitation light. The intensity of the fluorescence emission from the excited zebrafish larvae follows a cosine-squared function with respect to the polarization state of the excitation light and reveals a 40% higher fluorescence emission when the polarization orientation is orthogonal to the illumination and detection axes. Through registration and subtraction of fluorescence images under different polarization states, we have demonstrated that most of the enhanced fluorescence signals are from the nerve cells rather than the extracellular substance. This provides a way to distinguish the cell boundaries and observe the organism structures with improved contrast and resolution.

preprint 2022 arXiv

Robust Learning-based Predictive Control for Discrete-time Nonlinear Systems with Unknown Dynamics and State Constraints

Robust model predictive control (MPC) is a well-known control technique for model-based control with constraints and uncertainties. In classic robust tube-based MPC approaches, an open-loop control sequence is computed by periodically solving an online nominal MPC problem, which requires prior model information and frequent access to onboard computational resources. In this paper, we propose an efficient robust MPC solution based on receding horizon reinforcement learning, called r-LPC, for unknown nonlinear systems with state constraints and disturbances. The proposed r-LPC utilizes a Koopman operator-based prediction model obtained off-line from pre-collected input-output datasets. Unlike classic tube-based MPC, in each prediction time interval of r-LPC, we use an actor-critic structure to learn a near-optimal feedback control policy rather than a control sequence. The resulting closed-loop control policy can be learned off-line and deployed online, or learned online in an asynchronous way. In the latter case, online learning can be activated whenever necessary, for instance, when the safety constraint is violated under the deployed policy. The closed-loop recursive feasibility, robustness, and asymptotic stability are proven under function approximation errors of the actor-critic networks. Simulation and experimental results on two nonlinear systems with unknown dynamics and disturbances demonstrate that our approach has better or comparable performance when compared with tube-based MPC and LQR, and outperforms a recently developed actor-critic learning approach.

preprint 2022 arXiv

Robust Tube-based Model Predictive Control with Koopman Operators--Extended Version

Koopman operators are of infinite dimension and capture the characteristics of nonlinear dynamics in a lifted global linear manner. The finite data-driven approximation of Koopman operators results in a class of linear predictors, useful for formulating linear model predictive control (MPC) of nonlinear dynamical systems with reduced computational complexity. However, the robustness of the closed-loop Koopman MPC under modeling approximation errors and possible exogenous disturbances is still a crucial issue to be resolved. To address this problem, this paper presents a robust tube-based MPC solution with Koopman operators, i.e., r-KMPC, for nonlinear discrete-time dynamical systems with additive disturbances. The proposed controller is composed of a nominal MPC using a lifted Koopman model and an off-line nonlinear feedback policy. The proposed approach does not assume the convergence of the approximated Koopman operator, which allows using a Koopman model with a limited order for controller design. Fundamental properties of the Koopman model, e.g., stabilizability and observability, are derived under standard assumptions, with which the closed-loop robustness and nominal point-wise convergence are proven. Simulated examples are presented to verify the effectiveness of the proposed approach.

preprint 2022 arXiv

Singular Limits for the Navier-Stokes-Poisson Equations of Viscous Plasma with Strong Density Boundary Layer

The quasi-neutral limit of the Navier-Stokes-Poisson system modeling a viscous plasma with vanishing viscosity coefficients in the half-space $\mathbb{R}^{3}_{+}$ is rigorously proved under a Navier-slip boundary condition for velocity and the Dirichlet boundary condition for electric potential. This is achieved by establishing the nonlinear stability of the approximation solutions involving the strong boundary layer in density and electric potential, which comes from the break-down of the quasi-neutrality near the boundary, and dealing with the difficulty of the interaction of this strong boundary layer with the weak boundary layer of the velocity field.

preprint 2022 arXiv

Spurious currents suppression by accurate difference schemes in multiphase lattice Boltzmann method

Spurious currents, which are often observed near a curved interface in multiphase simulations by diffuse interface methods, are unphysical phenomena and usually damage the computational accuracy and stability. In this paper, the origin and suppression of spurious currents are investigated by using the multiphase lattice Boltzmann method driven by chemical potential. Both the difference error and insufficient isotropy of the discrete gradient operator give rise to directional deviations of the nonideal force, which in turn produce the spurious currents. Nevertheless, the high-order finite difference produces far more accurate results than the high-order isotropic difference. We compare several finite difference schemes which have different formal accuracy and resolution. When a large proportional coefficient is used, the transition region is narrow and steep, and the resolution of the finite difference indicates the computational accuracy more exactly than the formal accuracy. In contrast, for a small proportional coefficient, the transition region is wide and gentle, and the formal accuracy of the finite difference indicates the computational accuracy better than the resolution. Furthermore, numerical simulations show that the spurious currents calculated in the 3D situation are highly consistent with those in 2D simulations; in particular, the two-phase coexistence densities calculated by the high-order accuracy finite difference are in excellent agreement with the theoretical predictions of the Maxwell equal-area construction down to a reduced temperature of 0.2.
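To make the notion of formal accuracy concrete, here is a minimal sketch that compares the standard second-order and fourth-order central difference stencils on a tanh-shaped interface profile. It does not reproduce the paper's lattice Boltzmann schemes; the profile, the interface width parameter, and the function names are illustrative assumptions.

```python
import math
from typing import Callable


def central_diff_2nd(f: Callable[[float], float], x: float, h: float) -> float:
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def central_diff_4th(f: Callable[[float], float], x: float, h: float) -> float:
    """Fourth-order central difference approximation of f'(x)."""
    return (-f(x + 2.0 * h) + 8.0 * f(x + h)
            - 8.0 * f(x - h) + f(x - 2.0 * h)) / (12.0 * h)


def tanh_profile(width: float) -> Callable[[float], float]:
    """Hypothetical diffuse-interface profile; smaller width means a steeper transition."""
    return lambda x: math.tanh(x / width)


def exact_slope(x: float, width: float) -> float:
    """Analytic derivative of tanh(x / width)."""
    return (1.0 - math.tanh(x / width) ** 2) / width


def errors(width: float, x: float = 1.0, h: float = 1.0):
    """Absolute errors of both stencils on a grid with spacing h (h = 1 mimics a lattice)."""
    f = tanh_profile(width)
    exact = exact_slope(x, width)
    return (abs(central_diff_2nd(f, x, h) - exact),
            abs(central_diff_4th(f, x, h) - exact))


wide_err2, wide_err4 = errors(width=5.0)      # wide, gentle transition
narrow_err2, narrow_err4 = errors(width=0.5)  # narrow, steep transition
```

For the wide profile the higher-order stencil is clearly more accurate, as formal accuracy predicts. Printing the narrow-profile errors is a quick way to see that formal order alone is a weaker guide once the transition spans only a few grid points.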

preprint 2022 arXiv

Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge (M2MeT) focuses on one of the most valuable and most challenging scenarios of speech technologies. The M2MeT challenge comprises two tracks, speaker diarization (track 1) and multi-speaker automatic speech recognition (ASR) (track 2). Along with the challenge, we released 120 hours of real-recorded Mandarin meeting speech data with manual annotation, including far-field data collected by an 8-channel microphone array as well as near-field data collected by each participant's headset microphone. We briefly describe the released dataset, track setups, and baselines, and summarize the challenge results and major techniques used in the submissions.

preprint 2022 arXiv

Towards Generalizable Person Re-identification with a Bi-stream Generative Model

Generalizable person re-identification (re-ID) has attracted growing attention due to its powerful adaptation capability in the unseen data domain. However, existing solutions often neglect either crossing cameras (e.g., illumination and resolution differences) or pedestrian misalignments (e.g., viewpoint and pose discrepancies), which easily leads to poor generalization capability when adapted to the new domain. In this paper, we formulate these difficulties as: 1) Camera-Camera (CC) problem, which denotes the various human appearance changes caused by different cameras; 2) Camera-Person (CP) problem, which indicates the pedestrian misalignments caused by the same identity person under different camera viewpoints or changing pose. To solve the above issues, we propose a Bi-stream Generative Model (BGM) to learn the fine-grained representations fused with camera-invariant global feature and pedestrian-aligned local feature, which contains an encoding network and two stream decoding sub-networks. Guided by original pedestrian images, one stream is employed to learn a camera-invariant global feature for the CC problem via filtering cross-camera interference factors. For the CP problem, another stream learns a pedestrian-aligned local feature for pedestrian alignment using information-complete densely semantically aligned part maps. Moreover, a part-weighted loss function is presented to reduce the influence of missing parts on pedestrian alignment. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods on the large-scale generalizable re-ID benchmarks, involving domain generalization setting and cross-domain setting.

preprint 2022 arXiv

Weakly-supervised 3D Human Pose Estimation with Cross-view U-shaped Graph Convolutional Network

Although monocular 3D human pose estimation methods have made significant progress, the problem is far from solved due to the inherent depth ambiguity. Instead, exploiting multi-view information is a practical way to achieve absolute 3D human pose estimation. In this paper, we propose a simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation. By only using two camera views, our method can achieve state-of-the-art performance in a weakly-supervised manner, requiring no 3D ground truth but only 2D annotations. Specifically, our method contains two steps: triangulation and refinement. First, given the 2D keypoints that can be obtained through any classic 2D detection method, triangulation is performed across two views to lift the 2D keypoints into coarse 3D poses. Then, a novel cross-view U-shaped graph convolutional network (CV-UGCN), which can explore the spatial configurations and cross-view correlations, is designed to refine the coarse 3D poses. In particular, the refinement process is achieved through weakly-supervised learning, in which geometric and structure-aware consistency checks are performed. We evaluate our method on the standard benchmark dataset, Human3.6M. The Mean Per Joint Position Error on the benchmark dataset is 27.4 mm, which outperforms existing state-of-the-art methods remarkably (27.4 mm vs 30.2 mm).
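The triangulation step can be illustrated with a minimal two-view sketch. It uses the classic midpoint method (closest point between two viewing rays) rather than whatever triangulation the authors implemented, and it assumes each 2D keypoint has already been back-projected into a world-space ray using known camera calibration. All names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Return the midpoint of the shortest segment between the rays
    c1 + s * d1 and c2 + t * d2 (camera centers c, ray directions d)."""
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = _along(c1, d1, s), _along(c2, d2, t)
    return ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0, (p1[2] + p2[2]) / 2.0)


# Toy usage: two cameras 2 units apart observe the same joint at (1, 2, 5).
joint = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 2.0, 5.0),
                             (2.0, 0.0, 0.0), (-1.0, 2.0, 5.0))
```

With noisy 2D detections the two rays no longer intersect, and the midpoint gives only a coarse estimate, which is why a refinement stage such as the CV-UGCN described above follows.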

preprint 2022 arXiv

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total. We collect the data from YouTube and Podcast, which covers a variety of speaking styles, scenarios, domains, topics, and noisy conditions. An optical character recognition (OCR) based method is introduced to generate the audio/text segmentation candidates for the YouTube data on its corresponding video captions, while a high-quality ASR transcription system is used to generate audio/text pair candidates for the Podcast data. Then we propose a novel end-to-end label error detection approach to further validate and filter the candidates. We also provide three manually labelled high-quality test sets along with WenetSpeech for evaluation -- Dev, for cross-validation purposes in training; Test_Net, collected from the Internet for matched tests; and Test_Meeting, recorded from real meetings for more challenging mismatched tests. Baseline systems trained with WenetSpeech are provided for three popular speech recognition toolkits, namely Kaldi, ESPnet, and WeNet, and recognition results on the three test sets are also provided as benchmarks. To the best of our knowledge, WenetSpeech is currently the largest open-sourced Mandarin speech corpus with transcriptions, which benefits research on production-level speech recognition.

preprint 2020 arXiv

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

Constructing confidence intervals for the coefficients of high-dimensional sparse linear models remains a challenge, mainly because of the complicated limiting distributions of the widely used estimators, such as the lasso. Several methods have been developed for constructing such intervals. Bootstrap lasso+ols is notable for its technical simplicity, good interpretability, and performance that is comparable with that of other more complicated methods. However, bootstrap lasso+ols depends on the beta-min assumption, a theoretic criterion that is often violated in practice. Thus, we introduce a new method, called bootstrap lasso+partial ridge, to relax this assumption. Lasso+partial ridge is a two-stage estimator. First, the lasso is used to select features. Then, the partial ridge is used to refit the coefficients. Simulation results show that bootstrap lasso+partial ridge outperforms bootstrap lasso+ols when there exist small, but nonzero coefficients, a common situation that violates the beta-min assumption. For such coefficients, the confidence intervals constructed using bootstrap lasso+partial ridge have, on average, $50\%$ larger coverage probabilities than those of bootstrap lasso+ols. Bootstrap lasso+partial ridge also has, on average, $35\%$ shorter confidence interval lengths than those of the de-sparsified lasso methods, regardless of whether the linear models are misspecified. Additionally, we provide theoretical guarantees for bootstrap lasso+partial ridge under appropriate conditions, and implement it in the R package "HDCI."

preprint 2020 arXiv

Can speed up the convergence rate of stochastic gradient methods to $\mathcal{O}(1/k^2)$ by a gradient averaging strategy?

In this paper we consider the question of whether it is possible to apply a gradient averaging strategy to improve on the sublinear convergence rates without any increase in storage. Our analysis reveals that a positive answer requires an appropriate averaging strategy and iterations that satisfy the variance dominant condition. As an interesting fact, we show that if the iterative variance we defined is always dominant even a little bit in the stochastic gradient iterations, the proposed gradient averaging strategy can increase the convergence rate $\mathcal{O}(1/k)$ to $\mathcal{O}(1/k^2)$ in probability for the strongly convex objectives with Lipschitz gradients. This conclusion suggests how we should control the stochastic gradient iterations to improve the rate of convergence.

preprint 2020 arXiv

Coordinated Path Following Control of Fixed-wing Unmanned Aerial Vehicles

In this paper, we investigate the problem of coordinated path following for fixed-wing UAVs with speed constraints in 2D plane. The objective is to steer a fleet of UAVs along the path(s) while achieving the desired sequenced inter-UAV arc distance. In contrast to the previous coordinated path following studies, we are able through our proposed hybrid control law to deal with the forward speed and the angular speed constraints of fixed-wing UAVs. More specifically, the hybrid control law makes all the UAVs work at two different levels: those UAVs whose path following errors are within an invariant set (i.e., the designed coordination set) work at the coordination level; and the other UAVs work at the single-agent level. At the coordination level, we prove that even with speed constraints, the proposed control law can make sure the path following errors reduce to zero, while the desired arc distances converge to the desired value. At the single-agent level, the convergence analysis for the path following error entering the coordination set is provided. We develop a hardware-in-the-loop simulation testbed of the multi-UAV system by using actual autopilots and the X-Plane simulator. The effectiveness of the proposed approach is corroborated with both MATLAB and the testbed.

preprint 2020 arXiv

Derivative-free global minimization for a class of multiple minima problems

We prove that the finite-difference based derivative-free descent (FD-DFD) methods have a capability to find the global minima for a class of multiple minima problems. Our main result shows that, for a class of multiple minima objectives that is extended from strongly convex functions with Lipschitz-continuous gradients, the iterates of FD-DFD converge to the global minimizer $x_*$ with the linear convergence $\|x_{k+1}-x_*\|_2^2\leqslant\rho^k \|x_1-x_*\|_2^2$ for a fixed $0<\rho<1$ and any initial iteration $x_1\in\mathbb{R}^d$ when the parameters are properly selected. Since the per-iteration cost, i.e., the number of function evaluations, is fixed and almost independent of the dimension $d$, the FD-DFD algorithm has a complexity bound $\mathcal{O}(\log\frac{1}{\epsilon})$ for finding a point $x$ such that the optimality gap $\|x-x_*\|_2^2$ is less than $\epsilon>0$. Numerical experiments in various dimensions from $5$ to $500$ demonstrate the benefits of the FD-DFD method.
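The step from the linear convergence bound to the stated complexity can be spelled out as follows. This is a standard argument using only the quantities named in the abstract; the constant $C$ for the fixed per-iteration number of function evaluations is notation introduced here for illustration.

```latex
% Sufficient condition for the optimality gap after k iterations to be below \epsilon:
\[
  \|x_{k+1}-x_*\|_2^2 \leqslant \rho^{k}\,\|x_1-x_*\|_2^2 \leqslant \epsilon
  \quad\Longleftarrow\quad
  k \geqslant \frac{\log\bigl(\|x_1-x_*\|_2^2/\epsilon\bigr)}{\log(1/\rho)} .
\]
% With a fixed cost of C function evaluations per iteration (C is assumed notation),
% the total number of evaluations is
\[
  C\,k \;=\; \mathcal{O}\!\left(\log\frac{1}{\epsilon}\right),
\]
% since \rho and \|x_1-x_*\|_2 do not depend on \epsilon.
```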

preprint 2020 arXiv

Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds

3D moving object detection is one of the most critical tasks in dynamic scene analysis. In this paper, we propose a novel Drosophila-inspired 3D moving object detection method using Lidar sensors. According to the theory of elementary motion detector, we have developed a motion detector based on the shallow visual neural pathway of Drosophila. This detector is sensitive to the movement of objects and can well suppress background noise. Designing neural circuits with different connection modes, the approach searches for motion areas in a coarse-to-fine fashion and extracts point clouds of each motion area to form moving object proposals. An improved 3D object detection network is then used to estimate the point clouds of each proposal and efficiently generates the 3D bounding boxes and the object categories. We evaluate the proposed approach on the widely-used KITTI benchmark, and state-of-the-art performance was obtained by using the proposed approach on the task of motion detection.

preprint 2020 arXiv

Exploring Image Enhancement for Salient Object Detection in Low Light Images

Low light images captured in a non-uniform illumination environment are usually degraded by the scene depth and the corresponding environment lights. This degradation results in severe object information loss in the degraded image modality, which makes salient object detection more challenging due to the low contrast and the influence of artificial light. However, existing salient object detection models are developed based on the assumption that the images are captured under a sufficient brightness environment, which is impractical in real-world scenarios. In this work, we propose an image enhancement approach to facilitate salient object detection in low light images. The proposed model directly embeds the physical lighting model into the deep neural network to describe the degradation of low light images, in which the environment light is treated as a point-wise variate and changes with local content. Moreover, a Non-Local-Block Layer is utilized to capture the difference of local content of an object against its local neighborhood favoring regions. For quantitative evaluation, we construct a low light image dataset with pixel-level human-labeled ground-truth annotations and report promising results on four public datasets and our benchmark dataset.

preprint 2020 arXiv

Generalized Equal Area Criterion for Stability Analysis of Nonlinear Oscillators

In power system analysis, the Equal Area Criterion is of great importance in revealing the condition for a grid-connected synchronous generator to maintain its transient rotor angle stability with the grid under a large disturbance. At present, the increasing number of renewable distributed energy resources (DERs) is introducing a variety of new nonlinear characteristics that differ from those of conventional generation, although these DERs can still be operated and modeled as grid-connected oscillators with synthetic inertias. To study stability and synchronism of a grid-connected DER operated as a more general nonlinear oscillator, this paper proposes a Generalized Equal Area Criterion for estimation of transient stability margin subject to a large disturbance.

preprint 2020 arXiv

Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix

Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data by performing clustering on the learned optimal embedding across views. Though demonstrating promising performance in various applications, most existing methods linearly combine a group of pre-specified first-order Laplacian matrices to construct the optimal Laplacian matrix, which may result in limited representation capability and insufficient information exploitation. Also, storing and implementing complex operations on the $n\times n$ Laplacian matrices incurs intensive storage and computation complexity. To address these issues, this paper first proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix, and then extends it to the late fusion version for accurate and efficient multi-view clustering. Specifically, our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base Laplacian matrices simultaneously. In this way, the representative capacity of the learned optimal Laplacian matrix is enhanced, which helps to better utilize the hidden high-order connection information among data, leading to improved clustering performance. We design an efficient algorithm with proved convergence to solve the resultant optimization problem. Extensive experimental results on nine datasets demonstrate the superiority of our algorithm against state-of-the-art methods, which verifies the effectiveness and advantages of the proposed algorithm.

preprint 2020 arXiv

Person Re-Identification via Active Hard Sample Mining

Annotating a large-scale image dataset is very tedious, yet necessary for training person re-identification models. To alleviate this problem, we present an active hard sample mining framework that trains an effective re-ID model with the least labeling effort. Considering that hard samples can provide informative patterns, we first formulate an uncertainty estimation to actively select hard samples to iteratively train a re-ID model from scratch. Then, intra-diversity estimation is designed to reduce the redundant hard samples by maximizing their diversity. Moreover, we propose a computer-assisted identity recommendation module embedded in the active hard sample mining framework to help human annotators rapidly and accurately label the selected samples. Extensive experiments were carried out to demonstrate the effectiveness of our method on several public datasets. Experimental results indicate that our method can reduce annotation effort by 57%, 63%, and 49% on Market1501, MSMT17, and CUHK03, respectively, while maximizing the performance of the re-ID model.

preprint 2020 arXiv

Planning and Operations of Mixed Fleets in Mobility-on-Demand Systems

Automated vehicles (AVs) are expected to be beneficial for Mobility-on-Demand (MoD), thanks to their ability to be globally coordinated. To facilitate the steady transition towards full autonomy, we consider the transition period of AV deployment, whereby an MoD system operates a mixed fleet of automated vehicles (AVs) and human-driven vehicles (HVs). In such systems, AVs are centrally coordinated by the operator, and the HVs might strategically respond to the coordination of AVs. We devise computationally tractable strategies to coordinate mixed fleets in MoD systems. Specifically, we model an MoD system with a mixed fleet using a Stackelberg framework where the MoD operator serves as the leader and human-driven vehicles serve as the followers. We develop two models: 1) a steady-state model to analyze the properties of the problem and determine the planning variables (e.g., compensations, prices, and the fleet size of AVs), and 2) a time-varying model to design a real-time coordination algorithm for AVs. The proposed models are validated using a case study inspired by real operational data of an MoD service in Singapore. Results show that the proposed algorithms can significantly improve system performance.

preprint 2020 arXiv

PointNet on FPGA for Real-Time LiDAR Point Cloud Processing

LiDAR sensors have been widely used in many autonomous vehicle modalities, such as perception, mapping, and localization. This paper presents an FPGA-based deep learning platform for real-time point cloud processing targeted at autonomous vehicles. The software driver for the Velodyne LiDAR sensor is modified and moved into the on-chip processor system, while the programmable logic is designed as a customized hardware accelerator. As the state-of-the-art deep learning algorithm for point cloud processing, PointNet is successfully implemented on the proposed FPGA platform. Targeting a Xilinx Zynq UltraScale+ MPSoC ZCU104 development board, the FPGA implementations of PointNet achieve computing performance of 182.1 GOPS and 280.0 GOPS for classification and segmentation, respectively. The proposed design supports inputs of up to 4096 points per frame. The processing time is 19.8 ms for classification and 34.6 ms for segmentation, which meets the real-time requirement for most existing LiDAR sensors.
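To put the reported processing times in perspective, the short sketch below converts them into sustainable frame rates. Only the 19.8 ms and 34.6 ms figures come from the abstract; the function name and the 20 Hz sensor rate used for comparison are illustrative assumptions, not values stated by the authors.

```python
def frames_per_second(latency_ms: float) -> float:
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / latency_ms

# Latencies reported in the abstract above.
classification_fps = frames_per_second(19.8)
segmentation_fps = frames_per_second(34.6)

# Assumed (not from the source): a spinning LiDAR that delivers 20 frames per second.
ASSUMED_SENSOR_HZ = 20.0

print(f"classification: {classification_fps:.1f} frames/s")
print(f"segmentation:   {segmentation_fps:.1f} frames/s")
print("keeps up with assumed sensor:",
      min(classification_fps, segmentation_fps) >= ASSUMED_SENSOR_HZ)
```

Under that assumed sensor rate, both tasks finish well within one frame period, which is consistent with the real-time claim in the abstract.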

preprint 2020 arXiv

RTFN: A Robust Temporal Feature Network for Time Series Classification

Time series data usually contains local and global patterns. Most existing feature networks pay more attention to local features than to the relationships among them. The latter are, however, also important yet more difficult to explore. Obtaining sufficient representations with a feature network is still challenging. To this end, we propose a novel robust temporal feature network (RTFN) for feature extraction in time series classification, containing a temporal feature network (TFN) and an LSTM-based attention network (LSTMaN). TFN is a residual structure with multiple convolutional layers. It functions as a local-feature extraction network to mine sufficient local features from data. LSTMaN is composed of two identical layers, where attention and long short-term memory (LSTM) networks are hybridized. This network acts as a relation extraction network to discover the intrinsic relationships among the extracted features at different positions in sequential data. In experiments, we embed RTFN into a supervised structure as a feature extractor and into an unsupervised structure as an encoder, respectively. The results show that the RTFN-based structures achieve excellent supervised and unsupervised performance on a large number of UCR2018 and UEA2018 datasets.

preprint 2020 arXiv

RTFN: Robust Temporal Feature Network

Time series analysis plays a vital role in various applications, for instance, healthcare, weather prediction, and disaster forecasting. However, obtaining sufficient shapelets with a feature network is still challenging. To this end, we propose a novel robust temporal feature network (RTFN) that contains temporal feature networks and attentional LSTM networks. The temporal feature networks are built to extract basic features from input data, while the attentional LSTM networks are devised to capture complicated shapelets and relationships to enrich features. In experiments, we embed RTFN into a supervised structure as a feature extraction network and into unsupervised clustering as an encoder, respectively. The results show that the RTFN-based supervised structure wins on 40 out of 85 datasets and the RTFN-based unsupervised clustering performs the best on 4 out of 11 datasets in the UCR2018 archive.

preprint 2020 arXiv

Stochastic gradient-free descents

In this paper we propose stochastic gradient-free methods and accelerated methods with momentum for solving stochastic optimization problems. All these methods rely on stochastic directions rather than stochastic gradients. We analyze the convergence behavior of these methods under the mean-variance framework, and also provide a theoretical analysis of the inclusion of momentum in stochastic settings, which reveals that the momentum term we use adds a deviation of order $\mathcal{O}(1/k)$ but controls the variance at the order $\mathcal{O}(1/k)$ for the $k$th iteration. It is thus shown that, when employing a decaying stepsize $\alpha_k=\mathcal{O}(1/k)$, the stochastic gradient-free methods can still maintain the sublinear convergence rate $\mathcal{O}(1/k)$ and the accelerated methods with momentum can achieve a convergence rate $\mathcal{O}(1/k^2)$ in probability for strongly convex objectives with Lipschitz gradients; and all these methods converge to a solution with a zero expected gradient norm when the objective function is nonconvex, twice differentiable and bounded below.
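As a rough illustration of what descending along stochastic directions with a decaying stepsize can look like, the sketch below uses a generic random-direction finite-difference scheme. It is not the authors' algorithm: the test objective, the forward-difference slope estimate, the unit-sphere sampling, the stepsize $1/(k+10)$, and all function names are assumptions chosen for illustration, and no momentum term is included.

```python
import math
import random

def objective(x):
    """Hypothetical strongly convex test function with minimizer at (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

def random_unit_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def gradient_free_descent(f, x0, iterations=2000, smoothing=1e-4, seed=0):
    """Minimize f using only function evaluations along random directions."""
    rng = random.Random(seed)
    x = list(x0)
    dim = len(x)
    for k in range(1, iterations + 1):
        u = random_unit_direction(dim, rng)
        probe = [xi + smoothing * ui for xi, ui in zip(x, u)]
        # Forward-difference estimate of the directional derivative along u.
        slope = (f(probe) - f(x)) / smoothing
        # Decaying stepsize of order 1/k (the offset 10 is an arbitrary choice).
        step = 1.0 / (k + 10)
        # Scaling by dim makes the update direction match the gradient in expectation.
        x = [xi - step * dim * slope * ui for xi, ui in zip(x, u)]
    return x

start = [3.0, -2.0]
result = gradient_free_descent(objective, start)
print(objective(start), objective(result))
```

The update never evaluates a gradient; each iteration costs two function evaluations regardless of dimension, which is the main appeal of direction-based schemes of this kind.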

preprint 2020 arXiv

SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images

We study the problem of symmetry detection of 3D shapes from single-view RGB-D images, where severely missing data renders geometric detection approaches infeasible. We propose an end-to-end deep neural network which is able to predict both reflectional and rotational symmetries of 3D objects present in the input RGB-D image. Directly training a deep model for symmetry prediction, however, can quickly run into the issue of overfitting. We therefore adopt a multi-task learning approach. Aside from symmetry axis prediction, our network is also trained to predict symmetry correspondences. In particular, given the 3D points present in the RGB-D image, our network outputs for each 3D point its symmetric counterpart corresponding to a specific predicted symmetry. In addition, our network is able to detect multiple symmetries of different types for a given shape. We also contribute a benchmark of 3D symmetry detection based on single-view RGB-D images. Extensive evaluation on the benchmark demonstrates the strong generalization ability of our method, in terms of high accuracy of both symmetry axis prediction and counterpart estimation. In particular, our method is robust in handling unseen object instances with large variation in shape, multi-symmetry composition, as well as novel object categories.

preprint 2020 arXiv

Towards Critical Clearing Time Sensitivity for DAE Systems with Singularity

Standard power system models are of parameter-dependent differential-algebraic equation (DAE) type. Following a transient event, voltage collapse can occur as a bifurcation of the transient load flow solutions, which is marked by the system trajectory reaching a singular surface in state space where the voltage causality is lost. If a fault is expected to cause voltage collapse, preventive control decisions such as changes in AVR settings need to be taken in order to enhance the system stability. In this regard, knowledge of the sensitivity of critical clearing time (CCT) to controllable system parameters can be of great help. The quasi-stability boundary of DAE systems is more complicated than that of ODE systems: in addition to unstable equilibrium points (UEP) and periodic orbits, singularity plays an important role, making the problem challenging. The stability boundary is then made up of a number of dynamically distinct components. In the present work, we derive the expression for CCT sensitivity for the phenomenon where the critical fault-on trajectory intersects the singular surface itself, which is one such component forming the stability boundary. The results are illustrated for a small test system in order to gain visual insights.

preprint 2019 arXiv

BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading

Diabetic retinopathy (DR) is a common retinal disease that leads to blindness. For diagnosis purposes, DR image grading aims to provide automatic DR grade classification, which is not addressed in conventional research methods of binary DR image classification. Small objects in the eye images, like lesions and microaneurysms, are essential to DR grading in medical imaging, but they could easily be influenced by other objects. To address these challenges, we propose a new deep learning architecture, called BiRA-Net, which combines the attention model for feature extraction and bilinear model for fine-grained classification. Furthermore, in considering the distance between different grades of different DR categories, we propose a new loss function, called grading loss, which leads to improved training convergence of the proposed approach. Experimental results are provided to demonstrate the superior performance of the proposed approach.