Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
29 works
0 followers
25 topics
4 close collaborators

Published work

29 published item(s)

preprint 2024 arXiv

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

Target speaker extraction (TSE) aims to isolate a specific voice from multiple mixed speakers relying on a registered sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large model parameters which are computationally intensive and impractical for real-time applications, especially on resource-constrained platforms. In this paper, we address the TSE task using a microphone array and introduce a novel three-stage solution that systematically decouples the process: First, a neural network is trained to estimate the direction of the target speaker. Second, with the direction determined, the Generalized Sidelobe Canceller (GSC) is used to extract the target speech. Third, an Inplace Convolutional Recurrent Neural Network (ICRN) acts as a denoising post-processor, refining the GSC output to yield the final separated speech. Our approach delivers superior performance while drastically reducing computational load, setting a new standard for efficient real-time target speaker extraction.

preprint 2024 arXiv

RHDLPP: A multigroup radiation hydrodynamics code for laser-produced plasmas

We introduce the RHDLPP, a flux-limited multigroup radiation hydrodynamics numerical code designed for simulating laser-produced plasmas in diverse environments. The code bifurcates into two packages: RHDLPP-LTP for low-temperature plasmas generated by moderate-intensity nanosecond lasers, and RHDLPP-HTP for high-temperature, high-density plasmas formed by high-intensity laser pulses. The core radiation hydrodynamic equations are resolved in the Eulerian frame, employing an operator-split method. This method decomposes the solution into two substeps: first, the explicit resolution of the hyperbolic subsystems integrating radiation and fluid dynamics, and second, the implicit treatment of the parabolic part comprising stiff radiation diffusion, heat conduction, and energy exchange. Laser propagation and energy deposition are modeled through a hybrid approach, combining geometrical optics ray-tracing in sub-critical plasma regions with a one-dimensional solution of the Helmholtz wave equation in super-critical areas. The thermodynamic states are ascertained using an equation of state, based on either the real gas approximation or the quotidian equation of state (QEOS). Additionally, RHDLPP includes RHDLPP-SpeIma3D, a three-dimensional spectral simulation post-processing module, for generating both temporally-spatially resolved and time-integrated spectra and imaging, facilitating direct comparisons with experimental data. The paper showcases a series of verification tests to establish the code's accuracy and efficiency, followed by application cases, including simulations of laser-produced aluminum (Al) plasmas, pre-pulse-induced target deformation of tin (Sn) microdroplets relevant to extreme ultraviolet lithography light sources, and varied imaging and spectroscopic simulations.

preprint 2022 arXiv

ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laboratories and lately also in crowdsourcing following the international standards from ITU-T Rec. P.800 series. However, those approaches are costly and cannot be applied to customer data. Therefore, an effective objective assessment approach is needed to evaluate or monitor the speech quality of the ongoing conversation. The ConferencingSpeech 2022 challenge targets the non-intrusive deep neural network models for the speech quality assessment task. We open-sourced a training corpus with more than 86K speech clips in different languages, with a wide range of synthesized and live degradations and their corresponding subjective quality scores through crowdsourcing. 18 teams submitted their models for evaluation in this challenge. The blind test sets included about 4300 clips from wide ranges of degradations. This paper describes the challenge, the datasets, and the evaluation methods and reports the final results.

preprint 2022 arXiv

Convolutional dual graph Laplacian sparse coding

In recent years, graph signal processing (GSP) technology has become popular in various fields, and graph Laplacian regularizers have also been introduced into convolutional sparse representation. This paper proposes a convolutional sparse representation model based on the dual graph Laplacian regularizer to ensure effective application of a dual graph signal smoothing prior on the rows and columns of input images. The graph Laplacian matrix contains the gradient information of the image and the similarity information between pixels, and can also describe the degree of change of the graph, so the image can be smoothed. Compared with the single graph smoothing prior, the dual graph has a simple structure, relaxes the conditions, and is more conducive to image restoration using the image signal prior. We formulate the corresponding minimization problem using the proposed model and solve it in the Fourier domain with the alternating direction method of multipliers (ADMM) algorithm. Finally, denoising experiments are conducted using random Gaussian white noise. Compared with the single graph smoothing prior, the denoising results of the proposed model with the dual graph smoothing prior have fewer noise points and clearer texture.
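
The abstract does not give the exact regularizer, so the following is only a minimal sketch of what a dual (row and column) graph Laplacian smoothness penalty can look like. It assumes simple path graphs over the image rows and columns; the function names `path_laplacian` and `dual_graph_penalty` are hypothetical and are not taken from the paper.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of an n-node path graph (assumed toy graph)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def dual_graph_penalty(X, L_rows, L_cols):
    """tr(X^T L_rows X) + tr(X L_cols X^T): small when X is smooth on both graphs."""
    return np.trace(X.T @ L_rows @ X) + np.trace(X @ L_cols @ X.T)

# A flat image versus the same image with Gaussian white noise added.
rng = np.random.default_rng(0)
flat = np.full((8, 6), 0.5)
noisy = flat + 0.1 * rng.standard_normal(flat.shape)

L_r, L_c = path_laplacian(8), path_laplacian(6)
penalty_flat = dual_graph_penalty(flat, L_r, L_c)
penalty_noisy = dual_graph_penalty(noisy, L_r, L_c)
```

For path graphs the penalty reduces to the sum of squared differences between vertically and horizontally adjacent pixels, which is why noise raises it and a flat image scores zero.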

preprint 2022 arXiv

Fast Computation of Generalized Eigenvectors for Manifold Graph Embedding

Our goal is to efficiently compute low-dimensional latent coordinates for nodes in an input graph -- known as graph embedding -- for subsequent data processing such as clustering. Focusing on finite graphs that are interpreted as uniform samples on continuous manifolds (called manifold graphs), we leverage existing fast extreme eigenvector computation algorithms for speedy execution. We first pose a generalized eigenvalue problem for sparse matrix pair $(\mathbf{A},\mathbf{B})$, where $\mathbf{A} = \mathbf{L} - \mu \mathbf{Q} + \epsilon \mathbf{I}$ is a sum of graph Laplacian $\mathbf{L}$ and disconnected two-hop difference matrix $\mathbf{Q}$. Eigenvector $\mathbf{v}$ minimizing Rayleigh quotient $\frac{\mathbf{v}^{\top} \mathbf{A} \mathbf{v}}{\mathbf{v}^{\top} \mathbf{v}}$ thus minimizes $1$-hop neighbor distances while maximizing distances between disconnected $2$-hop neighbors, preserving graph structure. Matrix $\mathbf{B} = \text{diag}(\{b_i\})$ that defines eigenvector orthogonality is then chosen so that boundary / interior nodes in the sampling domain have the same generalized degrees. $K$-dimensional latent vectors for the $N$ graph nodes are the first $K$ generalized eigenvectors for $(\mathbf{A},\mathbf{B})$, computed in $\mathcal{O}(N)$ using LOBPCG, where $K \ll N$. Experiments show that our embedding is among the fastest in the literature, while producing the best clustering performance for manifold graphs.
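
As a rough illustration of the generalized eigenvalue problem described above, the sketch below builds A = L - mu*Q + eps*I on a toy path graph and extracts the first K generalized eigenvectors. Several things here are assumptions rather than the paper's method: a dense `numpy.linalg.eigh` call stands in for LOBPCG (so the cost is not O(N)), Q is taken to be the Laplacian of the graph linking 2-hop neighbors that are not directly connected, the diagonal of B is set to the node degrees, and the values of `mu` and `eps` are arbitrary.

```python
import numpy as np

def path_graph_matrices(n):
    """Return (L, Q) for an n-node path graph, a toy stand-in for a manifold graph."""
    W1 = np.zeros((n, n))  # 1-hop adjacency
    W2 = np.zeros((n, n))  # adjacency between disconnected 2-hop neighbors
    for i in range(n - 1):
        W1[i, i + 1] = W1[i + 1, i] = 1.0
    for i in range(n - 2):
        W2[i, i + 2] = W2[i + 2, i] = 1.0
    L = np.diag(W1.sum(axis=1)) - W1
    Q = np.diag(W2.sum(axis=1)) - W2
    return L, Q

def generalized_embedding(L, Q, b, k, mu=0.1, eps=1e-3):
    """First k solutions of A v = lambda * diag(b) v, with A = L - mu*Q + eps*I."""
    A = L - mu * Q + eps * np.eye(L.shape[0])
    s = 1.0 / np.sqrt(b)
    # With diagonal B, the problem reduces to a standard symmetric one:
    # C = B^{-1/2} A B^{-1/2}, then v = B^{-1/2} y.
    C = A * s[:, None] * s[None, :]
    vals, Y = np.linalg.eigh(C)  # eigenvalues in ascending order
    V = Y[:, :k] * s[:, None]
    return vals[:k], V, A

L, Q = path_graph_matrices(30)
b = np.diag(L).copy()  # hypothetical choice of b_i: the 1-hop node degree
vals, V, A = generalized_embedding(L, Q, b, k=3)
```

Each row of `V` is a 3-dimensional latent coordinate for one node. The columns are orthonormal with respect to B rather than the identity, which is what the generalized formulation changes relative to a plain Laplacian eigenmap.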

preprint 2022 arXiv

Formulating Intuitive Stack-of-Tasks using Visuo-Tactile Perception for Collaborative Human-Robot Fine Manipulation

Enabling robots to work in close proximity to humans necessitates a control framework that not only incorporates multi-sensory information for autonomous and coordinated interactions but also has perceptive task planning to ensure an adaptable and flexible collaborative behaviour. In this research, an intuitive stack-of-tasks (iSoT) formulation is proposed that defines the robot's actions by considering the human-arm postures and the task progression. The framework is augmented with visuo-tactile information to effectively perceive the collaborative environment and intuitively switch between the planned sub-tasks. The visual feedback from depth cameras monitors and estimates the objects' poses and human-arm postures, while the tactile data provides the exploration skills to detect and maintain the desired contacts to avoid object slippage. To evaluate the performance, effectiveness and usability of the proposed framework, assembly and disassembly tasks, performed by the human-human and human-robot partners, are considered and analyzed using distinct evaluation metrics, i.e., approach adaptation, grasp correction, task coordination latency, cumulative posture deviation, and task repeatability.

preprint 2022 arXiv

Global unique solution for the 3-D full compressible MHD equations in space of lower regularity

In this paper, we establish new $L^p$ gradient estimates of the solutions in order to discuss the Cauchy problem for the full compressible magnetohydrodynamic (MHD) systems in $\mathrm{R}^3$. We use the "$\rm{div}-\rm{curl}$" decomposition technique (see \cite{{HJR},{MR}}) and a new modified effective viscous flux and vorticity to calculate "$\Vert\nabla \mathbf{u}\Vert_{L^3}$" and "$\Vert\nabla \mathbb{H}\Vert_{L^3}$". As a result, we obtain global well-posedness for solutions with initial data in a class of spaces with lower regularity, provided that the initial energy is suitably small.

preprint 2022 arXiv

Hot-lines topology and the fate of the spin resonance mode in three-dimensional unconventional superconductors

In the quasi-two-dimensional (quasi-2D) copper- and iron-based superconductors, the onset of superconductivity is accompanied by a prominent peak in the magnetic spectrum at momenta close to the wave-vector of the nearby antiferromagnetic state. Such a peak is well described in terms of a spin resonance mode, i.e., a spin-1 exciton theoretically predicted for quasi-2D superconductors with a sign-changing gap. The same theories, however, indicate that such a resonance mode should be absent in a three-dimensional (3D) system with a spherical Fermi surface. This raises the question of the fate of the spin resonance mode in layered unconventional superconductors that are not strongly anisotropic, such as certain heavy-fermion compounds and potentially the newly discovered nickelate superconductor NdNiO$_2$. Here, we use the random-phase-approximation to calculate the dynamical spin susceptibility of 3D superconductors with a $d_{x^2-y^2}$-wave gap symmetry and corrugated cylindrical-like Fermi surfaces. By varying the out-of-plane hopping anisotropy $t_z/t$, we demonstrate that the appearance of a spin resonance mode is determined by the topology of the hot lines -- i.e. lines on the Fermi surface that are connected by the magnetic wave-vector. For an in-plane antiferromagnetic wave-vector, the hot lines undergo a topological transition from open lines to closed loops at a critical $t_z/t$ value. The closed hot lines cross the nodal superconducting lines, making the spin resonance mode overdamped and incoherent. In contrast, for an out-of-plane antiferromagnetic wave-vector, the hot lines remain open and the spin resonance mode remains sharp. We discuss the experimental implications of our results for the out-of-plane dispersion of the spin resonance mode and, more generally, for inelastic neutron scattering experiments on unconventional superconductors.

preprint 2022 arXiv

Less is More: Adaptive Curriculum Learning for Thyroid Nodule Diagnosis

Thyroid nodule classification aims at determining whether the nodule is benign or malignant based on a given ultrasound image. However, the label obtained by cytological biopsy, which is the gold standard in clinical medicine, is not always consistent with the ultrasound imaging TI-RADS criteria. The information difference between the two causes the existing deep learning-based classification methods to be indecisive. To solve the Inconsistent Label problem, we propose an Adaptive Curriculum Learning (ACL) framework, which adaptively discovers and discards the samples with inconsistent labels. Specifically, ACL takes both hard samples and model certainty into account, and can accurately determine the threshold to distinguish the samples with Inconsistent Label. Moreover, we contribute TNCD: a Thyroid Nodule Classification Dataset to facilitate future related research on the thyroid nodules. Extensive experimental results on TNCD based on three different backbone networks not only demonstrate the superiority of our method but also prove that the less-is-more principle, which strategically discards the samples with Inconsistent Label, could yield performance gains. Source code and data are available at https://github.com/chenghui-666/ACL/.

preprint 2022 arXiv

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale listening tests is time-consuming and expensive. Therefore, several evaluation metrics were derived as surrogates for subjective listening test results. In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net), for predicting the subjective intelligibility scores of HA users. MBI-Net consists of two branches of models, with each branch consisting of a hearing loss model, a cross-domain feature extraction module, and a speech intelligibility prediction model, to process speech signals from one channel. The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system in Track 1 and Track 2 on the Clarity Prediction Challenge 2022 dataset.

preprint 2022 arXiv

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.

preprint 2022 arXiv

Optimal time decay estimation for large-solution about 3D compressible MHD equations

This paper mainly focuses on the optimal time decay estimate for large solutions of the compressible magnetohydrodynamic equations in the 3D whole space, provided that $(\sigma_{0}-1,u_{0},M_{0})\in L^1\cap H^2$. In [2] (Chen et al., 2019), the time decay rate of $\|(\sigma-1,u,M)\|_{H^1}$ was proved to be $(1+t)^{-\frac{3}{4}}$. Based on this, we obtained the rate $(1+t)^{-\frac{5}{4}}$ for $\|\nabla(\sigma-1,u,M)\|_{H^1}$ in [24]. In this paper, we therefore aim to improve the rate for $\|\nabla^2 (\sigma-1,u,M)\|_{L^2}$. Using the method adopted in [25] (Wang and Wen, 2021), we obtain the optimal time decay estimate for the highest-order spatial derivative of the solution: $\|\nabla^2 (\sigma-1,u,M)\|_{L^2}$ decays at the rate $(1+t)^{-\frac{7}{4}}$.
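
The three rates quoted in the abstract can be collected in one display. This is only a restatement of those rates; the $\lesssim$ notation (a bound up to a constant independent of $t$) is an assumption about how they are meant.

```latex
\begin{align*}
\|(\sigma-1,u,M)(t)\|_{H^1} &\lesssim (1+t)^{-\frac{3}{4}}, \\
\|\nabla(\sigma-1,u,M)(t)\|_{H^1} &\lesssim (1+t)^{-\frac{5}{4}}, \\
\|\nabla^2(\sigma-1,u,M)(t)\|_{L^2} &\lesssim (1+t)^{-\frac{7}{4}}.
\end{align*}
```

The exponents follow the pattern $\frac{3}{4}+\frac{k}{2}$ for $k=0,1,2$ derivatives, which matches the $L^1$-$L^2$ decay of the heat kernel in $\mathbb{R}^3$. The second line alone only gives $(1+t)^{-\frac{5}{4}}$ for the second derivatives, so the third line is the improvement.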

preprint 2022 arXiv

Robot Cooking with Stir-fry: Bimanual Non-prehensile Manipulation of Semi-fluid Objects

This letter describes an approach to achieve the well-known Chinese cooking art of stir-fry on a bimanual robot system. Stir-fry requires a sequence of highly dynamic coordinated movements, which is usually difficult to learn for a chef, let alone transfer to robots. In this letter, we define a canonical stir-fry movement, and then propose a decoupled framework for learning this deformable object manipulation from human demonstration. First, the dual arms of the robot are decoupled into different roles (a leader and follower) and learned with classical and neural network-based methods separately, then the bimanual task is transformed into a coordination problem. To obtain general bimanual coordination, we secondly propose a Graph and Transformer based model -- Structured-Transformer, to capture the spatio-temporal relationship between dual-arm movements. Finally, by adding visual feedback of content deformation, our framework can adjust the movements automatically to achieve the desired stir-fry effect. We verify the framework in a simulator and deploy it on a real bimanual Panda robot system. The experimental results validate that our framework can realize the bimanual robot stir-fry motion and has the potential to extend to other deformable objects with bimanual coordination.

preprint 2022 arXiv

Two stages for visual object tracking

Siamese-based trackers have achieved promising performance on visual object tracking tasks. Most existing Siamese-based trackers contain two separate branches for tracking: a classification branch and a bounding box regression branch. In addition, image segmentation provides an alternative way to obtain a more accurate target region. In this paper, we propose a novel tracker with two stages: detection and segmentation. The detection stage locates the target using Siamese networks. More accurate tracking results are then obtained by the segmentation module, given the coarse state estimation from the first stage. We conduct experiments on four benchmarks. Our approach achieves state-of-the-art results, with the EAO of 52.6$\%$ on VOT2016, 51.3$\%$ on VOT2018, and 39.0$\%$ on VOT2019 datasets, respectively.

preprint 2022 arXiv

XBound-Former: Toward Cross-scale Boundary Modeling in Transformers

Skin lesion segmentation from dermoscopy images is of great significance in the quantitative analysis of skin cancers, which is yet challenging even for dermatologists due to the inherent issues, i.e., considerable size, shape and color variation, and ambiguous boundaries. Recent vision transformers have shown promising performance in handling the variation through global context modeling. Still, they have not thoroughly solved the problem of ambiguous boundaries as they ignore the complementary usage of the boundary knowledge and global contexts. In this paper, we propose a novel cross-scale boundary-aware transformer, XBound-Former, to simultaneously address the variation and boundary problems of skin lesion segmentation. XBound-Former is a purely attention-based network and catches boundary knowledge via three specially designed learners. We evaluate the model on two skin lesion datasets, ISIC-2016 & PH$^2$ and ISIC-2018, where our model consistently outperforms other convolution- and transformer-based models, especially on the boundary-wise metrics. We extensively verify the generalization ability on polyp lesion segmentation, which has similar characteristics, and our model can also yield significant improvement compared to the latest models.

preprint 2021 arXiv

Eye-gaze Estimation with HEOG and Neck EMG using Deep Neural Networks

Hearing-impaired listeners usually have trouble attending to the target talker in multi-talker scenes, even with hearing aids (HAs). The problem can be solved with eye-gaze steering HAs, which require listeners to gaze at the target. In a situation where the head rotates, eye-gaze is subject to both saccades and head rotation. However, existing methods of eye-gaze estimation did not work reliably, since the listener's strategy of eye-gaze varies and measurements of the two behaviors were not properly combined. Besides, existing methods were based on hand-crafted features, which could overlook some important information. In this paper, a head-fixed and a head-free experiment were conducted. We used horizontal electrooculography (HEOG) and neck electromyography (NEMG), which separately measured saccade and head rotation, to jointly estimate eye-gaze. Besides a traditional classifier and hand-crafted features, deep neural networks (DNN) were introduced to automatically extract features from intact waveforms. Evaluation results showed that when the input was HEOG with inertial measurement unit, the best performance of our proposed DNN classifiers achieved 93.3%; and when HEOG was with NEMG together, the accuracy reached 72.6%, higher than that with HEOG (about 71.0%) or NEMG (about 35.7%) alone. These results indicated the feasibility of estimating eye-gaze with HEOG and NEMG.

preprint 2021 arXiv

Femtosecond dynamics of a polariton bosonic cascade at room temperature

Whispering gallery modes in a microwire are characterized by a nearly equidistant energy spectrum. In the strong exciton-photon coupling regime, this system represents a bosonic cascade: a ladder of discrete energy levels that sustains stimulated transitions between neighboring steps. In this work, by using a femtosecond angle-resolved spectroscopic imaging technique, the ultrafast dynamics of polaritons in a bosonic cascade based on a one-dimensional ZnO whispering gallery microcavity is explicitly visualized. A clear ladder-form build-up process from higher to lower energy branches of the polariton condensates is observed, which is well reproduced by modeling using rate equations. Moreover, the polariton parametric scattering dynamics are distinguished on a timescale of hundreds of femtoseconds. Our understanding of the femtosecond condensation and scattering dynamics paves the way towards ultrafast coherent control of polaritons at room temperature, which will make it promising for high-speed all-optical integrated applications.

preprint 2021 arXiv

The effect of speech and noise levels on the quality perceived by cochlear implant and normal hearing listeners

Electrical hearing by cochlear implants (CIs) may be fundamentally different from acoustic hearing by normal-hearing (NH) listeners, presumably showing unequal speech quality perception in various noise environments. Noise reduction (NR) algorithms used in CI reduce the noise in favor of signal-to-noise ratio (SNR), regardless of plausible accompanying distortions that may degrade the speech quality perception. To gain a better understanding of CI speech quality perception, the present work aimed to investigate speech quality perception in diverse noise conditions, including factors of speech/noise levels, type of noise, and distortions caused by NR models. Fifteen NH and seven CI subjects participated in this study. Speech sentences were set to two different levels (65 and 75 dB SPL). Two types of noise (Cafeteria and Babble) at three levels (55, 65, and 75 dB SPL) were used. Sentences were processed using two NR algorithms to investigate the perceptual sensitivity of CI and NH listeners to the distortion. All sentences processed with the combinations of these sets were presented to CI and NH listeners, and they were asked to rate the sound quality of speech as they perceived it. The effect of each factor on the perceived speech quality was investigated based on the group averaged quality rated by CI and NH listeners. Consistent with previous studies, CI listeners were not as sensitive as NH to the distortion made by NR algorithms. Statistical analysis showed that the speech level has a significant effect on quality perception. At the same SNR, the quality of 65 dB speech was rated higher than that of 75 dB for CI users, but vice versa for NH listeners. Therefore, the present study showed that the perceived speech quality patterns were different between CI and NH listeners in terms of their sensitivity to distortion and speech level in complex listening environments.

preprint 2020 arXiv

Combined effects of nonmetallic impurities and planned metallic dopants on grain boundary energy and strength

Most research on nanocrystalline alloys has been focused on planned doping of metals with other metallic elements, but nonmetallic impurities are also prevalent in the real world. In this work, we report on the combined effects of metallic dopants and nonmetallic impurities on grain boundary energy and strength using first-principles calculations, with a $Σ$5 (310) grain boundary in Cu chosen as a model system. We find a clear correlation between the grain boundary energy and the change in excess free volume of doped grain boundaries. A combination of a larger substitutional dopant and an interstitial impurity can fill the excess free volume more efficiently and further reduce the grain boundary energy. We also find that the strengthening effects of dopants and impurities are dominated by the electronic interactions between the host Cu atoms and the two types of dopant elements. For example, the significant competing effects of metal dopants such as Zr, Nb, and Mo with impurities on the grain boundary strength are uncovered from the density of states of the d electrons. As a whole, this work deepens the field's understanding of the interaction between metallic dopants and nonmetallic impurities on grain boundary properties, providing a guide for improving the thermal stability of materials while avoiding embrittling effects.

preprint 2020 arXiv

Deep Job Understanding at LinkedIn

As the world's largest professional network, LinkedIn wants to create economic opportunity for everyone in the global workforce. One of its most critical missions is matching jobs with professionals. Improving job targeting accuracy and hire efficiency aligns with LinkedIn's Member First Motto. To achieve those goals, we need to understand unstructured job postings with noisy information. We applied deep transfer learning to create domain-specific job understanding models. After this, jobs are represented by professional entities, including titles, skills, companies, and assessment questions. To continuously improve LinkedIn's job understanding ability, we designed an expert feedback loop where we integrated job understanding models into LinkedIn's products to collect job posters' feedback. In this demonstration, we present LinkedIn's job posting flow and demonstrate how the integrated deep job understanding work improves job posters' satisfaction and provides significant metric lifts in LinkedIn's job recommendation system.

preprint 2020 arXiv

Delay and Packet-Drop Tolerant Multi-Stage Distributed Average Tracking in Mean Square

This paper studies the distributed average tracking problem pertaining to a discrete-time linear time-invariant multi-agent network, which is subject to, concurrently, input delays, random packet-drops, and reference noise. The problem amounts to an integrated design of a delay and packet-drop tolerant algorithm and determining the ultimate upper bound of the tracking error between agents' states and the average of the reference signals. The investigation is driven by the goal of devising a practically more attainable average tracking algorithm, thereby extending the existing work in the literature which largely ignored the aforementioned uncertainties. For this purpose, a blend of techniques from Kalman filtering, multi-stage consensus filtering, and predictive control is employed, which gives rise to a simple yet compelling distributed average tracking algorithm that is robust to initialization error and allows the trade-off between communication/computation cost and stationary-state tracking error. Due to the inherent coupling among different control components, convergence analysis is significantly challenging. Nevertheless, it is revealed that the allowable values of the algorithm parameters rely upon the maximal degree of an expected network, while the convergence speed depends upon the second smallest eigenvalue of the same network's topology. The effectiveness of the theoretical results is verified by a numerical example.
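
To make the underlying problem concrete, the sketch below shows a textbook first-order dynamic average consensus update, not the algorithm of this paper: it has no input delays, packet drops, reference noise, Kalman filtering, or multi-stage filtering. The four-agent ring, the step size `alpha`, and the name `tracking_step` are all illustrative assumptions.

```python
def tracking_step(x, r_prev, r_next, neighbors, alpha):
    """One synchronous update per agent: a consensus term on neighbor
    states plus the increment of the agent's own reference signal."""
    return [
        x[i]
        + alpha * sum(x[j] - x[i] for j in neighbors[i])
        + (r_next[i] - r_prev[i])
        for i in range(len(x))
    ]

# Four agents on a ring; each agent only sees its two neighbors.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
refs = [1.0, 2.0, 3.0, 6.0]   # constant local reference signals
x = list(refs)                # each agent starts at its own reference

for _ in range(60):
    x = tracking_step(x, refs, refs, neighbors, alpha=0.25)

target = sum(refs) / len(refs)  # the network-wide average every agent should reach
```

Because the consensus terms cancel when summed over a symmetric graph, the sum of the states always equals the sum of the references. Agreement among the agents therefore means agreement on the average.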

preprint 2020 arXiv

Dynamic Spatio-temporal Graph-based CNNs for Traffic Prediction

Forecasting future traffic flows from previous ones is a challenging problem because of the complex and dynamic nature of their spatio-temporal structures. Most existing graph-based CNNs attempt to capture the static relations while largely neglecting the dynamics underlying sequential data. In this paper, we present dynamic spatio-temporal graph-based CNNs (DST-GCNNs) by learning expressive features to represent spatio-temporal structures and predict future traffic flows from surveillance video data. In particular, DST-GCNN is a two-stream network. In the flow prediction stream, we present a novel graph-based spatio-temporal convolutional layer to extract features from a graph representation of traffic flows. Then several such layers are stacked together to predict future flows over time. Meanwhile, the relations between traffic flows in the graph are often time-variant as the traffic condition changes over time. To capture the graph dynamics, we use the graph prediction stream to predict the dynamic graph structures, and the predicted structures are fed into the flow prediction stream. Experiments on real datasets demonstrate that the proposed model achieves competitive performance compared with other state-of-the-art methods.

preprint 2020 arXiv

Leveraging Kernelized Synergies on Shared Subspace for Precision Grasp and Dexterous Manipulation

Manipulation, in contrast to grasping, is a trajectorial task that requires dexterous hands. Improving the dexterity of robot hands increases the controller complexity and thus requires the use of postural synergies. Inspired by postural synergies, this research proposes a new framework called kernelized synergies that focuses on the re-usability of the same subspace for precision grasping and dexterous manipulation. In this work, the computed subspace of postural synergies, parameterized by probabilistic movement primitives, is treated with a kernel to preserve its grasping and manipulation characteristics and allow its reuse for new objects. The grasp stability of the proposed framework is assessed with a force closure quality index. For performance evaluation, the proposed framework is tested on two different simulated robot hand models using the Syngrasp toolbox, and four complex grasping and manipulation tasks are performed experimentally and reported. The results confirm the hand-agnostic approach of the proposed framework and its generalization to distinct objects irrespective of their shape and size.

preprint2020arXiv

MetaSelector: Meta-Learning for Recommendation with User-Level Adaptive Model Selection

Recommender systems often face heterogeneous datasets containing highly personalized historical data of users, where no single model gives the best recommendation for every user. We observe this ubiquitous phenomenon on both public and private datasets and address the model selection problem in pursuit of optimizing the quality of recommendation for each user. We propose a meta-learning framework to facilitate user-level adaptive model selection in recommender systems. In this framework, a collection of recommenders is trained with data from all users, on top of which a model selector is trained via meta-learning to select the best single model for each user from the user-specific historical data. We conduct extensive experiments on two public datasets and a real-world production dataset, demonstrating that our proposed framework achieves improvements over single-model baselines and a sample-level model selector in terms of AUC and LogLoss. In particular, the improvements may lead to huge profit gains when deployed in online recommender systems.

preprint2020arXiv

Simultaneous Generation of Arbitrary Assembly of Polarization States with Geometrical-Scaling-Induced Phase Modulation

Manipulating the polarization of light on the microscale or nanoscale is essential for integrated photonics and quantum optical devices. Nowadays, the metasurface allows one to build on-chip devices that efficiently manipulate the polarization states. However, it remains challenging to generate different types of polarization states simultaneously, which is required to encode information for quantum computing and quantum cryptography applications. By introducing geometrical-scaling-induced (GSI) phase modulations, we demonstrate that an assembly of circularly polarized (CP) and linearly polarized (LP) states can be simultaneously generated by a single metasurface made of L-shaped resonators with different geometrical features. Upon illumination, each resonator diffracts the CP state with a certain GSI phase. The interaction of these diffractions leads to the desired output beams, where the polarization state and the propagation direction can be accurately tuned by selecting the geometrical shape, size, and spatial sequence of each resonator in the unit cell. As an example of potential applications, we show that an image can be encoded with different polarization profiles at different diffraction orders and decoded with a polarization analyzer. This approach resolves a challenging problem in integrated optics and is inspiring for on-chip quantum information processing.

preprint2020arXiv

The Importance and the Limitations of Sim2Real for Robotic Manipulation in Precision Agriculture

In recent years, Sim2Real approaches have brought great results to robotics. Techniques such as model-based learning or domain randomization can help overcome the gap between simulation and reality, but in some situations simulation accuracy is still needed. An example is agricultural robotics, which needs detailed simulations in terms of both dynamics and visuals. However, simulation software is still not capable of such quality and accuracy. Current Sim2Real techniques help mitigate the problem, but for these specific tasks they are not enough.

preprint2020arXiv

Uncovering the influence of common nonmetal impurities on the stability and strength of a Σ5 (310) grain boundary in Cu

Impurities are often driven to segregate to grain boundaries, which can significantly alter a material's thermal stability and mechanical behavior. To provide a comprehensive picture of this issue, the influence of a wide variety of common nonmetal impurities (H, B, C, N, O, Si, P and S) incorporated during service or materials processing is studied using first-principles simulations, with a focus on identifying changes to the energetics and mechanical strength of a Cu Σ5 (310) grain boundary. Changes to the grain boundary energy are found to be closely correlated with the covalent radii of the impurities and the volumetric deformations of polyhedra at the interface. The strengthening energy of each impurity is evaluated as a function of covalent radius and electronegativity, followed by first-principles-based tensile tests on selected impurities. The strengthening of a B-doped grain boundary comes from an enhancement of the charge density among the adjacent Cu atoms, which improves the connection between the two grains. In contrast, the detrimental effect of O results from the reduction of charge density between the Cu atoms. This work deepens the understanding of the possible beneficial and harmful effects of impurities on grain boundaries, providing a guide for materials processing studies.

preprint2020arXiv

When Distributed Formation Control Is Feasible under Hard Constraints on Energy and Time?

This paper studies distributed optimal formation control with hard constraints on energy levels and termination time, in which the formation error is to be minimized jointly with the energy cost. The main contributions include a globally optimal distributed formation control law and a comprehensive analysis of the resulting closed-loop system under those hard constraints. It is revealed that the energy levels, the task termination time, the steady-state error tolerance, and the network topology impose inherent limitations on achieving the formation control mission. Most notably, lower bounds on the achievable termination time and the required minimum energy levels are derived, given in terms of the initial formation error, the steady-state error tolerance, and the largest eigenvalue of the Laplacian matrix. These lower bounds can be employed to determine whether an energy- and time-constrained formation task is achievable and how to accomplish such a task. Furthermore, the monotonicity of those lower bounds in relation to the control parameters is revealed. Finally, a simulation example is given to illustrate the obtained results.

preprint2018arXiv

Controllability of Directed Heterogeneous Networked MIMO Systems

This paper studies the controllability of networked multi-input-multi-output (MIMO) systems, in which the network topology is weighted and directed, and the nodes are heterogeneous higher-dimensional linear time-invariant (LTI) dynamical systems. The primary objective is to search for controllability criteria beyond those already known for homogeneous networks. The focus is on the effects of the network topology, node dynamics, external control inputs, and the inner interactions on the network controllability. It is found that a network of heterogeneous systems can be controllable even if the corresponding homogeneous network topology is uncontrollable. The finding thus reveals another fundamental property that affects the network controllability: the heterogeneity of the node dynamics. A necessary and sufficient condition is derived for the controllability of heterogeneous networked MIMO LTI systems. For some typical cases, necessary and/or sufficient controllability conditions are specified in terms of the node dynamics, inner interactions, and the network topology.