Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
62 works
0 followers
35 topics
4 close collaborators

Published work

62 published item(s)

preprint, 2026, arXiv

GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification

Lithology classification aims to infer subsurface rock types from well-logging signals, supporting downstream applications like reservoir characterization. Despite substantial progress, most existing methods still treat lithology classification as a single-pass classification task. In contrast, practical experts incorporate geological principles, external knowledge, and tool-use capabilities to perform accurate classification. In this work, we propose GeoDecider, a coarse-to-fine agentic workflow that enables accurate and explainable lithology classification through training-free use of large language models (LLMs). GeoDecider reformulates lithology classification as an expert-like structured process and organizes it into a multi-stage workflow involving coarse-to-fine reasoning. Specifically, GeoDecider includes the following stages: (1) base classifier-guided coarse classification, which uses a pre-trained classifier to provide a rough reference for downstream tasks, thus reducing the overall cost of downstream reasoning, (2) tool-augmented reasoning, which utilizes several tools such as contextual analysis and neighbor retrieval to achieve finer and more precise classifications, (3) geological refinement, which post-processes the final results to enforce geological consistency. Experiments on four benchmarks show that GeoDecider outperforms representative baselines. Further analysis demonstrates that the proposed framework produces geologically interpretable predictions while achieving a better trade-off between classification performance and inference efficiency.

preprint, 2025, arXiv

GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation

Geometric problem solving constitutes a critical branch of mathematical reasoning, requiring precise analysis of shapes and spatial relationships. Current evaluations of geometric reasoning in vision-language models (VLMs) face limitations, including the risk of test data contamination from textbook-based benchmarks, overemphasis on final answers over reasoning processes, and insufficient diagnostic granularity. To address these issues, we present GeoBench, a hierarchical benchmark featuring four reasoning levels in geometric problem-solving: Visual Perception, Goal-Oriented Planning, Rigorous Theorem Application, and Self-Reflective Backtracking. Through six formally verified tasks generated via TrustGeoGen, we systematically assess capabilities ranging from attribute extraction to logical error correction. Experiments reveal that while reasoning models like OpenAI-o3 outperform general MLLMs, performance declines significantly with increasing task complexity. Key findings demonstrate that sub-goal decomposition and irrelevant premise filtering critically influence final problem-solving accuracy, whereas Chain-of-Thought prompting unexpectedly degrades performance in some tasks. These findings establish GeoBench as a comprehensive benchmark while offering actionable guidelines for developing geometric problem-solving systems.

preprint, 2024, arXiv

LMaaS: Exploring Pricing Strategy of Large Model as a Service for Communication

The next generation of communication is envisioned to be intelligent communication, which can replace traditional symbolic communication by extracting and transmitting highly condensed semantic information, considering both source and channel, with high efficiency. Recently popular large models such as GPT4 and boosting learning techniques lay a solid foundation for intelligent communication and encourage its practical deployment in the near future. Given the "train once, use widely" characteristic of these multimodal large language models, we argue that a pay-as-you-go service mode is suitable in this context, referred to as Large Model as a Service (LMaaS). However, the trading and pricing problem is quite complex with heterogeneous and dynamic customer environments, which makes the pricing optimization problem hard to solve with readily available solutions. In this paper, we aim to fill this gap and formulate LMaaS market trading as a Stackelberg game with two steps. In the first step, we optimize the seller's pricing decision and propose an Iterative Model Pricing (IMP) algorithm that optimizes the prices of large models iteratively by reasoning about customers' future rental decisions, which achieves a near-optimal pricing solution. In the second step, we optimize customers' selection decisions by designing a robust selecting and renting (RSR) algorithm, which is guaranteed to be optimal with rigorous theoretical proof. Extensive experiments confirm the effectiveness and robustness of our algorithms.

preprint, 2024, arXiv

Synthetic Data Generation in Low-Resource Settings via Fine-Tuning of Large Language Models

The in-context learning ability of large language models (LLMs) enables them to generalize to novel downstream tasks with relatively few labeled examples. However, they require enormous computational resources to be deployed. Alternatively, smaller models can solve specific tasks if fine-tuned with enough labeled examples. These examples, however, are expensive to obtain. In pursuit of the best of both worlds, we study synthetic data generation of fine-tuning training data via fine-tuned teacher LLMs to improve the downstream performance of much smaller models. In four text classification and two text generation tasks, we find that both data generation and annotation dramatically improve the respective downstream model's performance, occasionally necessitating only a minor fraction of the original training dataset.

preprint, 2024, arXiv

Unlocking the Potential of Large Language Models for Explainable Recommendations

Generating user-friendly explanations regarding why an item is recommended has become increasingly common, largely due to advances in language generation technology, which can enhance user trust and facilitate more informed decision-making when using online services. However, existing explainable recommendation systems focus on using small-size language models. It remains uncertain what impact replacing the explanation generator with the recently emerging large language models (LLMs) would have. Can we expect unprecedented results? In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework aimed at further boosting the explanation quality by employing LLMs. Unlike most existing LLM-based recommendation works, a key characteristic of LLMXRec is its emphasis on the close collaboration between previous recommender models and LLM-based explanation generators. Specifically, by adopting several key fine-tuning techniques, including parameter-efficient instructing tuning and personalized prompt techniques, controllable and fluent explanations can be well generated to achieve the goal of explanation recommendation. Most notably, we provide three different perspectives to evaluate the effectiveness of the explanations. Finally, we conduct extensive experiments over several benchmark recommender models and publicly available datasets. The experimental results not only yield positive results in terms of effectiveness and efficiency but also uncover some previously unknown outcomes. To facilitate further explorations in this area, the full code and detailed original results are open-sourced at https://github.com/GodFire66666/LLM_rec_explanation/.

preprint, 2022, arXiv

A Comparative Study of Deep Reinforcement Learning-based Transferable Energy Management Strategies for Hybrid Electric Vehicles

Deep reinforcement learning-based energy management strategies (EMS) have become a promising solution for hybrid electric vehicles (HEVs). When the driving cycle changes, the neural network must be retrained, which is a time-consuming and laborious task. A more efficient way of obtaining an EMS is to combine deep reinforcement learning (DRL) with transfer learning, which transfers knowledge from one domain to a new domain so that the network in the new domain converges quickly. This work compares different DRL exploration methods, including adding action space noise and parameter space noise, in the transfer learning process. Results indicate that the network with added parameter space noise is more stable and converges faster than the others. In conclusion, the best exploration method for transferable EMS is to add noise in the parameter space, while the combination of action space noise and parameter space noise generally performs poorly. Our code is available at https://github.com/BIT-XJY/RL-based-Transferable-EMS.git.

preprint, 2022, arXiv

A novel time delay estimation algorithm of acoustic pyrometry for furnace

Acoustic pyrometry is a non-contact measurement technology for monitoring furnace combustion reactions, diagnosing energy loss due to incomplete combustion, and ensuring safe production. The accuracy of the time-of-flight (TOF) estimation in acoustic pyrometry directly affects the reliability of the furnace temperature measurement. This paper presents a novel TOF (i.e., time delay) estimation algorithm based on digital lock-in filtering (DLF). The time-frequency relationship between the first harmonic of the acoustic signal and the moment at which the characteristic frequency is applied is established through digital lock-in and low-pass filtering techniques. An accurate TOF estimate is obtained by extracting and comparing the times at which the characteristic frequency occurs in the received and source acoustic signals. The computational error analysis indicates that the proposed algorithm is more accurate than the classical generalized cross-correlation (GCC) algorithm and that its computational effort is reduced to half that of GCC. The method can therefore improve furnace temperature measurement in terms of both computational effort and accuracy, which are vital parameters in furnace combustion control, and it offers a new approach to time delay estimation in acoustic pyrometry for furnaces.
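
For orientation, the cross-correlation baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration of plain (unweighted) cross-correlation, the simplest member of the generalized cross-correlation family, and not the paper's DLF algorithm. The function name, sampling rate, delay, and noise level are all hypothetical values chosen for the example.

```python
import numpy as np

def estimate_delay_samples(source, received):
    """Lag (in samples) at which the cross-correlation of the two signals peaks."""
    # Full cross-correlation: one value for every possible shift of
    # `received` relative to `source`.
    corr = np.correlate(received, source, mode="full")
    # Lag associated with each entry of `corr`, starting at -(len(source) - 1).
    lags = np.arange(-(len(source) - 1), len(received))
    return int(lags[np.argmax(corr)])

# Synthetic check; every number below is made up for illustration.
rng = np.random.default_rng(0)
fs = 10_000          # hypothetical sampling rate in Hz
true_delay = 37      # hypothetical delay in samples
source = rng.standard_normal(1000)
# Delay the source by padding with zeros, then add a little measurement noise.
received = np.concatenate([np.zeros(true_delay), source])[: len(source)]
received = received + 0.05 * rng.standard_normal(len(source))

delay = estimate_delay_samples(source, received)
tof_seconds = delay / fs  # time of flight implied by the estimated lag
```

A broadband source is used here because its correlation peak is sharp; with a narrowband tone the peak becomes ambiguous, which is one reason alternative estimators are studied.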

preprint, 2022, arXiv

An Ensemble Learning Framework for Vehicle Trajectory Prediction in Interactive Scenarios

Precisely modeling interactions and accurately predicting trajectories of surrounding vehicles are essential to the decision-making and path-planning of intelligent vehicles. This paper proposes a novel framework based on ensemble learning to improve the performance of trajectory predictions in interactive scenarios. The framework is termed Interactive Ensemble Trajectory Predictor (IETP). IETP assembles interaction-aware trajectory predictors as base learners to build an ensemble learner. Firstly, each base learner in IETP observes historical trajectories of vehicles in the scene. Then each base learner handles interactions between vehicles to predict trajectories. Finally, an ensemble learner is built to predict trajectories by applying two ensemble strategies on the predictions from all base learners. Predictions generated by the ensemble learner are final outputs of IETP. In this study, three experiments using different data are conducted based on the NGSIM dataset. Experimental results show that IETP improves the predicting accuracy and decreases the variance of errors compared to base learners. In addition, IETP exceeds baseline models with 50% of the training data, indicating that IETP is data-efficient. Moreover, the implementation of IETP is publicly available at https://github.com/BIT-Jack/IETP.

preprint, 2022, arXiv

Asynchronous Decentralized Federated Learning for Collaborative Fault Diagnosis of PV Stations

Due to the different losses caused by various photovoltaic (PV) array faults, accurate diagnosis of fault types is becoming increasingly important. Compared with a single one, multiple PV stations collect sufficient fault samples, but their data is not allowed to be shared directly due to potential conflicts of interest. Therefore, federated learning can be exploited to train a collaborative fault diagnosis model. However, the modeling efficiency is seriously affected by the model update mechanism since each PV station has a different computing capability and amount of data. Moreover, for the safe and stable operation of the PV system, the robustness of collaborative modeling must be guaranteed rather than simply being processed on a central server. To address these challenges, a novel asynchronous decentralized federated learning (ADFL) framework is proposed. Each PV station not only trains its local model but also participates in collaborative fault diagnosis by exchanging model parameters to improve the generalization without losing accuracy. The global model is aggregated distributedly to avoid central node failure. By designing the asynchronous update scheme, the communication overhead and training time are greatly reduced. Both the experiments and numerical simulations are carried out to verify the effectiveness of the proposed method.

preprint, 2022, arXiv

Context-Aware Video Reconstruction for Rolling Shutter Cameras

With the ubiquity of rolling shutter (RS) cameras, it is becoming increasingly attractive to recover the latent global shutter (GS) video from two consecutive RS frames, which also places a higher demand on realism. Existing solutions, using deep neural networks or optimization, achieve promising performance. However, these methods generate intermediate GS frames through image warping based on the RS model, which inevitably results in black holes and noticeable motion artifacts. In this paper, we alleviate these issues by proposing a context-aware GS video reconstruction architecture. It offers advantages such as occlusion reasoning, motion compensation, and temporal abstraction. Specifically, we first estimate the bilateral motion field so that the pixels of the two RS frames are warped to a common GS frame accordingly. Then, a refinement scheme is proposed to guide the GS frame synthesis along with bilateral occlusion masks to produce high-fidelity GS video frames at arbitrary times. Furthermore, we derive an approximated bilateral motion field model, which can serve as an alternative to provide a simple but effective GS frame initialization for related tasks. Experiments on synthetic and real data show that our approach achieves superior performance over state-of-the-art methods in terms of objective metrics and subjective visual quality. Code is available at https://github.com/GitCVfb/CVR.

preprint, 2022, arXiv

Contrastive Cross-domain Recommendation in Matching

Cross-domain recommendation (CDR) aims to provide better recommendation results in the target domain with the help of the source domain, and is widely used and explored in real-world systems. However, CDR in the matching (i.e., candidate generation) module struggles with data sparsity and popularity bias in both representation learning and knowledge transfer. In this work, we propose a novel Contrastive Cross-Domain Recommendation (CCDR) framework for CDR in matching. Specifically, we build a large, diversified preference network to capture multiple kinds of information reflecting users' diverse interests, and design one intra-domain contrastive learning (intra-CL) task and three inter-domain contrastive learning (inter-CL) tasks for better representation learning and knowledge transfer. The intra-CL enables more effective and balanced training inside the target domain via a graph augmentation, while the inter-CL builds different types of cross-domain interactions from user, taxonomy, and neighbor aspects. In experiments, CCDR achieves significant improvements on both offline and online evaluations in a real-world system. We have deployed CCDR on WeChat Top Stories, where it affects many users. The source code is available at https://github.com/lqfarmer/CCDR.

preprint, 2022, arXiv

Cross-scale Attention Guided Multi-instance Learning for Crohn's Disease Diagnosis with Pathological Images

Multi-instance learning (MIL) is widely used in the computer-aided interpretation of pathological Whole Slide Images (WSIs) to solve the lack of pixel-wise or patch-wise annotations. Often, this approach directly applies "natural image driven" MIL algorithms which overlook the multi-scale (i.e. pyramidal) nature of WSIs. Off-the-shelf MIL algorithms are typically deployed on a single-scale of WSIs (e.g., 20x magnification), while human pathologists usually aggregate the global and local patterns in a multi-scale manner (e.g., by zooming in and out between different magnifications). In this study, we propose a novel cross-scale attention mechanism to explicitly aggregate inter-scale interactions into a single MIL network for Crohn's Disease (CD), which is a form of inflammatory bowel disease. The contribution of this paper is two-fold: (1) a cross-scale attention mechanism is proposed to aggregate features from different resolutions with multi-scale interaction; and (2) differential multi-scale attention visualizations are generated to localize explainable lesion patterns. By training ~250,000 H&E-stained Ascending Colon (AC) patches from 20 CD patient and 30 healthy control samples at different scales, our approach achieved a superior Area under the Curve (AUC) score of 0.8924 compared with baseline models. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.

preprint, 2022, arXiv

Deep Unified Representation for Heterogeneous Recommendation

Recommender systems have been widely studied in both academia and industry. Previous works mainly focus on homogeneous recommendation, and little progress has been made on heterogeneous recommender systems. However, heterogeneous recommendations, e.g., recommending different types of items including products, videos, and celebrity shopping notes, among many others, are dominant nowadays. State-of-the-art methods are incapable of leveraging attributes from different types of items and thus suffer from data sparsity problems. It is also quite challenging to jointly represent items with different feature spaces. To tackle this problem, we propose a kernel-based neural network, namely deep unified representation (or DURation) for heterogeneous recommendation, to jointly model unified representations of heterogeneous items while preserving the topological structures of their original feature spaces. Theoretically, we prove the representation ability of the proposed model. Besides, we conduct extensive experiments on real-world datasets. Experimental results demonstrate that with the unified representation, our model achieves remarkable improvement (e.g., 4.1% ~ 34.9% lift by AUC score and 3.7% lift by online CTR) over existing state-of-the-art models.

preprint, 2022, arXiv

Early systems change necessary for catalyzing long-term sustainability in a post-2030 agenda

Progress to-date towards the Sustainable Development Goals (SDGs) has fallen short of expectations and is unlikely to fully meet 2030 targets. Despite the little chance of imminent success, past assessments have mostly focused on short- and medium-term evaluations, limiting the ability to explore the longer-term effects of systemic interactions with time lags and delay. Here we undertake global systems modelling with a longer-term view than previous assessments to explore the drivers of sustainability progress and how they could emerge by 2030, 2050, and 2100 under different development pathways and towards quantitative targets. We find that early planning for systems change to shift from business-as-usual to more sustainable pathways is important for accelerating progress towards increasingly ambitious targets by 2030, 2050, and 2100. These findings indicate the importance of adopting longer-term timeframes and pathways to ensure that the necessary pre-conditions are in place for sustainability beyond the current 2030 Agenda.

preprint, 2022, arXiv

Gain-gain and gain-lossless PT-symmetry broken from PT-phase diagram

Parity-time (PT) symmetry and its breaking in micro/nano photonic structures have been investigated extensively, as they bring new opportunities to control the flow of light based on non-Hermitian optics. Previous studies have focused on PT-symmetry breaking in loss-loss or gain-loss coupled systems. Here, we theoretically predict gain-gain and gain-lossless PT-symmetry breaking from the phase diagram, in which the boundaries between the PT-symmetric and PT-broken phases can be clearly defined in the full parameter space including gain, lossless, and loss. For specific micro/nano photonic structures, such as coupled waveguides, we give the transmission matrices for each region of the phase space, which can be used for beam splitting. Taking coupled waveguides as an example, we obtain periodic energy exchange in the PT-symmetric phase and exponential gain or loss in the PT-broken phase, consistent with the phase diagram. This scenario, which gives a full view of PT symmetry and its breaking, will not only deepen the understanding of the fundamental physics but also promote breakthroughs in photonic applications such as optical routers and beam splitters.
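
The phase boundary described in this abstract can be illustrated with a textbook two-mode coupled-mode model. The sketch below is an assumption for illustration and not necessarily the model used in the paper: two waveguides with gain/loss rates `gamma1` and `gamma2` (positive meaning gain under the convention chosen here) and real coupling `kappa`. After subtracting the common mean gain, the eigenvalues are real when `kappa > |gamma1 - gamma2| / 2` and form an imaginary pair otherwise, so the boundary depends only on the gain difference and applies equally to gain-loss, gain-lossless, and gain-gain pairs.

```python
import numpy as np

def pt_phase(gamma1, gamma2, kappa):
    """Classify a hypothetical two-mode coupled system as 'symmetric' or 'broken'.

    Assumed model: H = [[i*gamma1, kappa], [kappa, i*gamma2]], whose eigenvalues are
    i*(gamma1 + gamma2)/2 +/- sqrt(kappa**2 - ((gamma1 - gamma2)/2)**2).
    """
    H = np.array([[1j * gamma1, kappa], [kappa, 1j * gamma2]], dtype=complex)
    eigvals = np.linalg.eigvals(H)
    # Remove the common gain offset shared by both modes.
    shifted = eigvals - 0.5j * (gamma1 + gamma2)
    # Real shifted eigenvalues: periodic energy exchange (symmetric phase).
    # Imaginary pair: one mode grows and one decays relative to the mean (broken phase).
    broken = not np.allclose(shifted.imag, 0.0, atol=1e-9)
    return ("broken" if broken else "symmetric"), eigvals

# Illustrative parameter choices (made up for this example).
gain_lossless, _ = pt_phase(0.5, 0.0, 1.0)   # |0.5 - 0.0| / 2 = 0.25 < kappa
gain_gain, _ = pt_phase(3.0, 0.5, 1.0)       # |3.0 - 0.5| / 2 = 1.25 > kappa
gain_loss, _ = pt_phase(1.0, -1.0, 2.0)      # |1.0 + 1.0| / 2 = 1.00 < kappa
```

Parameters exactly on the boundary (an exceptional point) are avoided here, because the numerical eigenvalues are ill-conditioned there and the tolerance-based test would be unreliable.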

preprint, 2022, arXiv

Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification

Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain. Existing studies on this task pay more attention to the sequence modeling of sentences while largely ignoring the rich domain-invariant semantics embedded in graph structures (i.e., the part-of-speech tags and dependency relations). As an important aspect of exploring characteristics of language comprehension, adaptive graph representations have played an essential role in recent years. To this end, in this paper, we explore the possibility of learning invariant semantic features from graph-like structures in CDSC. Specifically, we present the Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs. More specifically, we first propose a POS-Transformer module to extract sequential semantic features from the word sequences as well as the part-of-speech tags. Then, we design a Hybrid Graph Attention (HGAT) module to generate syntax-based semantic features by considering the transferable dependency relations. Finally, we devise an Integrated aDaptive Strategy (IDS) to guide the joint learning process of both modules. Extensive experiments on four public datasets indicate that GAST achieves comparable effectiveness to a range of state-of-the-art models.

preprint, 2022, arXiv

Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffic Environments

An efficient and reliable multi-agent decision-making system is in high demand for the safe and efficient operation of connected autonomous vehicles in intelligent transportation systems. Current research mainly focuses on Deep Reinforcement Learning (DRL) methods. However, DRL methods applied to interactive traffic scenarios struggle to represent the mutual effects between vehicles and to model dynamic traffic environments, because the environment representation lacks interactive information; this leads to low accuracy in generating cooperative decisions. To tackle these difficulties, this research proposes a framework that supports different Graph Reinforcement Learning (GRL) methods for decision-making and compares their performance in interactive driving scenarios. GRL methods combine a Graph Neural Network (GNN) with DRL to generate better decisions in interactive scenarios of autonomous vehicles: the GNN extracts the features of the interactive scenarios, and the DRL framework generates cooperative behaviors. Several GRL approaches are summarized and implemented in the proposed framework. To evaluate them, an interactive driving scenario on a highway with two ramps is constructed, and simulated experiments are carried out on the SUMO platform. Finally, the results are analyzed from multiple perspectives and dimensions to compare the characteristics of the different GRL approaches in intelligent transportation scenarios. Results show that the GNN represents the interaction between vehicles well and that the combination of GNN and DRL improves the generation of lane-change behaviors. The source code of our work can be found at https://github.com/Jacklinkk/TorchGRL.

preprint, 2022, arXiv

HGKT: Introducing Hierarchical Exercise Graph for Knowledge Tracing

Knowledge tracing (KT), which aims at predicting learners' knowledge mastery, plays an important role in computer-aided educational systems. In recent years, many deep learning models have been applied to the KT task and have shown promising results. However, limitations still exist. Most existing methods simplify the exercising records into knowledge sequences, which fails to exploit the rich information contained in the exercises. Besides, the existing diagnosis results of knowledge tracing are not convincing enough since they neglect prior relations between exercises. To solve the above problems, we propose a hierarchical graph knowledge tracing model called HGKT to explore the latent hierarchical relations between exercises. Specifically, we introduce the concept of problem schema to construct a hierarchical exercise graph that can model the exercise learning dependencies. Moreover, we employ two attention mechanisms to highlight the important historical states of learners. In the testing stage, we present a K&S diagnosis matrix that can trace the transition of mastery of knowledge and problem schema, which can be more easily applied to different applications. Extensive experiments show the effectiveness and interpretability of our proposed models.

preprint, 2022, arXiv

New geometric constants of isosceles orthogonal type

Based on the parallelogram law and isosceles orthogonality, we define a new orthogonal geometric constant. We first discuss some basic properties of this new constant. Next, we consider the relation between the constant and the uniformly non-square property. Moreover, a generalized constant is also introduced and some basic properties are presented. It is shown that, for a normed space, the constant value is equal to 1 if and only if the norm can be induced by the inner product. Finally, we verify that this constant is closely related to the well-known geometric constants through some inequalities.
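
For readers unfamiliar with the two notions this abstract builds on, their standard definitions in a real normed space $X$ are recalled below. The notation $\perp_I$ is a common convention and is an assumption here; the abstract does not state the paper's constant itself, so it is not reproduced.

```latex
% Parallelogram law: by the Jordan-von Neumann theorem, it holds for all x, y
% exactly when the norm is induced by an inner product.
\[
  \|x+y\|^{2} + \|x-y\|^{2} = 2\|x\|^{2} + 2\|y\|^{2}
  \qquad \text{for all } x, y \in X .
\]
% Isosceles orthogonality (in the sense of James): x and y are orthogonal
% when the two diagonals of the parallelogram they span have equal length.
\[
  x \perp_{I} y \iff \|x+y\| = \|x-y\| .
\]
```

In an inner product space, $\|x+y\| = \|x-y\|$ is equivalent to $\langle x, y \rangle = 0$, so isosceles orthogonality reduces to the usual notion there.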

preprint, 2022, arXiv

Nonlinear interferometry beyond classical limit facilitated by cyclic dynamics

Time-reversed evolution has substantial implications in physics, including prominent applications in refocusing of classical waves or spins and fundamental research such as quantum information scrambling. In quantum metrology, nonlinear interferometry based on time reversal protocols supports entanglement-enhanced measurements without requiring low-noise detection. Despite the broad interest in time reversal, it remains challenging to reverse the quantum dynamics of an interacting many-body system, as is typically realized by an (effective) sign-flip of the system's Hamiltonian. Here, we present an approach that is broadly applicable to cyclic systems for implementing nonlinear interferometry without invoking time reversal. Inspired by the observation that the time-reversed dynamics drives a system back to its starting point, we propose to accomplish the same by slaving the system to travel along a 'closed-loop' instead of explicitly tracing back its antecedent path. Utilizing the quasi-periodic spin mixing dynamics in a three-mode $^{87}$Rb atom spinor condensate, we implement such a 'closed-loop' nonlinear interferometer and achieve a metrological gain of $3.87_{-0.95}^{+0.91}$ decibels over the classical limit for a total of 26500 atoms. Our approach unlocks the high potential of nonlinear interferometry by allowing the dynamics to penetrate into the deep nonlinear regime, which gives rise to highly entangled non-Gaussian states. The idea of bypassing time reversal may open up new opportunities in the experimental investigation of problems that are typically studied by using time reversal protocols.

preprint, 2022, arXiv

On the stability of orthogonal additivity in $β$-homogeneous $F$-spaces and quasi-Banach spaces

In this paper, we study the stability of the orthogonal equation, which is closely related to the results of Wlodzimierz Fechner and Justyna Sikorska in 2010. The difference is that we consider target spaces with the $β$-homogeneous norm and quasi-norm. Overcoming the difficulties posed by the $β$-homogeneous norm and quasi-norm, we obtain some new results.

preprint, 2022, arXiv

Planar Hall effect induced spin rectification effect and its strong impact on spin pumping measurements

Spin pumping is a technique widely used to generate the pure spin current and characterize the spin-charge conversion in various systems. The reversing sign of the symmetric Lorentzian charge current with respect to opposite magnetic field is generally accepted as the key criterion to identify its pure spin current origin. However, we herein find that the rectified voltage due to the planar Hall effect can exhibit similar spurious signal, complicating and even misleading the analysis. The distribution of microwave magnetic field and induction current has strong influence on the magnetic field symmetry and lineshape of the obtained signal. We further demonstrate a geometry where the spin-charge conversion and the rectified voltage can be readily distinguished with a straightforward symmetry analysis.

preprint, 2022, arXiv

Relational Memory Augmented Language Models

We present a memory-augmented approach to condition an autoregressive language model on a knowledge graph. We represent the graph as a collection of relation triples and retrieve relevant relations for a given context to improve text generation. Experiments on WikiText-103, WMT19, and enwik8 English datasets demonstrate that our approach produces a better language model in terms of perplexity and bits per character. We also show that relational memory improves coherence, is complementary to token-based memory, and enables causal interventions. Our model provides a simple yet effective way to combine an autoregressive language model with a knowledge graph for a more coherent and logical generation.

preprint, 2022, arXiv

Scenario Generation for Cooling, Heating, and Power Loads Using Generative Moment Matching Networks

Scenario generation for cooling, heating, and power loads is of great significance for the economic operation and stability analysis of integrated energy systems. In this paper, a novel deep generative network is proposed to model cooling, heating, and power load curves based on a generative moment matching network (GMMN), in which an auto-encoder transforms high-dimensional load curves into low-dimensional latent variables and the maximum mean discrepancy serves as the similarity metric between the generated samples and the real samples. After the model is trained, new scenarios are generated by feeding Gaussian noise to the scenario generator of the GMMN. Unlike explicit density models, the proposed GMMN does not need to artificially assume the probability distribution of the load curves, which leads to stronger universality. The simulation results show that the GMMN not only fits the probability distribution of multi-class load curves well, but also accurately captures the shape (e.g., large peaks, fast ramps, and fluctuation), frequency-domain characteristics, and temporal-spatial correlations of cooling, heating, and power loads. Furthermore, the energy consumption of generated samples closely resembles that of real samples.
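
The maximum mean discrepancy (MMD) that this abstract uses as a similarity metric can be sketched as follows. This is a minimal illustration of the standard biased estimate of squared MMD with a Gaussian kernel, not the paper's training code; the function names, bandwidth, sample sizes, and dimensions are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=2.0):
    """Biased (V-statistic) estimate of squared MMD between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())

# Toy data standing in for low-dimensional latent codes (values are made up).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(300, 4))
similar = rng.normal(0.0, 1.0, size=(300, 4))   # same distribution as `real`
shifted = rng.normal(2.0, 1.0, size=(300, 4))   # mean shifted away from `real`

mmd_similar = mmd_squared(real, similar)
mmd_shifted = mmd_squared(real, shifted)
```

The estimate is close to zero when the two sample sets come from the same distribution and grows as they diverge, which is what makes it usable as a training signal for a sample generator. The bandwidth matters: if it is much smaller than typical distances between points, the kernel matrix is nearly diagonal and the estimate stops discriminating.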

preprint2022arXiv

The CORSMAL benchmark for the prediction of the properties of containers

The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, specific tasks and performance measures. We conduct an in-depth comparative analysis of methods that used this framework and audio-only or vision-only baselines designed from related works. Based on this analysis, we can conclude that audio-only and audio-visual classifiers are suitable for the estimation of the type and amount of the content using different types of convolutional neural networks, combined with either recurrent neural networks or a majority voting strategy, whereas computer vision methods are suitable to determine the capacity of the container using regression and geometric approaches. Classifying the content type and level using only audio achieves a weighted average F1-score up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches and estimating the filling mass with audio-visual multi-stage approaches reach up to 65% weighted average capacity and mass scores. These results show that there is still room for improvement on the design of new methods. These new methods can be ranked and compared on the individual leaderboards provided by our open framework.

preprint2022arXiv

Ulam stability of an additive-quadratic functional equation in F-space and quasi-Banach spaces

By adopting the direct method and the fixed point method, we prove the Hyers-Ulam stability of the following additive-quadratic functional equation \begin{equation} f(x+y, z+w)+f(x-y, z-w)-2 f(x, z)-2 f(x, w)=0 \end{equation} in $\beta$-homogeneous $F$-spaces and quasi-Banach spaces. The difference here is that we consider target spaces equipped with a $\beta$-homogeneous norm or a quasi-norm. Overcoming the bottlenecks posed by the $\beta$-homogeneous norm and the quasi-norm, we obtain some new results.

preprint2021arXiv

A congested schedule-based dynamic transit passenger flow estimator using stop count data

A dynamic transit flow estimation model based on congested schedule-based transit equilibrium assignment is proposed using observations from stop count data. A solution algorithm with polynomial computational complexity is proposed for the mathematical program with schedule-based transit equilibrium constraints (MPEC). The equilibrium constraints corresponding to the schedule-based hyperpath flow are modified from the literature to fit into an estimation problem. Computational experiments are conducted first to verify the methodology with two synthetic data sets (one of which is Sioux Falls), followed by a validation of the method using bus data from Qingpu District in Shanghai, China, with 4 bus lines, 120 segments, 55 bus stops, and 120 one-minute intervals. The estimation model converged to 0.005 tolerance of relative change in 10 iterations. The estimated average of segment flows is only 2.5% off from the average of the observed segment flows; relative errors among segments are 42.5%.

preprint2021arXiv

A deep learning based known plaintext attack method for chaotic cryptosystem

In this paper, we propose a known-plaintext attack (KPA) method based on deep learning for traditional chaotic encryption schemes. We employ a convolutional neural network to learn the operation mechanism of a chaotic cryptosystem, and use the trained network as the final decryption system. To evaluate the attack performance of different networks on different chaotic cryptosystems, we adopt two neural networks to perform known-plaintext attacks on two distinct chaotic encryption schemes. The experimental results demonstrate the potential of deep learning-based methods for known-plaintext attacks against chaotic cryptosystems. Unlike previous known-plaintext attack methods, which were usually limited to a specific chaotic cryptosystem, a deep learning-based approach allows a neural network to be applied to the cryptanalysis of various chaotic cryptosystems, and several different networks can be designed for this purpose. This paper provides a new idea for the cryptanalysis of chaotic image encryption algorithms.

preprint2021arXiv

A Hybrid Bandit Model with Visual Priors for Creative Ranking in Display Advertising

Creatives play an important role in e-commerce for exhibiting products. Sellers usually create multiple creatives for comprehensive demonstrations, so it is crucial to display the most appealing design to maximize the Click-Through Rate (CTR). For this purpose, modern recommender systems dynamically rank creatives when a product is proposed for a user. However, this task suffers from a more severe cold-start problem than conventional product recommendation. In this paper, we propose a hybrid bandit model with visual priors which first makes predictions with a visual evaluation, and then naturally evolves to focus on the specialities through the hybrid bandit model. Our contributions are three-fold: 1) We present a visual-aware ranking model (called VAM) that incorporates a list-wise ranking loss for ordering the creatives according to their visual appearance. 2) Regarding visual evaluations as a prior, the hybrid bandit model (called HBM) is proposed to evolve consistently to make better posteriori estimations by taking more observations into consideration for online scenarios. 3) A first large-scale creative dataset, CreativeRanking, is constructed, which contains over 1.7M creatives of 500k products as well as their real impression and click data. Extensive experiments have also been conducted on both our dataset and the public Mushroom dataset, demonstrating the effectiveness of the proposed method.

preprint2021arXiv

An Odor Labeling Convolutional Encoder-Decoder for Odor Sensing in Machine Olfaction

Deep learning methods have been widely applied to visual and acoustic technology. In this paper, we propose an odor labeling convolutional encoder-decoder (OLCE) for odor identification in machine olfaction. OLCE comprises a convolutional neural network encoder and decoder in which the encoder output is constrained to odor labels. An electronic nose was used to collect gas responses following a normative experimental procedure. Several evaluation indexes were calculated to evaluate the algorithm's effectiveness: accuracy 92.57%, precision 92.29%, recall rate 92.06%, F1-Score 91.96%, and Kappa coefficient 90.76%. We also compared the model with some algorithms used in machine olfaction. The comparison demonstrated that OLCE had the best performance among these algorithms. Some perspectives on machine olfaction are also discussed.

preprint2021arXiv

Echo state graph neural networks with analogue random resistor arrays

Recent years have witnessed an unprecedented surge of interest, from social networks to drug discovery, in learning representations of graph-structured data. However, graph neural networks, the machine learning models for handling graph-structured data, face significant challenges when running on conventional digital hardware, including von Neumann bottleneck incurred by physically separated memory and processing units, slowdown of Moore's law due to transistor scaling limit, and expensive training cost. Here we present a novel hardware-software co-design, the random resistor array-based echo state graph neural network, which addresses these challenges. The random resistor arrays not only harness low-cost, nanoscale and stackable resistors for highly efficient in-memory computing using simple physical laws, but also leverage the intrinsic stochasticity of dielectric breakdown to implement random projections in hardware for an echo state network that effectively minimizes the training cost thanks to its fixed and random weights. The system demonstrates state-of-the-art performance on both graph classification using the MUTAG and COLLAB datasets and node classification using the CORA dataset, achieving 34.2x, 93.2x, and 570.4x improvement of energy efficiency and 98.27%, 99.46%, and 95.12% reduction of training cost compared to conventional graph learning on digital hardware, respectively, which may pave the way for the next generation AI system for graph learning.

preprint2021arXiv

Faster State Preparation across Quantum Phase Transition Assisted by Reinforcement Learning

An energy gap develops near the quantum critical point of a quantum phase transition in a finite many-body (MB) system, facilitating the ground state transformation by adiabatic parameter change. In real application scenarios, however, the efficacy of such a protocol is compromised by the need to balance finite system lifetime with adiabaticity, as exemplified in a recent experiment that prepares a three-mode balanced Dicke state near deterministically [PNAS 115, 6381 (2018)]. Instead of tracking the instantaneous ground state as unanimously required for most adiabatic crossing, this work reports a faster sweeping policy taking advantage of excited level dynamics. It is obtained based on deep reinforcement learning (DRL) from a multi-step training scheme we develop. In the absence of loss, a fidelity $\ge 99\%$ between the prepared and the target Dicke state is achieved over a small fraction of the adiabatically required time. When loss is included, training is carried out according to an operational benchmark, the interferometric sensitivity of the prepared state instead of fidelity, leading to better sensitivity in about half of the previously reported time. Implemented in a Bose-Einstein condensate of $\sim 10^4$ $^{87}$Rb atoms, the balanced three-mode Dicke state exhibiting an improved number squeezing of $13.02\pm0.20$ dB is observed within 766 ms, highlighting the potential of DRL for quantum dynamics control and quantum state preparation in interacting MB systems.

preprint2021arXiv

Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

Predicting human motion from a historical pose sequence is crucial for a machine to succeed in intelligent interactions with humans. One aspect that has been overlooked so far is that how we represent the skeletal pose has a critical impact on the prediction results. Yet no prior work compares different pose representation schemes. We conduct an in-depth study of various pose representations with a focus on their effects on the motion prediction task. Moreover, recent approaches build upon off-the-shelf RNN units for motion prediction. These approaches process the input pose sequence sequentially and inherently have difficulties in capturing long-term dependencies. In this paper, we propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction which simultaneously models local motion contexts and a global context. We further explore a geodesic loss and a forward kinematics loss for the motion prediction task, which have more geometric significance than the widely employed L2 loss. Interestingly, we applied our method to a range of articulated objects including human, fish, and mouse. Empirical results show that our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency, such as retaining natural human-like motions over 50-second predictions. Our code is released.
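To illustrate why a geodesic loss carries more geometric meaning than an L2 loss on rotation parameters, the sketch below computes one common definition of geodesic distance between two 3x3 rotation matrices: the angle of the relative rotation, obtained from the trace. This is an assumption about the general technique, not the exact loss used in the paper; `rot_z` is a hypothetical helper for building test rotations.

```python
import math


def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis, as a 3x3 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_distance(r_pred, r_true):
    """Angle (radians) of the relative rotation r_pred^T r_true.

    trace(r_pred^T r_true) equals the elementwise product sum of the two
    matrices, and for a rotation by angle phi the trace is 1 + 2*cos(phi).
    """
    trace = sum(r_pred[i][j] * r_true[i][j] for i in range(3) for j in range(3))
    cos_angle = (trace - 1.0) / 2.0
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.acos(cos_angle)


predicted = rot_z(0.3)
target = rot_z(0.8)
angle_error = geodesic_distance(predicted, target)
```

The result depends only on the relative rotation between prediction and target, so it is unaffected by how the joint angles happen to be parameterized.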

preprint2021arXiv

LadRa-Net: Locally-Aware Dynamic Re-read Attention Net for Sentence Semantic Matching

Sentence semantic matching requires an agent to determine the semantic relation between two sentences, which is widely used in various natural language tasks, such as Natural Language Inference (NLI), Paraphrase Identification (PI), and so on. Much recent progress has been made in this area, especially with attention-based methods and methods based on pre-trained language models. However, most of these methods focus on all the important parts in sentences in a static way and only emphasize how important the words are to the query, inhibiting the ability of the attention mechanism. In order to overcome this problem and boost the performance of the attention mechanism, we propose a novel dynamic re-read attention, which can pay close attention to one small region of sentences at each step and re-read the important parts for better sentence representations. Based on this attention variation, we develop a novel Dynamic Re-read Network (DRr-Net) for sentence semantic matching. Moreover, selecting one small region in dynamic re-read attention seems insufficient for sentence semantics, and employing pre-trained language models as input encoders will introduce incomplete and fragile representation problems. To this end, we extend DRr-Net to the Locally-Aware Dynamic Re-read Attention Net (LadRa-Net), in which the local structure of sentences is employed to alleviate the shortcoming of Byte-Pair Encoding (BPE) in pre-trained language models and boost the performance of dynamic re-read attention. Extensive experiments on two popular sentence semantic matching tasks demonstrate that DRr-Net can significantly improve the performance of sentence semantic matching. Meanwhile, LadRa-Net is able to achieve better performance by considering the local structures of sentences. In addition, it is exceedingly interesting that some discoveries in our experiments are consistent with some findings of psychological research.

preprint2021arXiv

Learning the Implicit Semantic Representation on Graph-Structured Data

Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole, while the implicit semantic associations behind highly complex interactions of graphs are largely unexploited. In this paper, we propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic-paths in graphs. Previous work has explored graph semantics via meta-paths. However, these methods mainly rely on explicit heterogeneous information that is hard to obtain in a large amount of graph-structured data. SGCN first breaks through this restriction by leveraging the semantic-paths dynamically and automatically during the node aggregating process. To evaluate our idea, we conduct sufficient experiments on several standard datasets, and the empirical results show the superior performance of our model.

preprint2021arXiv

Quality meets Diversity: A Model-Agnostic Framework for Computerized Adaptive Testing

Computerized Adaptive Testing (CAT) is emerging as a promising testing application in many scenarios, such as education, games and recruitment, which aims to diagnose the knowledge mastery levels of examinees on required concepts. It has the advantage of tailoring a personalized testing procedure for each examinee, selecting questions step by step depending on her performance. While there have been many efforts on developing CAT systems, existing solutions generally follow an inflexible model-specific fashion. That is, they need to observe a specific cognitive model which can estimate an examinee's knowledge levels and design the selection strategy according to the model estimation. In this paper, we study a novel model-agnostic CAT problem, where we aim to propose a flexible framework that can adapt to different cognitive models. Meanwhile, this work also addresses the problem of how to generate both high-quality and diverse questions simultaneously, which can give a comprehensive knowledge diagnosis for each examinee. Inspired by Active Learning, we propose a novel framework, namely Model-Agnostic Adaptive Testing (MAAT), for CAT, where we design three sophisticated modules: the Quality Module, the Diversity Module and the Importance Module. Extensive experimental results on two real-world datasets clearly demonstrate that MAAT can support CAT while guaranteeing both quality and diversity.

preprint2021arXiv

Universal relations between atomic dipolar relaxation and van der Waals interaction

Dipolar relaxation happens when one or both colliding atoms flip their spins exothermically inside a magnetic ($B$) field. This work reports precise measurements of dipolar relaxation in a Bose-Einstein condensate of ground state $^{87}$Rb atoms together with in-depth theoretical investigations. Previous perturbative treatments fail to explain our observations except at very small $B$-fields. By employing quantum defect theory based on analytic solutions of asymptotic van der Waals interaction $-C_6/R^6$ ($R$ being interatomic spacing), we significantly expand the applicable range of perturbative treatment. We find the $B$-dependent dipolar relaxation lineshapes are largely universal, determined by the coefficient $C_6$ and the associated $s$-wave scattering lengths $a_{\rm sc}$ of the states before and after spin flips. This universality, which applies generally to other atomic species as well, implicates potential controls of dipolar relaxation and related cold chemical reactions by tuning $a_{\rm sc}$.

preprint2020arXiv

A Study on Effects of Implicit and Explicit Language Model Information for DBLSTM-CTC Based Handwriting Recognition

Deep Bidirectional Long Short-Term Memory (DBLSTM) with a Connectionist Temporal Classification (CTC) output layer has been established as one of the state-of-the-art solutions for handwriting recognition. It is well known that a DBLSTM trained with a CTC objective function will learn both local character image dependency for character modeling and long-range contextual dependency for implicit language modeling. In this paper, we study the effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition by comparing the performance with and without an explicit language model in decoding. It is observed that even when one million lines of training sentences are used to train the DBLSTM, using an explicit language model is still helpful. To deal with such a large-scale training problem, a GPU-based training tool has been developed for CTC training of DBLSTM by using a mini-batch based epochwise Back Propagation Through Time (BPTT) algorithm.

preprint2020arXiv

A Survey on Contextual Embeddings

Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, we review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.

preprint2020arXiv

A Survey on Sensor Technologies for Unmanned Ground Vehicles

Unmanned ground vehicles (UGVs) have huge development potential in both civilian and military fields, and have become a focus of research in various countries. In addition, high-precision, high-reliability sensors are significant for UGVs' efficient operation. This paper presents a brief review of sensor technologies for UGVs. Firstly, characteristics of various sensors are introduced. Then the strengths and weaknesses of different sensors as well as their application scenarios are compared. Furthermore, sensor applications in some existing UGVs are summarized. Finally, the hotspots of sensor technologies are forecast to point out the development direction.

preprint2020arXiv

Accurate DOA Estimation Based on Real-Valued Singular Value Decomposition

In this paper, an accurate direction-of-arrival (DOA) estimator is developed based on the real-valued singular value decomposition (SVD) of the covariance matrix. A unitary transform is first applied to the complex-valued covariance matrix, and then SVD is performed on the resulting real-valued data matrix. The singular vector is then utilized with a weighted least squares (WLS) method to achieve DOA estimation. The performance of the proposed algorithm is compared with several state-of-the-art methods as well as the CRB. The results indicate the accuracy and effectiveness of the proposed method.

preprint2020arXiv

An end-to-end CNN framework for polarimetric vision tasks based on polarization-parameter-constructing network

Pixel-wise operations between polarimetric images are important for processing polarization information. Because such operations are lacking, the polarization information cannot be fully utilized in a convolutional neural network (CNN). In this paper, a novel end-to-end CNN framework for polarization vision tasks is proposed, which enables the networks to take full advantage of polarimetric images. The framework consists of two sub-networks: a polarization-parameter-constructing network (PPCN) and a task network. PPCN implements pixel-wise operations between images in the CNN form with 1x1 convolution kernels. It takes raw polarimetric images as input, and outputs polarization-parametric images to the task network so as to complete a vision task. By training together, the PPCN can learn to provide the most suitable polarization-parametric images for the task network and the dataset. Taking Faster R-CNN as the task network, the experimental results show that, compared with existing methods, the proposed framework achieves much higher mean-average-precision (mAP) in the object detection task.
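A 1x1 convolution is a per-pixel weighted sum across input channels, which is why it can express pixel-wise operations between polarimetric images. As a minimal sketch, the code below fixes the weights to the standard linear Stokes parameters; the assumption that the inputs are intensity images at polarizer angles 0, 45, 90 and 135 degrees, and the names `conv1x1` and `STOKES_WEIGHTS`, are illustrative and not taken from the paper, where the PPCN learns its weights instead.

```python
def conv1x1(channels, weights):
    """Apply a 1x1 convolution without bias.

    channels: list of input images, each a list of rows of floats.
    weights:  one row of per-channel weights for each output channel.
    Each output pixel is a weighted sum of the input channels at that pixel.
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [sum(w * ch[r][c] for w, ch in zip(weight_row, channels))
             for c in range(width)]
            for r in range(height)
        ]
        for weight_row in weights
    ]


# Fixed weights for inputs ordered as (I0, I45, I90, I135); assumed setup.
STOKES_WEIGHTS = [
    [0.5, 0.5, 0.5, 0.5],    # S0: total intensity
    [1.0, 0.0, -1.0, 0.0],   # S1 = I0 - I90
    [0.0, 1.0, 0.0, -1.0],   # S2 = I45 - I135
]

# Toy 1x2 images: first pixel horizontally polarized, second unpolarized.
i0, i45, i90, i135 = [[1.0, 0.5]], [[0.5, 0.5]], [[0.0, 0.5]], [[0.5, 0.5]]
s0, s1, s2 = conv1x1([i0, i45, i90, i135], STOKES_WEIGHTS)
```

Replacing the fixed weight rows with trainable parameters gives a layer that can learn whichever pixel-wise combination best serves the downstream task.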

preprint2020arXiv

An Investigation on Deep Learning with Beta Stabilizer

Artificial neural networks (ANN) have been used in many applications such as handwriting recognition and speech recognition. It is well known that the learning rate is a crucial value in the training procedure for artificial neural networks. It has been shown that the initial value of the learning rate can profoundly affect the final result, and this value is always set manually in practice. A new parameter called the beta stabilizer has been introduced to reduce the sensitivity to the initial learning rate. But this method has only been proposed for deep neural networks (DNN) with the sigmoid activation function. In this paper we extend the beta stabilizer to long short-term memory (LSTM) and investigate the effects of beta stabilizer parameters on different models, including LSTM and DNN with the ReLU activation function. It is concluded that beta stabilizer parameters can reduce the sensitivity to the learning rate with almost the same performance on DNN with the ReLU activation function and on LSTM. However, the effects of the beta stabilizer on DNN with the ReLU activation function and on LSTM are smaller than its effects on DNN with the sigmoid activation function.

preprint2020arXiv

Application of light diffraction theory to qualify the downstream light field modulation property of mitigated KDP crystals

Micro-milling can effectively remove laser damage sites on a KDP (potassium dihydrogen phosphate) surface and then improve the laser damage resistance of the components. However, the repaired KDP surface could cause light propagating turbulence and downstream light intensification with the potential risk to damage downstream optics. In order to analyze the downstream light field modulation caused by Gaussian mitigation pits on KDP crystals, a computational model of the downstream light diffraction based on the angular spectrum theory and the Gaussian repair contour is established. The results show that the phase offset caused by the repaired surface produces a large light field modulation near the rear KDP surface. The modulation generated in the whole downstream light field is greater than that caused by the amplitude change. Therefore, the phase characteristics of the outgoing light could be suggested as a vital research topic for future research on the downstream light field modulation caused by mitigation contours. Significantly, the experimental results on the downstream light intensity distribution have good agreement with the simulation ones, which proves the validity of the established downstream light diffraction model. The phase characterization of the outgoing light is proposed as an evaluation tool in the repair of KDP crystals. The developed analytical method and numerical discrete algorithm could be also applicable in qualifying the repair quality of other optical components applied in high-power laser systems.

preprint2020arXiv

ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction

Molecular property prediction (e.g., energy) is an essential problem in chemistry and biology. Unfortunately, many supervised learning methods suffer from the scarcity of labeled molecules in the chemical space, where such property labels are generally obtained by Density Functional Theory (DFT) calculation, which is extremely computationally costly. An effective solution is to incorporate the unlabeled molecules in a semi-supervised fashion. However, learning semi-supervised representations for large amounts of molecules is challenging, with issues including the joint representation of both molecular essence and structure and the conflict between representation learning and property learning. Here we propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules. Specifically, ASGN adopts a teacher-student framework. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Then in the student model, we target the property prediction task to deal with the learning loss conflict. Finally, we propose a novel active learning strategy in terms of molecular diversities to select informative data during the whole framework learning. We conduct extensive experiments on several public datasets. Experimental results show the remarkable performance of our ASGN framework.

preprint2020arXiv

Deep Technology Tracing for High-tech Companies

Technological change and innovation are vitally important, especially for high-tech companies. However, the factors influencing their future research and development (R&D) trends are both complicated and varied, which makes technology tracing for high-tech companies a difficult task. To this end, in this paper, we develop a novel data-driven solution, i.e., the Deep Technology Forecasting (DTF) framework, to automatically find the most likely technology directions customized to each high-tech company. Specifically, DTF consists of three components: Potential Competitor Recognition (PCR), Collaborative Technology Recognition (CTR), and the Deep Technology Tracing (DTT) neural network. On the one hand, PCR and CTR aim to capture competitive relations among enterprises and collaborative relations among technologies, respectively. On the other hand, DTT is designed for modeling dynamic interactions between companies and technologies with the above relations involved. Finally, we evaluate our DTF framework on real-world patent data, and the experimental results clearly show that DTF can help to accurately forecast the future technology emphasis of companies by exploiting hybrid factors.

preprint2020arXiv

Domain Adaption for Knowledge Tracing

With the rapid development of online education systems, knowledge tracing, which aims at predicting students' knowledge state, is becoming a critical and fundamental task in personalized education. Traditionally, existing methods are domain-specific. However, there are a large number of domains (e.g., subjects, schools) in the real world and data is lacking in some of them, so how to utilize the knowledge and information in other domains to help train a knowledge tracing model for target domains is increasingly important. We refer to this problem as domain adaptation for knowledge tracing (DAKT), which contains two aspects: (1) how to achieve great knowledge tracing performance in each domain, and (2) how to transfer a well-performing knowledge tracing model between domains. To this end, in this paper, we propose a novel adaptable framework, namely adaptable knowledge tracing (AKT), to address the DAKT problem. Specifically, for the first aspect, we incorporate educational characteristics (e.g., slip, guess, question texts) based on deep knowledge tracing (DKT) to obtain a well-performing knowledge tracing model. For the second aspect, we propose and adopt three domain adaptation processes. First, we pre-train an auto-encoder to select useful source instances for target model training. Second, we minimize the domain-specific knowledge state distribution discrepancy under the maximum mean discrepancy (MMD) measurement to achieve domain adaptation. Third, we adopt fine-tuning to deal with the problem that the output dimensions of the source and target domains are different, making the model suitable for target domains. Extensive experimental results on two private datasets and seven public datasets clearly prove the effectiveness of AKT for great knowledge tracing performance and its superior transferable ability.

preprint2020arXiv

Energy Trading in Microgrids for Synergies among Electricity, Hydrogen and Heat Networks

The emerging paradigm of interconnected microgrids advocates energy trading or sharing among multiple microgrids. It helps make full use of the temporal availability of energy and diversity in operational costs when meeting various energy loads. However, energy trading might not completely absorb excess renewable energy. A multi-energy management framework including fuel cell vehicles, energy storage, a combined heat and power system, and renewable energy is proposed, and the characteristics and scheduling arrangements of fuel cell vehicles are considered to further improve the local absorption of renewable energy and enhance the economic benefits of microgrids. While intensive research has been conducted on the energy scheduling and trading problem, a fundamental question on microgrid economics remains unanswered. Namely, given multi-energy coupling and stochastic renewable energy generation and demands, when and how should a microgrid schedule and trade energy with others so as to maximize its long-term benefit? This paper designs a joint energy scheduling and trading algorithm based on Lyapunov optimization and a double-auction mechanism. Its purpose is to determine the valuations of energy in the auction, optimally schedule energy distribution, and strategically purchase and sell energy at the current electricity prices. Simulations based on real data show that each individual microgrid, under the management of the proposed algorithm, can achieve a time-averaged profit that is arbitrarily close to an optimum value, while avoiding compromising its own comfort.

preprint2020arXiv

Future Vector Enhanced LSTM Language Model for LVCSR

Language models (LM) play an important role in large vocabulary continuous speech recognition (LVCSR). However, traditional language models only predict the next single word given the history, while consecutive predictions on a sequence of words are usually demanded and useful in LVCSR. The mismatch between single-word prediction in training and the long-term sequence prediction demanded in practice may lead to performance degradation. In this paper, a novel enhanced long short-term memory (LSTM) LM using the future vector is proposed. In addition to the given history, the rest of the sequence is also embedded by future vectors. This future vector can be incorporated into the LSTM LM, giving it the ability to model much longer-term sequence-level information. Experiments show that the proposed new LSTM LM gets a better result on BLEU scores for long-term sequence prediction. For speech recognition rescoring, although the proposed LSTM LM alone obtains very slight gains, the new model appears to be highly complementary to the conventional LSTM LM. Rescoring using both the new and conventional LSTM LMs can achieve a very large improvement in word error rate.

preprint2020arXiv

IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

This report summarizes the IROS 2019 Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in the robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".

preprint2020arXiv

Learning the Compositional Visual Coherence for Complementary Recommendations

Complementary recommendations, which aim at providing users with product suggestions that are supplementary to and compatible with their obtained items, have become a hot topic in both academia and industry in recent years. However, the task is challenging due to its complexity and subjectivity. Existing work has mainly focused on modeling the co-purchased relations between two items, but the compositional associations of item collections are largely unexplored. When a user chooses complementary items for purchased products, it is intuitive that she will consider the visual semantic coherence (such as color collocations and texture compatibilities) in addition to global impressions. To this end, we propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents. Specifically, we first propose a Global Coherence Learning (GCL) module based on multi-head attention to model the global compositional coherence. Then, we generate semantic-focal representations from different semantic regions and design a Focal Coherence Learning (FCL) module to learn the focal compositional coherence from these representations. Finally, we optimize the CANN with a novel compositional optimization strategy. Extensive experiments on large-scale real-world data demonstrate the effectiveness of CANN compared with several state-of-the-art methods.

preprint2020arXiv

Model-based Joint Bit Allocation between Geometry and Color for Video-based 3D Point Cloud Compression

Rate-distortion optimization plays a very important role in image/video coding, but this problem has not been investigated for 3D point clouds. In this paper, the rate and distortion characteristics of 3D point clouds are investigated in detail, and a typical and challenging rate-distortion optimization problem is solved for 3D point clouds. Specifically, since the quality of the reconstructed 3D point cloud depends on both the geometry and color distortions, we first propose analytical rate and distortion models for the geometry and color information in the video-based 3D point cloud compression platform, and then solve the joint bit allocation problem for geometry and color based on the derived models. To maximize the reconstructed quality of the 3D point cloud, the bit allocation problem is formulated as a constrained optimization problem and solved by an interior point method. Experimental results show that the rate-distortion performance of the proposed solution is close to that obtained with exhaustive search but at only 0.68% of its time complexity. Moreover, the proposed rate and distortion models can also be used for other rate-distortion optimization problems (such as prediction mode decision) and rate control technologies for 3D point cloud coding in the future.

preprint2020arXiv

Modular End-to-end Automatic Speech Recognition Framework for Acoustic-to-word Model

End-to-end (E2E) systems have played an increasingly important role in automatic speech recognition (ASR) and achieved great performance. However, E2E systems recognize output word sequences directly from the input acoustic features and can therefore only be trained on limited acoustic data. Extra text data is widely used to improve the results of traditional artificial neural network-hidden Markov model (ANN-HMM) hybrid systems, but incorporating extra text data into standard E2E ASR systems may break the E2E property during decoding. In this paper, a novel modular E2E ASR system is proposed. The modular E2E ASR system consists of two parts: an acoustic-to-phoneme (A2P) model and a phoneme-to-word (P2W) model. The A2P model is trained on acoustic data, while extra data including large-scale text data can be used to train the P2W model. This additional data enables the modular E2E ASR system to model not only the acoustic part but also the language part. During the decoding phase, the two models are integrated and act as a standard acoustic-to-word (A2W) model. In other words, the proposed modular E2E ASR system can be easily trained with extra text data and decoded in the same way as a standard E2E ASR system. Experimental results on the Switchboard corpus show that the modular E2E model achieves a better word error rate (WER) than standard A2W models.

preprint2020arXiv

Multi-parameter estimation with multi-mode Ramsey interferometry

Estimating multiple parameters simultaneously is of great importance to measurement science and application. For a single parameter, atomic Ramsey interferometry (or equivalently optical Mach-Zehnder interferometry) is capable of providing the precision at the standard quantum limit (SQL) using unentangled probe states as input. In such an interferometer, the first beam splitter represented by unitary transformation $U$ generates a quantum phase sensing superposition state, while the second beam splitter $U^{-1}$ recombines the phase encoded paths to realize interferometric sensing in terms of population measurements. We prove that such an interferometric scheme can be directly generalized to estimation of multiple parameters (associated with commuting generators) to the SQL precision using multi-mode unentangled states, if (but not iff) $U$ is orthogonal, i.e. a unitary transformation with only real matrix elements. We show that such a $U$ can always be constructed experimentally in a simple and scalable manner. The effects of particle number fluctuation and detection noise on such multi-mode interferometry are considered. Our findings offer a simple solution for estimating multiple parameters corresponding to mutually commuting generators.

preprint2020arXiv

Multi-Task Self-Supervised Learning for Disfluency Detection

Most existing approaches to disfluency detection rely heavily on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence classification task to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to the previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.

preprint2020arXiv

Neural Cognitive Diagnosis for Intelligent Education Systems

Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually mine linear interactions in the student exercising process using manually designed functions (e.g., the logistic function), which is not sufficient for capturing complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions and obtain both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers for modeling their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., NeuralCDM with the traditional Q-matrix and the improved NeuralCDM+ exploring the rich text content. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.

preprint2020arXiv

SketchyCOCO: Image Generation from Freehand Scene Sketches

We introduce the first method for automatic image generation from scene-level freehand sketches. Our model allows for controllable image generation by specifying the synthesis goal via freehand sketches. The key contribution is an attribute vector bridged Generative Adversarial Network called EdgeGAN, which supports high visual-quality object-level image content generation without using freehand sketches as training data. We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution. We validate our approach on the tasks of both object-level and scene-level image generation on SketchyCOCO. Through quantitative and qualitative results, human evaluation, and ablation studies, we demonstrate the method's capacity to generate realistic complex scene-level images from various freehand sketches.

preprint2019arXiv

A Dynamic and Cooperative Tracking System for Crowdfunding

Crowdfunding is an emerging finance platform for creators to fund their efforts by soliciting relatively small contributions from a large number of individuals using the Internet. Due to the unique rules, a campaign succeeds in trading only when it collects adequate funds in a given time. To prevent creators and backers from wasting time and effort on failing campaigns, dynamically estimating the success probability of a campaign is very important. However, existing crowdfunding systems neither have a mechanism for dynamic predictive tracking nor provide the real-time campaign status for creators and backers on the platform. To address these issues, we develop a novel system, which contains a dynamic data-driven approach to tracking the success probability and status. We demonstrate the following scenarios using our system. First, users can utilize our system to analyze the emotion of incremental reviews so as to understand backers' perspectives on the campaign in time. Meanwhile, our system visualizes the number of positive and negative reviews. On this basis, our system can dynamically track the success probability of each campaign.

preprint2019arXiv

Machine learning and evolutionary algorithm studies of graphene metamaterials for optimized plasmon-induced transparency

Machine learning and optimization algorithms have been widely applied in the design and optimization of photonic devices. In this article, we briefly review recent progress in this field of research and show some data-driven applications (e.g. spectrum prediction, inverse design and performance optimization) for novel graphene metamaterials (GMs). The structure of the GMs is designed to achieve the wideband plasmon-induced transparency effect, which is treated as the optimization objective and can be theoretically demonstrated by using the transfer matrix method. Some classical machine learning algorithms, including k nearest neighbour, decision tree, random forest and artificial neural networks, are utilized to substitute for the numerical simulation in the forward spectrum prediction and to complete the inverse design for the GMs. The calculated results demonstrate that all the algorithms are effective and that the random forest has advantages in terms of accuracy and training speed. Moreover, single-objective and multi-objective optimization algorithms are used to achieve steep transmission characteristics by jointly taking many performance metrics into consideration. The maximum difference between the transmission peaks and dips in the optimized transmission spectrum can reach 0.97. In comparison to previous works, we provide guidance for the intelligent design of photonic devices and advanced materials based on machine learning and evolutionary algorithms.

preprint2019arXiv

Solving Fokker-Planck equation using deep learning

The probability density function of stochastic differential equations is governed by the Fokker-Planck (FP) equation. A novel machine learning method is developed to solve general FP equations based on deep neural networks. The proposed algorithm does not require any interpolation or coordinate transformation, which distinguishes it from traditional numerical methods. The main novelty of this paper is that penalty factors are introduced to overcome local optima in the deep learning approach, and the corresponding setting rules are given. Meanwhile, we consider a normalization condition as a supervision condition to effectively prevent the trial solution from being zero. Several numerical examples are presented to illustrate the performance of the proposed algorithm, including one- and two-dimensional systems. All the results suggest that deep learning is a feasible and effective way to solve the FP equation. Further, the influences of the number of hidden layers, the penalty factors, and the optimization algorithm are discussed in detail. These results indicate that the performance of the machine learning technique can be improved by constructing the neural networks appropriately.

preprint2019arXiv

Target Localization with Jammer Removal Using Frequency Diverse Array

A foremost task in frequency diverse array multiple-input multiple-output (FDA-MIMO) radar is to efficiently obtain the target signal in the presence of interference. In this paper, we employ a novel "low-rank + low-rank + sparse" decomposition model to extract the low-rank desired signal and suppress the jamming signals from both barrage and burst jammers. In the literature, the barrage jamming signals, which are transmitted intentionally by enemy jammer radar, are usually assumed to be Gaussian distributed. However, this assumption is too simple to hold in practice, as the interference often exhibits non-Gaussian properties. Those non-Gaussian jamming signals, known as impulsive noise or burst jamming, arise unintentionally from friendly radar or other working radio equipment, including amplifier saturation and sensor failures, thunderstorms and man-made noise. The estimation performance of the existing estimators, which rely crucially on the Gaussian noise assumption, may degrade substantially since the probability density function (PDF) of burst jamming has heavier tails, extending beyond a few standard deviations, than the Gaussian distribution. To capture a more general signal model with burst jamming in practice, both barrage jamming and burst jamming are included, and a two-step "Go Decomposition" (GoDec) method via alternating minimization is devised for such a mixed jamming signal model, where the a priori rank information is exploited to suppress the two kinds of jammers and estimate the desired target. Simulation results verify the robust performance of the devised scheme.