Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
68 works
0 followers
32 topics
4 close collaborators

Published work

68 published item(s)

preprint, 2026, arXiv

Learning Generalizable Multimodal Representations for Software Vulnerability Detection

Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information embedded in comments and thus limiting their generalization across complex code structures and logical relationships. To address this, we propose MultiVul, a multimodal contrastive framework that aligns code and comment representations through dual similarity learning and consistency regularization, augmented with diverse code-text pairs to improve robustness. Experiments on the widely adopted DiverseVul and Devign datasets across four large language models (LLMs) (i.e., DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, and CodeLlama-7B) show that MultiVul achieves up to 27.07% F1 improvement over prompting-based methods and 13.37% over code-only fine-tuning, while maintaining comparable inference efficiency.

preprint, 2026, arXiv

LLM-Oriented Information Retrieval: A Denoising-First Perspective

Modern information retrieval (IR) is no longer consumed primarily by humans but increasingly by large language models (LLMs) via retrieval-augmented generation (RAG) and agentic search. Unlike human users, LLMs are constrained by limited attention budgets and are uniquely vulnerable to noise; misleading or irrelevant information is no longer just a nuisance, but a direct cause of hallucinations and reasoning failures. In this perspective paper, we argue that denoising (maximizing usable evidence density and verifiability within a context window) is becoming the primary bottleneck across the full information access pipeline. We conceptualize this paradigm shift through a four-stage framework of IR challenges: from inaccessible to undiscoverable, to misaligned, and finally to unverifiable. Furthermore, we provide a pipeline-organized taxonomy of signal-to-noise optimization techniques, spanning indexing, retrieval, context engineering, verification, and agentic workflows. We also present research on information denoising in domains that rely heavily on retrieval, such as lifelong assistants, coding agents, deep research, and multimodal understanding.

preprint, 2023, arXiv

Dual-space Hierarchical Learning for Goal-guided Conversational Recommendation

Proactively and naturally guiding the dialog from the non-recommendation context (e.g., chit-chat) to the recommendation scenario (e.g., music) is crucial for a Conversational Recommender System (CRS). Prior studies mainly focus on planning the next dialog goal (e.g., chat on a movie star) conditioned on the previous dialog. However, we find that dialog goals can be simultaneously observed at different levels, which can be utilized to improve CRS. In this paper, we propose Dual-space Hierarchical Learning (DHL) to leverage multi-level goal sequences and their hierarchical relationships for conversational recommendation. Specifically, we exploit multi-level goal sequences from both the representation space and the optimization space. In the representation space, we propose hierarchical representation learning, where a cross-attention module derives mutually enhanced multi-level goal representations. In the optimization space, we devise hierarchical weight learning to reweight lower-level goal sequences, and introduce bi-level optimization for stable updates. Additionally, we propose a soft labeling strategy to guide optimization gradually. Experiments on two real-world datasets verify the effectiveness of our approach. Code and data are available here.

preprint, 2022, arXiv

A Bayesian Precision Response-adaptive Phase II Clinical Trial Design for Radiotherapies with Competing Risk Survival Outcomes

Many phase II clinical trials have used survival outcomes as the primary endpoints in recent decades. When a radiotherapy is evaluated in a phase II trial using survival outcomes, the competing risk issue often arises because the time to disease progression can be censored by the time to normal tissue complications, and vice versa. Moreover, the literature has shown that patients receiving the same radiotherapy dose may respond differently because of their heterogeneous radiation susceptibility statuses. Therefore, the "one-dose-fits-all" strategy often fails, and it is more relevant to evaluate the subgroup-specific treatment effect, with the subgroup defined by the radiation susceptibility status. In this paper, we propose a Bayesian precision phase II trial design evaluating the subgroup-specific treatment effects of radiotherapy. We use the cause-specific hazard approach to model the competing risk survival outcomes. We propose restricting the candidate radiation doses based on each patient's radiation susceptibility status. Only the clinically feasible personalized doses will be considered, which enhances the benefit for the patients in the trial. In addition, we propose a stratified Bayesian adaptive randomization scheme such that more patients will be randomized to the dose reporting more favorable survival outcomes. Numerical studies have shown that the proposed design performed well and outperformed the conventional design that ignores the competing risk issue.

preprint, 2022, arXiv

An Alternating Direction Explicit Method for Time Evolution Equations with Applications to Fractional Differential Equations

We derive and analyze the alternating direction explicit (ADE) method for time evolution equations with the time-dependent Dirichlet boundary condition and with the zero Neumann boundary condition. The original ADE method is an additive operator splitting (AOS) method, which has been developed for treating a wide range of linear and nonlinear time evolution equations with the zero Dirichlet boundary condition. For linear equations, it has been shown to achieve the second order accuracy in time yet is unconditionally stable for an arbitrary time step size. For the boundary conditions considered in this work, we carefully construct the updating formula at grid points near the boundary of the computational domain and show that these formulas maintain the desired accuracy and the property of unconditional stability. We also construct numerical methods based on the ADE scheme for two classes of fractional differential equations. We will give numerical examples to demonstrate the simplicity and the computational efficiency of the method.

preprint, 2022, arXiv

An Operator-Splitting Method for the Gaussian Curvature Regularization Model with Applications to Surface Smoothing and Imaging

Gaussian curvature is an important geometric property of surfaces, which has been used broadly in mathematical modeling. Due to the full nonlinearity of the Gaussian curvature, efficient numerical methods for models based on it are uncommon in the literature. In this article, we propose an operator-splitting method for a general Gaussian curvature model. In our method, we decouple the full nonlinearity of Gaussian curvature from differential operators by introducing two matrix- and vector-valued functions. The optimization problem is then converted into the search for the steady-state solution of a time-dependent PDE system. This PDE system is well suited to time discretization by operator splitting, the sub-problems encountered at each fractional step either having a closed-form solution or being solvable by efficient algorithms. The proposed method is not sensitive to the choice of parameters, and its efficiency and performance are demonstrated via systematic experiments on surface smoothing and image denoising.

preprint, 2022, arXiv

Arrhythmia Classifier using Binarized Convolutional Neural Network for Resource-Constrained Devices

Monitoring electrocardiogram (ECG) signals is of great significance for the diagnosis of arrhythmias. In recent years, deep learning and convolutional neural networks have been widely used in the classification of cardiac arrhythmias. However, existing neural networks applied to ECG signal detection usually require a lot of computing resources, which is not friendly to resource-constrained equipment and makes real-time monitoring difficult. In this paper, a binarized convolutional neural network suitable for ECG monitoring is proposed, which is hardware-friendly and more suitable for use in resource-constrained wearable devices. Targeting the MIT-BIH arrhythmia database, the classifier based on this network reached an accuracy of 95.67% in the five-class test. Compared with the proposed baseline full-precision network with an accuracy of 96.45%, it is only 0.78% lower. Importantly, it achieves 12.65 times the computing speedup, 24.8 times the storage compression ratio, and only requires a quarter of the memory overhead.

preprint, 2022, arXiv

Arrhythmia Classifier Using Convolutional Neural Network with Adaptive Loss-aware Multi-bit Networks Quantization

Cardiovascular disease (CVD) is among the most widespread deadly diseases, and detecting it at an early stage is a challenging task. Recently, deep learning and convolutional neural networks have been employed widely for the classification of objects. Moreover, it is promising that many networks can be deployed on wearable devices. An increasing number of methods can be used to realize ECG signal classification for the sake of arrhythmia detection. However, the existing neural networks proposed for arrhythmia detection are not hardware-friendly enough because their large number of parameters results in high memory and power consumption. In this paper, we present a 1-D adaptive loss-aware quantization, achieving a high compression rate that reduces memory consumption by 23.36 times. In order to adapt to our compression method, we need a smaller and simpler network. We propose a 17-layer end-to-end neural network classifier to classify 17 different rhythm classes trained on the MIT-BIH dataset, realizing a classification accuracy of 93.5%, which is higher than most existing methods. Because the adaptive bitwidth method gives important layers more attention and offers a chance to prune useless parameters, the proposed quantization method avoids accuracy degradation. It even improves the accuracy to 95.84%, 2.34% higher than before. Our study achieves a 1-D convolutional neural network with high performance and low resource consumption, which is hardware-friendly and illustrates the possibility of deployment on wearable devices to realize real-time arrhythmia diagnosis.

preprint, 2022, arXiv

Asymptotic behavior of conservative solutions to the Hunter-Saxton equation

In this paper we study the large time asymptotic behavior of (energy) conservative solutions to the Hunter-Saxton equation in a generalized framework that consists of the evolutions of solution and its energy measure. We describe the large time asymptotic expansions of the conservative solutions, and rigorously verify the validity of the leading order term in $L^{\infty}(\mathbb{R})$ and ${\dot{H}}^1(\mathbb{R})$ spaces respectively. The leading order term is given by a kink-wave that is determined by the total energy of the system only. As a corollary, we also show that the singular part of the energy measure converges to zero, as the time goes to either positive or negative infinity. Under some natural decay rate assumptions on the tails of the initial energy measure, we rigorously provide the optimal error estimates in $L^{\infty}(\mathbb{R})$ and ${\dot{H}}^1(\mathbb{R})$. As the time goes to infinity, the pointwise convergence and pointwise growth rate for the solution are also obtained under the same assumptions on the initial data. The proofs of our results rely heavily on the elaborate analysis of the generalized characteristics designed for the measure-valued initial data, and explicit formulae for conservative solutions.

preprint, 2022, arXiv

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness. Most existing function approximation theories suggest that with sufficiently many parameters, neural networks can well approximate certain classes of functions in terms of the function value. The neural networks themselves, however, can be highly nonsmooth. To bridge this gap, we take convolutional residual networks (ConvResNets) as an example, and prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. Moreover, we extend our theory to approximating functions supported on a low-dimensional manifold. Our theory partially justifies the benefits of using deep and wide networks in practice. Numerical experiments on adversarially robust image classification are provided to support our theory.

preprint, 2022, arXiv

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills to learn behavior embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioral diversity. We evaluate our algorithm on the Unsupervised Reinforcement Learning Benchmark, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. CIC substantially improves over prior methods in terms of adaptation efficiency, outperforming prior unsupervised skill discovery methods by 1.79x and the next leading overall exploration algorithm by 1.18x.
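The contrastive objective described above can be illustrated with a generic InfoNCE loss, whose minimization maximizes a lower bound on the mutual information between the two sets of embeddings. This is a minimal sketch, not the CIC implementation: the abstract does not specify the estimator, so the cosine similarity, the temperature of 0.5, and the names `info_nce`, `transition_emb`, and `skill_emb` are all assumptions made for illustration.

```python
import math

def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(transition_emb, skill_emb, temperature=0.5):
    """Mean InfoNCE loss over a batch.

    Row i of each input forms a positive pair; every other row of
    skill_emb serves as a negative for transition i. The temperature
    value is an assumed hyperparameter, not taken from the paper.
    """
    q = [_normalize(v) for v in transition_emb]
    k = [_normalize(v) for v in skill_emb]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity between transition i and every skill, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)

# Toy batch: matched pairs should score a lower loss than mismatched pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The entropy-based intrinsic reward mentioned in the abstract is a separate component and is not covered by this sketch.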

preprint, 2022, arXiv

Coevolution of broad emission lines and X-ray spectrum in changing-look AGNs

Changing-look active galactic nuclei (CLAGNs) show the disappearance and reappearance of broad emission lines in a few years, which challenges the orientation-based AGN unification model. We reduce the X-ray data for five well-studied CLAGNs that show a strong change in broad emission lines in the past several decades. We find that the X-ray photon index, $Γ$, and the Eddington-scaled X-ray luminosity, $L_{\rm 2-10 keV}/L_{\rm Edd}$, normally follow negative and positive correlations when the Eddington ratio is lower and higher than a critical value of $\sim 10^{-3}$. We find that the CLAGNs observed with broad H$β$ emission lines stay in the positive part of the $Γ-L_{\rm 2-10 keV}/L_{\rm Edd}$ correlation, while the broad H$β$ lines become weak or disappear in the anticorrelation part of the $Γ-L_{\rm 2-10 keV}/L_{\rm Edd}$ correlation, which suggests that the evolution of the broad lines should be correlated with the evolution of the underlying accretion process. We further find that the CLAGNs are consistent with the other different types of AGNs in the $L_{\rm bol}-L_{\rm bol}/L_{\rm Edd}$ correlation. These results support the view that CLAGNs belong to a special stage of AGNs with a bolometric Eddington ratio of $\sim$1%, where the broad emission lines are easily affected by the strong variation in ionization luminosity that is caused by the transition of accretion modes.

preprint, 2022, arXiv

Continual Learning with Recursive Gradient Optimization

Learning multiple tasks sequentially without forgetting previous knowledge, called Continual Learning (CL), remains a long-standing challenge for neural networks. Most existing methods rely on additional network capacity or data replay. In contrast, we introduce a novel approach which we refer to as Recursive Gradient Optimization (RGO). RGO is composed of an iteratively updated optimizer that modifies the gradient to minimize forgetting without data replay and a virtual Feature Encoding Layer (FEL) that represents different long-term structures with only task descriptors. Experiments demonstrate that RGO has significantly better performance on popular continual classification benchmarks when compared to the baselines and achieves new state-of-the-art performance on 20-split-CIFAR100 (82.22%) and 20-split-miniImageNet (72.63%). With higher average accuracy than Single-Task Learning (STL), this method is flexible and reliable in providing continual learning capabilities for learning models that rely on gradient descent.

preprint, 2022, arXiv

Cycle Self-Training for Semi-Supervised Object Detection with Distribution Consistency Reweighting

Recently, many semi-supervised object detection (SSOD) methods adopt a teacher-student framework and have achieved state-of-the-art results. However, the teacher network is tightly coupled with the student network since the teacher is an exponential moving average (EMA) of the student, which causes a performance bottleneck. To address the coupling problem, we propose a Cycle Self-Training (CST) framework for SSOD, which consists of two teachers, T1 and T2, and two students, S1 and S2. Based on these networks, a cycle self-training mechanism is built, i.e., S1${\rightarrow}$T1${\rightarrow}$S2${\rightarrow}$T2${\rightarrow}$S1. For S${\rightarrow}$T, we also utilize the EMA weights of the students to update the teachers. For T${\rightarrow}$S, instead of providing supervision for its own student S1 (S2) directly, the teacher T1 (T2) generates pseudo-labels for the student S2 (S1), which loosens the coupling effect. Moreover, owing to the property of EMA, the teacher is most likely to accumulate the biases from the student and make the mistakes irreversible. To mitigate the problem, we also propose a distribution consistency reweighting strategy, where pseudo-labels are reweighted based on distribution consistency across the teachers T1 and T2. With this strategy, the two students S2 and S1 can be trained robustly with noisy pseudo-labels to avoid confirmation biases. Extensive experiments prove the superiority of CST by consistently improving the AP over the baseline and outperforming state-of-the-art methods by 2.1% absolute AP with scarce labeled data.
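The EMA update and the cycle wiring described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: model weights are plain dictionaries of floats, the decay of 0.999 is an assumed value the abstract does not give, and the names `ema_update`, `EMA_SOURCE`, and `PSEUDO_LABEL_TARGET` are hypothetical.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights: decay * teacher + (1 - decay) * student.

    The decay value is an assumption; the abstract does not state it.
    """
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

# Wiring of the cycle S1 -> T1 -> S2 -> T2 -> S1 as stated in the abstract.
EMA_SOURCE = {"T1": "S1", "T2": "S2"}           # S -> T: each teacher tracks one student
PSEUDO_LABEL_TARGET = {"T1": "S2", "T2": "S1"}  # T -> S: each teacher labels the other student

# Toy weights standing in for real detector parameters.
models = {
    "S1": {"w": 0.0}, "S2": {"w": 2.0},
    "T1": {"w": 1.0}, "T2": {"w": 1.0},
}

# One S -> T step: every teacher moves slightly toward its source student.
for teacher_name, student_name in EMA_SOURCE.items():
    models[teacher_name] = ema_update(models[teacher_name], models[student_name], decay=0.9)

print(models["T1"], models["T2"])
```

Because no teacher supervises the student it is averaged from, the two mappings never share a teacher-student pair, which is the decoupling the abstract describes.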

preprint, 2022, arXiv

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

Learning operators between infinite-dimensional spaces is an important learning task arising in a wide range of applications in machine learning, imaging science, mathematical modeling and simulations, etc. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low-dimensional structure, our error bounds decay as the training sample size increases, with an attractive fast rate depending on the intrinsic dimension in our estimation. Our assumptions cover most scenarios in real applications, and our results give rise to fast rates by exploiting low-dimensional structures of data in operator estimation. We also investigate the influence of network structures (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator and propose a general suggestion on the choice of network structures to maximize the learning efficiency quantitatively.

preprint, 2022, arXiv

Demonstration of room-temperature continuous-wave operation of InGaAs/AlGaAs quantum well lasers directly grown on on-axis silicon (001)

Room-temperature continuous-wave operation of InGaAs/AlGaAs quantum well lasers directly grown on on-axis silicon (001) has been demonstrated. A 420 nm thick GaAs epilayer completely free of antiphase domains was initially grown on the silicon substrate in a metal-organic chemical vapor deposition system, and the other epilayers, including four sets of five-period strained-layer superlattices and the laser-structural layers, were successively grown in a molecular beam epitaxy system. The lasers were prepared as broad-stripe Fabry-Perot devices with a stripe width of 21.5 μm and a cavity length of 1 mm. Typically, the threshold current and the corresponding threshold current density are 186.4 mA and 867 A/cm², respectively. The lasing wavelength is around 980 nm and the slope efficiency is 0.097 W/A with a single-facet output power of 22.5 mW at an injection current of 400 mA. This advancement makes silicon-based monolithic optoelectronic integration relevant to quantum well lasers more promising and more feasible.
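The quoted threshold current density follows from the threshold current divided by the pumped stripe area (stripe width times cavity length). The short check below uses only the figures given in the abstract; the function and variable names are illustrative.

```python
# Threshold current density check for a broad-stripe laser.
# Input values come from the abstract; names are illustrative.
threshold_current_a = 186.4e-3   # 186.4 mA expressed in amperes
stripe_width_cm = 21.5e-4        # 21.5 micrometres expressed in centimetres
cavity_length_cm = 0.1           # 1 mm expressed in centimetres

def threshold_current_density(current_a, width_cm, length_cm):
    """Return the current density in A/cm^2 over the stripe area."""
    return current_a / (width_cm * length_cm)

j_th = threshold_current_density(threshold_current_a, stripe_width_cm, cavity_length_cm)
print(f"J_th = {j_th:.0f} A/cm^2")
```

The result rounds to the 867 A/cm² stated in the abstract.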

preprint, 2022, arXiv

Development and Commissioning of a Compact Cosmic Ray Muon Imaging Prototype

Owing to muon tomography's capability of imaging high-Z materials, some potential applications have been reported for inspecting smuggled nuclear materials at customs. A compact Cosmic Ray Muon (CRM) imaging prototype, the Lanzhou University Muon Imaging System (LUMIS), is comprehensively introduced in this paper, including the structure design, assembly, data acquisition and analysis, detector performance test, and material imaging commissioning. Cast triangular prism plastic scintillators (PS) were coupled with SiPMs as the sensitive detector components of the system. LUMIS's experimental results show that the detection efficiency of an individual detector layer is about 98%, the position resolution for vertically incident muons is 2.5 mm, and the angular resolution is 8.73 mrad given a separation distance of 40.5 cm. Moreover, the image reconstruction software was developed based on the Point of Closest Approach (PoCA) to detect lead bricks as our target. The reconstructed images indicate that the profile of the lead bricks in the image is highly consistent with the target. Subsequently, the capability of LUMIS to distinguish different materials, such as Pb, Cu, Fe, and Al, was investigated as well. The lower limit of response time for rapidly alarming on high-Z materials is also given and discussed. The successful development and commissioning of the LUMIS prototype have provided a new solution option in technology and craftsmanship for developing compact CRM imaging systems that can be used in many applications.
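A minimal sketch of the Point of Closest Approach (PoCA) idea: the incoming and outgoing muon tracks are treated as straight lines, and the scattering vertex is estimated as the midpoint of the shortest segment between them. This is a generic textbook formulation, not the LUMIS reconstruction software, and the function names are hypothetical. The last lines also check the quoted angular resolution under the assumption (not stated in the abstract) that it comes from two independent position measurements, which gives sqrt(2) times the position resolution divided by the separation distance.

```python
import math

def dot(u, v):
    """Dot product of two 3D vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))

def poca(p1, d1, p2, d2, eps=1e-12):
    """Midpoint of the shortest segment between two 3D lines.

    Each line is given by a point p and a direction d. Returns None when
    the lines are (nearly) parallel and no unique closest point exists.
    """
    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom   # parameter of the closest point on line 1
    t = (a * e - b * d) / denom   # parameter of the closest point on line 2
    q1 = tuple(p + s * k for p, k in zip(p1, d1))
    q2 = tuple(p + t * k for p, k in zip(p2, d2))
    return tuple((x + y) / 2 for x, y in zip(q1, q2))

def scattering_angle(d1, d2):
    """Angle in radians between the incoming and outgoing directions."""
    cos_theta = dot(d1, d2) / math.sqrt(dot(d1, d1) * dot(d2, d2))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Toy event: a vertical muon deflected at the origin (coordinates in cm).
vertex = poca((0.0, 0.0, 10.0), (0.0, 0.0, -1.0), (1.0, 0.0, -10.0), (0.1, 0.0, -1.0))
angle = scattering_angle((0.0, 0.0, -1.0), (0.1, 0.0, -1.0))
print(vertex, angle)

# Assumed relation: sigma_theta = sqrt(2) * sigma_x / L, with values from the abstract.
sigma_theta_mrad = math.sqrt(2) * 2.5 / 405.0 * 1000.0
print(f"{sigma_theta_mrad:.2f} mrad")
```

Under that assumption the estimate agrees with the 8.73 mrad quoted for a 2.5 mm position resolution and a 40.5 cm separation.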

preprint, 2022, arXiv

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks, limiting the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL. We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks. Our findings suggest that data generation is as important as algorithmic advances for offline RL and hence requires careful consideration from the community. Code and data can be found at https://github.com/denisyarats/exorl.
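The relabeling step described above can be sketched as follows: transitions collected without a task reward are given one afterwards by applying a downstream reward function. This is a toy illustration, not code from the linked repository; the `Transition` type, the `relabel` helper, and the `reach_target` reward are hypothetical names, and states are scalars for brevity.

```python
from typing import Callable, List, NamedTuple

class Transition(NamedTuple):
    state: float
    action: float
    next_state: float
    reward: float = 0.0  # reward-free exploration stores no task reward

def relabel(dataset: List[Transition],
            reward_fn: Callable[[float, float, float], float]) -> List[Transition]:
    """Return a copy of the dataset with rewards computed by reward_fn."""
    return [t._replace(reward=reward_fn(t.state, t.action, t.next_state))
            for t in dataset]

def reach_target(state: float, action: float, next_state: float,
                 target: float = 1.0) -> float:
    """Hypothetical downstream reward: negative distance to a target state."""
    return -abs(next_state - target)

# Exploratory data gathered without any task reward.
exploratory = [Transition(0.0, 0.5, 0.5), Transition(0.5, 0.5, 1.0)]

# Relabel once per downstream task, then hand the result to any off-policy learner.
labeled = relabel(exploratory, reach_target)
print([t.reward for t in labeled])
```

Keeping the original dataset untouched lets the same exploratory data be relabeled for several downstream tasks.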

preprint, 2022, arXiv

Emoji-based Co-attention Network for Microblog Sentiment Analysis

Emojis are widely used in online social networks to express emotions, attitudes, and opinions. As emotion-oriented characters, emojis can be modeled as important features of emotions towards the recipient or subject for sentiment analysis. However, existing methods mainly take emojis as heuristic information, which fails to resolve the problem of ambiguity noise. Recent research has utilized emojis as an independent input to classify text sentiment but ignores the emotional impact of the interaction between text and emojis. As a result, the emotional semantics of emojis cannot be fully explored. In this paper, we propose an emoji-based co-attention network that learns the mutual emotional semantics between text and emojis on microblogs. Our model adopts a co-attention mechanism based on bidirectional long short-term memory incorporating the text and emojis, and integrates a squeeze-and-excitation block in a convolutional neural network classifier to increase its sensitivity to emotional semantic features. Experimental results show that the proposed method can significantly outperform several baselines for sentiment analysis on short social media texts.

preprint, 2022, arXiv

Feature and Instance Joint Selection: A Reinforcement Learning Perspective

Feature selection and instance selection are two important techniques of data processing. However, such selections have mostly been studied separately, while existing work on joint selection conducts feature/instance selection coarsely, thus neglecting the latent fine-grained interaction between feature space and instance space. To address this challenge, we propose a reinforcement learning solution to accomplish the joint selection task and simultaneously capture the interaction between the selection of each feature and each instance. In particular, a sequential-scanning mechanism is designed as the action strategy of the agents, and a collaborative-changing environment is used to enhance agent collaboration. In addition, an interactive paradigm introduces prior selection knowledge to help agents explore more efficiently. Finally, extensive experiments on real-world datasets have demonstrated improved performance.

preprint, 2022, arXiv

Forecasts on CMB lensing observations with AliCPT-1

AliCPT-1 is the first Chinese CMB experiment aiming for high-precision measurement of Cosmic Microwave Background B-mode polarization. The telescope, currently under deployment in Tibet, will observe in two frequency bands centered at 90 and 150 GHz. We forecast the CMB lensing reconstruction, lensing-galaxy as well as lensing-CIB (Cosmic Infrared Background) cross-correlation signal-to-noise ratio (SNR) for AliCPT-1. We consider two stages with different integrated observation time, namely "4 module*yr" (first stage) and "48 module*yr" (final stage). For lensing reconstruction, we use three different quadratic estimators, namely temperature-only, polarization-only and minimum-variance estimators, using curved sky geometry. We take into account the impact of inhomogeneous hit counts as well as of the mean-field bias due to incomplete sky coverage. In the first stage, our results show that the 150 GHz channel is able to measure the lensing signal at $15σ$ significance with the minimum-variance estimator. In the final stage, the measurement significance will increase to $31σ$. We also combine the data from the two frequency bands in the harmonic domain to optimize the SNR. Our results show that the coadding procedure can significantly reduce the reconstruction bias in the multipole range l>800. Thanks to the high quality of the polarization data in the final stage of AliCPT-1, the EB estimator will dominate the lensing reconstruction in this stage. We also estimate the SNR of cross-correlations between AliCPT-1 CMB lensing and other tracers of the large-scale structure of the universe. For its cross-correlation with DESI galaxies/quasars, we report the cross-correlation SNR = 10-20 for the 4 redshift bins at 0.05<z<2.1. In the first stage, the total SNR is about 32. In the final stage, the lensing-galaxy cross-correlation can reach SNR = 52.

preprint, 2022, arXiv

GMN: Generative Multi-modal Network for Practical Document Information Extraction

Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world. Although recent literature has already achieved competitive results, these approaches usually fail when dealing with complex documents with noisy OCR results or mutative layouts. This paper proposes the Generative Multi-modal Network (GMN) for real-world scenarios to address these problems, which is a robust multi-modal generation method without predefined label categories. With the carefully designed spatial encoder and modal-aware mask module, GMN can deal with complex documents that are hard to serialize into sequential order. Moreover, GMN tolerates errors in OCR results and requires no character-level annotation, which is vital because fine-grained annotation of numerous documents is laborious and even requires annotators with specialized domain knowledge. Extensive experiments show that GMN achieves new state-of-the-art performance on several public DIE datasets and surpasses other methods by a large margin, especially in realistic scenes.

preprint, 2022, arXiv

Knowledge Mining with Scene Text for Fine-Grained Recognition

Recently, the semantics of scene text has been proven to be essential in fine-grained image classification. However, the existing methods mainly exploit the literal meaning of scene text for fine-grained recognition, which might be irrelevant when it is not significantly related to objects/scenes. We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text images and enhances the semantics and correlation to fine-tune the image representation. Unlike the existing methods, our model integrates three modalities: visual feature extraction, text semantics extraction, and correlating background knowledge to fine-grained image classification. Specifically, we employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification. Experiments on two benchmark datasets, Con-Text and Drink Bottle, show that our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively. To further validate the effectiveness of the proposed method, we create a new dataset on crowd activity recognition for the evaluation. The source code and new dataset of this work are available at https://github.com/lanfeng4659/KnowledgeMiningWithSceneText.

preprint, 2022, arXiv

Low-complexity Beam Selection algorithms based on SVD for MmWave Massive MIMO Systems

To realize mmWave massive MIMO systems in practice, beamspace MIMO with beam selection provides an attractive solution with a considerably reduced number of radio frequency (RF) chains. We propose low-complexity beam selection algorithms based on singular value decomposition (SVD). We first diagonalize the channel matrix by SVD, and the appropriate beams are selected one by one in a decremental or incremental order based on the criterion of sum-rate maximization. To reduce the complexity of the proposed algorithms significantly, we reuse the SVD from the previous iteration to avoid computing the SVD from scratch again. Meanwhile, our proposed algorithms naturally obtain the precoding matrix, which can eliminate the multiuser interference. Simulation results demonstrate that our proposed algorithms can outperform the competing algorithms, including the fully digital zero-precoding.

preprint2022arXiv

Network resilience in the aging brain

Degeneration and adaptation are two competing sides of the same coin called resilience in the progressive processes of brain aging or diseases. Degeneration accumulates during brain aging and other cerebral activities, causing structural atrophy and dysfunction. At the same time, adaptation allows the brain network to reorganize to compensate for structural loss and maintain cognitive function. Although the hidden resilience mechanism is critical and fundamental to uncover the brain aging law, due to the lack of datasets and appropriate methodology, it remains essentially unknown how these two processes interact dynamically across brain networks. To quantitatively investigate this complex process, we analyze aging brains based on a 6-year follow-up multimodal neuroimaging database from 63 persons. We reveal the critical mechanism of network resilience: various perturbations may cause fast brain structural atrophy, and then the brain can reorganize its functional layout to lower its operational efficiency, which helps to slow down the structural atrophy and finally recover its functional efficiency equilibrium. This empirical finding could be explained by our theoretical model, suggesting one universal resilience dynamical function. This resilience is achieved in the brain functional network with evolving percolation and rich-club features. Our findings can help to understand the brain aging process and design possible mitigation methods to adjust the interaction between degeneration and adaptation from a resilience viewpoint.

preprint2022arXiv

Neural Collaborative Graph Machines for Table Structure Recognition

Recently, table structure recognition has achieved impressive progress with the help of deep graph models. Most of them exploit single visual cues of tabular elements or simply combine visual cues with other modalities via early fusion to reason their graph relationships. However, neither early fusion nor individually reasoning in terms of multiple modalities can be appropriate for all varieties of table structures with great diversity. Instead, different modalities are expected to collaborate with each other in different patterns for different table cases. In the community, the importance of intra-inter modality interactions for table structure reasoning is still unexplored. In this paper, we define it as heterogeneous table structure recognition (Hetero-TSR) problem. With the aim of filling this gap, we present a novel Neural Collaborative Graph Machines (NCGM) equipped with stacked collaborative blocks, which alternatively extracts intra-modality context and models inter-modality interactions in a hierarchical way. It can represent the intra-inter modality relationships of tabular elements more robustly, which significantly improves the recognition performance. We also show that the proposed NCGM can modulate collaborative pattern of different modalities conditioned on the context of intra-modality cues, which is vital for diversified table cases. Experimental results on benchmarks demonstrate our proposed NCGM achieves state-of-the-art performance and beats other contemporary methods by a large margin especially under challenging scenarios.

preprint2022arXiv

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

Recently, Vision Transformers (ViT), with self-attention (SA) as the de facto ingredient, have demonstrated great potential in the computer vision community. For the sake of a trade-off between efficiency and performance, a group of works merely perform the SA operation within local patches, whereas the global contextual information is abandoned, which would be indispensable for visual recognition tasks. To solve the issue, the subsequent global-local ViTs take a stab at marrying local SA with global SA in a parallel or alternating way in the model. Nevertheless, the exhaustively combined local and global context may contain redundancy for various visual data, and the receptive field within each layer is fixed. Alternatively, a more graceful way is that global and local context can adaptively contribute per se to accommodate different visual data. To achieve this goal, in this paper we propose a novel ViT architecture, termed NomMer, which can dynamically Nominate the synergistic global-local context in vision transforMer. By investigating the working pattern of our proposed NomMer, we further explore what context information is focused on. Benefiting from this "dynamic nomination" mechanism, without bells and whistles, NomMer can not only achieve 84.5% Top-1 classification accuracy on ImageNet with only 73M parameters, but also show promising performance on dense prediction tasks, i.e., object detection and semantic segmentation. The code and models will be made publicly available at https://github.com/TencentYoutuResearch/VisualRecognition-NomMer

preprint2022arXiv

Predicting Solar Energetic Particles Using SDO/HMI Vector Magnetic Data Products and a Bidirectional LSTM Network

Solar energetic particles (SEPs) are an essential source of space radiation, which are hazards for humans in space, spacecraft, and technology in general. In this paper we propose a deep learning method, specifically a bidirectional long short-term memory (biLSTM) network, to predict if an active region (AR) would produce an SEP event given that (i) the AR will produce an M- or X-class flare and a coronal mass ejection (CME) associated with the flare, or (ii) the AR will produce an M- or X-class flare regardless of whether or not the flare is associated with a CME. The data samples used in this study are collected from the Geostationary Operational Environmental Satellite's X-ray flare catalogs provided by the National Centers for Environmental Information. We select M- and X-class flares with identified ARs in the catalogs for the period between 2010 and 2021, and find the associations of flares, CMEs and SEPs in the Space Weather Database of Notifications, Knowledge, Information during the same period. Each data sample contains physical parameters collected from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory. Experimental results based on different performance metrics demonstrate that the proposed biLSTM network is better than related machine learning algorithms for the two SEP prediction tasks studied here. We also discuss extensions of our approach for probabilistic forecasting and calibration with empirical evaluation.

preprint2022arXiv

RepBNN: towards a precise Binary Neural Network with Enhanced Feature Map via Repeating

Binary neural network (BNN) is an extreme quantization version of convolutional neural networks (CNNs) with all features and weights mapped to just 1-bit. Although BNN saves a lot of memory and computation demand to make CNN applicable on edge or mobile devices, BNN suffers a drop in network performance due to the reduced representation capability after binarization. In this paper, we propose a new replaceable and easy-to-use convolution module RepConv, which enhances feature maps through replicating input or output along the channel dimension by $β$ times without extra cost on the number of parameters and convolutional computation. We also define a set of RepTran rules to use RepConv throughout BNN modules like binary convolution, fully connected layer and batch normalization. Experiments demonstrate that after the RepTran transformation, a set of highly cited BNNs have achieved universally better performance than the original BNN versions. For example, the Top-1 accuracy of Rep-ReCU-ResNet-20, i.e., a RepBconv enhanced ReCU-ResNet-20, reaches 88.97% on CIFAR-10, which is 1.47% higher than that of the original network. And Rep-AdamBNN-ReActNet-A achieves 71.342% Top-1 accuracy on ImageNet, a fresh state-of-the-art result of BNNs. Code and models are available at: https://github.com/imfinethanks/Rep_AdamBNN.

preprint2022arXiv

Self-Supervised 3D Monocular Object Detection by Recycling Bounding Boxes

Modern object detection architectures are moving towards employing self-supervised learning (SSL) to improve detection performance with related pretext tasks. Pretext tasks for monocular 3D object detection have not yet been explored in the literature. The paper studies the application of established self-supervised bounding box recycling by labeling random windows as the pretext task. The classifier head of the 3D detector is trained to classify random windows containing different proportions of the ground truth objects, thus handling the foreground-background imbalance. We evaluate the pretext task using the RTM3D detection model as baseline, with and without the application of data augmentation. We demonstrate improvements of 2-3% in mAP 3D and 0.9-1.5% in BEV scores using SSL over the baseline scores. We propose the inverse class frequency re-weighted (ICFW) mAP score that highlights improvements in detection for low frequency classes in a class imbalanced dataset with long tails. We demonstrate improvements in both the ICFW mAP 3D and BEV scores, which take into account the class imbalance in the KITTI validation dataset. We see a 4-5% increase in the ICFW metric with the pretext task.
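The abstract names the ICFW mAP score but does not give its formula. As a minimal sketch, assuming the score is a weighted mean of per-class average precision with weights proportional to the inverse of each class's instance count, it could be computed as below. The function name `icfw_map`, the normalization, and the class counts are illustrative assumptions, not the paper's definition.

```python
def icfw_map(ap_per_class, class_counts):
    """Weighted mean of per-class AP, weights proportional to 1 / class count.

    Assumption: this normalization is illustrative; the paper may define
    the inverse class frequency re-weighting differently.
    """
    if set(ap_per_class) != set(class_counts):
        raise ValueError("AP and count dictionaries must cover the same classes")
    if any(count <= 0 for count in class_counts.values()):
        raise ValueError("every class needs a positive instance count")

    # Rare classes get large weights, frequent classes get small ones.
    inverse = {name: 1.0 / count for name, count in class_counts.items()}
    total = sum(inverse.values())
    return sum(ap_per_class[name] * inverse[name] / total for name in ap_per_class)


# Hypothetical numbers: a frequent class with high AP, a rare class with low AP.
ap = {"Car": 0.8, "Cyclist": 0.4}
counts = {"Car": 300, "Cyclist": 100}
plain_map = sum(ap.values()) / len(ap)
weighted_map = icfw_map(ap, counts)
```

With these made-up numbers the rare class carries three quarters of the weight, so the re-weighted score sits below the plain mean and reacts more strongly to changes in the rare class's AP.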

preprint2022arXiv

Stimulated generation of deterministic platicon frequency microcombs

Dissipative Kerr soliton generation in chip-scale nonlinear resonators has recently seen remarkable advances, spanning from massively-parallel communications, self-referenced oscillators, to dual-comb spectroscopy. Often working in the anomalous dispersion regime, unique driving protocols and dispersion in these nonlinear resonators have been examined to achieve the soliton and soliton-like temporal pulse shapes and coherent frequency comb generation. The normal dispersion regime provides a complementary approach to bridge the nonlinear dynamical studies, including the possibility of square pulse formation with flat-top plateaus, or platicons. Here we report observations of square pulse formation in chip-scale frequency combs, through stimulated pumping at one free-spectral-range and in silicon nitride rings with +55 fs$^2$/mm normal group velocity dispersion. Tuning of the platicon frequency comb via a varied sideband modulation frequency is examined in both spectral and temporal measurements. Determined by second-harmonic auto-correlation and cross-correlation, we observe a bright square platicon pulse of 17 ps pulsewidth on a 19 GHz flat frequency comb. With auxiliary-laser-assisted thermal stabilization, we surpass the thermal bistable dragging and extend the mode-locking access to narrower 2 ps platicon pulse states, supported by nonlinear dynamical modeling and boundary limit discussions.

preprint2022arXiv

TaCo: Textual Attribute Recognition via Contrastive Learning

As textual attributes like font are core design elements of document format and page style, automatic attribute recognition benefits a wide range of practical applications. Existing approaches already yield satisfactory performance in differentiating disparate attributes, but they still struggle to distinguish similar attributes with only subtle differences. Moreover, their performance drops severely in real-world scenarios where unexpected and obvious imaging distortions appear. In this paper, we aim to tackle these problems by proposing TaCo, a contrastive framework for textual attribute recognition tailored toward the most common document scenes. Specifically, TaCo leverages contrastive learning to dispel the ambiguity trap arising from vague and open-ended attributes. To realize this goal, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning, to fully unlock the pre-training potential. Extensive experiments show that TaCo surpasses the supervised counterparts and advances the state-of-the-art remarkably on multiple attribute recognition tasks. Online services of TaCo will be made available.

preprint2022arXiv

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

The self-supervised Masked Image Modeling (MIM) schema, following the "mask-and-reconstruct" pipeline of recovering contents from a masked image, has recently captured increasing interest in the multimedia community, owing to its excellent ability to learn visual representations from unlabeled data. Aiming at learning representations with high semantics abstracted, a group of works attempts to reconstruct non-semantic pixels with a large-ratio masking strategy, which may suffer from the "over-smoothing" problem, while others directly infuse semantics into targets in an off-line way requiring extra data. Different from them, we shift the perspective to the Fourier domain, which naturally has a global perspective, and present a new Masked Image Modeling (MIM) method, termed Geminated Gestalt Autoencoder (Ge$^2$-AE), for visual pre-training. Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space, where each serves as not only the complement but also the reciprocal constraint of the other. In this way, more robust representations can be learned in the pre-trained encoders, the effectiveness of which is confirmed by the juxtaposing experimental results on downstream recognition tasks. We also conduct several quantitative and qualitative experiments to investigate the learning behavior of our method. To the best of our knowledge, this is the first MIM work to solve the visual pre-training through the lens of the frequency domain.

preprint2022arXiv

The polarization quaternion and its applications: a joint representation of the Q/U Stokes parameters and E/B mode polarizations

In this work, we point out that the Q/U Stokes parameters and E/B mode polarizations are the four components of a unique quaternion, which describes at the same time the directions and the parity states of spherical linear polarizations. We then point out that, with this polarization quaternion, the mathematical form of all Q/U and E/B transforms are greatly simplified, to an extent that requires only one quaternion multiplication for each transform. A preliminary application of the polarization quaternion is shown as an example to detect peculiar pixel domain patterns within the E- and B-families, which are the former and latter halves of the polarization quaternion.

preprint2022arXiv

Ultrafast Optical Spectroscopy Evidence of Pseudogap and Electron-Phonon Coupling in an Iron-Based Superconductor KCa$_2$Fe$_4$As$_4$F$_2$

We use ultrafast optical spectroscopy to study the nonequilibrium quasiparticle relaxation dynamics of the iron-based superconductor KCa$_2$Fe$_4$As$_4$F$_2$ with $T_c=33.5$ K. Our results reveal a possible pseudogap ($Δ_{PG}$ = 2.4 $\pm$ 0.1 meV) below $T^*\approx 50$ K but prior to the opening of a superconducting gap ($Δ_{SC}$(0) $\approx$ 4.3 $\pm$ 0.1 meV). Measurements under high pump fluence reveal two distinct, coherent phonon oscillations with 1.95 and 5.51 THz frequencies, respectively. The high-frequency $A_{1g}$(2) mode corresponds to the $c-$axis polarized vibrations of FeAs planes with a nominal electron-phonon coupling constant $λ_{A_{1g}(2)}$ = 0.194 $\pm$ 0.02. Our findings suggest that the pseudogap is likely a precursor of superconductivity, and the electron-phonon coupling may play an essential role in the superconducting pairing in KCa$_2$Fe$_4$As$_4$F$_2$.

preprint2022arXiv

Unbiased Implicit Feedback via Bi-level Optimization

Implicit feedback is widely leveraged in recommender systems since it is easy to collect and provides weak supervision signals. Recent works reveal a huge gap between the implicit feedback and user-item relevance due to the fact that implicit feedback is also closely related to the item exposure. To bridge this gap, existing approaches explicitly model the exposure and propose unbiased estimators to improve the relevance. Unfortunately, these unbiased estimators suffer from the high gradient variance, especially for long-tail items, leading to inaccurate gradient updates and degraded model performance. To tackle this challenge, we propose a low-variance unbiased estimator from a probabilistic perspective, which effectively bounds the variance of the gradient. Unlike previous works which either estimate the exposure via heuristic-based strategies or use a large biased training set, we propose to estimate the exposure via an unbiased small-scale validation set. Specifically, we first parameterize the user-item exposure by incorporating both user and item information, and then construct an unbiased validation set from the biased training set. By leveraging the unbiased validation set, we adopt bi-level optimization to automatically update exposure-related parameters along with recommendation model parameters during the learning. Experiments on two real-world datasets and two semi-synthetic datasets verify the effectiveness of our method.
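The variance problem described above can be made concrete with a small sketch. It assumes one common exposure-weighted (inverse propensity) form of the loss, in which a click observed under exposure probability θ is up-weighted by 1/θ. The function names and numbers are illustrative, and this is the kind of existing estimator the abstract criticizes, not the low-variance estimator the paper proposes.

```python
import math


def log_loss(prob, label):
    """Binary cross-entropy of one predicted relevance probability."""
    return -math.log(prob) if label == 1 else -math.log(1.0 - prob)


def exposure_weighted_loss(clicks, preds, exposure):
    """Inverse-propensity style loss over (click, prediction, exposure) triples.

    Each click y with exposure probability theta contributes
    (y / theta) * loss(positive) + (1 - y / theta) * loss(negative).
    Assumption: this specific form is used here for illustration only.
    """
    total = 0.0
    for y, p, theta in zip(clicks, preds, exposure):
        weight = y / theta
        total += weight * log_loss(p, 1) + (1.0 - weight) * log_loss(p, 0)
    return total / len(clicks)


# A clicked long-tail item with exposure 0.01 gets weight 100 on its positive
# term, so a single observation can dominate the gradient.
tail_weight = 1 / 0.01
```

When every exposure probability equals 1 the expression reduces to the ordinary log loss, which is a quick sanity check on the weighting.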

preprint2022arXiv

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning

Vision-Language Pre-training (VLP) has achieved impressive performance on various cross-modal downstream tasks. However, most existing methods can only learn from aligned image-caption data and rely heavily on expensive regional features, which greatly limits their scalability and performance. In this paper, we propose an end-to-end unified-modal pre-training framework, namely UNIMO-2, for joint learning on both aligned image-caption data and unaligned image-only and text-only corpora. We build a unified Transformer model to jointly learn visual representations, textual representations and semantic alignment between images and texts. In particular, we propose to conduct grounded learning on both images and texts via a shared grounded space, which helps bridge unaligned images and texts, and align the visual and textual semantic spaces on different types of corpora. The experiments show that our grounded learning method can improve textual and visual semantic alignment, improving performance on various cross-modal tasks. Moreover, benefiting from effective joint modeling of different types of corpora, our model also achieves impressive performance on single-modal visual and textual tasks. Our code and models are public at the UNIMO project page https://unimo-ptm.github.io/.

preprint2022arXiv

UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning

Existing pre-training methods focus on either single-modal tasks or multi-modal tasks, and cannot effectively adapt to each other. They can only utilize single-modal data (i.e. text or image) or limited multi-modal data (i.e. image-text pairs). In this work, we propose a unified-modal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free text corpora and image collections can be utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space over a corpus of image-text pairs. As the non-paired single-modal data is very rich, our model can utilize data at a much larger scale to learn more generalizable representations. Moreover, the textual knowledge and visual knowledge can enhance each other in the unified semantic space. The experimental results show that UNIMO significantly improves the performance of several single-modal and multi-modal downstream tasks. Our code and pre-trained models are public at the UNIMO project page https://unimo-ptm.github.io/
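To illustrate how a contrastive objective can pull paired image and text embeddings together in a shared space, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It assumes one positive text per image within a batch and a hypothetical temperature of 0.1, and it covers only the image-to-text direction; the CMCL objective used by UNIMO may construct positives and negatives differently.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Image-to-text InfoNCE loss; the i-th text is the positive for the i-th image."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    total = 0.0
    for i, image in enumerate(images):
        # Cosine similarity to every text in the batch, scaled by temperature.
        logits = [dot(image, text) / temperature for text in texts]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(images)


# Toy 2-D embeddings: correctly paired versus deliberately mismatched.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(image_embs, [[1.0, 0.0], [0.0, 1.0]])
mismatched = contrastive_loss(image_embs, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each image is closest to its own text and grows when the pairing is broken, which is the signal that drives the two modalities toward a common semantic space.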

preprint2021arXiv

A Color Elastica Model for Vector-Valued Image Regularization

Models related to the Euler's elastica energy have proven to be useful for many applications including image processing. Extending elastica models to color images and multi-channel data is a challenging task, as stable and consistent numerical solvers for these geometric models often involve high order derivatives. Like the single channel Euler's elastica model and the total variation (TV) models, geometric measures that involve high order derivatives could help when considering image formation models that minimize elastic properties. In the past, the Polyakov action from high energy physics has been successfully applied to color image processing. Here, we introduce an addition to the Polyakov action for color images that minimizes the color manifold curvature. The color image curvature is computed by applying the Laplace-Beltrami operator to the color image channels. When reduced to gray-scale images, while selecting appropriate scaling between space and color, the proposed model minimizes the Euler's elastica operating on the image level sets. Finding a minimizer for the proposed nonlinear geometric model is a challenge we address in this paper. Specifically, we present an operator-splitting method to minimize the proposed functional. The non-linearity is decoupled by introducing three vector-valued and matrix-valued variables. The problem is then converted into solving for the steady state of an associated initial-value problem. The initial-value problem is time-split into three fractional steps, such that each sub-problem has a closed form solution, or can be solved by fast algorithms. The efficiency and robustness of the proposed method are demonstrated by systematic numerical experiments.

preprint2021arXiv

A Robust Traffic Control Model Considering Uncertainties in Turning Ratios

The effects of model parameter uncertainty on traffic flow control problems have recently drawn research attention. While the uncertainty in fundamental diagram related parameters has been investigated in the past, few articles have focused on network parameters uncertainty, including turning ratio uncertainty. To fill this gap, this article proposes a robust control model to deal with the uncertainties in the turning ratio by using distributionally robust chance constraints. The model allows one to compute the optimal control action that maximizes some objective, under all possible distributions of network parameters. We then apply this robust control framework to both a freeway network and an urban network, and evaluate the impact of uncertainty on optimal control inputs, over the test networks. The case studies show that compared to non-robust control, the proposed robust model can reduce congestion brought by the uncertainties and improve the overall throughput.

preprint2021arXiv

CoordiQ : Coordinated Q-learning for Electric Vehicle Charging Recommendation

Electric vehicles have been rapidly increasing in usage, but stations to charge them have not always kept up with demand, so efficient routing of vehicles to stations is critical to operating at maximum efficiency. Deciding which stations to recommend drivers to is a complex problem with a multitude of possible recommendations, volatile usage patterns and temporally extended consequences of recommendations. Reinforcement learning offers a powerful paradigm for solving sequential decision-making problems, but traditional methods may struggle with sample efficiency due to the high number of possible actions. By developing a model that allows complex representations of actions, we improve outcomes for users of our system by over 30% when compared to existing baselines in a simulation. If implemented widely, these better recommendations can globally save over 4 million person-hours of waiting and driving each year.

preprint2021arXiv

Disentangling Observed Causal Effects from Latent Confounders using Method of Moments

Discovering the complete set of causal relations among a group of variables is a challenging unsupervised learning problem. Often, this challenge is compounded by the fact that there are latent or hidden confounders. When only observational data is available, the problem is ill-posed, i.e. the causal relationships are non-identifiable unless strong modeling assumptions are made. When interventions are available, we provide guarantees on identifiability and learnability under mild assumptions. We assume a linear structural equation model (SEM) with independent latent factors and directed acyclic graph (DAG) relationships among the observables. Since the latent variable inference is based on independent component analysis (ICA), we call this model SEM-ICA. We use the method of moments principle to establish model identifiability. We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions. Thus, we provide a principled approach to tackling the joint problem of causal discovery and latent variable inference.

preprint2021arXiv

Efficient Multi-Beam Training For Terahertz Wireless communications

Although terahertz communication systems can provide high data rates, they need highly directional beamforming at transmitters and receivers to achieve such rates over a long distance. Therefore, an efficient beam training method is vital to accelerate the link establishment. In this study, we propose a low-complexity beam training scheme for terahertz communication systems which uses a low-cost small-scale hybrid architecture to assist a large-scale array for data transmission. The proposed scheme includes two key stages: (1) coarse AoAs/AoDs estimation for beam subset optimization in the auxiliary array stage, and (2) accurate AoAs/AoDs estimation by exploiting channel sparsity in the data transmission array stage. The analysis shows that the complexity of the scheme is linear in the number of main paths, and thus greatly reduces the complexity of beam training. Simulation results verify that the proposed scheme achieves better spectral efficiency than the related work.

preprint2021arXiv

Intelligent Electric Vehicle Charging Recommendation Based on Multi-Agent Reinforcement Learning

Electric Vehicle (EV) has become a preferable choice in the modern transportation system due to its environmental and energy sustainability. However, in many large cities, EV drivers often fail to find the proper spots for charging, because of the limited charging infrastructures and the spatiotemporally unbalanced charging demands. Indeed, the recent emergence of deep reinforcement learning provides great potential to improve the charging experience from various aspects over a long-term horizon. In this paper, we propose a framework, named Multi-Agent Spatio-Temporal Reinforcement Learning (Master), for intelligently recommending public accessible charging stations by jointly considering various long-term spatiotemporal factors. Specifically, by regarding each charging station as an individual agent, we formulate this problem as a multi-objective multi-agent reinforcement learning task. We first develop a multi-agent actor-critic framework with the centralized attentive critic to coordinate the recommendation between geo-distributed agents. Moreover, to quantify the influence of future potential charging competition, we introduce a delayed access strategy to exploit the knowledge of future charging competition during training. After that, to effectively optimize multiple learning objectives, we extend the centralized attentive critic to multi-critics and develop a dynamic gradient re-weighting strategy to adaptively guide the optimization direction. Finally, extensive experiments on two real-world datasets demonstrate that Master achieves the best comprehensive performance compared with nine baseline approaches.

preprint2021arXiv

Joint Air Quality and Weather Prediction Based on Multi-Adversarial Spatiotemporal Networks

Accurate and timely air quality and weather predictions are of great importance to urban governance and human livelihood. Though many efforts have been made for air quality or weather prediction, most of them simply employ one another as feature input, which ignores the inner-connection between two predictive tasks. On the one hand, the accurate prediction of one task can help improve another task's performance. On the other hand, geospatially distributed air quality and weather monitoring stations provide additional hints for city-wide spatiotemporal dependency modeling. Inspired by the above two insights, in this paper, we propose the Multi-adversarial spatiotemporal recurrent Graph Neural Networks (MasterGNN) for joint air quality and weather predictions. Specifically, we first propose a heterogeneous recurrent graph neural network to model the spatiotemporal autocorrelation among air quality and weather monitoring stations. Then, we develop a multi-adversarial graph learning framework to counter the observation noise propagation introduced by spatiotemporal modeling. Moreover, we present an adaptive training strategy by formulating multi-adversarial learning as a multi-task learning problem. Finally, extensive experiments on two real-world datasets show that MasterGNN achieves the best performance compared with seven baselines on both air quality and weather prediction tasks.

preprint2021arXiv

Learning the tuned liquid damper dynamics by means of a robust EKF

The tuned liquid damper (TLD) technology is a feasible and cost-effective seismic design. In order to improve its efficiency, it is fundamental to find accurate models describing its dynamics. A TLD system can be modeled through the Housner model and its parameters can be estimated by solving a nonlinear state estimation problem. We propose a robust extended Kalman filter which alleviates the effects of the model discretization and of the fact that the noise process is not known. We test the effectiveness of the proposed approach by using some experimental data corresponding to two classical seismic waves, namely the El Centro wave and the Hachinohe wave.

preprint2021arXiv

Out-of-distribution Prediction with Invariant Risk Minimization: The Limitation and An Effective Fix

This work considers the out-of-distribution (OOD) prediction problem where (1) the training data are from multiple domains and (2) the test domain is unseen in the training. DNNs fail in OOD prediction because they are prone to pick up spurious correlations. Recently, Invariant Risk Minimization (IRM) was proposed to address this issue. Its effectiveness has been demonstrated in the colored MNIST experiment. Nevertheless, we find that the performance of IRM can be dramatically degraded under strong $Λ$ spuriousness -- when the spurious correlation between the spurious features and the class label is strong due to the strong causal influence of their common cause, the domain label, on both of them (see Fig. 1). In this work, we try to answer the questions: why does IRM fail in the aforementioned setting? Why does IRM work for the original colored MNIST dataset? How can we fix this problem of IRM? Then, we propose a simple and effective approach to fix the problem of IRM. We combine IRM with conditional distribution matching to avoid a specific type of spurious correlation under strong $Λ$ spuriousness. Empirically, we design a series of semi-synthetic datasets -- the colored MNIST plus, which exposes the problems of IRM and demonstrates the efficacy of the proposed method.

preprint2021arXiv

Out-of-Town Recommendation with Travel Intention Modeling

Out-of-town recommendation is designed for those users who leave their home-town areas and visit the areas they have never been to before. It is challenging to recommend Points-of-Interest (POIs) for out-of-town users since the out-of-town check-in behavior is determined by not only the user's home-town preference but also the user's travel intention. Besides, the user's travel intentions are complex and dynamic, which makes it very difficult to understand such intentions precisely. In this paper, we propose a TRAvel-INtention-aware Out-of-town Recommendation framework, named TRAINOR. The proposed TRAINOR framework distinguishes itself from existing out-of-town recommenders in three aspects. First, graph neural networks are explored to represent users' home-town check-in preference and geographical constraints in out-of-town check-in behaviors. Second, a user-specific travel intention is formulated as an aggregation combining home-town preference and generic travel intention together, where the generic travel intention is regarded as a mixture of inherent intentions that can be learned by Neural Topic Model (NTM). Third, a non-linear mapping function, as well as a matrix factorization method, are employed to transfer users' home-town preference and estimate out-of-town POI's representation, respectively. Extensive experiments on real-world data sets validate the effectiveness of the TRAINOR framework. Moreover, the learned travel intention can deliver meaningful explanations for understanding a user's travel purposes.

preprint2021arXiv

PML: Progressive Margin Loss for Long-tailed Age Classification

In this paper, we propose a progressive margin loss (PML) approach for unconstrained facial age classification. Conventional methods make the strong assumption that each class has enough instances to outline its data distribution, which is likely to lead to biased predictions where the training samples are sparse across age classes. Instead, our PML aims to adaptively refine the age label pattern by enforcing a pair of margins, which fully take into account the discrepancy among the intra-class variance, inter-class variance and class center. Our PML incorporates the ordinal margin and the variational margin, and plugs into the globally tuned deep neural network paradigm. More specifically, the ordinal margin learns to exploit the correlated relationship of the real-world age labels. Accordingly, the variational margin is leveraged to minimize the influence of head classes that mislead the prediction of tail samples. Moreover, our optimization carefully seeks a series of indicator curricula to achieve robust and efficient model training. Extensive experimental results on three face aging datasets demonstrate that our PML achieves compelling performance compared to the state of the art. Code will be made publicly available.

preprint2021arXiv

Spatial Object Recommendation with Hints: When Spatial Granularity Matters

Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and thereby are heterogeneous. For example, one user may prefer to be recommended a region (say Manhattan), while another user might prefer a venue (say a restaurant). Even for the same user, preferences can change at different stages of data exploration. In this paper, we study how to support top-k spatial object recommendations at varying levels of spatial granularity, enabling spatial objects at varying granularity, such as a city, suburb, or building, as a Point of Interest (POI). To solve this problem, we propose the use of a POI tree, which captures spatial containment relationships between POIs. We design a novel multi-task learning model called MPR (short for Multi-level POI Recommendation), where each task aims to return the top-k POIs at a certain spatial granularity level. Each task consists of two subtasks: (i) attribute-based representation learning; (ii) interaction-based representation learning. The first subtask learns the feature representations for both users and POIs, capturing attributes directly from their profiles. The second subtask incorporates user-POI interactions into the model. Additionally, MPR can provide insights into why certain recommendations are being made to a user based on three types of hints: user-aspect, POI-aspect, and interaction-aspect. We empirically validate our approach using two real-life datasets, and show promising performance improvements over several state-of-the-art methods.
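
As a rough illustration of the POI tree idea in the abstract above, the sketch below stores spatial containment as a tree and returns the top-k POIs at one granularity level. The class `POINode`, the helper names, the integer `level` field and the score dictionary are all assumptions made for this example; the scores stand in for whatever a trained model would predict and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class POINode:
    """Hypothetical POI tree node; children are spatially contained in their parent."""
    name: str
    level: int  # assumed convention: 0 = coarsest granularity (e.g. a city)
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def nodes_at_level(root, level):
    """Collect every node in the tree that sits at the requested granularity level."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.level == level:
            found.append(node)
        stack.extend(node.children)
    return found


def top_k(root, level, scores, k):
    """Rank the POIs of one level by a (made-up) preference score, best first."""
    candidates = nodes_at_level(root, level)
    ranked = sorted(candidates, key=lambda n: scores.get(n.name, 0.0), reverse=True)
    return [n.name for n in ranked[:k]]


# Toy tree: city -> regions -> venues (names and scores are illustrative only).
city = POINode("New York", 0)
manhattan = city.add(POINode("Manhattan", 1))
brooklyn = city.add(POINode("Brooklyn", 1))
manhattan.add(POINode("Restaurant A", 2))
manhattan.add(POINode("Museum B", 2))
brooklyn.add(POINode("Cafe C", 2))

scores = {"Manhattan": 0.9, "Brooklyn": 0.4,
          "Restaurant A": 0.7, "Museum B": 0.2, "Cafe C": 0.8}

print(top_k(city, 1, scores, 1))  # region-level recommendation
print(top_k(city, 2, scores, 2))  # venue-level recommendation
```

Keeping the level on each node lets one ranking routine serve every granularity, which loosely mirrors the one-task-per-level framing in the abstract.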

preprint2021arXiv

Temperature evolution of quasiparticle dispersion dynamics in semimetallic 1T-TiTe2 via high-resolution angle-resolved photoemission spectroscopy and ultrafast optical pump-probe spectroscopy

High-resolution angle-resolved photoemission spectroscopy and ultrafast optical pump-probe spectroscopy were used to study semimetallic 1T-TiTe2 quasiparticle dispersion and dynamics. A kink and a flat band, having the same energy scale and temperature-dependent behaviors along the G-M direction, were detected. Both manifested at low temperatures but blurred as temperature increased. The kink was formed by electron-phonon coupling, and the localized flat band might also be closely related to electron-phonon coupling. Ultrafast optical spectroscopy identified multiple distinct time scales in the 10-300 K range. Quantitative analysis of the fastest decay process evidenced a significant temperature dependence of the lifetime at high temperatures, while the lifetime changes only slowly below ~100 K, where an anomalous Hall coefficient occurred. At low temperature, a coherent A1g phonon mode with a frequency of ~4.36 THz was extracted. The temperature dependence of the frequency suggests that phonon hardening occurs as temperature falls and that anharmonic effects can explain it. The fluence dependence of the frequency indicates that the phonons soften as fluence increases.

preprint2021arXiv

The design of the Ali CMB Polarization Telescope receiver

Ali CMB Polarization Telescope (AliCPT-1) is the first CMB degree-scale polarimeter to be deployed on the Tibetan plateau at 5,250m above sea level. AliCPT-1 is a 90/150 GHz 72 cm aperture, two-lens refracting telescope cooled down to 4 K. Alumina lenses, 800mm in diameter, image the CMB in a 33.4° field of view on a 636mm wide focal plane. The modularized focal plane consists of dichroic polarization-sensitive Transition-Edge Sensors (TESes). Each module includes 1,704 optically active TESes fabricated on a 150mm diameter silicon wafer. Each TES array is read out with a microwave multiplexing readout system capable of a multiplexing factor up to 2,048. Such a large multiplexing factor has allowed the practical deployment of tens of thousands of detectors, enabling the design of a receiver that can operate up to 19 TES arrays for a total of 32,376 TESes. AliCPT-1 leverages the technological advancements in the detector design from multiple generations of previously successful feedhorn-coupled polarimeters, and in the instrument design from BICEP-3, but applied on a larger scale. The cryostat receiver is currently under integration and testing. During the first deployment year, the focal plane will be populated with up to 4 TES arrays. Further TES arrays will be deployed in the following years, fully populating the focal plane with 19 arrays on the fourth deployment year. Here we present the AliCPT-1 receiver design, and how the design has been optimized to meet the experimental requirements.

preprint2020arXiv

AutoFS: Automated Feature Selection via Diversity-aware Interactive Reinforcement Learning

In this paper, we study the problem of balancing effectiveness and efficiency in automated feature selection. Feature selection is a fundamental task for machine learning and predictive analysis. After exploring many feature selection methods, we observe a computational dilemma: 1) traditional feature selection methods (e.g., mRMR) are mostly efficient, but struggle to identify the best subset; 2) the emerging reinforced feature selection methods automatically navigate the feature space to explore the best subset, but are usually inefficient. Are automation and efficiency always at odds with each other? Can we bridge the gap between effectiveness and efficiency under automation? Motivated by this computational dilemma, this study develops a novel feature space navigation method. To that end, we propose an Interactive Reinforced Feature Selection (IRFS) framework that guides agents not just by self-exploration experience, but also by diverse external skilled trainers to accelerate learning for feature exploration. Specifically, we formulate the feature selection problem within an interactive reinforcement learning framework. In this framework, we first model two trainers skilled at different searching strategies: (1) a KBest-based trainer; (2) a Decision Tree-based trainer. We then develop two strategies: (1) to identify assertive and hesitant agents to diversify agent training, and (2) to enable the two trainers to take the teaching role in different stages to fuse the experiences of the trainers and diversify the teaching process. Such a hybrid teaching strategy can help agents learn broader knowledge and thereafter be more effective. Finally, we present extensive experiments on real-world datasets to demonstrate the improved performance of our method: more efficient than existing reinforced selection and more effective than classic selection.

preprint2020arXiv

Comparison of the Planck 2018 CMB polarization maps in the BICEP2/Keck region

We examine the statistical properties of polarization maps from Planck 2018 within the patch of sky observed by the BICEP2/Keck experiment using the one point distribution function (1PDF), skewness, and kurtosis statistics. Our analysis is performed for the Q and U Stokes parameters and for the corresponding E- and B-modes of the CMB signals. We extend our analysis by studying the correlations between CMB polarization maps and residual maps (the difference between the full signal and the CMB map) for the frequency range of 100--217 GHz with both the Q/U and E/B approaches. Although all the CMB maps reveal almost Gaussian statistical properties for Q/U and E/B domains, we have detected very significant anomalies for cross-correlations with residuals at 100 GHz at the level of $3.7σ$ for the Commander map and $5.2σ$ for NILC, for both the Q and U parameters. Using the NILC--Commander difference, which does not contain a cosmological signal, we find a sub-dominant non-Gaussian component in Q skewness and kurtosis at the level of $4.3σ$ and $10σ$, respectively. For the B-mode we have found a very high level of cross-correlation (0.63--0.69) between the NILC/Commander maps and the 143 GHz total signal, which cannot be associated with the cosmological component. These strong deviations suggest that remnants of foregrounds, systematic effects, and component separation exist in the 2018 Planck CMB polarization maps in the BICEP2 sky area, which is far away from the Galactic plane. Our analysis also demonstrates the preferability of the Q/U domain over E/B for determination of the statistical properties of the derived CMB signals, due to non-locality of the transition Q/U $\rightarrow$ E/B.
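
For readers unfamiliar with the skewness and kurtosis statistics named in the abstract above, here is a minimal sketch of the standard moment-based definitions. It uses the population (biased) estimators and reports excess kurtosis (zero for a Gaussian); whether the paper uses exactly this normalization is an assumption, and the sample values are made up for illustration.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; zero for a symmetric distribution."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a Gaussian gives zero."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical pixel values standing in for a Q or U map patch.
symmetric_patch = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(skewness(symmetric_patch), excess_kurtosis(symmetric_patch))
```

A significance level such as the ones quoted in the abstract would additionally require comparing these numbers against simulations, which this sketch does not attempt.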

preprint2020arXiv

Curvature Regularized Surface Reconstruction from Point Cloud

We propose a variational functional and fast algorithms to reconstruct an implicit surface from point cloud data with a curvature constraint. The minimizing functional balances the distance function from the point cloud and the mean curvature term. Only the point location is used, without any local normal or curvature estimation at each point. With the added curvature constraint, the computation becomes particularly challenging. To enhance the computational efficiency, we solve the problem by a novel operator splitting scheme. It replaces the original high-order PDEs by a decoupled PDE system, which is solved by a semi-implicit method. We also discuss an approach using an augmented Lagrangian method. The proposed method shows robustness against noise, and recovers concave features and sharp corners better than models without a curvature constraint. Numerical experiments on two- and three-dimensional data sets, including noisy and sparse data, are presented to validate the model.

preprint2020arXiv

DARWIN: A Highly Flexible Platform for Imaging Research in Radiology

To conduct a radiomics or deep learning research experiment, radiologists or physicians need to acquire the necessary programming skills, which can be frustrating and costly when they have limited coding experience. In this paper, we present DARWIN, a flexible research platform with a graphical user interface for medical imaging research. Our platform consists of a radiomics module and a deep learning module. The radiomics module can extract features with more than 1000 dimensions (first-, second-, and higher-order) and provides many draggable supervised and unsupervised machine learning models. Our deep learning module integrates state-of-the-art architectures for classification, detection, and segmentation tasks. It allows users to manually select hyperparameters, or choose an algorithm to automatically search for the best ones. DARWIN also lets users define a custom pipeline for their experiment. This flexibility enables radiologists to carry out various experiments easily.

preprint2020arXiv

Deep Learning for 3D Point Clouds: A Survey

Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has been thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks: 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.

preprint2020arXiv

Hybrid Discriminative-Generative Training via Contrastive Learning

Contrastive learning and supervised learning have both seen significant progress and success. However, thus far they have largely been treated as two separate objectives, brought together only by having a shared neural network. In this paper we show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning. Beyond presenting this unified view, we show our specific choice of approximation of the energy-based loss outperforms the existing practice in terms of classification accuracy of WideResNet on CIFAR-10 and CIFAR-100. It also leads to improved performance on robustness, out-of-distribution detection, and calibration.

preprint2020arXiv

Identifying and Tracking Solar Magnetic Flux Elements with Deep Learning

Deep learning has drawn a lot of interest in recent years due to its effectiveness in processing big and complex observational data gathered from diverse instruments. Here we propose a new deep learning method, called SolarUnet, to identify and track solar magnetic flux elements or features in observed vector magnetograms based on the Southwest Automatic Magnetic Identification Suite (SWAMIS). Our method consists of a data pre-processing component that prepares training data from the SWAMIS tool, a deep learning model implemented as a U-shaped convolutional neural network for fast and accurate image segmentation, and a post-processing component that prepares tracking results. SolarUnet is applied to data from the 1.6 meter Goode Solar Telescope at the Big Bear Solar Observatory. When compared to the widely used SWAMIS tool, SolarUnet is faster while agreeing mostly with SWAMIS on feature size and flux distributions, and complementing SWAMIS in tracking long-lifetime features. Thus, the proposed physics-guided deep learning-based tool can be considered as an alternative method for solar magnetic tracking.

preprint2020arXiv

Inferring Vector Magnetic Fields from Stokes Profiles of GST/NIRIS Using a Convolutional Neural Network

We propose a new machine learning approach to Stokes inversion based on a convolutional neural network (CNN) and the Milne-Eddington (ME) method. The Stokes measurements used in this study were taken by the Near InfraRed Imaging Spectropolarimeter (NIRIS) on the 1.6 m Goode Solar Telescope (GST) at the Big Bear Solar Observatory. By learning the latent patterns in the training data prepared by the physics-based ME tool, the proposed CNN method is able to infer vector magnetic fields from the Stokes profiles of GST/NIRIS. Experimental results show that our CNN method produces smoother and cleaner magnetic maps than the widely used ME method. Furthermore, the CNN method is 4 to 6 times faster than the ME method, and is able to produce vector magnetic fields in near real-time, which is essential to space weather forecasting. Specifically, it takes ~50 seconds for the CNN method to process an image of 720 x 720 pixels comprising Stokes profiles of GST/NIRIS. Finally, the CNN-inferred results are highly correlated to the ME-calculated results and are closer to the ME's results with the Pearson product-moment correlation coefficient (PPMCC) being closer to 1 on average than those from other machine learning algorithms such as multiple support vector regression and multilayer perceptrons (MLP). In particular, the CNN method outperforms the current best machine learning method (MLP) by 2.6% on average in PPMCC according to our experimental study. Thus, the proposed physics-assisted deep learning-based CNN tool can be considered as an alternative, efficient method for Stokes inversion for high resolution polarimetric observations obtained by GST/NIRIS.

preprint2020arXiv

On the Numerical Solution of Nonlinear Eigenvalue Problems for the Monge-Ampère Operator

In this article, we report the results we obtained when investigating the numerical solution of some nonlinear eigenvalue problems for the Monge-Ampère operator $v\rightarrow \det \mathbf{D}^2 v$. The methodology we employ relies on the following ingredients: (i) A divergence formulation of the eigenvalue problems under consideration. (ii) The time discretization by operator-splitting of an initial value problem (a kind of gradient flow) associated with each eigenvalue problem. (iii) A finite element approximation relying on spaces of continuous piecewise affine functions. To validate the above methodology, we applied it to the solution of problems with known exact solutions: The results we obtained suggest convergence to the exact solution when the space discretization step $h\rightarrow 0$. We considered also test problems with no known exact solutions.

preprint2020arXiv

Polestar: An Intelligent, Efficient and National-Wide Public Transportation Routing Engine

Public transportation plays a critical role in people's daily life. It has been proven that public transportation is more environmentally sustainable, efficient, and economical than any other form of travel. However, due to the increasing expansion of transportation networks and more complex travel situations, people are having difficulty efficiently finding the most preferred route from one place to another through public transportation systems. To this end, in this paper, we present Polestar, a data-driven engine for intelligent and efficient public transportation routing. Specifically, we first propose a novel Public Transportation Graph (PTG) to model the public transportation system in terms of various travel costs, such as time or distance. Then, we introduce a general route search algorithm coupled with an efficient station binding method for efficient route candidate generation. After that, we propose a two-pass route candidate ranking module to capture user preferences under dynamic travel situations. Finally, experiments on two real-world data sets demonstrate the advantages of Polestar in terms of both efficiency and effectiveness. Indeed, in early 2019, Polestar was deployed on Baidu Maps, one of the world's largest map services. To date, Polestar serves over 330 cities, answers over a hundred million queries each day, and achieves a substantial improvement in user click ratio.
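
To make the idea of searching a graph weighted by travel cost concrete, the sketch below runs Dijkstra's algorithm on a tiny station graph. Dijkstra is used here only as a familiar stand-in: the abstract does not say which search algorithm Polestar uses, and the station names, edge weights and the function `cheapest_route` are assumptions for illustration.

```python
import heapq


def cheapest_route(graph, source, target):
    """Dijkstra's algorithm over a dict-of-dicts graph with non-negative edge costs."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, station = heapq.heappop(queue)
        if station == target:
            break
        if cost > best.get(station, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, edge_cost in graph.get(station, {}).items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = station
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Hypothetical network; edge weights are travel times in minutes.
network = {
    "A": {"B": 5, "C": 12},
    "B": {"C": 4, "D": 15},
    "C": {"D": 6},
    "D": {},
}
route, minutes = cheapest_route(network, "A", "D")
print(route, minutes)
```

Swapping the edge weights from time to distance changes the notion of "cheapest" without touching the search itself, which is one reason a cost-weighted graph is a convenient model.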

preprint2020arXiv

Predicting Coronal Mass Ejections Using SDO/HMI Vector Magnetic Data Products and Recurrent Neural Networks

We present two recurrent neural networks (RNNs), one based on gated recurrent units and the other based on long short-term memory, for predicting whether an active region (AR) that produces an M- or X-class flare will also produce a coronal mass ejection (CME). We model data samples in an AR as time series and use the RNNs to capture temporal information of the data samples. Each data sample has 18 physical parameters, or features, derived from photospheric vector magnetic field data taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). We survey M- and X-class flares that occurred from 2010 May to 2019 May using the Geostationary Operational Environmental Satellite's X-ray flare catalogs provided by the National Centers for Environmental Information (NCEI), and select those flares with identified ARs in the NCEI catalogs. In addition, we extract the associations of flares and CMEs from the Space Weather Database Of Notifications, Knowledge, Information (DONKI). We use the information gathered above to build the labels (positive versus negative) of the data samples at hand. Experimental results demonstrate the superiority of our RNNs over closely related machine learning methods in predicting the labels of the data samples. We also discuss an extension of our approach to predict a probabilistic estimate of how likely an M- or X-class flare will initiate a CME, with good performance results. To our knowledge this is the first time that RNNs have been used for CME prediction.

preprint2020arXiv

PuzzleNet: Scene Text Detection by Segment Context Graph Learning

Recently, a series of decomposition-based scene text detection methods has achieved impressive progress by decomposing challenging text regions into pieces and linking them in a bottom-up manner. However, most of them merely focus on linking independent text pieces, while the context information is underused. In a puzzle game, the solver often puts pieces together in a logical way according to the contextual information of each piece, in order to arrive at the correct solution. Inspired by this, we propose a novel decomposition-based method, termed Puzzle Networks (PuzzleNet), to address the challenging scene text detection task. PuzzleNet consists of the Segment Proposal Network (SPN), which predicts candidate text segments fitting the arbitrary shape of a text region, and the two-branch Multiple-Similarity Graph Convolutional Network (MSGCN), which models both appearance and geometry correlations between each segment and its contextual ones. By building segments as context graphs, MSGCN effectively employs segment context to predict combinations of segments. Final polygon-shaped detections are produced by merging segments according to the predicted combinations. Evaluations on three benchmark datasets, ICDAR15, MSRA-TD500 and SCUT-CTW1500, demonstrate that our method achieves performance better than or comparable to the current state of the art, which benefits from the exploitation of the segment context graph.

preprint2020arXiv

SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis

Recently, sentiment analysis has seen remarkable advances with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite being widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into the pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms a strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.

preprint2020arXiv

Statistics of CMB polarization angles

We study the distribution functions of the CMB polarization angle $ψ$, focusing on the Planck 2018 CMB maps. We extend the model of Preece & Battye (2014) of Gaussian correlated $Q$ and $U$ Stokes parameters to allow nonzero means. When the variances of $Q$ and $U$ are equal and their covariance and means are zero, the polarization angle is uniformly distributed. Otherwise the uniform distribution will be modulated by harmonics with $2ψ$ and $4ψ$ phases. These modulations are visible in the Planck 2018 CMB maps. Furthermore, the mean value of $U$ is peculiar compared to the power spectrum.
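
The claim in the abstract above can be checked numerically with a small Monte Carlo sketch. It assumes the common convention $ψ = \frac{1}{2}\arctan(U/Q)$ (sign conventions vary between papers), draws uncorrelated Gaussian $Q$ and $U$, and measures the amplitude of the $2ψ$ and $4ψ$ harmonics of the resulting angle distribution. The function names, sample size and parameter values are all choices made for this example.

```python
import math
import random


def polarization_angle(q, u):
    """Assumed convention: psi = 0.5 * atan2(U, Q), in (-pi/2, pi/2]."""
    return 0.5 * math.atan2(u, q)


def simulate_angles(n, mean_q, mean_u, sigma_q, sigma_u, seed=0):
    """Draw n angles from independent Gaussian Q and U (no covariance term)."""
    rng = random.Random(seed)
    return [polarization_angle(rng.gauss(mean_q, sigma_q),
                               rng.gauss(mean_u, sigma_u))
            for _ in range(n)]


def harmonic_amplitudes(angles):
    """Return |<exp(2i psi)>| and |<exp(4i psi)>|; both vanish for a uniform psi."""
    n = len(angles)
    c2 = sum(math.cos(2 * p) for p in angles) / n
    s2 = sum(math.sin(2 * p) for p in angles) / n
    c4 = sum(math.cos(4 * p) for p in angles) / n
    s4 = sum(math.sin(4 * p) for p in angles) / n
    return math.hypot(c2, s2), math.hypot(c4, s4)


isotropic = harmonic_amplitudes(simulate_angles(20000, 0.0, 0.0, 1.0, 1.0))
offset_mean = harmonic_amplitudes(simulate_angles(20000, 1.0, 0.0, 1.0, 1.0))
unequal_var = harmonic_amplitudes(simulate_angles(20000, 0.0, 0.0, 2.0, 1.0))

print("equal variances, zero means:", isotropic)
print("nonzero mean of Q:          ", offset_mean)
print("unequal variances:          ", unequal_var)
```

In this toy setup a nonzero mean mainly excites the $2ψ$ harmonic, while unequal variances show up in the $4ψ$ harmonic, consistent with the modulation described in the abstract.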

preprint2020arXiv

Transmitting Extra Bits by Rotating Signal Constellations

In this letter, we propose a novel LDPC coding scheme to transmit extra bits aided by rotated signal constellations without any additional cost in transmission power or bandwidth. In the proposed scheme, the LDPC coded data are modulated by a rotated two-dimensional signal constellation, in which the rotation angle is specified by the given extra bits. At the receiver, the rotation angle is estimated with the aid of the statistical learning of the syndrome of the LDPC code. After recovering the rotation angle, the coded payload data can be decoded by the LDPC decoder. The simulation results show that, for an LDPC code of length 2304, up to four extra bits can be transmitted with negligible influence on the reliability of the LDPC coded data.
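
The sketch below illustrates only the basic idea of letting extra bits select a constellation rotation. It assumes a QPSK constellation, candidate angles spread evenly over one quarter turn, and a noise-free channel. The receiver here picks the rotation by brute-force nearest-point matching, which is a simple stand-in and not the syndrome-based statistical learning estimator described in the abstract; no LDPC coding is modeled. All function names are hypothetical.

```python
import cmath
import math

# Unit-energy QPSK points at 45, 135, 225 and 315 degrees.
QPSK = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]


def candidate_angles(num_extra_bits):
    """Evenly spaced rotations in [0, pi/2); QPSK repeats every pi/2."""
    count = 2 ** num_extra_bits
    return [i * (math.pi / 2) / count for i in range(count)]


def modulate(bit_pairs, extra_bits):
    """Map payload bit pairs to QPSK, rotated by the angle the extra bits select."""
    index = int("".join(str(b) for b in extra_bits), 2)
    rotation = cmath.exp(1j * candidate_angles(len(extra_bits))[index])
    return [QPSK[2 * b0 + b1] * rotation for b0, b1 in bit_pairs]


def estimate_extra_bits(received, num_extra_bits):
    """Stand-in estimator: choose the rotation whose removal best fits QPSK."""
    best_index, best_cost = 0, float("inf")
    for index, angle in enumerate(candidate_angles(num_extra_bits)):
        undo = cmath.exp(-1j * angle)
        cost = sum(min(abs(r * undo - p) ** 2 for p in QPSK) for r in received)
        if cost < best_cost:
            best_index, best_cost = index, cost
    # Requires num_extra_bits >= 1 so the bit string has the expected width.
    return [int(c) for c in format(best_index, f"0{num_extra_bits}b")]


payload = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 1)]
extra = [1, 0, 1, 1]
symbols = modulate(payload, extra)
print(estimate_extra_bits(symbols, len(extra)))
```

Because the rotation changes neither the symbol energy nor the number of symbols, the extra bits cost no additional power or bandwidth in this toy model, which is the property the abstract highlights.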

preprint2019arXiv

Angle-resolved photoemission spectroscopy study of crystal electric field in heavy fermion compound CePt2In7

The three-dimensional electronic structure and Ce 4f electrons of the heavy fermion superconductor CePt2In7 are investigated. Angle-resolved photoemission spectroscopy using variable photon energy establishes the existence of quasi-two-dimensional and three-dimensional Fermi surface topologies. Temperature-dependent 4d-4f on-resonance photoemission spectroscopy reveals that heavy quasiparticle bands begin to form at a temperature well above the characteristic (coherence) temperature T*. The emergence of T* may be closely related to crystal electric field splitting, particularly the low-lying heavy band formed by crystal electric field splitting.