Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
88 works
0 followers
39 topics
4 close collaborators

Published work

88 published item(s)

preprint · 2026 · arXiv

Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation

Vision-language-action (VLA) models typically rely on large-scale real-world videos, whereas simulated data, despite being inexpensive and highly parallelizable to collect, often suffers from a substantial visual domain gap and limited environmental diversity, resulting in weak real-world generalization. We present an efficient video augmentation framework that converts simulated VLA videos into realistic training videos while preserving task semantics and action trajectories. Our pipeline extracts structured conditions from simulation via video semantic segmentation and video captioning, rewrites captions to diversify environments, and uses a conditional video transfer model to synthesize realistic videos. To make augmentation practical at scale, we introduce a diffusion feature-reuse mechanism that reuses video tokens across adjacent timesteps to accelerate generation, and a coreset sampling strategy that identifies a compact, non-redundant subset for augmentation under limited computation. Extensive experiments on Robotwin 2.0, LIBERO, LIBERO-Plus, and a real robotic platform demonstrate consistent improvements. For example, our method improves RDT-1B by 8% on Robotwin 2.0, and boosts $π_0$ by 5.1% on the more challenging LIBERO-Plus benchmark. Code is available at: https://github.com/nanfangxiansheng/Seeing-Realism-from-Simulation.
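The abstract does not say which selection rule the coreset sampling strategy uses. As a rough illustration of picking a compact, non-redundant subset, the sketch below uses k-center greedy selection, a common coreset heuristic that is swapped in here as an assumption and is not the authors' method. The function name `k_center_greedy` and the toy 2-D embeddings are hypothetical.

```python
from math import dist

def k_center_greedy(points, k):
    """Pick up to k indices so that the chosen points are spread far apart.

    Assumption: this stand-in heuristic starts from index 0 and repeatedly
    adds the point that is farthest from everything selected so far.
    """
    selected = [0]
    # Distance from every point to its nearest selected point.
    min_dist = [dist(p, points[0]) for p in points]
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [min(d, dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return selected

# Hypothetical video embeddings: two near-duplicate pairs and one outlier.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
subset = k_center_greedy(embeddings, 3)
print(subset)
```

With these toy points the selection keeps one item from each cluster and skips the near-duplicates, which is the kind of redundancy reduction that makes augmentation cheaper under a fixed compute budget.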

preprint · 2022 · arXiv

A joint explanation of W-mass and muon g-2 in 2HDM

Since both $W$-mass and muon $g-2$ can be affected by the mass splittings among extra Higgs bosons $(H,~A,~H^\pm)$ in a 2HDM, we take a model with $μ$-$τ$ LFV interactions to examine the two anomalies reported respectively by CDF II and FNAL. We obtain the following observations: (i) Combined with theoretical constraints, the CDF $W$-mass measurement disfavors $H$ or $A$ being degenerate in mass with $H^\pm$, but allows $H$ and $A$ to be degenerate. The mass splitting between $H^\pm$ and $H/A$ is required to be larger than 10 GeV. The masses $m_{H^\pm}$ and $m_{A}$ are favored to be smaller than 650 GeV for $m_H<120$ GeV, and are allowed to take larger values as $m_H$ increases. (ii) After imposing other relevant experimental constraints, there are parameter spaces that simultaneously satisfy (at the $2σ$ level) the CDF $W$-mass, the FNAL muon $g-2$ and the data on lepton universality in $τ$ decays, but the mass splittings among extra Higgs bosons are strictly constrained.

preprint · 2022 · arXiv

A Keypoint-based Global Association Network for Lane Detection

Lane detection is a challenging task that requires predicting complex topology shapes of lane lines and distinguishing different types of lanes simultaneously. Earlier works follow a top-down roadmap to regress predefined anchors into various shapes of lane lines, which lacks the flexibility to fit complex lane shapes because the anchor shapes are fixed. More recently, some works formulate lane detection as a keypoint estimation problem to describe the shapes of lane lines more flexibly, and gradually group adjacent keypoints belonging to the same lane line in a point-by-point manner, which is inefficient and time-consuming during postprocessing. In this paper, we propose a Global Association Network (GANet) to formulate the lane detection problem from a new perspective, where each keypoint is directly regressed to the starting point of the lane line instead of point-by-point extension. Concretely, the association of keypoints to the lane line they belong to is conducted by predicting their offsets to the corresponding starting points of lanes globally without dependence on each other, which can be done in parallel to greatly improve efficiency. In addition, we further propose a Lane-aware Feature Aggregator (LFA), which adaptively captures the local correlations between adjacent keypoints to supplement local information to the global association. Extensive experiments on two popular lane detection benchmarks show that our method outperforms previous methods with an F1 score of 79.63% on CULane and 97.71% on the TuSimple dataset with high FPS. The code will be released at https://github.com/Wolfwjs/GANet.
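The global association step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GANet implementation: the function name `associate_keypoints`, the distance threshold `max_dist`, and the toy coordinates are all hypothetical, and the real network predicts offsets from image features.

```python
from math import hypot

def associate_keypoints(keypoints, offsets, lane_starts, max_dist=5.0):
    """Assign each keypoint to the lane whose start point its offset points to.

    keypoints:   list of (x, y) keypoint positions
    offsets:     list of (dx, dy) predicted offsets, one per keypoint
    lane_starts: list of (x, y) lane start points
    max_dist:    assumed threshold; farther votes are left unassigned (None)
    """
    assignments = []
    for (x, y), (dx, dy) in zip(keypoints, offsets):
        # Each keypoint votes for a start location independently of the
        # others, so this loop could run in parallel.
        px, py = x + dx, y + dy
        dists = [hypot(px - sx, py - sy) for sx, sy in lane_starts]
        best = min(range(len(lane_starts)), key=dists.__getitem__)
        assignments.append(best if dists[best] <= max_dist else None)
    return assignments

keypoints = [(10.0, 50.0), (12.0, 40.0), (30.0, 50.0), (80.0, 80.0)]
offsets = [(0.0, 10.0), (-2.5, 20.0), (0.5, 10.0), (0.0, 0.0)]
lane_starts = [(10.0, 60.0), (30.0, 60.0)]
lanes = associate_keypoints(keypoints, offsets, lane_starts)
print(lanes)
```

Because no keypoint depends on its neighbours, there is no sequential point-by-point grouping pass, which is the efficiency argument the abstract makes.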

preprint · 2022 · arXiv

A Novel Semi-supervised Meta Learning Method for Subject-transfer Brain-computer Interface

A brain-computer interface (BCI) provides a direct communication pathway between the human brain and external devices. Before a new subject can use a BCI, a calibration procedure is usually required, because the inter- and intra-subject variances are so large that models trained on existing subjects perform poorly on new subjects. Therefore, an effective subject-transfer and calibration method is essential. In this paper, we propose a semi-supervised meta learning (SSML) method for subject-transfer learning in BCIs. The proposed SSML first learns a meta model with the existing subjects, then fine-tunes the model in a semi-supervised learning manner, i.e. using few labeled and many unlabeled samples of the target subject for calibration. This is significant for BCI applications where labeled data are scarce or expensive while unlabeled data are readily available. To verify the SSML method, three different BCI paradigms are tested: 1) event-related potential detection; 2) emotion recognition; and 3) sleep staging. The SSML achieved significant improvements of over 15% on the first two paradigms and 4.9% on the third. The experimental results demonstrate the effectiveness and potential of the SSML method in BCI applications.

preprint · 2022 · arXiv

A Privacy-Preserving Unsupervised Domain Adaptation Framework for Clinical Text Analysis

Unsupervised domain adaptation (UDA) generally aligns the unlabeled target domain data to the distribution of the source domain to mitigate the distribution shift problem. Standard UDA requires sharing the source data with the target, which carries a risk of leaking private data. To protect the source data's privacy, we first propose to share the source feature distribution instead of the source data. However, sharing only the source feature distribution may still be vulnerable to membership inference attacks, in which an attacker infers an individual's membership through black-box access to the source model. To resolve this privacy issue, we further study the under-explored problem of privacy-preserving domain adaptation and propose a method with a novel differential privacy training strategy to protect the source data privacy. We model the source feature distribution by Gaussian Mixture Models (GMMs) under the differential privacy setting and send it to the target client for adaptation. The target client resamples differentially private source features from the GMMs and adapts on target data with several state-of-the-art UDA backbones. With our proposed method, the source data provider can avoid leaking source data privacy during domain adaptation while preserving utility. To evaluate our proposed method's utility and privacy loss, we apply our model to a medical report disease label classification task using two noisy, challenging clinical text datasets. The results show that our proposed method can preserve the source data's privacy with only a minor effect on text classification performance.

preprint · 2022 · arXiv

A review of knowledge graph application scenarios in cyber security

Facing dynamic and complex cyber environments, internal and external cyber threat intelligence, and the increasing risk of cyber-attack, knowledge graphs show great application potential in the cyber security area because of their capabilities in knowledge aggregation, representation, management, and reasoning. However, while most research has focused on how to develop a complete knowledge graph, it remains unclear how to apply the knowledge graph to solve real industrial challenges in cyber-attack and defense scenarios. In this review, we provide a brief overview of the basic concepts, schema, and construction approaches for the cyber security knowledge graph. To facilitate future research on cyber security knowledge graphs, we also present a curated collection of datasets and open-source libraries for the knowledge construction and information extraction tasks. In the major part of this article, we conduct a comparative review of the different works that elaborate on recent progress in the application scenarios of the cyber security knowledge graph. Furthermore, a novel comprehensive classification framework is created to describe the connected works in nine primary categories and eighteen subcategories. Finally, we give a thorough outlook on several promising research directions based on a discussion of the flaws in existing research.

preprint · 2022 · arXiv

BiSyn-GAT+: Bi-Syntax Aware Graph Attention Network for Aspect-based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task that aims to align aspects and corresponding sentiments for aspect-specific sentiment polarity inference. It is challenging because a sentence may contain multiple aspects or complicated (e.g., conditional, coordinating, or adversative) relations. Recently, exploiting dependency syntax information with graph neural networks has been the most popular trend. Despite its success, methods that heavily rely on the dependency tree pose challenges in accurately modeling the alignment of the aspects and their words indicative of sentiment, since the dependency tree may provide noisy signals of unrelated associations (e.g., the "conj" relation between "great" and "dreadful" in Figure 2). In this paper, to alleviate this problem, we propose a Bi-Syntax aware Graph Attention Network (BiSyn-GAT+). Specifically, BiSyn-GAT+ fully exploits the syntax information (e.g., phrase segmentation and hierarchical structure) of the constituent tree of a sentence to model the sentiment-aware context of every single aspect (called intra-context) and the sentiment relations across aspects (called inter-context) for learning. Experiments on four benchmark datasets demonstrate that BiSyn-GAT+ outperforms the state-of-the-art methods consistently.

preprint · 2022 · arXiv

Bridging the Gap between Deep Learning and Frustrated Quantum Spin System for Extreme-scale Simulations on New Generation of Sunway Supercomputer

Efficient numerical methods are promising tools for delivering unique insights into fascinating physical systems, such as highly frustrated quantum many-body systems. However, the computational complexity of obtaining the wave functions that accurately describe the quantum states increases exponentially with the particle number. Here we present a novel convolutional neural network (CNN) for simulating the two-dimensional highly frustrated spin-$1/2$ $J_1-J_2$ Heisenberg model, with the simulation performed on an extreme-scale system at low cost and with high scalability. By employing transfer learning and the CNN's translational invariance, we successfully investigate the quantum system with lattice sizes up to $24\times24$, within 30 million cores of the new generation of Sunway supercomputer. The final result demonstrates the effectiveness of the CNN-based representation of quantum states and raises the state of the art in terms of both accuracy and scale.

preprint · 2022 · arXiv

Collaboration Equilibrium in Federated Learning

Federated learning (FL) refers to the paradigm of learning models over a collaborative research network involving multiple clients without sacrificing privacy. Recently, there have been rising concerns about the distributional discrepancies across different clients, which can even make collaboration with others counterproductive. Since collaborating with all clients does not necessarily achieve the best performance, in this paper we study a rational form of collaboration called "collaboration equilibrium" (CE), where smaller collaboration coalitions are formed. Each client collaborates with the members who maximally improve its model learning and isolates the others who make little contribution. We propose the concept of a benefit graph, which describes how each client can benefit from collaborating with other clients, and advance a Pareto optimization approach to identify the optimal collaborators. Then we theoretically prove that we can reach a CE from the benefit graph through an iterative graph operation. Our framework provides a new way of setting up collaborations in a research network. Experiments on both synthetic and real-world data sets are provided to demonstrate the effectiveness of our method.

preprint · 2022 · arXiv

Deep Learning Based Single Sample Per Person Face Recognition: A Survey

Face recognition has long been an active research area in the field of artificial intelligence, particularly since the rise of deep learning in recent years. In some practical situations, each identity has only a single sample available for training. Face recognition under this situation is referred to as single sample face recognition and poses significant challenges to the effective training of deep models. Therefore, in recent years, researchers have attempted to unleash more of the potential of deep learning and improve model recognition performance in the single sample situation. While several comprehensive surveys have been conducted on traditional single sample face recognition approaches, emerging deep learning based methods are rarely covered in these reviews. Accordingly, we focus on deep learning based methods in this paper, classifying them into virtual sample methods and generic learning methods. In the former category, virtual images or virtual features are generated to benefit the training of the deep model. In the latter, additional multi-sample generic sets are used. There are three types of generic learning methods: combining traditional methods and deep features, improving the loss function, and improving the network structure, all of which are covered in our analysis. Moreover, we review face datasets that have been commonly used for evaluating single sample face recognition models and go on to compare the results of different types of models. Additionally, we discuss problems with existing single sample face recognition methods, including identity information preservation in virtual sample methods and domain adaptation in generic learning methods. Furthermore, we regard developing unsupervised methods as a promising future direction, and point out the semantic gap as an important issue that needs further consideration.

preprint · 2022 · arXiv

Deep Petrov-Galerkin Method for Solving Partial Differential Equations

Deep neural networks are powerful tools for approximating functions, and they have been applied successfully to various problems in many fields. In this paper, we propose a neural network-based numerical method to solve partial differential equations. In this new framework, the method is designed on weak formulations: the unknown functions are approximated by deep neural networks, and the test functions can be chosen by different approaches, for instance, basis functions of finite element methods, neural networks, and so on. Because the spaces of trial functions and test functions are different, we name this new approach the Deep Petrov-Galerkin Method (DPGM). The resulting linear system is not necessarily symmetric or square, so the discretized problem is solved by a least-squares method. Taking the Poisson problem as an example, mixed DPGMs based on several mixed formulations are proposed and studied as well. In addition, we apply the DPGM to solve two classical time-dependent problems based on the space-time approach, that is, the unknown function is approximated by a neural network in which the temporal variable and spatial variables are treated equally, and the initial conditions are regarded as boundary conditions for the space-time domain. Finally, several numerical examples are presented to show the performance of the DPGMs, and we observe that this new method outperforms traditional numerical methods in several aspects.
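A hedged sketch of how a weak formulation leads to a rectangular least-squares problem, written for the Poisson problem. The symbols $\phi_j$, $v_i$, $c$, $A$ and $b$ are notation assumed here for illustration, and writing the network as a linear combination of fixed features $\phi_j$ is an assumption about how a linear system arises, not a statement of the paper's construction.

```latex
% Model problem: -\Delta u = f in \Omega, with u = g on \partial\Omega.
% Weak form, tested against functions v_i that vanish on \partial\Omega:
\int_\Omega \nabla u \cdot \nabla v_i \, dx = \int_\Omega f \, v_i \, dx,
\qquad i = 1, \dots, m.

% Assumed trial function: u_\theta(x) = \sum_{j=1}^{n} c_j \, \phi_j(x),
% which turns the m tested equations into a linear system A c = b with
A_{ij} = \int_\Omega \nabla \phi_j \cdot \nabla v_i \, dx,
\qquad
b_i = \int_\Omega f \, v_i \, dx.

% A is m x n and in general neither square nor symmetric (trial and test
% spaces differ), so c is taken as the least-squares solution:
c^\ast = \arg\min_{c \in \mathbb{R}^n} \; \lVert A c - b \rVert_2^2 .
```

In this sketch the boundary condition $u = g$ would have to be imposed separately, for example by appending collocation rows on $\partial\Omega$ to the same least-squares system; that choice is likewise an assumption.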

preprint · 2022 · arXiv

Diagnosis of ultrafast ultraintense laser pulse characteristics by machine-learning-assisted electron spin

Rapid development of ultrafast ultraintense laser technologies continues to create opportunities for studying strong-field physics under extreme conditions. However, accurate determination of the spatial and temporal characteristics of a laser pulse is still a great challenge, especially when laser powers higher than hundreds of terawatts are involved. In this paper, by utilizing the radiative spin-flip effect, we find that the spin depolarization of an electron beam can be employed to diagnose characteristics of ultrafast ultraintense lasers with peak intensities around $10^{20}$-$10^{22}$~W/cm$^2$. With three shots, our machine-learning-assisted model can predict, simultaneously, the pulse duration, peak intensity, and focal radius of a focused Gaussian ultrafast ultraintense laser (in principle, the profile can be arbitrary) with relative errors of $0.1\%$-$10\%$. The underlying physics and an alternative diagnosis method (without the assistance of machine learning) are revealed by the asymptotic approximation of the final spin degree of polarization. Our proposed scheme exhibits robustness and detection accuracy with respect to fluctuations in the electron beam parameters. Accurate measurements of the ultrafast ultraintense laser parameters will lead to much higher precision in, for example, laser nuclear physics investigations and laboratory astrophysics studies. Robust machine learning techniques may also find applications in more general strong-field physics scenarios.

preprint · 2022 · arXiv

DyRep: Bootstrapping Training with Dynamic Re-parameterization

Structural re-parameterization (Rep) methods achieve noticeable improvements on simple VGG-style networks. Despite their prevalence, current Rep methods simply re-parameterize all operations into an augmented network, including those that rarely contribute to the model's performance. As such, the price to pay is an expensive computational overhead for handling these unnecessary operations. To eliminate these drawbacks, we aim to bootstrap the training with minimal cost by devising a dynamic re-parameterization (DyRep) method, which encodes the Rep technique into the training process and dynamically evolves the network structures. Concretely, our proposal adaptively finds the operations which contribute most to the loss in the network, and applies Rep to enhance their representational capacity. Besides, to suppress the noisy and redundant operations introduced by Rep, we devise a de-parameterization technique for a more compact re-parameterization. In this regard, DyRep is more efficient than Rep since it smoothly evolves the given network instead of constructing an over-parameterized network. Experimental results demonstrate the effectiveness of our method, e.g., DyRep improves the accuracy of ResNet-18 by $2.04\%$ on ImageNet and reduces runtime by $22\%$ over the baseline. Code is available at: https://github.com/hunto/DyRep.
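For readers unfamiliar with structural re-parameterization, the sketch below shows the general idea it rests on: parallel linear branches used during training can be folded into a single equivalent operation afterwards. It uses plain matrices instead of convolutions and does not model DyRep's dynamic selection of which operations to expand; the helper names and weights are hypothetical.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def merge_branches(w_a, w_b):
    """Fold two parallel linear branches into one weight matrix.

    Since W_a x + W_b x = (W_a + W_b) x, the merged operation gives the
    same output at the cost of a single branch.
    """
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(w_a, w_b)]

w_main = [[1.0, 2.0], [0.0, 1.0]]    # original operation
w_extra = [[0.5, 0.0], [1.0, -1.0]]  # branch added for training
x = [3.0, 4.0]

# Training-time view: two branches evaluated and summed.
two_branch = [a + b for a, b in zip(matvec(w_main, x), matvec(w_extra, x))]
# Inference-time view: one merged operation.
merged = matvec(merge_branches(w_main, w_extra), x)
print(two_branch, merged)
```

The extra branches cost compute only while they exist, which is why choosing where to add them, and removing redundant ones, affects training cost.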

preprint · 2022 · arXiv

Explanation of electron and muon $g-2$ anomalies in AMSB

We propose to jointly explain the electron/muon $g-2$ anomalies in the framework of anomaly mediated SUSY breaking (AMSB) scenario. Two Yukawa deflected AMSB models are proposed and discussed in depth: one with lepton-specific interactions and the other one with messenger-matter interactions. Both models are found to be able to jointly explain the anomalies at $2 σ$ level by naturally realizing the preferred parameter space with $μM_1,μM_2<0$ and very heavy left-handed smuon.

preprint · 2022 · arXiv

Flavor Structures of Quarks and Leptons from Flipped SU(5) GUT with $A_4$ Modular Flavor Symmetry

We propose to generate the flavor structures of the Standard Model plus neutrinos from flipped SU(5) GUT with $A_4$ modular flavor symmetry. A possible way to assign different moduli values for quarks and leptons in the modular GUT scheme is discussed. We propose to reduce the multiple modular symmetries to a single modular symmetry in the low energy effective theory with proper boundary conditions. We classify all possible scenarios in this scheme according to the assignments of the modular $A_4$ representations for matter superfields and give the expressions of the quark and lepton mass matrices predicted by our scheme at the GUT scale. After properly selecting the modular weights for various superfields that can lead to better fitting, we obtain the best-fit points with the corresponding $χ^2$ values for the sample subscenarios. We find that the flavor structures of the Standard Model plus neutrinos can be fitted perfectly in such an $A_4$ modular flavor GUT scheme with one or two modulus fields. In particular, the $χ^2_{total}$ of our fitting can be as low as $1.558$ for sample ${\bf IX^\prime}$ of scenario ${\bf III}$ even if only a single common modulus field for both quark and lepton sectors is adopted. The most predictive scenario ${\bf III}$, in which all superfields transform as triplets of $A_4$, can be fitted much better with two independent moduli fields $τ_q,τ_l$ for the quark sector and lepton sector ($χ^2_{total}\approx 95$) than in the single modulus case ($χ^2_{total}\approx 282.4$).

preprint · 2022 · arXiv

Forecast-based Multi-aspect Framework for Multivariate Time-series Anomaly Detection

Today's cyber-world is vastly multivariate. Metrics collected in extreme variety demand multivariate algorithms to properly detect anomalies. However, forecast-based algorithms, as widely proven approaches, often perform sub-optimally or inconsistently across datasets. A key common issue is that they strive to be one-size-fits-all, but anomalies are distinctive in nature. We propose a method tailored to such distinctions: FMUAD, a Forecast-based, Multi-aspect, Unsupervised Anomaly Detection framework. FMUAD explicitly and separately captures the signature traits of anomaly types (spatial change, temporal change and correlation change) with independent modules. The modules then jointly learn an optimal feature representation, which is highly flexible and intuitive, unlike most other models in the category. Extensive experiments show our FMUAD framework consistently outperforms other state-of-the-art forecast-based anomaly detectors.

preprint · 2022 · arXiv

Geometric Matrix Completion via Sylvester Multi-Graph Neural Network

Despite the success of Sylvester equation based methods on various graph mining applications, such as semi-supervised label learning and network alignment, there also exist several limitations. The Sylvester equation's inability to model non-linear relations and its inflexibility in tuning towards different tasks restrict its performance. In this paper, we propose an end-to-end neural framework, SYMGNN, which consists of a multi-network neural aggregation module and a prior multi-network association incorporation learning module. The proposed framework inherits the key ideas of the Sylvester equation while generalizing it to overcome the aforementioned limitations. Empirical evaluations on real-world datasets show that the instantiations of SYMGNN overall outperform the baselines in the geometric matrix completion task, and that its low-rank instantiation can further reduce memory consumption by 16.98% on average.

preprint · 2022 · arXiv

GreedyNASv2: Greedier Search with a Greedy Path Filter

Training a good supernet in one-shot NAS methods is difficult since the search space is usually very large (e.g., $13^{21}$). In order to enhance the supernet's evaluation ability, one greedy strategy is to sample good paths and let the supernet lean towards the good ones, easing its evaluation burden as a result. However, in practice the search can still be quite inefficient, since the identification of good paths is not accurate enough and sampled paths still scatter around the whole search space. In this paper, we leverage an explicit path filter to capture the characteristics of paths and directly filter out the weak ones, so that the search can be carried out on the shrunk space more greedily and efficiently. Concretely, based on the fact that good paths are far fewer than weak ones in the space, we argue that the label of "weak paths" will be more confident and reliable than that of "good paths" in multi-path sampling. We therefore cast the training of the path filter in the positive and unlabeled (PU) learning paradigm, and also encourage a path embedding as a better path/operation representation to enhance the identification capacity of the learned filter. By means of this embedding, we can further shrink the search space by aggregating similar operations with similar embeddings, making the search more efficient and accurate. Extensive experiments validate the effectiveness of the proposed method GreedyNASv2. For example, our obtained GreedyNASv2-L achieves $81.1\%$ Top-1 accuracy on the ImageNet dataset, significantly outperforming the strong ResNet-50 baselines.

preprint · 2022 · arXiv

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors

Conventional knowledge distillation (KD) methods for object detection mainly concentrate on homogeneous teacher-student detectors. However, the design of a lightweight detector for deployment is often significantly different from that of a high-capacity detector. Thus, we investigate KD among heterogeneous teacher-student pairs for wider application. We observe that the core difficulty for heterogeneous KD (hetero-KD) is the significant semantic gap between the backbone features of heterogeneous detectors due to their different optimization manners. Conventional homogeneous KD (homo-KD) methods suffer from such a gap and struggle to directly obtain satisfactory performance for hetero-KD. In this paper, we propose the HEtero-Assists Distillation (HEAD) framework, leveraging heterogeneous detection heads as assistants to guide the optimization of the student detector to reduce this gap. In HEAD, the assistant is an additional detection head, with an architecture homogeneous to the teacher head, attached to the student backbone. Thus, a hetero-KD is transformed into a homo-KD, allowing efficient knowledge transfer from the teacher to the student. Moreover, we extend HEAD into a Teacher-Free HEAD (TF-HEAD) framework for when a well-trained teacher detector is unavailable. Our method achieves significant improvement compared to current detection KD methods. For example, on the MS-COCO dataset, TF-HEAD helps R18 RetinaNet achieve 33.9 mAP (+2.2), while HEAD further pushes the limit to 36.2 mAP (+4.5).

preprint · 2022 · arXiv

Learning Where to Learn in Cross-View Self-Supervised Learning

Self-supervised learning (SSL) has made enormous progress and largely narrowed the gap with the supervised ones, where the representation learning is mainly guided by a projection into an embedding space. During the projection, current methods simply adopt uniform aggregation of pixels for embedding; however, this risks involving object-irrelevant nuisances and spatial misalignment for different augmentations. In this paper, we present a new approach, Learning Where to Learn (LEWEL), to adaptively aggregate spatial information of features, so that the projected embeddings could be exactly aligned and thus guide the feature learning better. Concretely, we reinterpret the projection head in SSL as a per-pixel projection and predict a set of spatial alignment maps from the original features by this weight-sharing projection head. A spectrum of aligned embeddings is thus obtained by aggregating the features with spatial weighting according to these alignment maps. As a result of this adaptive alignment, we observe substantial improvements on both image-level prediction and dense prediction at the same time: LEWEL improves MoCov2 by 1.6%/1.3%/0.5%/0.4% points, improves BYOL by 1.3%/1.3%/0.7%/0.6% points, on ImageNet linear/semi-supervised classification, Pascal VOC semantic segmentation, and object detection, respectively.
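The difference between uniform aggregation and aggregation weighted by an alignment map can be sketched as follows. This is a toy illustration, not the LEWEL code: normalizing the map with a softmax over pixels is an assumption, and the function names and feature values are hypothetical.

```python
from math import exp

def softmax(scores):
    """Normalize scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_aggregate(features, alignment_logits):
    """Pool per-pixel features into one embedding using one alignment map.

    features:         list of N per-pixel feature vectors with C channels
    alignment_logits: N scores, one per pixel (assumed softmax-normalized)
    """
    weights = softmax(alignment_logits)
    channels = len(features[0])
    return [sum(w * f[c] for w, f in zip(weights, features))
            for c in range(channels)]

features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 pixels, 2 channels
# Equal logits reduce to plain average pooling.
uniform = weighted_aggregate(features, [0.0, 0.0, 0.0])
# A peaked map makes the embedding follow the third pixel.
focused = weighted_aggregate(features, [0.0, 0.0, 50.0])
print(uniform, focused)
```

Predicting several such maps yields several embeddings per image, each attending to a different region, rather than one embedding that mixes object and background pixels.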

preprint · 2022 · arXiv

LightViT: Towards Light-Weight Convolution-Free Vision Transformers

Vision transformers (ViTs) are usually considered to be less light-weight than convolutional neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to convolutions as a plug-and-play module and embed them in various ViT counterparts. In this paper, we argue that the convolutional kernels perform information aggregation to connect all tokens; however, they would be actually unnecessary for light-weight ViTs if this explicit aggregation could function in a more homogeneous way. Inspired by this, we present LightViT as a new family of light-weight ViTs to achieve better accuracy-efficiency balance upon the pure transformer blocks without convolution. Concretely, we introduce a global yet efficient aggregation scheme into both self-attention and feed-forward network (FFN) of ViTs, where additional learnable tokens are introduced to capture global dependencies; and bi-dimensional channel and spatial attentions are imposed over token embeddings. Experiments show that our model achieves significant improvements on image classification, object detection, and semantic segmentation tasks. For example, our LightViT-T achieves 78.7% accuracy on ImageNet with only 0.7G FLOPs, outperforming PVTv2-B0 by 8.2% while 11% faster on GPU. Code is available at https://github.com/hunto/LightViT.

preprint · 2022 · arXiv

Low energy supersymmetry confronted with current experiments: an overview

This study provides a brief overview of low-energy supersymmetry (SUSY) in light of current experimental constraints, such as collider searches, dark matter searches, and muon $g-2$ measurements. In addition, we survey a variety of low-energy supersymmetric models: the phenomenological minimal supersymmetric model (MSSM); the supersymmetric models with cut-off-scale boundary conditions, i.e., the minimal supergravity (mSUGRA) or the constrained MSSM (CMSSM), the gauge mediation of SUSY breaking (GMSB), and the anomaly mediation of SUSY breaking (AMSB), as well as their extensions. The conclusion is that low-energy SUSY can survive all current experimental constraints and remains compelling, albeit with a slight fine-tuning problem. The fancy models like mSUGRA, GMSB, and AMSB need to be extended if the muon $g-2$ anomaly comes from new physics.

preprint · 2022 · arXiv

Mask Wearing Status Estimation with Smartwatches

We present MaskReminder, an automatic mask-wearing status estimation system based on smartwatches, which reminds users who may be exposed to COVID-19 transmission scenarios to wear a mask. Using the powerful MLP-Mixer deep learning model, MaskReminder can effectively learn long- and short-range information from the inertial measurement unit readings, and can recognize mask-related hand movements such as wearing a mask, lowering the metal strap of the mask, removing the strap from behind one ear, etc. Extensive experiments on 20 volunteers and 8000+ data samples show that the average recognition accuracy is 89%. Moreover, MaskReminder is able to remind a user to wear a mask with a success rate of 90% even in the user-independent setting.

preprint2022arXiv

MogFace: Towards a Deeper Appreciation on Face Detection

Benefiting from the pioneering design of generic object detectors, significant achievements have been made in the field of face detection. Typically, the architectures of the backbone, feature pyramid layer, and detection head module within the face detector all borrow from the experience of general object detectors. However, several effective methods, including label assignment and scale-level data augmentation strategies, fail to maintain consistent superiority when applied directly to face detectors. Concretely, the former involves a vast number of hyper-parameters, and the latter suffers from scale distribution bias between different detection tasks; both limit their generalization abilities. Furthermore, in order to provide accurate face bounding boxes for facial downstream tasks, the face detector must eliminate false alarms. As a result, practical solutions for label assignment, scale-level data augmentation, and reducing false alarms are necessary for advancing face detectors. In this paper, we focus on resolving the three aforementioned challenges, which existing methods struggle to overcome, and present a novel face detector, termed MogFace. In MogFace, three key components, the Adaptive Online Incremental Anchor Mining Strategy, the Selective Scale Enhancement Strategy, and the Hierarchical Context-Aware Module, are separately proposed to boost the performance of face detectors. Finally, to the best of our knowledge, MogFace is the best face detector on the Wider Face leaderboard, ranking first across all testing scenarios. The code is available at https://github.com/damo-cv/MogFace.

preprint2022arXiv

Multimodal Machine Learning in Precision Health

As machine learning and artificial intelligence are more frequently being leveraged to tackle problems in the health sector, there has been increased interest in utilizing them in clinical decision support. This has historically been the case for single-modal data such as electronic health record data. Attempts to improve prediction and to mirror the multimodal nature of clinical expert decision-making have been met in the computational field of machine learning by a fusion of disparate data. This review was conducted to summarize this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) extension for Scoping Reviews to characterize multi-modal data fusion in health. We used a combination of content analysis and literature searches to establish search strings and searched the PubMed, Google Scholar, and IEEEXplore databases from 2011 to 2021. A final set of 125 articles was included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. However, there exists a wide breadth of current applications. The most common form of information fusion was early fusion. Notably, there was an improvement in predictive performance when performing heterogeneous data fusion. Lacking from the papers were clear clinical deployment strategies and the pursuit of FDA-approved tools. These findings provide a map of the current literature on multimodal data fusion as applied to health diagnosis/prognosis problems. Multi-modal machine learning, while more robust in its estimations than unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.

preprint2022arXiv

Neural Network Gaussian Processes by Increasing Depth

Recent years have witnessed an increasing interest in the correspondence between infinitely wide networks and Gaussian processes. Despite the effectiveness and elegance of the current neural network Gaussian process theory, to the best of our knowledge, all the neural network Gaussian processes are essentially induced by increasing width. However, in the era of deep learning, what concerns us more regarding a neural network is its depth as well as how depth impacts the behaviors of a network. Inspired by a width-depth symmetry consideration, we use a shortcut network to show that increasing the depth of a neural network can also give rise to a Gaussian process, which is a valuable addition to the existing theory and contributes to revealing the true picture of deep learning. Beyond the proposed Gaussian process by depth, we theoretically characterize its uniform tightness property and the smallest eigenvalue of the Gaussian process kernel. These characterizations can not only enhance our understanding of the proposed depth-induced Gaussian process but also pave the way for future applications. Lastly, we examine the performance of the proposed Gaussian process by regression experiments on two benchmark data sets.

preprint2022arXiv

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with a focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that improved efficiency, measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00dB on the DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks: the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all five metrics mentioned in the description of the challenge (runtime, parameter count, FLOPs, activations, and memory consumption) were considered. Similar to sub-track one, the rankings of the five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. These submissions gauge the state of the art in efficient single image super-resolution.
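
The rank-sum rule described for sub-track one can be illustrated with a small sketch. Everything below is hypothetical: the team names, the metric values, and the tie handling (tied values share the best rank) are assumptions made for illustration and are not taken from the challenge report.

```python
def rank_positions(values):
    """Return 1-based ranks where a lower value is better.

    Assumption: tied values share the best (smallest) rank.
    """
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_sum_order(teams, metrics):
    """Order teams by the sum of their per-metric ranks (lower is better)."""
    totals = {team: 0 for team in teams}
    for column in metrics.values():
        for team, rank in zip(teams, rank_positions(column)):
            totals[team] += rank
    return sorted(teams, key=lambda t: totals[t]), totals


# Hypothetical submissions: parameter count (thousands) and FLOPs (G).
teams = ["team_a", "team_b", "team_c"]
metrics = {
    "params_k": [250.0, 300.0, 280.0],
    "flops_g": [16.0, 14.0, 18.0],
}
order, totals = rank_sum_order(teams, metrics)
print(order, totals)
```

The same function would cover sub-track two by passing five metric columns instead of two.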

preprint2022arXiv

Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting

Multivariate Time Series (MTS) forecasting plays a vital role in a wide range of applications. Recently, Spatial-Temporal Graph Neural Networks (STGNNs) have become increasingly popular MTS forecasting methods. STGNNs jointly model the spatial and temporal patterns of MTS through graph neural networks and sequential models, significantly improving the prediction accuracy. But limited by model complexity, most STGNNs only consider short-term historical MTS data, such as data over the past one hour. However, the patterns of time series and the dependencies between them (i.e., the temporal and spatial patterns) need to be analyzed based on long-term historical MTS data. To address this issue, we propose a novel framework, in which STGNN is Enhanced by a scalable time series Pre-training model (STEP). Specifically, we design a pre-training model to efficiently learn temporal patterns from very long-term history time series (e.g., the past two weeks) and generate segment-level representations. These representations provide contextual information for short-term time series input to STGNNs and facilitate modeling dependencies between time series. Experiments on three public real-world datasets demonstrate that our framework is capable of significantly enhancing downstream STGNNs, and our pre-training model aptly captures temporal patterns.

preprint2022arXiv

Q-balls Formation and the Production of Gravitational Waves With Non-minimal Gravitational Coupling

We propose to introduce non-minimal couplings of the Affleck-Dine (AD) field to gravity by adding a coupling of the AD field to the Ricci scalar curvature. As Jordan frame supergravity always predicts a $|Φ|^2 {\cal R}/6$-type coupling for scalars with canonical kinetic terms, we propose a way to realize the required $c_0|Φ|^2 {\cal R}$-type couplings with generic $c_0$ for canonical complex scalar fields after SUSY breaking. The impacts of such non-minimal gravitational couplings for the AD field are shown, especially on Q-ball formation and the associated gravitational wave (GW) production. A new form of the scalar potential for the AD field in the Einstein frame is obtained. By numerical simulations, we find that, with non-minimal gravitational coupling to the AD field, Q-balls can successfully form even with the choice of a non-negative $K$ parameter for $ξ>0$. The associated GW production and its dependence on the $ξ$ parameter are also discussed.

preprint2022arXiv

Relational Surrogate Loss Learning

Evaluation metrics in machine learning can often hardly be used as loss functions, as they may be non-differentiable and non-decomposable, e.g., average precision and F1 score. This paper aims to address this problem by revisiting surrogate loss learning, where a deep neural network is employed to approximate the evaluation metrics. Instead of pursuing an exact recovery of the evaluation metric through a deep neural network, we recall the purpose of these evaluation metrics, which is to distinguish whether one model is better or worse than another. In this paper, we show that directly maintaining the relation of models between surrogate losses and metrics suffices, and propose a rank correlation-based optimization method to maximize this relation and learn surrogate losses. Compared to previous works, our method is much easier to optimize and enjoys significant efficiency and performance gains. Extensive experiments show that our method achieves improvements on various tasks including image classification and neural machine translation, and even outperforms state-of-the-art methods on human pose estimation and machine reading comprehension tasks. Code is available at: https://github.com/hunto/ReLoss.
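
To make the notion of a rank correlation between surrogate losses and metrics concrete, here is a minimal sketch that computes the plain Spearman coefficient. It is only an illustration of the quantity: the abstract does not specify the authors' formulation, this version is not differentiable, it assumes no tied values, and the loss and accuracy numbers are made up.

```python
def ranks(values):
    """Return 1-based ranks in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def spearman(x, y):
    """Spearman rank correlation for equal-length sequences without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical values for four models: a surrogate loss and an accuracy metric.
surrogate_loss = [0.9, 0.5, 0.7, 0.2]
accuracy = [0.60, 0.80, 0.70, 0.95]

# A lower loss should mean a higher accuracy, so compare loss with error rate.
error_rate = [1 - a for a in accuracy]
print(spearman(surrogate_loss, error_rate))
print(spearman(surrogate_loss, accuracy))
```

A coefficient close to 1 between loss and error rate means the surrogate orders the models the same way the metric does, which is the relation the abstract says is sufficient to preserve.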

preprint2022arXiv

Relativistic origin of Hertz-form and extended Hertz-form equations for Maxwell theory of electromagnetism

We show explicitly that the Hertz-form Maxwell's equations and their extensions can be obtained from the non-relativistic expansion of the Lorentz transformation of Maxwell's equations. The explicit expression for the parameter $α$ in the extended Hertz-form equations can be derived from such a non-relativistic expansion. The extended Hertz-form equations, which do not preserve Galilean invariance, originate from the Lorentz transformation of Maxwell's equations and differ from the Galilean-transformed Maxwell equations (the original Hertz equations) by the relative sign differences between the two $α$ terms etc. In particular, the $α$ parameter is of relativistic origin. The superluminal behavior illustrated by the D'Alembert equation from the extended Hertz-form equations should be removed by including all subleading contributions in the $v/c$ expansion, although such superluminal behavior will not occur in the vacuum because $α=0$. We should note that in the Hertz-form and extended Hertz-form equations, the electromagnetic fields should take the forms $ \vec{\mathcal{E}}(x)=\vec{E}(Λ^{-1}x)$ and $ \vec{\mathcal{B}}(x)=\vec{B}(Λ^{-1}x)$. Such a choice of description for the fields is different from the ordinary one with $\vec{E}(x)$ and $\vec{B}(x)$, which are well known to satisfy the ordinary Maxwell's equations. The descriptions of electromagnetic phenomena using the function set $\{\vec{\mathcal{E}}(x),\vec{\mathcal{B}}(x)\}$ and the function set $(\vec{E}(x),\vec{B}(x))$ are equivalent, with the $\{\vec{\mathcal{E}}(x),\vec{\mathcal{B}}(x)\}$ description satisfying the extended Hertz-form Maxwell's equations in the low speed approximation. The solutions of the (extended) Hertz-form Maxwell's equations describe the traveling wave form of the electromagnetic field.

preprint2022arXiv

Response to: Significance and stability of deep learning-based identification of subtypes within major psychiatric disorders. Molecular Psychiatry (2022)

Recently, Winter and Hahn [1] commented on our work on identifying subtypes of major psychiatric disorders (MPDs) based on neurobiological features using machine learning [2]. They questioned the generalizability of our methods and the statistical significance, stability, and overfitting of the results, and proposed a pipeline for disease subtyping. We appreciate their earnest consideration of our work; however, we need to point out their misconceptions of basic machine-learning concepts and delineate some key issues involved.

preprint2022arXiv

Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning

Controlled table-to-text generation seeks to generate natural language descriptions for highlighted subparts of a table. Previous SOTA systems still employ a sequence-to-sequence generation method, which merely captures the table as a linear structure and is brittle when table layouts change. We seek to go beyond this paradigm by (1) effectively expressing the relations of content pieces in the table, and (2) making our model robust to content-invariant structural transformations. Accordingly, we propose an equivariance learning framework, which encodes tables with a structure-aware self-attention mechanism. This prunes the full self-attention structure into an order-invariant graph attention that captures the connected graph structure of cells belonging to the same row or column, and it differentiates between relevant cells and irrelevant cells from the structural perspective. Our framework also modifies the positional encoding mechanism to preserve the relative position of tokens in the same cell but enforce position invariance among different cells. Our technology is free to be plugged into existing table-to-text generation models, and has improved T5-based models to offer better performance on ToTTo and HiTab. Moreover, on a harder version of ToTTo, we preserve promising performance, while previous SOTA systems, even with transformation-based data augmentation, have seen significant performance drops. Our code is available at https://github.com/luka-group/Lattice.

preprint2022arXiv

ScaleNet: Searching for the Model to Scale

Recently, the community has paid increasing attention to model scaling and contributed to developing model families with a wide spectrum of scales. Current methods either simply resort to one-shot NAS to construct a non-structural and non-scalable model family, or rely on a manual yet fixed scaling strategy to scale a base model that is not necessarily the best. In this paper, we bridge these two components and propose ScaleNet to jointly search the base model and the scaling strategy so that the scaled large model can have more promising performance. Concretely, we design a super-supernet to embody models with a different spectrum of sizes (e.g., FLOPs). Then, the scaling strategy can be learned interactively with the base model via a Markov chain-based evolution algorithm and generalized to develop even larger models. To obtain a decent super-supernet, we design a hierarchical sampling strategy to enhance its training sufficiency and alleviate the disturbance. Experimental results show that our scaled networks enjoy significantly better performance across various FLOPs, with at least a 2.53x reduction in search cost. Code is available at https://github.com/luminolx/ScaleNet.

preprint2022arXiv

Searching for Network Width with Bilaterally Coupled Network

Searching for a more compact network width has recently served as an effective way of channel pruning for the deployment of convolutional neural networks (CNNs) under hardware constraints. To carry out the search, a one-shot supernet is usually leveraged to efficiently evaluate the performance with respect to different network widths. However, current methods mainly follow a unilaterally augmented (UA) principle for the evaluation of each width, which induces training unfairness among channels in the supernet. In this paper, we introduce a new supernet called Bilaterally Coupled Network (BCNet) to address this issue. In BCNet, each channel is fairly trained and responsible for the same number of network widths, so each network width can be evaluated more accurately. Besides, we propose to reduce the redundant search space and present BCNetV2 as the enhanced supernet to ensure rigorous training fairness over channels. Furthermore, we leverage a stochastic complementary strategy for training the BCNet, and propose a prior initial population sampling method to boost the performance of the evolutionary search. We also propose the first open-source width benchmark on macro structures, named Channel-Bench-Macro, for better comparison of width search algorithms. Extensive experiments on the benchmark CIFAR-10 and ImageNet datasets indicate that our method can achieve state-of-the-art or competitive performance relative to other baseline methods. Moreover, our method further boosts the performance of NAS models by refining their network widths. For example, with the same FLOPs budget, our obtained EfficientNet-B0 achieves 77.53% Top-1 accuracy on the ImageNet dataset, surpassing the performance of the original setting by 0.65%.

preprint2022arXiv

SimMatch: Semi-supervised Learning with Similarity Matching

Learning with few labeled data has been a longstanding problem in the computer vision and machine learning research community. In this paper, we introduce a new semi-supervised learning framework, SimMatch, which simultaneously considers semantic similarity and instance similarity. In SimMatch, consistency regularization is applied at both the semantic level and the instance level. The different augmented views of the same instance are encouraged to have the same class prediction and a similar similarity relationship with respect to other instances. Next, we instantiate a labeled memory buffer to fully leverage the ground truth labels at the instance level and bridge the gaps between the semantic and instance similarities. Finally, we propose the unfolding and aggregation operations, which allow these two similarities to be isomorphically transformed into each other. In this way, the semantic and instance pseudo-labels can be mutually propagated to generate more high-quality and reliable matching targets. Extensive experimental results demonstrate that SimMatch improves the performance of semi-supervised learning tasks across different benchmark datasets and different settings. Notably, with 400 epochs of training, SimMatch achieves 67.2% and 74.4% Top-1 accuracy with 1% and 10% labeled examples on ImageNet, which significantly outperforms the baseline methods and is better than previous semi-supervised learning frameworks. Code and pre-trained models are available at https://github.com/KyleZheng1997/simmatch.

preprint2022arXiv

Social Distancing Alert with Smartwatches

Social distancing is an efficient public health practice during the COVID-19 pandemic. However, people may violate the social distancing practice unconsciously when they engage in social activities such as handshaking, hugging, kissing on the face or forehead, etc. In this paper, we present SoDA, a social distancing practice violation alert system based on smartwatches, for preventing COVID-19 virus transmission. SoDA utilizes recordings of accelerometers and gyroscopes to recognize activities that may violate social distancing practice with simple yet effective Vision Transformer models. Extensive experiments with 10 volunteers and 1800+ samples demonstrate that SoDA achieves social activity recognition with an accuracy of 94.7%, 1.8% negative alert, and 2.2% missing alert.

preprint2022arXiv

Spatial-Temporal Identity: A Simple yet Effective Baseline for Multivariate Time Series Forecasting

Multivariate Time Series (MTS) forecasting plays a vital role in a wide range of applications. Recently, Spatial-Temporal Graph Neural Networks (STGNNs) have become increasingly popular MTS forecasting methods due to their state-of-the-art performance. However, recent works are becoming more sophisticated with limited performance improvements. This phenomenon motivates us to explore the critical factors of MTS forecasting and design a model that is as powerful as STGNNs, but more concise and efficient. In this paper, we identify the indistinguishability of samples in both spatial and temporal dimensions as a key bottleneck, and propose a simple yet effective baseline for MTS forecasting by attaching Spatial and Temporal IDentity information (STID), which achieves the best performance and efficiency simultaneously based on simple Multi-Layer Perceptrons (MLPs). These results suggest that we can design efficient and effective models as long as they solve the indistinguishability of samples, without being limited to STGNNs.

preprint2022arXiv

Stretchable Cells Help DARTS Search Better

Differentiable neural architecture search (DARTS) has gained much success in discovering flexible and diverse cell types. To reduce the evaluation gap, the supernet is expected to have identical layers with the target network. However, even for this consistent search, the searched cells often suffer from poor performance, especially for the supernet with fewer layers, as current DARTS methods are prone to wide and shallow cells, and this topology collapse induces sub-optimal searched cells. In this paper, we alleviate this issue by endowing the cells with explicit stretchability, so the search can be directly implemented on our stretchable cells for both operation type and topology simultaneously. Concretely, we introduce a set of topological variables and a combinatorial probabilistic distribution to explicitly model the target topology. With more diverse and complex topologies, our method adapts well for various layer numbers. Extensive experiments on CIFAR-10 and ImageNet show that our stretchable cells obtain better performance with fewer layers and parameters. For example, our method can improve DARTS by 0.28% accuracy on CIFAR-10 dataset with 45% parameters reduced or 2.9% with similar FLOPs on ImageNet dataset.

preprint2022arXiv

The growth mechanism of boundary layers for the 2D Navier-Stokes equations

We give a detailed description of the formation of boundary layers in the inviscid limit problem. To be more specific, we prove that the magnitude of the vorticity near the boundary grows to the size of $1/\sqrt{ν}$ and the width of the layer spreads out to be proportional to $\sqrt{ν}$ in a finite time period. In fact, the growth time scaling is almost $ν$

preprint2022arXiv

Thermal conductivity reduction in (Zr$_{0.25}$Ta$_{0.25}$Nb$_{0.25}$Ti$_{0.25}$)C high entropy carbide from extrinsic lattice defects

High entropy carbide ceramics with randomly distributed multiple principal cations have shown high temperature stability, low thermal conductivity, and possible radiation tolerance. While chemical disorder has been shown to suppress thermal conductivity in these materials, little investigation has been made into the effects of additional, extrinsically generated structural defects on thermal transport. Here, (Zr$_{0.25}$Ta$_{0.25}$Nb$_{0.25}$Ti$_{0.25}$)C is exposed to Zr ions to generate a micron-scale, structural-defect-bearing layer. The reduction in lattice thermal transport is measured using laser thermoreflectance. Conductivity changes from different implantation temperatures suggest dislocation loops contribute little to phonon scattering while nanoscale defects serve as effective scatterers, offering a pathway for thermal engineering.

preprint2022arXiv

TR-MOT: Multi-Object Tracking by Reference

Multi-object Tracking (MOT) can generally be split into two sub-tasks, i.e., detection and association. Many previous methods follow the tracking-by-detection paradigm, which first obtains detections at each frame and then associates them between adjacent frames. Although these methods achieve impressive performance by utilizing a strong detector, their detection and association performance degrades in scenes with many occlusions and large motion if temporal information is not used. In this paper, we propose a novel Reference Search (RS) module to provide a more reliable association based on the deformable transformer structure, which is naturally suited to learning the feature alignment for each object across frames. RS takes previously detected results as references to aggregate the corresponding features from the combined features of the adjacent frames and makes a one-to-one track state prediction for each reference in parallel. Therefore, RS can attain a reliable association that copes with unexpected motions by leveraging visual temporal features, while maintaining strong detection performance by decoupling from the detector. Our RS module is also compatible with the structure of other tracking-by-detection frameworks. Furthermore, we propose a joint training strategy and an effective matching pipeline for our online MOT framework with the RS module. Our method achieves competitive results on the MOT17 and MOT20 datasets.

preprint2022arXiv

Traveling Wave Form Description for Dirac Field and Its Deduction To Pauli Equation Type Forms in Quantum Mechanics

We derive an equivalent traveling wave form description for the Dirac field. In the non-relativistic limit, such a form can reduce to an inverse-Galilean transformed Schrodinger-type equation. We find that the resulting two-component Schrodinger-type equation from the reduction of the traveling wave form description of the Dirac field is different from the naive Galilean transformed Schrodinger equation. Taking into account the interactions of the system with the electromagnetic field by adding proper forms of covariant derivative, the traveling wave form description for the Pauli equation can be similarly obtained in the non-relativistic limit. Such descriptions allow one to choose an arbitrary convenient reference frame for quantum systems involving spins. Using the Bargmann-Wigner formalism for a field with arbitrary spin $s\geq 1/2$, which satisfies Dirac-type equations in all its indices, the traveling wave description for such a field can be similarly obtained from the traveling wave form description of the Dirac field, for example, for the spin-3/2 Rarita-Schwinger field and the spin-2 gravitational field.

preprint2022arXiv

Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets

Learning against label noise is a vital topic to guarantee a reliable performance for deep neural networks. Recent research usually relies on dynamic noise modeling with model output probabilities and loss values, and then separates clean and noisy samples. These methods have gained notable success. However, unlike cherry-picked data, existing approaches often cannot perform well when facing imbalanced datasets, a common scenario in the real world. We thoroughly investigate this phenomenon and point out two major issues that hinder the performance, i.e., inter-class loss distribution discrepancy and misleading predictions due to uncertainty. The first issue is that existing methods often perform class-agnostic noise modeling. However, loss distributions show a significant discrepancy among classes under class imbalance, and class-agnostic noise modeling can easily confuse noisy samples with samples in minority classes. The second issue is that models may output misleading predictions due to epistemic uncertainty and aleatoric uncertainty, so existing methods that rely solely on the output probabilities may fail to distinguish confident samples. Inspired by our observations, we propose an Uncertainty-aware Label Correction framework (ULC) to handle label noise on imbalanced datasets. First, we perform epistemic uncertainty-aware class-specific noise modeling to identify trustworthy clean samples and refine/discard highly confident true/corrupted labels. Then, we introduce aleatoric uncertainty in the subsequent learning process to prevent noise accumulation in the label noise modeling process. We conduct experiments on several synthetic and real-world datasets. The results demonstrate the effectiveness of the proposed method, especially on imbalanced datasets.

preprint2022arXiv

Uniqueness of Low Rank Matrix Completion and Schur Complement

In this paper we study the low rank matrix completion problem using tools from the Schur complement. We give a necessary and sufficient condition for the completed matrix to be globally unique given the data. We assume the observed entries of the matrix follow a special "staircase" structure. Under this assumption, the matrix completion problem is either globally unique or has infinitely many solutions (thus excluding local uniqueness). In fact, the uniqueness of the matrix completion problem depends entirely on the rank of the submatrices at the corners of the "staircase". The proofs of the theorems make extensive use of the Schur complement.

preprint2022arXiv

Unveiling non-gray surface of cloudy exoplanets: the influence of wavelength-dependent surface albedo and cloud scattering properties on retrieval solutions

Direct-imaging spectra hold rich information about a planet's atmosphere and surface, and several space-based missions aiming at such observations will become a reality in the near future. Previous spectral retrieval works have resulted in key atmospheric constraints under the assumption of a gray surface, but the effect of wavelength-dependent surface albedo on retrieval has not been shown. We explore the influence of the coupling effect of cloud and wavelength-dependent surface albedo on retrieval performance via modeling suites of Earth-like atmospheres with varying cloud and surface albedo parameterizations. Under the assumption of known cloud scattering properties, the surface spectral albedos can be reasonably recovered when the surface cover represents that of Earth-like vegetation or ocean, which may aid in characterizing the planet's habitability. When the cloud scattering properties cannot be assumed, we show that the degeneracy between the cloud properties and wavelength-dependent surface albedo leads to biased results of atmospheric and cloud properties. The multi-epoch visible band observations offer limited improvement in disentangling this degeneracy. However, the constraints on atmospheric properties from the combination of UV band (R $\sim 6$) $+$ visible band (R $\sim 140$) are consistent with input values to within 1 $σ$. If short bandpass data is not available, an alternative solution to reduce the retrieval uncertainties would be to have the prior constraints on planetary cloud fraction with less than 20% uncertainty.

preprint2022arXiv

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval

This article aims to provide the information retrieval community with some reflections on recent advances in retrieval learning by analyzing the reproducibility of image-text retrieval models. Due to the increase of multimodal data over the last decade, image-text retrieval has steadily become a major research direction in the field of information retrieval. Numerous researchers train and evaluate image-text retrieval algorithms using benchmark datasets such as MS-COCO and Flickr30k. Research in the past has mostly focused on performance, with multiple state-of-the-art methodologies being suggested in a variety of ways. According to their assertions, these techniques provide improved modality interactions and hence more precise multimodal representations. In contrast to previous works, we focus on the reproducibility of the approaches and the examination of the elements that lead to improved performance by pretrained and nonpretrained models in retrieving images and text. To be more specific, we first examine the related reproducibility concerns and explain why our focus is on image-text retrieval tasks. Second, we systematically summarize the current paradigm of image-text retrieval models and the stated contributions of those approaches. Third, we analyze various aspects of the reproduction of pretrained and nonpretrained retrieval models. To complete this, we conducted ablation experiments and obtained some influencing factors that affect retrieval recall more than the improvement claimed in the original paper. Finally, we present some reflections and challenges that the retrieval community should consider in the future. Our source code is publicly available at https://github.com/WangFei-2019/Image-text-Retrieval.

preprint2021arXiv

Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients

Machine Learning (ML) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing ML models for the coronavirus-disease 2019 (COVID-19) pandemic where data is highly imbalanced, particularly within electronic health records (EHR) research. Conventional approaches in ML use cross-entropy loss (CEL) that often suffers from poor margin classification. For the first time, we show that contrastive loss (CL) improves on the performance of CEL, especially for imbalanced EHR data and the related COVID-19 analyses. This study has been approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai. We use EHR data from five hospitals within the Mount Sinai Health System (MSHS) to predict mortality, intubation, and intensive care unit (ICU) transfer in hospitalized COVID-19 patients over 24 and 48 hour time windows. We train two sequential architectures (RNN and RETAIN) using two loss functions (CEL and CL). Models are tested on the full sample data set, which contains all available data, and on a restricted data set that emulates higher class imbalance. CL models consistently outperform CEL models with the restricted data set on these tasks with differences ranging from 0.04 to 0.15 for AUPRC and 0.05 to 0.1 for AUROC. For the restricted sample, only the CL model maintains proper clustering and is able to identify important features, such as pulse oximetry. CL outperforms CEL in instances of severe class imbalance, on three EHR outcomes with respect to three performance metrics: predictive power, clustering, and feature importance. We believe that the developed CL framework can be expanded and used for EHR ML work in general.

preprint2021arXiv

Gluino-SUGRA scenarios in light of FNAL muon g-2 anomaly

Gluino-SUGRA ($\tilde{g}$SUGRA), which is an economical extension of the predictive mSUGRA, adopts a much heavier gluino mass parameter than the other gaugino mass parameters and the universal scalar mass parameter at the unification scale. It can elegantly reconcile the experimental results on the Higgs boson mass, the muon $g-2$, the null results in searches for supersymmetry at the LHC and the results from B-physics. In this work, we propose several new ways to generate a large gaugino hierarchy (i.e. $M_3\gg M_1,M_2$) for $\tilde{g}$SUGRA model building and then discuss in detail the implications of the new muon $g-2$ results with the updated LHC constraints on such $\tilde{g}$SUGRA scenarios. We obtain the following observations: (i) For the most interesting $M_1=M_2$ case at the GUT scale with a viable bino-like dark matter, the $\tilde{g}$SUGRA can explain the muon $g-2$ anomaly at $1σ$ level and be consistent with the updated LHC constraints for $6\leq M_3/M_1 \leq 9$ at the GUT scale; (ii) For $M_1:M_2=5:1$ at the GUT scale with wino-like dark matter, the $\tilde{g}$SUGRA model can explain the muon $g-2$ anomaly at $2σ$ level and be consistent with the updated LHC constraints for $3\leq M_3/M_1 \leq 3.2$ at the GUT scale; (iii) For $M_1:M_2=3:2$ at the GUT scale with mixed bino-wino dark matter, the $\tilde{g}$SUGRA model can explain the muon $g-2$ anomaly at $1σ$ level and be consistent with the updated LHC constraints for $6.9\leq M_3/M_1 \leq 7.5$ at the GUT scale. Although the choice of a heavy gluino will always increase the fine-tuning (FT) involved, some of the $1σ/2σ$ surviving points of $Δa_μ^{combine}$ can still allow low electroweak fine-tuning (EWFT) of order several hundreds and be fairly natural. Constraints from (dimension-five operator induced) proton decay are also discussed.

preprint2021arXiv

Locally Free Weight Sharing for Network Width Search

Searching for network width is an effective way to slim deep neural networks with hardware budgets. With this aim, a one-shot supernet is usually leveraged as a performance evaluator to rank the performance w.r.t. different widths. Nevertheless, current methods mainly follow a manually fixed weight sharing pattern, which limits their ability to distinguish the performance gap between different widths. In this paper, to better evaluate each width, we propose a locally free weight sharing strategy (CafeNet). In CafeNet, weights are more freely shared, and each width is jointly indicated by its base channels and free channels, where free channels are supposed to loCAte FrEely in a local zone to better represent each width. Besides, we propose to further reduce the search space by leveraging our introduced FLOPs-sensitive bins. As a result, our CafeNet can be trained stochastically and get optimized within a min-min strategy. Extensive experiments on the ImageNet, CIFAR-10, CelebA and MS COCO datasets have verified our superiority compared to other state-of-the-art baselines. For example, our method can further boost the benchmark NAS network EfficientNet-B0 by 0.41% via searching its width more delicately.

preprint2021arXiv

On the Euler+Prandtl expansion for the Navier-Stokes equations

We establish the validity of the Euler$+$Prandtl approximation for solutions of the Navier-Stokes equations in the half plane with Dirichlet boundary conditions, in the vanishing viscosity limit, for initial data which are analytic only near the boundary, and Sobolev smooth away from the boundary. Our proof does not require higher order correctors, and works directly by estimating an $L^{1}$-type norm for the vorticity of the error term in the expansion Navier-Stokes$-($Euler$+$Prandtl$)$. An important ingredient in the proof is the propagation of local analyticity for the Euler equation, a result of independent interest.

preprint2021arXiv

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

Most differentiable neural architecture search methods construct a super-net for search and derive a target-net as its sub-graph for evaluation. There exists a significant gap between the architectures in search and evaluation. As a result, current methods suffer from an inconsistent, inefficient, and inflexible search process. In this paper, we introduce EnTranNAS that is composed of Engine-cells and Transit-cells. The Engine-cell is differentiable for architecture search, while the Transit-cell only transits a sub-graph by architecture derivation. Consequently, the gap between the architectures in search and evaluation is significantly reduced. Our method also spares much memory and computation cost, which speeds up the search process. A feature sharing strategy is introduced for more balanced optimization and more efficient search. Furthermore, we develop an architecture derivation method to replace the traditional one that is based on a hand-crafted rule. Our method enables differentiable sparsification, and keeps the derived architecture equivalent to that of Engine-cell, which further improves the consistency between search and evaluation. Besides, it supports the search for topology where a node can be connected to prior nodes with any number of connections, so that the searched architectures could be more flexible. For experiments on CIFAR-10, our search on the standard space requires only 0.06 GPU-day. We further have an error rate of 2.22% with 0.07 GPU-day for the search on an extended space. We can also directly perform the search on ImageNet with topology learnable and achieve a top-1 error rate of 23.8% in 2.1 GPU-day.

preprint2020arXiv

A Coarse-to-Fine Adaptive Network for Appearance-Based Gaze Estimation

Human gaze is essential for various appealing applications. Aiming at more accurate gaze estimation, a series of recent works propose to utilize face and eye images simultaneously. Nevertheless, face and eye images only serve as independent or parallel feature sources in those works; the intrinsic correlation between their features is overlooked. In this paper we make the following contributions: 1) We propose a coarse-to-fine strategy which estimates a basic gaze direction from the face image and refines it with the corresponding residual predicted from eye images. 2) Guided by the proposed strategy, we design a framework which introduces a bi-gram model to bridge gaze residual and basic gaze direction, and an attention component to adaptively acquire suitable fine-grained features. 3) Integrating the above innovations, we construct a coarse-to-fine adaptive network named CA-Net and achieve state-of-the-art performance on MPIIGaze and EyeDiap.

preprint2020arXiv

A Federated Multi-View Deep Learning Framework for Privacy-Preserving Recommendations

Privacy-preserving recommendations have recently been gaining momentum, since decentralized user data is increasingly hard for recommendation service providers to collect, due to serious concerns over user privacy and data security. This situation is further exacerbated by strict government regulations such as Europe's General Data Protection Regulation (GDPR). Federated Learning (FL) is a newly developed privacy-preserving machine learning paradigm to bridge data repositories without compromising data security and privacy. Thus many federated recommendation (FedRec) algorithms have been proposed to realize personalized privacy-preserving recommendations. However, existing FedRec algorithms, mostly extended from traditional collaborative filtering (CF) methods, cannot address the cold-start problem well. In addition, their performance overhead w.r.t. model accuracy, trained in a federated setting, is often non-negligible compared to centralized recommendations. This paper studies this issue and presents FL-MV-DSSM, a generic content-based federated multi-view recommendation framework that not only addresses the cold-start problem, but also significantly boosts the recommendation performance by learning a federated model from multiple data sources for capturing richer user-level features. The new federated multi-view setting, proposed by FL-MV-DSSM, opens new usage models and brings in new security challenges to FL in recommendation scenarios. We prove the security guarantees of FL-MV-DSSM, and empirical evaluations on FL-MV-DSSM and its variations with public datasets demonstrate its effectiveness. Our codes will be released if this paper is accepted.

preprint2020arXiv

A Framework for Behavior Privacy Preserving in Radio Frequency Signal

Recent years have witnessed the booming development of human-centered wireless sensing applications, in which some human information, such as the user's identity and motions, can be retrieved by analyzing the signal distortion caused by the target person. However, the openness of wireless transmission raises increasing concerns about user privacy, since either the human identity or human motion is sensitive in certain scenarios, including personal residence, laboratory, and office. Researchers have reported that commodity WiFi signals can be abused to identify users. To dispel this threat, in this paper we propose a privacy-preserving framework to effectively hide the information of user behaviors in wireless signals while retaining the ability of user authentication. The core of our framework is a novel Siamese network-based deep model, namely RFBP-Net. In this way, wireless sensing reveals user information moderately. We conduct extensive experiments on both real WiFi and RFID systems and open datasets. The experiment results show that RFBP-Net is able to significantly reduce the activity recognition accuracy, i.e., 70% reduction in the RFID system and 80% reduction in the WiFi system, with a slight penalty in the user authentication accuracy, i.e., only 5% and 1% decrease in the RFID and WiFi system, respectively.

preprint2020arXiv

A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension

Referring expression comprehension aims to localize the object instance described by a natural language expression. Current referring expression methods have achieved good performance. However, none of them is able to achieve real-time inference without an accuracy drop. The reason for the relatively slow inference speed is that these methods artificially split referring expression comprehension into two sequential stages: proposal generation and proposal ranking. This does not exactly conform to the habit of human cognition. To this end, we propose a novel Realtime Cross-modality Correlation Filtering method (RCCF). RCCF reformulates referring expression comprehension as a correlation filtering process. The expression is first mapped from the language domain to the visual domain and then treated as a template (kernel) to perform correlation filtering on the image feature map. The peak value in the correlation heatmap indicates the center point of the target box. In addition, RCCF also regresses a 2-D object size and 2-D offset. The center point coordinates, object size and center point offset together form the target bounding box. Our method runs at 40 FPS while achieving leading performance in RefClef, RefCOCO, RefCOCO+ and RefCOCOg benchmarks. In the challenging RefClef dataset, our method almost doubles the state-of-the-art performance (34.70% increased to 63.79%). We hope this work can draw more attention and studies to the new cross-modality correlation filtering framework as well as the one-stage framework for referring expression comprehension.
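
The decoding step described in this abstract (correlate a language-derived kernel with the image feature map, take the heatmap peak as the box center, then apply the regressed size and offset) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the per-cell feature layout, the 1x1 kernel, and the `stride` parameter are all assumptions made for illustration.

```python
def correlation_heatmap(features, kernel):
    """Dot product of the kernel with the feature vector at every cell.

    features: H x W grid of C-dimensional lists (assumed layout).
    kernel:   C-dimensional list standing in for the mapped expression.
    """
    return [[sum(f * k for f, k in zip(cell, kernel)) for cell in row]
            for row in features]


def peak(heatmap):
    """Return (row, col) of the largest heatmap value."""
    _, y, x = max((v, y, x)
                  for y, row in enumerate(heatmap)
                  for x, v in enumerate(row))
    return y, x


def decode_box(heatmap, sizes, offsets, stride=1):
    """Combine peak location, regressed offset and size into a box.

    sizes[y][x]   = (width, height) predicted at that cell (hypothetical).
    offsets[y][x] = (dx, dy) sub-cell correction (hypothetical).
    Returns (x_min, y_min, x_max, y_max) in input-image units.
    """
    y, x = peak(heatmap)
    w, h = sizes[y][x]
    dx, dy = offsets[y][x]
    cx, cy = (x + dx) * stride, (y + dy) * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Tiny 2 x 2 example with 2-channel features.
features = [[[1, 0], [0, 1]],
            [[0, 0], [1, 1]]]
heatmap = correlation_heatmap(features, [1, 1])
sizes = [[(2, 2), (2, 2)], [(2, 2), (2, 2)]]
offsets = [[(0.5, 0.5), (0.5, 0.5)], [(0.5, 0.5), (0.5, 0.5)]]
box = decode_box(heatmap, sizes, offsets)
```

Because the box is read off a single heatmap peak, no proposal generation or ranking stage is needed, which is the one-stage property the abstract emphasizes.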

preprint2020arXiv

A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

As deep learning models are usually massive and complex, distributed learning is essential for increasing training efficiency. Moreover, in many real-world application scenarios like healthcare, distributed learning can also keep the data local and protect privacy. A popular distributed learning strategy is federated learning, where there is a central server storing the global model and a set of local computing nodes updating the model parameters with their corresponding data. The updated model parameters will be processed and transmitted to the central server, which leads to heavy communication costs. Recently, asynchronous decentralized distributed learning has been proposed and demonstrated to be a more efficient and practical strategy where there is no central server, so that each computing node only communicates with its neighbors. Although no raw data will be transmitted across different local nodes, there is still a risk of information leakage during the communication process that malicious participants can exploit. In this paper, we present a differentially private version of the asynchronous decentralized parallel SGD (ADPSGD) framework, or A(DP)$^2$SGD for short, which maintains the communication efficiency of ADPSGD and prevents inference by malicious participants. Specifically, Rényi differential privacy is used to provide tighter privacy analysis for our composite Gaussian mechanisms while the convergence rate is consistent with the non-private version. Theoretical analysis shows A(DP)$^2$SGD also converges at the same optimal $\mathcal{O}(1/\sqrt{T})$ rate as SGD. Empirically, A(DP)$^2$SGD achieves model accuracy comparable to the differentially private version of Synchronous SGD (SSGD) but runs much faster than SSGD in heterogeneous computing environments.
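
As a rough illustration of the ingredients named in this abstract (neighbor-only communication plus a Gaussian mechanism on the gradient), the following pure-Python sketch shows one update on one node. It is a generic clip-and-noise step combined with pairwise averaging, not the paper's algorithm or its privacy accounting; every name and parameter here (`clip`, `dp_gossip_step`, `max_norm`, `sigma`) is a hypothetical choice for illustration.

```python
import math
import random


def clip(grad, max_norm):
    """Scale the gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_gossip_step(w_self, w_neighbor, grad, lr, max_norm, sigma, rng):
    """One assumed update: average with a neighbor, then take a noisy step.

    Gaussian noise with standard deviation sigma * max_norm is added to
    the clipped gradient (the usual Gaussian-mechanism calibration).
    """
    averaged = [(a + b) / 2 for a, b in zip(w_self, w_neighbor)]
    noisy = [g + rng.gauss(0.0, sigma * max_norm)
             for g in clip(grad, max_norm)]
    return [w - lr * g for w, g in zip(averaged, noisy)]


# With sigma = 0 the step is deterministic, which makes it easy to check.
rng = random.Random(0)
updated = dp_gossip_step([0.0, 0.0], [2.0, 2.0], [3.0, 4.0],
                         lr=0.5, max_norm=1.0, sigma=0.0, rng=rng)
```

In practice `sigma` would be chosen from a target privacy budget; that calibration is outside the scope of this sketch.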

preprint2020arXiv

Adversarial Infidelity Learning for Model Interpretation

Model interpretation is essential in data mining and knowledge discovery. It can help understand the intrinsic model working mechanism and check if the model has undesired characteristics. A popular way of performing model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score of each feature representing the data samples to explain how the model generates the specific output. In this paper, we propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation, mitigating concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission. Also, we focus on the following setting: using selected features to directly predict the output of the given model, which serves as a primary evaluation metric for model-interpretation methods. Apart from the features, we involve the output of the given model as an additional input to learn an explainer based on more accurate information. To learn the explainer, besides fidelity, we propose an Adversarial Infidelity Learning (AIL) mechanism to boost the explanation learning by screening relatively unimportant features. Through theoretical and experimental analysis, we show that our AIL mechanism can help learn the desired conditional distribution between selected features and targets. Moreover, we extend our framework by integrating efficient interpretation methods as proper priors to provide a warm start. Comprehensive empirical evaluation results are provided by quantitative metrics and human evaluation to demonstrate the effectiveness and superiority of our proposed method. Our code is publicly available online at https://github.com/langlrsw/MEED.

preprint2020arXiv

CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection

Keypoint-based detectors have achieved good performance. However, incorrect keypoint matching is still widespread and greatly affects the performance of the detector. In this paper, we propose CentripetalNet which uses centripetal shift to pair corner keypoints from the same instance. CentripetalNet predicts the position and the centripetal shift of the corner points and matches corners whose shifted results are aligned. Combining position information, our approach matches corner points more accurately than the conventional embedding approaches do. Corner pooling extracts information inside the bounding boxes onto the border. To make this information more aware at the corners, we design a cross-star deformable convolution network to conduct feature adaption. Furthermore, we explore instance segmentation on anchor-free detectors by equipping our CentripetalNet with a mask prediction module. On MS-COCO test-dev, our CentripetalNet not only outperforms all existing anchor-free detectors with an AP of 48.0% but also achieves comparable performance to the state-of-the-art instance segmentation approaches with a 40.2% MaskAP. Code will be available at https://github.com/KiveeDong/CentripetalNet.

preprint2020arXiv

Coreference Resolution as Query-based Span Prediction

In this paper, we present an accurate and extensible approach for the coreference resolution task. We formulate the problem as a span prediction task, like in machine reading comprehension (MRC): A query is generated for each candidate mention using its surrounding context, and a span prediction module is employed to extract the text spans of the coreferences within the document using the generated query. This formulation comes with the following key advantages: (1) The span prediction strategy provides the flexibility of retrieving mentions left out at the mention proposal stage; (2) In the MRC framework, encoding the mention and its context explicitly in a query makes it possible to have a deep and thorough examination of cues embedded in the context of coreferent mentions; and (3) A plethora of existing MRC datasets can be used for data augmentation to improve the model's generalization capability. Experiments demonstrate significant performance boost over previous models, with 87.5 (+2.5) F1 score on the GAP benchmark and 83.1 (+3.5) F1 score on the CoNLL-2012 benchmark.

preprint2020arXiv

Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review

Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs). This is generally performed using advanced deep learning methods. This study presents a systematic review of this field and provides both qualitative and quantitative analyses from a methodological perspective. We identified studies developing patient representations from EHRs with deep learning methods from MEDLINE, EMBASE, Scopus, the Association for Computing Machinery (ACM) Digital Library, and Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library. After screening 363 articles, 49 papers were included for a comprehensive data collection. We noticed a typical workflow starting with feeding raw data, applying deep learning models, and ending with clinical outcome predictions as evaluations of the learned representations. Specifically, learning representations from structured EHR data was dominant (37 out of 49 studies). Recurrent Neural Networks were widely applied as the deep learning architecture (LSTM: 13 studies, GRU: 11 studies). Disease prediction was the most common application and evaluation (31 studies). Benchmark datasets were mostly unavailable (28 studies) due to privacy concerns of EHR data, and code availability was assured in 20 studies. We show the importance and feasibility of learning comprehensive representations of patient EHR data through a systematic review. Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses. Future work will still be devoted to leveraging the richness and potential of available EHR data. Knowledge distillation and advanced learning techniques will be exploited to assist the capability of learning patient representation further.

preprint2020arXiv

Dynamic Knowledge Distillation for Black-box Hypothesis Transfer Learning

In real world applications like healthcare, it is usually difficult to build a machine learning prediction model that works universally well across different institutions. At the same time, the available model is often proprietary, i.e., neither the model parameters nor the data set used for model training is accessible. In consequence, leveraging the knowledge hidden in the available model (a.k.a. the hypothesis) and adapting it to a local data set becomes extremely challenging. Motivated by this situation, in this paper we aim to address such a specific case within the hypothesis transfer learning framework, in which 1) the source hypothesis is a black-box model and 2) the source domain data is unavailable. In particular, we introduce a novel algorithm called dynamic knowledge distillation for hypothesis transfer learning (dkdHTL). In this method, we use knowledge distillation with an instance-wise weighting mechanism to adaptively transfer the "dark" knowledge from the source hypothesis to the target domain. The weighting coefficients of the distillation loss and the standard loss are determined by the consistency between the predicted probability of the source hypothesis and the target ground-truth label. Empirical results on both transfer learning benchmark datasets and a healthcare dataset demonstrate the effectiveness of our method.
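
One way to picture the instance-wise weighting described in this abstract is the sketch below. It assumes, purely for illustration, that the "consistency" of the source hypothesis on an example is the probability it assigns to the true label, and that this value weights a KL distillation term against a cross-entropy term. The abstract does not specify the exact formula, so the weighting rule and all function names are hypothetical.

```python
import math


def cross_entropy(p_student, label):
    """Standard loss: negative log-probability of the true label."""
    return -math.log(p_student[label])


def kl_divergence(p_teacher, p_student):
    """Distillation loss: KL(teacher || student); zero teacher entries are skipped."""
    return sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)


def dynamic_kd_loss(p_student, p_teacher, label):
    """Per-instance loss with an assumed consistency weight.

    w is the black-box source model's probability for the true label:
    a consistent source (w near 1) is trusted and distilled from, an
    inconsistent one (w near 0) is ignored in favor of the label.
    Student probabilities are assumed to be strictly positive.
    """
    w = p_teacher[label]
    return (w * kl_divergence(p_teacher, p_student)
            + (1 - w) * cross_entropy(p_student, label))


# The source model is confidently wrong here, so only the label term counts.
loss_wrong_teacher = dynamic_kd_loss([0.5, 0.5], [0.0, 1.0], label=0)
# The source model agrees with the student, so the distillation term vanishes.
loss_same = dynamic_kd_loss([0.5, 0.5], [0.5, 0.5], label=0)
```

Only the source model's output probabilities are used, which matches the black-box setting in which its parameters and training data are unavailable.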

preprint2020arXiv

Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems

Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be tackled by solving a linear least-square problem, which can be done by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. While theoretically doable, this introduces numerical instability in the optimization process in practice. In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate that our approach is much more robust than explicit differentiation of the eigendecomposition using two general tasks, outlier rejection and denoising, with several practical examples including wide-baseline stereo, the perspective-n-point problem, and ellipse fitting. Empirically, our method has better convergence properties and yields state-of-the-art results.

preprint2020arXiv

Explaining The XENON1T Excess With Light Goldstini Dark Matter

In scenarios with multiple sectors that independently break supersymmetry, a multiplicity of goldstini is predicted. We propose a new interpretation of the electron recoil excess at 2-7 keV observed in the XENON1T experiment with very long-lived goldstini dark matter (DM) elastically scattering off the electrons. The goldstini DM can be boosted by the late decay of the other nearly degenerate (long-lived) goldstini DM, with their tiny mass difference being converted into kinetic energy of the lighter goldstini DM and neutrinos. We show that viable parameter space can be found which can explain the excess of electron recoil events around 2-3 keV recently reported by the XENON1T experiment.

preprint2020arXiv

Explicit-Blurred Memory Network for Analyzing Patient Electronic Health Records

In recent years, we have witnessed an increased interest in temporal modeling of patient records from large scale Electronic Health Records (EHR). While simpler RNN models have been used for such problems, memory networks, which in other domains were found to generalize well, are underutilized. Traditional memory networks involve diffused and non-linear operations where the influence of past events on outputs is not readily quantifiable. We posit that this lack of interpretability makes such networks not applicable for EHR analysis. While networks with explicit memory have been proposed recently, the discontinuities imposed by the discrete operations make such networks harder to train and require more supervision. The problem is further exacerbated in the limited data setting of EHR studies. In this paper, we propose a novel memory architecture that is more interpretable than traditional memory networks while being easier to train than explicit memory banks. Inspired by well-known models of human cognition, we propose partitioning the external memory space into (a) a primary explicit memory block to store exact replicas of recent events to support interpretations, followed by (b) a secondary blurred memory block that accumulates salient aspects of past events dropped from the explicit block as higher level abstractions and allows training with less supervision by stabilizing the gradients. We apply the model to 3 learning problems on ICU records from the MIMIC III database spanning millions of data points. Our model performs comparably to the state of the art while also, crucially, enabling ready interpretation of the results.

preprint2020arXiv

Federated Learning for Healthcare Informatics

With the rapid development of computer software and hardware technologies, more and more healthcare data are becoming readily available from clinical institutions, patients, insurance companies and pharmaceutical industries, among others. This access provides an unprecedented opportunity for data science technologies to derive data-driven insights and improve the quality of care delivery. Healthcare data, however, are usually fragmented and private, making it difficult to generate robust results across populations. For example, different hospitals own the electronic health records (EHR) of different patient populations and these records are difficult to share across hospitals because of their sensitive nature. This creates a big barrier for developing effective analytical approaches that are generalizable, which need diverse, "big data". Federated learning, a mechanism of training a shared global model with a central server while keeping all the sensitive data in local institutions where the data belong, provides great promise to connect the fragmented healthcare data sources with privacy-preservation. The goal of this survey is to provide a review for federated learning technologies, particularly within the biomedical space. In particular, we summarize the general solutions to the statistical challenges, system challenges and privacy issues in federated learning, and point out the implications and potentials in healthcare.

preprint2020arXiv

Gated Fusion Network for Degraded Image Super Resolution

Single image super resolution aims to enhance image quality with respect to spatial content, which is a fundamental task in computer vision. In this work, we address the task of single frame super resolution with the presence of image degradation, e.g., blur, haze, or rain streaks. Due to the limitations of frame capturing and formation processes, image degradation is inevitable, and the artifacts would be exacerbated by super resolution methods. To address this problem, we propose a dual-branch convolutional neural network to extract base features and recovered features separately. The base features contain local and global information of the input image. On the other hand, the recovered features focus on the degraded regions and are used to remove the degradation. Those features are then fused through a recursive gate module to obtain sharp features for super resolution. By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process by avoiding learning the mixed degradation all-in-one and thus enhance the final high-resolution prediction results. We evaluate the proposed method in three degradation scenarios. Experiments on these scenarios demonstrate that the proposed method performs more efficiently and favorably against the state-of-the-art approaches on benchmark datasets.

preprint2020arXiv

General-Purpose User Embeddings based on Mobile App Usage

In this paper, we report our recent practice at Tencent for user modeling based on mobile app usage. User behaviors on mobile app usage, including retention, installation, and uninstallation, can be a good indicator for both long-term and short-term interests of users. For example, if a user installs Snapseed recently, she might have a growing interest in photographing. Such information is valuable for numerous downstream applications, including advertising, recommendations, etc. Traditionally, user modeling from mobile app usage heavily relies on handcrafted feature engineering, which requires onerous human work for different downstream applications, and could be sub-optimal without domain experts. However, automatic user modeling based on mobile app usage faces unique challenges, including (1) retention, installation, and uninstallation are heterogeneous but need to be modeled collectively, (2) user behaviors are distributed unevenly over time, and (3) many long-tailed apps suffer from serious sparsity. In this paper, we present a tailored AutoEncoder-coupled Transformer Network (AETN), by which we overcome these challenges and achieve the goals of reducing manual efforts and boosting performance. We have deployed the model at Tencent, and both online/offline experiments from multiple domains of downstream applications have demonstrated the effectiveness of the output user embeddings.

preprint2020arXiv

Glyce: Glyph-vectors for Chinese Character Representations

It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this gap by presenting Glyce, the glyph-vectors for Chinese character representations. We make three major innovations: (1) We use historical Chinese scripts (e.g., bronzeware script, seal script, traditional Chinese, etc.) to enrich the pictographic evidence in characters; (2) We design CNN structures (called tianzege-CNN) tailored to Chinese character image processing; and (3) We use image-classification as an auxiliary task in a multi-task learning setup to increase the model's ability to generalize. We show that glyph-based models are able to consistently outperform word/char ID-based models in a wide range of Chinese NLP tasks. We are able to set new state-of-the-art results for a variety of Chinese NLP tasks, including tagging (NER, CWS, POS), sentence pair classification, single sentence classification tasks, dependency parsing, and semantic role labeling. For example, the proposed model achieves an F1 score of 80.6 on the OntoNotes dataset of NER, +1.5 over BERT; it achieves an almost perfect accuracy of 99.8% on the Fudan corpus for text classification. Code found at https://github.com/ShannonAI/glyce.

preprint2020arXiv

GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet

Training a supernet matters for one-shot neural architecture search (NAS) methods since it serves as a basic performance estimator for different architectures (paths). Current methods mainly hold the assumption that a supernet should give a reasonable ranking over all paths. They thus treat all paths equally, and spare much effort to train paths. However, it is harsh for a single supernet to evaluate accurately on such a huge-scale search space (e.g., $7^{21}$). In this paper, instead of covering all paths, we ease the burden of the supernet by encouraging it to focus more on evaluation of those potentially-good ones, which are identified using a surrogate portion of validation data. Concretely, during training, we propose a multi-path sampling strategy with rejection, and greedily filter the weak paths. The training efficiency is thus boosted since the training space has been greedily shrunk from all paths to those potentially-good ones. Moreover, we further adopt an exploration and exploitation policy by introducing an empirical candidate path pool. Our proposed method GreedyNAS is easy-to-follow, and experimental results on the ImageNet dataset indicate that it can achieve better Top-1 accuracy under the same search space and FLOPs or latency level, but with only $\sim$60\% of supernet training cost. By searching on a larger space, our GreedyNAS can also obtain new state-of-the-art architectures.
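
The multi-path sampling with rejection described in this abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: `sample_path`, `greedy_filter`, and `proxy_loss` are hypothetical names, and `proxy_loss` stands in for evaluating a path on the small surrogate portion of validation data. The candidate path pool and the exploration/exploitation policy are not shown.

```python
import random
from typing import Callable, List, Tuple

Path = Tuple[int, ...]  # one operation index per layer (assumed encoding)


def sample_path(num_layers: int, num_ops: int, rng: random.Random) -> Path:
    """Uniformly sample one path, i.e. one operation choice per layer."""
    return tuple(rng.randrange(num_ops) for _ in range(num_layers))


def greedy_filter(
    num_layers: int,
    num_ops: int,
    m: int,
    k: int,
    proxy_loss: Callable[[Path], float],
    rng: random.Random,
) -> List[Path]:
    """Sample m paths, rank them by a proxy loss, and keep the k best.

    The remaining m - k paths are rejected and would not be trained
    in this step.
    """
    if not 0 < k <= m:
        raise ValueError("k must satisfy 0 < k <= m")
    candidates = [sample_path(num_layers, num_ops, rng) for _ in range(m)]
    candidates.sort(key=proxy_loss)  # lower proxy loss means a better path
    return candidates[:k]
```

With `num_ops=7` and `num_layers=21` the sampler draws from a space of size $7^{21}$, matching the example in the abstract; only the `k` surviving paths would receive gradient updates in a given step.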

preprint2020arXiv

Learning 3D-3D Correspondences for One-shot Partial-to-partial Registration

While 3D-3D registration is traditionally tackled by optimization-based methods, recent work has shown that learning-based techniques could achieve faster and more robust results. In this context, however, only PRNet can handle the partial-to-partial registration scenario. Unfortunately, this is achieved at the cost of relying on an iterative procedure, with a complex network architecture. Here, we show that learning-based partial-to-partial registration can be achieved in a one-shot manner, jointly reducing network complexity and increasing registration accuracy. To this end, we propose an Optimal Transport layer able to account for occluded points thanks to the use of outlier bins. The resulting OPRNet framework outperforms the state of the art on standard benchmarks, demonstrating better robustness and generalization ability than existing techniques.

preprint2020arXiv

Learning Reinforced Attentional Representation for End-to-End Visual Tracking

Although numerous recent tracking approaches have made tremendous advances in the last decade, achieving high-performance visual tracking remains a challenge. In this paper, we propose an end-to-end network model to learn reinforced attentional representation for accurate target object discrimination and localization. We utilize a novel hierarchical attentional module with long short-term memory and multi-layer perceptrons to leverage both inter- and intra-frame attention to effectively facilitate visual pattern emphasis. Moreover, we incorporate a contextual attentional correlation filter into the backbone network to make our model trainable in an end-to-end fashion. Our proposed approach not only takes full advantage of informative geometries and semantics but also updates correlation filters online without fine-tuning the backbone network to enable the adaptation of variations in the target object's appearance. Extensive experiments conducted on several popular benchmark datasets demonstrate that our proposed approach is effective and computationally efficient.

preprint2020arXiv

MoFlow: An Invertible Flow Model for Generating Molecular Graphs

Generating molecular graphs with desired chemical properties driven by deep graph generative models provides a very promising way to accelerate the drug discovery process. Such graph generative models usually consist of two steps: learning latent representations and generation of molecular graphs. However, generating novel and chemically valid molecular graphs from latent representations is very challenging because of the chemical constraints and combinatorial complexity of molecular graphs. In this paper, we propose MoFlow, a flow-based graph generative model to learn invertible mappings between molecular graphs and their latent representations. To generate molecular graphs, our MoFlow first generates bonds (edges) through a Glow-based model, then generates atoms (nodes) given bonds by a novel graph conditional flow, and finally assembles them into a chemically valid molecular graph with a post hoc validity correction. Our MoFlow has merits including exact and tractable likelihood training, efficient one-pass embedding and generation, chemical validity guarantees, 100\% reconstruction of training data, and good generalization ability. We validate our model by four tasks: molecular graph generation and reconstruction, visualization of the continuous latent space, property optimization, and constrained property optimization. Our MoFlow achieves state-of-the-art performance, which implies its potential efficiency and effectiveness to explore large chemical space for drug discovery.

preprint2020arXiv

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion

In this paper, we propose a Multi-Scale Boosted Dehazing Network with Dense Feature Fusion based on the U-Net architecture. The proposed method is designed based on two principles, boosting and error feedback, and we show that they are suitable for the dehazing problem. By incorporating the Strengthen-Operate-Subtract boosting strategy in the decoder of the proposed model, we develop a simple yet effective boosted decoder to progressively restore the haze-free image. To address the issue of preserving spatial information in the U-Net architecture, we design a dense feature fusion module using the back-projection feedback scheme. We show that the dense feature fusion module can simultaneously remedy the missing spatial information from high-resolution features and exploit the non-adjacent features. Extensive evaluations demonstrate that the proposed model performs favorably against the state-of-the-art approaches on the benchmark datasets as well as real-world hazy images.

preprint2020arXiv

Neural Cognitive Diagnosis for Intelligent Education Systems

Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually mine linear interactions in the student exercising process with manually designed functions (e.g., the logistic function), which is not sufficient for capturing complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions, for getting both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers for modeling their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., the NeuralCDM with traditional Q-matrix and the improved NeuralCDM+ exploring the rich text content. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.

preprint2020arXiv

Neural Dynamics on Complex Networks

Learning continuous-time dynamics on complex networks is crucial for understanding, predicting and controlling complex systems in science and engineering. However, this task is very challenging due to the combinatorial complexities in the structures of high dimensional systems, their elusive continuous-time nonlinear dynamics, and their structural-dynamic dependencies. To address these challenges, we propose to combine Ordinary Differential Equation Systems (ODEs) and Graph Neural Networks (GNNs) to learn continuous-time dynamics on complex networks in a data-driven manner. We model differential equation systems by GNNs. Instead of mapping through a discrete number of neural layers in the forward process, we integrate GNN layers over continuous time numerically, leading to capturing continuous-time dynamics on graphs. Our model can be interpreted as a Continuous-time GNN model or a Graph Neural ODEs model. Our model can be utilized for continuous-time network dynamics prediction, structured sequence prediction (a regularly-sampled case), and node semi-supervised classification tasks (a one-snapshot case) in a unified framework. We validate our model by extensive experiments in the above three scenarios. The promising experimental results demonstrate our model's capability of jointly capturing the structure and dynamics of complex systems in a unified framework.

preprint2020arXiv

Non-Markovian trajectories involving future in the semi-classical path integral expression

A semiclassical path integral expression for a quantum system coupled to a harmonic bath is derived based on the stationary phase condition. It is discovered that the system path is non-Markovian. Most strikingly, the system path not only couples to its past (as in the Langevin equation), but also to its future, i.e., the equation of motion for the system is an integro-differential equation that involves all times. Numerical tests are performed to confirm that the future-involved term is indeed necessary. Because of the future-non-Markovian nature of the equation, the numerical solution cannot be obtained by iterative methods. Instead, root search algorithms must be employed.

preprint2020arXiv

PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection

We propose a single-stage Human-Object Interaction (HOI) detection method that has outperformed all existing methods on the HICO-DET dataset at 37 fps on a single Titan XP GPU. It is the first real-time HOI detection method. Conventional HOI detection methods are composed of two stages, i.e., human-object proposals generation, and proposals classification. Their effectiveness and efficiency are limited by the sequential and separate architecture. In this paper, we propose a Parallel Point Detection and Matching (PPDM) HOI detection framework. In PPDM, an HOI is defined as a point triplet <human point, interaction point, object point>. Human and object points are the centers of the detection boxes, and the interaction point is the midpoint of the human and object points. PPDM contains two parallel branches, namely the point detection branch and the point matching branch. The point detection branch predicts three points. Simultaneously, the point matching branch predicts two displacements from the interaction point to its corresponding human and object points. The human point and the object point originating from the same interaction point are considered as matched pairs. In our novel parallel architecture, the interaction points implicitly provide context and regularization for human and object detection. Isolated detection boxes that are unlikely to form meaningful HOI triplets are suppressed, which increases the precision of HOI detection. Moreover, the matching between human and object detection boxes is only applied around limited numbers of filtered candidate interaction points, which saves much computational cost. Additionally, we build a new application-oriented database named HOI-A, which serves as a good supplement to the existing datasets. The source code and the dataset will be made publicly available to facilitate the development of HOI detection.
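
The point-triplet geometry described in this abstract can be illustrated with a small sketch. It is not the authors' code and only shows how targets could be built: the human and object points are box centers, the interaction point is their midpoint, and the two displacements are what a matching branch would regress. The function names and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)
Point = Tuple[float, float]


def box_center(box: Box) -> Point:
    """Center of an axis-aligned detection box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def hoi_triplet(human_box: Box, object_box: Box):
    """Return (human point, interaction point, object point, d_human, d_object).

    d_human and d_object are the displacements from the interaction point
    to the human point and to the object point, respectively.
    """
    h = box_center(human_box)
    o = box_center(object_box)
    i = ((h[0] + o[0]) / 2.0, (h[1] + o[1]) / 2.0)  # midpoint of h and o
    d_human = (h[0] - i[0], h[1] - i[1])
    d_object = (o[0] - i[0], o[1] - i[1])
    return h, i, o, d_human, d_object
```

Under the midpoint definition the two displacements are equal and opposite for ground-truth triplets. At inference time they would be predicted independently, and a human and an object box would be paired when both land near the same interaction point.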

preprint2020arXiv

Strange quark stars within proper time regularized (2+1)-flavor NJL model

In this work we use the equation of state (EOS) of the (2+1)-flavor Nambu-Jona-Lasinio (NJL) model to study the structure of the strange quark star. With a new free parameter $α$, the Lagrangian is constructed from two parts, the original NJL Lagrangian and the Fierz transformation of it, as $\mathcal{L}=(1-α)\mathcal{L}_{NJL}+α\mathcal{L}_{Fierz}$. To determine the range of $α$, we compare the binding energies in the 2-flavor and (2+1)-flavor cases. We also consider the constraints of chemical equilibrium and electric charge neutrality in the strange quark star and choose six representative EOSs with different $α$ and $B$ (bag constant) to study their influence on the structure of the strange quark star. As a result, we find that a larger $α$ and a smaller $B$ correspond to a heavier star with a stiffer EOS. Furthermore, the heaviest strange quark star is in agreement with not only the recent mass observation of PSR J0740+6620 and the X-ray observations on radius measurements, but also the constraint on tidal deformability of GW170817.

preprint2020arXiv

SUSY Breaking Constraints on Modular flavor $S_3$ Invariant $SU(5)$ GUT Model

Modular flavor symmetry can be used to explain the quark and lepton flavor structures. The SUSY partners of quarks and leptons, which share the same superpotential with the quarks and leptons, will also be constrained by the modular flavor structure and show a different flavor (mixing) pattern at the GUT scale. So, in realistic modular flavor models with SUSY completion, collider and DM constraints can also be used to constrain the possible values of the modulus parameter. In the first part of this work, we discuss the possibility that the $S_3$ modular symmetry can be preserved by the fixed points of the $T^2/Z_N$ orbifold, especially from $T^2/Z_2$. To illustrate the additional constraints from colliders, etc., on modular flavor symmetry models, we take the simplest UV SUSY-completion $S_3$ modular invariance SU(5) GUT model as an example with a generalized gravity mediation SUSY breaking mechanism. We find that such constraints can indeed be useful to rule out a large portion of the modulus parameters. Our numerical results show that the UV-completed model can account for both the SM (plus neutrino) flavor structure and the collider and DM constraints. Such discussions can also be applied straightforwardly to other modular flavor symmetry models, such as $A_4$ or $S_4$ models.

preprint2020arXiv

Visualizing Deep Graph Generative Models for Drug Discovery

Drug discovery aims at designing novel molecules with specific desired properties for clinical trials. Over the past decades, drug discovery and development have been a costly and time-consuming process. Driven by big chemical data and AI, deep generative models show great potential to accelerate the drug discovery process. Existing works investigate different deep generative frameworks for molecular generation; however, less attention has been paid to visualization tools for quickly demonstrating and evaluating a model's results. Here, we propose a visualization framework which provides interactive visualization tools to visualize molecules generated during the encoding and decoding process of deep graph generative models, and provides real-time molecular optimization functionality. Our work tries to empower black-box AI-driven drug discovery models with some visual interpretability.

preprint2019arXiv

A mixed discontinuous Galerkin method with symmetric stress for Brinkman problem based on the velocity-pseudostress formulation

The Brinkman equations can be regarded as a combination of the Stokes and Darcy equations which model transitions between the fast flow in channels (governed by Stokes equations) and the slow flow in porous media (governed by Darcy's law). The numerical challenge for this model is the designing of a numerical scheme which is stable for both the Stokes-dominated (high permeability) and the Darcy-dominated (low permeability) equations. In this paper, we solve the Brinkman model in $n$ dimensions ($n = 2, 3$) by using the mixed discontinuous Galerkin (MDG) method, which meets this challenge. This MDG method is based on the pseudostress-velocity formulation and uses a discontinuous piecewise polynomial pair $\underline{\bm{\mathcal{P}}}_{k+1}^{\mathbb{S}}$-$\bm{\mathcal{P}}_k$ $(k \geq 0)$, where the stress field is symmetric. The main unknowns are the pseudostress and the velocity, whereas the pressure is easily recovered through a simple postprocessing. A key step in the analysis is to establish the parameter-robust inf-sup stability through specific parameter-dependent norms at both continuous and discrete levels. Therefore, the stability results presented here are uniform with respect to the permeability. Thanks to the parameter-robust stability analysis, we obtain optimal error estimates for the stress in broken $\underline{\bm{H}}(\bm{\rm div})$-norm and velocity in $\bm{L}^2$-norm. Furthermore, the optimal $\underline{\bm{L}}^2$ error estimate for pseudostress is derived under certain conditions. Finally, numerical experiments are provided to support the theoretical results and to show the robustness, accuracy, and flexibility of the MDG method.

preprint2019arXiv

ExtraOrdinary Gauge Mediation Extension of deflected AMSB

Extraordinary gauge mediation extension of deflected AMSB scenarios can be interesting because it can accommodate together the deflection in the Kahler potential and the superpotential. We revisit the EGM scenario and derive the analytical expressions for soft SUSY breaking parameters in EGM and EGM extension of deflected AMSB scenarios with wavefunction renormalization approach, especially the case with vanishing gauge beta-function at an intermediate energy scale. The Landau pole and proton decay constraints are also discussed.

preprint2019arXiv

Feature Learning Viewpoint of AdaBoost and a New Algorithm

The AdaBoost algorithm is notably resistant to overfitting. Understanding the mysteries of this phenomenon is a fascinating fundamental theoretical problem. Many studies are devoted to explaining it from the statistical view and margin theory. In this paper, we illustrate it from the feature learning viewpoint, and propose the AdaBoost+SVM algorithm, which can explain the resistance to overfitting of AdaBoost directly and in an easily understood way. Firstly, we adopt the AdaBoost algorithm to learn the base classifiers. Then, instead of directly forming a weighted combination of the base classifiers, we regard them as features and input them to an SVM classifier. With this, the new coefficients and bias can be obtained, which can be used to construct the final classifier. We explain the rationality of this and illustrate the theorem that when the dimension of these features increases, the performance of SVM would not be worse, which can explain the resistance to overfitting of AdaBoost.

preprint2019arXiv

Non-Majorana Origin of the Half-Quantized Conductance Plateau in Quantum Anomalous Hall Insulator and Superconductor Hybrid Structures

A quantum anomalous Hall (QAH) insulator coupled to an s-wave superconductor is predicted to harbor a topological superconducting phase, the elementary excitations of which (i.e. Majorana fermions) can form topological qubits upon non-Abelian braiding operations. A recent transport experiment interprets the half-quantized two-terminal conductance plateau as the presence of chiral Majorana fermions in a millimeter-size QAH-Nb hybrid structure. However, there are concerns about this interpretation because non-Majorana mechanisms can also generate similar signatures, especially in a disordered QAH system. Here, we fabricated QAH-Nb hybrid structures and studied the QAH-Nb contact transparency and its effect on the corresponding two-terminal conductance. When the QAH film is tuned to the metallic regime by electric gating, we observed a sharp zero-bias enhancement in the differential conductance, up to 80% at zero magnetic field. This large enhancement suggests high probability of Andreev reflection and transparent interface between the magnetic topological insulator (TI) and Nb layers. When the magnetic TI film is in the QAH state with well-aligned magnetization, we found that the two-terminal conductance is always half-quantized. Our experiment provides a comprehensive understanding of the superconducting proximity effect observed in QAH-superconductor hybrid structures and shows that the half-quantized conductance plateau is unlikely to be induced by chiral Majorana fermions.

preprint2019arXiv

Robust Matrix Completion via Maximum Correntropy Criterion and Half Quadratic Optimization

Robust matrix completion aims to recover a low-rank matrix from a subset of noisy entries perturbed by complex noises, where traditional methods for matrix completion may perform poorly due to utilizing the $l_2$ error norm in optimization. In this paper, we propose a novel and fast robust matrix completion method based on the maximum correntropy criterion (MCC). The correntropy-based error measure is utilized instead of the $l_2$-based error norm to improve the robustness to noises. Using the half-quadratic optimization technique, the correntropy-based optimization can be transformed to a weighted matrix factorization problem. Then, two efficient algorithms are derived, including an alternating minimization based algorithm and an alternating gradient descent based algorithm. The proposed algorithms do not need to calculate singular value decomposition (SVD) at each iteration. Further, an adaptive kernel selection strategy is proposed to accelerate the convergence speed as well as improve the performance. Comparison with existing robust matrix completion algorithms is provided by simulations, showing that the new methods can achieve better performance than existing state-of-the-art algorithms.
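
The step from a correntropy objective to a weighted factorization can be sketched as below. This is a minimal illustration of the standard half-quadratic reweighting for a Gaussian kernel, not the paper's algorithm: each observed entry gets a weight $w_{ij} = \exp(-e_{ij}^2 / (2σ^2))$ from its current residual $e_{ij}$, so outliers are down-weighted in the next weighted least-squares update. The function name, the use of `None` for unobserved entries, and the exact kernel scaling are assumptions; the paper's normalization may differ.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]
PartialMatrix = List[List[Optional[float]]]  # None marks an unobserved entry


def hq_weights(observed: PartialMatrix, estimate: Matrix, sigma: float) -> Matrix:
    """Half-quadratic weights for a Gaussian-kernel (correntropy) loss.

    Observed entries with small residuals get weights near 1, gross
    outliers get weights near 0, and unobserved entries get weight 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weights: Matrix = []
    for obs_row, est_row in zip(observed, estimate):
        row = []
        for obs, est in zip(obs_row, est_row):
            if obs is None:
                row.append(0.0)  # missing entries do not contribute
            else:
                residual = obs - est
                row.append(math.exp(-residual * residual / (2.0 * sigma * sigma)))
        weights.append(row)
    return weights
```

In an alternating scheme these weights would be recomputed after every update of the low-rank factors, and the factors would then be refit to the weighted squared error.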

preprint2019arXiv

Siamese Attentional Keypoint Network for High Performance Visual Tracking

In this paper, we investigate the impacts of three main aspects of visual tracking, i.e., the backbone network, the attentional mechanism, and the detection component, and propose a Siamese Attentional Keypoint Network, dubbed SATIN, for efficient tracking and accurate localization. Firstly, a new Siamese lightweight hourglass network is specially designed for visual tracking. It takes advantage of the benefits of the repeated bottom-up and top-down inference to capture more global and local contextual information at multiple scales. Secondly, a novel cross-attentional module is utilized to leverage both channel-wise and spatial intermediate attentional information, which can enhance both discriminative and localization capabilities of feature maps. Thirdly, a keypoint detection approach is devised to trace any target object by detecting the top-left corner point, the centroid point, and the bottom-right corner point of its bounding box. Therefore, our SATIN tracker not only has a strong capability to learn more effective object representations, but is also efficient in computation and memory storage, in both the training and testing stages. To the best of our knowledge, we are the first to propose this approach. Without bells and whistles, experimental results demonstrate that our approach achieves state-of-the-art performance on several recent benchmark datasets, at a speed far exceeding 27 frames per second.

preprint2018arXiv

The linearized Vlasov and Vlasov-Fokker-Planck equations in a uniform magnetic field

We study the linearized Vlasov equations and the linearized Vlasov-Fokker-Planck equations in the weakly collisional limit in a uniform magnetic field. In both cases, we consider periodic confinement and Maxwellian (or close to Maxwellian) backgrounds. In the collisionless case, for modes transverse to the magnetic field, we provide a precise decomposition into a countably infinite family of standing waves for each spatial mode. These are known as Bernstein modes in the physics literature, though the decomposition is not an obvious consequence of any existing arguments that we are aware of. We show that other modes undergo Landau damping. In the presence of collisions with collision frequency $ν\ll 1$, we show that these modes undergo uniform-in-$ν$ Landau damping and enhanced collisional relaxation at the time-scale $O(ν^{-1/3})$. The modes transverse to the field are uniformly stable and exponentially thermalize on the time-scale $O(ν^{-1})$. Most of the results are proved using Laplace transform analysis of the associated Volterra equations, whereas a simple case of Yan Guo's energy method for hypocoercivity of collision operators is applied for stability in the collisional case.