Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
53 works
0 followers
37 topics
4 close collaborators

Published work

53 published items

preprint, 2026, arXiv

Retrieving Any Relevant Moments: Benchmark and Models for Generalized Moment Retrieval

Video Moment Retrieval (VMR) aims to localize temporal segments in videos that correspond to a natural language query, but typically assumes only a single matching moment for each query. This assumption does not always hold in real-world scenarios, where queries may correspond to multiple or no moments. Thus, we formulate Generalized Moment Retrieval (GMR), a unified setting that requires retrieving the complete set of relevant moments or predicting an empty set. To enable systematic study of GMR, we introduce Soccer-GMR, a large-scale benchmark built on challenging soccer videos that reflect general GMR scenarios, with realistic negative and positive queries. The benchmark is constructed via a duration-flexible semi-automated pipeline with human verification, enabling scalable data generation while maintaining high annotation quality. We further design a unified evaluation protocol with complementary metrics tailored for null-set rejection, positive-query localization, and end-to-end GMR performance. Finally, we establish strong baselines across two modeling paradigms: a lightweight plug-and-play GMR adapter for discriminative VMR models, and a GMR-tailored GRPO reward for fine-tuning multimodal large language models (MLLMs). Extensive experiments show consistent gains across all metrics and expose key limitations of current methods, positioning GMR as a more realistic and challenging benchmark for video-language understanding.
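
The set-valued setting described above (a query may match several moments, or none) can be made concrete with a small sketch. It computes temporal IoU between two moments and a simple recall over a set of ground-truth moments, including the empty-set case. The function names, the 0.5 threshold, and the "any prediction may match" rule are assumptions chosen for illustration; this is not the Soccer-GMR evaluation protocol, whose metrics the abstract does not specify.

```python
from typing import List, Tuple

Moment = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def set_recall(pred: List[Moment], truth: List[Moment], thresh: float = 0.5) -> float:
    """Fraction of ground-truth moments hit by at least one prediction.

    Hypothetical convention for null queries: if there is no ground-truth
    moment, the score is 1.0 only when the prediction is also empty.
    """
    if not truth:
        return 1.0 if not pred else 0.0
    hits = sum(
        1 for t in truth if any(temporal_iou(p, t) >= thresh for p in pred)
    )
    return hits / len(truth)
```

With this convention, a model that always returns one moment is penalized on null queries, which is the failure mode the single-moment assumption hides.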

preprint, 2025, arXiv

Daily Land Surface Temperature Reconstruction in Landsat Cross-Track Areas Using Deep Ensemble Learning With Uncertainty Quantification

Many real-world applications rely on land surface temperature (LST) data at high spatiotemporal resolution. In complex urban areas, LST exhibits significant variations, fluctuating dramatically within and across city blocks. Landsat provides high spatial resolution data at 100 meters but is limited by long revisit time, with cloud cover further disrupting data collection. Here, we propose DELAG, a deep ensemble learning method that integrates annual temperature cycles and Gaussian processes, to reconstruct Landsat LST in complex urban areas. Leveraging the cross-track characteristics and dual-satellite operation of Landsat since 2021, we further enhance data availability to 4 scenes every 16 days. We select New York City, London and Hong Kong from three different continents as study areas. Experiments show that DELAG successfully reconstructed LST in the three cities under clear-sky (RMSE = 0.73-0.96 K) and heavily-cloudy (RMSE = 0.84-1.62 K) situations, superior to existing methods. Additionally, DELAG can quantify uncertainty that enhances LST reconstruction reliability. We further tested the reconstructed LST to estimate near-surface air temperature, achieving results (RMSE = 1.48-2.11 K) comparable to those derived from clear-sky LST (RMSE = 1.63-2.02 K). The results demonstrate the successful reconstruction through DELAG and highlight the broader applications of LST reconstruction for estimating accurate air temperature. Our study thus provides a novel and practical method for Landsat LST reconstruction, particularly suited for complex urban areas within Landsat cross-track areas, taking one step toward addressing complex climate events at high spatiotemporal resolution. Code and data are available at https://skrisliu.com/delag

preprint, 2023, arXiv

Fixed-Domain Asymptotics Under Vecchia's Approximation of Spatial Process Likelihoods

Statistical modeling for massive spatial data sets has generated a substantial literature on scalable spatial processes based upon Vecchia's approximation. Vecchia's approximation for Gaussian process models enables fast evaluation of the likelihood by restricting dependencies at a location to its neighbors. We establish inferential properties of microergodic spatial covariance parameters within the paradigm of fixed-domain asymptotics when they are estimated using Vecchia's approximation. The conditions required to formally establish these properties are explored, theoretically and empirically, and the effectiveness of Vecchia's approximation is further corroborated from the standpoint of fixed-domain asymptotics.
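
For readers unfamiliar with the approximation named above, its usual form can be written as follows. The notation ($y_i$ for the observation at the $i$-th ordered location, $N(i)$ for its conditioning set, $m$ for the maximum neighbor count) is chosen here for illustration and is not taken from the paper.

$$
p(y_1, \dots, y_n) \;=\; \prod_{i=1}^{n} p\left(y_i \mid y_1, \dots, y_{i-1}\right)
\;\approx\; \prod_{i=1}^{n} p\left(y_i \mid y_{N(i)}\right),
\qquad N(i) \subseteq \{1, \dots, i-1\},\quad |N(i)| \le m .
$$

The first equality is exact for any ordering of the locations; the approximation replaces each full conditioning set with a small neighbor set, so each factor involves a Gaussian of dimension at most $m + 1$ rather than $i$.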

preprint, 2022, arXiv

A discontinuous Galerkin method for nonlinear biharmonic Schrödinger equations

This paper proposes and analyzes a fully discrete scheme that discretizes space with an ultra-weak local discontinuous Galerkin scheme and time with the Crank-Nicolson method for the nonlinear biharmonic Schrödinger equation. We first rewrite the problem into a system with a second-order spatial derivative and then apply the ultra-weak discontinuous Galerkin method to the system. The proposed scheme is more computationally efficient compared with the local discontinuous Galerkin method because of fewer auxiliary variables, and unconditionally stable without any penalty terms; it also preserves the mass and Hamiltonian conservation that are important properties of the nonlinear biharmonic Schrödinger equation. We also derive optimal $L^2$-error estimates of the semi-discrete scheme that measure both the solution and the auxiliary variable with general nonlinear terms. Several numerical studies demonstrate and support our theoretical findings.

preprint, 2022, arXiv

A high order finite difference method for the elastic wave equation in bounded domains with nonconforming interfaces

We develop a stable finite difference method for the elastic wave equation in bounded media, where the material properties can be discontinuous at curved interfaces. The governing equation is discretized in second order form by a fourth or sixth order accurate summation-by-parts operator. The mesh size is determined by the velocity structure of the material, resulting in nonconforming grid interfaces with hanging nodes. We use order-preserving interpolation and the ghost point technique to couple adjacent mesh blocks in an energy-conserving manner, which is supported by a fully discrete stability analysis. In our previous work for the wave equation, two pairs of order-preserving interpolation operators are needed when imposing the interface conditions weakly by a penalty technique. Here, we only use one pair in the ghost point method. In numerical experiments, we demonstrate that the convergence rate is optimal, and is the same as when a globally uniform mesh is used in a single domain. In addition, with a predictor-corrector time integration method, we obtain time stepping stability with stepsize almost the same as given by the usual Courant-Friedrichs-Lewy condition.

preprint, 2022, arXiv

A local energy-based discontinuous Galerkin method for fourth order semilinear wave equations

This paper generalizes the earlier work on the energy-based discontinuous Galerkin method for second-order wave equations to fourth-order semilinear wave equations. We first rewrite the problem into a system with a second-order spatial derivative, then apply the energy-based discontinuous Galerkin method to the system. The proposed scheme, on the one hand, is more computationally efficient compared with the local discontinuous Galerkin method because of fewer auxiliary variables. On the other hand, it is unconditionally stable without adding any penalty terms, and admits optimal convergence in the $L^2$ norm for both solution and auxiliary variables. In addition, the energy-dissipating or energy-conserving property of the scheme follows from simple, mesh-independent choices of the interelement fluxes. We also present a stability and convergence analysis along with numerical experiments to demonstrate optimal convergence for certain choices of the interelement fluxes.

preprint, 2022, arXiv

A Syntax-Guided Edit Decoder for Neural Program Repair

Automated Program Repair (APR) helps improve the efficiency of software development and maintenance. Recent APR techniques use deep learning, particularly the encoder-decoder architecture, to generate patches. Though existing DL-based APR approaches have proposed different encoder architectures, the decoder remains the standard one, which generates a sequence of tokens one by one to replace the faulty statement. This decoder has multiple limitations: 1) allowing syntactically incorrect programs to be generated, 2) inefficiently representing small edits, and 3) not being able to generate project-specific identifiers. In this paper, we propose Recoder, a syntax-guided edit decoder with placeholder generation. Recoder is novel in multiple aspects: 1) Recoder generates edits rather than modified code, allowing efficient representation of small edits; 2) Recoder is syntax-guided, with the novel provider/decider architecture to ensure the syntactic correctness of the patched program and accurate generation; 3) Recoder generates placeholders that could be instantiated as project-specific identifiers later. We conduct experiments to evaluate Recoder on 395 bugs from Defects4J v1.2 and 420 additional bugs from Defects4J v2.0. Our results show that Recoder repairs 53 bugs on Defects4J v1.2, which achieves 21.4% improvement over the previous state-of-the-art approach for single-hunk bugs (TBar). Importantly, to our knowledge, Recoder is the first DL-based APR approach that has outperformed the traditional APR approaches on this dataset. Furthermore, Recoder also repairs 19 bugs on the additional bugs from Defects4J v2.0, which is 137.5% more than TBar (8 bugs) and 850% more than SimFix (2 bugs). This result suggests that Recoder has better generalizability than existing APR approaches.

preprint, 2022, arXiv

A Unified and Biologically-Plausible Relational Graph Representation of Vision Transformers

Vision transformer (ViT) and its variants have achieved remarkable successes in various visual tasks. The key characteristic of these ViT models is to adopt different aggregation strategies of spatial patch information within the artificial neural networks (ANNs). However, there is still a key lack of unified representation of different ViT architectures for systematic understanding and assessment of model representation performance. Moreover, how those well-performing ViT ANNs are similar to real biological neural networks (BNNs) is largely unexplored. To answer these fundamental questions, we, for the first time, propose a unified and biologically-plausible relational graph representation of ViT models. Specifically, the proposed relational graph representation consists of two key sub-graphs: aggregation graph and affine graph. The former one considers ViT tokens as nodes and describes their spatial interaction, while the latter one regards network channels as nodes and reflects the information communication between channels. Using this unified relational graph representation, we found that: a) a sweet spot of the aggregation graph leads to ViTs with significantly improved predictive performance; b) the graph measures of clustering coefficient and average path length are two effective indicators of model prediction performance, especially when applying on the datasets with small samples; c) our findings are consistent across various ViT architectures and multiple datasets; d) the proposed relational graph representation of ViT has high similarity with real BNNs derived from brain science data. Overall, our work provides a novel unified and biologically-plausible paradigm for more interpretable and effective representation of ViT ANNs.

preprint, 2022, arXiv

Achieving Long-Term Fairness in Sequential Decision Making

In this paper, we propose a framework for achieving long-term fair sequential decision making. By conducting both the hard and soft interventions, we propose to take path-specific effects on the time-lagged causal graph as a quantitative tool for measuring long-term fairness. The problem of fair sequential decision making is then formulated as a constrained optimization problem with the utility as the objective and the long-term and short-term fairness as constraints. We show that such an optimization problem can be converted to a performative risk optimization. Finally, repeated risk minimization (RRM) is used for model training, and the convergence of RRM is theoretically analyzed. The empirical evaluation shows the effectiveness of the proposed algorithm on synthetic and semi-synthetic temporal datasets.

preprint, 2022, arXiv

AGA: An Accelerated Greedy Additional Algorithm for Test Case Prioritization

In recent years, many test case prioritization (TCP) techniques have been proposed to speed up the process of fault detection. However, little work has taken the efficiency problem of these techniques into account. In this paper, we target the Greedy Additional (GA) algorithm, which has been widely recognized to be effective but less efficient, and try to improve its efficiency while preserving effectiveness. In our Accelerated GA (AGA) algorithm, we use some extra data structures to reduce redundant data accesses in the GA algorithm and thus the time complexity is reduced from $\mathcal{O}(m^2n)$ to $\mathcal{O}(kmn)$ when $n > m$, where $m$ is the number of test cases, $n$ is the number of program elements, and $k$ is the iteration number. Moreover, we observe the impact of iteration numbers on prioritization efficiency on our dataset and propose to use a specific iteration number in the AGA algorithm to further improve the efficiency. We conducted experiments on 55 open-source subjects. In particular, we implemented each TCP algorithm with two kinds of widely-used input formats, adjacency matrix and adjacency list. Since a TCP algorithm with adjacency matrix is less efficient than the algorithm with adjacency list, the result analysis is mainly conducted based on TCP algorithms with adjacency list. The results show that AGA achieves 5.95X speedup ratio over GA on average, while it achieves the same average effectiveness as GA in terms of Average Percentage of Faults Detected (APFD). Moreover, we conducted an industrial case study on 22 subjects, collected from Baidu, and found that the average speedup ratio of AGA over GA is 44.27X, which indicates the practical usage of AGA in real-world scenarios.
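
As background for the abstract above, the following is a minimal sketch of the baseline Greedy Additional strategy and of the APFD metric, assuming coverage is given as a mapping from test name to the set of covered program elements. It is the plain textbook baseline, not the accelerated AGA variant (whose data structures the abstract does not describe); the function names, the alphabetical tie-breaking, and the sample data are illustrative assumptions.

```python
from typing import Dict, List, Set


def greedy_additional(coverage: Dict[str, Set[int]]) -> List[str]:
    """Repeatedly pick the test that covers the most not-yet-covered elements.

    When no remaining test adds new coverage, the covered set is reset,
    which is the usual convention for the additional strategy.
    Ties are broken alphabetically (an assumption, for determinism).
    """
    remaining = dict(coverage)
    covered: Set[int] = set()
    order: List[str] = []
    while remaining:
        names = sorted(remaining)
        best = max(names, key=lambda t: len(remaining[t] - covered))
        if covered and not (remaining[best] - covered):
            covered = set()  # nothing new to gain: start a fresh round
            best = max(names, key=lambda t: len(remaining[t]))
        order.append(best)
        covered |= remaining.pop(best)
    return order


def apfd(order: List[str], faults: Dict[str, Set[str]]) -> float:
    """APFD = 1 - (sum of first-detection positions) / (n * m) + 1 / (2n).

    n is the number of tests, m the number of faults; every fault is
    assumed to be detected by at least one test in `order`.
    """
    n, m = len(order), len(faults)
    position = {test: i + 1 for i, test in enumerate(order)}
    total = sum(min(position[t] for t in tests) for tests in faults.values())
    return 1 - total / (n * m) + 1 / (2 * n)


coverage = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {1, 2}, "t4": {5}}
faults = {"f1": {"t3"}, "f2": {"t2", "t4"}}
order = greedy_additional(coverage)
```

Each selection step rescans every remaining test against the covered set, which is where the quadratic dependence on the number of tests in the baseline comes from.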

preprint, 2022, arXiv

An Energy-Based Discontinuous Galerkin Method with Tame CFL Numbers for the Wave Equation

We extend and analyze the energy-based discontinuous Galerkin method for second order wave equations on staggered and structured meshes. By combining spatial staggering with local time-stepping near boundaries, the method overcomes the typical numerical stiffness associated with high order piecewise polynomial approximations. In one space dimension with periodic boundary conditions and suitably chosen numerical fluxes, we prove bounds on the spatial operators that establish stability for CFL numbers $c \frac {Δt}{h} < C$ independent of order when stability-enhanced explicit time-stepping schemes of matching order are used. For problems on bounded domains and in higher dimensions we demonstrate numerically that one can march explicitly with large time steps at high order temporal and spatial accuracy.

preprint, 2022, arXiv

Analysis on the composite nature of the light scalar mesons $f_{0}(980)$ and $a_0(980)$

We study the weight or compositeness of the $ππ$-$K\bar{K}$ and $πη$-$K\bar{K}$ in the composition of the $f_0(980)$ and $a_0(980)$ resonances, respectively. Either we use the saturation of the total width and compositeness, or we use a Flatté parameterization taking also into account the spectral function of a near-threshold resonance. We make connections and compare between these two methods. We take input values for the pole mass and width from several determinations in the literature. In addition, we take as third input either the total compositeness or the decay-width branching ratio to the lighter channel for each resonance. It turns out that for the poles considered the meson-meson components are dominant for the $f_0(980)$, while for the $a_0(980)$ resonance they are subdominant. We also provide partial decay widths and partial compositeness coefficients, so that the $K\bar{K}$ component is the most important one for the $f_0(980)$. Additionally, this study stresses the need to distinguish between the bare and dressed couplings and widths in a Flatté parameterization. We elaborate on the connection between the partial-decay widths calculated in terms of the dressed couplings and the actual measured ones. Due to the coupled-channel dynamics when the pole lies near the heavier threshold in the second Riemann sheet some changes are needed with respect to standard relations.

preprint, 2022, arXiv

Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning

This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections. The NCD task is challenging due to the closeness to the real-world scenarios, where we have only encountered some partial classes and images. Unlike other works on the NCD, we leverage the prototypes to emphasize the importance of category discrimination and alleviate the issue of missing annotations of novel classes. Concretely, we propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training. In the first stage, we obtain a robust feature extractor, which could serve for all images with base and novel categories. This ability of instance and category discrimination of the feature extractor is boosted by self-supervised learning and adaptive prototypes. In the second stage, we utilize the prototypes again to rectify offline pseudo labels and train a final parametric classifier for category clustering. We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method with state-of-the-art performance.

preprint, 2022, arXiv

Composite nature of $Z_b$ states from data analysis

We use a near-threshold parameterization with explicit inclusion of the Castillejo-Dalitz-Dyson poles, which is more general than the effective range expansion, to study the bottomonium-like states $Z_b(10610)$ and $Z_b(10650)$. In terms of the partial-wave amplitude, we fit the event number distribution of the $B^{(*)}\bar B^*$ system to the experimental data for these resonances from the Belle Collaboration. The data are described very well by our method, which supports the molecular interpretation. Then the relevant physical quantities are obtained, including the $B^{(*)}\bar{B}^*$ scattering length ($a$), effective range ($r$), and residue squared ($γ_s^2$) of the pole in the complex plane. In particular, we find the compositeness can range from about 0.4 up to 1 for the $B\bar B^*$ ($B^*\bar B^*$) component in the resonance $Z_b(10610)$ ($Z_b(10650)$).
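
For reference, the effective range expansion mentioned above is usually written, for an S-wave near threshold, as shown below. The sign convention for $a$ is an assumption here (conventions differ between communities), and the form of the more general parameterization with Castillejo-Dalitz-Dyson poles is not given in the abstract, so it is not reproduced.

$$
k \cot \delta(k) \;=\; -\frac{1}{a} + \frac{1}{2}\, r\, k^{2} + \mathcal{O}(k^{4}),
$$

where $k$ is the center-of-mass momentum, $\delta(k)$ the phase shift, $a$ the scattering length, and $r$ the effective range.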

preprint, 2022, arXiv

Coupling Visual Semantics of Artificial Neural Networks and Human Brain Function via Synchronized Activations

Artificial neural networks (ANNs), originally inspired by biological neural networks (BNNs), have achieved remarkable successes in many tasks such as visual representation learning. However, whether there exist semantic correlations/connections between the visual representations in ANNs and those in BNNs remains largely unexplored due to both the lack of an effective tool to link and couple two different domains, and the lack of a general and effective framework of representing the visual semantics in BNNs such as human functional brain networks (FBNs). To answer this question, we propose a novel computational framework, Synchronized Activations (Sync-ACT), to couple the visual representation spaces and semantics between ANNs and BNNs in human brain based on naturalistic functional magnetic resonance imaging (nfMRI) data. With this approach, we are able to semantically annotate the neurons in ANNs with biologically meaningful description derived from human brain imaging for the first time. We evaluated the Sync-ACT framework on two publicly available movie-watching nfMRI datasets. The experiments demonstrate a) the significant correlation and similarity of the semantics between the visual representations in FBNs and those in a variety of convolutional neural network (CNN) models; b) the close relationship between a CNN's visual representation similarity to BNNs and its performance in image classification tasks. Overall, our study introduces a general and effective paradigm to couple the ANNs and BNNs and provides novel insights for future studies such as brain-inspired artificial intelligence.

preprint, 2022, arXiv

Disentangling Spatial-Temporal Functional Brain Networks via Twin-Transformers

Identifying and characterizing functional brain networks (BNs) is fundamental to gaining system-level insights into the mechanisms of brain organizational architecture. Current functional magnetic resonance imaging (fMRI) analysis relies heavily on prior knowledge of specific patterns in either the spatial (e.g., resting-state network) or temporal (e.g., task stimulus) domain. In addition, most approaches aim to find group-wise common functional networks, while individual-specific functional networks have rarely been studied. In this work, we propose a novel Twin-Transformers framework to simultaneously infer common and individual functional networks in both spatial and temporal space, in a self-supervised manner. The first transformer takes space-divided information as input and generates spatial features, while the second transformer takes time-related information as input and outputs temporal features. The spatial and temporal features are further separated into common and individual ones via interactions (weight sharing) and constraints between the two transformers. We applied our Twin-Transformers to the Human Connectome Project (HCP) motor task-fMRI dataset and identified multiple common brain networks, including both task-related and resting-state networks (e.g., default mode network). Interestingly, we also successfully recovered a set of individual-specific networks that are not related to task stimulus and only exist at the individual level.

preprint, 2022, arXiv

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise, which is inevitable in real-world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecule (protein target) and small-molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.

preprint, 2022, arXiv

EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification

Recent works have empirically shown the effectiveness of data augmentation (DA) in NLP tasks, especially for those suffering from data scarcity. Intuitively, given the size of generated data, their diversity and quality are crucial to the performance of targeted tasks. However, to the best of our knowledge, most existing methods consider only either the diversity or the quality of augmented data, thus cannot fully mine the potential of DA for NLP. In this paper, we present an easy and plug-in data augmentation framework EPiDA to support effective text classification. EPiDA employs two mechanisms: relative entropy maximization (REM) and conditional entropy minimization (CEM) to control data generation, where REM is designed to enhance the diversity of augmented data while CEM is exploited to ensure their semantic consistency. EPiDA can support efficient and continuous data generation for effective classifier training. Extensive experiments show that EPiDA outperforms existing SOTA methods in most cases, though not using any agent networks or pre-trained generation networks, and it works well with various DA algorithms and classification models. Code is available at https://github.com/zhaominyiz/EPiDA.

preprint, 2022, arXiv

Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning

Learning harmful shortcuts such as spurious correlations and biases prevents deep neural networks from learning meaningful and useful representations, thus jeopardizing the generalizability and interpretability of the learned representation. The situation becomes even more serious in medical imaging, where the clinical data (e.g., MR images with pathology) are limited and scarce while the reliability, generalizability and transparency of the learned model are highly required. To address this problem, we propose to infuse human experts' intelligence and domain knowledge into the training of deep neural networks. The core idea is that we infuse the visual attention information from expert radiologists to proactively guide the deep model to focus on regions with potential pathology and avoid being trapped in learning harmful shortcuts. To do so, we propose a novel eye-gaze-guided vision transformer (EG-ViT) for diagnosis with limited medical image data. We mask the input image patches that are out of the radiologists' interest and add an additional residual connection in the last encoder layer of EG-ViT to maintain the correlations of all patches. The experiments on two public datasets, INbreast and SIIM-ACR, demonstrate that our EG-ViT model can effectively learn/transfer experts' domain knowledge and achieve much better performance than baselines. Meanwhile, it successfully rectifies the harmful shortcut learning and significantly improves the EG-ViT model's interpretability. In general, EG-ViT takes advantage of both human experts' prior knowledge and the power of deep neural networks. This work opens new avenues for advancing current artificial intelligence paradigms by infusing human intelligence.

preprint, 2022, arXiv

FB-MSTCN: A Full-Band Single-Channel Speech Enhancement Method Based on Multi-Scale Temporal Convolutional Network

In recent years, deep learning-based approaches have significantly improved the performance of single-channel speech enhancement. However, due to the limitation of training data and computational complexity, real-time enhancement of full-band (48 kHz) speech signals is still very challenging. Because of the low energy of spectral information in the high-frequency part, it is more difficult to directly model and enhance the full-band spectrum using neural networks. To solve this problem, this paper proposes a two-stage real-time speech enhancement model with extraction-interpolation mechanism for a full-band signal. The 48 kHz full-band time-domain signal is divided into three sub-channels by extracting, and a two-stage processing scheme of 'masking + compensation' is proposed to enhance the signal in the complex domain. After the two-stage enhancement, the enhanced full-band speech signal is restored by interval interpolation. In the subjective listening and word accuracy test, our proposed model achieves superior performance and outperforms the baseline model overall by 0.59 MOS and 4.0% WAcc for the non-personalized speech denoising task.
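
One plausible reading of the extraction-interpolation mechanism is sketched below: every third sample is taken to form three sub-channels (each at 16 kHz for a 48 kHz input), and the full-band signal is later restored by interleaving them. This interpretation, the function names, and the use of plain Python lists are assumptions made for illustration; the abstract does not state the exact procedure, and the enhancement network itself is omitted.

```python
from typing import List, Sequence


def split_subchannels(signal: Sequence[float], k: int = 3) -> List[List[float]]:
    """Split a signal into k sub-channels by taking every k-th sample."""
    return [list(signal[offset::k]) for offset in range(k)]


def interleave(channels: List[List[float]]) -> List[float]:
    """Restore the full-rate signal by interleaving the sub-channels."""
    k = len(channels)
    total = sum(len(channel) for channel in channels)
    restored = [0.0] * total
    for offset, channel in enumerate(channels):
        restored[offset::k] = channel  # put samples back at their original positions
    return restored


samples = [float(i) for i in range(10)]
subchannels = split_subchannels(samples)
roundtrip = interleave(subchannels)
```

Without any processing in between, splitting followed by interleaving reproduces the input exactly, so any change in the output comes only from what is done to the sub-channels.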

preprint, 2022, arXiv

Floodgate: inference for model-free variable importance

Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing whether $Y$ depends on $X$ given $Z$, in this paper we seek to go beyond testing by inferring the strength of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between $Y$ and $X$ in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called floodgate that can leverage any working regression function chosen by the user (allowing, e.g., it to be fitted by a state-of-the-art machine learning algorithm or be derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. We additionally show that floodgate's accuracy (distance from confidence bound to estimand) is adaptive to the error of the working regression function. We then show we can apply the same floodgate principle to a different measure of variable importance when $Y$ is binary. Finally, we demonstrate floodgate's performance in a series of simulations and apply it to data from the UK Biobank to infer the strengths of dependence of platelet count on various groups of genetic mutations.
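
The abstract names the mMSE gap without writing it out. A form consistent with the name, stated here as an assumption rather than quoted from the paper, is the reduction in the best achievable mean squared error when $X$ is added to $Z$ as a predictor of $Y$:

$$
\mathcal{I}^{2}
\;=\; \mathbb{E}\left[\left(Y - \mathbb{E}[Y \mid Z]\right)^{2}\right]
\;-\; \mathbb{E}\left[\left(Y - \mathbb{E}[Y \mid X, Z]\right)^{2}\right]
\;=\; \mathbb{E}\left[\operatorname{Var}\left(\mathbb{E}[Y \mid X, Z] \,\middle|\, Z\right)\right].
$$

The second equality follows from the law of total variance applied conditionally on $Z$. The quantity is zero exactly when $\mathbb{E}[Y \mid X, Z]$ does not vary with $X$ given $Z$, and it requires no parametric model for $Y$.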

preprint, 2022, arXiv

GDsmith: Detecting Bugs in Graph Database Engines

Graph database engines stand out in the era of big data for their efficiency in modeling and processing linked data. There is a strong need for testing graph database engines. However, random testing, the most practical way of automated test generation, faces the challenges of semantic validity, non-empty results, and behavior diversity when used to detect bugs in graph database engines. To address these challenges, in this paper, we propose GDsmith, the first black-box approach for testing graph database engines. It ensures that each randomly generated Cypher query satisfies the semantic requirements via skeleton generation and completion. GDsmith includes our technique to increase the probability of producing Cypher queries that return non-empty results by leveraging three types of structural mutation strategies. GDsmith also includes our technique to improve the behavior diversity of the generated Cypher queries by selecting property keys according to their previous frequencies when generating new queries. Our evaluation results demonstrate that GDsmith is effective and efficient for automated query generation and substantially outperforms the baseline. GDsmith successfully detected 27 previously unknown bugs in the released versions of three popular open-source graph database engines and received positive feedback from their developers.

preprint2022arXiv

Generalized Equivariance and Preferential Labeling for GNN Node Classification

Existing graph neural networks (GNNs) largely rely on node embeddings, which represent a node as a vector by its identity, type, or content. However, graphs with unattributed nodes widely exist in real-world applications (e.g., anonymized social networks). Previous GNNs either assign random labels to nodes (which introduces artefacts to the GNN) or assign one embedding to all nodes (which fails to explicitly distinguish one node from another). Further, when these GNNs are applied to unattributed node classification problems, they have an undesired equivariance property, which makes them fundamentally unable to handle data with multiple possible outputs. In this paper, we analyze the limitations of existing approaches to node classification problems. Inspired by our analysis, we propose a generalized equivariance property and a Preferential Labeling technique that satisfies the desired property asymptotically. Experimental results show that we achieve high performance in several unattributed node classification tasks.

preprint2022arXiv

Hyperspectral Imaging for cherry tomato

Cherry tomato (Solanum lycopersicum) is popular with consumers all over the world due to its special flavor. Soluble solids content (SSC) and firmness are two key metrics for evaluating product quality. In this work, we develop non-destructive testing techniques for SSC and fruit firmness based on hyperspectral images and a corresponding deep learning regression model. Hyperspectral reflectance images of over 200 tomato fruits are acquired with a spectrum ranging from 400 to 1000 nm. The acquired hyperspectral images are corrected and the spectral information is extracted. A novel one-dimensional (1D) convolutional ResNet (Con1dResNet) based regression model is proposed and compared with state-of-the-art techniques. Experimental results show that, with a relatively large number of samples, our technique is 26.4% better than the state-of-the-art technique for SSC and 33.7% better for firmness. The results of this study indicate the application potential of hyperspectral imaging in SSC and firmness detection, which provides a new option for non-destructive testing of cherry tomato fruit quality in the future.

preprint2022arXiv

Intra-Modal Constraint Loss For Image-Text Retrieval

Cross-modal retrieval has drawn much attention in both computer vision and natural language processing domains. With the development of convolutional and recurrent neural networks, the bottleneck of retrieval across image-text modalities is no longer the extraction of image and text features but the learning of an efficient loss function in the embedding space. Many loss functions try to bring pairwise features from heterogeneous modalities closer together. This paper proposes a method for learning a joint embedding of images and texts using an intra-modal constraint loss function to reduce the violation of negative pairs from the same homogeneous modality. Experimental results show that our approach outperforms state-of-the-art bi-directional image-text retrieval methods on Flickr30K and Microsoft COCO datasets. Our code is publicly available: https://github.com/CanonChen/IMC.

preprint2022arXiv

Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning

Learning with little data is challenging but often inevitable in various application scenarios where the labeled data is limited and costly. Recently, few-shot learning (FSL) has gained increasing attention because of its ability to generalize prior knowledge to new tasks that contain only a few samples. However, for data-intensive models such as the vision transformer (ViT), current fine-tuning based FSL approaches are inefficient in knowledge generalization and thus degrade downstream task performance. In this paper, we propose a novel mask-guided vision transformer (MG-ViT) to achieve effective and efficient FSL on ViT models. The key idea is to apply a mask on image patches to screen out the task-irrelevant ones and to guide the ViT to focus on task-relevant and discriminative patches during FSL. In particular, MG-ViT only introduces an additional mask operation and a residual connection, enabling the inheritance of parameters from a pre-trained ViT without any other cost. To optimally select representative few-shot samples, we also include an active learning based sample selection method to further improve the generalizability of MG-ViT based FSL. We evaluate the proposed MG-ViT on both the Agri-ImageNet classification task and the ACFR apple detection task with gradient-weighted class activation mapping (Grad-CAM) as the mask. The experimental results show that the MG-ViT model significantly improves the performance when compared with general fine-tuning based ViT models, providing novel insights and a concrete approach towards generalizing data-intensive and large-scale deep learning models for FSL.

preprint2022arXiv

MetaNOR: A Meta-Learnt Nonlocal Operator Regression Approach for Metamaterial Modeling

We propose MetaNOR, a meta-learnt approach for transfer-learning operators based on nonlocal operator regression. The overall goal is to efficiently provide surrogate models for new and unknown material-learning tasks with different microstructures. The algorithm consists of two phases: (1) learning a common nonlocal kernel representation from existing tasks; (2) transferring the learned knowledge and rapidly learning surrogate operators for unseen tasks with a different material, where only a few test samples are required. We apply MetaNOR to model wave propagation within 1D metamaterials, showing substantial improvements in sampling efficiency for new materials.

preprint2022arXiv

On the hyper-singular boundary integral equation methods for dynamic poroelasticity: three dimensional case

In our previous work [SIAM J. Sci. Comput. 43(3) (2021) B784-B810], an accurate hyper-singular boundary integral equation method for dynamic poroelasticity in two dimensions has been developed. This work is devoted to studying the more complex and difficult three-dimensional problems with Neumann boundary condition and both the direct and indirect methods are adopted to construct combined boundary integral equations. The strongly-singular and hyper-singular integral operators are reformulated into compositions of weakly-singular integral operators and tangential-derivative operators, which allow us to prove the jump relations associated with the poroelastic layer potentials and boundary integral operators in a simple manner. Relying on both the investigated spectral properties of the strongly-singular operators, which indicate that the corresponding eigenvalues accumulate at three points whose values are only dependent on two Lamé constants, and the spectral properties of the Calderón relations of the poroelasticity, we propose low-GMRES-iteration regularized integral equations. Numerical examples are presented to demonstrate the accuracy and efficiency of the proposed methodology by means of a Chebyshev-based rectangular-polar solver.

preprint2022arXiv

Pathfinder: Parallel quasi-Newton variational inference

We propose Pathfinder, a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores.

preprint2022arXiv

Phase transition of eigenvalues in deformed Ginibre ensembles

Consider a random matrix of size $N$ as an additive deformation of the complex Ginibre ensemble under a deterministic matrix $X_0$ with a finite rank, independent of $N$. When some eigenvalues of $X_0$ separate from the unit disk, outlier eigenvalues may appear asymptotically in the same locations, and their fluctuations exhibit surprising phenomena that highly depend on the Jordan canonical form of $X_0$. These findings are largely due to Benaych-Georges and Rochet \cite{BR}, Bordenave and Capitaine \cite{BC16}, and Tao \cite{Ta13}. When all eigenvalues of $X_0$ lie inside the unit disk, we prove that local eigenvalue statistics at the spectral edge form a new class of determinantal point processes, for which correlation kernels are characterized in terms of the repeated erfc integrals. This thus completes a non-Hermitian analogue of the BBP phase transition in Random Matrix Theory. Similar results hold for the deformed real quaternion Ginibre ensemble.

preprint2022arXiv

Representing Brain Anatomical Regularity and Variability by Few-Shot Embedding

Effective representation of brain anatomical architecture is fundamental in understanding brain regularity and variability. Despite numerous efforts, it is still difficult to infer reliable anatomical correspondence at finer scale, given the tremendous individual variability in cortical folding patterns. It is even more challenging to disentangle common and individual patterns when comparing brains at different neuro-developmental stages. In this work, we developed a novel learning-based few-shot embedding framework to encode the cortical folding patterns into a latent space represented by a group of anatomically meaningful embedding vectors. Specifically, we adopted the 3-hinge (3HG) network as the substrate and designed an autoencoder-based embedding framework to learn a common embedding vector for each 3HG's multi-hop feature: each 3HG can be represented as a combination of these feature embeddings via a set of individual-specific coefficients to characterize individualized anatomical information. That is, the regularity of folding patterns is encoded into the embeddings, while the individual variations are preserved by the multi-hop combination coefficients. To effectively learn the embeddings for the population with very limited samples, few-shot learning was adopted. We applied our method on adult HCP and pediatric datasets with 1,000+ brains (from 34 gestational weeks to young adult). Our experimental results show that: 1) the learned embedding vectors can quantitatively encode the commonality and individuality of cortical folding patterns; 2) with the embeddings we can robustly infer the complicated many-to-many anatomical correspondences among different brains; and 3) our model can be successfully transferred to new populations with very limited training samples.

preprint2022arXiv

Revisiting Linearized Bregman Iterations under Lipschitz-like Convexity Condition

The linearized Bregman iterations (LBreI) and its variants have received considerable attention in signal/image processing and compressed sensing. Recently, LBreI has been extended to a larger class of nonconvex functions, along with several theoretical issues left for further investigation. In particular, the gradient Lipschitz continuity assumption precludes its use in many practical applications. In this study, we propose a generalized algorithmic framework to unify LBreI-type methods. Our main discovery is that the gradient Lipschitz continuity assumption can be replaced by a Lipschitz-like convexity condition in both convex and nonconvex cases. The proposed framework and theory are then applied to linear/quadratic inverse problems.

preprint2022arXiv

Series Photo Selection via Multi-view Graph Learning

Series photo selection (SPS) is an important branch of image aesthetics quality assessment, which focuses on finding the best one from a series of nearly identical photos. While great progress has been observed, most of the existing SPS approaches concentrate solely on extracting features from the original image, neglecting that multiple views, e.g., saturation level, color histogram and depth of field of the image, help reflect subtle aesthetic changes. Taking multiple views into consideration, we leverage a graph neural network to construct the relationships between multi-view features. Besides, multiple views are aggregated with an adaptive-weight self-attention module to verify the significance of each view. Finally, a siamese network is proposed to select the best one from a series of nearly identical photos. Experimental results demonstrate that our model achieves the highest success rates compared with competitive methods.

preprint2022arXiv

Stability analysis of the Tsallis holographic dark energy model

Using the generalized Tsallis entropy, the Tsallis holographic dark energy (THDE) was proposed recently. In this paper we analyze the cosmological consequences of the THDE model with an interaction between dark energy and dark matter $Q=H(αρ_{m}+βρ_{D})$. We find that the THDE model can explain the current accelerated cosmic expansion, and it is stable under certain conditions. Furthermore, through a dynamical analysis, we find that there exists an attractor which represents an accelerated expansion phase of the universe. When $β=0$, this attractor corresponds to a dark energy dominated de Sitter solution and the universe can evolve into an era which is depicted by the $Λ$CDM model. The age of the universe in this model is also explored.

preprint2022arXiv

Taming Hybrid-Cloud Fast and Scalable Graph Analytics at Twitter

We have witnessed a boosted demand for graph analytics at Twitter in recent years, and graph analytics has become one of the key parts of Twitter's large-scale data analytics and machine learning for driving engagement, serving the most relevant content, and promoting healthier conversations. However, infrastructure for graph analytics has historically not been an area of investment at Twitter, resulting in a long timeline and huge engineering effort for each project to deal with graphs at the Twitter scale. How do we build a unified graph analytics user experience to fulfill modern data analytics on various graph scales spanning from thousands to hundreds of billions of vertices and edges? To bring fast and scalable graph analytics capability into production, we investigate the challenges we are facing in large-scale graph analytics at Twitter and propose a unified graph analytics platform for efficient, scalable, and reliable graph analytics across on-premises and cloud, to fulfill the requirements of diverse graph use cases and challenging scales. We also conduct quantitative benchmarking on Twitter's production-level graph use cases between popular graph analytics frameworks to certify our solution.

preprint2022arXiv

Trajectory Prediction with Graph-based Dual-scale Context Fusion

Motion prediction for traffic participants is essential for a safe and robust automated driving system, especially in cluttered urban environments. However, it is highly challenging due to the complex road topology as well as the uncertain intentions of the other agents. In this paper, we present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP), which encodes both the static and dynamical driving context in a hierarchical manner. Different from methods based on a rasterized map or sparse lane graph, we consider the driving context as a graph with two layers, focusing on both geometrical and topological features. Graph neural networks (GNNs) are applied to extract features with different levels of granularity, and features are subsequently aggregated with attention-based inter-layer networks, realizing better local-global feature fusion. Following the recent goal-driven trajectory prediction pipeline, goal candidates with high likelihood for the target agent are extracted, and predicted trajectories are generated conditioned on these goals. Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories. We evaluate the proposed method on the large-scale Argoverse motion forecasting benchmark, and it achieves promising results, outperforming the recent state-of-the-art methods.

preprint2022arXiv

Weakly Aligned Feature Fusion for Multimodal Object Detection

To achieve accurate and robust object detection in real-world scenarios, various forms of images are incorporated, such as color, thermal, and depth. However, multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned, so that one object has different positions in different modalities. For deep learning methods, this problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training. In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem. First, a region feature (RF) alignment module with adjacent similarity constraint is designed to consistently predict the position shift between two modalities and adaptively align the cross-modal RFs. Second, we propose a novel region of interest (RoI) jitter strategy to improve the robustness to unexpected shift patterns. Third, we present a new multimodal feature fusion method that selects the more reliable feature and suppresses the less useful one via feature reweighting. In addition, by locating bounding boxes in both modalities and building their relationships, we provide novel multimodal labeling named KAIST-Paired. Extensive experiments on 2-D and 3-D object detection, RGB-T, and RGB-D datasets demonstrate the effectiveness and robustness of our method.

preprint2021arXiv

Adaptively Sketched Bregman Projection Methods for Linear Systems

The sketch-and-project method, as a general archetypal algorithm for solving linear systems, unifies a variety of randomized iterative methods such as the randomized Kaczmarz and randomized coordinate descent. However, since it aims to find a least-norm solution of a linear system, the randomized sparse Kaczmarz method cannot be included. This motivates us to propose a more general framework, called the sketched Bregman projection (SBP) method, in which we are able to find solutions with certain structures from linear systems. To generalize the concept of adaptive sampling to the SBP method, we show how the progress of a single step, measured by Bregman distance, depends directly on a sketched loss function. Theoretically, we provide detailed global convergence results for the SBP method with different adaptive sampling rules. Finally, for the (sparse) Kaczmarz methods, a group of numerical simulations are performed, which verify that the methods using the sampling Kaczmarz-Motzkin rule require the lowest computational cost to achieve a given error bound compared with the corresponding methods using other sampling rules.
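For orientation, the classical randomized Kaczmarz iteration that this abstract takes as its starting point can be sketched in a few lines. This is not the SBP method or any of the adaptive sampling rules from the paper; it is the standard variant in which rows are sampled with probability proportional to their squared norms. The function name `randomized_kaczmarz` and its parameters are chosen here for illustration only.

```python
import random


def randomized_kaczmarz(A, b, iters=500, seed=0):
    """Approximately solve a consistent system A x = b by random row projections.

    A is a list of rows, b a list of right-hand sides. Illustrative sketch only.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Squared row norms serve as sampling weights and as step normalizers.
    row_norms = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        # Pick row i with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=row_norms)[0]
        # Project x onto the hyperplane {z : A[i] . z = b[i]}.
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


# Consistent 3x2 system whose exact solution is x = (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]
print(randomized_kaczmarz(A, b))
```

Each step is an orthogonal projection onto one equation's solution hyperplane, so the distance to the solution never increases. Adaptive rules such as those discussed in the abstract change only how the row index is chosen.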

preprint2021arXiv

Extracting Concise Bug-Fixing Patches from Human-Written Patches in Version Control Systems

High-quality and large-scale repositories of real bugs and their concise patches collected from real-world applications are critical for research in the software engineering community. In such a repository, each real bug is explicitly associated with its fix. Therefore, on the one hand, the real bugs and their fixes may inspire novel approaches for finding, locating, and repairing software bugs; on the other hand, the real bugs and their fixes are indispensable for rigorous and meaningful evaluation of approaches for software testing, fault localization, and program repair. To this end, a number of such repositories, e.g., Defects4J, have been proposed. However, such repositories are rather small because their construction involves expensive human intervention. Although bug-fixing code commits as well as associated test cases could be retrieved from version control systems automatically, existing approaches could not yet automatically extract concise bug-fixing patches from bug-fixing commits because such commits often involve bug-irrelevant changes. In this paper, we propose an automatic approach, called BugBuilder, to extract complete and concise bug-fixing patches from human-written patches in version control systems. It excludes refactorings by detecting refactorings involved in bug-fixing commits, and reapplying detected refactorings on the faulty version. It enumerates all subsets of the remaining part and validates them on test cases. If none of the subsets has the potential to be a complete bug-fixing patch, the remaining part as a whole is taken as a complete and concise bug-fixing patch. Evaluation results on 809 real bug-fixing commits in Defects4J suggest that BugBuilder successfully generated complete and concise bug-fixing patches for forty percent of the bug-fixing commits, and its precision (99%) was even higher than that of human experts.

preprint2020arXiv

A Fixation-based 360° Benchmark Dataset for Salient Object Detection

Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications. However, another issue within the field of visual saliency, salient object detection (SOD), has been seldom explored in 360° (or omnidirectional) images due to the lack of datasets representative of real scenes with pixel-level annotations. Toward this end, we collect 107 equirectangular panoramas with challenging scenes and multiple object classes. Based on the consistency between FP and explicit saliency judgements, we further manually annotate 1,165 salient objects over the collected images with precise masks under the guidance of real human eye fixation maps. Six state-of-the-art SOD models are then benchmarked on the proposed fixation-based 360° image dataset (F-360iSOD), by applying a multiple cubic projection-based fine-tuning method. Experimental results show a limitation of the current methods when used for SOD in panoramic images, which indicates the proposed dataset is challenging. Key issues for 360° SOD are also discussed. The proposed dataset is available at https://github.com/PanoAsh/F-360iSOD.

preprint2020arXiv

A Study of Bug Resolution Characteristics in Popular Programming Languages

This paper presents a large-scale study that investigates the bug resolution characteristics among popular GitHub projects written in different programming languages. We explore correlations but, of course, we cannot infer causation. Specifically, we analyse bug resolution data from approximately 70 million Source Lines of Code, drawn from 3 million commits to 600 GitHub projects, primarily written in 10 programming languages. We find notable variations in apparent bug resolution time and patch (fix) size. While interpretation of results from such large-scale empirical studies is inherently difficult, we believe that the differences in medians are sufficiently large to warrant further investigation, replication, re-analysis and follow-up research. For example, in our corpus, the median apparent bug resolution time (elapsed time from raise to resolve) for Ruby was 4X that for Go and 2.5X for Java. We also found that patches tend to touch more files for the corpus of strongly typed and for statically typed programs. However, we also found evidence for a lower elapsed resolution time for bug resolution committed to projects constructed from statically typed languages. These findings, if replicated in subsequent follow-on studies, may shed further empirical light on the debate about the importance of static typing.

preprint2020arXiv

An accurate hyper-singular boundary integral equation method for dynamic poroelasticity in two dimensions

This paper is concerned with the boundary integral equation method for solving the exterior Neumann boundary value problem of dynamic poroelasticity in two dimensions. The main contribution of this work consists of two aspects: the proposal of a novel regularized boundary integral equation, and the presentation of new regularized formulations of the strongly-singular and hyper-singular boundary integral operators. Firstly, turning to the spectral properties of the double-layer operator and the corresponding Calderón relation of the poroelasticity, we propose the novel low-GMRES-iteration integral equation whose eigenvalues are bounded away from zero and infinity. Secondly, with the help of the Günter derivatives, we reformulate the strongly-singular and hyper-singular integral operators into combinations of the weakly-singular operators and the tangential derivatives. The accuracy and efficiency of the proposed methodology are demonstrated through several numerical examples.

preprint2020arXiv

An energy-based discontinuous Galerkin method for semilinear wave equations

We generalize the energy-based discontinuous Galerkin method proposed in [SIAM J. Num. Anal., 53(6):2705-2726, 2015.] to second-order semilinear wave equations. A stability and convergence analysis is presented along with numerical experiments demonstrating optimal convergence for certain choices of the interelement fluxes. Applications to the sine-Gordon equation include simulations of breathers, kink, and anti-kink solitons.

preprint2020arXiv

Bi-parameter trilinear Fourier multipliers and pseudo-differential operators with flag symbols

The main purpose of this paper is to study $L^r$ Hölder type estimates for a bi-parameter trilinear Fourier multiplier with flag singularity, and the analogous pseudo-differential operator, when the symbols are in a certain product form. More precisely, for $f,g,h\in \mathcal{S}(\mathbb{R}^{2})$, the bi-parameter trilinear flag Fourier multiplier operators we consider are defined by $$ T_{m_1,m_2}(f,g,h)(x):=\int_{\mathbb{R}^{6}}m_1(ξ,η,ζ)m_2(η,ζ)\hat f(ξ) \hat g(η)\hat h(ζ)e^{2πi(ξ+η+ζ)\cdot x}dξdηdζ, $$ where $m_1,m_2$ are two bi-parameter symbols. We will show that our problem can be reduced to establishing the $L^r$ estimate for the special multiplier $m_1(ξ_1, η_1, ζ_1) m_2(η_2, ζ_2)$ (see Theorem 1.7). We also study these $L^r$ estimates for the corresponding bi-parameter trilinear pseudo-differential operators defined by $$ T_{ab}(f,g,h)(x):=\int_{\mathbb{R}^6}a(x,ξ,η,ζ)b(x,η,ζ)\hat f(ξ)\hat g(η)\hat h(ζ)e^{2πi x(ξ+η+ζ)}dξdηdζ, $$ where the smooth symbols $a,b$ satisfy certain bi-parameter Hörmander conditions. We will also show that the $L^r$ estimate holds for $T_{ab}$ as long as the $L^r$ estimate for the flag multiplier operator holds when the multiplier has the special form $m_1(ξ_1, η_1, ζ_1) m_2(η_2, ζ_2)$ (see Theorem 1.10). The bi-parameter and trilinear flag Fourier multipliers considered in this paper do not satisfy the conditions of the classical bi-parameter trilinear Fourier multipliers considered by Muscalu, Tao, Thiele and the second author [21, 22]. They may also be viewed as the bi-parameter trilinear variants of estimates obtained for the one-parameter flag paraproducts by Muscalu [18].

preprint2020arXiv

Binary Probability Model for Learning Based Image Compression

In this paper, we propose to enhance learned image compression systems with a richer probability model for the latent variables. Previous works model the latents with a Gaussian or a Laplace distribution. Inspired by binary arithmetic coding, we propose to signal the latents with three binary values and one integer, with different probability models. A relaxation method is designed to perform gradient-based training. The richer probability model results in better entropy coding, leading to a lower rate. Experiments under the Challenge on Learned Image Compression (CLIC) test conditions demonstrate that this method achieves an 18% rate saving compared to Gaussian or Laplace models.

preprint2020arXiv

Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching

Decision-making in dense traffic scenarios is challenging for automated vehicles (AVs) due to potentially stochastic behaviors of other traffic participants and perception uncertainties (e.g., tracking noise and prediction errors). Although the partially observable Markov decision process (POMDP) provides a systematic way to incorporate these uncertainties, it quickly becomes computationally intractable when scaled to real-world large-size problems. In this paper, we present an efficient uncertainty-aware decision-making (EUDM) framework, which generates long-term lateral and longitudinal behaviors in complex driving environments in real-time. The computational complexity is controlled to an appropriate level by two novel techniques, namely, the domain-specific closed-loop policy tree (DCP-Tree) structure and the conditional focused branching (CFB) mechanism. The key idea is utilizing domain-specific expert knowledge to guide the branching in both action and intention space. The proposed framework is validated using both onboard sensing data captured by a real vehicle and an interactive multi-agent simulation platform. We also release the code of our framework to accommodate benchmarking.

preprint2020arXiv

Medusa: Blockchain Powered Log Storage System

Blockchain is one of the most heavily invested technologies in recent years. Due to its tamper-proof and decentralization properties, blockchain has become an ideal utility for data storage that is applicable in many real-world industrial scenarios. One important scenario is web logs, which are treated as sources of technical significance and commercial revenue in major internet companies. In this paper, we illustrate our design of a web log storage system based on HyperLedger. HyperLedger yields higher throughput and lower latency compared with other blockchain systems. Alongside its efficiency advantages, HyperLedger is a permissioned blockchain, which is an ideal fit for enterprise software design scenarios.

preprint2020arXiv

Modeling Programs Hierarchically with Stack-Augmented LSTM

Programming language modeling has attracted extensive attention in recent years, and it plays an essential role in program processing fields. Statistical language models, which were initially designed for natural languages, have been generally used for modeling programming languages. However, different from natural languages, programming languages contain explicit and hierarchical structure that is hard to learn by traditional statistical language models. To address this challenge, we propose a novel Stack-Augmented LSTM neural network for programming language modeling. Adding a stack memory component into the LSTM network enables our model to capture the hierarchical information of programs through the PUSH and POP operations, which further allows our model to capture the long-term dependency in the programs. We evaluate the proposed model on three program analysis tasks, i.e., code completion, program classification, and code summarization. Evaluation results show that our proposed model outperforms baseline models in all three tasks, indicating that by capturing the structural information of programs with a stack, our proposed model can represent programs more precisely.

preprint2020arXiv

ModeNet: Mode Selection Network For Learned Video Coding

In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, ModeNet's purpose is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, used to assign each pixel to the most suited coding mode. ModeNet is trained alongside the different coding modes to minimize a rate-distortion cost. It is a flexible component which can be generalized to other systems to allow competition between different coding tools. ModeNet's interest is studied on a P-frame coding task, where it is used to design a method for coding a frame given its prediction. ModeNet-based systems achieve compelling performance when evaluated under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track conditions.

preprint2020arXiv

OCoR: An Overlapping-Aware Code Retriever

Code retrieval helps developers reuse code snippets from open-source projects. Given a natural language description, code retrieval aims to search for the most relevant code among a set of code snippets. Existing state-of-the-art approaches apply neural networks to code retrieval. However, these approaches still fail to capture an important feature: overlaps. The overlaps between different names used by different people indicate that two different names may be potentially related (e.g., "message" and "msg"), and the overlaps between identifiers in code and words in natural language descriptions indicate that the code snippet and the description may potentially be related. To address these problems, we propose a novel neural architecture named OCoR, where we introduce two specifically designed components to capture overlaps: the first embeds identifiers by character to capture the overlaps between identifiers, and the second introduces a novel overlap matrix to represent the degrees of overlap between each natural language word and each identifier. The evaluation was conducted on two established datasets. The experimental results show that OCoR significantly outperforms the existing state-of-the-art approaches and achieves 13.1% to 22.3% improvements. Moreover, we also conducted several in-depth experiments to help understand the performance of different components in OCoR.

preprint2020arXiv

Optical Flow and Mode Selection for Learning-based Video Coding

This paper introduces a new method for inter-frame coding based on two complementary autoencoders: MOFNet and CodecNet. MOFNet aims at computing and conveying the Optical Flow and a pixel-wise coding Mode selection. The optical flow is used to perform a prediction of the frame to code. The coding mode selection enables competition between direct copy of the prediction and transmission through CodecNet. The proposed coding scheme is assessed under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding conditions, where it is shown to perform on par with the state-of-the-art video codec ITU/MPEG HEVC. Moreover, the possibility of copying the prediction makes it possible to learn the optical flow in an end-to-end fashion, i.e., without relying on pre-training and/or a dedicated loss term.

preprint2020arXiv

Service Ecosystem: A Lens of Smart Society

Intelligence services are playing an increasingly important role in the operation of our society. Exploring the evolution mechanism, boundaries and challenges of the service ecosystem is essential to our ability to realize a smart society, reap its benefits and prevent potential risks. We argue that this necessitates a broad scientific research agenda to study the service ecosystem that incorporates and expands upon the disciplines of computer science and includes insights from across the sciences. We first outline a set of research issues that are fundamental to this emerging field, and then explore the technical, social, legal and institutional challenges in the study of the service ecosystem.

preprint2019arXiv

Epitaxial growth and antiferromagnetism of Sn-substituted perovskite iridate SrIr$_{0.8}$Sn$_{0.2}$O$_3$

5d iridates have shown vast emergent phenomena due to a strong interplay among their lattice, charge and spin degrees of freedom, because of which the potential of the thin-film form in spintronic applications is highly leveraged. Here we have epitaxially stabilized perovskite SrIr$_{0.8}$Sn$_{0.2}$O$_3$ on [001] SrTiO$_3$ substrates through pulsed laser deposition and systematically characterized the structural, electronic and magnetic properties. Physical property measurements reveal an insulating ground state with a weak ferromagnetism in the compressively strained epitaxial film. The octahedral rotation pattern is identified by synchrotron x-ray diffraction, resolving a mix of $a^+b^-c^-$ and $a^-b^+c^-$ domains. X-ray magnetic resonant scattering directly demonstrates a G-type antiferromagnetic structure of the magnetic order and the spin canting nature of the weak ferromagnetism.