Source author record

Bing Li

Bing Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

87 works

34 topics

4 close collaborators


preprint, 2026, arXiv

How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings

The past year has seen over 20 open-source document parsing models, yet the field still benchmarks almost exclusively on OmniDocBench, a 1,355-page manually annotated dataset whose top scores have saturated above 90%. A three-stage audit pipeline we run on OmniDocBench screens its 21,353 evaluator-scored blocks and confirms 2,580 errors (12.08%); combined with over a year of public availability, both annotation quality and contamination risk call its rankings into question. To address these issues, we present PureDocBench, a programmatically generated, source-traceable benchmark that renders document images from HTML/CSS and produces verifiable annotations from the same source, covering 10 domains, 66 subcategories, and 1,475 pages, each in three versions: clean, digitally degraded, and real-degraded (4,425 images total). Evaluating 40 models spanning pipeline specialists, end-to-end specialists, and general-purpose VLMs, we find: (i) document parsing is far from solved: the best model scores only ~74 out of 100, with a 44.6-point gap between the strongest and weakest models; (ii) specialist parsers with <=4B parameters rival or surpass general VLMs that are 5-100x larger, yet formula recognition remains a shared bottleneck where no model exceeds 67% when averaging the formula metric across all three tracks; (iii) general VLMs lose only 0.99/8.52 Overall points under digital/real degradation versus 4.90/14.21 for pipeline specialists, producing ranking reversals that make clean-only evaluation misleading for deployment. All data, code, and artifacts are publicly released.
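The headline counts in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only recomputes figures quoted above; the variable names are illustrative and are not taken from the benchmark's released code.

```python
# Figures quoted in the PureDocBench abstract (names here are illustrative).
pages = 1475
versions_per_page = 3          # clean, digitally degraded, real-degraded
audited_blocks = 21353         # OmniDocBench blocks screened by the audit
confirmed_errors = 2580        # errors confirmed by the audit

# Total rendered images: every page exists in all three versions.
total_images = pages * versions_per_page

# Share of audited blocks with a confirmed error, in percent.
error_rate_pct = round(100 * confirmed_errors / audited_blocks, 2)

print(total_images)     # 4425
print(error_rate_pct)   # 12.08
```

Both values agree with the 4,425 images and the 12.08% error rate stated in the abstract.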

preprint, 2024, arXiv

Dynamically Masked Discriminator for Generative Adversarial Networks

Training Generative Adversarial Networks (GANs) remains a challenging problem. The discriminator trains the generator by learning the distribution of real/generated data. However, the distribution of generated data changes throughout the training process, which is difficult for the discriminator to learn. In this paper, we propose a novel method for GANs from the viewpoint of online continual learning. We observe that the discriminator model, trained on historically generated data, often slows down its adaptation to the changes in the newly arriving generated data, which accordingly decreases the quality of generated results. By treating the generated data in training as a stream, we propose to detect whether the discriminator slows down the learning of new knowledge in generated data. Therefore, we can explicitly enforce the discriminator to learn new knowledge fast. Particularly, we propose a new discriminator, which automatically detects its retardation and then dynamically masks its features, such that the discriminator can adaptively learn the temporally varying distribution of generated data. Experimental results show our method outperforms the state-of-the-art approaches.

preprint, 2023, arXiv

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

Diverse video captioning aims to generate a set of sentences to describe the given video in various aspects. Mainstream methods are trained with independent pairs of a video and a caption from its ground-truth set without exploiting the intra-set relationship, resulting in low diversity of generated captions. Different from them, we formulate diverse captioning into a semantic-concept-guided set prediction (SCG-SP) problem by fitting the predicted caption set to the ground-truth set, where the set-level relationship is fully captured. Specifically, our set prediction consists of two synergistic tasks, i.e., caption generation and an auxiliary task of concept combination prediction providing extra semantic supervision. Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction. Furthermore, we apply a diversity regularization term on concepts to encourage the model to generate semantically diverse captions with various concept combinations. These two tasks share multiple semantics-specific encodings as input, which are obtained by iterative interaction between visual features and conceptual queries. The correspondence between the generated captions and specific concept combinations further guarantees the interpretability of our model. Extensive experiments on benchmark datasets show that the proposed SCG-SP achieves state-of-the-art (SOTA) performance under both relevance and diversity metrics.

preprint, 2022, arXiv

Code Smells Detection via Modern Code Review: A Study of the OpenStack and Qt Communities

Code review that detects and locates defects and other quality issues plays an important role in software quality control. One type of issue that may impact the quality of software is code smells. Yet, little is known about the extent to which code smells are identified during modern code review. To investigate the concept behind code smells identified in modern code review and what actions reviewers suggest and developers take in response to the identified smells, we conducted a study of code smells in code reviews by analyzing reviews from four large open source projects from the OpenStack (Nova and Neutron) and Qt (Qt Base and Qt Creator) communities. We manually checked a total of 25,415 code review comments obtained by keyword search and random selection, and identified 1,539 smell-related reviews. Our analysis found that 1) code smells were not commonly identified in code reviews, 2) smells were usually caused by violation of coding conventions, 3) reviewers usually provided constructive feedback, including fixing (refactoring) recommendations to help developers remove smells, 4) developers generally followed those recommendations and actioned the changes, 5) once identified by reviewers, it usually takes developers less than one week to fix the smells, and 6) the main reason why developers chose to ignore the identified smells is that the smell is not worth fixing. Our results suggest that: 1) developers should closely follow coding conventions in their projects to avoid introducing code smells, 2) review-based detection of code smells is perceived to be a trustworthy approach by developers, mainly because reviews are context-sensitive (as reviewers are more aware of the context of the code given that they are part of the project's development team), and 3) program context needs to be fully considered in order to make a decision of whether to fix the identified code smell immediately.

preprint, 2022, arXiv

Competing magnetic fluctuations and orders in a multiorbital model of doped SrCo$_2$As$_2$

We revisit the intriguing magnetic behavior of the paradigmatic itinerant frustrated magnet $\rm{Sr}\rm{Co}_2\rm{As}_2$, which shows strong and competing magnetic fluctuations yet does not develop long-range magnetic order. By calculating the static spin susceptibility $χ(\mathbf{q})$ within a realistic sixteen orbital Hubbard-Hund model, we determine the leading instability to be ferromagnetic (FM). We then explore the effect of doping and calculate the critical Hubbard interaction strength $U_c$ that is required for the development of magnetic order. We find that $U_c$ decreases under electron doping and with increasing Hund's coupling $J$, but increases rapidly under hole doping. This suggests that magnetic order could possibly emerge under electron doping but not under hole doping, which agrees with experimental findings. We map out the leading magnetic instability as a function of doping and Hund's coupling and find several antiferromagnetic phases in addition to FM. We also quantify the degree of itinerant frustration in the model and resolve the contributions of different orbitals to the magnetic susceptibility. Finally, we discuss the dynamic spin susceptibility, $χ(\mathbf{q}, ω)$, at finite frequencies, where we recover the anisotropy of the peaks at $\mathbf{Q}_π= (π, 0)$ and $(0, π)$ observed by inelastic neutron scattering that is associated with the phenomenon of itinerant magnetic frustration. By comparing results between theory and experiment, we conclude that the essential experimental features of doped SrCo$_2$As$_2$ are well captured by a Hubbard-Hund multiorbital model if one considers a small shift of the chemical potential towards hole doping.

preprint, 2022, arXiv

Continual Prompt Tuning for Dialog State Tracking

A desirable dialog system should be able to continually learn new skills without forgetting old ones, and thereby adapt to new domains or tasks in its life cycle. However, continually training a model often leads to a well-known catastrophic forgetting issue. In this paper, we present Continual Prompt Tuning, a parameter-efficient framework that not only avoids forgetting but also enables knowledge transfer between tasks. To avoid forgetting, we only learn and store a few prompt tokens' embeddings for each task while freezing the backbone pre-trained model. To achieve bi-directional knowledge transfer among tasks, we propose several techniques (continual prompt initialization, query fusion, and memory replay) to transfer knowledge from preceding tasks and a memory-guided technique to transfer knowledge from subsequent tasks. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method on continual learning for dialog state tracking, compared with state-of-the-art baselines.

preprint, 2022, arXiv

CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation

Previous work on video captioning aims to objectively describe the video's actual content, which lacks subjective and attractive expression, limiting its practical application scenarios. Video titling is intended to achieve this goal, but there is a lack of a proper benchmark. In this paper, we propose CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration benchmark, to facilitate research and application in video titling and video retrieval in Chinese. CREATE consists of a high-quality labeled 210K dataset and two large-scale 3M/10M pre-training datasets, covering 51 categories, 50K+ tags, 537K manually annotated titles and captions, and 10M+ short videos. Based on CREATE, we propose a novel model ALWIG which combines video retrieval and video titling tasks to achieve the purpose of multi-modal ALignment WIth Generation with the help of video tags and a GPT pre-trained model. CREATE opens new directions for facilitating future research and applications on video titling and video retrieval in the field of Chinese short videos.

preprint, 2022, arXiv

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

Current metrics for video captioning are mostly based on the text-level comparison between reference and candidate captions. However, they have some insuperable drawbacks, e.g., they cannot handle videos without references, and they may result in biased evaluation due to the one-to-many nature of video-to-text and the neglect of visual relevance. From the human evaluator's viewpoint, a high-quality caption should be consistent with the provided video, but need not be literally or semantically similar to the reference. Inspired by human evaluation, we propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning, which directly measures similarity between video and candidate captions. Benefiting from the recent development of large-scale pre-training models, we exploit a well pre-trained vision-language model to extract visual and linguistic embeddings for computing EMScore. Specifically, EMScore combines matching scores of both coarse-grained (video and caption) and fine-grained (frames and words) levels, which takes the overall understanding and detailed characteristics of the video into account. Furthermore, considering the potential information gain, EMScore can be flexibly extended to the conditions where human-labeled references are available. Last but not least, we collect VATEX-EVAL and ActivityNet-FOIL datasets to systematically evaluate the existing metrics. VATEX-EVAL experiments demonstrate that EMScore has higher human correlation and lower reference dependency. The ActivityNet-FOIL experiment verifies that EMScore can effectively identify "hallucinating" captions. The datasets will be released to facilitate the development of video captioning metrics. The code is available at: https://github.com/ShiYaya/emscore.

preprint, 2022, arXiv

FGNET-RH: Fine-Grained Named Entity Typing via Refinement in Hyperbolic Space

Fine-Grained Named Entity Typing (FG-NET) aims at classifying the entity mentions into a wide range of entity types (usually hundreds) depending upon the context. While distant supervision is the most common way to acquire supervised training data, it brings in label noise, as it assigns type labels to the entity mentions irrespective of the mention's context. In attempts to deal with the label noise, leading research on the FG-NET assumes that the fine-grained entity typing data possesses a Euclidean nature, which restrains the ability of the existing models in combating the label noise. Given the fact that the fine-grained type hierarchy exhibits a hierarchical structure, it makes hyperbolic space a natural choice to model the FG-NET data. In this research, we propose FGNET-RH, a novel framework that benefits from the hyperbolic geometry in combination with the graph structures to perform entity typing in a performance-enhanced fashion. FGNET-RH initially uses LSTM networks to encode the mention in relation with its context; later, it forms a graph to distill/refine the mention encodings in the hyperbolic space. Finally, the refined mention encoding is used for entity typing. Experimentation using different benchmark datasets shows that FGNET-RH improves the performance on FG-NET by up to 3.5% in terms of strict accuracy.

preprint, 2022, arXiv

Glassy crystals with colossal multi-baroresponsivities

As a nontrivial solid state of matter, the glassy-crystal state embraces physical features of both crystalline and amorphous solids, where a long-range ordered periodic structure formed by the mass centers of constituent molecules accommodates orientational glasses. Here, we discover and validate a glassy-crystal state in 2-amino-2-methyl-1,3-propanediol (AMP, C4H11NO2) by neutron scattering and complementary broadband dielectric spectroscopy (BDS) measurements. The freezing process of the dynamic orientational disorder is manifested at relaxation times well described by the Vogel-Fulcher-Tammann (VFT) law and the strongly frequency-dependent freezing temperature ranging from around 225 K at 0.1 Hz to above room temperature in the GHz region. At room temperature, the supercooled state is extremely sensitive to pressure such that a pressure of a few MPa can induce crystallization to the ordered crystal state, eventually leading to a temperature increase of 48 K within 20 s, a significant reduction of visible light transmittance from about 95% to a few percent, and a remarkable decrease of electrical conductivity by three orders of magnitude. These ultrasensitive baroresponsivities might find their applications in low-grade waste heat recycling, pressure sensors and non-volatile memory devices. It is expected that glassy crystals serve as an emerging platform for exploiting exotic states of matter and the associated fantastic applications.

preprint, 2022, arXiv

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning

Visual grounding is a task to locate the target indicated by a natural language expression. Existing methods extend the generic object detection framework to this problem. They base the visual grounding on the features from pre-generated proposals or anchors, and fuse these features with the text embeddings to locate the target mentioned by the text. However, modeling the visual features from these predefined locations may fail to fully exploit the visual context and attribute information in the text query, which limits their performance. In this paper, we propose a transformer-based framework for accurate visual grounding by establishing text-conditioned discriminative features and performing multi-stage cross-modal reasoning. Specifically, we develop a visual-linguistic verification module to focus the visual features on regions relevant to the textual descriptions while suppressing the unrelated areas. A language-guided feature encoder is also devised to aggregate the visual contexts of the target object to improve the object's distinctiveness. To retrieve the target from the encoded visual features, we further propose a multi-stage cross-modal decoder to iteratively speculate on the correlations between the image and text for accurate target localization. Extensive experiments on five widely used datasets validate the efficacy of our proposed components and demonstrate state-of-the-art performance. Our code is public at https://github.com/yangli18/VLTVG.

preprint, 2022, arXiv

iPTF14hls in the circumstellar medium interaction model: A promising candidate for a pulsational pair-instability supernova

iPTF14hls is a luminous Type II supernova (SN) with a bumpy light curve whose origin remains debated. It maintains a roughly constant effective temperature and luminosity for about 600 days after discovery, followed by a slow decay. At $\sim 1000$ days post discovery the light curve transitions to a very steep decline. A spectrum taken during this steep decline phase shows clear signatures of shock interaction with dense circumstellar medium (CSM). Here we explore the possibility of iPTF14hls as an interaction-powered SN. The light curve of iPTF14hls can be fitted with wind-like CSMs. Analytic modeling indicates that iPTF14hls may have undergone six episodes of mass loss during the last $\sim 200\,\mathrm{yr}$. Assuming that the 1954 eruption triggered the last mass-loss episode, the stellar-wind velocity is determined to be $40-70\,\mathrm{km}\,\mathrm{s}^{-1}$, depending on different models. Mass loss rates are in the range $0.4-3.3\,M_{\odot}\,\mathrm{yr}^{-1}$. The inferred total mass of ejecta and CSMs ($M_{\mathrm{ej}}+M_{\mathrm{CSMs}}\simeq 245\,M_{\odot}$) supports the idea that iPTF14hls may be a candidate for a (pulsational) pair-instability SN. Discovery and observations of more similar stellar explosions will help understand these peculiar SNe.

preprint, 2022, arXiv

Large intersection property for limsup sets in metric space

We show that limsup sets generated by a sequence of open sets in a compact Ahlfors $s$-regular space $(X,\mathscr{B},μ,ρ)$ belong to the classes of sets with large intersections with index $λ$, denoted by $\mathcal{G}^λ(X)$, under some conditions. In particular, this provides a lower bound on the Hausdorff dimension of such sets. These results are applied to obtain that limsup random fractals with indices $γ_2$ and $δ$ belong to $\mathcal{G}^{s-δ-γ_2}(X)$ almost surely, and random covering sets with exponentially mixing property belong to $\mathcal{G}^{s_0}(X)$ almost surely, where $s_0$ equals the corresponding Hausdorff dimension of covering sets almost surely. We also investigate the large intersection property of limsup sets generated by rectangles in metric space.

preprint, 2022, arXiv

Learning Scene Flow in 3D Point Clouds with Noisy Pseudo Labels

We propose a novel scene flow method that captures 3D motions from point clouds without relying on ground-truth scene flow annotations. Due to the irregularity and sparsity of point clouds, it is expensive and time-consuming to acquire ground-truth scene flow annotations. Some state-of-the-art approaches train scene flow networks in a self-supervised learning manner via approximating pseudo scene flow labels from point clouds. However, these methods fail to achieve the performance level of fully supervised methods, due to the limitations of point clouds such as sparsity and lack of color information. To provide an alternative, we propose a novel approach that utilizes monocular RGB images and point clouds to generate pseudo scene flow labels for training scene flow networks. Our pseudo label generation module infers pseudo scene labels for point clouds by jointly leveraging rich appearance information in monocular images and geometric information of point clouds. To further reduce the negative effect of noisy pseudo labels on the training, we propose a noisy-label-aware training scheme by exploiting the geometric relations of points. Experimental results show that our method not only outperforms state-of-the-art self-supervised approaches, but also outperforms some supervised approaches that use accurate ground-truth flows.

preprint, 2022, arXiv

Learning Target-aware Representation for Visual Tracking via Informative Interactions

We introduce a novel backbone architecture to improve target-perception ability of feature representation for tracking. Specifically, we observe that de facto frameworks perform feature matching simply using the outputs from the backbone for target localization, so there is no direct feedback from the matching module to the backbone network, especially the shallow layers. More concretely, only the matching module can directly access the target information (in the reference frame), while the representation learning of the candidate frame is blind to the reference target. As a consequence, the accumulation effect of target-irrelevant interference in the shallow stages may degrade the feature quality of deeper layers. In this paper, we approach the problem from a different angle by conducting multiple branch-wise interactions inside the Siamese-like backbone networks (InBN). At the core of InBN is a general interaction modeler (GIM) that injects the prior knowledge of the reference image to different stages of the backbone network, leading to better target-perception and robust distractor-resistance of candidate feature representation with negligible computation cost. The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer for improvements, as evidenced by our extensive experiments on multiple benchmarks. In particular, the CNN version (based on SiamCAR) improves the baseline with 3.2/6.9 absolute gains of SUC on LaSOT/TNL2K, respectively. The Transformer version obtains SUC scores of 65.7/52.0 on LaSOT/TNL2K, which are on par with recent state-of-the-art methods. Code and models will be released.

preprint, 2022, arXiv

Low temperature competing magnetic energy scales in the topological ferrimagnet TbMn6Sn6

TbMn6Sn6 is a metallic ferrimagnet displaying signatures of both topological electrons and topological magnons arising from ferromagnetism and spin-orbit coupling within its Mn kagome layers. Inelastic neutron scattering measurements find strong ferromagnetic (FM) interactions within the Mn kagome layer and reveal a magnetic bandwidth of ~230 meV. The low-energy magnetic excitations are characterized by strong FM Mn-Mn and antiferromagnetic (AFM) Mn-Tb interlayer magnetic couplings. We observe weaker, competing long-range FM and AFM Mn-Mn interlayer interactions similar to those driving helical magnetism in the YMn6Sn6 system. Combined with density-functional theory calculations, we find that competing Mn-Mn interlayer magnetic interactions occur in all RMn6Sn6 compounds with R= Y, Gd-Lu, resulting in magnetic instabilities and tunability when Mn-R interactions are weak. In the case of TbMn6Sn6, strong AFM Mn-Tb coupling ensures a highly stable three-dimensional ferrimagnetic network.

preprint, 2022, arXiv

NAND-SPIN-Based Processing-in-MRAM Architecture for Convolutional Neural Network Acceleration

The performance and efficiency of running large-scale datasets on traditional computing systems exhibit critical bottlenecks due to the existing "power wall" and "memory wall" problems. To resolve those problems, processing-in-memory (PIM) architectures are developed to bring computation logic in or near memory to alleviate the bandwidth limitations during data transmission. NAND-like spintronics memory (NAND-SPIN) is one kind of promising magnetoresistive random-access memory (MRAM) with low write energy and high integration density, and it can be employed to perform efficient in-memory computation operations. In this work, we propose a NAND-SPIN-based PIM architecture for efficient convolutional neural network (CNN) acceleration. A straightforward data mapping scheme is exploited to improve the parallelism while reducing data movements. Benefiting from the excellent characteristics of NAND-SPIN and in-memory processing architecture, experimental results show that the proposed approach can achieve $\sim$2.6$\times$ speedup and $\sim$1.4$\times$ improvement in energy efficiency over state-of-the-art PIM solutions.

preprint, 2022, arXiv

Nonlinear function-on-function regression by RKHS

We propose a nonlinear function-on-function regression model where both the covariate and the response are random functions. The nonlinear regression is carried out in two steps: we first construct Hilbert spaces to accommodate the functional covariate and the functional response, and then build a second-layer Hilbert space for the covariate to capture nonlinearity. The second-layer space is assumed to be a reproducing kernel Hilbert space, which is generated by a positive definite kernel determined by the inner product of the first-layer Hilbert space for $X$--this structure is known as the nested Hilbert spaces. We develop estimation procedures to implement the proposed method, which allows the functional data to be observed at different time points for different subjects. Furthermore, we establish the convergence rate of our estimator as well as the weak convergence of the predicted response in the Hilbert space. Numerical studies including both simulations and a data application are conducted to investigate the performance of our estimator in finite sample.

preprint, 2022, arXiv

Novel Valence Transition in Elemental Metal Europium around 80 GPa

Valence transition could induce structural, insulator-metal, nonmagnetic-magnetic and superconducting transitions in rare-earth metals and compounds, while the underlying physics remains unclear due to the complex interaction of localized 4f electrons as well as their coupling with itinerant electrons. The valence transition in the elemental metal europium (Eu) remains a matter of debate. Using resonant x-ray emission scattering and x-ray diffraction, we pressurize the states of 4f electrons in Eu and study its valence and structure transitions up to 160 GPa. We provide compelling evidence for a valence transition around 80 GPa, which coincides with a structural transition from a monoclinic (C2/c) to an orthorhombic phase (Pnma). We show that the valence transition occurs when the pressure-dependent energy gap between 4f and 5d electrons approaches the Coulomb interaction. Our discovery is critical for understanding the electrodynamics of Eu, including magnetism and high-pressure superconductivity.

preprint, 2022, arXiv

Periodic repeating fast radio bursts: interaction between a magnetized neutron star and its planet in an eccentric orbit

Fast radio bursts (FRBs) are mysterious transient phenomena. The study of repeating FRBs may provide useful information about their nature due to their redetectability. The two most famous repeating sources are FRBs 121102 and 180916, with a period of 157 days and 16.35 days, respectively. Previous studies suggest that the periodicity of FRBs is likely associated with neutron star (NS) binary systems. Here we introduce a new model which proposes that periodic repeating FRBs are due to the interaction of a NS with its planet in a highly elliptical orbit. The periastron of the planet is very close to the NS so that it would be partially disrupted by tidal force every time it passes through the periastron. Fragments generated in the process could interact with the compact star through the Alfvén wing mechanism and produce FRBs. The model can naturally explain the repeatability of FRBs with a period ranging from a few days to several hundred days, but it generally requires that the eccentricity of the planet's orbit should be large enough. Taking FRBs 121102 and 180916 as examples, it is shown that the main features of the observed repeating behaviors can be satisfactorily accounted for.

preprint, 2022, arXiv

PIC 4th Challenge: Semantic-Assisted Multi-Feature Encoding and Multi-Head Decoding for Dense Video Captioning

The task of Dense Video Captioning (DVC) aims to generate captions with timestamps for multiple events in one video. Semantic information plays an important role for both localization and description of DVC. We present a semantic-assisted dense video captioning model based on the encoding-decoding framework. In the encoding stage, we design a concept detector to extract semantic information, which is then fused with multi-modal visual features to sufficiently represent the input video. In the decoding stage, we design a classification head, paralleled with the localization and captioning heads, to provide semantic supervision. Our method achieves significant improvements on the YouMakeup dataset under DVC evaluation metrics and achieves high performance in the Makeup Dense Video Captioning (MDVC) task of PIC 4th Challenge.

preprint, 2022, arXiv

Quasi-periodic oscillations of the X-ray burst from the magnetar SGR J1935+2154 and associated with the fast radio burst FRB 200428

The origin(s) and mechanism(s) of fast radio bursts (FRBs), which are short radio pulses from cosmological distances, have remained a major puzzle since their discovery. We report a strong Quasi-Periodic Oscillation (QPO) of 40 Hz in the X-ray burst from the magnetar SGR J1935+2154 and associated with FRB 200428, significantly detected with the Hard X-ray Modulation Telescope (Insight-HXMT) and also hinted at by the Konus-Wind data. QPOs from magnetar bursts have only been rarely detected; our 3.4 sigma (p-value is 2.9e-4) detection of the QPO reported here reveals the strongest QPO signal observed from magnetars (except in some very rare giant flares), making this X-ray burst unique among magnetar bursts. The two X-ray spikes coinciding with the two FRB pulses are also among the peaks of the QPO. Our results suggest that at least some FRBs are related to strong oscillation processes of neutron stars. We also show that we may overestimate the significance of the QPO signal and underestimate the errors of QPO parameters if the QPO exists only in a fraction of the time series of an X-ray burst which we use to calculate the Leahy-normalized periodogram.

preprint, 2022, arXiv

Rational numbers in $\times b$-invariant sets

Let $b \geq 2$ be an integer and $S$ be a finite non-empty set of primes not containing divisors of $b$. For any non-dense set $A \subset [0,1)$ such that $A \cap \mathbb{Q}$ is invariant under $\times b$ operation, we prove the finiteness of rational numbers in $A$ whose denominators can only be divided by primes in $S$. A quantitative result on the largest prime divisors of the denominators of rational numbers in $A$ is also obtained.

preprint2022arXiv

Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

Tracking visual objects from a single initial exemplar in the testing phase has been broadly cast as a one-/few-shot problem, i.e., one-shot learning for initial adaptation and few-shot learning for online adaptation. The recent few-shot online adaptation methods incorporate the prior knowledge from large amounts of annotated training data via complex meta-learning optimization in the offline phase. This helps the online deep trackers to achieve fast adaptation and reduce overfitting risk in tracking. In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training. It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before, and thus the seen data can be safely removed from training. This also bears certain similarities to the emerging continual learning field in preventing catastrophic forgetting. This mechanism enables us to unveil the power of modern online deep trackers without incurring too much extra computational cost. We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP. The consistent improvements on several challenging tracking benchmarks demonstrate its effectiveness and efficiency.
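For orientation, the textbook recursive least-squares update for a linear model can be sketched as follows. This is the classical estimator only, not the tracker-specific formulation of the paper; the function name `rls_update`, the forgetting factor `lam` and the toy data are assumptions made for illustration. The point it shows is the memory-retention idea: each sample is folded into `w` and `P` once and can then be discarded.

```python
def rls_update(w, P, x, y, lam=1.0):
    """One recursive least-squares step for a linear model y ~ w . x.

    w: current weights, P: inverse-correlation matrix (list of lists),
    lam: forgetting factor (1.0 keeps all past samples equally weighted).
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    err = y - sum(w[i] * x[i] for i in range(n))  # prediction error on the new sample
    w_new = [w[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain * x^T P) / lam; P is symmetric, so x^T P equals Px.
    P_new = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return w_new, P_new


# Toy stream generated by y = 2*x1 - 3*x2 (hypothetical data).
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -3.0), ([1.0, 1.0], -1.0),
           ([2.0, 1.0], 1.0), ([1.0, 2.0], -4.0)]
w = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial P means a weak prior on w
for x, y in samples:
    w, P = rls_update(w, P, x, y)
print(w)
```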

preprint2022arXiv

Rethinking the competition between detection and ReID in Multi-Object Tracking

Due to their balanced accuracy and speed, one-shot models, which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT). However, the inherent differences and relations between detection and re-identification (ReID) are unconsciously overlooked because they are treated as two isolated tasks in the one-shot tracking paradigm. This leads to inferior performance compared with existing two-stage methods. In this paper, we first dissect the reasoning process for these two tasks, which reveals that the competition between them inevitably destroys the learning of task-dependent representations. To tackle this problem, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design that impels each branch to better learn task-dependent representations. The proposed model aims to alleviate the deleterious task competition while improving the cooperation between detection and ReID. Furthermore, we introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings. By integrating the two delicately designed networks into a one-shot online MOT system, we construct a strong MOT tracker, namely CSTrack. Our tracker achieves state-of-the-art performance on the MOT16, MOT17 and MOT20 datasets, without other bells and whistles. Moreover, CSTrack is efficient and runs at 16.4 FPS on a single modern GPU, and its lightweight version even runs at 34.6 FPS. The complete code has been released at https://github.com/JudasDie/SOTS.

preprint2022arXiv

SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation

We propose a novel scene flow estimation approach to capture and infer 3D motions from point clouds. Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform. Such unstructured data poses difficulties in matching corresponding points between point clouds, leading to inaccurate flow estimation. We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that equips the sparse convolution with the transformer. Specifically, by leveraging the sparse convolution, SCTN transforms an irregular point cloud into locally consistent flow features for estimating continuous and consistent motions within an object/local object part. We further propose to explicitly learn point relations using a point transformer module, different from existing methods. We show that the learned relation-based contextual information is rich and helpful for matching corresponding points, benefiting scene flow estimation. In addition, a novel loss function is proposed to adaptively encourage flow consistency according to feature similarity. Extensive experiments demonstrate that our proposed approach achieves a new state of the art in scene flow estimation. Our approach achieves an error of 0.038 and 0.037 (EPE3D) on FlyingThings3D and KITTI Scene Flow respectively, which outperforms previous methods by large margins.

preprint2022arXiv

Simultaneous Estimation of Graphical Models by Neighborhood Selection

In many applications concerning statistical graphical models the data originate from several subpopulations that share similarities but also have significant differences. This raises the question of how to estimate several graphical models simultaneously. Pooling all the data together to estimate a single graph would ignore the differences among subpopulations. On the other hand, estimating a graph from each subpopulation separately does not make efficient use of the common structure in the data. We develop a new method for simultaneous estimation of multiple graphical models by estimating the topological neighborhoods of the involved variables under a sparsity-inducing penalty that takes into account the common structure in the subpopulations. Unlike the existing methods for joint graphical models, our method does not rely on spectral decomposition of large matrices, and is therefore more computationally attractive for estimating large networks. In addition, we develop the asymptotic properties of our method, demonstrate its numerical complexity, and compare it with several existing methods by simulation. Finally, we apply our method to the estimation of genomic networks for a lung cancer dataset which consists of several subpopulations.

preprint2022arXiv

SubGraph Networks based Entity Alignment for Cross-lingual Knowledge Graph

Entity alignment is the task of finding entities representing the same real-world object in two knowledge graphs (KGs). Cross-lingual knowledge graph entity alignment aims to discover the cross-lingual links in multi-language KGs, which is of great significance to NLP applications and multi-language KG fusion. In the task of aligning cross-language knowledge graphs, the structures of the two graphs are very similar, and equivalent entities often have the same subgraph structure characteristics. The traditional GCN method neglects to obtain structural features through representative parts of the original graph, and the adjacency matrix alone is not enough to effectively represent the structural features of the graph. In this paper, we introduce the subgraph network (SGN) method into the GCN-based cross-lingual KG entity alignment method. In this method, we extract the first-order subgraphs of the KGs to expand the structural features of the original graph, which enhances the representation ability of the entity embedding and improves the alignment accuracy. Experiments show that the proposed method outperforms the state-of-the-art GCN-based method.

preprint2022arXiv

The First Insight-HXMT Gamma-Ray Burst Catalog: The First Four Years

The Hard X-ray Modulation Telescope (Insight-HXMT) is China's first X-ray astronomy satellite, launched on June 15, 2017. The anti-coincidence CsI detectors of the High Energy X-ray telescope (HE) onboard Insight-HXMT can serve as an all-sky gamma-ray monitor at about 0.2-3 MeV. In its first four years of operation, Insight-HXMT has detected 322 Gamma-Ray Bursts (GRBs) with an offline search pipeline that includes blind search and targeted search. For the GOLDEN sample of Insight-HXMT GRBs, joint analyses were performed with other GRB missions, including the Fermi Gamma-ray Burst Monitor (Fermi/GBM), the Swift Burst Alert Telescope (Swift/BAT) and the Gravitational wave high-energy Electromagnetic Counterpart All-sky Monitor (GECAM). The analyses show that Insight-HXMT can provide better constraints on the GRB spectrum in the higher energy band. The properties of Insight-HXMT GRBs are reported in detail, including their trigger time, duration, spectral parameters, peak fluxes on different time scales and fluence. This catalog is an official product of the Insight-HXMT GRB team.

preprint2022arXiv

Two-dimensional Functional Minerals for Sustainable Optics

Optical devices are key components in our lives, and organic liquid crystals are nowadays widely used to reduce human imprint. However, this technology still suffers from relatively high costs, toxicity and other environmental impacts, and cannot fully meet the demands of a future sustainable society. Here we describe an alternative approach to colour-tuneable optical devices, which is based on sustainable inorganic liquid crystals derived from two-dimensional mineral materials abundant in nature. The prototypical two-dimensional mineral vermiculite is massively produced by a green method, possessing size-to-thickness ratios of $>10^3$, in-plane magnetisation of $>10$ emu g$^{-1}$, and an optical bandgap of $>3$ eV. These characteristics endow two-dimensional vermiculite with a sensitive magneto-birefringence response, which is several orders of magnitude larger than that of organic counterparts, as well as the capability of broad-spectrum modulation. Our finding consequently permits the fabrication of various chromic devices with low or even zero energy consumption, which can be used for sustainable optics.

preprint2022arXiv

VirtualSync+: Timing Optimization with Virtual Synchronization

In digital circuit designs, sequential components such as flip-flops are used to synchronize signal propagations. Logic computations are aligned at and thus isolated by flip-flop stages. Although this fully synchronous style can reduce design efforts significantly, it may affect circuit performance negatively, because sequential components can only introduce delays into signal propagations but never accelerate them. In this paper, we propose a new timing model, VirtualSync+, in which signals, especially those along critical paths, are allowed to propagate through several sequential stages without flip-flops. Timing constraints are still satisfied at the boundary of the optimized circuit to maintain a consistent interface with existing designs. By removing clock-to-q delays and setup time requirements of flip-flops on critical paths, the performance of a circuit can be pushed even beyond the limit of traditional sequential designs. In addition, we further enhance the optimization with VirtualSync+ by fine-tuning with commercial design tools, e.g., Design Compiler from Synopsys, to achieve more accurate results. Experimental results demonstrate that circuit performance can be improved by up to 4% (average 1.5%) compared with that after extreme retiming and sizing, while the increase of area is still negligible. This timing performance is enhanced beyond the limit of traditional sequential designs. The results also demonstrate that, compared with those after retiming and sizing, the circuits with VirtualSync+ can achieve better timing performance under the same area cost or smaller area cost under the same clock period, respectively.

preprint2021arXiv

Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment

We propose a novel approach for zero-shot transfer of cross-lingual Named Entity Recognition (NER) using parallel corpora. We built an entity alignment model on top of XLM-RoBERTa to project the entities detected on the English part of the parallel data to the target language sentences, whose accuracy surpasses all previous unsupervised models. With the alignment model we can obtain a pseudo-labeled NER data set in the target language to train a task-specific model. Unlike translation-based methods, this approach benefits from the natural fluency and nuances of an original target-language corpus. We also propose a modified loss function, similar to focal loss but assigning weights in the opposite direction, to further improve model training on the noisy pseudo-labeled data set. We evaluated the proposed approach on 4 target languages on benchmark data sets and obtained competitive F1 scores compared to the most recent SOTA models. We also discuss the impact of parallel corpus size and domain on the final transfer performance.

preprint2021arXiv

Dynamic self-consistent field approach for studying kinetic processes in multiblock copolymer melts

The self-consistent field theory is a popular and highly successful theoretical framework for studying equilibrium (co)polymer systems at the mesoscopic level. Dynamic density functionals allow one to use this framework for studying dynamical processes in the diffusive, non-inertial regime. The central quantity in these approaches is the mobility function, which describes the effect of chain connectivity on the nonlocal response of monomers to thermodynamic driving fields. In a recent study [Mantha et al, Macromolecules 53, 3409 (2020)], we have developed a method to systematically construct mobility functions from reference fine-grained simulations. Here we focus on melts of linear chains in the Rouse regime and show how the mobility functions can be calculated semi-analytically for multiblock copolymers with arbitrary sequences without resorting to simulations. In this context, an accurate approximate expression for the single-chain dynamic structure factor is derived. Several limiting regimes are discussed. Then we apply the resulting density functional theory to study ordering processes in a two-length scale block copolymer system after instantaneous quenches into the ordered phase. Different dynamical regimes in the ordering process are identified: At early times, the ordering on short scales dominates; at late times, the ordering on larger scales takes over. For large quench depths, the system does not necessarily relax into the true equilibrium state. Our density functional approach could be used for the computer-assisted design of quenching protocols in order to create novel nonequilibrium materials.

preprint2021arXiv

In-orbit timing calibration of the Insight-Hard X-ray Modulation Telescope

We describe the timing system and the timing calibration results of the three payloads on-board the Insight-Hard X-ray Modulation Telescope (Insight-HXMT). These three payloads are the High Energy X-ray telescope (HE, 20-250 keV), the Medium Energy X-ray telescope (ME, 5-30 keV) and the Low Energy X-ray telescope (LE, 1-10 keV). We present a method to correct the temperature-dependent period response and the long-term variation of the on-board crystal oscillator, especially for ME, which does not carry a temperature-compensated crystal oscillator. The times of arrival (ToAs) of the Crab pulsar are measured to evaluate the accuracy of the timing system. As the ephemeris of the Crab pulsar given by Jodrell Bank observatory has systematic errors around 40 μs (Rots et al. 2014), we use the quasi-simultaneous observations of the X-ray Timing Instrument (XTI) on-board the Neutron star Interior Composition Explorer (NICER) to produce the Crab ephemerides and to verify the timing system of Insight-HXMT. The energy-dependent ToA offsets relative to the NICER measurements, including physical and instrumental origins, are about 24.7 μs, 10.1 μs and 864.7 μs, and the systematic errors of the timing system are determined as 12.1 μs, 8.6 μs and 15.8 μs, for HE, ME and LE respectively.

preprint2021arXiv

Named Entity Recognition in the Style of Object Detection

In this work, we propose a two-stage method for named entity recognition (NER), especially for nested NER. We borrowed the idea from two-stage object detection in computer vision, including the way it constructs the loss function. First, a region proposal network generates region candidates, and then a second-stage model discriminates and classifies the entity and makes the final prediction. We also designed a special loss function for the second-stage training that predicts the entityness and entity type at the same time. The model is built on top of pretrained BERT encoders, and we tried both BERT base and BERT large models. For experiments, we first applied it to flat NER tasks such as CoNLL2003 and OntoNotes 5.0 and got results comparable with traditional NER models using sequence labeling methodology. We then tested the model on the nested named entity recognition tasks ACE2005 and Genia, and got F1 scores of 85.6$\%$ and 76.8$\%$ respectively. In terms of the second-stage training, we found that adding extra randomly selected regions plays an important role in improving the precision. We also did error profiling to better evaluate the performance of the model in different circumstances for potential improvements in the future.

preprint2021arXiv

On the hitting probabilities of limsup random fractals

Let $A$ be a limsup random fractal with indices $γ_1$, $γ_2$ and $δ$ on $[0,1]^d$. We determine the hitting probability $\mathbb{P}(A\cap G)$ for any analytic set $G$ with the condition $(\star)$: $\dim_{\rm H}(G)>γ_2+δ$, where $\dim_{\rm H}$ denotes the Hausdorff dimension. This extends the correspondence of Khoshnevisan, Peres and Xiao [10] by relaxing the condition that the probability $P_n$ of choosing each dyadic hyper-cube is homogeneous and $\lim\limits_{n\to\infty}\frac{\log_2P_n}{n}$ exists. We also present some counterexamples to show the Hausdorff dimension in condition $(\star)$ cannot be replaced by the packing dimension.

preprint2021arXiv

Open-book Video Captioning with Retrieve-Copy-Generate Network

Due to the rapid emergence of short videos and the requirement for content understanding and creation, the video captioning task has received increasing attention in recent years. In this paper, we convert the traditional video captioning task into a new paradigm, i.e., Open-book Video Captioning, which generates natural language under the prompts of video-content-relevant sentences, not limited to the video itself. To address the open-book video captioning problem, we propose a novel Retrieve-Copy-Generate network, where a pluggable video-to-text retriever is constructed to retrieve sentences as hints from the training corpus effectively, and a copy-mechanism generator is introduced to extract expressions from multi-retrieved sentences dynamically. The two modules can be trained end-to-end or separately, which is flexible and extensible. Our framework coordinates the conventional retrieval-based methods with orthodox encoder-decoder methods, which can not only draw on the diverse expressions in the retrieved sentences but also generate natural and accurate content of the video. Extensive experiments on several benchmark datasets show that our proposed approach surpasses the state-of-the-art performance, indicating the effectiveness and promise of the proposed paradigm in the task of video captioning.

preprint2021arXiv

Ultrasensitive barocaloric material for room-temperature solid-state refrigeration

Solid-state refrigeration based on caloric effects is an energetically efficient and environmentally friendly technology, which is deemed a potential alternative to the conventional vapor-compression technology. One of the greatest obstacles to real application is the huge driving field required. Here, we report a giant barocaloric effect in inorganic NH4I with maximum entropy changes of $\Delta S_{\rm BCE}^{\rm max}$ ~89 J K$^{-1}$ kg$^{-1}$ around room temperature, associated with the orientational order-disorder phase transition. The phase transition temperature, $T_t$, varies dramatically with pressure at a rate of $dT_t/dP$ ~0.81 K MPa$^{-1}$, which leads to a very small saturation driving pressure of $\Delta P$ ~20 MPa, an unprecedentedly large caloric strength of $|\Delta S_{\rm BCE}^{\rm max}/\Delta P|$ ~4.45 J K$^{-1}$ kg$^{-1}$ MPa$^{-1}$, as well as a broad temperature window of ~68 K under an 80 MPa driving pressure. Comprehensive characterization of the crystal structure and dynamics by neutron scattering measurements reveals a strong reorientation-vibration coupling that is responsible for the large pressure sensitivity of $T_t$. This work is expected to advance the practical application of barocaloric refrigeration.
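As a quick consistency check on the figures quoted in this abstract, the caloric strength is the ratio of the maximum entropy change to the saturation driving pressure. The worked arithmetic below uses only the approximate values stated in the abstract:

\[
\left|\frac{\Delta S_{\rm BCE}^{\rm max}}{\Delta P}\right| \approx \frac{89\ \mathrm{J\,K^{-1}\,kg^{-1}}}{20\ \mathrm{MPa}} = 4.45\ \mathrm{J\,K^{-1}\,kg^{-1}\,MPa^{-1}}.
\]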

preprint2020arXiv

Discovery of oscillations above 200 keV in a black hole X-ray binary with Insight-HXMT

Low-frequency quasi-periodic oscillations (LFQPOs) are commonly found in black hole X-ray binaries, and their origin is still under debate. The properties of LFQPOs at high energies (above 30 keV) are closely related to the nature of the accretion flow in the innermost regions, and thus play a crucial role in critically testing various theoretical models. The Hard X-ray Modulation Telescope (Insight-HXMT) is capable of detecting emissions above 30 keV, and is therefore an ideal instrument to do so. Here we report the discovery of LFQPOs above 200 keV in the new black hole MAXI J1820+070 in the X-ray hard state, which allows us to understand the behaviours of LFQPOs at hundreds of kiloelectronvolts. The phase lag of the LFQPO is constant around zero below 30 keV, and becomes a soft lag (that is, the high-energy photons arrive first) above 30 keV. The soft lag gradually increases with energy and reaches ~0.9 s in the 150-200 keV band. The detection at energies above 200 keV, the large soft lag and the energy-related behaviours of the LFQPO pose a great challenge for most currently existing models, but suggest that the LFQPO probably originates from the precession of a small-scale jet.

preprint2020arXiv

Dynamical Borel-Cantelli lemma for recurrence theory

We study the dynamical Borel-Cantelli lemma for recurrence sets in a measure preserving dynamical system $(X, μ, T)$ with a compatible metric $d$. We prove that, under some regularity conditions, the $μ$-measure of the following set \[ R(ψ)= \{x\in X : d(T^n x, x) < ψ(n)\ \text{for infinitely many}\ n\in\mathbb{N} \} \] obeys a zero-full law according to the convergence or divergence of a certain series, where $ψ:\mathbb{N}\to\mathbb{R}^+$. Some of the applications of our main theorem include the continued fraction dynamical systems, the beta dynamical systems, and the homogeneous self-similar sets.

preprint2020arXiv

Fine-Grained Named Entity Typing over Distantly Supervised Data Based on Refined Representations

Fine-Grained Named Entity Typing (FG-NET) is a key component in Natural Language Processing (NLP). It aims at classifying an entity mention into a wide range of entity types. Due to the large number of entity types, distant supervision is used to collect training data for this task, which noisily assigns type labels to entity mentions irrespective of the context. In order to alleviate the noisy labels, existing approaches to FG-NET analyze the entity mentions entirely independently of each other and assign type labels solely based on the mention's sentence-specific context. This is inadequate for highly overlapping and noisy type labels as it hinders information passing across sentence boundaries. To address this, we propose an edge-weighted attentive graph convolution network that refines the noisy mention representations by attending over corpus-level contextual clues prior to the end classification. Experimental evaluation shows that the proposed model outperforms existing research by a relative score of up to 10.2% and 8.3% for macro F1 and micro F1 respectively.

preprint2020arXiv

FRB 200428: an Impact between an Asteroid and a Magnetar

A fast radio burst (FRB) was recently detected to be associated with a hard X-ray burst from the Galactic magnetar SGR 1935+2154. Scenarios involving magnetars for FRBs are hence highly favored. In this work, we suggest that the impact between an asteroid and a magnetar could explain such a detection. According to our calculations, an asteroid of mass $10^{20}$ g will be disrupted at a distance of $7 \times 10^9$ cm when approaching the magnetar. The accreted material will flow along the magnetic field lines from the Alfvén radius $\sim 10^7$ cm. After falling onto the magnetar's surface, an instant accretion column will be formed, producing a Comptonized X-ray burst and an FRB in the magnetosphere. We show that all the observational features of FRB 200428 could be interpreted self-consistently in this scenario. We predict quasi-periodic oscillations in this specific X-ray burst, which can serve as an independent observational test.

preprint2020arXiv

Model-Based Compensation of Moving Tissue for State Recognition in Robotic-Assisted Pedicle Drilling

Drilling is one of the hardest parts of pedicle screw fixation, and it is one of the most dangerous operations because inaccurate screw placement would injure vital tissues, particularly when the vertebra is not stationary. Here we demonstrate a drilling state recognition method for moving tissue that compensates the displacement based on a simplified motion prediction model of a vertebra with respect to the tidal volume. To adapt it to different patients, the prediction model was built based on the physiological data recorded from the subjects themselves. In addition, the spindle speed of the drilling tool was investigated to find a suitable speed for the robotic-assisted system. To ensure patient safety, a monitoring system was built based on the thrusting force and tracked position information. Finally, experiments were carried out on a fresh porcine lamellar bone fixed on a 3-PRS parallel robot used to simulate the vertebra displacement. The success rate of the robotic-assisted drilling procedure reached 95% when the moving bone was compensated.

preprint2020arXiv

Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Taking full advantage of the information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation because they neglect the interaction between objects, and lack sufficient training for content-related words because of the long-tailed problem. In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation. Meanwhile, we design a teacher-recommended learning (TRL) method to make full use of the successful external language model (ELM) to integrate the abundant linguistic knowledge into the caption model. The ELM generates more semantically similar word proposals which extend the ground-truth words used for training to deal with the long-tailed problem. Experimental evaluations on three benchmarks (MSVD, MSR-VTT and VATEX) show the proposed ORG-TRL system achieves state-of-the-art performance. Extensive ablation studies and visualizations illustrate the effectiveness of our system.

preprint2020arXiv

Ocean: Object-aware Anchor-free Tracking

Anchor-based Siamese trackers have achieved remarkable advancements in accuracy, yet further improvement is restricted by the lagging tracking robustness. We find the underlying reason is that the regression network in anchor-based methods is only trained on the positive anchor boxes (i.e., $IoU \geq 0.6$). This mechanism makes it difficult to refine the anchors whose overlap with the target objects is small. In this paper, we propose a novel object-aware anchor-free network to address this issue. First, instead of refining the reference anchor boxes, we directly predict the position and scale of target objects in an anchor-free fashion. Since each pixel in groundtruth boxes is well trained, the tracker is capable of rectifying inexact predictions of target objects during inference. Second, we introduce a feature alignment module to learn an object-aware feature from predicted bounding boxes. The object-aware feature can further contribute to the classification of target objects and background. Moreover, we present a novel tracking framework based on the anchor-free model. The experiments show that our anchor-free tracker achieves state-of-the-art performance on five benchmarks, including VOT-2018, VOT-2019, OTB-100, GOT-10k and LaSOT. The source code is available at https://github.com/researchmm/TracKit.
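The positive-anchor criterion quoted in this abstract (IoU of at least 0.6) can be made concrete with a small sketch. The `(x1, y1, x2, y2)` box convention and the helper names `iou` and `is_positive_anchor` are assumptions for illustration; they are not taken from the TracKit code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_positive_anchor(anchor, target, threshold=0.6):
    """An anchor counts as positive when its IoU with the target reaches the threshold."""
    return iou(anchor, target) >= threshold


target = (0.0, 0.0, 10.0, 10.0)
for anchor in [(2.0, 0.0, 12.0, 10.0), (5.0, 0.0, 15.0, 10.0)]:
    print(anchor, iou(anchor, target), is_positive_anchor(anchor, target))
```

In this toy case the anchor shifted by half the box width falls below the threshold, so an anchor-based regressor trained only on positives would never see it; that is the weakly overlapping case the abstract describes.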

preprint2020arXiv

Random Covering Sets in Metric Space with Exponentially Mixing Property

Let $\{B(ξ_n,r_n)\}_{n\ge1}$ be a sequence of random balls whose centers $\{ξ_n\}_{n\ge1}$ form a stationary process, and let $\{r_n\}_{n\ge1}$ be a sequence of positive numbers decreasing to 0. Our object is the random covering set $E=\limsup\limits_{n\to\infty}B(ξ_n,r_n)$, that is, the set of points covered by $B(ξ_n,r_n)$ infinitely often. The sizes of $E$ are investigated from the viewpoint of measure, dimension and topology.

preprint2020arXiv

Service Ecosystem: A Lens of Smart Society

Intelligence services are playing an increasingly important role in the operation of our society. Exploring the evolution mechanism, boundaries and challenges of the service ecosystem is essential to our ability to realize a smart society, reap its benefits and prevent potential risks. We argue that this necessitates a broad scientific research agenda to study the service ecosystem that incorporates and expands upon the disciplines of computer science and includes insights from across the sciences. We first outline a set of research issues that are fundamental to this emerging field, and then explore the technical, social, legal and institutional challenges in the study of the service ecosystem.

preprint2020arXiv

TimingCamouflage+: Netlist Security Enhancement with Unconventional Timing (with Appendix)

With recent advances in reverse engineering, attackers can reconstruct a netlist to counterfeit chips by opening the die and scanning all layers of authentic chips. This relatively easy counterfeiting is made possible by the use of the standard simple clocking scheme, where all combinational blocks function within one clock period, so that a netlist of combinational logic gates and flip-flops is sufficient to duplicate a design. In this paper, we propose to invalidate the assumption that a netlist completely represents the function of a circuit with unconventional timing. With the introduced wave-pipelining paths, attackers have to capture gate and interconnect delays during reverse engineering, or to test a huge number of combinational paths to identify the wave-pipelining paths. To hinder the test-based attack, we construct false paths with wave-pipelining to increase the counterfeiting challenge. Experimental results confirm that wave-pipelining true paths and false paths can be constructed in benchmark circuits successfully with only a negligible cost, thus thwarting the potential attack techniques.

preprint2019arXiv

Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite

As China's first X-ray astronomical satellite, the Hard X-ray Modulation Telescope (HXMT), which was dubbed Insight-HXMT after its launch on June 15, 2017, is a wide-band (1-250 keV) slat-collimator-based X-ray astronomy satellite with the capability of all-sky monitoring in 0.2-3 MeV. It was designed to perform pointing, scanning and gamma-ray burst (GRB) observations and, based on the Direct Demodulation Method (DDM), the image of the scanned sky region can be reconstructed. Here we give an overview of the mission and its progress, including payload, core sciences, ground calibration/facility, ground segment, data archive, software, in-orbit performance, calibration, background model, observations and some preliminary results.

preprint2017arXiv

Independent component analysis for multivariate functional data

We extend two methods of independent component analysis, fourth order blind identification and joint approximate diagonalization of eigen-matrices, to vector-valued functional data. Multivariate functional data occur naturally and frequently in modern applications, and extending independent component analysis to this setting allows us to distill important information from this type of data, going a step further than the functional principal component analysis. To allow the inversion of the covariance operator we make the assumption that the dependency between the component functions lies in a finite-dimensional subspace. In this subspace we define fourth cross-cumulant operators and use them to construct the two novel, Fisher consistent methods for solving the independent component problem for vector-valued functions. Both simulations and an application on a hand gesture data set show the usefulness and advantages of the proposed methods over functional principal component analysis.

preprint2016arXiv

Approximation orders of real numbers by $β$-expansions

We prove that almost all real numbers (with respect to Lebesgue measure) are approximated by the convergents of their $β$-expansions with the exponential order $β^{-n}$. Moreover, the Hausdorff dimensions of sets of the real numbers which are approximated by all other orders, are determined. These results are also applied to investigate the orbits of real numbers under $β$-transformation, the shrinking target type problem, the Diophantine approximation and the run-length function of $β$-expansions.
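The objects in this abstract can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes the standard greedy algorithm (repeatedly multiply by $β$ and take the integer part), and the function names `greedy_beta_digits` and `convergent` are hypothetical. It checks only the elementary fact that the $n$-th convergent lies within $β^{-n}$ of $x$.

```python
import math


def greedy_beta_digits(x, beta, n):
    """Return the first n greedy beta-expansion digits of x in [0, 1)."""
    digits = []
    t = x
    for _ in range(n):
        t *= beta            # apply x -> beta * x
        d = math.floor(t)    # the integer part is the next digit
        digits.append(d)
        t -= d               # keep the fractional part (the beta-transformation)
    return digits


def convergent(digits, beta):
    """Partial sum sum_k digits[k-1] * beta**(-k), i.e. the n-th convergent."""
    return sum(d * beta ** (-k) for k, d in enumerate(digits, start=1))


if __name__ == "__main__":
    beta = (1 + math.sqrt(5)) / 2   # illustrative choice of base
    x = 0.3                         # illustrative choice of point
    digits = greedy_beta_digits(x, beta, 20)
    for n in (5, 10, 20):
        err = x - convergent(digits[:n], beta)
        print(n, err, beta ** (-n))  # err stays below beta**(-n)
```

The sketch uses floating-point arithmetic, so only the first few dozen digits are trustworthy; exact or high-precision arithmetic would be needed for longer expansions.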

preprint2016arXiv

Beta-expansion and continued fraction expansion of real numbers

Let $β>1$ be a real number and $x \in [0,1)$ be an irrational number. We denote by $k_n(x)$ the exact number of partial quotients in the continued fraction expansion of $x$ given by the first $n$ digits in the $β$-expansion of $x$ ($n \in \mathbb{N}$). It is known that $k_n(x)/n$ converges to $(6\log2\logβ)/π^2$ almost everywhere in the sense of Lebesgue measure. In this paper, we improve this result by proving that the Lebesgue measure of the set of $x \in [0,1)$ for which $k_n(x)/n$ deviates away from $(6\log2\logβ)/π^2$ decays to 0 exponentially as $n$ tends to $\infty$, which generalizes the result of Faivre from $β=10$ to any $β>1$. Moreover, we also discuss which of the $β$-expansion and continued fraction expansion yields the better approximations of real numbers.
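To give a feel for the limit $(6\log2\logβ)/π^2$ quoted in this abstract, the following minimal sketch simply evaluates it for a few bases. The function name `kn_limit` is hypothetical and the code only restates the formula; it does not reproduce any result of the paper.

```python
import math


def kn_limit(beta):
    """Almost-everywhere limit of k_n(x)/n quoted in the abstract:
    6 * log(2) * log(beta) / pi**2, for a base beta > 1."""
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    return 6 * math.log(2) * math.log(beta) / math.pi ** 2


if __name__ == "__main__":
    for beta in (2.0, (1 + math.sqrt(5)) / 2, 10.0):
        print(beta, kn_limit(beta))
```

For $β=10$ the value is roughly 0.97, meaning that on average each decimal digit determines slightly less than one partial quotient.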

preprint2016arXiv

Chaotic and Topological Properties of Continued Fractions

We prove that there exists a scrambled set for the Gauss map with full Hausdorff dimension. Meanwhile, we also investigate the topological properties of the sets of points with dense or non-dense orbits.

preprint2016arXiv

CP-odd component of the lightest neutral Higgs boson in the MSSM

The Higgs sector of the Minimal Supersymmetric Extension of the Standard Model may be described with a two Higgs doublet model with properties that depend on the soft supersymmetry breaking parameters. For instance, flavor independent CP-violating phases associated with the gaugino masses, the squark trilinear mass parameters and the Higgsino mass parameter $μ$ may lead to sizable CP-violation in the Higgs sector. For these CP-violating effects to affect the properties of the recently observed SM-like Higgs resonance, the non-standard charged and neutral Higgs boson masses must be of the order of the weak scale, and both $μ$ and the trilinear stop mass parameter $A_t$ must be of the order of or larger than the stop mass parameters. Constraints on this possibility come from direct searches for non-standard Higgs bosons, precision measurements of the lightest neutral Higgs properties, including its mass, and electric dipole moments. In this article, we discuss these constraints within the MSSM, trying to evaluate the possible size of the CP-odd component of the lightest neutral Higgs boson, and the possible experimental tests of this CP-violating effect at the LHC.

preprint2016arXiv

Hitting probabilities of random covering sets in tori and metric spaces

We provide sharp lower and upper bounds for the Hausdorff dimension of the intersection of a typical random covering set with a fixed analytic set both in Ahlfors regular metric spaces and in the $d$-dimensional torus. In metric spaces, we consider covering sets generated by balls and, in the torus, we deal with general analytic generating sets.

preprint2016arXiv

Limit theorems related to beta-expansion and continued fraction expansion

Let $β>1$ be a real number and $x \in [0,1)$ be an irrational number. Denote by $k_n(x)$ the exact number of partial quotients in the continued fraction expansion of $x$ given by the first $n$ digits in the $β$-expansion of $x$ ($n \in \mathbb{N}$). In this paper, we show a central limit theorem and a law of the iterated logarithm for the sequence of random variables $\{k_n, n \geq 1\}$, which generalize the results of Faivre and Wu respectively from $β=10$ to any $β>1$.

preprint2016arXiv

On the escape rate of unique beta-expansions

Let $1<β\leq 2$. It is well known that the set of points in $[0,1/(β-1)]$ having a unique $β$-expansion, in other words, those points whose orbits under the greedy $β$-transformation escape a hole depending on $β$, is of zero Lebesgue measure. The corresponding escape rate is investigated in this paper. A formula which links the Hausdorff dimension of the univoque set and the escape rate is established. We also prove that this rate forms a devil's staircase function with respect to $β$.

preprint2016arXiv

Random Continued fractions: Lévy constant and Chernoff-type estimate

Given a stochastic process $\{A_n, n \geq 1\}$ taking values in the natural numbers, the random continued fraction is defined as $[A_1, A_2, \cdots, A_n, \cdots]$, in analogy with the continued fraction expansion of real numbers. Assuming that $\{A_n, n \geq 1\}$ is ergodic and the expectation $E(\log A_1) < \infty$, we give a Lévy-type metric theorem which covers the real case presented by Lévy in 1929. Moreover, a corresponding Chernoff-type estimate is obtained under the conditions that $\{A_n, n \geq 1\}$ is $ψ$-mixing and that $E(A_1^t) < \infty$ for each $0< t< 1$.

preprint2016arXiv

Rotor-phonon coupling in perovskite CH3NH3PbI3: the origin of exceptional transport properties

Atomic dynamics plays a fundamental part in a number of physical properties of solids, such as high-Tc superconductivity, semiconducting transport, and thermoelectricity. Perovskite CH3NH3PbI3 exhibits outstanding photovoltaic performance, but the exact physical scenario has not been established yet, due to the inadequate understanding of the atomic dynamics and exceptional transport properties. We present a complete atomic dynamic picture consisting of phonons, rotational modes of protons and molecular vibrational modes, which is constructed by carrying out high-resolution time-of-flight inelastic neutron scattering measurements in a wide energy window ranging from 0.0036 to 54 meV on a large single crystal sample. A three-fold rotational mode of protons activated around 80 K reduces the lifetimes of acoustic and optical phonons down to about 4.5 ps and below 1 ps at 150 K, respectively. The orthorhombic to tetragonal phase transition takes place with a slower four-fold rotational mode of the C-N axis concomitantly setting in at ~ 165 K, above which the optical phonons are too broadened to be distinguished whereas the acoustic ones are still robust. The significantly reduced lifetimes of optical phonons are linked to the smaller mobility of charge carriers while the ultralow lattice thermal conductivity is attributed to nanoscale mean free paths of acoustic phonons. These microscopic insights provide a solid standing point, from which perovskite solar cells can be understood more accurately and their performance can perhaps be further optimized. The revealed rotor-phonon coupling opens up an emergent opportunity to create unprecedented functionalities of materials.

preprint2016arXiv

TDSelector: A Training Data Selection Method for Cross-Project Defect Prediction

In recent years, cross-project defect prediction (CPDP) has attracted much attention and has been validated as a feasible way to address the problem of local data sparsity in newly created or inactive software projects. Unfortunately, the performance of CPDP is usually poor, and low-quality training data selection has been regarded as a major obstacle to achieving better prediction results. To the best of our knowledge, most existing approaches related to this topic are based only on instance similarity. Therefore, the objective of this work is to propose an improved training data selection method for CPDP that considers both similarity and the number of defects each training instance has (denoted by defects), which is referred to as TDSelector, and to demonstrate the effectiveness of the proposed method. Our experiments were conducted on 14 projects (including 15 data sets) collected from two public repositories. The results indicate that, in a specific CPDP scenario, the TDSelector-based bug predictor performs, on average, better than those based on the baseline methods, and the AUC (area under ROC curve) values are increased by up to 10.6% and 4.3%, respectively. Besides, an additional experiment shows that selecting those instances with more bugs directly as training data can further improve the performance of the bug predictor trained by our method.

preprint2016arXiv

The denominators of convergents for continued fractions

For any real number $x \in [0,1)$, we denote by $q_n(x)$ the denominator of the $n$-th convergent of the continued fraction expansion of $x$ $(n \in \mathbb{N})$. It is well-known that the Lebesgue measure of the set of points $x \in [0,1)$ for which $\log q_n(x)/n$ deviates away from $π^2/(12\log2)$ decays to zero as $n$ tends to infinity. In this paper, we study the rate of this decay by giving an upper bound and a lower bound. What is interesting is that the upper bound is closely related to the Hausdorff dimensions of the level sets for $\log q_n(x)/n$. As a consequence, we obtain a large deviation type result for $\log q_n(x)/n$, which indicates that the rate of this decay is exponential.
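The quantities in this abstract can be computed directly for small cases. The sketch below is only an illustration under stated assumptions: it uses the standard recurrence $q_n = a_n q_{n-1} + q_{n-2}$ with $q_0 = 1$, $q_{-1} = 0$, works on exact rationals (whose expansions are finite), and the names `partial_quotients`, `denominators` and `LEVY_LIMIT` are hypothetical.

```python
import math
from fractions import Fraction

# Limit pi^2 / (12 log 2) from which log q_n(x) / n is said to deviate rarely.
LEVY_LIMIT = math.pi ** 2 / (12 * math.log(2))


def partial_quotients(x, n):
    """First (at most) n partial quotients of a Fraction x in (0, 1),
    obtained by iterating the Gauss map x -> 1/x - floor(1/x)."""
    quotients = []
    while x != 0 and len(quotients) < n:
        inv = 1 / x
        a = math.floor(inv)
        quotients.append(a)
        x = inv - a
    return quotients


def denominators(quotients):
    """Denominators q_1, q_2, ... of the convergents, using
    q_n = a_n * q_{n-1} + q_{n-2} with q_0 = 1 and q_{-1} = 0."""
    q_prev, q = 0, 1
    out = []
    for a in quotients:
        q_prev, q = q, a * q + q_prev
        out.append(q)
    return out


if __name__ == "__main__":
    # Illustrative input: a rational approximation of pi - 3.
    x = Fraction(14159265358979, 10 ** 14)
    qs = denominators(partial_quotients(x, 15))
    for n, q in enumerate(qs, start=1):
        print(n, q, math.log(q) / n, LEVY_LIMIT)
```

For a single point and small $n$ the ratio $\log q_n(x)/n$ can sit far from the limit; the abstract concerns how the measure of such deviating points decays with $n$, which this sketch does not attempt to estimate.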

preprint2016arXiv

The dimension of irregular set in parameter space

For any real number $β>1$, the $n$th cylinder of $β$ in the parameter space $\{β\in \mathbb{R}: β>1\}$ is the set of real numbers in $(1,\infty)$ having the same first $n$ digits in their $β$-expansion of $1$, denoted by $I^P_n(β)$. We study the quantities which describe the growth of the length of $I^P_n(β)$. The Hausdorff dimension of the set of points with a given growth rate of the length of $I^P_n(β)$ is determined in this paper.

preprint2016arXiv

The topological property of the irregular sets on the lengths of basic intervals in beta-expansions

Let $β>1$ be a real number and $(ε_1(x, β), ε_2(x, β), \ldots)$ be the $β$-expansion of a point $x \in (0, 1]$. For all $x \in (0,1]$, let $A(D(x))$ be the set of accumulation points of $\frac{-\log_β|I_n(x)|}{n}$ as $n \rightarrow \infty$, where $|I_n(x)|$ is the length of the basic interval of order $n$ containing $x \in (0, 1]$. In this paper, we prove that $A(D(x))$ is always a closed interval for any $x \in (0,1]$. Furthermore, if $λ(β)>0$, the extremely irregular set containing points $x \in [0, 1]$ whose upper limit of $\frac{-\log_β|I_n(x)|}{n}$ equals $1+λ(β)$ is residual, where $1+λ(β)$ is a constant depending on $β$. As a consequence, the irregular set of $x\in [0, 1]$ for which the limit of $\frac{-\log_β|I_n(x)|}{n}$ does not exist is residual whenever $λ(β)>0$.

preprint2016arXiv

Zero-one law of Hausdorff dimensions of the recurrent sets

Let $(Σ, σ)$ be the one-sided shift space with $m$ symbols and $R_n(x)$ be the first return time of $x\inΣ$ to the $n$-th cylinder containing $x$. Denote $$E^φ_{α,β}=\left\{x\inΣ: \liminf_{n\to\infty}\frac{\log R_n(x)}{φ(n)}=α,\ \limsup_{n\to\infty}\frac{\log R_n(x)}{φ(n)}=β\right\},$$ where $φ: \mathbb{N}\to \mathbb{R}^+$ is a monotonically increasing function and $0\leqα\leqβ\leq +\infty$. We show that the Hausdorff dimension of the set $E^φ_{α,β}$ admits a dichotomy: it is either zero or one depending on $φ, α$ and $β$.

preprint2015arXiv

Almost automorphic functions on time scales and almost automorphic solutions to shunting inhibitory cellular neural networks on time scales

In this paper, we first propose a new concept of almost periodic time scales and a new definition of almost automorphic functions on almost periodic time scales, and study some of their basic properties. Then we prove a result ensuring the existence of an almost automorphic solution for both the linear nonhomogeneous dynamic equation on time scales and its associated homogeneous equation, assuming that the associated homogeneous equation admits an exponential dichotomy. Finally, as an application of our results, we establish the existence and global exponential stability of almost automorphic solutions to a class of shunting inhibitory cellular neural networks with time-varying delays on time scales. Our results about the shunting inhibitory cellular neural network with time-varying delays on time scales are new even for both the case of differential equations (the time scale $\mathbb{T}=\mathbb{R}$) and the case of difference equations (the time scale $\mathbb{T}=\mathbb{Z}$).

preprint2015arXiv

Benefit of Multipath TCP on the Stability of Network

Multipath TCP has received a lot of attention recently and can potentially improve quality of service for both private and commercial users. It leverages the multiple available paths and sends packets through all of them. Multipath TCP has attracted growing interest both from researchers, who publish a growing number of articles on the topic, and from vendors, since Apple decided to use Multipath TCP on its smartphones and tablets to support the Siri voice recognition application. In this paper, we study the performance of Multipath TCP in terms of its impact on the stability of the network. In particular, we study three scenarios: the Internet, which is the largest network and involves heterogeneous traffic; the data center, which is smaller but has different traffic patterns compared with an Internet-scale network; and the wireless network, whose energy consumption also needs to be considered. Our study shows that stability is affected, but not seriously, for the Internet and wireless networks, whereas data center network stability is seriously affected due to its bursty traffic pattern.

preprint2015arXiv

Existence and exponential stability of positive almost periodic solution for Nicholson's blowflies models on time scales

In this paper, we first give a new definition of almost periodic time scales and two new definitions of almost periodic functions on time scales, and investigate some of their basic properties. Then, as an application, by using the fixed point theorem in Banach space and the time scale calculus theory, we obtain some sufficient conditions for the existence and exponential stability of positive almost periodic solutions for a class of Nicholson's blowflies models on time scales. Finally, we present an illustrative example to show the effectiveness of the obtained results. Our results show that under a simple condition the continuous-time Nicholson's blowflies models and their discrete-time analogue have the same dynamical behaviors.

preprint2015arXiv

Permanence and almost periodic solutions for a single-species system with impulsive effects on time scales

In this paper, we first propose a single-species system with impulsive effects on time scales and, by establishing some new comparison theorems for impulsive dynamic equations on time scales, we obtain sufficient conditions to guarantee the permanence of the system. Then we prove a Massera type theorem for impulsive dynamic equations on time scales and, based on this theorem, we establish a criterion for the existence and uniformly asymptotic stability of a unique positive almost periodic solution of the system. Finally, we give an example to show the feasibility of our main results. Our example also shows that the continuous time system and its corresponding discrete time system have the same dynamics. The results of this paper are completely new.

preprint2015arXiv

Probing the Electroweak Phase Transition at the LHC

We study the correlation between the value of the triple Higgs coupling and the nature of the electroweak phase transition. We use an effective potential approach, including higher order, non-renormalizable terms coming from integrating out new physics. We show that if only the dimension six operators are considered, large positive deviations of the triple Higgs coupling from its Standard Model (SM) value are predicted in the regions of parameter space consistent with a strong first order electroweak phase transition (SFOEPT). We also show that at higher orders sizable and negative deviations of the triple Higgs coupling may be obtained, and the sign of the corrections tends to be correlated with the order of the phase transition. We also consider a singlet extension of the SM, which allows us to establish the connection with the effective field theory (EFT) approach and analyze the limits of its validity. Furthermore, we study how to probe the triple Higgs coupling from the double Higgs production at the LHC. We show that selective cuts in the invariant mass of the two Higgs bosons should be used, to maximize the sensitivity for values of the triple Higgs coupling significantly different from the Standard Model one.

preprint2015arXiv

RF Transport Electromagnetic Properties of CVD Graphene from DC to 110 MHz

We report measurement of the radio-frequency (RF) transport electromagnetic properties of chemical vapour deposition (CVD) graphene over the DC to 110 MHz frequency range at room temperature. Graphene on Si/SiO2 substrate was mounted in a shielded four terminal-pair (4TP) adaptor which enabled direct connection to a calibrated precision impedance analyser for measurements. Good agreement is observed for the DC four-probe resistance and the 4TP resistance at 40 Hz, both yielding R ~ 104 Ω. In general the apparent graphene channel electromagnetic properties are found to be strongly influenced by the substrate parasitic capacitance and resistance, particularly for high frequencies f > 1 MHz. A phenomenological lumped-parameter equivalent circuit model is presented which matches the frequency response of the graphene 4TP impedance device over approximately seven decades of the frequency range of the applied transport alternating current. Based on this model, it is shown for the first time that the intrinsic graphene channel resistance of the 4TP device is frequency-independent (i.e. dissipationless) with RG ~ 105 Ω, or a sheet resistance of approximately 182 Ω / sq. The parasitic substrate impedance of the device is found to shunt RG, with RP ~ 2.2 Ω in series with CP ~ 600 pF. These results suggest that our new RF 4TP method is in good agreement with the conventional DC four-probe method for measuring the intrinsic sheet resistance of single-atom thick materials and could potentially open up new applications in RF electronics, AC quantum Hall effect metrology and sensors based on graphene 4TP devices operating over a broad range of frequencies.
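One simple reading of the equivalent circuit described in this abstract is a frequency-independent resistor RG in parallel with a series RP-CP branch. The sketch below evaluates the impedance of that assumed topology using the approximate values quoted above; the topology, the function name `device_impedance` and the constant names are assumptions made for illustration, and the paper's actual model may contain further elements.

```python
import math

# Approximate component values quoted in the abstract.
R_G = 105.0     # intrinsic graphene channel resistance, ohm
R_P = 2.2       # parasitic substrate resistance, ohm
C_P = 600e-12   # parasitic substrate capacitance, farad


def device_impedance(f_hz):
    """Complex impedance of R_G in parallel with (R_P in series with C_P).

    Assumed topology for illustration only; requires f_hz > 0.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    omega = 2 * math.pi * f_hz
    z_parasitic = complex(R_P, -1.0 / (omega * C_P))  # R_P + 1/(j*omega*C_P)
    return R_G * z_parasitic / (R_G + z_parasitic)    # parallel combination


if __name__ == "__main__":
    for f in (40.0, 1e4, 1e6, 1.1e8):
        z = device_impedance(f)
        print(f"{f:12.1f} Hz  |Z| = {abs(z):8.3f} ohm")
```

Under this assumed topology the capacitive branch is effectively open at low frequency, so the impedance is close to RG, while at high frequency the branch progressively shunts the channel, which is qualitatively in line with the strong substrate influence reported above 1 MHz.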

preprint2014arXiv

An Analysis of Research in Software Engineering: Assessment and Trends

Glass published the first report on the assessment of systems and software engineering scholars and institutions two decades ago. The ongoing, annual survey of publications in this field provides fund managers, young scholars, graduate students, etc. with useful information for different purposes. However, the studies have been questioned by some critics because of a few shortcomings of the evaluation method. It is actually very hard to reach a widely recognized consensus on such an assessment of scholars and institutions. This paper presents a module and automated method for assessment and trends analysis in software engineering compared with the prior studies. To achieve a more reasonable evaluation result, we take into consideration more high-quality publications, the rank of each publication analyzed, and the different roles of authors named on each paper in question. According to the 7638 papers published in 36 publications from 2008 to 2013, the statistics of research subjects roughly follow power laws, implying the interesting Matthew Effect. We then identify the Top 20 scholars, institutions and countries or regions in terms of a new evaluation rule based on the frequently-used one. The top-ranked scholar is Mark Harman of the University College London, UK, the top-ranked institution is the University of California, USA, and the top-ranked country is the USA. Besides, we also show two levels of trend changes based on the EI classification system and user-defined uncontrolled keywords, as well as noteworthy scholars and institutions in a specific research area. We believe that our results would provide a valuable insight for young scholars and graduate students to seek possible potential collaborators and grasp the popular research topics in software engineering.

preprint2014arXiv

An Empirical Study on Software Defect Prediction with a Simplified Metric Set

Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. The objective of this work is to validate the feasibility of the predictor built with a simplified metric set for software defect prediction in different scenarios, and to investigate practical guidelines for the choice of training data, classifier and metric subset of a given project. First, based on six typical classifiers, we constructed three types of predictors using the size of software metric set in three scenarios. Then, we validated the acceptable performance of the predictor based on Top-k metrics in terms of statistical methods. Finally, we attempted to minimize the Top-k metric subset by removing redundant metrics, and we tested the stability of such a minimum metric subset with one-way ANOVA tests. The experimental results indicate that (1) the choice of training data should depend on the specific requirement of prediction accuracy; (2) the predictor built with a simplified metric set works well and is very useful in case limited resources are supplied; (3) simple classifiers (e.g., Naive Bayes) also tend to perform well when using a simplified metric set for defect prediction; and (4) in several cases, the minimum metric subset can be identified to facilitate the procedure of general defect prediction with acceptable loss of prediction precision in practice. The guideline for choosing a suitable simplified metric set in different scenarios is presented in Table 12.

preprint2014arXiv

Linear Contour Learning: A Method for Supervised Dimension Reduction

We propose a novel approach to sufficient dimension reduction in regression, based on estimating contour directions of negligible variation for the response surface. These directions span the orthogonal complement of the minimal space relevant for the regression, and can be extracted according to a measure of the variation in the response, leading to General Contour Regression (GCR). In comparison to existing sufficient dimension reduction techniques, this contour-based methodology guarantees exhaustive estimation of the central space under ellipticity of the predictor distribution and very mild additional assumptions, while maintaining $\sqrt{n}$-consistency and computational ease. Moreover, it proves to be robust to departures from ellipticity. We also establish some useful population properties for GCR. Simulations to compare performance with that of standard techniques such as ordinary least squares, sliced inverse regression, principal Hessian directions, and sliced average variance estimation confirm the advantages anticipated by theoretical analyses. We also demonstrate the use of contour-based methods on a data set concerning grades of students from Massachusetts colleges.

preprint2014arXiv

On efficient dimension reduction with respect to a statistical functional of interest

We introduce a new sufficient dimension reduction framework that targets a statistical functional of interest, and propose an efficient estimator for the semiparametric estimation problems of this type. The statistical functional covers a wide range of applications, such as conditional mean, conditional variance and conditional quantile. We derive the general forms of the efficient score and efficient information as well as their specific forms for three important statistical functionals: the linear functional, the composite linear functional and the implicit functional. In conjunction with our theoretical analysis, we also propose a class of one-step Newton-Raphson estimators and show by simulations that they substantially outperform existing methods. Finally, we apply the new method to construct the central mean and central variance subspaces for a data set involving the physical measurements and age of abalones, which exhibits a strong pattern of heteroscedasticity.

preprint2014arXiv

Random affine code tree fractals and Falconer-Sloan condition

We calculate the almost sure dimension for a general class of random affine code tree fractals in $\mathbb R^d$. The result is based on a probabilistic version of the Falconer-Sloan condition $C(s)$ introduced in \cite{FS}. We verify that, in general, systems having a small number of maps do not satisfy condition $C(s)$. However, there exists a natural number $n$ such that for typical systems the family of all iterates up to level $n$ satisfies condition $C(s)$.

preprint2014arXiv

Removing Mixture of Gaussian and Impulse Noise by Patch-Based Weighted Means

We first establish a law of large numbers and a convergence theorem in distribution to show the rate of convergence of the non-local means filter for removing Gaussian noise. We then introduce the notion of degree of similarity to measure the role of similarity for the non-local means filter. Based on the convergence theorems, we propose a patch-based weighted means filter for removing impulse noise and its mixture with Gaussian noise by combining the essential idea of the trilateral filter and that of the non-local means filter. Our experiments show that our filter is competitive compared to recently proposed methods.

preprint2014arXiv

Simplification of Training Data for Cross-Project Defect Prediction

Cross-project defect prediction (CPDP) plays an important role in estimating the most likely defect-prone software components, especially for new or inactive projects. To the best of our knowledge, few prior studies provide explicit guidelines on how to select suitable training data of quality from a large number of public software repositories. In this paper, we have proposed a training data simplification method for practical CPDP in consideration of multiple levels of granularity and filtering strategies for data sets. In addition, we have also provided quantitative evidence on the selection of a suitable filter in terms of defect-proneness ratio. Based on an empirical study on 34 releases of 10 open-source projects, we have elaborately compared the prediction performance of different defect predictors built with five well-known classifiers using training data simplified at different levels of granularity and with two popular filters. The results indicate that when using the multi-granularity simplification method with an appropriate filter, the prediction models based on Naive Bayes can achieve fairly good performance and outperform the benchmark method.

preprint2014arXiv

Towards Cross-Project Defect Prediction with Imbalanced Feature Sets

Cross-project defect prediction (CPDP) has been deemed as an emerging technology of software quality assurance, especially in new or inactive projects, and a few improved methods have been proposed to support better defect prediction. However, the regular CPDP always assumes that the features of training and test data are all identical. Hence, very little is known about whether the method for CPDP with imbalanced feature sets (CPDP-IFS) works well. Considering the diversity of defect data sets available on the Internet as well as the high cost of labeling data, to address the issue, in this paper we proposed a simple approach according to a distribution characteristic-based instance (object class) mapping, and demonstrated the validity of our method based on three public defect data sets (i.e., PROMISE, ReLink and AEEEM). Besides, the empirical results indicate that the hybrid model composed of CPDP and CPDP-IFS does improve the prediction performance of the regular CPDP to some extent.

preprint2013arXiv

A note on the hitting probabilities of random covering sets

Let $E=\limsup\limits_{n\to\infty}(g_n+ξ_n)$ be the random covering set on the torus $\mathbb{T}^d$, where $\{g_n\}$ is a sequence of ball-like sets and $ξ_n$ is a sequence of independent random variables uniformly distributed on $\mathbb{T}^d$. We prove that $E\cap F\neq\emptyset$ almost surely whenever $F\subset\mathbb{T}^d$ is an analytic set with Hausdorff dimension $\dim_H(F)>d-α$, where $α$ is the almost sure Hausdorff dimension of $E$. Moreover, examples are given to show that the condition on $\dim_H(F)$ cannot be replaced by the packing dimension of $F$.

preprint2013arXiv

Diophantine approximation of the orbit of 1 in the dynamical system of beta expansions

We consider the distribution of the orbits of the number 1 under the $β$-transformations $T_β$ as $β$ varies. Mainly, the size of the set of $β>1$ for which a given point can be well approximated by the orbit of 1 is measured by its Hausdorff dimension. That is, the dimension of the following set $$ E\big(\{\ell_n\}_{n\ge 1}, x_0\big)=\Big\{β>1: |T^n_β1-x_0|<β^{-\ell_n} \text{ for infinitely many } n\in \mathbb{N}\Big\} $$ is determined, where $x_0$ is a given point in $[0,1]$ and $\{\ell_n\}_{n\ge 1}$ is a sequence of integers tending to infinity as $n\to \infty$. For the proof of this result, the notion of the recurrence time of a word in symbolic space is introduced to characterize the lengths and the distribution of cylinders (the set of $β$ with a common prefix in the expansion of 1) in the parameter space $\{β\in \mathbb{R}: β>1\}$.

preprint2013arXiv

Hausdorff dimension of affine random covering sets in torus

We calculate the almost sure Hausdorff dimension of the random covering set $\limsup_{n\to\infty}(g_n + ξ_n)$ in $d$-dimensional torus $\mathbb T^d$, where the sets $g_n\subset\mathbb T^d$ are parallelepipeds, or more generally, linear images of a set with nonempty interior, and $ξ_n\in\mathbb T^d$ are independent and uniformly distributed random points. The dimension formula, derived from the singular values of the linear mappings, holds provided that the sequences of the singular values are decreasing.

preprint2013arXiv

Projections of random covering sets

We show that, almost surely, the Hausdorff dimension $s_0$ of a random covering set is preserved under all orthogonal projections to linear subspaces with dimension $k>s_0$. The result holds for random covering sets with a generating sequence of ball-like sets, and is obtained by investigating orthogonal projections of a class of random Cantor sets.

preprint2013arXiv

The hybrid lattice of KxFe2-ySe2: why superconductivity and magnetism can coexist

Much remains unknown of the microscopic origin of superconductivity when it materializes in atomically disordered systems as in amorphous alloys (1) or in crystals riddled with defects(2). A manifestation of this conundrum is envisaged in the highly defective iron chalcogenide superconductors of KxFe2-ySe2 (3-6). How can superconductivity survive under such crude conditions that call for strong electron localization (7)? With vacancies present both at the K and Fe sites, superconductivity is bordering a semi-metallic region below x ~ 0.7 and an insulating and antiferromagnetic region above x ~ 0.85 (8,9). Here, we report on the bulk local atomic structure and show that the Fe sublattice is locally distorted in a way that it accommodates two kinds of Fe valence environments giving rise to a bimodal bond distribution. While the bond length distribution is driven by K and Fe contents, the superconducting state is characterized by the coexistence of both short (metallic) and long (insulating) Fe bond environments and is not phase separated. In contrast to other Fe-based materials in which only one kind of Fe to Fe bond is present, the dual nature of the Fe correlations explains why superconductivity is intertwined with magnetic order. Such a hybrid state is most likely present in cuprate superconductors as well (10,11) while our results point to the importance of the local atomic symmetry by which the exchange interactions between local moments can materialize (12).

preprint2012arXiv

Color Constancy based on Image Similarity via Bilayer Sparse Coding

Computational color constancy is an important topic in computer vision that has attracted considerable attention. Recent research has shown that high-level visual content information helps illumination estimation. However, the existing methods are essentially combinational strategies in which image content analysis is used only to guide the combination of, or selection from, a variety of individual illumination estimation methods. In this paper, we propose a novel bilayer sparse coding model for illumination estimation that considers image similarity in terms of both low-level color distribution and high-level scene content simultaneously. For this purpose, the image's scene content information is integrated with its color distribution to obtain an optimal illumination estimation model. Experimental results on two real-world image sets show that our algorithm outperforms other prevailing illumination estimation methods, including combinational methods.

preprint2012arXiv

Principal support vector machines for linear and nonlinear sufficient dimension reduction

We introduce a principal support vector machine (PSVM) approach that can be used for both linear and nonlinear sufficient dimension reduction. The basic idea is to divide the response variables into slices and use a modified form of support vector machine to find the optimal hyperplanes that separate them. These optimal hyperplanes are then aligned by the principal components of their normal vectors. It is proved that the aligned normal vectors provide an unbiased, $\sqrt{n}$-consistent, and asymptotically normal estimator of the sufficient dimension reduction space. The method is then generalized to nonlinear sufficient dimension reduction using the reproducing kernel Hilbert space. In that context, the aligned normal vectors become functions and it is proved that they are unbiased in the sense that they are functions of the true nonlinear sufficient predictors. We compare PSVM with other sufficient dimension reduction methods by simulation and in real data analysis, and through both comparisons firmly establish its practical advantages.
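
The linear procedure described in this abstract (slice the response, fit a modified support vector machine per slice boundary, align the normal vectors by principal components) can be written out as a sketch. The notation below is assumed for illustration and does not appear in the abstract: $X$ is the predictor vector, $\Sigma = \mathrm{var}(X)$, $\lambda > 0$ is a cost parameter, $h$ is the number of slices, and $\tilde{Y}_r \in \{-1, +1\}$ is the label obtained by splitting the response at the $r$-th slice boundary. The form of the objective is our recollection of the published version and should be checked against the paper.

```latex
% Sketch under the assumed notation above; not quoted from the abstract.
% Step 1: for each slice boundary r, solve a modified SVM problem in which
% the usual penalty \psi^{\top}\psi is replaced by \psi^{\top}\Sigma\,\psi.
\[
  (\hat{\psi}_r, \hat{t}_r) = \arg\min_{\psi,\, t}\;
  \psi^{\top} \Sigma\, \psi
  + \lambda\, E\bigl[\, 1 - \tilde{Y}_r \bigl( \psi^{\top}(X - EX) - t \bigr) \bigr]^{+},
  \qquad r = 1, \dots, h - 1 .
\]
% Step 2: align the normal vectors by principal components. The leading
% eigenvectors of this matrix serve as the estimated reduction directions.
\[
  \hat{M} = \sum_{r=1}^{h-1} \hat{\psi}_r \hat{\psi}_r^{\top} .
\]
```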

preprint2012arXiv

Sufficient dimension reduction based on an ensemble of minimum average variance estimators

We introduce a class of dimension reduction estimators based on an ensemble of the minimum average variance estimates of functions that characterize the central subspace, such as the characteristic functions, the Box--Cox transformations and wavelet basis. The ensemble estimators exhaustively estimate the central subspace without imposing restrictive conditions on the predictors, and have the same convergence rate as the minimum average variance estimates. They are flexible and easy to implement, and allow repeated use of the available sample, which enhances accuracy. They are applicable to both univariate and multivariate responses in a unified form. We establish the consistency and convergence rate of these estimators, and the consistency of a cross validation criterion for order determination. We compare the ensemble estimators with other estimators in a wide variety of models, and establish their competent performance.

preprint2010arXiv

On dimension folding of matrix- or array-valued statistical objects

We consider dimension reduction for regression or classification in which the predictors are matrix- or array-valued. This type of predictor arises when measurements are obtained for each combination of two or more underlying variables--for example, the voltage measured at different channels and times in electroencephalography data. For these applications, it is desirable to preserve the array structure of the reduced predictor (e.g., time versus channel), but this cannot be achieved within the conventional dimension reduction formulation. In this paper, we introduce a dimension reduction method, to be called dimension folding, for matrix- and array-valued predictors that preserves the array structure. In an application of dimension folding to an electroencephalography data set, we correctly classify 97 out of 122 subjects as alcoholic or nonalcoholic based on their electroencephalography in a cross-validation sample.
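
The contrast between a conventional reduction and a structure-preserving one can be written out for a matrix-valued predictor. The symbols below ($X$, $\beta$, $A$, $B$ and all dimensions) are assumed for illustration and are not taken from the abstract; this is a sketch of the idea, not the paper's formal definition.

```latex
% Conventional reduction vectorizes X \in \mathbb{R}^{p_L \times p_R},
% so row and column information is mixed in each reduced coordinate:
\[
  \beta^{\top} \mathrm{vec}(X), \qquad \beta \in \mathbb{R}^{p_L p_R \times d} .
\]
% A structure-preserving reduction acts on rows and columns separately,
% so the reduced predictor is still a (smaller) matrix:
\[
  A^{\top} X B \in \mathbb{R}^{d_L \times d_R}, \qquad
  A \in \mathbb{R}^{p_L \times d_L},\; B \in \mathbb{R}^{p_R \times d_R} .
\]
```

For scale, correctly classifying 97 of 122 subjects corresponds to an accuracy of about 79.5%.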

87 published item(s)

How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings

Dynamically Masked Discriminator for Generative Adversarial Networks

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

Code Smells Detection via Modern Code Review: A Study of the OpenStack and Qt Communities

Competing magnetic fluctuations and orders in a multiorbital model of doped SrCo$_2$As$_2$

Continual Prompt Tuning for Dialog State Tracking

CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

FGNET-RH: Fine-Grained Named Entity Typing via Refinement in Hyperbolic Space

Glassy crystals with colossal multi-baroresponsivities

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning

iPTF14hls in the circumstellar medium interaction model: A promising candidate for a pulsational pair-instability supernova

Large intersection property for limsup sets in metric space

Learning Scene Flow in 3D Point Clouds with Noisy Pseudo Labels

Learning Target-aware Representation for Visual Tracking via Informative Interactions

Low temperature competing magnetic energy scales in the topological ferrimagnet TbMn6Sn6

NAND-SPIN-Based Processing-in-MRAM Architecture for Convolutional Neural Network Acceleration

Nonlinear function-on-function regression by RKHS

Novel Valence Transition in Elemental Metal Europium around 80 GPa

Periodic repeating fast radio bursts: interaction between a magnetized neutron star and its planet in an eccentric orbit

PIC 4th Challenge: Semantic-Assisted Multi-Feature Encoding and Multi-Head Decoding for Dense Video Captioning

Quasi-periodic oscillations of the X-ray burst from the magnetar SGR J1935+2154 and associated with the fast radio burst FRB 200428

Rational numbers in $\times b$-invariant sets

Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

Rethinking the competition between detection and ReID in Multi-Object Tracking

SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation

Simultaneous Estimation of Graphical Models by Neighborhood Selection

SubGraph Networks based Entity Alignment for Cross-lingual Knowledge Graph

The First Insight-HXMT Gamma-Ray Burst Catalog: The First Four Years

Two-dimensional Functional Minerals for Sustainable Optics

VirtualSync+: Timing Optimization with Virtual Synchronization

Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment

Dynamic self-consistent field approach for studying kinetic processes in multiblock copolymer melts

In-orbit timing calibration of the Insight-Hard X-ray Modulation Telescope

Named Entity Recognition in the Style of Object Detection

On the hitting probabilities of limsup random fractals

Open-book Video Captioning with Retrieve-Copy-Generate Network

Ultrasensitive barocaloric material for room-temperature solid-state refrigeration

Discovery of oscillations above 200 keV in a black hole X-ray binary with Insight-HXMT

Dynamical Borel-Cantelli lemma for recurrence theory

Fine-Grained Named Entity Typing over Distantly Supervised Data Based on Refined Representations

FRB 200428: an Impact between an Asteroid and a Magnetar

Model-Based Compensation of Moving Tissue for State Recognition in Robotic-Assisted Pedicle Drilling

Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Ocean: Object-aware Anchor-free Tracking

Random Covering Sets in Metric Space with Exponentially Mixing Property

Service Ecosystem: A Lens of Smart Society

TimingCamouflage+: Netlist Security Enhancement with Unconventional Timing (with Appendix)

Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite

Independent component analysis for multivariate functional data

Approximation orders of real numbers by $\beta$-expansions

Beta-expansion and continued fraction expansion of real numbers

Chaotic and Topological Properties of Continued Fractions

CP-odd component of the lightest neutral Higgs boson in the MSSM

Hitting probabilities of random covering sets in tori and metric spaces

Limit theorems related to beta-expansion and continued fraction expansion

On the escape rate of unique beta-expansions

Random Continued fractions: Lévy constant and Chernoff-type estimate

Rotor-phonon coupling in perovskite CH3NH3PbI3: the origin of exceptional transport properties

TDSelector: A Training Data Selection Method for Cross-Project Defect Prediction

The denominators of convergents for continued fractions

The dimension of irregular set in parameter space

The topological property of the irregular sets on the lengths of basic intervals in beta-expansions

Zero-one law of Hausdorff dimensions of the recurrent sets

Almost automorphic functions on time scales and almost automorphic solutions to shunting inhibitory cellular neural networks on time scales

Benefit of Multipath TCP on the Stability of Network

Existence and exponential stability of positive almost periodic solution for Nicholson's blowflies models on time scales

Permanence and almost periodic solutions for a single-species system with impulsive effects on time scales

Probing the Electroweak Phase Transition at the LHC

RF Transport Electromagnetic Properties of CVD Graphene from DC to 110 MHz

An Analysis of Research in Software Engineering: Assessment and Trends

An Empirical Study on Software Defect Prediction with a Simplified Metric Set

Linear Contour Learning: A Method for Supervised Dimension Reduction

On efficient dimension reduction with respect to a statistical functional of interest