Topic overview

physics.data-an

762 works · 3002 researchers · 0 institutions

Topic snapshot

What this area looks like now

762 works
3002 authors
0 experts visible
0 communities

Papers in this area

24 featured work(s)

preprint · 2015 · arXiv

Individual-based approach to epidemic processes on arbitrary dynamic contact networks

The dynamics of contact networks and epidemics of infectious diseases often occur on comparable time scales. Ignoring one of these time scales may provide an incomplete understanding of the population dynamics of the infection process. We develop an individual-based approximation for the susceptible-infected-recovered epidemic model applicable to arbitrary dynamic networks. Our framework provides, at the individual-level, the probability flow over time associated with the infection dynamics. This computationally efficient framework discards the correlation between the states of different nodes, yet provides accurate results in approximating direct numerical simulations. It naturally captures the temporal heterogeneities and correlations of contact sequences, fundamental ingredients regulating the timing and size of an epidemic outbreak. Using real-life data, we show that the static network model overestimates the reproduction number but underestimates the infection potential of super-spreading individuals. The high accuracy of our approximation further allows us to detect the index individual of an epidemic outbreak.

preprint · 2015 · arXiv

Respondent-driven sampling bias induced by clustering and community structure in social networks

Sampling hidden populations is particularly challenging using standard sampling methods mainly because of the lack of a sampling frame. Respondent-driven sampling (RDS) is an alternative methodology that exploits the social contacts between peers to reach and weight individuals in these hard-to-reach populations. It is a snowball sampling procedure where the weight of the respondents is adjusted for the likelihood of being sampled due to differences in the number of contacts. In RDS, the structure of the social contacts thus defines the sampling process and affects its coverage, for instance by constraining the sampling within a sub-region of the network. In this paper we study the bias induced by network structures such as social triangles, community structure, and heterogeneities in the number of contacts, in the recruitment trees and in the RDS estimator. We simulate different scenarios of network structures and response-rates to study the potential biases one may expect in real settings. We find that the prevalence of the estimated variable is associated with the size of the network community to which the individual belongs. Furthermore, we observe that low-degree nodes may be under-sampled in certain situations if the sample and the network are of similar size. Finally, we also show that low response-rates lead to reasonably accurate average estimates of the prevalence but generate relatively large biases.
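
The degree-based weighting described in this abstract can be illustrated with a small sketch. It assumes the inverse-degree estimator often called RDS-II (each respondent weighted by one over their reported number of contacts); the abstract does not state which estimator the paper uses, and the function name and input layout here are hypothetical.

```python
def rds_ii_estimate(samples):
    """Inverse-degree weighted prevalence estimate.

    `samples` is a list of (has_trait, degree) pairs, where `degree` is the
    respondent's reported number of contacts. This layout is an assumption
    made for illustration.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if any(degree <= 0 for _, degree in samples):
        raise ValueError("every respondent needs a positive degree")
    # High-degree respondents are more likely to be recruited, so each
    # respondent is down-weighted by 1 / degree.
    weighted_with_trait = sum(1.0 / d for has_trait, d in samples if has_trait)
    weighted_total = sum(1.0 / d for _, d in samples)
    return weighted_with_trait / weighted_total


# Hypothetical sample: half the respondents have the trait, but the
# low-degree respondent who has it counts for more after weighting.
sample = [(True, 1), (False, 2), (True, 4), (False, 4)]
print(rds_ii_estimate(sample))
```

The unweighted prevalence of this toy sample is 0.5; the weighted estimate differs because respondents with few contacts are assumed to be under-represented in the recruitment process.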

preprint · 2018 · arXiv

PAFit: an R Package for the Non-Parametric Estimation of Preferential Attachment and Node Fitness in Temporal Complex Networks

Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two simple growth mechanisms that not only explain certain structural properties commonly observed in real-world systems, but are also tied to a number of applications in modeling and inference. While there are statistical packages for estimating various parametric forms of the preferential attachment function, there is no such package implementing non-parametric estimation procedures. The non-parametric approach to the estimation of the preferential attachment function allows for comparatively finer-grained investigations of the 'rich-get-richer' phenomenon that could lead to novel insights in the search to explain certain nonstandard structural properties observed in real-world networks. This paper introduces the R package PAFit, which implements non-parametric procedures for estimating the preferential attachment function and node fitnesses in a growing network, as well as a number of functions for generating complex networks from these two mechanisms. The main computational part of the package is implemented in C++ with OpenMP to ensure scalability to large-scale networks. We first introduce the main functionalities of PAFit through simulated examples, and then use the package to analyze a collaboration network between scientists in the field of complex networks. The results indicate the joint presence of 'rich-get-richer' and 'fit-get-richer' phenomena in the collaboration network. The estimated attachment function is observed to be near-linear, which we interpret as meaning that the chance an author gets a new collaborator is proportional to their current number of collaborators. Furthermore, the estimated author fitnesses reveal a host of familiar faces from the complex networks community among the field's topmost fittest network scientists.

preprint · 2019 · arXiv

A Geometric Perspective on Quantum Parameter Estimation

Quantum metrology holds the promise of an early practical application of quantum technologies, in which measurements of physical quantities can be made with much greater precision than what is achievable with classical technologies. In this review, we collect some of the key theoretical results in quantum parameter estimation by presenting the theory for the quantum estimation of a single parameter, multiple parameters, and optical estimation using Gaussian states. We give an overview of results in areas of current research interest, such as Bayesian quantum estimation, noisy quantum metrology, and distributed quantum sensing. We address the question how minimum measurement errors can be achieved using entanglement as well as more general quantum states. This review is presented from a geometric perspective. This has the advantage that it unifies a wide variety of estimation procedures and strategies, thus providing a more intuitive big picture of quantum parameter estimation.

preprint · 2019 · arXiv

Hierarchical Clustering Supported by Reciprocal Nearest Neighbors

Clustering is a fundamental analysis tool aiming at classifying data points into groups based on their similarity or distance. It has found successful applications in all natural and social sciences, including biology, physics, economics, chemistry, astronomy, psychology, and so on. Among numerous existent algorithms, hierarchical clustering algorithms are of a particular advantage as they can provide results under different resolutions without any predetermined number of clusters and unfold the organization of resulted clusters. At the same time, they suffer a variety of drawbacks and thus are either time-consuming or inaccurate. We propose a novel hierarchical clustering approach on the basis of a simple hypothesis that two reciprocal nearest data points should be grouped in one cluster. Extensive tests on data sets across multiple domains show that our method is much faster and more accurate than the state-of-the-art benchmarks. We further extend our method to deal with the community detection problem in real networks, achieving remarkably better results in comparison with the well-known Girvan-Newman algorithm.
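
The hypothesis stated in this abstract, that two reciprocal nearest data points belong in one cluster, can be sketched as a single pairing step. This is only an illustration of that hypothesis using Euclidean distance, not the authors' full hierarchical algorithm; the function names are hypothetical.

```python
import math


def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (Euclidean distance)."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def reciprocal_pairs(points):
    """Pairs (i, j), i < j, where i and j are each other's nearest neighbor."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    # Keep a pair only when the nearest-neighbor relation is mutual.
    return sorted((i, j) for i, j in enumerate(nn) if i < j and nn[j] == i)


# Two tight pairs plus one point whose nearest neighbor does not reciprocate.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (2.0, 2.0)]
print(reciprocal_pairs(points))
```

A hierarchical scheme could merge each such pair into one cluster and repeat on the merged representatives; how the paper actually does this is not described in the abstract.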

preprint · 2019 · arXiv

Topological universality of on-demand ride-sharing efficiency

Ride-sharing may substantially contribute to future-compliant sustainable mobility, both in urban and rural areas. The service quality of ride-sharing fleets jointly depends on the topology of the underlying street networks, the spatio-temporal demand distributions, and the dispatching algorithms. Yet, efficiency of ride-sharing services is typically quantified by economic or ecological ad-hoc measures that do not transfer to new service regions with different characteristics. Here we derive a generic measure of ride-sharing efficiency based on the intrinsic ride-sharing dynamics that follows a universal scaling law across network topologies. We demonstrate that the same scaling holds across street networks of distinct topologies, including cities, islands and rural areas, and is insensitive to modifying request distributions and dispatching criteria. These results further our understanding of the collective dynamics of ride-sharing fleets and may enable quantitative evaluation of conditions towards increasing the feasibility of creating or transferring ride-sharing services to previously unserviced regions.

preprint · 2019 · arXiv

Spatially Continuous and High-resolution Land Surface Temperature: A Review of Reconstruction and Spatiotemporal Fusion Techniques

Remotely sensed, spatially continuous and high spatiotemporal resolution (hereafter referred to as high resolution) land surface temperature (LST) is a key parameter for studying the thermal environment and has important applications in many fields. However, difficult atmospheric conditions, sensor malfunctioning and scanning gaps between orbits frequently introduce spatial discontinuities into satellite-retrieved LST products. For a single sensor, there is also a trade-off between temporal and spatial resolution and, therefore, it is impossible to obtain high temporal and spatial resolution simultaneously. In recent years the reconstruction and spatiotemporal fusion of LST products have become active research topics that aim at overcoming this limitation. They are two of the most investigated approaches in thermal remote sensing and attract increasing attention, which has resulted in a number of different algorithms. However, to the best of our knowledge, currently no review exists that expatiates and summarizes the available LST reconstruction and spatiotemporal fusion methods and algorithms. This paper introduces the principles and theories behind LST reconstruction and spatiotemporal fusion and provides an overview of the published research and algorithms. We summarized three kinds of reconstruction methods for missing pixels (spatial, temporal and spatiotemporal methods), two kinds of reconstruction methods for cloudy pixels (Satellite Passive Microwave (PMW)-based and Surface Energy Balance (SEB)-based methods) and three kinds of spatiotemporal fusion methods (weighted function-based, unmixing-based and hybrid methods). The review concludes by summarizing validation methods and by identifying some promising future research directions for generating spatially continuous and high resolution LST products.

preprint · 2019 · arXiv

NeuDATool: An Open Source Neutron Data Analysis Tools, Supporting GPU Hardware Acceleration, and Across-computer Cluster Nodes Parallel

Empirical potential structure refinement (EPSR) is a neutron scattering data analysis algorithm and a software package. It was developed by the British spallation neutron source (ISIS) Disordered Materials Group in the 1980s, and aims to construct the most-probable atomic structures of disordered liquids. It has been extensively used during the past decades, and has generated reliable results. However, it is programmed in Fortran and implements a shared-memory architecture with OpenMP. With the extensive construction of supercomputer clusters and the widespread use of graphics processing unit (GPU) acceleration technology, it is now necessary to rebuild the EPSR with these techniques in the effort to improve its calculation speed. In this study, an open source framework NeuDATool is proposed. It is programmed in the object-oriented language C++, can be parallelized across nodes within a computer cluster, and supports GPU acceleration. The performance of NeuDATool has been tested with water and amorphous silica neutron scattering data. The test shows that the software could reconstruct the correct microstructure of the samples, and the calculation speed with GPU acceleration could increase by more than 400 times compared with the CPU serial algorithm for a simulation box consisting of about 100 thousand atoms. NeuDATool provides another choice for scientists who are familiar with C++ programming and want to define specific models and algorithms for their analyses.

preprint · 2020 · arXiv

A generalized permutation entropy for random processes

Permutation entropy measures the complexity of deterministic time series via a data symbolic quantization consisting of rank vectors called ordinal patterns or just permutations. The reasons for the increasing popularity of this entropy in time series analysis include that (i) it converges to the Kolmogorov-Sinai entropy of the underlying dynamics in the limit of ever longer permutations, and (ii) its computation dispenses with generating and ad hoc partitions. However, permutation entropy diverges when the number of allowed permutations grows super-exponentially with their length, as is usually the case when time series are output by random processes. In this Letter we propose a generalized permutation entropy that is finite for random processes, including discrete-time dynamical systems with observational or dynamical noise.
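
As a concrete reference point, the ordinal-pattern quantization described in this abstract can be sketched as follows. This computes the standard permutation entropy of a fixed pattern length, not the generalized entropy the paper proposes; the function names and the choice of natural logarithm are assumptions made for illustration.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Rank permutation of a window: indices ordered by increasing value."""
    # sorted() is stable, so tied values keep their original order.
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def permutation_entropy(series, order=3):
    """Shannon entropy (in nats) of the ordinal patterns of length `order`."""
    if order < 2 or len(series) < order:
        raise ValueError("need order >= 2 and at least `order` samples")
    counts = Counter(
        ordinal_pattern(series[i:i + order])
        for i in range(len(series) - order + 1)
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# A monotonic series shows a single pattern; a zig-zag shows more than one.
print(permutation_entropy([1, 2, 3, 4, 5, 6], order=3))
print(permutation_entropy([1, 3, 2, 4], order=2))
```

The number of possible patterns of length L is L!, which is why the count of observed patterns can grow super-exponentially with L for noisy data, the situation the abstract identifies as problematic.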

preprint · 2020 · arXiv

Rotated spectral principal component analysis (rsPCA) for identifying dynamical modes of variability in climate systems

Spectral PCA (sPCA), in contrast to classical PCA, offers the advantage of identifying organized spatio-temporal patterns within specific frequency bands and extracting dynamical modes. However, the unavoidable tradeoff between frequency resolution and robustness of the PCs leads to high sensitivity to noise and overfitting, which limits the interpretation of the sPCA results. We propose herein a simple non-parametric implementation of the sPCA using the continuous analytic Morlet wavelet as a robust estimator of the cross-spectral matrices with good frequency resolution. To improve the interpretability of the results when several modes of similar amplitude exist within the same frequency band, we propose a rotation of eigenvectors that optimizes the spatial smoothness in the phase domain. The developed method, called rotated spectral PCA (rsPCA), is tested on synthetic data simulating propagating waves and shows impressive performance even with high levels of noise in the data. Applied to historical sea surface temperature (SST) time series over the Pacific Ocean, the method accurately captures the El Niño-Southern Oscillation (ENSO) at low frequency (2 to 7 years periodicity). At high frequencies (sub-annual periodicity), at which several extratropical patterns of similar amplitude are identified, the rsPCA successfully unmixes the underlying modes, revealing spatially coherent patterns with robust propagation dynamics. Identification of higher frequency space-time climate modes holds promise for seasonal to subseasonal prediction and for diagnostic analysis of climate models.

preprint · 2020 · arXiv

Experimental quantum reading with photon counting

The final goal of quantum hypothesis testing is to achieve quantum advantage over all possible classical strategies. In the protocol of quantum reading this advantage is achieved for information retrieval from an optical memory, whose generic cell stores a bit of information in two possible lossy channels. For this protocol, we show, theoretically and experimentally, that quantum advantage is obtained by practical photon-counting measurements combined with a simple maximum-likelihood decision. In particular, we show that this receiver combined with an entangled two-mode squeezed vacuum source is able to outperform any strategy based on statistical mixtures of coherent states for the same mean number of input photons. Our experimental findings demonstrate that quantum entanglement and simple optics are able to enhance the readout of digital data, paving the way to real applications of quantum reading and with potential applications for any other model that is based on the binary discrimination of bosonic loss.

preprint · 2020 · arXiv

The statistical physics of discovering exogenous and endogenous factors in a chain of events

Event occurrence is not only subject to the environmental changes, but is also facilitated by the events that have occurred in a system. Here, we develop a method for estimating such extrinsic and intrinsic factors from a single series of event-occurrence times. The analysis is performed using a model that combines the inhomogeneous Poisson process and the Hawkes process, which represent exogenous fluctuations and endogenous chain-reaction mechanisms, respectively. The model is fit to a given dataset by minimizing the free energy, for which statistical physics and a path-integral method are utilized. Because the process of event occurrence is stochastic, parameter estimation is inevitably accompanied by errors, and it can ultimately occur that exogenous and endogenous factors cannot be captured even with the best estimator. We obtained four regimes categorized according to whether respective factors are detected. By applying the analytical method to real time series of debate in a social-networking service, we have observed that the estimated exogenous and endogenous factors are close to the first comments and the follow-up comments, respectively. This method is general and applicable to a variety of data, and we have provided an application program, by which anyone can analyze any series of event times.

preprint · 2020 · arXiv

Shortcomings of transfer entropy and partial transfer entropy: Extending them to escape the curse of dimensionality

Transfer entropy (TE) captures the directed relationships between two variables. Partial transfer entropy (PTE) accounts for the presence of all confounding variables of a multivariate system and infers only about direct causality. However, the computation of PTE involves high dimensional distributions and thus may not be robust in case of many variables. In this work, different variants of PTE are introduced, by building a reduced number of confounding variables based on different scenarios in terms of their interrelationships with the driving or response variable. Connectivity-based PTE variants and utilizing the random forests (RF) methodology are evaluated on synthetic time series. The empirical findings indicate the superiority of the suggested variants over TE and PTE, especially in case of high dimensional systems.
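
For orientation, the two quantities named in this abstract are commonly written as below. This is the standard textbook form with history length one, given here as an assumption; the paper's own notation and embedding choices may differ. The symbol $Z$ stands for the set of confounding variables.

```latex
\mathrm{TE}_{X \to Y}
  = \sum_{y_{t+1},\, y_t,\, x_t} p(y_{t+1}, y_t, x_t)\,
    \log \frac{p(y_{t+1} \mid y_t, x_t)}{p(y_{t+1} \mid y_t)}

\mathrm{PTE}_{X \to Y \mid Z}
  = \sum_{y_{t+1},\, y_t,\, x_t,\, z_t} p(y_{t+1}, y_t, x_t, z_t)\,
    \log \frac{p(y_{t+1} \mid y_t, x_t, z_t)}{p(y_{t+1} \mid y_t, z_t)}
```

Every variable added to $Z$ raises the dimension of the joint distribution that must be estimated, which is the dimensionality problem the abstract refers to.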

preprint · 2020 · arXiv

A Comprehensive Monte Carlo Framework for Jet-Quenching

This article presents the motivation for developing a comprehensive modeling framework in which different models and parameter inputs can be compared and evaluated for a large range of jet-quenching observables measured in relativistic heavy-ion collisions at RHIC and the LHC. The concept of a framework is discussed within the context of recent efforts by the JET Collaboration, the authors of JEWEL, and the JETSCAPE collaborations. The framework ingredients for each of these approaches are presented with a sample of important results from each. The role of advanced statistical tools in comparing models to data is also discussed, along with the need for a more detailed accounting of correlated errors in experimental results.

preprint · 2020 · arXiv

Conservation Laws and Spin System Modeling through Principal Component Analysis

This paper examines several applications of principal component analysis (PCA) to physical systems. The first of these demonstrates that the principal components in a basis of appropriate system variables can be employed to identify physically conserved quantities. That is, if the general form of a physical symmetry law is known, the PCA can identify an algebraic expression for the symmetry from the observed system trajectories. Secondly, the eigenvalue spectrum of the principal component spectrum for homogeneous periodic spin systems is found to reflect the geometric shape of the boundary. Finally, the PCA is employed to generate synthetic spin realizations with probability distributions in energy-magnetization space that closely resemble that of the input realizations although statistical quantities are inaccurately reproduced.

preprint · 2020 · arXiv

Designing Accurate Emulators for Scientific Processes using Calibration-Driven Deep Models

Predictive models that accurately emulate complex scientific processes can achieve exponential speed-ups over numerical simulators or experiments, and at the same time provide surrogates for improving the subsequent analysis. Consequently, there is a recent surge in utilizing modern machine learning (ML) methods, such as deep neural networks, to build data-driven emulators. While the majority of existing efforts has focused on tailoring off-the-shelf ML solutions to better suit the scientific problem at hand, we study an often overlooked, yet important, problem of choosing loss functions to measure the discrepancy between observed data and the predictions from a model. Due to lack of better priors on the expected residual structure, in practice, simple choices such as the mean squared error and the mean absolute error are made. However, the inherent symmetric noise assumption made by these loss functions makes them inappropriate in cases where the data is heterogeneous or when the noise distribution is asymmetric. We propose Learn-by-Calibrating (LbC), a novel deep learning approach based on interval calibration for designing emulators in scientific applications, that are effective even with heterogeneous data and are robust to outliers. Using a large suite of use-cases, we show that LbC provides significant improvements in generalization error over widely-adopted loss function choices, achieves high-quality emulators even in small data regimes and more importantly, recovers the inherent noise structure without any explicit priors.

preprint · 2020 · arXiv

Finding the Resistance Distance and Eigenvector Centrality from the Network's Eigenvalues

There are different measures to classify a network's data set that, depending on the problem, have different success. For example, the resistance distance and eigenvector centrality measures have been successful in revealing ecological pathways and differentiating between biomedical images of patients with Alzheimer's disease, respectively. The resistance distance measures the effective distance between any two nodes of a network taking into account all possible shortest paths between them and the eigenvector centrality measures the relative importance of each node in the network. However, both measures require knowing the network's eigenvalues and eigenvectors -- eigenvectors being the more computationally demanding task. Here, we show that we can closely approximate these two measures using only the eigenvalue spectra, where we illustrate this by experimenting on elemental resistor circuits and paradigmatic network models -- random and small-world networks. Our results are supported by analytical derivations, showing that the eigenvector centrality can be perfectly matched in all cases whilst the resistance distance can be closely approximated. Our underlying approach is based on the work by Denton, Parke, Tao, and Zhang [arXiv:1908.03795 (2019)], which is unrestricted to these topological measures and can be applied to most problems requiring the calculation of eigenvectors.
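
The result by Denton, Parke, Tao, and Zhang cited in this abstract is the eigenvector-eigenvalue identity, which can be stated as below. The notation is chosen here for illustration: $A$ is an $n \times n$ Hermitian matrix with eigenvalues $\lambda_i(A)$ and unit eigenvectors $v_i$, $v_{i,j}$ is the $j$-th component of $v_i$, and $M_j$ is the $(n-1) \times (n-1)$ minor obtained by deleting the $j$-th row and column of $A$.

```latex
\lvert v_{i,j} \rvert^{2} \prod_{\substack{k=1 \\ k \neq i}}^{n}
  \bigl( \lambda_i(A) - \lambda_k(A) \bigr)
  = \prod_{k=1}^{n-1} \bigl( \lambda_i(A) - \lambda_k(M_j) \bigr)
```

The identity yields only the magnitudes of the eigenvector components, computed from eigenvalues of $A$ and of its minors. How the paper applies it to the resistance distance and eigenvector centrality is not detailed in the abstract.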

preprint · 2020 · arXiv

Non-Gaussianity Detection of EEG Signals Based on a Multivariate Scale Mixture Model for Diagnosis of Epileptic Seizures

Objective: The detection of epileptic seizures from scalp electroencephalogram (EEG) signals can facilitate early diagnosis and treatment. Previous studies suggested that the Gaussianity of EEG distributions changes depending on the presence or absence of seizures; however, no general EEG signal models can explain such changes in distributions within a unified scheme. Methods: This paper describes the formulation of a stochastic EEG model based on a multivariate scale mixture distribution that can represent changes in non-Gaussianity caused by stochastic fluctuations in EEG. In addition, we propose an EEG analysis method by combining the model with a filter bank and introduce a feature representing the non-Gaussianity latent in each EEG frequency band. Results: We applied the proposed method to multichannel EEG data from twenty patients with focal epilepsy. The results showed a significant increase in the proposed feature during epileptic seizures, particularly in the high-frequency band. The feature calculated in the high-frequency band allowed highly accurate classification of seizure and non-seizure segments [area under the receiver operating characteristic curve (AUC) = 0.881] using only a simple threshold. Conclusion: This paper proposed a multivariate scale mixture distribution-based stochastic EEG model capable of representing non-Gaussianity associated with epileptic seizures. Experiments using simulated and real EEG data demonstrated the validity of the model and its applicability to epileptic seizure detection. Significance: The stochastic fluctuations of EEG quantified by the proposed model can help detect epileptic seizures with high accuracy.

preprint · 2020 · arXiv

Quantifying the spatial resolution of the maximum a posteriori estimate in linear, rank-deficient, Bayesian hard field tomography

Image based diagnostics are interpreted in the context of spatial resolution. The same is true for tomographic image reconstruction. Current empirically driven approaches to quantify spatial resolution rely on a deterministic formulation based on point-spread functions which neglect the statistical prior information, that is integral to rank-deficient tomography. We propose a statistical spatial resolution measure based on the covariance of the reconstruction (point estimate) and show that the prior information acts as a lower limit for the spatial resolution. Furthermore, the spatial resolution measure can be employed for designing tomographic systems under consideration of spatial inhomogeneity of spatial resolution.

preprint · 2020 · arXiv

Learning Similarity Metrics for Numerical Simulations

We propose a neural network-based approach that computes a stable and generalizing metric (LSiM) to compare data from a variety of numerical simulation sources. We focus on scalar time-dependent 2D data that commonly arises from motion and transport-based partial differential equations (PDEs). Our method employs a Siamese network architecture that is motivated by the mathematical properties of a metric. We leverage a controllable data generation setup with PDE solvers to create increasingly different outputs from a reference simulation in a controlled environment. A central component of our learned metric is a specialized loss function that introduces knowledge about the correlation between single data samples into the training process. To demonstrate that the proposed approach outperforms existing metrics for vector spaces and other learned, image-based metrics, we evaluate the different methods on a large range of test data. Additionally, we analyze generalization benefits of an adjustable training data difficulty and demonstrate the robustness of LSiM via an evaluation on three real-world data sets.

preprint · 2020 · arXiv

ABCDisCo: Automating the ABCD Method with Machine Learning

The ABCD method is one of the most widely used data-driven background estimation techniques in high energy physics. Cuts on two statistically-independent classifiers separate signal and background into four regions, so that background in the signal region can be estimated simply using the other three control regions. Typically, the independent classifiers are chosen "by hand" to be intuitive and physically motivated variables. Here, we explore the possibility of automating the design of one or both of these classifiers using machine learning. We show how to use state-of-the-art decorrelation methods to construct powerful yet independent discriminators. Along the way, we uncover a previously unappreciated aspect of the ABCD method: its accuracy hinges on having low signal contamination in control regions not just overall, but relative to the signal fraction in the signal region. We demonstrate the method with three examples: a simple model consisting of three-dimensional Gaussians; boosted hadronic top jet tagging; and a recasted search for paired dijet resonances. In all cases, automating the ABCD method with machine learning significantly improves performance in terms of ABCD closure, background rejection and signal contamination.
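
The basic estimate described in this abstract can be sketched in a few lines. The region labelling is an assumed convention (A passes both cuts and is the signal region; B, C and D are the control regions), and the function names and event layout are hypothetical. The machine-learned classifiers of the paper are not reproduced here.

```python
def count_regions(events, cut_f, cut_g):
    """Count events in the four regions defined by cuts on two classifiers.

    `events` is a list of (f, g) classifier scores. Assumed convention:
    A = pass both, B = pass f only, C = pass g only, D = fail both.
    """
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for f, g in events:
        pass_f, pass_g = f > cut_f, g > cut_g
        if pass_f and pass_g:
            counts["A"] += 1
        elif pass_f:
            counts["B"] += 1
        elif pass_g:
            counts["C"] += 1
        else:
            counts["D"] += 1
    return counts


def abcd_estimate(n_b, n_c, n_d):
    """Background expected in region A, valid if f and g are independent."""
    if n_d <= 0:
        raise ValueError("region D must contain events")
    # Independence implies N_A / N_B = N_C / N_D for background events.
    return n_b * n_c / n_d


# Hypothetical control-region yields.
print(abcd_estimate(n_b=200, n_c=300, n_d=600))
```

The estimate relies on the control regions being dominated by background, which connects to the abstract's point about signal contamination.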

preprint · 2020 · arXiv

Navigating differential structures in complex networks

Structural changes in a network representation of a system (e.g., different experimental conditions, time evolution) can provide insight on its organization, function and on how it responds to external perturbations. The deeper understanding of how gene networks cope with diseases and treatments is maybe the most incisive demonstration of the gains obtained through this differential network analysis point-of-view, which led to an explosion of new numeric techniques in the last decade. However, where to focus one's attention, or how to navigate through the differential structures, can be overwhelming even for a few experimental conditions. In this paper, we propose a theory and a methodological implementation for the characterization of shared "structural roles" of nodes simultaneously within and between networks, whose outcome is a highly interpretable map. The main features and accuracy are investigated with numerical benchmarks generated by a stochastic block model. Results show that it can provide nuanced and interpretable information in scenarios with very different (i) community sizes and (ii) total number of communities, and (iii) even for a large number of 100 networks being compared (e.g., for 100 different experimental conditions). Then, we show evidence that the strength of the method is its "story-telling"-like characterization of the information encoded in a set of networks, which can be used to pinpoint unexpected differential structures, leading to further investigations and providing new insights. We provide an illustrative, exploratory analysis of four gene co-expression networks from two cell types × two treatments (interferon-β stimulated or control). The method proposed here allowed us to elaborate and test a set of very specific hypotheses related to unique and subtle nuances of the structural differences between these networks.

People in this topic

12 visible researcher(s)