Source author record

Daniel Campos

Daniel Campos appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

cond-mat.stat-mech, Information Retrieval, Computation and Language, Machine Learning, Populations and Evolution, Artificial Intelligence, math-ph, math.AC, math.MP, physics.data-an

Catalog footprint

What is connected

15 works

10 topics

4 close collaborators

preprint, 2024, arXiv

First-passage time of a Brownian searcher with stochastic resetting to random positions

We study the effect of a resetting point randomly distributed around the origin on the mean first-passage time of a Brownian searcher moving in one dimension. We compare the search efficiency with that corresponding to reset to the origin and find that the mean first-passage time of the latter can be larger or smaller than in the distributed case, depending on whether the resetting points are symmetrically or asymmetrically distributed. In particular, we prove the existence of an optimal reset rate that minimizes the mean first-passage time for distributed resetting to a finite interval if the target is located outside this interval. When the target position belongs to the resetting interval, or when that interval is infinite, no optimal reset rate exists, but there is an optimal resetting interval width or resetting characteristic scale which minimizes the mean first-passage time. We also show that the first-passage density averaged over the resetting points depends on its first moment only. As a consequence, there is an equivalent point such that the first-passage problem with resetting to that point is statistically equivalent to the case of distributed resetting. We end our study by analyzing the fluctuations of the first-passage times for these cases. All our analytical results are verified through numerical simulations.
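As a point of reference for the baseline this abstract compares against (reset to a single point), the sketch below uses the standard closed form for one-dimensional Brownian motion that starts at, and resets at rate r to, a point a distance L from the target: T(r) = (exp(L·sqrt(r/D)) − 1)/r. This is the single-point result only, not the distributed-resetting formula derived in the paper. The function names, the search bracket, and the default D = 1 are illustrative assumptions.

```python
import math


def mfpt_reset_to_point(r, distance, diffusion=1.0):
    """Mean first-passage time for 1D Brownian motion reset at rate r to a
    single point located `distance` away from the target (single-point case)."""
    alpha = math.sqrt(r / diffusion)
    return math.expm1(alpha * distance) / r


def optimal_reset_rate(distance, diffusion=1.0, iterations=100):
    """Locate the reset rate minimizing the MFPT by golden-section search.

    Assumption: the minimizer satisfies distance*sqrt(r/D) of order one, so
    the bracket below (0.1 to 10 in that dimensionless variable) contains it.
    """
    a = diffusion * (0.1 / distance) ** 2
    b = diffusion * (10.0 / distance) ** 2
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        # Keep the sub-interval that still contains the minimum.
        if mfpt_reset_to_point(c, distance, diffusion) < mfpt_reset_to_point(d, distance, diffusion):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


if __name__ == "__main__":
    r_star = optimal_reset_rate(distance=1.0)
    print("optimal reset rate:", r_star)
    print("MFPT at optimum:", mfpt_reset_to_point(r_star, 1.0))
```

In the dimensionless variable z = L·sqrt(r/D), the minimum satisfies z/2 = 1 − exp(−z), so the numerical optimum can be checked against that condition.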

preprint, 2022, arXiv

Non-standard diffusion under Markovian resetting in bounded domains

We consider a walker moving in a one-dimensional interval with absorbing boundaries under the effect of Markovian resettings to the initial position. The walker's motion follows a random walk characterized by a general waiting time distribution between consecutive short jumps. We investigate the existence of an optimal reset rate, which minimizes the mean exit passage time, in terms of the statistical properties of the waiting time probability. Generalizing previous results restricted to Markovian random walks, we here find that, depending on the value of the relative standard deviation of the waiting time probability, resetting can be either (i) never beneficial, (ii) beneficial depending on the distance of the reset to the boundary, or (iii) always beneficial.

preprint, 2021, arXiv

Overview of the TREC 2020 deep learning track

This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime. We again have a document retrieval task and a passage retrieval task, each with hundreds of thousands of human-labeled training queries. We evaluate using single-shot TREC-style evaluation, to give us a picture of which ranking methods work best when large data is available, with much more comprehensive relevance labeling on the small number of test queries. This year we have further evidence that rankers with BERT-style pretraining outperform other rankers in the large data regime.

preprint, 2021, arXiv

Phase transition in non-Markovian animal exploration model with preferential returns

We study a non-Markovian and nonstationary model of animal mobility incorporating both exploration and memory in the form of preferential returns. We derive exact results for the probability of visiting a given number of sites and develop a practical WKB approximation to treat the nonstationary problem. We further show that this model adequately describes empirical movement data of Egyptian fruit bats (Rousettus aegyptiacus) when accounting for inter-individual variation in the population. Finally, we study the probability of visiting any site a given number of times and derive the corresponding mean-field equation. Here, we find a remarkable phase transition occurring at preferential returns which scale linearly with past visits. Following empirical evidence, we suggest that this phase transition reflects a trade-off between extensive and intensive foraging modes.

preprint, 2021, arXiv

Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard

Leaderboards are a ubiquitous part of modern research in applied machine learning. By design, they sort entries into some linear order, where the top-scoring entry is recognized as the "state of the art" (SOTA). Due to the rapid progress being made in information retrieval today, particularly with neural models, the top entry in a leaderboard is replaced with some regularity. These are touted as improvements in the state of the art. Such pronouncements, however, are almost never qualified with significance testing. In the context of the MS MARCO document ranking leaderboard, we pose a specific question: How do we know if a run is significantly better than the current SOTA? We ask this question against the backdrop of recent IR debates on scale types: in particular, whether commonly used significance tests are even mathematically permissible. Recognizing these potential pitfalls in evaluation methodology, our study proposes an evaluation framework that explicitly treats certain outcomes as distinct and avoids aggregating them into a single-point metric. Empirical analysis of SOTA runs from the MS MARCO document ranking leaderboard reveals insights about how one run can be "significantly better" than another that are obscured by the current official evaluation metric (MRR@100).
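For readers unfamiliar with the metric named above, here is a minimal sketch of mean reciprocal rank with a cutoff (MRR@k): each query contributes 1/rank of its first relevant document within the top k, or 0 if none appears, and the contributions are averaged. This is not the official MS MARCO evaluation script; the function name and the input layout (dictionaries keyed by query id) are assumptions made for illustration.

```python
def mrr_at_k(rankings, relevant, k=100):
    """Mean reciprocal rank with cutoff k.

    rankings: dict mapping query id -> ranked list of document ids (best first)
    relevant: dict mapping query id -> set of relevant document ids
    Both structures are hypothetical; real run and qrels files need parsing first.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked_docs in rankings.items():
        relevant_docs = relevant.get(query_id, set())
        for rank, doc_id in enumerate(ranked_docs[:k], start=1):
            if doc_id in relevant_docs:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(rankings)


if __name__ == "__main__":
    example_rankings = {
        "q1": ["d1", "d2"],
        "q2": ["d9", "d8", "d7", "d3"],
        "q3": ["d5"],
    }
    example_relevant = {"q1": {"d1"}, "q2": {"d3"}, "q3": {"d0"}}
    print(mrr_at_k(example_rankings, example_relevant))
```

Because the score is a single average, two runs with very different per-query outcomes can end up with similar values, which is the kind of information loss the abstract refers to.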

preprint, 2020, arXiv

Continuous time random walks under Markovian resetting

We investigate the effects of Markovian resetting events on continuous time random walks where the waiting times and the jump lengths are random variables distributed according to power-law probability density functions. We prove the existence of a non-equilibrium stationary state and a finite mean first arrival time. However, the existence of an optimum reset rate is conditional on a specific relationship between the exponents of both power-law tails. We also investigate the search efficiency by finding the optimal random walk which minimizes the mean first arrival time in terms of the reset rate, the distance of the initial position to the target and the characteristic transport exponents.

preprint, 2020, arXiv

On the Reliability of Test Collections for Evaluating Systems of Different Types

As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality. Test collections are generated based on pooling results of various retrieval systems, but until recently this did not include deep learning systems. This raises a major challenge for reusable evaluation: since deep learning based models use external resources (e.g. word embeddings) and advanced representations, as opposed to traditional methods that are mainly based on lexical similarity, they may return different types of relevant documents that were not identified in the original pooling. If so, test collections constructed using traditional methods are likely to lead to biased and unfair evaluation results for deep learning (neural) systems. This paper uses simulated pooling to test the fairness and reusability of test collections, showing that pooling based on traditional systems only can lead to biased evaluation of deep learning systems.

preprint, 2020, arXiv

ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search

Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval. However, click logs have not been publicly released for academic use, because they can be too revealing of personally or commercially sensitive information. This paper describes a click data release related to the TREC Deep Learning Track document corpus. After aggregation and filtering, including a k-anonymity requirement, we find 1.4 million of the TREC DL URLs have 18 million connections to 10 million distinct queries. Our dataset of these queries and connections to TREC documents is of similar size to proprietary datasets used in previous papers on query mining and ranking. We perform some preliminary experiments using the click data to augment the TREC DL training data, offering by comparison: 28x more queries, with 49x more connections to 4.4x more URLs in the corpus. We present a description of the dataset's generation process, characteristics, use in ranking and suggest other potential uses.

preprint, 2020, arXiv

Overview of the TREC 2019 deep learning track

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training queries, for which we generate a reusable test set of 43 queries. The passage retrieval task has a corpus of 8.8 million passages with 503 thousand training queries, for which we generate a reusable test set of 43 queries. This year 15 groups submitted a total of 75 runs, using various combinations of deep learning, transfer learning and traditional IR ranking methods. Deep learning runs significantly outperformed traditional IR runs. Possible explanations for this result are that we introduced large training data and we included deep models trained on such data in our judging pools, whereas some past studies did not have such training data or pooling.

preprint, 2020, arXiv

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English for natural language understanding tasks only, XGLUE has two main advantages: (1) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (2) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model, Unicoder (Huang et al., 2019), to cover both understanding and generation tasks, and evaluate it on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.

preprint, 2015, arXiv

First-passage times in multi-scale random walks: the impact of movement scales on search efficiency

An efficient searcher needs to properly balance the tradeoff between the exploration of new spatial areas and the exploitation of nearby resources, an idea which is at the core of scale-free Lévy search strategies. Here we study multi-scale random walks as an approximation to the scale-free case and derive the exact expressions for their mean first-passage times in a one-dimensional finite domain. This allows us to provide a complete analytical description of the dynamics driving the asymmetric regime, in which both nearby and faraway targets are available to the searcher. For this regime, we prove that the combination of only two movement scales can be enough to outperform both ballistic and Lévy strategies. This two-scale strategy involves an optimal discrimination between the nearby and faraway targets, which is only possible by adjusting the range of values of the two movement scales to the typical distances between encounters. So, this optimization necessarily requires some prior information (albeit crude) about target distances or distributions. Furthermore, we found that the incorporation of additional (three, four, ...) movement scales and their adjustment to target distances does not further improve the search efficiency. This allows us to claim that optimal random search strategies in the asymmetric regime actually arise through the informed combination of only two walk scales (related to the exploitative and the explorative scale, respectively), expanding on the well-known result that optimal strategies in strictly uninformed scenarios are achieved through Lévy paths (or, equivalently, through a hierarchical combination of multiple scales).

preprint, 2015, arXiv

Mesoscopic description of random walks on combs

Combs are a simple caricature of various types of natural branched structures, which belong to the category of loopless graphs and consist of a backbone and branches. We study continuous time random walks on combs and present a generic method to obtain their transport properties. The random walk along the branches may be biased, and we account for the effect of the branches by renormalizing the waiting time probability distribution function for the motion along the backbone. We analyze the overall diffusion properties along the backbone and find normal diffusion, anomalous diffusion, and stochastic localization (diffusion failure), respectively, depending on the characteristics of the continuous time random walk along the branches.

preprint, 2015, arXiv

Phase transitions in optimal search times: how random walkers should combine resetting and flight scales

Recent works have explored the properties of Lévy flights with resetting in one-dimensional domains and have reported the existence of phase transitions in the phase space of parameters which minimize the Mean First Passage Time (MFPT) through the origin [Phys. Rev. Lett. 113, 220602 (2014)]. Here we show that interesting dynamics, including phase transitions in the minimization of the MFPT, can also be obtained without invoking Lévy statistics, in the simpler case of random walks with exponentially distributed flights of constant speed. We explore these dynamics in both finite and infinite domains, and for different implementations of the resetting mechanism, to show that different ways of introducing resetting consistently lead to quite similar dynamics. The use of exponential flights has the strong advantage that exact solutions can be obtained easily for the MFPT through the origin, so a complete analytical characterization of the system dynamics can be provided. Furthermore, we discuss in detail how the phase transitions observed in random walks with resetting are closely related to several ideas recurrently used in the field of random search theory, in particular to other mechanisms proposed to understand random search in space, such as mortal random walks or multi-scale random walks. As a whole, we corroborate that one of the essential ingredients behind MFPT minimization lies in the combination of multiple movement scales (whatever their origin).

preprint, 2015, arXiv

Stochastic dynamics and logistic population growth

The Verhulst model is probably the best known macroscopic rate equation in population ecology. It depends on two parameters, the intrinsic growth rate and the carrying capacity. These parameters can be estimated for different populations and are related to the reproductive fitness and the competition for limited resources, respectively. We investigate analytically and numerically the simplest possible microscopic scenarios that give rise to the logistic equation in the deterministic mean-field limit. We provide a definition of the two parameters of the Verhulst equation in terms of microscopic parameters. In addition, we derive the conditions for extinction or persistence of the population by employing either the "momentum-space" spectral theory or the "real-space" Wentzel-Kramers-Brillouin (WKB) approximation to determine the probability distribution function and the mean time to extinction of the population. Our analytical results agree well with numerical simulations.
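For orientation, the deterministic Verhulst (logistic) equation that the abstract takes as its mean-field limit is commonly written dN/dt = rN(1 − N/K), where r is the intrinsic growth rate and K the carrying capacity. The sketch below evaluates its textbook closed-form solution; it does not reproduce the microscopic scenarios or the extinction-time calculations of the paper, and the function and parameter names are illustrative assumptions.

```python
import math


def logistic_rate(population, growth_rate, capacity):
    """Right-hand side of the Verhulst equation: dN/dt = r N (1 - N/K)."""
    return growth_rate * population * (1.0 - population / capacity)


def logistic_solution(t, n0, growth_rate, capacity):
    """Closed-form solution N(t) of the Verhulst equation with N(0) = n0 > 0."""
    return capacity / (1.0 + (capacity / n0 - 1.0) * math.exp(-growth_rate * t))


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the saturation at K.
    for time in (0.0, 5.0, 10.0, 20.0):
        print(time, logistic_solution(time, n0=10.0, growth_rate=0.5, capacity=100.0))
```

The population starts at n0 and saturates at the carrying capacity K, which is the deterministic behaviour that stochastic microscopic models recover only in the mean-field limit.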

preprint, 2012, arXiv

Depths and Cohen-Macaulay Properties of Path Ideals

Given a tree T on n vertices, there is an associated ideal I of a polynomial ring in n variables over a field, generated by all paths of a fixed length of T. We show that such an ideal always satisfies the Konig property and classify all trees for which R/I is Cohen-Macaulay. More generally, we show that an ideal I whose generators correspond to any collection of subtrees of T satisfies the Konig property. Since the edge ideal of a simplicial tree has this form, this generalizes a result of Faridi. Moreover, every square-free monomial ideal can be represented (non-uniquely) as a subtree ideal of a graph, so this construction provides a new combinatorial tool for studying square-free monomial ideals. For a special class of trees, namely trees that are themselves a path, a precise formula for the depth is given and it is shown that the proof extends to provide a lower bound on the Stanley depth of these ideals. Combining these results gives a new class of ideals for which the Stanley Conjecture holds.
