Researcher profile

Daniel Campos

Daniel Campos contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2024arXiv

First-passage time of a Brownian searcher with stochastic resetting to random positions

We study the effect of a resetting point randomly distributed around the origin on the mean first passage time of a Brownian searcher moving in one dimension. We compare the search efficiency with that corresponding to reset to the origin and find that the mean first passage time of the latter can be larger or smaller than the distributed case, depending on whether the resetting points are symmetrically or asymmetrically distributed. In particular, we prove the existence of an optimal reset rate that minimizes the mean first-passage time for distributed resetting to a finite interval if the target is located outside this interval. When the target position belongs to the resetting interval or it is infinite then no optimal reset rate exists, but there is an optimal resetting interval width or resetting characteristic scale which minimizes the mean first-passage time. We also show that the first-passage density averaged over the resetting points depends on its first moment only. As a consequence, there is an equivalent point such that the first-passage problem with resetting to that point is statistically equivalent to the case of distributed resetting. We end our study by analyzing the fluctuations of the first-passage times for these cases. All our analytical results are verified through numerical simulations.

preprint2022arXiv

Non-standard diffusion under Markovian resetting in bounded domains

We consider a walker moving in a one-dimensional interval with absorbing boundaries under the effect of Markovian resettings to the initial position. The walker's motion follows a random walk characterized by a general waiting time distribution between consecutive short jumps. We investigate the existence of an optimal reset rate, which minimizes the mean exit passage time, in terms of the statistical properties of the waiting time probability. Generalizing previous results restricted to Markovian random walks, we here find that, depending on the value of the relative standard deviation of the waiting time probability, resetting can be either (i) never beneficial, (ii) beneficial depending on the distance of the reset to the boundary, or (iii) always beneficial.

preprint2021arXiv

Overview of the TREC 2020 deep learning track

This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime. We again have a document retrieval task and a passage retrieval task, each with hundreds of thousands of human-labeled training queries. We evaluate using single-shot TREC-style evaluation, to give us a picture of which ranking methods work best when large data is available, with much more comprehensive relevance labeling on the small number of test queries. This year we have further evidence that rankers with BERT-style pretraining outperform other rankers in the large data regime.

preprint2021arXiv

Phase transition in non-Markovian animal exploration model with preferential returns

We study a non-Markovian and nonstationary model of animal mobility incorporating both exploration and memory in the form of preferential returns. We derive exact results for the probability of visiting a given number of sites and develop a practical WKB approximation to treat the nonstationary problem. We further show that this model adequately describes empirical movement data of Egyptian fruit bats (Rousettus aegyptiacus) when accounting for inter-individual variation in the population. Finally, we study the probability of visiting any site a given number of times and derive the corresponding mean-field equation. Here, we find a remarkable phase transition occurring at preferential returns which scale linearly with past visits. Following empirical evidence, we suggest that this phase transition reflects a trade-off between extensive and intensive foraging modes.

preprint2021arXiv

Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard

Leaderboards are a ubiquitous part of modern research in applied machine learning. By design, they sort entries into some linear order, where the top-scoring entry is recognized as the "state of the art" (SOTA). Due to the rapid progress being made in information retrieval today, particularly with neural models, the top entry in a leaderboard is replaced with some regularity. These are touted as improvements in the state of the art. Such pronouncements, however, are almost never qualified with significance testing. In the context of the MS MARCO document ranking leaderboard, we pose a specific question: How do we know if a run is significantly better than the current SOTA? We ask this question against the backdrop of recent IR debates on scale types: in particular, whether commonly used significance tests are even mathematically permissible. Recognizing these potential pitfalls in evaluation methodology, our study proposes an evaluation framework that explicitly treats certain outcomes as distinct and avoids aggregating them into a single-point metric. Empirical analysis of SOTA runs from the MS MARCO document ranking leaderboard reveals insights about how one run can be "significantly better" than another that are obscured by the current official evaluation metric (MRR@100).

preprint2020arXiv

Continuous time random walks under Markovian resetting

We investigate the effects of markovian resseting events on continuous time random walks where the waiting times and the jump lengths are random variables distributed according to power law probability density functions. We prove the existence of a non-equilibrium stationary state and finite mean first arrival time. However, the existence of an optimum reset rate is conditioned to a specific relationship between the exponents of both power law tails. We also investigate the search efficiency by finding the optimal random walk which minimizes the mean first arrival time in terms of the reset rate, the distance of the initial position to the target and the characteristic transport exponents.

preprint2020arXiv

On the Reliability of Test Collections for Evaluating Systems of Different Types

As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality. Test collections are generated based on pooling results of various retrieval systems, but until recently this did not include deep learning systems. This raises a major challenge for reusable evaluation: Since deep learning based models use external resources (e.g. word embeddings) and advanced representations as opposed to traditional methods that are mainly based on lexical similarity, they may return different types of relevant document that were not identified in the original pooling. If so, test collections constructed using traditional methods are likely to lead to biased and unfair evaluation results for deep learning (neural) systems. This paper uses simulated pooling to test the fairness and reusability of test collections, showing that pooling based on traditional systems only can lead to biased evaluation of deep learning systems.

preprint2020arXiv

ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search

Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval. However, click logs have not been publicly released for academic use, because they can be too revealing of personally or commercially sensitive information. This paper describes a click data release related to the TREC Deep Learning Track document corpus. After aggregation and filtering, including a k-anonymity requirement, we find 1.4 million of the TREC DL URLs have 18 million connections to 10 million distinct queries. Our dataset of these queries and connections to TREC documents is of similar size to proprietary datasets used in previous papers on query mining and ranking. We perform some preliminary experiments using the click data to augment the TREC DL training data, offering by comparison: 28x more queries, with 49x more connections to 4.4x more URLs in the corpus. We present a description of the dataset's generation process, characteristics, use in ranking and suggest other potential uses.

preprint2020arXiv

Overview of the TREC 2019 deep learning track

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training queries, for which we generate a reusable test set of 43 queries. The passage retrieval task has a corpus of 8.8 million passages with 503 thousand training queries, for which we generate a reusable test set of 43 queries. This year 15 groups submitted a total of 75 runs, using various combinations of deep learning, transfer learning and traditional IR ranking methods. Deep learning runs significantly outperformed traditional IR runs. Possible explanations for this result are that we introduced large training data and we included deep models trained on such data in our judging pools, whereas some past studies did not have such training data or pooling.

preprint2020arXiv

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks. Comparing to GLUE(Wang et al., 2019), which is labeled in English for natural language understanding tasks only, XGLUE has two main advantages: (1) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (2) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model Unicoder(Huang et al., 2019) to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.