Source author record

Pedro Rodriguez

Pedro Rodriguez appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, Unclaimed source record

Computation and Language, Artificial Intelligence, astro-ph, astro-ph.HE

Catalog footprint

What is connected

6 works

4 topics

4 close collaborators

preprint, 2022, arXiv

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models, as well as for conducting model-in-the-loop data collection with crowdworkers. Dynatask is integrated with Dynabench, a research platform for rethinking benchmarking in AI that facilitates human-and-model-in-the-loop data collection and evaluation. To create a task, users only need to write a short task configuration file from which the relevant web interfaces and model hosting infrastructure are automatically generated. The system is available at https://dynabench.org/ and the full library can be found at https://github.com/facebookresearch/dynabench.

preprint, 2021, arXiv

Quizbowl: The Case for Incremental Question Answering

Scholastic trivia competitions test knowledge and intelligence through mastery of question answering. Modern question answering benchmarks are one variant of the Turing test. Specifically, answering a set of questions as well as a human is a minimum bar towards demonstrating human-like intelligence. This paper makes the case that the format of one competition, where participants can answer in the middle of hearing a question (incremental), better differentiates the skill between (human or machine) players. Additionally, merging a sequential decision-making sub-task with question answering (QA) provides a good setting for research in model calibration and opponent modeling. Thus, embedded in this task are three machine learning challenges: (1) factoid QA over thousands of Wikipedia-like answers, (2) calibration of the QA model's confidence scores, and (3) sequential decision-making that incorporates knowledge of the QA model, its calibration, and what the opponent may do. We make two contributions: (1) collecting and curating a large factoid QA dataset and an accompanying gameplay dataset, and (2) developing a model that addresses these three machine learning challenges. In addition to offline evaluation, we pitted our model against some of the most accomplished trivia players in the world in a series of exhibition matches spanning several years. Throughout this paper, we show that collaborations with the vibrant trivia community have contributed to the quality of our dataset, spawned new research directions, and doubled as an exciting way to engage the public with research in machine learning and natural language processing.

preprint, 2020, arXiv

Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity

Open-ended human learning and information-seeking are increasingly mediated by digital assistants. However, such systems often ignore the user's pre-existing knowledge. Assuming a correlation between engagement and user responses such as "liking" messages or asking follow-up questions, we design a Wizard-of-Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts related to what they know. Through crowd-sourcing of this experiment, we collect and release 14K dialogs (181K utterances) where users and assistants converse about geographic topics like geopolitical entities and locations. This dataset is annotated with pre-existing user knowledge, message-level dialog acts, grounding to Wikipedia, and user reactions to messages. Responses using a user's prior knowledge increase engagement. We incorporate this knowledge into a multi-task model that reproduces human assistant policies and improves over a BERT content model by 13 mean reciprocal rank points.
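
For readers unfamiliar with the metric named in this abstract, mean reciprocal rank (MRR) averages 1/rank of the first correct item over all queries. The sketch below is a generic illustration, not the authors' evaluation code; the function name and the toy data are hypothetical, and it returns MRR on a 0 to 1 scale (reading "points" as a 0 to 100 scale is an assumption).

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    ranked_lists: Sequence[Sequence[Hashable]],
    gold_items: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the gold item in each ranked list (0 if absent)."""
    if len(ranked_lists) != len(gold_items) or not ranked_lists:
        raise ValueError("need one gold item per non-empty set of rankings")
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_items):
        for rank, item in enumerate(ranked, start=1):
            if item == gold:
                total += 1.0 / rank
                break  # only the first match counts
    return total / len(ranked_lists)


# Hypothetical toy data: the gold fact "a" is ranked 1st, 2nd, and 3rd.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
gold = ["a", "a", "a"]
score = mean_reciprocal_rank(rankings, gold)  # (1 + 1/2 + 1/3) / 3
```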

preprint, 2018, arXiv

Pathologies of Neural Models Make Interpretations Difficult

One way to interpret neural model predictions is to highlight the most important input features, for example with a heatmap visualization over the words in an input sentence. In existing interpretation methods for NLP, a word's importance is determined either by input perturbation (measuring the decrease in model confidence when that word is removed) or by the gradient with respect to that word. To understand the limitations of these methods, we use input reduction, which iteratively removes the least important word from the input. This exposes pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods. As we confirm with human experiments, the reduced examples lack information to support the prediction of any label, but models still make the same predictions with high confidence. To explain these counterintuitive results, we draw connections to adversarial examples and confidence calibration: pathological behaviors reveal difficulties in interpreting neural models trained with maximum likelihood. To mitigate their deficiencies, we fine-tune the models by encouraging high entropy outputs on reduced examples. Fine-tuned models become more interpretable under input reduction without accuracy loss on regular examples.
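
The input reduction procedure described in this abstract can be sketched in a few lines. This is a simplified greedy illustration, not the paper's implementation: it scores importance by leave-one-out confidence drop (the input-perturbation variant mentioned above) and assumes a hypothetical `predict` callable that returns a label and a confidence. The toy classifier is likewise invented for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: words -> (predicted label, confidence).
Predict = Callable[[List[str]], Tuple[str, float]]


def input_reduction(words: List[str], predict: Predict) -> List[str]:
    """Greedily remove the least important word while the label is unchanged."""
    original_label, _ = predict(words)
    current = list(words)
    while len(current) > 1:
        _, base_conf = predict(current)
        best = None  # (confidence drop, index) of the least important word
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            label, conf = predict(candidate)
            if label != original_label:
                continue  # removing this word would flip the prediction
            drop = base_conf - conf
            if best is None or drop < best[0]:
                best = (drop, i)
        if best is None:
            break  # every remaining removal changes the prediction
        del current[best[1]]
    return current


def toy_predict(words: List[str]) -> Tuple[str, float]:
    """Toy keyword classifier standing in for a neural model (assumption)."""
    if "good" in words:
        return "positive", 0.9
    return "negative", 0.6


reduced = input_reduction(["the", "movie", "was", "good"], toy_predict)
leftover = input_reduction(["bad", "film"], toy_predict)
```

In the second call the toy model keeps predicting the same label for a single uninformative word, which loosely mirrors the pathology the abstract describes.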

preprint, 2015, arXiv

Finding AGN in Deep X-ray Flux States with Swift

We report on our ongoing project of finding Active Galactic Nuclei (AGN) that go into deep X-ray flux states detected by Swift. Swift is performing an extensive study on the flux and spectral variability of AGN using Guest Investigator and team fill-in programs, followed by triggering XMM-Newton for deeper follow-up observations. So far this program has been very successful and has led to a number of XMM-Newton follow-up observations, including Mkn 335, PG 0844+349, and RX J2340.8-5329. Recent analysis of new Swift AGN observations reveals that several AGN went into a very low X-ray flux state, particularly Narrow-Line Seyfert 1 galaxies. One of these is RX J2317-4422, which dropped by a factor of about 60 when compared to the ROSAT All-Sky Survey.

preprint, 2005, arXiv

Solar Control on Jupiter's Equatorial X-ray Emissions: 26-29 November 2003 XMM-Newton Observation

During November 26-29, 2003, XMM-Newton observed soft (0.2-2 keV) X-ray emission from Jupiter for 69 hours. The low-latitude X-ray disk emission of Jupiter is observed to be almost uniform in intensity, with brightness that is consistent with a solar-photon driven process. The simultaneous lightcurves of Jovian equatorial X-rays and solar X-rays (measured by the TIMED/SEE and GOES satellites) show similar day-to-day variability. A large solar X-ray flare occurring on the Jupiter-facing side of the Sun is found to have a corresponding feature in the Jovian X-rays. These results support the hypothesis that X-ray emission from Jovian low latitudes is solar X-rays scattered from the planet's upper atmosphere, and suggest that the Sun directly controls the non-auroral X-rays from Jupiter's disk. Our study also suggests that Jovian equatorial X-rays can be used to monitor the solar X-ray flare activity on the hemisphere of the Sun that is invisible to space weather satellites.