Source author record

Claudio T. Silva

Claudio T. Silva appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Human-Computer Interaction · Computer Vision · cs.CY · Machine Learning · Computational Engineering, Finance, and Science · Databases · eess.AS · physics.ao-ph · physics.soc-ph · Social and Information Networks · Software Engineering · Sound

Catalog footprint

What is connected

9 works

12 topics

4 close collaborators


preprint · 2022 · arXiv

IntentVizor: Towards Generic Query Guided Interactive Video Summarization

The goal of automatic video summarization is to create a short skim of the original long video while preserving the major content and events. There is growing interest in integrating user queries into video summarization, known as query-driven video summarization. This method predicts a concise synopsis of the original video based on the user query, which is commonly represented by input text. However, two inherent problems exist in this query-driven approach. First, the text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit the summaries once they are produced, whereas we assume the needs of the user are subtle and need to be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries. The input queries that describe the user's needs are not limited to text but can also include video snippets. We further represent these multi-modality finer-grained queries as user 'intent', which is interpretable, interactable, editable, and can better quantify the user's needs. In this paper, we use a set of the proposed intents to represent the user query and design a new interactive visual analytic interface. Users can interactively control and adjust these mixed-initiative intents to obtain a more satisfying summary through the interface. Also, to improve the summarization quality via video understanding, we propose novel Granularity-Scalable Ego-Graph Convolutional Networks (GSE-GCN). We conduct our experiments on two benchmark datasets. Comparisons with the state-of-the-art methods verify the effectiveness of the proposed framework. Code and dataset are available at https://github.com/jnzs1836/intent-vizor.

preprint · 2022 · arXiv

Towards Global-Scale Crowd+AI Techniques to Map and Assess Sidewalks for People with Disabilities

There is a lack of data on the location, condition, and accessibility of sidewalks across the world, which not only impacts where and how people travel but also fundamentally limits interactive mapping tools and urban analytics. In this paper, we describe initial work in semi-automatically building a sidewalk network topology from satellite imagery using hierarchical multi-scale attention models, inferring surface materials from street-level images using active learning-based semantic segmentation, and assessing sidewalk condition and accessibility features using Crowd+AI. We close with a call to create a database of labeled satellite and streetscape scenes for sidewalks and sidewalk accessibility issues along with standardized benchmarks.

preprint · 2022 · arXiv

Urban Rhapsody: Large-scale exploration of urban soundscapes

Noise is one of the primary quality-of-life issues in urban environments. In addition to annoyance, noise negatively impacts public health and educational performance. While low-cost sensors can be deployed to monitor ambient noise levels at high temporal resolutions, the amount of data they produce and the complexity of these data pose significant analytical challenges. One way to address these challenges is through machine listening techniques, which are used to extract features in attempts to classify the source of noise and understand temporal patterns of a city's noise situation. However, the overwhelming number of noise sources in the urban environment and the scarcity of labeled data make it nearly impossible to create classification models with large enough vocabularies that capture the true dynamism of urban soundscapes. In this paper, we first identify a set of requirements in the yet unexplored domain of urban soundscape exploration. To satisfy the requirements and tackle the identified challenges, we propose Urban Rhapsody, a framework that combines state-of-the-art audio representation, machine learning, and visual analytics to allow users to interactively create classification models, understand noise patterns of a city, and quickly retrieve and label audio excerpts in order to create a large high-precision annotated database of urban sound recordings. We demonstrate the tool's utility through case studies performed by domain experts using data generated over the five-year deployment of a one-of-a-kind sensor network in New York City.

preprint · 2020 · arXiv

A Tracking System For Baseball Game Reconstruction

Baseball is often seen as a series of contests between individuals. The duel between the pitcher and the batter, for example, is considered the engine that drives the sport. The pitchers use a variety of strategies to gain a competitive advantage against the batter, who does his best to figure out the ball trajectory and react in time for a hit. In this work, we propose a system that captures the movements of the pitcher, the batter, and the ball in a high level of detail, and discuss several ways in which this information may be processed to compute interesting statistics. We demonstrate on a large database of videos that our methods achieve results comparable to previous systems, while operating solely on video material. In addition, state-of-the-art AI techniques are incorporated to augment the amount of information that is made available for players, coaches, teams, and fans.

preprint · 2020 · arXiv

Learning Geo-Contextual Embeddings for Commuting Flow Prediction

Predicting commuting flows based on infrastructure and land-use information is critical for urban planning and public policy development. However, it is a challenging task given the complex patterns of commuting flows. Conventional models, such as the gravity model, are mainly derived from physics principles and are limited in their predictive power in real-world scenarios where many factors need to be considered. Meanwhile, most existing machine learning-based methods ignore the spatial correlations and fail to model the influence of nearby regions. To address these issues, we propose Geo-contextual Multitask Embedding Learner (GMEL), a model that captures the spatial correlations from geographic contextual information for commuting flow prediction. Specifically, we first construct a geo-adjacency network containing the geographic contextual information. Then, an attention mechanism is proposed based on the framework of graph attention network (GAT) to capture the spatial correlations and encode geographic contextual information into an embedding space. Two separate GATs are used to model supply and demand characteristics. A multitask learning framework is used to introduce stronger restrictions and enhance the effectiveness of the embedding representation. Finally, a gradient boosting machine is trained based on the learned embeddings to predict commuting flows. We evaluate our model using real-world datasets from New York City and the experimental results demonstrate the effectiveness of our proposal against the state of the art.
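For context, the gravity model that the abstract names as the conventional baseline can be sketched in a few lines. This is the textbook form, not code from the paper: the function name `gravity_flows`, the parameters `k`, `alpha`, `beta`, `gamma`, and the toy numbers in the demo are all illustrative assumptions.

```python
import numpy as np


def gravity_flows(origin_mass, dest_mass, distance,
                  k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Textbook gravity model: T_ij = k * O_i^alpha * D_j^beta / d_ij^gamma.

    origin_mass: per-region "supply" (e.g. resident workers), shape (n,)
    dest_mass:   per-region "demand" (e.g. jobs), shape (m,)
    distance:    pairwise distances, shape (n, m)
    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    o = np.asarray(origin_mass, dtype=float)[:, None]
    d = np.asarray(dest_mass, dtype=float)[None, :]
    dist = np.asarray(distance, dtype=float)
    # Zero distances (a region paired with itself) would divide by zero,
    # so silence the warning and zero those entries out afterwards.
    with np.errstate(divide="ignore", invalid="ignore"):
        flows = k * o**alpha * d**beta / dist**gamma
    flows[dist == 0] = 0.0
    return flows


if __name__ == "__main__":
    # Two toy regions, 2 distance units apart.
    print(gravity_flows([100, 200], [50, 80], [[0, 2], [2, 0]]))
```

The model has only a handful of global parameters and no notion of neighboring regions, which is the limitation the abstract contrasts with learned geo-contextual embeddings.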

preprint · 2020 · arXiv

Melody: Generating and Visualizing Machine Learning Model Summary to Understand Data and Classifiers Together

With the increasing sophistication of machine learning models, there is a growing trend toward developing model explanation techniques that focus on only one instance (local explanation) to ensure faithfulness to the original model. While these techniques provide accurate model interpretability on various data primitives (e.g., tabular, image, or text), a holistic Explainable Artificial Intelligence (XAI) experience also requires a global explanation of the model and dataset to enable sensemaking at different granularities. Thus, there is vast potential in synergizing the model explanation and visual analytics approaches. In this paper, we present MELODY, an interactive algorithm to construct an optimal global overview of the model and data behavior by summarizing the local explanations using information theory. The result (i.e., an explanation summary) does not require additional learning models, restrictions of data primitives, or the knowledge of machine learning from the users. We also design MELODY UI, an interactive visual analytics system to demonstrate how the explanation summary connects the dots in various XAI tasks from a global overview to local inspections. We present three usage scenarios regarding tabular, image, and text classifications to illustrate how to generalize model interpretability of different data. Our experiments show that our approach (1) provides a better explanation summary compared to a straightforward information-theoretic summarization and (2) achieves a significant speedup in the end-to-end data modeling pipeline.

preprint · 2020 · arXiv

Urban Mosaic: Visual Exploration of Streetscapes Using Large-Scale Image Data

Urban planning is increasingly data driven, yet the challenge of designing with data at a city scale and remaining sensitive to the impact at a human scale is as important today as it was for Jane Jacobs. We address this challenge with Urban Mosaic, a tool for exploring the urban fabric through a spatially and temporally dense data set of 7.7 million street-level images from New York City, captured over the period of a year. Working in collaboration with professional practitioners, we use Urban Mosaic to investigate questions of accessibility and mobility, and preservation and retrofitting. In doing so, we demonstrate how tools such as this might provide a bridge between the city and the street, by supporting activities such as visual comparison of geographically distant neighborhoods, and temporal analysis of unfolding urban development.

preprint · 2013 · arXiv

Enabling Reproducible Science with VisTrails

With the increasing amount of data and use of computation in science, software has become an important component in many different domains. Computing is now being used more often and in more aspects of scientific work including data acquisition, simulation, analysis, and visualization. To ensure reproducibility, it is important to capture the different computational processes used as well as their executions. VisTrails is an open-source scientific workflow system for data analysis and visualization that seeks to address the problem of integrating varied tools as well as automatically documenting the methods and parameters employed. Growing from a specific project need to supporting a wide array of users required close collaborations in addition to new research ideas to design a usable and efficient system. The VisTrails project now includes standard software processes like unit testing and developer documentation while serving as a base for further research. In this paper, we describe how VisTrails has developed and how our efforts in structuring and advertising the system have contributed to its adoption in many domains.

preprint · 2011 · arXiv

A wildland fire modeling and visualization environment

We present an overview of a modeling environment, consisting of a coupled atmosphere-wildfire model, utilities for visualization, data processing, and diagnostics, open source software repositories, and a community wiki. The fire model, called SFIRE, is based on a fire-spread model, implemented by the level-set method, and it is coupled with the Weather Research and Forecasting (WRF) model. A version with a subset of the features is distributed with WRF 3.3 as WRF-Fire. In each time step, the fire module takes the wind as input and returns the latent and sensible heat fluxes. The software architecture uses WRF parallel infrastructure for massively parallel computing. Recent features of the code include interpolation from an ideal logarithmic wind profile for nonhomogeneous fuels and ignition from a fire perimeter with an atmosphere and fire spin-up. Real runs use online sources for fuel maps, fine-scale topography, and meteorological data, and can run faster than real time. Visualization pathways allow generating images and animations in many packages, including VisTrails, VAPOR, MayaVi, and Paraview, as well as output to Google Earth. The environment is available from openwfm.org. New diagnostic variables were added to the code recently, including a new kind of fireline intensity, which, unlike Byram's fireline intensity, also takes into account the speed of burning.
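To illustrate what "implemented by the level-set method" means in general, here is a minimal sketch of one explicit step of the level-set equation phi_t + R |grad phi| = 0, where the burned region is the set phi <= 0. It is not SFIRE code: the function names, the constant spread rate, the uniform grid, and the first-order upwind scheme are all assumptions made for illustration, and there is no wind or fuel coupling.

```python
import numpy as np


def level_set_step(phi, rate, dx, dt):
    """One explicit upwind step of phi_t + rate * |grad phi| = 0 (rate >= 0).

    phi:  2-D level-set function; the fire has burned where phi <= 0.
    rate: spread rate (scalar or array shaped like phi), assumed non-negative.
    Stability needs roughly rate * dt / dx <= 1 (CFL condition).
    """
    # Edge padding gives zero-gradient boundaries.
    p = np.pad(phi, 1, mode="edge")
    c = p[1:-1, 1:-1]
    back0 = (c - p[:-2, 1:-1]) / dx   # backward difference, axis 0
    fwd0 = (p[2:, 1:-1] - c) / dx     # forward difference, axis 0
    back1 = (c - p[1:-1, :-2]) / dx   # backward difference, axis 1
    fwd1 = (p[1:-1, 2:] - c) / dx     # forward difference, axis 1
    # First-order upwind gradient magnitude for an outward-moving front.
    grad = np.sqrt(np.maximum(back0, 0.0) ** 2 + np.minimum(fwd0, 0.0) ** 2
                   + np.maximum(back1, 0.0) ** 2 + np.minimum(fwd1, 0.0) ** 2)
    return phi - dt * rate * grad


def burned_cells(phi):
    """Count grid cells inside the fire perimeter."""
    return int((phi <= 0).sum())


if __name__ == "__main__":
    n = 101
    yy, xx = np.mgrid[0:n, 0:n]
    # Circular ignition of radius 5 cells at the grid center.
    phi = np.sqrt((xx - 50.0) ** 2 + (yy - 50.0) ** 2) - 5.0
    print("burned cells before:", burned_cells(phi))
    for _ in range(20):
        phi = level_set_step(phi, rate=1.0, dx=1.0, dt=0.5)
    print("burned cells after:", burned_cells(phi))
```

In a coupled model of the kind the abstract describes, the spread rate would vary per cell with wind and fuel rather than being a constant.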
