Source author record

Anthony Rowe

Anthony Rowe appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Robotics, Artificial Intelligence, eess.SP, eess.SY, Information Theory, Machine Learning, math.IT, Networking and Internet Architecture, Systems and Control

Catalog footprint

What is connected

4 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models

3D scene understanding spans reasoning about free space, object grounding, hypothetical object insertions, complex geometric relationships, and integrating all of these with external tools and data sources. Existing 3D understanding methods typically rely on large-scale 3D-language training or focus on object grounding and simple spatial relationships. We argue that the broad generalization that motivates 3D-language training can be achieved at inference time, without 3D-specific training. We propose Flame3D, a training-free framework that represents scenes as editable visual-textual 3D memories and exposes them to an off-the-shelf MLLM through composable spatial tools. Flame3D also lets the agent synthesize custom spatial programs at inference time, enabling open-ended reasoning over layouts, empty space, and objects not yet present in the scene. External data and corrections can be added to the memory without retraining. In addition to showing performance competitive with finetuned 3D-LMM methods on ScanQA, we study the multi-hop 3D reasoning capabilities of Flame3D by evaluating it on a curated compositional spatial-reasoning benchmark, Compose3D. We find that fixed tools fall short and that the agent's ability to synthesize spatial operations at inference time is essential. These results invite the question: should future progress in 3D scene understanding focus on richer scene memories and expressive compositional abstractions?

preprint · 2022 · arXiv

A Hybrid mmWave and Camera System for Long-Range Depth Imaging

mmWave radars offer excellent depth resolution even at very long ranges owing to their high bandwidth. But their angular resolution is at least an order of magnitude worse than that of camera and lidar systems. Hence, mmWave radar is not a capable 3-D imaging solution in isolation. We propose Metamoran, a system that combines the complementary strengths of radar and camera to obtain accurate, high-resolution depth images over long ranges even in high-clutter environments, all from a single fixed vantage point. Metamoran enables rich long-range depth imaging with applications in security and surveillance, roadside safety infrastructure and wide-area mapping. Our approach leverages the high angular resolution from cameras using computer vision techniques, including image segmentation and monocular depth estimation, to obtain object shape. Our core contribution is a method to convert this object shape into an RF I/Q equivalent, which we use in a novel radar processing pipeline to help declutter the scene and capture extremely weak reflections from objects at long distances. We perform a detailed evaluation of Metamoran's depth imaging capabilities in 400 diverse scenes. Our evaluation shows that Metamoran estimates the depth of static objects up to 90 m and moving objects up to 305 m with a median error of 28 cm, an improvement of 13$\times$ compared to a naive radar+camera baseline and 23$\times$ compared to monocular depth estimation.

preprint · 2020 · arXiv

Data-driven Thermal Model Inference with ARMAX, in Smart Environments, based on Normalized Mutual Information

Understanding the models that characterize the thermal dynamics in a smart building is important for the comfort of its occupants and for its energy optimization. A significant amount of research has attempted to utilize thermodynamic (physical) models for smart building control, but these approaches remain challenging due to the stochastic nature of the intermittent environmental disturbances. This paper presents a novel data-driven approach for indoor thermal model inference, which combines an Autoregressive Moving Average with eXogenous inputs model (ARMAX) with a Normalized Mutual Information scheme (NMI). Using NMI, an information-theoretic measure, causal dependencies between the indoor temperature and exogenous inputs are explicitly obtained and guide the ARMAX model toward the dominating inputs. For validation, we use three datasets based on building energy systems, on which we compare our method against an autoregressive model with exogenous inputs (ARX), a regularized ARMAX model, and state-space models.
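
The input-selection idea in this abstract can be illustrated with a minimal sketch that scores each candidate exogenous input by its normalized mutual information with the indoor temperature and ranks the inputs by that score. Everything here is an assumption made for illustration: the function names, the equal-width histogram estimator, the bin count, and the normalization I(X;Y) / sqrt(H(X) H(Y)) are not taken from the paper, and the sketch measures statistical dependence only, without the ARMAX fitting step.

```python
import math
from collections import Counter


def _bin_indices(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def _entropy(counts, total):
    """Shannon entropy (in nats) of an empirical distribution given bin counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(x, y, bins=8):
    """Histogram estimate of I(X;Y) / sqrt(H(X) * H(Y)) (assumed normalization)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    bx, by = _bin_indices(x, bins), _bin_indices(y, bins)
    n = len(x)
    hx = _entropy(Counter(bx).values(), n)
    hy = _entropy(Counter(by).values(), n)
    hxy = _entropy(Counter(zip(bx, by)).values(), n)
    denom = math.sqrt(hx * hy)
    return (hx + hy - hxy) / denom if denom > 0 else 0.0


def rank_inputs(target, candidates, bins=8):
    """Return (name, score) pairs sorted from most to least informative input."""
    scores = {
        name: normalized_mutual_information(target, series, bins)
        for name, series in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In a pipeline of this kind, the top-ranked inputs would then be passed to the model-fitting step as exogenous regressors; how many to keep is a design choice this sketch leaves open.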

preprint · 2015 · arXiv

Tracking Motion and Proxemics using Thermal-sensor Array

Indoor tracking has all-pervasive applications beyond mere surveillance, for example in education, health monitoring, marketing and energy management. Image- and video-based tracking systems are intrusive. Thermal array sensors, on the other hand, can provide coarse-grained tracking while preserving the privacy of the subjects. The goal of the project is to facilitate motion detection and group proxemics modeling using an 8 x 8 infrared sensor array. Each of the 8 x 8 pixels is a temperature reading in Fahrenheit. We refer to each 8 x 8 matrix as a scene. We collected approximately 902 scenes with different configurations of human groups and different walking directions. We infer the direction of motion of a subject across a set of scenes as left-to-right, right-to-left, up-to-down or down-to-up using cross-correlation analysis. We used features from connected component analysis of each background-subtracted scene and performed Support Vector Machine classification to estimate the number of human subjects in the scene.
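
The direction-inference step described in this abstract can be sketched as a cross-correlation between two consecutive 8 x 8 scenes. This is a minimal sketch under stated assumptions, not the authors' method: it assumes the scenes are already background-subtracted, that row index 0 is the top of the scene and column index 0 is the left edge, and that correlating the row and column sum profiles is an adequate stand-in for the paper's analysis. The function names and the `max_lag` parameter are hypothetical.

```python
def _profiles(scene):
    """Collapse a 2D scene into per-row and per-column sums."""
    row_sums = [sum(row) for row in scene]
    col_sums = [sum(col) for col in zip(*scene)]
    return row_sums, col_sums


def _best_lag(before, after, max_lag):
    """Return the lag k that maximizes sum_i before[i] * after[i + k].

    A positive k means the pattern moved toward higher indices.
    Lags are tried in order of increasing magnitude, so ties favor
    the smallest shift (and an empty scene yields lag 0).
    """
    best_k, best_score = 0, None
    for k in sorted(range(-max_lag, max_lag + 1), key=abs):
        score = sum(
            before[i] * after[i + k]
            for i in range(len(before))
            if 0 <= i + k < len(after)
        )
        if best_score is None or score > best_score:
            best_k, best_score = k, score
    return best_k


def infer_direction(prev_scene, next_scene, max_lag=4):
    """Classify motion between two background-subtracted scenes."""
    prev_rows, prev_cols = _profiles(prev_scene)
    next_rows, next_cols = _profiles(next_scene)
    dx = _best_lag(prev_cols, next_cols, max_lag)  # horizontal shift
    dy = _best_lag(prev_rows, next_rows, max_lag)  # vertical shift
    if dx == 0 and dy == 0:
        return "stationary"
    if abs(dx) >= abs(dy):
        return "left-to-right" if dx > 0 else "right-to-left"
    return "up-to-down" if dy > 0 else "down-to-up"
```

Over a longer set of scenes, one plausible extension is to apply this pairwise estimate to each consecutive pair and take the majority label.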