Source author record

Matthew Brown

Matthew Brown appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, Unclaimed source record

Computer Vision, Machine Learning, hep-ph, Robotics, Artificial Intelligence, astro-ph.IM, cond-mat.mtrl-sci, gr-qc, Graphics, hep-ex, hep-th, physics.chem-ph, Systems and Control

Catalog footprint

What is connected

14 works

13 topics

4 close collaborators

preprint 2026 arXiv

LA-Pose: Latent Action Pretraining Meets Pose Estimation

This paper revisits camera pose estimation through the lens of self-supervised pretraining, focusing on inverse-dynamics pretraining as a scalable alternative to the current trend of fully supervised training with 3D annotations. Concretely, we employ inverse- and forward-dynamics models to learn latent action representations from large-scale driving videos, similar to Genie. Our idea is simple yet effective. Existing methods use latent actions in their original capacity, that is, as action conditioning of world models or as proxies of robot action parameters in policy networks. Our method, dubbed LA-Pose, repurposes the latent action features as inputs to a camera pose estimator, finetuned on a limited set of high-quality 3D annotations. This formulation enables accurate and generalizable pose prediction while maintaining feed-forward efficiency. Extensive experiments on driving benchmarks show that LA-Pose achieves performance competitive with, and even superior to, state-of-the-art methods while using orders of magnitude less labeled data. On the Waymo and PandaSet benchmarks, LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods. To our knowledge, this work is the first to demonstrate the power of inverse-dynamics self-supervised learning for pose estimation.

preprint 2022 arXiv

The JWST Early Release Observations

The James Webb Space Telescope (JWST) Early Release Observations (EROs) are a set of public outreach products created to mark the end of commissioning and the beginning of science operations for JWST. Colloquially known as the "Webb First Images and Spectra", these products were intended to demonstrate to the worldwide public that JWST is ready for science, and is capable of producing spectacular results. The package was released on July 12, 2022, and included images and spectra of the galaxy cluster SMACS J0723.3-7327 and distant lensed galaxies, the interacting galaxy group Stephan's Quintet, NGC 3324 in the Carina star-forming complex, the Southern Ring planetary nebula NGC 3132, and the transiting hot Jupiter WASP 96b. This paper describes the ERO technical design, observations, and scientific processing of data underlying the colorful outreach products.

preprint 2020 arXiv

AirSim Drone Racing Lab

Autonomous drone racing is a challenging research problem at the intersection of computer vision, planning, state estimation, and control. We introduce AirSim Drone Racing Lab, a simulation framework for enabling fast prototyping of algorithms for autonomy and enabling machine learning research in this domain, with the goal of reducing the time, money, and risks associated with field robotics. Our framework enables generation of racing tracks in multiple photo-realistic environments and orchestration of drone races. It comes with a suite of gate assets and supports multiple sensor modalities (monocular, depth, neuromorphic events, optical flow), different camera models, and benchmarking of planning, control, computer vision, and learning-based algorithms. We used our framework to host a simulation-based drone racing competition at NeurIPS 2019. The competition binaries are available at our GitHub repository.

preprint 2020 arXiv

Federated Visual Classification with Real-World Data Distribution

Federated Learning enables visual models to be trained on-device, bringing advantages for user privacy (data need never leave the device), but challenges in terms of data diversity and quality. Whilst typical models in the datacenter are trained using data that are independent and identically distributed (IID), data at source are typically far from IID. Furthermore, differing quantities of data are typically available at each device (imbalance). In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. To do so, we introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits that simulate real-world edge learning scenarios. We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training. The datasets are made available online.
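As a rough illustration of the FedAvg baseline the abstract mentions, the sketch below shows only the standard server-side aggregation step: a weighted average of client parameters, with each client weighted by its number of local examples. It is not the paper's FedVC or FedIR method. The function name `fedavg_aggregate` and the flat-list parameter layout are assumptions made for this example.

```python
from typing import List, Sequence


def fedavg_aggregate(
    client_params: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Weighted average of client parameter vectors (hypothetical helper).

    Each client's parameters are weighted by its local example count,
    which is the standard FedAvg aggregation rule.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter shape")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    # Sum each coordinate across clients, scaled by that client's data share.
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two clients with imbalanced data: the larger client dominates the average.
global_params = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

With sizes 1 and 3, the second client contributes three quarters of the result. This is one way to see how the per-device imbalance described above can skew the aggregated model.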

preprint 2020 arXiv

GeLaTO: Generative Latent Textured Objects

Accurate modeling of 3D objects exhibiting transparency, reflections and thin structures is an extremely challenging problem. Inspired by billboards and geometric proxies used in computer graphics, this paper proposes Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low frequency geometry with learned neural textures, to encode both medium and fine scale geometry as well as view-dependent appearance. To generate the proxies' textures, we learn a joint latent space allowing category-level appearance and geometry interpolation. The proxies are independently rasterized with their corresponding neural texture and composited using a U-Net, which generates an output photorealistic image including an alpha map. We demonstrate the effectiveness of our approach by reconstructing complex objects from a sparse set of views. We show results on a dataset of real images of eyeglasses frames, which are particularly challenging to reconstruct using classical methods. We also demonstrate that these coarse proxies can be handcrafted when the underlying object geometry is easy to model, like eyeglasses, or generated using a neural network for more complex categories, such as cars.

preprint 2020 arXiv

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective

Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes. We analyze this mismatch from a domain adaptation point of view. First of all, we connect existing class-balanced methods for long-tailed classification to target shift, a well-studied scenario in domain adaptation. The connection reveals that these methods implicitly assume that the training data and test data share the same class-conditioned distribution, which does not hold in general and especially for the tail classes. While a head class could contain abundant and diverse training examples that well represent the expected data at inference time, the tail classes are often short of representative training data. To this end, we propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach. We validate our approach with six benchmark datasets and three loss functions.

preprint 2020 arXiv

When Ensembling Smaller Models is More Efficient than Single Large Models

Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is usually reserved for the largest models, as it is commonly held that increasing the model size provides a more substantial reduction in error than ensembling smaller models. However, we show results from experiments on CIFAR-10 and ImageNet that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute, even when those individual models' weights and hyperparameters are highly optimized. Furthermore, this gap in improvement widens as models become larger. This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models, especially when the models approach the size of what their dataset can foster. Instead of using the common practice of tuning a single large model, one can use ensembles as a more flexible trade-off between a model's inference speed and accuracy. This also potentially eases hardware design, e.g., an easier way to parallelize the model across multiple workers for real-time or distributed inference.
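The aggregation step described above can be sketched as follows, assuming the common choice of averaging per-class probabilities across ensemble members. The abstract does not specify the aggregation rule, so the averaging rule and the names `ensemble_average` and `ensemble_predict` are assumptions for illustration.

```python
from typing import List, Sequence


def ensemble_average(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities over ensemble members (hypothetical helper)."""
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_classes = len(member_probs[0])
    if any(len(p) != n_classes for p in member_probs):
        raise ValueError("all members must predict the same number of classes")
    n_members = len(member_probs)
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def ensemble_predict(member_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = ensemble_average(member_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Three small models disagree; the averaged prediction picks class 1.
probs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
label = ensemble_predict(probs)
```

Because each member is evaluated independently, the members could in principle run on separate workers, which matches the parallelization point made in the abstract.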

preprint 2019 arXiv

Contingency Model Predictive Control for Automated Vehicles

We present Contingency Model Predictive Control (CMPC), a novel and implementable control framework which tracks a desired path while simultaneously maintaining a contingency plan -- an alternate trajectory to avert an identified potential emergency. In this way, CMPC anticipates events that might take place, instead of reacting when emergencies occur. We accomplish this by adding an additional prediction horizon in parallel to the classical receding MPC horizon. The contingency horizon is constrained to maintain a feasible avoidance solution; as such, CMPC is selectively robust to this emergency while tracking the desired path as closely as possible. After defining the framework mathematically, we demonstrate its effectiveness experimentally by comparing its performance to a state-of-the-art deterministic MPC. The controllers drive an automated research platform through a left-hand turn which may be covered by ice. Contingency MPC prepares for the potential loss of friction by purposefully and intuitively deviating from the prescribed path to approach the turn more conservatively; this deviation significantly mitigates the consequence of encountering ice.

preprint 2016 arXiv

Decision Forests, Convolutional Networks and the Models in-Between

This paper investigates the connections between two state-of-the-art classifiers: decision forests (DFs, including decision jungles) and convolutional neural networks (CNNs). Decision forests are computationally efficient thanks to their conditional computation property (computation is confined to only a small region of the tree, the nodes along a single branch). CNNs achieve state-of-the-art accuracy, thanks to their representation learning capabilities. We present a systematic analysis of how to fuse conditional computation with representation learning and achieve a continuum of hybrid models with different ratios of accuracy vs. efficiency. We call this new family of hybrid models conditional networks. Conditional networks can be thought of as: i) decision trees augmented with data transformation operators, or ii) CNNs, with block-diagonal sparse weight matrices, and explicit data routing functions. Experimental validation is performed on the common task of image classification on both the CIFAR and ImageNet datasets. Compared to state-of-the-art CNNs, our hybrid models yield the same accuracy with a fraction of the compute cost and a much smaller number of parameters.

preprint 2016 arXiv

Drift Robust Non-rigid Optical Flow Enhancement for Long Sequences

Densely tracking a nonrigid object over the long term is hard and remains a fundamental research issue in the computer vision community. This task often relies on estimating pairwise correspondences between images over time, where the error accumulates and leads to drift. In this paper, we introduce a novel optimization framework with an Anchor Patch constraint. It is designed to significantly reduce overall errors in long sequences containing non-rigidly deformable objects. Our framework can be applied to any dense tracking algorithm, e.g. optical flow. We demonstrate the success of our approach by showing significant error reduction on 6 popular optical flow algorithms applied to a range of real-world nonrigid benchmarks. We also provide quantitative analysis of our approach given synthetic occlusions and image noise.

preprint 2016 arXiv

Nonrigid Optical Flow Ground Truth for Real-World Scenes with Time-Varying Shading Effects

In this paper we present a dense ground truth dataset of nonrigidly deforming real-world scenes. Our dataset contains both long and short video sequences, and enables the quantitative evaluation of RGB-based tracking and registration methods. To construct ground truth for the RGB sequences, we simultaneously capture Near-Infrared (NIR) image sequences where dense markers - visible only in NIR - represent ground truth positions. This allows for comparison with automatically tracked RGB positions and the formation of error metrics. Most previous datasets containing nonrigidly deforming sequences are based on synthetic data. Our capture protocol enables us to acquire real-world deforming objects with realistic photometric effects - such as blur and illumination change - as well as occlusion and complex deformations. A public evaluation website is constructed to allow for ranking of RGB-image-based optical flow and other dense tracking algorithms, with various statistical measures. Furthermore, we present an RGB-NIR multispectral optical flow model allowing for energy optimization by adaptively combining featured information from both the RGB and the complementary NIR channels. In our experiments we evaluate eight existing RGB-based optical flow methods on our new dataset. We also evaluate our hybrid optical flow algorithm by comparing to two existing multispectral approaches, as well as varying our input channels across RGB, NIR and RGB-NIR.

preprint 2013 arXiv

Discovering Minimal Universal Extra Dimensions (MUED) at the LHC

In this work we discuss our consistent implementation of the minimal model of Universal Extra Dimensions in CalcHEP. We pay special attention to the gauge invariance issues that arise due to the incorporation of 5D quantum corrections. After validating the implementation we perform a complete study of the tri-lepton signature, including a realistic estimate of the backgrounds, for the present LHC energy and luminosity. We also derive the expected LHC discovery reach for different luminosities, at centre-of-mass energies of both 7 TeV and 8 TeV.

preprint 2012 arXiv

Testing Minimal Universal Extra Dimensions Using Higgs Boson Searches at the LHC

Large Hadron Collider (LHC) searches for the SM Higgs boson provide a powerful limit on models involving Universal Extra Dimensions (UED) where the Higgs production is enhanced. We have evaluated all one-loop diagrams for Higgs production from gluon fusion and decay to two photons within "minimal" UED (mUED), independently confirming previous results, and we have evaluated enhancement factors for Higgs boson production and decay over the mUED parameter space. Using these we have derived limits on the parameter space, combining data from both ATLAS and CMS collaborations for the most recent 7 TeV and 8 TeV LHC data. We have performed a rigorous statistical combination of several Higgs boson search channels which is important because mUED signatures from the Higgs boson are not universally enhanced. We have found that 1/R < 500 GeV is excluded at 95% CL, while for larger 1/R only a very narrow (±1-4 GeV) mass window around m_h = 125 GeV and another window (up to 2 GeV wide for 1/R > 1000 GeV) around m_h = 118 GeV are left. The latter is likely to be excluded as more data becomes available whereas the region around 125 GeV is where the recently discovered Higgs-like particle was observed and therefore where the exclusion limit is weaker. It is worth stressing that mUED predicts an enhancement for all channels for Higgs production by gluon fusion and decay while the vector boson fusion process WW/ZZ -> h -> AA is generically suppressed and WW/ZZ -> h -> WW*/ZZ* is standard. Therefore, as more 8 TeV LHC data becomes available, the information on individual Higgs boson production and decay processes provided by the CMS and ATLAS experiments can be effectively used to favour mUED or exclude it further.

preprint 2010 arXiv

Energies of the first row atoms from quantum Monte Carlo

All-electron variational and diffusion quantum Monte Carlo calculations of the ground state energies of the first row atoms (Li to Ne) are reported. We use trial wavefunctions of four types: single determinant Slater-Jastrow wavefunctions; multi-determinant Slater-Jastrow wavefunctions; single determinant Slater-Jastrow wavefunctions with backflow transformations; multi-determinant Slater-Jastrow wavefunctions with backflow transformations. At the diffusion quantum Monte Carlo level and using our best trial wavefunctions we recover 99% or more of the correlation energy for Li, Be, B, C, N, and Ne, 97% for O, and 98% for F.
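For context, the percentage of correlation energy recovered is conventionally measured against the Hartree-Fock energy. The expression below is a sketch of that standard definition. The symbols are assumptions for illustration and are not taken from the paper: $E_{\mathrm{HF}}$ is the Hartree-Fock energy, $E_{\mathrm{DMC}}$ the diffusion Monte Carlo energy, and $E_{\mathrm{exact}}$ the exact non-relativistic ground state energy.

```latex
\[
  \text{correlation energy recovered}
  = \frac{E_{\mathrm{HF}} - E_{\mathrm{DMC}}}{E_{\mathrm{HF}} - E_{\mathrm{exact}}}
  \times 100\%
\]
```

Under this definition, a value of 99% means the calculated energy lies within 1% of the correlation energy $E_{\mathrm{exact}} - E_{\mathrm{HF}}$ of the exact result.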
