Researcher profile

David F. Fouhey

David F. Fouhey contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
9 works
0 followers
4 topics
4 close collaborators

Published work

9 published items

Preprint · 2022 · arXiv

PlaneFormers: From Sparse View Planes to 3D Reconstruction

We present an approach for the planar surface reconstruction of a scene from images with limited overlap. This reconstruction task is challenging since it requires jointly reasoning about single image 3D reconstruction, correspondence between images, and the relative camera pose between images. Past work has proposed optimization-based approaches. We introduce a simpler approach, the PlaneFormer, that uses a transformer applied to 3D-aware plane tokens to perform 3D reasoning. Our experiments show that our approach is substantially more effective than prior work, and that several 3D-specific design decisions are crucial for its success.

Preprint · 2022 · arXiv

Understanding 3D Object Articulation in Internet Videos

We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos. While seemingly easy for humans, this problem poses many challenges for computers. We propose to approach this problem by combining a top-down detection system that finds planes that can be articulated with an optimization approach that solves for a 3D plane that can explain a sequence of observed articulations. We show that this system can be trained on a combination of videos and 3D scan datasets. When tested on a dataset of challenging Internet videos and the Charades dataset, our approach obtains strong performance. Project site: https://jasonqsy.github.io/Articulation3D

Preprint · 2022 · arXiv

What's Behind the Couch? Directed Ray Distance Functions (DRDF) for 3D Scene Reconstruction

We present an approach for full 3D scene reconstruction from a single unseen image. We train on a dataset of realistic non-watertight scans of scenes. Our approach predicts a distance function, since these have shown promise in handling complex topologies and large spaces. We identify and analyze two key challenges for predicting such image-conditioned distance functions that have prevented their success on real 3D scene data. First, we show that predicting a conventional scene distance from an image requires reasoning over a large receptive field. Second, we analytically show that the optimal output of a network trained to predict these distance functions does not obey all the distance function properties. We propose an alternate distance function, the Directed Ray Distance Function (DRDF), that tackles both challenges. We show that a deep network trained to predict DRDFs outperforms all other methods quantitatively and qualitatively on 3D reconstruction from a single image on Matterport3D, 3DFront, and ScanNet.

Preprint · 2021 · arXiv

SynthIA: A Synthetic Inversion Approximation for the Stokes Vector Fusing SDO and Hinode into a Virtual Observatory

Both NASA's Solar Dynamics Observatory (SDO) and the JAXA/NASA Hinode mission include spectropolarimetric instruments designed to measure the photospheric magnetic field. SDO's Helioseismic and Magnetic Imager (HMI) emphasizes full-disk, high-cadence data acquisition with good spatial resolution, while Hinode's Solar Optical Telescope Spectro-Polarimeter (SOT-SP) focuses on high spatial resolution and spectral sampling at the cost of a limited field of view and slower temporal cadence. This work introduces a deep-learning system named SynthIA (Synthetic Inversion Approximation) that can enhance both missions by capturing the best of each instrument's characteristics. We use SynthIA to produce a new magnetogram data product, SynodeP (Synthetic Hinode Pipeline), that mimics magnetograms from the higher-spectral-resolution Hinode/SOT-SP pipeline but is derived from full-disk, high-cadence, lower-spectral-resolution SDO/HMI Stokes observations. Results on held-out data show that SynodeP has good agreement with the Hinode/SOT-SP pipeline inversions, including magnetic fill fraction, which is not provided by the current SDO/HMI pipeline. SynodeP further shows a reduction in the magnitude of the 24-hour oscillations present in the SDO/HMI data. To demonstrate SynthIA's generality, we show the use of SDO/AIA data and subsets of the HMI data as inputs, which enables trade-offs between fidelity to the Hinode/SOT-SP inversions, number of observations used, and temporal artifacts. We discuss possible generalizations of SynthIA and its implications for space weather modeling. This work is part of the NASA Heliophysics DRIVE Science Center (SOLSTICE) at the University of Michigan under grant NASA 80NSSC20K0600E, and will be open-sourced.

Preprint · 2020 · arXiv

Articulation-aware Canonical Surface Mapping

We tackle the tasks of (1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and (2) inferring the articulation and pose of the template corresponding to the input image. While previous approaches rely on keypoint supervision for learning, we present an approach that can learn without such annotations. Our key insight is that these tasks are geometrically related, so we can obtain a supervisory signal by enforcing consistency among the predictions. We present results across a diverse set of animal object categories, showing that our method can learn articulation and CSM prediction from image collections using only foreground mask labels for training. We empirically show that allowing articulation helps learn more accurate CSM prediction, and that enforcing consistency with the predicted CSM is similarly critical for learning meaningful articulation.
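One way such geometric consistency can be expressed is as a reprojection check: a foreground pixel mapped onto the template by the CSM, then projected back through the predicted camera, should land near where it started. The sketch below is only an illustration under stated assumptions, not the paper's formulation: it assumes a weak-perspective camera (rotation, scale, 2D translation), a rigid template with articulation omitted, and hypothetical function names.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def project_weak_perspective(point: Vec3, rotation: Mat3, scale: float,
                             translation: Tuple[float, float]) -> Tuple[float, float]:
    """Rotate a 3D template point, drop depth, then scale and translate into the image.

    Weak-perspective projection is an assumption made for this sketch.
    """
    x = sum(rotation[0][k] * point[k] for k in range(3))
    y = sum(rotation[1][k] * point[k] for k in range(3))
    return scale * x + translation[0], scale * y + translation[1]


def reprojection_consistency(pixels: Sequence[Tuple[float, float]],
                             csm_points: Sequence[Vec3],
                             rotation: Mat3, scale: float,
                             translation: Tuple[float, float]) -> float:
    """Mean squared distance between each pixel and the reprojection of its CSM point.

    csm_points[i] is the (hypothetical) template point the CSM assigns to pixels[i].
    A value of 0 means the CSM and camera predictions agree perfectly.
    """
    if len(pixels) != len(csm_points) or not pixels:
        raise ValueError("need one CSM point per pixel and at least one pixel")
    total = 0.0
    for (u, v), p in zip(pixels, csm_points):
        pu, pv = project_weak_perspective(p, rotation, scale, translation)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(pixels)
```

In a learned system this quantity would serve as a training loss differentiated with respect to the network's CSM and camera predictions; here it is just a number, which keeps the geometry easy to inspect.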

Preprint · 2020 · arXiv

Associative3D: Volumetric Reconstruction from Sparse Views

This paper studies the problem of 3D volumetric reconstruction from two views of a scene with an unknown camera. While seemingly easy for humans, this problem poses many challenges for computers, since it requires simultaneously reconstructing objects in the two views while also figuring out their relationship. We propose a new approach that estimates reconstructions, distributions over the camera/object and camera/camera transformations, and an inter-view object affinity matrix. This information is then jointly reasoned over to produce the most likely explanation of the scene. We train and test our approach on a dataset of indoor scenes and rigorously evaluate the merits of our joint reasoning approach. Our experiments show that it is able to recover reasonable scenes from sparse views, although the problem remains challenging. Project site: https://jasonqsy.github.io/Associative3D
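To make the role of an inter-view affinity matrix concrete, the sketch below turns one into a one-to-one object correspondence by exhaustively scoring every partial matching. This is a generic illustration, not Associative3D's procedure, which also reasons over camera and object transformations; the `threshold` parameter and the scoring rule (sum of affinity minus threshold over matched pairs) are assumptions chosen so that weak pairs are left unmatched. Brute force is only practical for a handful of objects per view.

```python
from typing import List, Optional, Tuple


def best_matching(affinity: List[List[float]],
                  threshold: float = 0.5) -> Tuple[float, List[Optional[int]]]:
    """Return (score, assignment); assignment[i] is the view-2 object matched to
    view-1 object i, or None if it is left unmatched.

    Only pairs with affinity above `threshold` may be matched, and each matched
    pair contributes (affinity - threshold) to the score (an assumed rule).
    """
    n = len(affinity)
    m = len(affinity[0]) if n else 0
    best_score = 0.0
    best_assign: List[Optional[int]] = [None] * n

    def search(i: int, used: set, score: float, assign: List[Optional[int]]) -> None:
        nonlocal best_score, best_assign
        if i == n:
            if score > best_score:
                best_score, best_assign = score, assign.copy()
            return
        # Option 1: leave view-1 object i unmatched.
        assign.append(None)
        search(i + 1, used, score, assign)
        assign.pop()
        # Option 2: match it to any unused view-2 object with enough affinity.
        for j in range(m):
            if j not in used and affinity[i][j] > threshold:
                used.add(j)
                assign.append(j)
                search(i + 1, used, score + affinity[i][j] - threshold, assign)
                assign.pop()
                used.remove(j)

    search(0, set(), 0.0, [])
    return best_score, best_assign


score, assign = best_matching([[0.9, 0.85], [0.88, 0.1]])
print(assign)
```

With this matrix, greedily taking the single strongest pair (0.9) would leave the second object unmatched; searching jointly instead pairs object 0 with object 1 and object 1 with object 0, which echoes the abstract's point that reasoning over the whole scene at once can beat local decisions.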

Preprint · 2020 · arXiv

Full-Body Awareness from Partial Observations

There has been great progress in human 3D mesh recovery and great interest in learning about the world from consumer video data. Unfortunately, current methods for 3D human mesh recovery work rather poorly on consumer video data, since on the Internet, unusual camera viewpoints and aggressive truncations are the norm rather than a rarity. We study this problem and make a number of contributions to address it: (i) we propose a simple but highly effective self-training framework that adapts human 3D mesh recovery systems to consumer videos and demonstrate its application to two recent systems; (ii) we introduce evaluation protocols and keypoint annotations for 13K frames across four consumer video datasets for studying this task, including evaluations on out-of-image keypoints; and (iii) we show that our method substantially improves PCK and human-subject judgments compared to baselines, both on test videos from the dataset it was trained on and on three other datasets without further adaptation. Project website: https://crockwell.github.io/partial_humans
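PCK (Percentage of Correct Keypoints) counts a predicted keypoint as correct when it falls within a distance threshold of its annotation. The sketch below shows the common form of the metric under assumptions that may not match this paper's protocol: 2D points as (x, y) tuples, an `annotated` mask marking which keypoints have labels, and a threshold of `alpha * ref_size`, where the reference size (for example a body or image dimension) is left to the caller. Keypoints outside the image are not clipped, so out-of-image annotations still count.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point], annotated: Sequence[bool],
        ref_size: float, alpha: float = 0.5) -> float:
    """Fraction of annotated keypoints predicted within alpha * ref_size of ground truth.

    Returns NaN when no keypoint is annotated.
    """
    if not (len(pred) == len(gt) == len(annotated)):
        raise ValueError("pred, gt and annotated must have the same length")
    threshold = alpha * ref_size
    correct = 0
    total = 0
    for (px, py), (gx, gy), has_label in zip(pred, gt, annotated):
        if not has_label:
            continue  # unlabeled keypoints do not count toward the score
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            correct += 1
    return correct / total if total else float("nan")


pred = [(10.0, 10.0), (50.0, 50.0), (0.0, 0.0)]
gt = [(12.0, 10.0), (60.0, 50.0), (100.0, 100.0)]
print(pck(pred, gt, [True, True, False], ref_size=20.0, alpha=0.25))  # 0.5
```

In the usage line the threshold is 5 pixels: the first keypoint is off by 2 (correct), the second by 10 (incorrect), and the third is unlabeled and skipped.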

Preprint · 2020 · arXiv

Novel Object Viewpoint Estimation through Reconstruction Alignment

The goal of this paper is to estimate the viewpoint of a novel object. Standard viewpoint estimation approaches generally fail on this task because they rely on either a 3D model for alignment or large amounts of class-specific training data with corresponding canonical poses. We overcome those limitations by learning a reconstruct-and-align approach. Our key insight is that although we do not have an explicit 3D model or a predefined canonical pose, we can still learn to estimate the object's shape in the viewer's frame and then use an image to provide our reference model or canonical pose. In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss; the second learns whether two instances of features are aligned. At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to a reference image. We evaluate our method on novel object viewpoint estimation by generalizing across different datasets, analyzing the impact of our different modules, and providing a qualitative analysis of the learned features to identify what representations are being learned for alignment.

Preprint · 2020 · arXiv

Understanding Human Hands in Contact at Internet Scale

Hands are the central means by which humans manipulate their world, and being able to reliably extract hand state information from Internet videos of humans using their hands has the potential to pave the way to systems that can learn from petabytes of video data. This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction that includes hand location, side, contact state, and a box around the object in contact. To support this effort, we gather a large-scale dataset of hands in contact with objects consisting of 131 days of footage, as well as a 100K annotated hand-contact video frame dataset. The model learned on this dataset can serve as a foundation for hand-contact understanding in videos. We quantitatively evaluate it both on its own and in service of predicting and learning from 3D meshes of human hands.