Source author record

Ashok Veeraraghavan

Ashok Veeraraghavan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, eess.IV, Graphics, Machine Learning, physics.optics, Applications, math.OC

Catalog footprint

20 works, 7 topics, 4 close collaborators


preprint, 2026, arXiv

GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation

Large-scale pretraining on Earth observation imagery has yielded powerful representations of the natural and built environment. However, most existing geospatial foundation models do not directly model the structured socioeconomic covariates typically stored in tabular form. This modality gap limits their ability to capture the complete total environment, which is critical for reasoning about complex environmental, social, and health-related outcomes. In this work, we propose GeoViSTA (Geospatial Vision-Tabular Transformer), a vision-tabular architecture that learns unified geospatial embeddings from co-registered gridded imagery and tabular data. GeoViSTA utilizes bilateral cross-attention to exchange spatial and semantic information across modalities, guided by a geography-aware attention mechanism that aligns continuous image patches with irregular census-tract tokens. We train GeoViSTA with a self-supervised joint masked-autoencoding objective, forcing it to recover missing image patches and tabular rows using local spatial context and cross-modal cues. Empirically, GeoViSTA's unified embeddings improve linear probing performance on high-impact downstream tasks, outperforming baselines in predicting disease-specific mortality and fire hazard frequency across held-out regions. These results demonstrate that jointly modeling the physical environment alongside structured socioeconomic context yields highly transferable representations for holistic geospatial inference.

preprint, 2023, arXiv

WIRE: Wavelet Implicit Neural Representations

Implicit neural representations (INRs) have recently advanced numerous vision-related areas. INR performance depends strongly on the choice of the nonlinear activation function employed in its multilayer perceptron (MLP) network. A wide range of nonlinearities have been explored, but, unfortunately, current INRs designed to have high accuracy also suffer from poor robustness (to signal noise, parameter variation, etc.). Inspired by harmonic analysis, we develop a new, highly accurate and robust INR that does not exhibit this tradeoff. Wavelet Implicit neural REpresentation (WIRE) uses a continuous complex Gabor wavelet activation function that is well-known to be optimally concentrated in space-frequency and to have excellent biases for representing images. A wide range of experiments (image denoising, image inpainting, super-resolution, computed tomography reconstruction, image overfitting, and novel view synthesis with neural radiance fields) demonstrate that WIRE defines the new state of the art in INR accuracy, training time, and robustness.
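The activation described above can be sketched in a few lines. This is only an illustration: the functional form (a complex sinusoid multiplied by a Gaussian envelope) and the parameter names `omega0` and `s0` are assumptions for this sketch and are not stated in the abstract.

```python
import cmath
import math

def gabor_activation(x, omega0=10.0, s0=10.0):
    """Complex Gabor wavelet: a complex sinusoid under a Gaussian envelope.

    omega0 (frequency) and s0 (spread) are hypothetical parameter names.
    """
    envelope = math.exp(-abs(s0 * x) ** 2)   # localizes the response in space
    carrier = cmath.exp(1j * omega0 * x)     # localizes the response in frequency
    return carrier * envelope

# The magnitude decays quickly away from x = 0 while the phase keeps rotating.
for x in (0.0, 0.05, 0.2):
    y = gabor_activation(x)
    print(f"x={x:.2f}  |y|={abs(y):.4f}  phase={cmath.phase(y):.4f}")
```

In an INR this function would replace the usual nonlinearity after each linear layer of the MLP; the sketch applies it to a scalar only.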

preprint, 2022, arXiv

DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors

DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal component analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.

preprint, 2022, arXiv

Distributed Generalized Wirtinger Flow for Interferometric Imaging on Networks

We study the problem of decentralized interferometric imaging over networks, where agents have access to a subset of local radar measurements and can compute pair-wise correlations with their neighbors. We propose a primal-dual distributed algorithm named Distributed Generalized Wirtinger Flow (DGWF). We use the theory of low rank matrix recovery to show when the interferometric imaging problem satisfies the Regularity Condition, which implies the Polyak-Lojasiewicz inequality. Moreover, we show that DGWF converges geometrically for smooth functions. Numerical simulations for single-scattering radar interferometric imaging demonstrate that DGWF can achieve the same mean-squared error image reconstruction quality as its centralized counterpart for various network connectivity and size.

preprint, 2022, arXiv

MINER: Multiscale Implicit Neural Representations

We introduce a new neural signal model designed for efficient high-resolution representation of large-scale signals. The key innovation in our multiscale implicit neural representation (MINER) is an internal representation via a Laplacian pyramid, which provides a sparse multiscale decomposition of the signal that captures orthogonal parts of the signal across scales. We leverage the advantages of the Laplacian pyramid by representing small disjoint patches of the pyramid at each scale with a small MLP. This enables the capacity of the network to adaptively increase from coarse to fine scales, and only represent parts of the signal with strong signal energy. The parameters of each MLP are optimized from coarse to fine scale, which results in faster approximations at coarser scales and ultimately an extremely fast training process. We apply MINER to a range of large-scale signal representation tasks, including gigapixel images and very large point clouds, and demonstrate that it requires fewer than 25% of the parameters, 33% of the memory footprint, and 10% of the computation time of competing techniques such as ACORN to reach the same representation accuracy.
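To make the Laplacian pyramid idea concrete, here is a minimal 1D sketch. It uses a box filter for downsampling and sample repetition for upsampling, both of which are simplifying assumptions; it does not include the per-patch MLPs and is not the MINER implementation. The function names are hypothetical.

```python
def downsample(signal):
    """Halve the length by averaging adjacent pairs (a simple box filter)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

def upsample(signal):
    """Double the length by repeating each sample."""
    return [v for v in signal for _ in range(2)]

def build_pyramid(signal, levels):
    """Return [finest residual, ..., coarsest residual, coarsest approximation].

    The signal length must be divisible by 2 ** levels.
    """
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse)
        # The residual holds only what the coarser scale failed to explain.
        pyramid.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert build_pyramid by adding residuals back from coarse to fine."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = [u + r for u, r in zip(upsample(current), residual)]
    return current

signal = [1.0, 1.0, 1.0, 1.0, 5.0, 7.0, 1.0, 1.0]
pyramid = build_pyramid(signal, levels=2)
print(pyramid[0])  # finest residual: nonzero only where the signal has detail
```

The finest residual is zero wherever the signal is flat, which is the sparsity that lets a multiscale model spend capacity only on patches with significant energy.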

preprint, 2022, arXiv

PANDORA: Polarization-Aided Neural Decomposition Of Radiance

Reconstructing an object's geometry and appearance from multiple images, also known as inverse rendering, is a fundamental problem in computer graphics and vision. Inverse rendering is inherently ill-posed because the captured image is an intricate function of unknown lighting conditions, material properties and scene geometry. Recent progress in representing scene properties as coordinate-based neural networks has facilitated neural inverse rendering, resulting in impressive geometry reconstruction and novel-view synthesis. Our key insight is that polarization is a useful cue for neural inverse rendering as polarization strongly depends on surface normals and is distinct for diffuse and specular reflectance. With the advent of commodity, on-chip, polarization sensors, capturing polarization has become practical. Thus, we propose PANDORA, a polarimetric inverse rendering approach based on implicit neural representations. From multi-view polarization images of an object, PANDORA jointly extracts the object's 3D geometry, separates the outgoing radiance into diffuse and specular components, and estimates the illumination incident on the object. We show that PANDORA outperforms state-of-the-art radiance decomposition techniques. PANDORA outputs clean surface reconstructions free from texture artefacts, models strong specularities accurately and estimates illumination under practical unstructured scenarios.

preprint, 2022, arXiv

PS$^2$F: Polarized Spiral Point Spread Function for Single-Shot 3D Sensing

We propose a compact snapshot monocular depth estimation technique that relies on an engineered point spread function (PSF). Traditional approaches used in microscopic super-resolution imaging such as the Double-Helix PSF (DHPSF) are ill-suited for scenes that are more complex than a sparse set of point light sources. We show, using the Cramér-Rao lower bound, that separating the two lobes of the DHPSF and thereby capturing two separate images leads to a dramatic increase in depth accuracy. A special property of the phase mask used for generating the DHPSF is that a separation of the phase mask into two halves leads to a spatial separation of the two lobes. We leverage this property to build a compact polarization-based optical setup, where we place two orthogonal linear polarizers on each half of the DHPSF phase mask and then capture the resulting image with a polarization-sensitive camera. Results from simulations and a lab prototype demonstrate that our technique achieves up to $50\%$ lower depth error compared to state-of-the-art designs including the DHPSF and the Tetrapod PSF, with little to no loss in spatial resolution.

preprint, 2020, arXiv

SASSI -- Super-Pixelated Adaptive Spatio-Spectral Imaging

We introduce a novel video-rate hyperspectral imager with high spatial and temporal resolutions. Our key hypothesis is that spectral profiles of pixels in a super-pixel of an oversegmented image tend to be very similar. Hence, a scene-adaptive spatial sampling of a hyperspectral scene, guided by its super-pixel segmented image, is capable of obtaining high-quality reconstructions. To achieve this, we acquire an RGB image of the scene and compute its super-pixels, from which we generate a spatial mask of locations where we measure the high-resolution spectrum. The hyperspectral image is subsequently estimated by fusing the RGB image and the spectral measurements using a learnable guided filtering approach. Due to the low computational complexity of the superpixel estimation step, our setup can capture hyperspectral images of the scenes with little overhead over traditional snapshot hyperspectral cameras, but with significantly higher spatial and spectral resolutions. We validate the proposed technique with extensive simulations as well as a lab prototype that measures hyperspectral video at a spatial resolution of $600 \times 900$ pixels, at a spectral resolution of 10 nm over visible wavebands, and at a frame rate of $18$ fps.

preprint, 2016, arXiv

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Deep learning using convolutional neural networks (CNNs) is quickly becoming the state-of-the-art for challenging computer vision applications. However, deep learning's power consumption and bandwidth requirements currently limit its application in embedded and mobile systems with tight energy budgets. In this paper, we explore the energy savings of optically computing the first layer of CNNs. To do so, we utilize bio-inspired Angle Sensitive Pixels (ASPs), custom CMOS diffractive image sensors which act similarly to Gabor filter banks in the V1 layer of the human visual cortex. ASPs replace both image sensing and the first layer of a conventional CNN by directly performing optical edge filtering, saving sensing energy, data bandwidth, and CNN FLOPS to compute. Our experimental results (both on synthetic data and a hardware prototype) for a variety of vision tasks such as digit recognition, object recognition, and face identification demonstrate that using ASPs achieves performance similar to that of traditional deep learning pipelines.

preprint, 2016, arXiv

FlatCam: Thin, Bare-Sensor Cameras using Coded Aperture and Computation

FlatCam is a thin form-factor lensless camera that consists of a coded mask placed on top of a bare, conventional sensor array. Unlike a traditional, lens-based camera where an image of the scene is directly recorded on the sensor pixels, each pixel in FlatCam records a linear combination of light from multiple scene elements. A computational algorithm is then used to demultiplex the recorded measurements and reconstruct an image of the scene. FlatCam is an instance of a coded aperture imaging system; however, unlike the vast majority of related work, we place the coded mask extremely close to the image sensor, which enables a thin system. We employ a separable mask to ensure that both calibration and image reconstruction are scalable in terms of memory requirements and computational complexity. We demonstrate the potential of the FlatCam design using two prototypes: one at visible wavelengths and one at infrared wavelengths.
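The benefit of a separable mask can be illustrated with a toy model in which the measurements are Y = H X H^T for a small matrix H. This is a sketch under stated assumptions: the 4x4 Hadamard matrix with +1/-1 entries stands in for a mask pattern, the model is noise-free, and the direct inversion shown here is a simplification, not the reconstruction algorithm used by FlatCam. All names are hypothetical.

```python
# Illustrative 4x4 Hadamard pattern with +1/-1 entries (an assumption for this
# sketch). Its rows are orthogonal, so H^T H = 4 I and inversion is trivial.
H = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def measure(scene):
    """Separable multiplexing: every sensor pixel mixes many scene elements."""
    return matmul(matmul(H, scene), transpose(H))

def reconstruct(measurements):
    """Demultiplex by applying the transposes and dividing by n^2."""
    n = len(H)
    unmixed = matmul(matmul(transpose(H), measurements), H)
    return [[v / (n * n) for v in row] for row in unmixed]

scene = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 8.0, 4.0, 0.0],
    [0.0, 2.0, 6.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
recovered = reconstruct(measure(scene))
print(recovered == scene)
```

For an n x n scene, the separable model stores two n x n matrices, whereas a general linear model would need a single n^2 x n^2 matrix (16 x 16 in this toy case). That difference is what keeps calibration and reconstruction tractable as resolution grows.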

preprint, 2016, arXiv

TabletGaze: Unconstrained Appearance-based Gaze Estimation in Mobile Tablets

We study gaze estimation on tablets. Our key design goal is uncalibrated gaze estimation using the front-facing camera during natural use of tablets, where the posture and method of holding the tablet are not constrained. We collected the first large unconstrained gaze dataset of tablet users, labeled the Rice TabletGaze dataset. The dataset consists of 51 subjects, each with 4 different postures and 35 gaze locations. Subjects vary in race, gender and in their need for prescription glasses, all of which might impact gaze estimation accuracy. Driven by our observations on the collected data, we present a TabletGaze algorithm for automatic gaze estimation using multi-level HoG features and a Random Forests regressor. The TabletGaze algorithm achieves a mean error of 3.17 cm. We perform extensive evaluation of the impact of various factors such as dataset size, race, wearing glasses and user posture on the gaze estimation accuracy and make important observations about the impact of these factors.

preprint, 2015, arXiv

Depth Fields: Extending Light Field Techniques to Time-of-Flight Imaging

A variety of techniques such as light field, structured illumination, and time-of-flight (TOF) are commonly used for depth acquisition in consumer imaging, robotics and many other applications. Unfortunately, each technique suffers from its individual limitations preventing robust depth sensing. In this paper, we explore the strengths and weaknesses of combining light field and time-of-flight imaging, particularly the feasibility of an on-chip implementation as a single hybrid depth sensor. We refer to this combination as depth field imaging. Depth fields combine light field advantages such as synthetic aperture refocusing with TOF imaging advantages such as high depth resolution and coded signal processing to resolve multipath interference. We show applications including synthesizing virtual apertures for TOF imaging, improved depth mapping through partial and scattering occluders, and single frequency TOF phase unwrapping. Utilizing space, angle, and temporal coding, depth fields can improve depth sensing in the wild and generate new insights into the dimensions of light's plenoptic function.

preprint, 2015, arXiv

DistancePPG: Robust non-contact vital signs monitoring using a camera

Vital signs such as pulse rate and breathing rate are currently measured using contact probes. However, non-contact methods for measuring vital signs are desirable both in hospital settings (e.g. in NICU) and for ubiquitous in-situ health tracking (e.g. on mobile phones and computers with webcams). Recently, camera-based non-contact vital sign monitoring has been shown to be feasible. However, camera-based vital sign monitoring is challenging for people with darker skin tone, under low lighting conditions, and/or during movement of an individual in front of the camera. In this paper, we propose distancePPG, a new camera-based vital sign estimation algorithm which addresses these challenges. DistancePPG proposes a new method of combining skin-color change signals from different tracked regions of the face using a weighted average, where the weights depend on the blood perfusion and incident light intensity in the region, to improve the signal-to-noise ratio (SNR) of the camera-based estimate. One of our key contributions is a new automatic method for determining the weights based only on the video recording of the subject. The gains in SNR of camera-based PPG estimated using distancePPG translate into a reduction of the error in vital sign estimation, and thus expand the scope of camera-based vital sign monitoring to potentially challenging scenarios. Further, a dataset will be released, comprising synchronized video recordings of the face and pulse oximeter based ground truth recordings from the earlobe for people with different skin tones, under different lighting conditions and for various motion scenarios.
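The weighted-average combination described above can be sketched as follows. The abstract does not say how the weights are computed from the video, so this sketch simply takes them as given inputs; the function name and the example numbers are hypothetical.

```python
def combine_regions(region_signals, weights):
    """Weighted average of per-region signals, one output value per frame.

    region_signals: list of equal-length sample lists, one per tracked region.
    weights: one non-negative weight per region (assumed to be supplied).
    """
    if len(region_signals) != len(weights):
        raise ValueError("need exactly one weight per region")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_frames = len(region_signals[0])
    return [
        sum(w * s[t] for w, s in zip(weights, region_signals)) / total
        for t in range(n_frames)
    ]

# Two regions over two frames; the second region is trusted three times as much.
combined = combine_regions([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
print(combined)
```

Regions with a strong signal and little noise get large weights and dominate the estimate, while noisy regions contribute little, which is how a weighted average can improve SNR over a plain mean.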

preprint, 2015, arXiv

FPA-CS: Focal Plane Array-based Compressive Imaging in Short-wave Infrared

Cameras for imaging in short and mid-wave infrared spectra are significantly more expensive than their counterparts in visible imaging. As a result, high-resolution imaging in those spectra remains beyond the reach of most consumers. Over the last decade, compressive sensing (CS) has emerged as a potential means to realize inexpensive short-wave infrared cameras. One approach for doing this is the single-pixel camera (SPC) where a single detector acquires coded measurements of a high-resolution image. A computational reconstruction algorithm is then used to recover the image from these coded measurements. Unfortunately, the measurement rate of an SPC is insufficient to enable imaging at high spatial and temporal resolutions. We present a focal plane array-based compressive sensing (FPA-CS) architecture that achieves high spatial and temporal resolutions. The idea is to use an array of SPCs that sense in parallel to increase the measurement rate, and consequently, the achievable spatio-temporal resolution of the camera. We develop a proof-of-concept prototype in the short-wave infrared using a sensor with $64 \times 64$ pixels; the prototype provides a $4096\times$ increase in the measurement rate compared to the SPC and achieves a megapixel resolution at video rate using CS techniques.

preprint, 2015, arXiv

Spatial Phase-Sweep: Increasing temporal resolution of transient imaging using a light source array

Transient imaging or light-in-flight techniques capture the propagation of an ultra-short pulse of light through a scene, which in effect captures the optical impulse response of the scene. Recently, it has been shown that we can capture transient images using commercially available Time-of-Flight (ToF) systems such as Photonic Mixer Devices (PMD). In this paper, we propose 'spatial phase-sweep', a technique that exploits the speed of light to increase the temporal resolution beyond the 100 picosecond limit imposed by current electronics. Spatial phase-sweep uses a linear array of light sources with spatial separation of about 3 mm between them, thereby resulting in a time shift of about 10 picoseconds, which translates into 100 Gfps of transient imaging in theory. We demonstrate a prototype and transient imaging results using spatial phase-sweep.
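The figures quoted above follow from dividing the source separation by the speed of light. The short check below assumes propagation at the vacuum speed of light and that the extra path length equals the 3 mm separation; both are simplifying assumptions, and the names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def time_shift_seconds(separation_m):
    """Extra travel time for light covering an additional separation_m metres."""
    return separation_m / SPEED_OF_LIGHT_M_PER_S

shift_s = time_shift_seconds(3e-3)   # 3 mm between adjacent light sources
frame_rate_hz = 1.0 / shift_s        # one transient frame per time shift

# Prints roughly 10 ps and roughly 100 Gfps, matching the abstract's figures.
print(f"time shift: {shift_s * 1e12:.2f} ps")
print(f"frame rate: {frame_rate_hz / 1e9:.1f} Gfps")
```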

preprint, 2015, arXiv

Toward Long Distance, Sub-diffraction Imaging Using Coherent Camera Arrays

In this work, we propose using camera arrays coupled with coherent illumination as an effective method of improving spatial resolution in long distance images by a factor of ten and beyond. Recent advances in ptychography have demonstrated that one can image beyond the diffraction limit of the objective lens in a microscope. We demonstrate a similar imaging system to image beyond the diffraction limit in long range imaging. We emulate a camera array with a single camera attached to an X-Y translation stage. We show that an appropriate phase retrieval based reconstruction algorithm can be used to effectively recover the lost high resolution details from the multiple low resolution acquired images. We analyze the effects of noise, required degree of image overlap, and the effect of increasing synthetic aperture size on the reconstructed image quality. We show that coherent camera arrays have the potential to greatly improve imaging performance. Our simulations show resolution gains of 10x and more are achievable. Furthermore, experimental results from our proof-of-concept systems show resolution gains of 4x-7x for real scenes. Finally, we introduce and analyze in simulation a new strategy to capture macroscopic Fourier Ptychography images in a single snapshot, albeit using a camera array.

preprint, 2014, arXiv

A Framework for the Analysis of Computational Imaging Systems with Practical Applications

Over the last decade, a number of Computational Imaging (CI) systems have been proposed for tasks such as motion deblurring, defocus deblurring and multispectral imaging. These techniques increase the amount of light reaching the sensor via multiplexing and then undo the deleterious effects of multiplexing by appropriate reconstruction algorithms. Given the widespread appeal and the considerable enthusiasm generated by these techniques, a detailed performance analysis of the benefits conferred by this approach is important. Unfortunately, a detailed analysis of CI has proven to be a challenging problem because performance depends equally on three components: (1) the optical multiplexing, (2) the noise characteristics of the sensor, and (3) the reconstruction algorithm. A few recent papers have performed analysis taking multiplexing and noise characteristics into account. However, analysis of CI systems under state-of-the-art reconstruction algorithms, most of which exploit signal prior models, has proven to be unwieldy. In this paper, we present a comprehensive analysis framework incorporating all three components. In order to perform this analysis, we model the signal priors using a Gaussian Mixture Model (GMM). A GMM prior confers two unique characteristics. First, a GMM satisfies the universal approximation property, which says that any prior density function can be approximated to any fidelity using a GMM with an appropriate number of mixtures. Second, a GMM prior lends itself to analytical tractability, allowing us to derive simple expressions for the 'minimum mean square error' (MMSE), which we use as a metric to characterize the performance of CI systems. We use our framework to analyze several previously proposed CI techniques, giving a conclusive answer to the question: 'How much performance gain is due to use of a signal prior and how much is due to multiplexing?'

preprint, 2014, arXiv

Fast Sublinear Sparse Representation using Shallow Tree Matching Pursuit

Sparse approximation using highly over-complete dictionaries is a state-of-the-art tool for many imaging applications including denoising, super-resolution, compressive sensing, light-field analysis, and object recognition. Unfortunately, the applicability of such methods is severely hampered by the computational burden of sparse approximation: these algorithms are linear or super-linear in both the data dimensionality and size of the dictionary. We propose a framework for learning the hierarchical structure of over-complete dictionaries that enables fast computation of sparse representations. Our method builds on tree-based strategies for nearest neighbor matching, and presents domain-specific enhancements that are highly efficient for the analysis of image patches. Contrary to most popular methods for building spatial data structures, our methods rely on shallow, balanced trees with relatively few layers. We show an extensive array of experiments on several applications such as image denoising/superresolution and compressive video/light-field sensing, where we practically achieve a 100-1000x speedup (with a less than 1dB loss in accuracy).

preprint, 2012, arXiv

Reconstruction of hidden 3D shapes using diffuse reflections

We analyze multi-bounce propagation of light in an unknown hidden volume and demonstrate that the reflected light contains sufficient information to recover the 3D structure of the hidden scene. We formulate the forward and inverse theory of secondary and tertiary scattering reflection using ideas from energy front propagation and tomography. We show that a careful choice of approximations, such as the Fresnel approximation, greatly simplifies this problem and that the inversion can be achieved via a backpropagation process. We provide a theoretical analysis of the invertibility, uniqueness and choices of space-time-angle dimensions using synthetic examples. We show that a 2D streak camera can be used to discover and reconstruct hidden geometry. Using a 1D high speed time of flight camera, we show that our method can be used to recover 3D shapes of objects "around the corner".

preprint, 2011, arXiv

Progressive versus Random Projections for Compressive Capture of Images, Lightfields and Higher Dimensional Visual Signals

Computational photography involves sophisticated capture methods. A new trend is to capture projections of higher dimensional visual signals such as videos, multi-spectral data and lightfields on lower dimensional sensors. Carefully designed capture methods exploit the sparsity of the underlying signal in a transformed domain to reduce the number of measurements and use an appropriate reconstruction method. Traditional progressive methods may capture successively more detail using a sequence of simple projection bases, such as DCT or wavelets, and employ straightforward backprojection for reconstruction. Randomized projection methods do not use any specific sequence and use L0 minimization for reconstruction. In this paper, we analyze the statistical properties of natural images, videos, multi-spectral data and light-fields and compare the effectiveness of progressive and random projections. We define effectiveness by plotting reconstruction SNR against compression factor. The key idea is a procedure to measure best-case effectiveness that is fast, independent of specific hardware and independent of the reconstruction procedure. We believe this is the first empirical study to compare different lossy capture strategies without the complication of hardware or reconstruction ambiguity. The scope is limited to linear non-adaptive sensing. The results show that random projections produce significant advantages over other projections only for higher dimensional signals, and suggest more research into nascent adaptive and non-linear projection methods.
