Source author record

Stephen Gould

Stephen Gould appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computation and Language math.CO Robotics Artificial Intelligence Human-Computer Interaction math.OC physics.optics

Catalog footprint

What is connected

27works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Almost all optimally coloured complete graphs contain a rainbow Hamilton path

A subgraph $H$ of an edge-coloured graph is called rainbow if all of the edges of $H$ have different colours. In 1989, Andersen conjectured that every proper edge-colouring of $K_{n}$ admits a rainbow path of length $n-2$. We show that almost all optimal edge-colourings of $K_{n}$ admit both (i) a rainbow Hamilton path and (ii) a rainbow cycle using all of the colours. This result demonstrates that Andersen's Conjecture holds for almost all optimal edge-colourings of $K_{n}$ and answers a recent question of Ferber, Jain, and Sudakov. Our result also has applications to the existence of transversals in random symmetric Latin squares.

preprint2022arXiv

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

Most existing works in vision-and-language navigation (VLN) focus on either discrete or continuous environments, training agents that cannot generalize across the two. The fundamental difference between the two setups is that discrete navigation assumes prior knowledge of the connectivity graph of the environment, so that the agent can effectively transfer the problem of navigation with low-level controls to jumping from node to node with high-level actions by grounding to an image of a navigable direction. To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments. We refine the connectivity graph of Matterport3D to fit the continuous Habitat-Matterport3D, and train the waypoints predictor with the refined graphs to produce accessible waypoints at each time step. Moreover, we demonstrate that the predicted waypoints can be augmented during training to diversify the views and paths, and therefore enhance agent's generalization ability. Through extensive experiments we show that agents navigating in continuous environments with predicted waypoints perform significantly better than agents using low-level actions, which reduces the absolute discrete-to-continuous gap by 11.76% Success Weighted by Path Length (SPL) for the Cross-Modal Matching Agent and 18.24% SPL for the Recurrent VLN-BERT. Our agents, trained with a simple imitation learning objective, outperform previous methods by a large margin, achieving new state-of-the-art results on the testing environments of the R2R-CE and the RxR-CE datasets.

preprint2022arXiv

Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.

preprint2022arXiv

Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies

Deep declarative networks and other recent related works have shown how to differentiate the solution map of a (continuous) parametrized optimization problem, opening up the possibility of embedding mathematical optimization problems into end-to-end learnable models. These differentiability results can lead to significant memory savings by providing an expression for computing the derivative without needing to unroll the steps of the forward-pass optimization procedure during the backward pass. However, the results typically require inverting a large Hessian matrix, which is computationally expensive when implemented naively. In this work we study two applications of deep declarative networks -- robust vector pooling and optimal transport -- and show how problem structure can be exploited to obtain very efficient backward pass computations in terms of both time and memory. Our ideas can be used as a guide for improving the computational performance of other novel deep declarative nodes.

preprint2022arXiv

Hamilton transversals in random Latin squares

Gyárfás and Sárközy conjectured that every $n\times n$ Latin square has a `cycle-free' partial transversal of size $n-2$. We confirm this conjecture in a strong sense for almost all Latin squares, by showing that as $n \rightarrow \infty$, all but a vanishing proportion of $n\times n$ Latin squares have a Hamilton transversal, i.e. a full transversal for which any proper subset is cycle-free. In fact, we prove a counting result that in almost all Latin squares, the number of Hamilton transversals is essentially that of Taranenko's upper bound on the number of full transversals. This result strengthens a result of Kwan (which in turn implies that almost all Latin squares also satisfy the famous Ryser-Brualdi-Stein conjecture).

preprint2022arXiv

Learning to Structure an Image with Few Colors and Beyond

Color and structure are the two pillars that combine to give an image its meaning. Interested in critical structures for neural network recognition, we isolate the influence of colors by limiting the color space to just a few bits, and find structures that enable network recognition under such constraints. To this end, we propose a color quantization network, ColorCNN, which learns to structure an image in limited color spaces by minimizing the classification loss. Building upon the architecture and insights of ColorCNN, we introduce ColorCNN+, which supports multiple color space size configurations, and addresses the previous issues of poor recognition accuracy and undesirable visual fidelity under large color spaces. Via a novel imitation learning approach, ColorCNN+ learns to cluster colors like traditional color quantization methods. This reduces overfitting and helps both visual fidelity and recognition accuracy under large color spaces. Experiments verify that ColorCNN+ achieves very competitive results under most circumstances, preserving both key structures for network recognition and visual fidelity with accurate colors. We further discuss differences between key structures and accurate colors, and their specific contributions to network recognition. For potential applications, we show that ColorCNNs can be used as image compression methods for network recognition.

preprint2022arXiv

Multi-View Correlation Consistency for Semi-Supervised Semantic Segmentation

Semi-supervised semantic segmentation needs rich and robust supervision on unlabeled data. Consistency learning enforces the same pixel to have similar features in different augmented views, which is a robust signal but neglects relationships with other pixels. In comparison, contrastive learning considers rich pairwise relationships, but it can be a conundrum to assign binary positive-negative supervision signals for pixel pairs. In this paper, we take the best of both worlds and propose multi-view correlation consistency (MVCC) learning: it considers rich pairwise relationships in self-correlation matrices and matches them across views to provide robust supervision. Together with this correlation consistency loss, we propose a view-coherent data augmentation strategy that guarantees pixel-pixel correspondence between different views. In a series of semi-supervised settings on two datasets, we report competitive accuracy compared with the state-of-the-art methods. Notably, on Cityscapes, we achieve 76.8% mIoU with 1/8 labeled data, just 0.6% shy from the fully supervised oracle.

preprint2022arXiv

On the Strong Correlation Between Model Invariance and Generalization

Generalization and invariance are two essential properties of any machine learning model. Generalization captures a model's ability to classify unseen data while invariance measures consistency of model predictions on transformations of the data. Existing research suggests a positive relationship: a model generalizing well should be invariant to certain visual factors. Building on this qualitative implication we make two contributions. First, we introduce effective invariance (EI), a simple and reasonable measure of model invariance which does not rely on image labels. Given predictions on a test image and its transformed version, EI measures how well the predictions agree and with what level of confidence. Second, using invariance scores computed by EI, we perform large-scale quantitative correlation studies between generalization and invariance, focusing on rotation and grayscale transformations. From a model-centric view, we observe generalization and invariance of different models exhibit a strong linear relationship, on both in-distribution and out-of-distribution datasets. From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets. Apart from these major findings, other minor but interesting insights are also discussed.

preprint2021arXiv

Semantics for Robotic Mapping, Perception and Interaction: A Survey

For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what does the world "mean" to a robot, and is strongly tied to the question of how to represent that meaning. With humans and robots increasingly operating in the same world, the prospects of human-robot interaction also bring semantics and ontology of natural language into the picture. Driven by need, as well as by enablers like increasing availability of training data and computational resources, semantics is a rapidly growing research area in robotics. The field has received significant attention in the research literature to date, but most reviews and surveys have focused on particular aspects of the topic: the technical research issues regarding its use in specific robotic topics like mapping or segmentation, or its relevance to one particular application domain like autonomous driving. A new treatment is therefore required, and is also timely because so much relevant research has occurred since many of the key surveys were published. This survey therefore provides an overarching snapshot of where semantics in robotics stands today. We establish a taxonomy for semantics research in or relevant to robotics, split into four broad categories of activity, in which semantics are extracted, used, or both. Within these broad categories we survey dozens of major topics including fundamentals from the computer vision field and key robotics research areas utilizing semantics, including mapping, navigation and interaction with the world. The survey also covers key practical considerations, including enablers like increased data availability and improved computational hardware, and major application areas where...

preprint2020arXiv

A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews

Despite the recent advances in opinion mining for written reviews, few works have tackled the problem on other sources of reviews. In light of this issue, we propose a multi-modal approach for mining fine-grained opinions from video reviews that is able to determine the aspects of the item under review that are being discussed and the sentiment orientation towards them. Our approach works at the sentence level without the need for time annotations and uses features derived from the audio, video and language transcriptions of its contents. We evaluate our approach on two datasets and show that leveraging the video and audio modalities consistently provides increased performance over text-only baselines, providing evidence these extra modalities are key in better understanding video reviews.

preprint2020arXiv

A Signal Propagation Perspective for Pruning Neural Networks at Initialization

Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pruning starts by training a model and then removing redundant parameters while minimizing the impact on what is learned. Alternatively, a recent approach shows that pruning can be done at initialization prior to training, based on a saliency criterion called connection sensitivity. However, it remains unclear exactly why pruning an untrained, randomly initialized neural network is effective. In this work, by noting connection sensitivity as a form of gradient, we formally characterize initialization conditions to ensure reliable connection sensitivity measurements, which in turn yields effective pruning results. Moreover, we analyze the signal propagation properties of the resulting pruned networks and introduce a simple, data-free method to improve their trainability. Our modifications to the existing pruning at initialization method lead to improved results on all tested network models for image classification tasks. Furthermore, we empirically study the effect of supervision for pruning and demonstrate that our signal propagation perspective, combined with unsupervised pruning, can be useful in various scenarios where pruning is applied to non-standard arbitrarily-designed architectures.

preprint2020arXiv

ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking

One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets, typically done via a scoring function. Despite the great advances in MOT, designing a reliable scoring function remains a challenge. In this paper, we introduce a probabilistic autoregressive generative model to score tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion. One key property of our model is its ability to generate multiple likely futures of a tracklet given partial observations. This allows us to not only score tracklets but also effectively maintain existing tracklets when the detector fails to detect some objects even for a long time, e.g., due to occlusion, by sampling trajectories so as to inpaint the gaps caused by misdetection. Our experiments demonstrate the effectiveness of our approach to scoring and inpainting tracklets on several MOT benchmark datasets. We additionally show the generality of our generative model by using it to produce future representations in the challenging task of human motion prediction.

preprint2020arXiv

Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes

Existing networks directly learn feature representations on 3D point clouds for shape analysis. We argue that 3D point clouds are highly redundant and hold irregular (permutation-invariant) structure, which makes it difficult to achieve inter-class discrimination efficiently. In this paper, we propose a two-faceted solution to this problem that is seamlessly integrated in a single `Blended Convolution and Synthesis' layer. This fully differentiable layer performs two critical tasks in succession. In the first step, it projects the input 3D point clouds into a latent 3D space to synthesize a highly compact and more inter-class discriminative point cloud representation. Since, 3D point clouds do not follow a Euclidean topology, standard 2/3D Convolutional Neural Networks offer limited representation capability. Therefore, in the second step, it uses a novel 3D convolution operator functioning inside the unit ball ($\mathbb{B}^3$) to extract useful volumetric features. We extensively derive formulae to achieve both translation and rotation of our novel convolution kernels. Finally, using the proposed techniques we present an extremely light-weight, end-to-end architecture that achieves compelling results on 3D shape recognition and retrieval.

preprint2020arXiv

DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares

We propose a surface fitting method for unstructured 3D point clouds. This method, called DeepFit, incorporates a neural network to learn point-wise weights for weighted least squares polynomial surface fitting. The learned weights act as a soft selection for the neighborhood of surface points thus avoiding the scale selection required of previous methods. To train the network we propose a novel surface consistency loss that improves point weight estimation. The method enables extracting normal vectors and other geometrical properties, such as principal curvatures, the latter were not presented as ground truth during training. We achieve state-of-the-art results on a benchmark normal and curvature estimation dataset, demonstrate robustness to noise, outliers and density variations, and show its application on noise removal.

preprint2020arXiv

Enhanced Light-Matter Interactions in Dielectric Nanostructures via Machine Learning Approach

A key concept underlying the specific functionalities of metasurfaces, i.e. arrays of subwavelength nanoparticles, is the use of constituent components to shape the wavefront of the light, on-demand. Metasurfaces are versatile and novel platforms to manipulate the scattering, colour, phase or the intensity of the light. Currently, one of the typical approaches for designing a metasurface is to optimize one or two variables, among a vast number of fixed parameters, such as various materials' properties and coupling effects, as well as the geometrical parameters. Ideally, it would require a multi-dimensional space optimization through direct numerical simulations. Recently, an alternative approach became quite popular allowing to reduce the computational cost significantly based on a deep-learning-assisted method. In this paper, we utilize a deep-learning approach for obtaining high-quality factor (high-Q) resonances with desired characteristics, such as linewidth, amplitude and spectral position. We exploit such high-Q resonances for the enhanced light-matter interaction in nonlinear optical metasurfaces and optomechanical vibrations, simultaneously. We demonstrate that optimized metasurfaces lead up to 400+ folds enhancement of the third harmonic generation (THG); at the same time, they also contribute to 100+ folds enhancement in optomechanical vibrations. This approach can be further used to realize structures with unconventional scattering responses.

preprint2020arXiv

Inferring Temporal Compositions of Actions Using Probabilistic Automata

This paper presents a framework to recognize temporal compositions of atomic actions in videos. Specifically, we propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata to recognize complex actions as satisfying these expressions on the input video features. Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences. Instead, the proposed approach allows recognizing complex fine-grained activities using only pretrained action classifiers, without requiring any additional data, annotations or neural network training. To evaluate the potential of our approach, we provide experiments on synthetic datasets and challenging real action recognition datasets, such as MultiTHUMOS and Charades. We conclude that the proposed approach can extend state-of-the-art primitive action classifiers to vastly more complex activities without large performance degradation.

preprint2020arXiv

Joint Unsupervised Learning of Optical Flow and Egomotion with Bi-Level Optimization

We address the problem of joint optical flow and camera motion estimation in rigid scenes by incorporating geometric constraints into an unsupervised deep learning framework. Unlike existing approaches which rely on brightness constancy and local smoothness for optical flow estimation, we exploit the global relationship between optical flow and camera motion using epipolar geometry. In particular, we formulate the prediction of optical flow and camera motion as a bi-level optimization problem, consisting of an upper-level problem to estimate the flow that conforms to the predicted camera motion, and a lower-level problem to estimate the camera motion given the predicted optical flow. We use implicit differentiation to enable back-propagation through the lower-level geometric optimization layer independent of its implementation, allowing end-to-end training of the network. With globally-enforced geometric constraints, we are able to improve the quality of the estimated optical flow in challenging scenarios and obtain better camera motion estimates compared to other unsupervised learning methods.

preprint2020arXiv

Learning Variations in Human Motion via Mix-and-Match Perturbation

Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques.

preprint2020arXiv

Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention

This paper studies the problem of temporal moment localization in a long untrimmed video using natural language as the query. Given an untrimmed video and a sentence as the query, the goal is to determine the starting, and the ending, of the relevant visual moment in the video, that corresponds to the query sentence. While previous works have tackled this task by a propose-and-rank approach, we introduce a more efficient, end-to-end trainable, and {\em proposal-free approach} that relies on three key components: a dynamic filter to transfer language information to the visual domain, a new loss function to guide our model to attend the most relevant parts of the video, and soft labels to model annotation uncertainty. We evaluate our method on two benchmark datasets, Charades-STA and ActivityNet-Captions. Experimental results show that our approach outperforms state-of-the-art methods on both datasets.

preprint2020arXiv

Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization

Blind Perspective-n-Point (PnP) is the problem of estimating the position and orientation of a camera relative to a scene, given 2D image points and 3D scene points, without prior knowledge of the 2D-3D correspondences. Solving for pose and correspondences simultaneously is extremely challenging since the search space is very large. Fortunately it is a coupled problem: the pose can be found easily given the correspondences and vice versa. Existing approaches assume that noisy correspondences are provided, that a good pose prior is available, or that the problem size is small. We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors. We make use of recent results in differentiating optimization problems to incorporate geometric model fitting into an end-to-end learning framework, including Sinkhorn, RANSAC and PnP algorithms. Our proposed approach significantly outperforms other methods on synthetic and real data.

preprint2020arXiv

Spectral-GANs for High-Resolution 3D Point-cloud Generation

Point-clouds are a popular choice for vision and graphics tasks due to their accurate shape description and direct acquisition from range-scanners. This demands the ability to synthesize and reconstruct high-quality point-clouds. Current deep generative models for 3D data generally work on simplified representations (e.g., voxelized objects) and cannot deal with the inherent redundancy and irregularity in point-clouds. A few recent efforts on 3D point-cloud generation offer limited resolution and their complexity grows with the increase in output resolution. In this paper, we develop a principled approach to synthesize 3D point-clouds using a spectral-domain Generative Adversarial Network (GAN). Our spectral representation is highly structured and allows us to disentangle various frequency bands such that the learning task is simplified for a GAN model. As compared to spatial-domain generative approaches, our formulation allows us to generate arbitrary number of points high-resolution point-clouds with minimal computational overhead. Furthermore, we propose a fully differentiable block to transform from {the} spectral to the spatial domain and back, thereby allowing us to integrate knowledge from well-established spatial models. We demonstrate that Spectral-GAN performs well for point-cloud generation task. Additionally, it can learn {a} highly discriminative representation in an unsupervised fashion and can be used to accurately reconstruct 3D objects.

preprint2016arXiv

Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation

Pixel-level annotations are expensive and time consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately these priors either require training pixel-level annotations/bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract markedly more accurate masks from the pre-trained network itself, forgoing external objectness modules. This is accomplished using the activations of the higher-level convolutional layers, smoothed by a dense CRF. We demonstrate that our method, based on these masks and a weakly-supervised loss, outperforms the state-of-the-art tag-based weakly-supervised semantic segmentation techniques. Furthermore, we introduce a new form of inexpensive weak supervision yielding an additional accuracy boost.

preprint2016arXiv

On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem. Here the solution of a parameterized lower-level problem binds variables that appear in the objective of an upper-level problem. The lower-level problem typically appears as an argmin or argmax optimization problem. Many techniques have been proposed to solve bi-level optimization problems, including gradient descent, which is popular with current end-to-end learning approaches. In this technical report we collect some results on differentiating argmin and argmax optimization problems with and without constraints and provide some insightful motivating examples.

preprint2016arXiv

SPICE: Semantic Propositional Image Caption Evaluation

There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as `which caption-generator best understands colors?' and `can caption-generators count?'

preprint2015arXiv

Deep CNN Ensemble with Data Augmentation for Object Detection

We report on the methods used in our recent DeepEnsembleCoco submission to the PASCAL VOC 2012 challenge, which achieves state-of-the-art performance on the object detection task. Our method is a variant of the R-CNN model proposed Girshick:CVPR14 with two key improvements to training and evaluation. First, our method constructs an ensemble of deep CNN models with different architectures that are complementary to each other. Second, we augment the PASCAL VOC training set with images from the Microsoft COCO dataset to significantly enlarge the amount training data. Importantly, we select a subset of the Microsoft COCO images to be consistent with the PASCAL VOC task. Results on the PASCAL VOC evaluation server show that our proposed method outperform all previous methods on the PASCAL VOC 2012 detection task at time of submission.

preprint2015arXiv

Multi-Target Tracking with Time-Varying Clutter Rate and Detection Profile: Application to Time-lapse Cell Microscopy Sequences

Quantitative analysis of the dynamics of tiny cellular and sub-cellular structures, known as particles, in time-lapse cell microscopy sequences requires the development of a reliable multi-target tracking method capable of tracking numerous similar targets in the presence of high levels of noise, high target density, complex motion patterns and intricate interactions. In this paper, we propose a framework for tracking these structures based on the random finite set Bayesian filtering framework. We focus on challenging biological applications where image characteristics such as noise and background intensity change during the acquisition process. Under these conditions, detection methods usually fail to detect all particles and are often followed by missed detections and many spurious measurements with unknown and time-varying rates. To deal with this, we propose a bootstrap filter composed of an estimator and a tracker. The estimator adaptively estimates the required meta parameters for the tracker such as clutter rate and the detection probability of the targets, while the tracker estimates the state of the targets. Our results show that the proposed approach can outperform state-of-the-art particle trackers on both synthetic and real data in this regime.

preprint2012arXiv

Projected Subgradient Methods for Learning Sparse Gaussians

Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a high-dimensional space. Our approach uses the l1-norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and equal to the best performing of these in asymptotic complexity. We also extend the l1-regularized objective to the problem of sparsifying entire blocks within the inverse covariance matrix. Our methods generalize fairly easily to this case, while other methods do not. We demonstrate that our extensions give better generalization performance on two real domains--biological network analysis and a 2D-shape modeling image task.

Stephen Gould

What is connected

Connect this record

See the researcher in context

Building this map preview

27 published item(s)

Almost all optimally coloured complete graphs contain a rainbow Hamilton path

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies

Hamilton transversals in random Latin squares

Learning to Structure an Image with Few Colors and Beyond

Multi-View Correlation Consistency for Semi-Supervised Semantic Segmentation

On the Strong Correlation Between Model Invariance and Generalization

Semantics for Robotic Mapping, Perception and Interaction: A Survey

A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews

A Signal Propagation Perspective for Pruning Neural Networks at Initialization

ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking

Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes

DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares

Enhanced Light-Matter Interactions in Dielectric Nanostructures via Machine Learning Approach

Inferring Temporal Compositions of Actions Using Probabilistic Automata

Joint Unsupervised Learning of Optical Flow and Egomotion with Bi-Level Optimization

Learning Variations in Human Motion via Mix-and-Match Perturbation

Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention

Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization

Spectral-GANs for High-Resolution 3D Point-cloud Generation

Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation

On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

SPICE: Semantic Propositional Image Caption Evaluation

Deep CNN Ensemble with Data Augmentation for Object Detection

Multi-Target Tracking with Time-Varying Clutter Rate and Detection Profile: Application to Time-lapse Cell Microscopy Sequences

Projected Subgradient Methods for Learning Sparse Gaussians