Source author record

Rabab Ward

Rabab Ward appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computation and Language eess.IV eess.SP Information Retrieval Neural and Evolutionary Computing Robotics

Catalog footprint

What is connected

5works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation

This paper addresses the problem of cross-dataset generalization of 3D human pose estimation models. Testing a pre-trained 3D pose estimator on a new dataset results in a major performance drop. Previous methods have mainly addressed this problem by improving the diversity of the training data. We argue that diversity alone is not sufficient and that the characteristics of the training data need to be adapted to those of the new dataset such as camera viewpoint, position, human actions, and body size. To this end, we propose AdaptPose, an end-to-end framework that generates synthetic 3D human motions from a source dataset and uses them to fine-tune a 3D pose estimator. AdaptPose follows an adversarial training scheme. From a source 3D pose the generator generates a sequence of 3D poses and a camera orientation that is used to project the generated poses to a novel view. Without any 3D labels or camera information AdaptPose successfully learns to create synthetic 3D poses from the target dataset while only being trained on 2D poses. In experiments on the Human3.6M, MPI-INF-3DHP, 3DPW, and Ski-Pose datasets our method outperforms previous work in cross-dataset evaluations by 14% and previous semi-supervised learning methods that use partial 3D annotations by 16%.

preprint2022arXiv

Multi-modal Streaming 3D Object Detection

Modern autonomous vehicles rely heavily on mechanical LiDARs for perception. Current perception methods generally require 360° point clouds, collected sequentially as the LiDAR scans the azimuth and acquires consecutive wedge-shaped slices. The acquisition latency of a full scan (~ 100ms) may lead to outdated perception which is detrimental to safe operation. Recent streaming perception works proposed directly processing LiDAR slices and compensating for the narrow field of view (FOV) of a slice by reusing features from preceding slices. These works, however, are all based on a single modality and require past information which may be outdated. Meanwhile, images from high-frequency cameras can support streaming models as they provide a larger FoV compared to a LiDAR slice. However, this difference in FoV complicates sensor fusion. To address this research gap, we propose an innovative camera-LiDAR streaming 3D object detection framework that uses camera images instead of past LiDAR slices to provide an up-to-date, dense, and wide context for streaming perception. The proposed method outperforms prior streaming models on the challenging NuScenes benchmark. It also outperforms powerful full-scan detectors while being much faster. Our method is shown to be robust to missing camera images, narrow LiDAR slices, and small camera-LiDAR miscalibration.

preprint2020arXiv

Epileptic Seizure Prediction: A Semi-Dilated Convolutional Neural Network Architecture

Accurate prediction of epileptic seizures has remained elusive, despite the many advances in machine learning and time-series classification. In this work, we develop a convolutional network module that exploits Electroencephalogram (EEG) scalograms to distinguish between the pre-seizure and normal brain activities. Since these scalograms have rectangular image shapes with many more temporal bins than spectral bins, the presented module uses "semi-dilated convolutions" to create a proportional non-square receptive field. The proposed semi-dilated convolutions support exponential expansion of the receptive field over the long dimension (image width, i.e. time) while maintaining high resolution over the short dimension (image height, i.e., frequency). The proposed architecture comprises a set of co-operative semi-dilated convolutional blocks, each block has a stack of parallel semi-dilated convolutional modules with different dilation rates. Results show that our proposed solution outperforms the state-of-the-art methods, achieving seizure prediction sensitivity scores of 88.45% and 89.52% for the American Epilepsy Society and Melbourne University EEG datasets, respectively.

preprint2016arXiv

Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval

This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks with Long Short-Term Memory (LSTM) cells. Due to its ability to capture long term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate the unimportant words and detects the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state of the art methods. We emphasize that the proposed model generates sentence embedding vectors that are specially useful for web document retrieval tasks. A comparison with a well known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method in this paper significantly outperforms it for web document retrieval task.

preprint2016arXiv

Distributed Compressive Sensing: A Deep Learning Approach

Various studies that address the compressed sensing problem with Multiple Measurement Vectors (MMVs) have been recently carried. These studies assume the vectors of the different channels to be jointly sparse. In this paper, we relax this condition. Instead we assume that these sparse vectors depend on each other but that this dependency is unknown. We capture this dependency by computing the conditional probability of each entry in each vector being non-zero, given the "residuals" of all previous vectors. To estimate these probabilities, we propose the use of the Long Short-Term Memory (LSTM)[1], a data driven model for sequence modelling that is deep in time. To calculate the model parameters, we minimize a cross entropy cost function. To reconstruct the sparse vectors at the decoder, we propose a greedy solver that uses the above model to estimate the conditional probabilities. By performing extensive experiments on two real world datasets, we show that the proposed method significantly outperforms the general MMV solver (the Simultaneous Orthogonal Matching Pursuit (SOMP)) and a number of the model-based Bayesian methods. The proposed method does not add any complexity to the general compressive sensing encoder. The trained model is used just at the decoder. As the proposed method is a data driven method, it is only applicable when training data is available. In many applications however, training data is indeed available, e.g. in recorded images and videos.

Rabab Ward

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation

Multi-modal Streaming 3D Object Detection

Epileptic Seizure Prediction: A Semi-Dilated Convolutional Neural Network Architecture

Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval

Distributed Compressive Sensing: A Deep Learning Approach