Source author record

Huadong Ma

Huadong Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; Computer Science and Game Theory; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Machine Learning; Networking and Internet Architecture

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators


Preprint, 2024, arXiv

Multi-Stage Contrastive Regression for Action Quality Assessment

In recent years, there has been growing interest in video-based action quality assessment (AQA). Most existing methods address the AQA problem by considering the entire video while overlooking the inherent stage-level characteristics of actions. To address this issue, we design a novel Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. By segmenting the input video into multiple stages or procedures, this approach extracts spatial-temporal information efficiently while reducing computational cost. Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance. As a result, MCoRe achieves state-of-the-art results on the widely adopted fine-grained AQA dataset.

Preprint, 2024, arXiv

Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction

Human trajectory forecasting is a critical challenge in fields such as robotics and autonomous driving. Due to the inherent uncertainty of human actions and intentions in real-world scenarios, various unexpected occurrences may arise. To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, named Motion Pattern Priors Memory Network. Our method constructs a memory bank from clustered prior knowledge of the motion patterns observed in the training-set trajectories. We introduce an addressing mechanism that retrieves, for each prediction, the matched pattern and the potential target distributions from the memory bank. This enables the identification and retrieval of the natural motion patterns exhibited by agents. The target priors memory token is then used to guide the diffusion model in generating predictions. Extensive experiments validate the effectiveness of our approach, which achieves state-of-the-art trajectory prediction accuracy. The code will be made publicly available.
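The memory-bank lookup described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the addressing mechanism, so everything below is an assumption made for illustration: the function names (`cosine`, `softmax`, `address_memory`), the use of cosine similarity as the matching score, and the softmax weighting over memory entries are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def address_memory(query, memory_keys):
    """Hypothetical addressing step: score the query against every stored
    pattern key, return the index of the best match and soft weights over
    all entries. Cosine similarity is an assumed choice, not the paper's."""
    scores = [cosine(query, key) for key in memory_keys]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, softmax(scores)


# Toy memory bank of three 2-D pattern keys (e.g. cluster centres).
memory_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
best, weights = address_memory([0.9, 0.1], memory_keys)
```

In this toy setup the query points mostly along the first key, so the first entry is selected; the soft weights could stand in for a distribution over candidate patterns.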

Preprint, 2023, arXiv

Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer

Video denoising aims to recover high-quality frames from a noisy video. Most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content; however, CNNs focus on local information and ignore the interactions between long-range regions in the frame. Furthermore, most related works directly take the output of basic spatio-temporal denoising as the final result, neglecting the fine-grained denoising process. In this paper, we propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising, which inherits the advantages of both Transformers and CNNs. Specifically, DSCT is based on a progressive dual-stage architecture, namely a coarse-level stage and a fine-level stage, which extract dynamic features and static features, respectively. At both stages, a Spatial-Channel Encoding Module is designed to model long-range contextual dependencies at both the spatial and channel levels. Meanwhile, we design a Multi-Scale Residual Structure to preserve multiple aspects of information at different stages, which contains a Temporal Features Aggregation Module to summarize the dynamic representation. Extensive experiments on four publicly available datasets demonstrate that our proposed method achieves significant improvements over state-of-the-art methods.

Preprint, 2020, arXiv

A Real-time Action Representation with Temporal Encoding and Deep Compression

Deep neural networks have achieved remarkable success in video-based action recognition. However, most existing approaches cannot be deployed in practice because of their high computational cost. To address this challenge, we propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation. T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining a high processing speed. Specifically, we propose a residual 3D Convolutional Neural Network (CNN) to capture complementary information on the appearance of a single frame and the motion between consecutive frames. Based on this CNN, we develop a new temporal encoding method to explore the temporal dynamics of the whole video. Furthermore, we integrate deep compression techniques with T-C3D to further accelerate the deployment of models by reducing the model size. By these means, heavy calculations can be avoided during inference, which enables the method to process videos faster than real time while keeping promising performance. On the UCF101 action recognition benchmark, our method improves on state-of-the-art real-time methods by 5.4% in accuracy and is 2 times faster at inference, with a model that requires less than 5MB of storage. We validate our approach by studying its action representation performance on four different benchmarks over three different tasks. Extensive experiments demonstrate recognition performance comparable to that of state-of-the-art methods. The source code and the pre-trained models are publicly available at https://github.com/tc3d.

Preprint, 2020, arXiv

Language Guided Networks for Cross-modal Moment Retrieval

We address the challenging task of cross-modal moment retrieval, which aims to localize a temporal segment from an untrimmed video described by a natural language query. The task poses great challenges for proper semantic alignment between the visual and linguistic domains. Existing methods extract the features of videos and sentences independently and use the sentence embedding only in the multi-modal fusion stage, which does not make full use of the potential of language. In this paper, we present Language Guided Networks (LGN), a new framework that leverages the sentence embedding to guide the whole process of moment retrieval. In the first stage, feature extraction, we propose to jointly learn visual and language features to capture the powerful visual information that can cover the complex semantics in the sentence query. Specifically, the early modulation unit is designed to modulate the visual feature extractor's feature maps by a linguistic embedding. Then we adopt a multi-modal fusion module in the second stage, fusion. Finally, to obtain a precise localizer, the sentence information is used to guide the process of predicting temporal positions. Specifically, the late guidance module is developed to linearly transform the output of the localization networks via the channel attention mechanism. Experimental results on two popular datasets demonstrate the superior performance of our proposed method on moment retrieval (improving by 5.8% in terms of Rank1@IoU0.5 on Charades-STA and 5.2% on TACoS). The source code for the complete system will be publicly available.

Preprint, 2020, arXiv

MemNet: Memory-Efficiency Guided Neural Architecture Search with Augment-Trim learning

Recent studies on automatic neural architecture search have demonstrated significant performance, competitive with or even better than hand-crafted neural architectures. However, most existing network architectures tend to use residual and parallel structures and concatenation blocks between shallow and deep features to construct a large network. This requires large amounts of memory for storing both weights and feature maps, which is challenging for mobile and embedded devices, since they may not have enough memory to perform inference with the resulting large network model. To close this gap, we propose MemNet, an augment-trim learning-based neural network search framework that optimizes not only performance but also memory requirement. Specifically, it employs a memory-consumption-based ranking score that enforces an upper bound on memory consumption to guide the search process. Experimental results show that, compared with state-of-the-art efficient design methods, MemNet can find an architecture that achieves competitive accuracy and saves an average of 24.17% of the total memory needed.

Preprint, 2014, arXiv

Frugal Online Incentive Mechanisms for Crowdsourcing Tasks Truthfully

Mobile Crowd Sensing (MCS) is a new paradigm that takes advantage of pervasive smartphones to efficiently collect data, enabling numerous novel applications. To achieve good service quality for an MCS application, incentive mechanisms are necessary to attract more user participation. Most existing mechanisms apply only to the offline scenario, where all users' information is known a priori. In contrast, we focus on a more realistic scenario where users arrive one by one online in a random order. Based on the online auction model, we investigate the problem in which users submit their private profiles to the crowdsourcer when they arrive, and the crowdsourcer aims to select a subset of users before a specified deadline so as to minimize the total payment while a specific number of tasks can be completed. We design three online mechanisms, Homo-OMZ, Hetero-OMZ and Hetero-OMG, all of which satisfy computational efficiency, individual rationality, cost-truthfulness, and consumer sovereignty. The Homo-OMZ mechanism is applicable to the homogeneous user model and satisfies social efficiency but not constant frugality. The Hetero-OMZ and Hetero-OMG mechanisms are applicable to both the homogeneous and heterogeneous user models, and satisfy constant frugality. In addition, the Hetero-OMG mechanism also satisfies time-truthfulness. Through extensive simulations, we evaluate the performance and validate the theoretical properties of our online mechanisms.

Preprint, 2014, arXiv

On Exploiting Hotspot and Entropy for Data Forwarding in Delay Tolerant Networks

The performance of data forwarding in Delay Tolerant Networks (DTNs) benefits considerably if one can make use of human mobility in terms of social structures. However, it is difficult and time-consuming to calculate the centrality and similarity of nodes using solutions designed for traditional social networks, mainly because of the transient node contacts and the intermittently connected environment. In this work, we are interested in the following question: Can we explore some other stable social attributes to quantify the centrality and similarity of nodes? Taking GPS traces of human walks from the real world, we find that there exist two known phenomena: one is the public hotspot, the other is the personal hotspot. Motivated by this observation, we present Hoten (hotspot and entropy), a novel routing metric to improve routing performance in DTNs. First, we use the relative entropy between the public hotspots and the personal hotspots to compute the centrality of nodes. Second, we use the inverse symmetrized entropy of the personal hotspots between two nodes to compute the similarity between them. Third, we exploit the entropy of the personal hotspots of a node to estimate its personality. In addition, we propose a method to ascertain the optimized size of a hotspot. Finally, we compare our routing strategy with other state-of-the-art routing schemes through extensive trace-driven simulations. The results show that Hoten largely outperforms other solutions, especially in terms of combined overhead/packet delivery ratio and the average number of hops per message.
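The three entropy quantities named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: hotspots are modeled here as visit counts over a fixed set of locations, "relative entropy" is taken to be the Kullback-Leibler divergence, the symmetrized entropy is taken to be the sum of the two directed divergences, and the direction of the divergence as well as all function names are hypothetical. How the paper maps these values onto a routing decision is not stated in the abstract and is not modeled.

```python
import math

EPS = 1e-12  # guards against log(0) and division by zero (an assumed choice)


def normalize(counts):
    """Turn visit counts per hotspot into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy of a distribution (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence KL(p || q); zero when p equals q."""
    return sum(x * math.log(x / max(y, EPS)) for x, y in zip(p, q) if x > 0)


def symmetrized_entropy(p, q):
    """Sum of both directed divergences, so the result is order-independent."""
    return relative_entropy(p, q) + relative_entropy(q, p)


def similarity(p, q):
    """Inverse symmetrized entropy: closer hotspot profiles score higher."""
    return 1.0 / (symmetrized_entropy(p, q) + EPS)


# Toy data: visit counts over three locations.
public = normalize([50, 30, 20])   # aggregate (public) hotspot profile
alice = normalize([45, 35, 20])    # personal profile close to the public one
bob = normalize([10, 10, 80])      # personal profile far from the public one

centrality_input_alice = relative_entropy(alice, public)
centrality_input_bob = relative_entropy(bob, public)
personality_alice = entropy(alice)
```

With these toy numbers, the profile labeled `alice` diverges less from the public profile than `bob` does, and `similarity(alice, public)` is larger than `similarity(bob, public)`.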

Preprint, 2013, arXiv

OMG: How Much Should I Pay Bob in Truthful Online Mobile Crowdsourced Sensing?

Mobile crowdsourced sensing (MCS) is a new paradigm that takes advantage of pervasive smartphones to efficiently collect data, enabling numerous novel applications. To achieve good service quality for an MCS application, incentive mechanisms are necessary to attract more user participation. Most existing mechanisms apply only to the offline scenario, where all users' information is known a priori. In contrast, we focus on a more realistic scenario where users arrive one by one online in a random order. We model the problem as an online auction in which the users submit their private types to the crowdsourcer over time, and the crowdsourcer aims to select a subset of users before a specified deadline so as to maximize the total value of the services provided by the selected users under a budget constraint. We design two online mechanisms, OMZ and OMG, which satisfy computational efficiency, individual rationality, budget feasibility, truthfulness, consumer sovereignty and constant competitiveness under the zero arrival-departure interval case and a more general case, respectively. Through extensive simulations, we evaluate the performance and validate the theoretical properties of our online mechanisms.
