Source author record

Mengmeng Wang

Mengmeng Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Computation and Language, Computer Science and Game Theory, Machine Learning, math-ph, math.MP, Multiagent Systems, Networking and Internet Architecture, quant-ph, Robotics

Catalog footprint

What is connected

14 works

10 topics

4 close collaborators

preprint, 2025, arXiv

BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts

Strategic dialogue requires agents to execute distinct dialogue acts, for which belief estimation is essential. While prior work often estimates beliefs accurately, it lacks a principled mechanism to use those beliefs during generation. We bridge this gap by first formalizing two core acts, Adversarial and Alignment, and by operationalizing them via probabilistic constraints on what an agent may generate. We instantiate this idea in BEDA, a framework that consists of the world set, the belief estimator for belief estimation, and the conditional generator that selects acts and realizes utterances consistent with the inferred beliefs. Across three settings, Conditional Keeper Burglar (CKBG, adversarial), Mutual Friends (MF, cooperative), and CaSiNo (negotiation), BEDA consistently outperforms strong baselines: on CKBG it improves success rate by at least 5.0 points across backbones and by 20.6 points with GPT-4.1-nano; on Mutual Friends it achieves an average improvement of 9.3 points; and on CaSiNo it achieves the optimal deal relative to all baselines. These results indicate that casting belief estimation as constraints provides a simple, general mechanism for reliable strategic dialogue.

preprint, 2023, arXiv

BSNet: Lane Detection via Draw B-spline Curves Nearby

Curve-based methods are one of the classic lane detection methods. They learn the holistic representation of lane lines, which is intuitive and concise. However, their performance lags behind the recent state-of-the-art methods due to the limitation of their lane representation and optimization. In this paper, we revisit the curve-based lane detection methods from the perspectives of the lane representations' globality and locality. The globality of lane representation is the ability to complete invisible parts of lanes with visible parts. The locality of lane representation is the ability to modify lanes locally, which can simplify parameter optimization. Specifically, we first propose to exploit the B-spline curve to fit lane lines since it offers both locality and globality. Second, we design a simple yet efficient network BSNet to ensure the acquisition of global and local features. Third, we propose a new curve distance to make the lane detection optimization objective more reasonable and alleviate ill-conditioned problems. The proposed methods achieve state-of-the-art performance on the TuSimple, CULane, and LLAMAS datasets, dramatically improving the accuracy of curve-based methods in the lane detection task while running far beyond real-time (197 FPS).
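The locality property named in this abstract can be illustrated with a small, self-contained sketch. This is not the BSNet code: the control points, the clamped knot vector, and the function names below are hypothetical choices made only to show that moving one control point of a cubic B-spline changes the curve on a limited parameter range.

```python
def basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    right = 0.0
    if left_den > 0:  # skip terms with zero-length knot spans
        left = (t - knots[i]) / left_den * basis(i, k - 1, t, knots)
    if right_den > 0:
        right = (knots[i + k + 1] - t) / right_den * basis(i + 1, k - 1, t, knots)
    return left + right


def curve_point(t, control, knots, degree=3):
    """Evaluate the B-spline curve at parameter t as a weighted sum of control points."""
    weights = [basis(i, degree, t, knots) for i in range(len(control))]
    x = sum(w * p[0] for w, p in zip(weights, control))
    y = sum(w * p[1] for w, p in zip(weights, control))
    return (x, y)


# Hypothetical lane-like control points and a clamped knot vector
# (8 control points, degree 3, so 8 + 3 + 1 = 12 knots; valid t is in [0, 5)).
KNOTS = [0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 5]
CONTROL = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0), (3.0, 3.5),
           (4.0, 3.0), (5.0, 2.0), (6.0, 1.5), (7.0, 1.0)]

# Move only control point 1; its basis function is nonzero only for t in [0, 2).
MOVED = list(CONTROL)
MOVED[1] = (1.0, 4.0)

for t in (0.5, 2.5, 4.5):
    print(t, curve_point(t, CONTROL, KNOTS), curve_point(t, MOVED, KNOTS))
```

Only the sample at t = 0.5 differs between the two curves; the samples at t = 2.5 and t = 4.5 lie outside the support of the moved control point and stay the same.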

preprint, 2022, arXiv

Dynamically Stable Poincaré Embeddings for Neural Manifolds

In a Riemannian manifold, the Ricci flow is a partial differential equation for evolving the metric to become more regular. We hope that topological structures from such metrics may be used to assist in the tasks of machine learning. However, this part of the work is still missing. In this paper, we propose Ricci flow assisted Eucl2Hyp2Eucl neural networks that bridge this gap between the Ricci flow and deep neural networks by mapping neural manifolds from the Euclidean space to the dynamically stable Poincaré ball and then back to the Euclidean space. As a result, we prove that, if initial metrics have an $L^2$-norm perturbation which deviates from the Hyperbolic metric on the Poincaré ball, the scaled Ricci-DeTurck flow of such metrics smoothly and exponentially converges to the Hyperbolic metric. Specifically, the role of the Ricci flow is to evolve the metric naturally toward the stable Poincaré ball. For such dynamically stable neural manifolds under the Ricci flow, the convergence of neural networks embedded with such manifolds is not susceptible to perturbations. We also show that Ricci flow assisted Eucl2Hyp2Eucl neural networks outperform their all-Euclidean counterparts on image classification tasks.

preprint, 2022, arXiv

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

Recently, the image-wise implicit neural representation of videos, NeRV, has gained popularity for its promising results and swift speed compared to regular pixel-wise implicit representations. However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance. The key reason for this phenomenon is the coupled formulation of NeRV, which outputs the spatial and temporal information of video frames directly from the frame index input. In this paper, we propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal context. Under the guidance of this new formulation, our model greatly reduces the redundant model parameters, while retaining the representation ability. We experimentally find that our method can improve the performance to a large extent with fewer parameters, resulting in more than $8\times$ faster convergence. Code is available at https://github.com/kyleleey/E-NeRV.

preprint, 2021, arXiv

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation

In this paper we tackle the problem of pose guided person image generation, which aims to transfer a person image from the source pose to a novel target pose while maintaining the source appearance. Given the inefficiency of standard CNNs in handling large spatial transformation, we propose a structure-aware flow based method for high-quality person image generation. Specifically, instead of learning the complex overall pose changes of human body, we decompose the human body into different semantic parts (e.g., head, torso, and legs) and apply different networks to predict the flow fields for these parts separately. Moreover, we carefully design the network modules to effectively capture the local and global semantic correlations of features within and among the human parts respectively. Extensive experimental results show that our method can generate high-quality results under large pose discrepancy and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.

preprint, 2020, arXiv

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition

Recent years have witnessed significant progress in action recognition with deep networks. However, most current video networks require large memory and computational resources, which hinders their application in practice. Existing knowledge distillation methods are limited to the image-level spatial domain, ignoring the temporal and frequency information which provides structural knowledge and is important for video analysis. This paper explores how to train small and efficient networks for action recognition. Specifically, we propose two distillation strategies in the frequency domain, namely feature spectrum distillation and parameter distribution distillation. Our insight is that appealing performance in action recognition requires explicitly modeling the temporal frequency spectrum of video features. Therefore, we first introduce a spectrum loss that forces the student network to mimic the temporal frequency spectrum of the teacher network, instead of implicitly distilling features as in many previous works. Second, the parameter frequency distribution is further adopted to guide the student network to learn the appearance modeling process from the teacher. In addition, a collaborative learning strategy is presented to optimize the training process from a probabilistic view. Extensive experiments are conducted on several action recognition benchmarks, such as Kinetics, Something-Something, and Jester, which consistently verify the effectiveness of our approach and demonstrate that our method can achieve higher performance than state-of-the-art methods with the same backbone.

preprint, 2020, arXiv

Extended Feature Pyramid Network for Small Object Detection

Small object detection remains an unsolved challenge because it is hard to extract information of small objects with only a few pixels. While scale-level corresponding detection in feature pyramid network alleviates this problem, we find feature coupling of various scales still impairs the performance of small objects. In this paper, we propose extended feature pyramid network (EFPN) with an extra high-resolution pyramid level specialized for small object detection. Specifically, we design a novel module, named feature texture transfer (FTT), which is used to super-resolve features and extract credible regional details simultaneously. Moreover, we design a foreground-background-balanced loss function to alleviate area imbalance of foreground and background. In our experiments, the proposed EFPN is efficient on both computation and memory, and yields state-of-the-art results on small traffic-sign dataset Tsinghua-Tencent 100K and small category of general object detection dataset MS COCO.

preprint, 2020, arXiv

FReeNet: Multi-Identity Face Reenactment

This paper presents a novel multi-identity face reenactment framework, named FReeNet, to transfer facial expressions from an arbitrary source face to a target face with a shared model. The proposed FReeNet consists of two parts: Unified Landmark Converter (ULC) and Geometry-aware Generator (GAG). The ULC adopts an encoder-decoder architecture to efficiently convert expression in a latent landmark space, which significantly narrows the gap of the face contour between source and target identities. The GAG leverages the converted landmark to reenact the photorealistic image with a reference image of the target person. Moreover, a new triplet perceptual loss is proposed to force the GAG module to learn appearance and geometry information simultaneously, which also enriches facial details of the reenacted images. Further experiments demonstrate the superiority of our approach for generating photorealistic and expression-alike faces, as well as the flexibility for transferring facial expressions between identities.

preprint, 2020, arXiv

Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose

Recent works have shown how realistic talking face images can be obtained under the supervision of geometry guidance, e.g., facial landmark or boundary. To alleviate the demand for manual annotations, in this paper, we propose a novel self-supervised hybrid model (DAE-GAN) that learns how to reenact faces naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in conditional generation. On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations. A strong prior in talking face videos is that each frame can be encoded as two parts: one for video-specific identity and the other for various poses. Inspired by that, we utilize a multi-frame deforming autoencoder to learn a pose-invariant embedded face for each video. Meanwhile, a multi-scale deforming autoencoder is proposed to extract pose-related information for each frame. On the other hand, the conditional generator allows for enhancing fine details and overall realism. It leverages the disentangled features to generate photo-realistic and pose-alike face images. We evaluate our model on the VoxCeleb1 and RaFD datasets. Experimental results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.

preprint, 2020, arXiv

Semantic Graph Based Place Recognition for 3D Point Clouds

Due to the difficulty of generating effective descriptors that are robust to occlusion and viewpoint changes, place recognition for 3D point clouds remains an open issue. Unlike most of the existing methods that focus on extracting local, global, and statistical features of raw point clouds, our method aims at the semantic level, which can be superior in terms of robustness to environmental changes. Inspired by the perspective of humans, who recognize scenes through identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for the point cloud scenes by preserving the semantic and topological information of the raw point cloud. Thus, place recognition is modeled as a graph matching problem. Then we design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to occlusion as well as viewpoint changes and outperforms the state-of-the-art methods by a large margin. Our code is available at: https://github.com/kxhit/SG_PR.

preprint, 2020, arXiv

The 'Letter' Distribution in the Chinese Language

Corpus-based statistical analysis plays a significant role in linguistic research, and ample evidence has shown that different languages exhibit some common laws. Studies have found that letters in some alphabetic writing languages have strikingly similar statistical usage frequency distributions. Does this hold for Chinese, which employs ideogram writing? We obtained letter frequency data of some alphabetic writing languages and found the common law of the letter distributions. In addition, we collected Chinese literature corpora for different historical periods from the Tang Dynasty to the present, and we dismantled the Chinese written language into three kinds of basic particles: characters, strokes and constructive parts. The results of the statistical analysis showed that, in different historical periods, the intensity of the use of basic particles in Chinese writing varied, but the form of the distribution was consistent. In particular, the distributions of the Chinese constructive parts are consistent with those of alphabetic writing languages. This study provides new evidence of the consistency of human languages.

preprint, 2016, arXiv

An efficient source of frequency anti-correlated entanglement at telecom wavelength

We demonstrate an efficient generation of frequency anti-correlated entangled photon pairs at telecom wavelength. The fundamental laser is a continuous-wave high-power fiber laser at 1560 nm; through an extracavity frequency doubling system, a 780-nm pump with a power as high as 742 mW is realized. After a single pass through a periodically poled KTiOPO4 (PPKTP) crystal, degenerate down-converted photon pairs are generated. With an overall detection efficiency of 14.8 %, the count rates of the single photons and coincidence of the photon pairs are measured to be 370 kHz and 22 kHz, respectively. The spectra of the signal and idler photons are centered at 1560.23 and 1560.04 nm, and their 3-dB bandwidths are both 3.22 nm. The joint spectrum of the photon pair is observed to be frequency anti-correlated and to have a spectral bandwidth of 0.52 nm. According to the ratio of the single photon spectral bandwidth to the joint spectral bandwidth of the photon pairs, the degree of frequency entanglement is quantified to be 6.19. Based on a Hong-Ou-Mandel interferometric coincidence measurement, a frequency indistinguishability of 95 % is demonstrated. The good agreement with the theoretical estimations shows that the inherent extra intensity noise in fiber lasers has little influence on frequency entanglement of the generated photon pairs.
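The quoted degree of frequency entanglement follows directly from the two bandwidths given in the abstract. As a quick check, with the symbol names $R$, $\Delta\lambda_{\text{single}}$ and $\Delta\lambda_{\text{joint}}$ introduced here only for illustration:

```latex
\[
R \;=\; \frac{\Delta\lambda_{\text{single}}}{\Delta\lambda_{\text{joint}}}
  \;=\; \frac{3.22\ \text{nm}}{0.52\ \text{nm}}
  \;\approx\; 6.19
\]
```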

preprint, 2016, arXiv

Robust Object Tracking with a Hierarchical Ensemble Framework

Autonomous robots enjoy wide popularity nowadays and have been used in many applications, such as home security, entertainment, delivery, navigation and guidance. It is vital for robots to track objects accurately in these applications, so it is necessary to focus on tracking algorithms to improve robustness and accuracy. In this paper, we propose a robust object tracking algorithm based on a hierarchical ensemble framework which can incorporate information including individual pixel features, local patches and holistic target models. The framework combines multiple ensemble models simultaneously instead of using a single ensemble model individually. A discriminative model which accounts for the matching degree of local patches is adopted via a bottom ensemble layer, and a generative model which exploits holistic templates is used to search for the object through the middle ensemble layer as well as an adaptive Kalman filter. We test the proposed tracker on challenging benchmark image sequences. Both qualitative and quantitative evaluations demonstrate that the proposed tracker performs better than several state-of-the-art algorithms, especially when the appearance changes dramatically and occlusions occur.

preprint, 2015, arXiv

Stochastic Duty Cycling for Heterogenous Energy Harvesting Networks

In recent years, several kinds of energy harvesting networks containing tiny devices have emerged, such as ambient backscatter, ring and renewable sensor networks. During energy harvesting, such networks suffer from energy heterogeneity, dynamics and prediction hardness because access to natural resources often differs across space and changes over time among the devices. Meanwhile, the charging efficiency is quite low, especially when the power of the harvested energy is weak, so storing the harvested energy indirectly wastes energy. These features bring challenging and interesting issues for efficient allocation of the harvested energy. This paper studies stochastic duty cycling by considering these features, with the objective of maximizing the common active time. We consider two cases: offline and online stochastic duty cycling. For the offline case, we design an optimal solution: the offline duty cycling algorithm. For the online case, we design an online duty cycling algorithm, which achieves an approximation ratio of at least $1-e^{-γ^2}$, where $γ$ is the probability of being able to harvest energy. We also evaluate our algorithms with an experiment on a real energy harvesting network. The experimental results show that the performance of the online algorithm can be very close to that of the offline algorithm.
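To get a feel for the stated guarantee, the bound $1-e^{-γ^2}$ can be tabulated for a few harvesting probabilities. This is only a numeric illustration of the formula quoted in the abstract, not the authors' algorithm; the function name and the sample values of γ are hypothetical.

```python
import math


def online_ratio_lower_bound(gamma: float) -> float:
    """Lower bound 1 - exp(-gamma^2) on the online approximation ratio.

    gamma is treated as a probability, so it must lie in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is a probability and must be in [0, 1]")
    return 1.0 - math.exp(-gamma ** 2)


# Hypothetical sample values of the harvesting probability.
for gamma in (0.25, 0.5, 0.75, 1.0):
    print(f"gamma={gamma:.2f}  bound={online_ratio_lower_bound(gamma):.3f}")
```

The bound grows with γ and reaches 1 - 1/e (about 0.632) when energy can always be harvested.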
