Researcher profile

Ming Cheng

Ming Cheng contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
8 works
0 followers
5 topics
4 close collaborators


Published work

8 published items

preprint, 2022, arXiv

H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual camera system, in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR) and thereby reconstruct the H2-Stereo video effectively. We use a disparity network to transfer spatiotemporal information across views even in large-disparity scenes, and on that basis we propose disparity-guided flow-based warping for the LSR-HFR view and complementary warping for the HSR-LFR view. A multi-scale fusion method in the feature domain is proposed to minimize occlusion-induced warping ghosts and holes in the HSR-LFR view. The LIFnet is trained end to end using our high-quality Stereo Video dataset collected from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and camera-captured real data with large disparity. Ablation studies explore various aspects of our system, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures and applications, to fully understand its capability for potential applications.

preprint, 2022, arXiv

Multi-Graph Fusion Networks for Urban Region Embedding

Learning embeddings for urban regions from human mobility data can reveal the functionality of regions and thereby enable correlated but distinct tasks such as crime prediction. Human mobility data contains rich but abundant information, which yields comprehensive region embeddings for cross-domain tasks. In this paper, we propose multi-graph fusion networks (MGFN) to enable cross-domain prediction tasks. First, we integrate graphs with spatio-temporal similarity as mobility patterns through a mobility graph fusion module. Then, in the mobility pattern joint learning module, we design a multi-level cross-attention mechanism to learn comprehensive embeddings from multiple mobility patterns based on intra-pattern and inter-pattern messages. Finally, we conduct extensive experiments on real-world urban datasets. Experimental results demonstrate that the proposed MGFN outperforms state-of-the-art methods by up to 12.35%.

preprint, 2021, arXiv

The evolution of network controllability in growing networks

The study of network structural controllability focuses on the minimum number of driver nodes needed to control a whole network. Although this topic has been studied intensively, most work considers static networks only. It is well known, however, that real networks grow, with new nodes and links added to the system. Here, we analyze the controllability of evolving networks and propose a general rule for the change in the number of driver nodes. We further apply the rule to solve the problem of network augmentation subject to the controllability constraint. The findings fill a gap in our understanding of network controllability and shed light on the controllability of real systems.
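
To make "minimum number of driver nodes" concrete, the sketch below uses a standard result from the structural controllability literature (it is not stated in the abstract): for a directed network with N nodes, the minimum number of driver nodes is max(N - |M|, 1), where M is a maximum matching between out-copies and in-copies of the nodes. The function names are hypothetical, and the growth rule proposed in the paper is not reproduced here; the example only recomputes the count after a node is added.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def max_matching_size(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> int:
    """Size of a maximum matching in the bipartite (out-copy, in-copy) view."""
    successors: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)

    # Maps the in-copy of a node to the out-copy currently matched to it.
    matched_source: Dict[Hashable, Hashable] = {}

    def try_augment(src: Hashable, visited: Set[Hashable]) -> bool:
        # Augmenting-path search (Kuhn's algorithm) starting at src.
        for dst in successors[src]:
            if dst in visited:
                continue
            visited.add(dst)
            if dst not in matched_source or try_augment(matched_source[dst], visited):
                matched_source[dst] = src
                return True
        return False

    return sum(try_augment(src, set()) for src in successors)


def driver_node_count(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> int:
    """Minimum number of driver nodes: max(N - |maximum matching|, 1)."""
    node_list = list(nodes)
    return max(len(node_list) - max_matching_size(node_list, edges), 1)


# A directed chain 1 -> 2 -> 3, then two ways of growing it by one node.
chain = driver_node_count([1, 2, 3], [(1, 2), (2, 3)])
extended = driver_node_count([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
branched = driver_node_count([1, 2, 3, 4], [(1, 2), (2, 3), (1, 4)])
```

In this toy case, appending the new node to the end of the chain leaves the driver count unchanged, while attaching it as a second child of node 1 adds one driver node.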

preprint, 2020, arXiv

A Dual Camera System for High Spatiotemporal Resolution Video Acquisition

This paper presents a dual camera system for high spatiotemporal resolution (HSTR) video acquisition, where one camera shoots a video with high spatial resolution and low frame rate (HSR-LFR) and another captures a video with low spatial resolution and high frame rate (LSR-HFR). Our main goal is to combine videos from the LSR-HFR and HSR-LFR cameras to create an HSTR video. We propose an end-to-end learning framework, AWnet, mainly consisting of a FlowNet and a FusionNet that learn an adaptive weighting function in the pixel domain to combine inputs in a frame-recurrent fashion. To improve the reconstruction quality for cameras used in practice, we also introduce noise regularization under the same framework. Our method demonstrates noticeable performance gains both in objective PSNR measurements, in simulation with different publicly available video and light-field datasets, and in subjective evaluation with real data captured by dual iPhone 7 and Grasshopper3 cameras. Ablation studies are further conducted to investigate various aspects of our system (such as reference structure, camera parallax, exposure time, etc.) to fully understand its capability for potential applications.
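
As a rough illustration of two ideas named in this abstract, the sketch below blends two frames with a per-pixel weight map and scores the result with PSNR, defined as 10 log10(peak^2 / MSE). It is a minimal sketch under stated assumptions: in AWnet the weights are learned by a network, whereas here they are set by hand, and the function names `fuse` and `psnr` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = List[List[float]]


def fuse(frame_a: Sequence[Sequence[float]],
         frame_b: Sequence[Sequence[float]],
         weights: Sequence[Sequence[float]]) -> Frame:
    """Per-pixel convex blend: w * a + (1 - w) * b, with each w in [0, 1]."""
    return [
        [w * a + (1.0 - w) * b for a, b, w in zip(row_a, row_b, row_w)]
        for row_a, row_b, row_w in zip(frame_a, frame_b, weights)
    ]


def psnr(reference: Sequence[Sequence[float]],
         estimate: Sequence[Sequence[float]],
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    squared_errors = [
        (r - e) ** 2
        for row_r, row_e in zip(reference, estimate)
        for r, e in zip(row_r, row_e)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 2x2 frames: each input is correct in one column only (assumed data).
reference = [[100.0, 200.0], [50.0, 150.0]]
frame_a = [[100.0, 0.0], [50.0, 0.0]]    # left column correct
frame_b = [[0.0, 200.0], [0.0, 150.0]]   # right column correct
weights = [[1.0, 0.0], [1.0, 0.0]]       # hand-set; trust a on the left, b on the right

fused = fuse(frame_a, frame_b, weights)
score = psnr(reference, fused)
```

With these hand-picked weights the fused frame matches the reference exactly, so the PSNR is infinite; a uniform weight of 0.5 would leave large errors in every pixel.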

preprint, 2020, arXiv

Intelligent Autofocus

We demonstrate that deep learning methods can determine the best focus position from 1-2 image samples, enabling 5-10x faster focus than traditional search-based methods. In contrast with phase detection methods, deep autofocus does not require specialized hardware. In further contrast with conventional methods, which assume a static "best focus," AI methods can generate scene-based focus trajectories that optimize synthesized image quality for dynamic and three-dimensional scenes.

preprint, 2020, arXiv

Joint Beamforming and Computation Offloading for Multi-user Mobile-Edge Computing

Mobile edge computing (MEC) is considered an efficient method to relieve the computation burden of mobile devices (MDs). To reduce the energy consumption and time delay of MDs in MEC, multi-user multiple-input multiple-output (MU-MIMO) communication is considered for the MEC system. The purpose of this paper is to minimize the weighted sum of energy consumption and time delay of MDs by jointly considering the offloading decision and MU-MIMO beamforming problems. The resulting optimization problem is a mixed-integer nonlinear programming problem, which is NP-hard. A semidefinite relaxation based algorithm is proposed to solve the offloading decision problem. Then, the MU-MIMO beamforming design problem is handled with a newly proposed fractional programming method. Simulation results show that the proposed algorithms can effectively reduce the energy consumption and time delay of computation offloading.
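
The weighted-sum objective and the binary offloading decision can be illustrated with a toy exhaustive search. This is not the semidefinite relaxation or fractional programming method from the abstract: it simply enumerates all 2^K decisions for K users, which shows why exact search does not scale. All numbers, the function names, and the coupling model (offloaded delay grows with the number of users sharing the uplink, standing in for the beamforming coupling) are assumptions made for illustration.

```python
from itertools import product
from typing import Sequence, Tuple

Cost = Tuple[float, float]  # (energy, delay) for one user


def weighted_cost(decision: Sequence[int],
                  local: Sequence[Cost],
                  offload: Sequence[Cost],
                  weight: float) -> float:
    """Weighted sum of energy and delay; decision[k] == 1 means user k offloads."""
    sharing = sum(decision)  # assumed: offloaders share the uplink equally
    total = 0.0
    for d, (e_loc, t_loc), (e_off, t_off) in zip(decision, local, offload):
        if d:
            energy, delay = e_off, t_off * sharing
        else:
            energy, delay = e_loc, t_loc
        total += weight * energy + (1.0 - weight) * delay
    return total


def best_decision(local: Sequence[Cost],
                  offload: Sequence[Cost],
                  weight: float = 0.5) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search over all 2^K binary offloading decisions."""
    candidates = product((0, 1), repeat=len(local))
    return min((weighted_cost(c, local, offload, weight), c) for c in candidates)


# Hypothetical per-user costs for three users.
local_costs = [(4.0, 2.0), (1.0, 1.0), (6.0, 4.0)]
offload_costs = [(1.0, 1.0), (1.0, 2.0), (1.0, 1.0)]

cost, decision = best_decision(local_costs, offload_costs, weight=0.5)
```

In this instance the cheapest choice offloads the first and third users and keeps the second local, because offloading everyone inflates the shared delay.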

preprint, 2020, arXiv

LO-Net: Deep Real-time Lidar Odometry

We present a novel deep convolutional network pipeline, LO-Net, for real-time lidar odometry estimation. Unlike most existing lidar odometry (LO) methods, which rely on a pipeline of individually designed feature selection, feature matching, and pose estimation stages, LO-Net can be trained in an end-to-end manner. With a new mask-weighted geometric constraint loss, LO-Net can effectively learn feature representations for LO estimation and can implicitly exploit the sequential dependencies and dynamics in the data. We also design a scan-to-map module, which uses the geometric and semantic information learned in LO-Net, to improve the estimation accuracy. Experiments on benchmark datasets demonstrate that LO-Net outperforms existing learning-based approaches and achieves accuracy similar to that of the state-of-the-art geometry-based approach, LOAM.

preprint, 2020, arXiv

Smart Cameras

We review camera architecture in the age of artificial intelligence. Modern cameras use physical components and software to capture, compress and display image data. Over the past 5 years, deep learning solutions have become superior to traditional algorithms for each of these functions. Deep learning enables 10-100x reduction in electrical sensor power per pixel, 10x improvement in depth of field and dynamic range and 10-100x improvement in image pixel count. Deep learning enables multiframe and multiaperture solutions that fundamentally shift the goals of physical camera design. Here we review the state of the art of deep learning in camera operations and consider the impact of AI on the physical design of cameras.