Source author record

Chen Feng

Chen Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Information Theory Machine Learning math.IT Artificial Intelligence eess.IV Robotics Distributed, Parallel, and Cluster Computing Cryptography and Security Networking and Internet Architecture Performance Computational Geometry eess.AS eess.SP eess.SY quant-ph Sound Systems and Control

Catalog footprint

What is connected

34works

18topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

FlyCo: Foundation Model-Empowered Drones for Autonomous 3D Structure Scanning in Open-World Environments

Autonomous 3D scanning of open-world target structures via drones remains challenging despite broad applications. Existing paradigms rely on restrictive assumptions or effortful human priors, limiting practicality, efficiency, and adaptability. Recent foundation models (FMs) offer great potential to bridge this gap. This paper investigates a critical research problem: What system architecture can effectively integrate FM knowledge for this task? We answer it with FlyCo, a principled FM-empowered perception-prediction-planning loop enabling fully autonomous, prompt-driven 3D target scanning in diverse unknown open-world environments. FlyCo directly translates low-effort human prompts (text, visual annotations) into precise adaptive scanning flights via three coordinated stages: (1) perception fuses streaming sensor data with vision-language FMs for robust target grounding and tracking; (2) prediction distills FM knowledge and combines multi-modal cues to infer the partially observed target's complete geometry; (3) planning leverages predictive foresight to generate efficient and safe paths with comprehensive target coverage. Building on this, we further design key components to boost open-world target grounding efficiency and robustness, enhance prediction quality in terms of shape accuracy, zero-shot generalization, and temporal stability, and balance long-horizon flight efficiency with real-time computability and online collision avoidance. Extensive challenging real-world and simulation experiments show FlyCo delivers precise scene understanding, high efficiency, and real-time safety, outperforming existing paradigms with lower human effort and verifying the proposed architecture's practicality. Comprehensive ablations validate each component's contribution. FlyCo also serves as a flexible, extensible blueprint, readily leveraging future FM and robotics advances. Code will be released.

preprint2026arXiv

Merged Bitcoin: Proof of Work Blockchains with Multiple Hash Types

Proof of work blockchain protocols using multiple hash types are considered. It is proven that the security region of such a protocol cannot be the AND of a 51\% attack on all the hash types. Nevertheless, a protocol called Merged Bitcoin is introduced, which is the Bitcoin protocol where links between blocks can be formed using multiple different hash types. Closed form bounds on its security region in the $Δ$-bounded delay network model are proven, and these bounds are compared to simulation results. This protocol is proven to maximize cost of attack in the linear cost-per-hash model. A difficulty adjustment method is introduced, and it is argued that this can partly remedy asymmetric advantages an adversary may gain in hashing power for some hash types, including from algorithmic advances, quantum attacks like Grover's algorithm, or hardware backdoor attacks.

preprint2026arXiv

ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking

Cross-view Referring Multi-Object Tracking (CRMOT) aims to track multiple objects specified by natural language across multiple camera views, with globally consistent identities. Despite recent progress, existing methods rely heavily on costly frame-level spatial annotations and cross-view identity supervision. To reduce such reliance, we explore CRMOT under weak supervision by leveraging the capabilities of foundation models. However, our empirical study shows that directly applying foundation models such as SAM2 and SAM3, even with task-specific modifications, fails to accurately understand referring expressions and maintain consistent identities across views. Yet, they remain effective at producing reliable object tracklets that can serve as pseudo supervision. We therefore repurpose foundation models as pseudo-label generators and propose a two-stage framework for weakly supervised CRMOT, using only object category labels as coarse-grained supervision. In the first stage, we design an Affinity-guided Cross-view Re-prompting strategy to refine and associate SAM3-generated tracklets across cameras, producing reliable cross-view pseudo labels for subsequent training. In the second stage, we introduce ViewSAM, a CRMOT model built upon SAM2 that explicitly models view-aware cross-modal semantics. By formulating view-induced variations as learnable conditions, ViewSAM bridges the gap between view-variant visual observations and view-invariant textual expressions, enabling robust cross-view referring tracking with only approximately 10% additional parameters. Extensive experiments demonstrate that ViewSAM achieves SOTA performance under weak supervision and remains competitive with fully supervised methods.

preprint2022arXiv

An Empirical Study of Low Precision Quantization for TinyML

Tiny machine learning (tinyML) has emerged during the past few years aiming to deploy machine learning models to embedded AI processors with highly constrained memory and computation capacity. Low precision quantization is an important model compression technique that can greatly reduce both memory consumption and computation cost of model inference. In this study, we focus on post-training quantization (PTQ) algorithms that quantize a model to low-bit (less than 8-bit) precision with only a small set of calibration data and benchmark them on different tinyML use cases. To achieve a fair comparison, we build a simulated quantization framework to investigate recent PTQ algorithms. Furthermore, we break down those algorithms into essential components and re-assembled a generic PTQ pipeline. With ablation study on different alternatives of components in the pipeline, we reveal key design choices when performing low precision quantization. We hope this work could provide useful data points and shed lights on the future research of low precision quantization.

preprint2022arXiv

Decentralized Voltage Control with Peer-to-peer Energy Trading in a Distribution Network

Utilizing distributed renewable and energy storage resources via peer-to-peer (P2P) energy trading has long been touted as a solution to improve energy system's resilience and sustainability. Consumers and prosumers (those who have energy generation resources), however, do not have expertise to engage in repeated P2P trading, and the zero-marginal costs of renewables present challenges in determining fair market prices. To address these issues, we propose a multi-agent reinforcement learning (MARL) framework to help automate consumers' bidding and management of their solar PV and energy storage resources, under a specific P2P clearing mechanism that utilizes the so-called supply-demand ratio. In addition, we show how the MARL framework can integrate physical network constraints to realize decentralized voltage control, hence ensuring physical feasibility of the P2P energy trading and paving ways for real-world implementations.

preprint2022arXiv

Egocentric Prediction of Action Target in 3D

We are interested in anticipating as early as possible the target location of a person's object manipulation action in a 3D workspace from egocentric vision. It is important in fields like human-robot collaboration, but has not yet received enough attention from vision and learning communities. To stimulate more research on this challenging egocentric vision task, we propose a large multimodality dataset of more than 1 million frames of RGB-D and IMU streams, and provide evaluation metrics based on our high-quality 2D and 3D labels from semi-automatic annotation. Meanwhile, we design baseline methods using recurrent neural networks and conduct various ablation studies to validate their effectiveness. Our results demonstrate that this new task is worthy of further study by researchers in robotics, vision, and learning communities.

preprint2022arXiv

Enhancing VVC with Deep Learning based Multi-Frame Post-Processing

This paper describes a CNN-based multi-frame post-processing approach based on a perceptually-inspired Generative Adversarial Network architecture, CVEGAN. This method has been integrated with the Versatile Video Coding Test Model (VTM) 15.2 to enhance the visual quality of the final reconstructed content. The evaluation results on the CLIC 2022 validation sequences show consistent coding gains over the original VVC VTM at the same bitrates when assessed by PSNR. The integrated codec has been submitted to the Challenge on Learned Image Compression (CLIC) 2022 (video track), and the team name associated with this submission is BVI_VC.

preprint2022arXiv

Learning Distilled Collaboration Graph for Multi-Agent Perception

To promote better performance-bandwidth trade-off for multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs an early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining post-collaboration feature maps in the student model to match the correspondences in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need to use the student model named as the distilled collaboration network (DiscoNet). Attributed to the teacher-student framework, multiple agents with the shared DiscoNet could collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring more straightforward design rationale. Our code is available on https://github.com/ai4ce/DiscoNet.

preprint2022arXiv

Opportunistic Routing in Quantum Networks

Unlike classical routing algorithms, quantum routing algorithms make use of entangled states - a type of resources that have a limited lifetime and need to be regenerated after consumption. In a nutshell, quantum routing algorithms have to use these resources efficiently, while optimizing some objectives such as the total waiting time. Current routing algorithms tend to keep a routing request waiting until all of the resources on its path are available. In this paper, we introduce a new way of managing entanglement resources in an opportunistic fashion: a request can move forward along its path as soon as possible (even if some resources on its path are not ready). We show that this opportunistic approach is fundamentally better than conventional approaches. In particular, our results indicate that this new approach achieves a 30-50% improvement in the average total waiting time and average link waiting time compared with several state-of-the-art routing algorithms. As a by-product of this work, we develop a new simulator for quantum routing, which can be used to evaluate various design choices under different scenarios.

preprint2022arXiv

Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting

Class-agnostic counting (CAC) aims to count all instances in a query image given few exemplars. A standard pipeline is to extract visual features from exemplars and match them with query images to infer object counts. Two essential components in this pipeline are feature representation and similarity metric. Existing methods either adopt a pretrained network to represent features or learn a new one, while applying a naive similarity metric with fixed inner product. We find this paradigm leads to noisy similarity matching and hence harms counting performance. In this work, we propose a similarity-aware CAC framework that jointly learns representation and similarity metric. We first instantiate our framework with a naive baseline called Bilinear Matching Network (BMNet), whose key component is a learnable bilinear similarity metric. To further embody the core of our framework, we extend BMNet to BMNet+ that models similarity from three aspects: 1) representing the instances via their self-similarity to enhance feature robustness against intra-class variations; 2) comparing the similarity dynamically to focus on the key patterns of each exemplar; 3) learning from a supervision signal to impose explicit constraints on matching results. Extensive experiments on a recent CAC dataset FSC147 show that our models significantly outperform state-of-the-art CAC approaches. In addition, we also validate the cross-dataset generality of BMNet and BMNet+ on a car counting dataset CARPK. Code is at tiny.one/BMNet

preprint2022arXiv

Self-supervised Spatial Reasoning on Multi-View Line Drawings

Spatial reasoning on multi-view line drawings by state-of-the-art supervised deep networks is recently shown with puzzling low performances on the SPARE3D dataset. Based on the fact that self-supervised learning is helpful when a large number of data are available, we propose two self-supervised learning approaches to improve the baseline performance for view consistency reasoning and camera pose reasoning tasks on the SPARE3D dataset. For the first task, we use a self-supervised binary classification network to contrast the line drawing differences between various views of any two similar 3D objects, enabling the trained networks to effectively learn detail-sensitive yet view-invariant line drawing representations of 3D objects. For the second type of task, we propose a self-supervised multi-class classification framework to train a model to select the correct corresponding view from which a line drawing is rendered. Our method is even helpful for the downstream tasks with unseen camera poses. Experiments show that our method could significantly increase the baseline performance in SPARE3D, while some popular self-supervised learning methods cannot.

preprint2022arXiv

TENET: Transformer Encoding Network for Effective Temporal Flow on Motion Prediction

This technical report presents an effective method for motion prediction in autonomous driving. We develop a Transformer-based method for input encoding and trajectory prediction. Besides, we propose the Temporal Flow Header to enhance the trajectory encoding. In the end, an efficient K-means ensemble method is used. Using our Transformer network and ensemble method, we win the first place of Argoverse 2 Motion Forecasting Challenge with the state-of-the-art brier-minFDE score of 1.90.

preprint2022arXiv

V2X-Sim: Multi-Agent Collaborative Perception Dataset and Benchmark for Autonomous Driving

Vehicle-to-everything (V2X) communication techniques enable the collaboration between vehicles and many other entities in the neighboring environment, which could fundamentally improve the perception system for autonomous driving. However, the lack of a public dataset significantly restricts the research progress of collaborative perception. To fill this gap, we present V2X-Sim, a comprehensive simulated multi-agent perception dataset for V2X-aided autonomous driving. V2X-Sim provides: (1) \hl{multi-agent} sensor recordings from the road-side unit (RSU) and multiple vehicles that enable collaborative perception, (2) multi-modality sensor streams that facilitate multi-modality perception, and (3) diverse ground truths that support various perception tasks. Meanwhile, we build an open-source testbed and provide a benchmark for the state-of-the-art collaborative perception algorithms on three tasks, including detection, tracking and segmentation. V2X-Sim seeks to stimulate collaborative perception research for autonomous driving before realistic datasets become widely available. Our dataset and code are available at \url{https://ai4ce.github.io/V2X-Sim/}.

preprint2021arXiv

Dissecting the Performance of Chained-BFT

Permissioned blockchains employ Byzantine fault-tolerant (BFT) state machine replication (SMR) to reach agreement on an ever-growing, linearly ordered log of transactions. A new paradigm, combined with decades of research in BFT SMR and blockchain (namely chained-BFT, or cBFT), has emerged for directly constructing blockchain protocols. Chained-BFT protocols have a unifying propose-vote scheme instead of multiple different voting phases with a set of voting and commit rules to guarantee safety and liveness. However, distinct voting and commit rules impose varying impacts on performance under different workloads, network conditions, and Byzantine attacks. Therefore, a fair comparison of the proposed protocols poses a challenge that has not yet been addressed by existing work. We fill this gap by studying a family of cBFT protocols with a two-pronged systematic approach. First, we present an evaluation framework, Bamboo, for quick prototyping of cBFT protocols and that includes helpful benchmarking facilities. To validate Bamboo, we introduce an analytic model using queuing theory which also offers a back-of-the-envelope guide for dissecting these protocols. We build multiple cBFT protocols using Bamboo and we are the first to fairly compare three representatives (i.e., HotStuff, two-chain HotStuff, and Streamlet). We evaluated these protocols under various parameters and scenarios, including two Byzantine attacks that have not been widely discussed in the literature. Our findings reveal interesting trade-offs (e.g., responsiveness vs. forking-resilience) between different cBFT protocols and their design choices, which provide developers and researchers with insights into the design and implementation of this protocol family.

preprint2021arXiv

Video Compression with CNN-based Post Processing

In recent years, video compression techniques have been significantly challenged by the rapidly increased demands associated with high quality and immersive video content. Among various compression tools, post-processing can be applied on reconstructed video content to mitigate visible compression artefacts and to enhance overall perceptual quality. Inspired by advances in deep learning, we propose a new CNN-based post-processing approach, which has been integrated with two state-of-the-art coding standards, VVC and AV1. The results show consistent coding gains on all tested sequences at various spatial resolutions, with average bit rate savings of 4.0% and 5.8% against original VVC and AV1 respectively (based on the assessment of PSNR). This network has also been trained with perceptually inspired loss functions, which have further improved reconstruction quality based on perceptual quality assessment (VMAF), with average coding gains of 13.9% over VVC and 10.5% against AV1.

preprint2020arXiv

3D Point Cloud Processing and Learning for Autonomous Driving

We present a review of 3D point cloud processing and learning for autonomous driving. As one of the most important sensors in autonomous vehicles, light detection and ranging (LiDAR) sensors collect 3D point clouds that precisely record the external surfaces of objects and scenes. The tools for 3D point cloud processing and learning are critical to the map creation, localization, and perception modules in an autonomous vehicle. While much attention has been paid to data collected from cameras, such as images and videos, an increasing number of researchers have recognized the importance and significance of LiDAR in autonomous driving and have proposed processing and learning algorithms to exploit 3D point clouds. We review the recent progress in this research area and summarize what has been tried and what is needed for practical and safe autonomous vehicles. We also offer perspectives on open issues that are needed to be solved in the future.

preprint2020arXiv

A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder

Recently, the speaker clustering model based on aggregation hierarchy cluster (AHC) is a common method to solve two main problems: no preset category number clustering and fix category number clustering. In general, model takes features like i-vectors as input of probability and linear discriminant analysis model (PLDA) aims to form the distance matric in long voice application scenario, and then clustering results are obtained through the clustering model. However, traditional speaker clustering method based on AHC has the shortcomings of long-time running and remains sensitive to environment noise. In this paper, we propose a novel speaker clustering method based on Mutual Information (MI) and a non-linear model with discrete variable, which under the enlightenment of Tied Variational Autoencoder (TVAE), to enhance the robustness against noise. The proposed method named Discrete Tied Variational Autoencoder (DTVAE) which shortens the elapsed time substantially. With experience results, it outperforms the general model and yields a relative Accuracy (ACC) improvement and significant time reduction.

preprint2020arXiv

Automatic Failure Recovery and Re-Initialization for Online UAV Tracking with Joint Scale and Aspect Ratio Optimization

Current unmanned aerial vehicle (UAV) visual tracking algorithms are primarily limited with respect to: (i) the kind of size variation they can deal with, (ii) the implementation speed which hardly meets the real-time requirement. In this work, a real-time UAV tracking algorithm with powerful size estimation ability is proposed. Specifically, the overall tracking task is allocated to two 2D filters: (i) translation filter for location prediction in the space domain, (ii) size filter for scale and aspect ratio optimization in the size domain. Besides, an efficient two-stage re-detection strategy is introduced for long-term UAV tracking tasks. Large-scale experiments on four UAV benchmarks demonstrate the superiority of the presented method which has computation feasibility on a low-cost CPU.

preprint2020arXiv

DR^2Track: Towards Real-Time Visual Tracking for UAV via Distractor Repressed Dynamic Regression

Visual tracking has yielded promising applications with unmanned aerial vehicle (UAV). In literature, the advanced discriminative correlation filter (DCF) type trackers generally distinguish the foreground from the background with a learned regressor which regresses the implicit circulated samples into a fixed target label. However, the predefined and unchanged regression target results in low robustness and adaptivity to uncertain aerial tracking scenarios. In this work, we exploit the local maximum points of the response map generated in the detection phase to automatically locate current distractors. By repressing the response of distractors in the regressor learning, we can dynamically and adaptively alter our regression target to leverage the tracking robustness as well as adaptivity. Substantial experiments conducted on three challenging UAV benchmarks demonstrate both excellent performance and extraordinary speed (~50fps on a cheap CPU) of our tracker.

preprint2020arXiv

LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood

Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We model these as mixed random variables and estimate them using a deep network trained with our proposed Location, Uncertainty, and Visibility Likelihood (LUVLi) loss. In addition, we release an entirely new labeling of a large face alignment dataset with over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. Not only does our joint estimation yield accurate estimates of the uncertainty of predicted landmark locations, but it also yields state-of-the-art estimates for the landmark locations themselves on multiple standard face alignment datasets. Our method's estimates of the uncertainty of predicted landmark locations could be used to automatically identify input images on which face alignment fails, which can be critical for downstream tasks.

preprint2020arXiv

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

Towards 3D object tracking in point clouds, a novel point-to-box network termed P2B is proposed in an end-to-end learning manner. Our main idea is to first localize potential target centers in 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly. In this way, the time-consuming 3D exhaustive search can be avoided. Specifically, we first sample seeds from the point clouds in template and search area respectively. Then, we execute permutation-invariant feature augmentation to embed target clues from template into search area seeds and represent them with target-specific features. Consequently, the augmented search area seeds regress the potential target centers via Hough voting. The centers are further strengthened with seed-wise targetness scores. Finally, each center clusters its neighbors to leverage the ensemble power for joint 3D target proposal and verification. We apply PointNet++ as our backbone and experiments on KITTI tracking dataset demonstrate P2B's superiority (~10%'s improvement over state-of-the-art). Note that P2B can run with 40FPS on a single NVIDIA 1080Ti GPU. Our code and model are available at https://github.com/HaozheQi/P2B.

preprint2020arXiv

Regularizing Neural Networks via Minimizing Hyperspherical Energy

Inspired by the Thomson problem in physics where the distribution of multiple propelling electrons on a unit sphere can be modeled via minimizing some potential energy, hyperspherical energy minimization has demonstrated its potential in regularizing neural networks and improving their generalization power. In this paper, we first study the important role that hyperspherical energy plays in neural network training by analyzing its training dynamics. Then we show that naively minimizing hyperspherical energy suffers from some difficulties due to highly non-linear and non-convex optimization as the space dimensionality becomes higher, therefore limiting the potential to further improve the generalization. To address these problems, we propose the compressive minimum hyperspherical energy (CoMHE) as a more effective regularization for neural networks. Specifically, CoMHE utilizes projection mappings to reduce the dimensionality of neurons and minimizes their hyperspherical energy. According to different designs for the projection mapping, we propose several distinct yet well-performing variants and provide some theoretical guarantees to justify their effectiveness. Our experiments show that CoMHE consistently outperforms existing regularization methods, and can be easily applied to different neural networks.

preprint2020arXiv

SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings

Spatial reasoning is an important component of human intelligence. We can imagine the shapes of 3D objects and reason about their spatial relations by merely looking at their three-view line drawings in 2D, with different levels of competence. Can deep networks be trained to perform spatial reasoning tasks? How can we measure their "spatial intelligence"? To answer these questions, we present the SPARE3D dataset. Based on cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation, with increasing difficulty. We then design a method to automatically generate a large number of challenging questions with ground truth answers for each task. They are used to provide supervision for training our baseline models using state-of-the-art architectures like ResNet. Our experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guesses. We hope SPARE3D can stimulate new problem formulations and network designs for spatial reasoning to empower intelligent robots to operate effectively in the 3D world via 2D sensors. The dataset and code are available at https://ai4ce.github.io/SPARE3D.

preprint2020arXiv

The Hermes BFT for Blockchains

The performance of partially synchronous BFT-based consensus protocols is highly dependent on the primary node. All participant nodes in the network are blocked until they receive a proposal from the primary node to begin the consensus process.Therefore, an honest but slack node (with limited bandwidth) can adversely affect the performance when selected as primary. Hermes decreases protocol dependency on the primary node and minimizes transmission delay induced by the slack primary while keeping low message complexity and latency. Hermes achieves these performance improvements by relaxing strong BFT agreement (safety) guarantees only for a specific type of Byzantine faults (also called equivocated faults). Interestingly, we show that in Hermes equivocating by a Byzantine primary is unlikely, expensive and ineffective. Therefore, the safety of Hermes is comparable to the general BFT consensus. We deployed and tested Hermes on 190 Amazon EC2 instances. In these tests, Hermes's performance was comparable to the state-of-the-art BFT protocol for blockchains (when the network size is large) in the absence of slack nodes. Whereas, in the presence of slack nodes Hermes out performed the state-of-the-art BFT protocol by more than 4x in terms of throughput as well as 15x in terms of latency.

preprint2019arXiv

Deep Unsupervised Learning of 3D Point Clouds via Graph Topology Inference and Filtering

We propose a deep autoencoder with graph topology inference and filtering to achieve compact representations of unorganized 3D point clouds in an unsupervised manner. Many previous works discretize 3D points to voxels and then use lattice-based methods to process and learn 3D spatial information; however, this leads to inevitable discretization errors. In this work, we handle raw 3D points without such compromise. The proposed networks follow the autoencoder framework with a focus on designing the decoder. The encoder adopts similar architectures as in PointNet. The decoder involves three novel modules. The folding module folds a canonical 2D lattice to the underlying surface of a 3D point cloud, achieving coarse reconstruction; the graph-topology-inference module learns a graph topology to represent pairwise relationships between 3D points, pushing the latent code to preserve both coordinates and pairwise relationships of points in 3D point clouds; and the graph-filtering module couples the above two modules, refining the coarse reconstruction through a learnt graph topology to obtain the final reconstruction. The proposed decoder leverages a learnable graph topology to push the codeword to preserve representative features and further improve the unsupervised-learning performance. We further provide theoretical analyses of the proposed architecture. In the experiments, we validate the proposed networks in three tasks, including 3D point cloud reconstruction, visualization, and transfer classification. The experimental results show that (1) the proposed networks outperform the state-of-the-art methods in various tasks; (2) a graph topology can be inferred as auxiliary information without specific supervision on graph topology inference; and (3) graph filtering refines the reconstruction, leading to better performances.

preprint2016arXiv

A Joint Typicality Approach to Algebraic Network Information Theory

This paper presents a joint typicality framework for encoding and decoding nested linear codes for multi-user networks. This framework provides a new perspective on compute-forward within the context of discrete memoryless networks. In particular, it establishes an achievable rate region for computing the weighted sum of nested linear codewords over a discrete memoryless multiple-access channel (MAC). When specialized to the Gaussian MAC, this rate region recovers and improves upon the lattice-based compute-forward rate region of Nazer and Gastpar, thus providing a unified approach for discrete memoryless and Gaussian networks. Furthermore, this framework can be used to shed light on the joint decoding rate region for compute-forward, which is considered an open problem. Specifically, this work establishes an achievable rate region for simultaneously decoding two linear combinations of nested linear codewords from K senders.

preprint2016arXiv

An Alternating Direction Method Approach to Cloud Traffic Management

In this paper, we introduce a unified framework for studying various cloud traffic management problems, ranging from geographical load balancing to backbone traffic engineering. We first abstract these real-world problems as a multi-facility resource allocation problem, and then present two distributed optimization algorithms by exploiting the special structure of the problem. Our algorithms are inspired by Alternating Direction Method of Multipliers (ADMM), enjoying a number of unique features. Compared to dual decomposition, they converge with non-strictly convex objective functions; compared to other ADMM-type algorithms, they not only achieve faster convergence under weaker assumptions, but also have lower computational complexity and lower message-passing overhead. The simulation results not only confirm these desirable features of our algorithms, but also highlight several additional advantages, such as scalability and fault-tolerance.

preprint2016arXiv

Performance Analysis of CSMA with Multi-Packet Reception: The Inhomogeneous Case

The problem of Carrier Sense Multiple Access (CSMA) with multi-packet reception (MPR) is studied. Most prior work has focused on the homogeneous case, where all the mobile users are assumed to have identical packet arrival rates and transmission probabilities. The inhomogeneous case remains largely open in the literature. In this work, we make a first step towards this open problem by deriving throughput and delay expressions for inhomogeneous CSMA, with a particular focus on a family of MPR models called the "all-or-nothing" symmetric MPR. This family of MPR models allows us to overcome several technical challenges associated with conventional analysis and to derive accurate throughput and delay expressions in the large-systems regime. Interestingly, this family of MPR models is still general enough to include a number of useful MPR techniques - such as successive interference cancellation (SIC), compute-and-forward (C&F), and successive compute-and-forward (SCF) - as special cases. Based on these throughput and delay expressions, we provide theoretical guidelines for meeting quality-of-service requirements and for achieving global stability; we also evaluate the performances of various MPR techniques, highlighting the clear advantages offered by SCF.

preprint2015arXiv

Experimental Quantum Fingerprinting

Quantum communication holds the promise of creating disruptive technologies that will play an essential role in future communication networks. For example, the study of quantum communication complexity has shown that quantum communication allows exponential reductions in the information that must be transmitted to solve distributed computational tasks. Recently, protocols that realize this advantage using optical implementations have been proposed. Here we report a proof of concept experimental demonstration of a quantum fingerprinting system that is capable of transmitting less information than the best known classical protocol. Our implementation is based on a modified version of a commercial quantum key distribution system using off-the-shelf optical components over telecom wavelengths, and is practical for messages as large as 100 Mbits, even in the presence of experimental imperfections. Our results provide a first step in the development of experimental quantum communication complexity.

preprint2014arXiv

Communication over Finite-Chain-Ring Matrix Channels

Though network coding is traditionally performed over finite fields, recent work on nested-lattice-based network coding suggests that, by allowing network coding over certain finite rings, more efficient physical-layer network coding schemes can be constructed. This paper considers the problem of communication over a finite-ring matrix channel $Y = AX + BE$, where $X$ is the channel input, $Y$ is the channel output, $E$ is random error, and $A$ and $B$ are random transfer matrices. Tight capacity results are obtained and simple polynomial-complexity capacity-achieving coding schemes are provided under the assumption that $A$ is uniform over all full-rank matrices and $BE$ is uniform over all rank-$t$ matrices, extending the work of Silva, Kschischang and Kötter (2010), who handled the case of finite fields. This extension is based on several new results, which may be of independent interest, that generalize concepts and methods from matrices over finite fields to matrices over finite chain rings.

preprint2014arXiv

Performance Benefits of DataMPI: A Case Study with BigDataBench

Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics. Both of them are widely deployed on Internet companies. On the other hand, high-performance data analysis requirements are causing academical and industrial communities to adopt state-of-the-art technologies in HPC to solve Big Data problems. Recently, we have proposed a key-value pair based communication library, DataMPI, which is extending MPI to support Hadoop/Spark-like Big Data Computing jobs. In this paper, we use BigDataBench, a Big Data benchmark suite, to do comprehensive studies on performance and resource utilization characterizations of Hadoop, Spark and DataMPI. From our experiments, we observe that the job execution time of DataMPI has up to 55% and 39% speedups compared with those of Hadoop and Spark, respectively. Most of the benefits come from the high-efficiency communication mechanisms in DataMPI. We also notice that the resource (CPU, memory, disk and network I/O) utilizations of DataMPI are also more efficient than those of the other two frameworks.

preprint2013arXiv

Algebraic Approach to Physical-Layer Network Coding

The problem of designing physical-layer network coding (PNC) schemes via nested lattices is considered. Building on the compute-and-forward (C&F) relaying strategy of Nazer and Gastpar, who demonstrated its asymptotic gain using information-theoretic tools, an algebraic approach is taken to show its potential in practical, non-asymptotic, settings. A general framework is developed for studying nested-lattice-based PNC schemes---called lattice network coding (LNC) schemes for short---by making a direct connection between C&F and module theory. In particular, a generic LNC scheme is presented that makes no assumptions on the underlying nested lattice code. C&F is re-interpreted in this framework, and several generalized constructions of LNC schemes are given. The generic LNC scheme naturally leads to a linear network coding channel over modules, based on which non-coherent network coding can be achieved. Next, performance/complexity tradeoffs of LNC schemes are studied, with a particular focus on hypercube-shaped LNC schemes. The error probability of this class of LNC schemes is largely determined by the minimum inter-coset distances of the underlying nested lattice code. Several illustrative hypercube-shaped LNC schemes are designed based on Construction A and D, showing that nominal coding gains of 3 to 7.5 dB can be obtained with reasonable decoding complexity. Finally, the possibility of decoding multiple linear combinations is considered and related to the shortest independent vectors problem. A notion of dominant solutions is developed together with a suitable lattice-reduction-based algorithm.

preprint2013arXiv

On Multiplicative Matrix Channels over Finite Chain Rings

Motivated by physical-layer network coding, this paper considers communication in multiplicative matrix channels over finite chain rings. Such channels are defined by the law $Y =A X$, where $X$ and $Y$ are the input and output matrices, respectively, and $A$ is called the transfer matrix. It is assumed a coherent scenario in which the instances of the transfer matrix are unknown to the transmitter, but available to the receiver. It is also assumed that $A$ and $X$ are independent. Besides that, no restrictions on the statistics of $A$ are imposed. As contributions, a closed-form expression for the channel capacity is obtained, and a coding scheme for the channel is proposed. It is then shown that the scheme can achieve the capacity with polynomial time complexity and can provide correcting guarantees under a worst-case channel model. The results in the paper extend the corresponding ones for finite fields.

preprint2010arXiv

An Algebraic Approach to Physical-Layer Network Coding

The problem of designing new physical-layer network coding (PNC) schemes via lattice partitions is considered. Building on a recent work by Nazer and Gastpar, who demonstrated its asymptotic gain using information-theoretic tools, we take an algebraic approach to show its potential in non-asymptotic settings. We first relate Nazer-Gastpar's approach to the fundamental theorem of finitely generated modules over a principle ideal domain. Based on this connection, we generalize their code construction and simplify their encoding and decoding methods. This not only provides a transparent understanding of their approach, but more importantly, it opens up the opportunity to design efficient and practical PNC schemes. Finally, we apply our framework for PNC to a Gaussian relay network and demonstrate its advantage over conventional PNC schemes.

Chen Feng

What is connected

Connect this record

See the researcher in context

Building this map preview

34 published item(s)

FlyCo: Foundation Model-Empowered Drones for Autonomous 3D Structure Scanning in Open-World Environments

Merged Bitcoin: Proof of Work Blockchains with Multiple Hash Types

ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking

An Empirical Study of Low Precision Quantization for TinyML

Decentralized Voltage Control with Peer-to-peer Energy Trading in a Distribution Network

Egocentric Prediction of Action Target in 3D

Enhancing VVC with Deep Learning based Multi-Frame Post-Processing

Learning Distilled Collaboration Graph for Multi-Agent Perception

Opportunistic Routing in Quantum Networks

Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting

Self-supervised Spatial Reasoning on Multi-View Line Drawings

TENET: Transformer Encoding Network for Effective Temporal Flow on Motion Prediction

V2X-Sim: Multi-Agent Collaborative Perception Dataset and Benchmark for Autonomous Driving

Dissecting the Performance of Chained-BFT

Video Compression with CNN-based Post Processing

3D Point Cloud Processing and Learning for Autonomous Driving

A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder

Automatic Failure Recovery and Re-Initialization for Online UAV Tracking with Joint Scale and Aspect Ratio Optimization

DR^2Track: Towards Real-Time Visual Tracking for UAV via Distractor Repressed Dynamic Regression

LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

Regularizing Neural Networks via Minimizing Hyperspherical Energy

SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings

The Hermes BFT for Blockchains

Deep Unsupervised Learning of 3D Point Clouds via Graph Topology Inference and Filtering

A Joint Typicality Approach to Algebraic Network Information Theory

An Alternating Direction Method Approach to Cloud Traffic Management

Performance Analysis of CSMA with Multi-Packet Reception: The Inhomogeneous Case

Experimental Quantum Fingerprinting

Communication over Finite-Chain-Ring Matrix Channels

Performance Benefits of DataMPI: A Case Study with BigDataBench

Algebraic Approach to Physical-Layer Network Coding

On Multiplicative Matrix Channels over Finite Chain Rings

An Algebraic Approach to Physical-Layer Network Coding