Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
20 works
0 followers
17 topics
4 close collaborators

Published work

20 published items

preprint · 2023 · arXiv

AI of Brain and Cognitive Sciences: From the Perspective of First Principles

We have witnessed the great success of AI in various applications, including image classification, game playing, protein structure analysis, language translation, and content generation. Despite these powerful applications, many everyday tasks that are simple for humans still pose great challenges to AI. These include image and language understanding, few-shot learning, abstract concepts, and low-energy computing. Thus, learning from the brain remains a promising way to shed light on the development of next-generation AI. The brain is arguably the only known intelligent machine in the universe, a product of evolution that allows animals to survive in the natural environment. At the behavioral level, psychology and cognitive sciences have demonstrated that human and animal brains can execute highly intelligent cognitive functions. At the structural level, cognitive and computational neurosciences have revealed that the brain has extremely complicated but elegant network structures to support its functions. Over the years, people have been gathering knowledge about the structure and functions of the brain, and this process has recently accelerated with the launch of large-scale brain projects worldwide. Here, we argue that the general principles of brain function are the most valuable source of inspiration for the development of AI. These general principles are the standard rules by which the brain extracts, represents, manipulates, and retrieves information, and we call them the first principles of the brain. This paper collects six such first principles: attractor networks, criticality, random networks, sparse coding, relational memory, and perceptual learning. For each topic, we review its biological background, fundamental properties, potential applications to AI, and future development.

preprint · 2022 · arXiv

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks. Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular DIV2K dataset for the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art in super-resolution of compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

preprint · 2022 · arXiv

Coarse-to-Fine Cascaded Networks with Smooth Predicting for Video Facial Expression Recognition

Facial expression recognition plays an important role in human-computer interaction. In this paper, we propose the Coarse-to-Fine Cascaded network with Smooth Predicting (CFC-SP) to improve the performance of facial expression recognition. CFC-SP contains two core components, namely Coarse-to-Fine Cascaded networks (CFC) and Smooth Predicting (SP). CFC first groups several similar emotions to form a rough category, and then employs a network to conduct a coarse but accurate classification. An additional network for these grouped emotions is then used to obtain fine-grained predictions. SP improves the recognition capability of the model by capturing both universal and unique expression features. Specifically, the universal features denote the general characteristics of facial emotions within a period, and the unique features denote the specific characteristics at the current moment. Experiments on Aff-Wild2 show the effectiveness of the proposed CFC-SP. We achieved 3rd place in the Expression Classification Challenge of the 3rd Competition on Affective Behavior Analysis in-the-wild. The code will be released at https://github.com/BR-IDL/PaddleViT.

preprint · 2022 · arXiv

Exploring and Evaluating Image Restoration Potential in Dynamic Scenes

In dynamic scenes, images often suffer from dynamic blur due to the superposition of motions, or from a low signal-to-noise ratio resulting from the fast shutter speeds used to avoid motion. Recovering sharp and clean results from the captured images depends heavily on the ability of restoration methods and the quality of the input. Although existing research on image restoration focuses on developing models that obtain better restored results, few studies have evaluated how and which input images lead to superior restored quality. In this paper, to better study the potential value of an image that can be exploited for restoration, we propose a novel concept referred to as image restoration potential (IRP). Specifically, we first establish a dynamic scene imaging dataset containing composite distortions and apply image restoration processes to validate the existence of IRP. Based on this dataset, we investigate several properties of IRP and propose a novel deep model to accurately predict IRP values. By gradually distilling and selectively fusing the degradation features, the proposed model shows its superiority in IRP prediction. With the proposed model, we are then able to validate how various image restoration related applications benefit from IRP prediction. We show the potential uses of IRP as a filtering principle to select valuable frames, as auxiliary guidance to improve restoration models, and even as an indicator to optimize camera settings for capturing better images in dynamic scenarios.

preprint · 2022 · arXiv

Free Energy Node Embedding via Generalized Skip-gram with Negative Sampling

A widely established set of unsupervised node embedding methods can be interpreted as consisting of two distinct steps: i) the definition of a similarity matrix based on the graph of interest, followed by ii) an explicit or implicit factorization of that matrix. Inspired by this viewpoint, we propose improvements in both steps of the framework. On the one hand, we propose to encode node similarities based on the free energy distance, which interpolates between the shortest path and the commute time distances, thus providing an additional degree of flexibility. On the other hand, we propose a matrix factorization method based on a loss function that generalizes that of the skip-gram model with negative sampling to arbitrary similarity matrices. Compared with factorizations based on the widely used $\ell_2$ loss, the proposed method can better preserve node pairs associated with higher similarity scores. Moreover, it can be easily implemented using advanced automatic differentiation toolkits and computed efficiently by leveraging GPU resources. Node clustering, node classification, and link prediction experiments on real-world datasets demonstrate the effectiveness of incorporating free-energy-based similarities as well as the proposed matrix factorization compared with state-of-the-art alternatives.

preprint · 2022 · arXiv

Hypergraph Cuts with Edge-Dependent Vertex Weights

We develop a framework for incorporating edge-dependent vertex weights (EDVWs) into the hypergraph minimum s-t cut problem. These weights are able to reflect different importance of vertices within a hyperedge, thus leading to better characterized cut properties. More precisely, we introduce a new class of hyperedge splitting functions that we call EDVWs-based, where the penalty of splitting a hyperedge depends only on the sum of EDVWs associated with the vertices on each side of the split. Moreover, we provide a way to construct submodular EDVWs-based splitting functions and prove that a hypergraph equipped with such splitting functions can be reduced to a graph sharing the same cut properties. In this case, the hypergraph minimum s-t cut problem can be solved using well-developed solutions to the graph minimum s-t cut problem. In addition, we show that an existing sparsification technique can be easily extended to our case and makes the reduced graph smaller and sparser, thus further accelerating the algorithms applied to the reduced graph. Numerical experiments using real-world data demonstrate the effectiveness of our proposed EDVWs-based splitting functions in comparison with the all-or-nothing splitting function and cardinality-based splitting functions commonly adopted in existing work.
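To make the idea of an EDVWs-based splitting function concrete, here is a minimal sketch on a hypothetical toy hypergraph. The penalty `min(inside, outside)` is one illustrative concave choice that depends only on the EDVW sums on each side of the split; it is an assumption for this example, not necessarily a function used in the paper. The brute-force search stands in for the graph reduction described in the abstract and is only practical for tiny inputs.

```python
from itertools import combinations

# Hypothetical toy hypergraph: each hyperedge maps vertex -> edge-dependent weight.
hyperedges = [
    {"s": 1.0, "a": 2.0, "b": 1.0},
    {"a": 1.0, "b": 3.0, "t": 1.0},
    {"s": 2.0, "t": 1.0},
]
vertices = ["s", "a", "b", "t"]


def split_penalty(edge, side):
    """Penalty for splitting `edge` by the vertex set `side`.

    Assumed form: the smaller of the two EDVW sums, so the value depends
    only on the total EDVW on each side and is zero when the edge is uncut.
    """
    inside = sum(w for v, w in edge.items() if v in side)
    outside = sum(edge.values()) - inside
    return min(inside, outside)


def cut_value(side):
    """Total penalty of the cut (side, complement) over all hyperedges."""
    return sum(split_penalty(edge, side) for edge in hyperedges)


def min_st_cut(source="s", sink="t"):
    """Brute-force minimum s-t cut: try every set with source in, sink out."""
    free = [v for v in vertices if v not in (source, sink)]
    best = None
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            side = {source, *extra}
            value = cut_value(side)
            if best is None or value < best[0]:
                best = (value, side)
    return best


value, side = min_st_cut()
print(value, sorted(side))
```

Because the penalty is a concave function of a modular quantity (the EDVW sum on one side), it is submodular, which is the property the abstract relies on for the reduction to a graph cut.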

preprint · 2022 · arXiv

Signal Processing on Higher-Order Networks: Livin' on the Edge ... and Beyond

In this tutorial, we provide a didactic treatment of the emerging topic of signal processing on higher-order networks. Drawing analogies from discrete and graph signal processing, we introduce the building blocks for processing data on simplicial complexes and hypergraphs, two common higher-order network abstractions that can incorporate polyadic relationships. We provide brief introductions to simplicial complexes and hypergraphs, with a special emphasis on the concepts needed for the processing of signals supported on these structures. Specifically, we discuss Fourier analysis, signal denoising, signal interpolation, node embeddings, and nonlinear processing through neural networks, using these two higher-order network models. In the context of simplicial complexes, we specifically focus on signal processing using the Hodge Laplacian matrix, a multi-relational operator that leverages the special structure of simplicial complexes and generalizes desirable properties of the Laplacian matrix in graph signal processing. For hypergraphs, we present both matrix and tensor representations, and discuss the trade-offs in adopting one or the other. We also highlight limitations and potential research avenues, both to inform practitioners and to motivate the contribution of new researchers to the area.
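As a small illustration of the Hodge Laplacian mentioned above, the sketch below builds the incidence matrices of a hypothetical simplicial complex (four nodes, four edges, one filled triangle) and forms the edge-level operator L1 = B1^T B1 + B2 B2^T. The complex, the orientation convention (edges and triangles listed with increasing node indices), and the helper names are assumptions made for this example.

```python
# Hypothetical complex: nodes are their own row indices 0..3.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def node_edge_incidence(nodes, edges):
    """B1: edge (i, j) with i < j leaves node i (-1) and enters node j (+1)."""
    b1 = [[0] * len(edges) for _ in nodes]
    for col, (i, j) in enumerate(edges):
        b1[i][col] = -1
        b1[j][col] = 1
    return b1


def edge_triangle_incidence(edges, triangles):
    """B2: boundary of triangle (i, j, k) is (j, k) - (i, k) + (i, j)."""
    index = {e: row for row, e in enumerate(edges)}
    b2 = [[0] * len(triangles) for _ in edges]
    for col, (i, j, k) in enumerate(triangles):
        b2[index[(j, k)]][col] = 1
        b2[index[(i, k)]][col] = -1
        b2[index[(i, j)]][col] = 1
    return b2


B1 = node_edge_incidence(nodes, edges)
B2 = edge_triangle_incidence(edges, triangles)

# Edge-level Hodge Laplacian: lower part (via nodes) plus upper part (via triangles).
L1 = matadd(matmul(transpose(B1), B1), matmul(B2, transpose(B2)))

# The boundary of a boundary is empty, so B1 @ B2 is the zero matrix.
boundary_of_boundary = matmul(B1, B2)
```

The identity B1 B2 = 0 is what makes the two terms of L1 act on complementary parts of an edge signal, which underlies the filtering and denoising operations the tutorial discusses.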

preprint · 2022 · arXiv

Signal processing on simplicial complexes

Higher-order networks have so far been considered primarily in the context of studying the structure of complex systems, i.e., the higher-order or multi-way relations connecting the constituent entities. More recently, a number of studies have considered dynamical processes that explicitly account for such higher-order dependencies, e.g., in the context of epidemic spreading processes or opinion formation. In this chapter, we focus on a closely related, but distinct third perspective: how can we use higher-order relationships to process signals and data supported on higher-order network structures. In particular, we survey how ideas from signal processing of data supported on regular domains, such as time series or images, can be extended to graphs and simplicial complexes. We discuss Fourier analysis, signal denoising, signal interpolation, and nonlinear processing through neural networks based on simplicial complexes. Key to our developments is the Hodge Laplacian matrix, a multi-relational operator that leverages the special structure of simplicial complexes and generalizes desirable properties of the Laplacian matrix in graph signal processing.

preprint · 2022 · arXiv

The generality of uncooperative and cooperative effects in elementary hydrogen-bonded systems

The cooperative effect plays a significant role in understanding the intermolecular donor-acceptor interactions of hydrogen bonds (H-bonds, D-H...A). Herein, using high-precision ab initio benchmark methods, the well-known cooperative effect is reproduced in elementary H-bonded systems with different D and A atoms. That is, as the intermolecular distance decreases, the D-H bond length first increases and then decreases, while the H...A bond length decreases. In contrast, when D and A are the same, the D-H bond length decreases without first increasing as the intermolecular distance decreases, which is referred to as the uncooperative effect. Further analyses conclude that, compared to cooperative H-bonded systems, uncooperative systems at their respective equilibrium positions have a larger core-valence bifurcation (CVB) index (>0.022) and lower binding energies (<0.25 eV), showing a clear inverse linear relationship related to H-bond strength. Therefore, intermolecular non-H-bonding interactions are predicted to reflect the uncooperative characteristics, which is confirmed by high-precision ab initio calculations. These findings provide a direction for the comprehensive understanding of H-bonds.

preprint · 2021 · arXiv

Co-clustering Vertices and Hyperedges via Spectral Hypergraph Partitioning

We propose a novel method to co-cluster the vertices and hyperedges of hypergraphs with edge-dependent vertex weights (EDVWs). In this hypergraph model, the contribution of every vertex to each of its incident hyperedges is represented through an edge-dependent weight, conferring the model higher expressivity than the classical hypergraph. In our method, we leverage random walks with EDVWs to construct a hypergraph Laplacian and use its spectral properties to embed vertices and hyperedges in a common space. We then cluster these embeddings to obtain our proposed co-clustering method, of particular relevance in applications requiring the simultaneous clustering of data entities and features. Numerical experiments using real-world data demonstrate the effectiveness of our proposed approach in comparison with state-of-the-art alternatives.
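The abstract does not spell out the random walk it uses, so the following is only a sketch under a stated assumption: a two-step walk in which the walker at a vertex picks an incident hyperedge with probability proportional to the hyperedge weight, then picks a vertex inside that hyperedge with probability proportional to its edge-dependent weight. The toy hypergraph and all names are hypothetical, and every vertex is assumed to belong to at least one hyperedge.

```python
# Hypothetical toy hypergraph with three vertices and two hyperedges.
vertices = ["a", "b", "c"]
edge_weights = {"e1": 1.0, "e2": 2.0}  # weight of each hyperedge
edvw = {                               # weight of each vertex within each hyperedge
    "e1": {"a": 1.0, "b": 3.0},
    "e2": {"b": 1.0, "c": 1.0},
}


def transition_matrix(vertices, edge_weights, edvw):
    """Return probs[u][v], the probability of moving from u to v in one step."""
    probs = {u: {v: 0.0 for v in vertices} for u in vertices}
    for u in vertices:
        incident = [e for e in edvw if u in edvw[e]]
        degree = sum(edge_weights[e] for e in incident)
        for e in incident:
            pick_edge = edge_weights[e] / degree      # step 1: choose a hyperedge
            total = sum(edvw[e].values())
            for v, gamma in edvw[e].items():
                probs[u][v] += pick_edge * gamma / total  # step 2: choose a vertex
    return probs


P = transition_matrix(vertices, edge_weights, edvw)
for u in vertices:
    print(u, P[u])
```

Each row of the resulting matrix sums to one. A Laplacian built from such a walk can then be used for spectral embedding, as the abstract describes, though the construction of that Laplacian is not shown here.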

preprint · 2021 · arXiv

Joint Active and Passive Beamforming for Intelligent Reflecting Surface Aided Multiuser MIMO Communications

This letter investigates the joint active and passive beamforming optimization for intelligent reflecting surface (IRS) aided multiuser multiple-input multiple-output systems with the objective of maximizing the weighted sum-rate. We show that this problem can be solved via a matrix weighted mean square error minimization equivalence. In particular, for the optimization of the passive IRS beamforming, we first propose an iterative algorithm with excellent performance based on manifold optimization. By using the matrix fractional programming technique to obtain a more tractable objective function, we then propose a low-complexity algorithm based on the majorization-minimization method. Numerical results verify the convergence of our proposed algorithms and the significant performance improvement over the communication scenario without IRS assistance.

preprint · 2021 · arXiv

Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth Estimation with Both Implicit and Explicit Semantic Guidance

Self-supervised depth estimation has achieved great success in learning depth from unlabeled image sequences. While the mapping between an image and its pixel-wise depth is well studied in current methods, the correlation between image, depth and scene semantics is less considered. This hinders the network from better understanding the real geometry of the scene, since contextual clues contribute not only latent representations of scene depth but also direct constraints on the depth map. In this paper, we leverage these two benefits by proposing implicit and explicit semantic guidance for accurate self-supervised depth estimation. We propose a Semantic-aware Spatial Feature Alignment (SSFA) scheme to effectively align implicit semantic features with depth features for scene-aware depth estimation. We also propose a semantic-guided ranking loss to explicitly constrain the estimated depth maps to be consistent with real scene contextual properties. Both semantic label noise and prediction uncertainty are considered to yield reliable depth supervision. Extensive experimental results show that our method produces high-quality depth maps that are consistently superior on both complex scenes and diverse semantic categories, and outperforms state-of-the-art methods by a significant margin.

preprint · 2020 · arXiv

A short-range metastable defect in the double layer ice

Although the phases of water have been extensively investigated, whether there exists a defect that distorts the structure only locally is still under debate. Here we report a localized 5775 defect phase in the double-layer ice on the Au(111) surface, a metastable structure containing 5- and 7-membered rings in contrast to the perfect hexagonal one. Without altering the total number of hydrogen bonds in the ice, the defect introduces only a 0.08 Å molecular displacement and a 3.27% interaction energy change outside the defected area. Such a defect also exists without the Au support but causes a larger lattice relaxation or a smaller interaction energy change. The excessively high barrier, as well as the low quantum tunneling and thermodynamic probabilities, hinders the formation of the defect by post-growth isomerization from the perfect to the defected structure. This finding indicates that the defected ice is stable and that the defect can form during the ice growth stage.

preprint · 2020 · arXiv

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

This paper reviews the AIM 2020 challenge on efficient single image super-resolution, with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of ×4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted final results. They gauge the state of the art in efficient single image super-resolution.
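For readers unfamiliar with the PSNR criterion named above, the following minimal sketch computes it from the standard definition, 10·log10(MAX² / MSE), on flat lists of pixel values. The function name and the 8-bit peak value of 255 are assumptions for illustration; the challenge's exact evaluation protocol (color space, border cropping) is not stated here and is not reproduced.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no error to measure
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values: each estimate is off by 10 grey levels.
print(psnr([100, 100], [110, 90]))
```

A higher PSNR means the super-resolved image is closer to the ground truth, so "maintaining the PSNR of MSRResNet" sets a floor on fidelity while the efficiency metrics are reduced.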

preprint · 2020 · arXiv

Attention-based network for low-light image enhancement

Images captured under low-light conditions often suffer from insufficient brightness and severe noise. Hence, low-light image enhancement is a key and challenging task in computer vision. A variety of methods have been proposed for this task, but they often fail in extremely low-light environments and amplify the underlying noise in the input image. To address this difficult problem, this paper presents a novel attention-based neural network that generates high-quality enhanced low-light images from raw sensor data. Specifically, we first employ an attention strategy (i.e., channel attention and spatial attention modules) to suppress undesired chromatic aberration and noise. The channel attention module guides the network to refine redundant colour features. The spatial attention module focuses on denoising by taking advantage of the non-local correlation in the image. Furthermore, we propose a new pooling layer, called the inverted shuffle layer, which adaptively selects useful information from previous features. Extensive experiments demonstrate the superiority of the proposed network in suppressing chromatic aberration and noise artifacts during enhancement, especially when the low-light image has severe noise.

preprint · 2020 · arXiv

Channel Estimation for Intelligent Reflecting Surface-Assisted Millimeter Wave MIMO Systems

Intelligent reflecting surfaces (IRSs) are regarded as promising enablers for future millimeter wave (mmWave) wireless communication, due to their ability to create favorable line-of-sight (LoS) propagation environments. In this paper, we investigate channel estimation in downlink IRS-assisted mmWave multiple-input multiple-output (MIMO) systems. By leveraging the sparsity of mmWave channels, we formulate the channel estimation problem as a fixed-rank constrained non-convex optimization problem. To tackle the non-convexity, an efficient algorithm is proposed by capitalizing on alternating minimization and manifold optimization (MO), which yields a locally optimal solution. Simulation results show that the proposed MO-based estimation (MO-EST) algorithm significantly outperforms two benchmark schemes and demonstrate the robustness of the MO-EST algorithm with respect to imperfect knowledge of the sparsity level of the channels in practical implementations.

preprint · 2020 · arXiv

GINet: Graph Interaction Network for Scene Parsing

Recently, context reasoning using image regions beyond local convolution has shown great potential for scene parsing. In this work, we explore how to incorporate linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss). The GI unit is capable of enhancing the feature representations of convolutional networks over high-level semantics and learning the semantic coherency adaptively for each sample. Specifically, the dataset-based linguistic knowledge is first incorporated in the GI unit to promote context reasoning over the visual graph, then the evolved representations of the visual graph are mapped to each local representation to enhance the discriminative capability for scene parsing. The GI unit is further improved by the SC-loss to enhance the semantic representations over the exemplar-based semantic graph. We perform full ablation studies to demonstrate the effectiveness of each component in our approach. In particular, the proposed GINet outperforms state-of-the-art approaches on popular benchmarks, including Pascal-Context and COCO Stuff.

preprint · 2020 · arXiv

Network Inference from Consensus Dynamics with Unknown Parameters

We explore the problem of inferring the graph Laplacian of a weighted, undirected network from snapshots of a single or multiple discrete-time consensus dynamics, subject to parameter uncertainty, taking place on the network. Specifically, we consider three problems in which we assume different levels of knowledge about the diffusion rates, observation times, and the input signal power of the dynamics. To solve these underdetermined problems, we propose a set of algorithms that leverage the spectral properties of the observed data and tools from convex optimization. Furthermore, we provide theoretical performance guarantees associated with these algorithms. We complement our theoretical work with numerical experiments that demonstrate how our proposed methods outperform current state-of-the-art algorithms and showcase their effectiveness in recovering both synthetic and real-world networks.
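To illustrate the kind of data such an inference problem starts from, here is a minimal forward simulation of discrete-time consensus dynamics on a small weighted, undirected graph. It assumes the common update x(t+1) = (I - αL) x(t) with a single diffusion rate α; the paper's exact parameterization may differ, and the graph, the rate, and the function names are hypothetical.

```python
def laplacian(n, weighted_edges):
    """Graph Laplacian L = D - W of an undirected weighted graph on n nodes."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


def consensus_step(lap, x, alpha):
    """One update x <- x - alpha * L x (assumed form of the dynamics)."""
    return [xi - alpha * sum(l * xj for l, xj in zip(row, x))
            for row, xi in zip(lap, x)]


def snapshot(lap, x0, alpha, steps):
    """State of the dynamics after `steps` updates starting from x0."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(lap, x, alpha)
    return x


# Hypothetical path graph 0 - 1 - 2 with edge weights 1.0 and 2.0.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 2.0)])
early = snapshot(L, [3.0, 0.0, 0.0], alpha=0.2, steps=1)
late = snapshot(L, [3.0, 0.0, 0.0], alpha=0.2, steps=100)
print(early, late)
```

The sum of the state is preserved at every step and, for a small enough α on a connected graph, all nodes converge to the initial average. The inference task runs in the opposite direction: given snapshots such as `early`, recover `L` without knowing `alpha` or the number of steps.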

preprint · 2020 · arXiv

Unidirectional Oriented Water Wire in Short Nanotube

The orientation of water molecules is the key factor for the fast transport of water in small nanotubes. It has been accepted that the bidirectional water burst in short nanotubes can be transformed into unidirectional transport when the orientation of water molecules is maintained in long nanotubes under an external field. In this work, based on molecular dynamics simulations and first-principles calculations, we show that, without an external field, only 21 water molecules are needed to intrinsically maintain a unidirectional single-file water chain in a carbon nanotube on the timescale of seconds. Detailed analysis indicates that this surprising result comes from the step-by-step process by which the water chain flips, which differs from the commonly assumed concerted mechanism. Considering that the thickness of a cell membrane (normally 5-10 nm) is larger than the length threshold of the unidirectional water wire, this study suggests that an external field may not be needed to maintain unidirectional flow in a water channel at macroscopic timescales.

preprint · 2019 · arXiv

Beamforming Design for Large-Scale Antenna Arrays Using Deep Learning

Beamforming (BF) design for large-scale antenna arrays with limited radio frequency chains and a phase-shifter-based analog BF architecture has been recognized as a key issue in millimeter wave communication systems. It becomes more challenging with imperfect channel state information (CSI). In this letter, we propose a deep learning based BF design approach and develop a BF neural network (BFNN) which can be trained to learn how to optimize the beamformer for maximizing the spectral efficiency under hardware limitations and imperfect CSI. Simulation results show that the proposed BFNN achieves significant performance improvement and strong robustness to imperfect CSI over conventional BF algorithms.