Researcher profile

Lin Gao

Lin Gao contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
16 works
0 followers
8 topics
4 close collaborators

Published work

16 published items

preprint, 2022, arXiv

DrawingInStyles: Portrait Image Generation and Editing with Spatially Conditioned StyleGAN

The research topic of sketch-to-portrait generation has witnessed a boost of progress with deep learning techniques. The recently proposed StyleGAN architectures achieve state-of-the-art generation ability but the original StyleGAN is not friendly for sketch-based creation due to its unconditional generation nature. To address this issue, we propose a direct conditioning strategy to better preserve the spatial information under the StyleGAN framework. Specifically, we introduce Spatially Conditioned StyleGAN (SC-StyleGAN for short), which explicitly injects spatial constraints to the original StyleGAN generation process. We explore two input modalities, sketches and semantic maps, which together allow users to express desired generation results more precisely and easily. Based on SC-StyleGAN, we present DrawingInStyles, a novel drawing interface for non-professional users to easily produce high-quality, photo-realistic face images with precise control, either from scratch or editing existing ones. Qualitative and quantitative evaluations show the superior generation ability of our method to existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.

preprint, 2022, arXiv

DSG-Net: Learning Disentangled Structure and Geometry for 3D Shape Generation

3D shape generation is a fundamental operation in computer graphics. While significant progress has been made, especially with recent deep generative models, it remains a challenge to synthesize high-quality shapes with rich geometric details and complex structure in a controllable manner. To tackle this, we introduce DSG-Net, a deep neural network that learns a disentangled structured and geometric mesh representation for 3D shapes, where two key aspects of shapes, geometry and structure, are encoded in a synergistic manner to ensure plausibility of the generated shapes, while also being disentangled as much as possible. This supports a range of novel shape generation applications with disentangled control, such as interpolation of structure (geometry) while keeping geometry (structure) unchanged. To achieve this, we simultaneously learn structure and geometry through variational autoencoders (VAEs) in a hierarchical manner for both, with bijective mappings at each level. In this manner, we effectively encode geometry and structure in separate latent spaces, while ensuring their compatibility: the structure is used to guide the geometry and vice versa. At the leaf level, the part geometry is represented using a conditional part VAE, to encode high-quality geometric details, guided by the structure context as the condition. Our method not only supports controllable generation applications but also produces high-quality synthesized shapes, outperforming state-of-the-art methods. The code has been released at https://github.com/IGLICT/DSG-Net.

preprint, 2022, arXiv

NeRF-Editing: Geometry Editing of Neural Radiance Fields

Implicit neural rendering, especially Neural Radiance Field (NeRF), has shown great potential in novel view synthesis of a scene. However, current NeRF-based methods cannot enable users to perform user-controlled shape deformation in the scene. While existing works have proposed some approaches to modify the radiance field according to the user's constraints, the modification is limited to color editing or object translation and rotation. In this paper, we propose a method that allows users to perform controllable shape deformation on the implicit representation of the scene, and synthesizes the novel view images of the edited scene without re-training the network. Specifically, we establish a correspondence between the extracted explicit mesh representation and the implicit neural representation of the target scene. Users can first utilize well-developed mesh-based deformation methods to deform the mesh representation of the scene. Our method then utilizes user edits from the mesh representation to bend the camera rays by introducing a tetrahedra mesh as a proxy, obtaining the rendering results of the edited scene. Extensive experiments demonstrate that our framework can achieve ideal editing results not only on synthetic data, but also on real scenes captured by users.

preprint, 2022, arXiv

StylizedNeRF: Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning

3D scene stylization aims at generating stylized images of the scene from arbitrary novel views following a given set of style examples, while ensuring consistency when rendered from different views. Directly applying methods for image or video stylization to 3D scenes cannot achieve such consistency. Thanks to recently proposed neural radiance fields (NeRF), we are able to represent a 3D scene in a consistent way. Consistent 3D scene stylization can be effectively achieved by stylizing the corresponding NeRF. However, there is a significant domain gap between style examples which are 2D images and NeRF which is an implicit volumetric representation. To address this problem, we propose a novel mutual learning framework for 3D scene stylization that combines a 2D image stylization network and NeRF to fuse the stylization ability of 2D stylization network with the 3D consistency of NeRF. We first pre-train a standard NeRF of the 3D scene to be stylized and replace its color prediction module with a style network to obtain a stylized NeRF. It is followed by distilling the prior knowledge of spatial consistency from NeRF to the 2D stylization network through an introduced consistency loss. We also introduce a mimic loss to supervise the mutual learning of the NeRF style module and fine-tune the 2D stylization decoder. In order to further make our model handle ambiguities of 2D stylization results, we introduce learnable latent codes that obey the probability distributions conditioned on the style. They are attached to training samples as conditional inputs to better learn the style module in our novel stylized NeRF. Experimental results demonstrate that our method is superior to existing approaches in both visual quality and long-range consistency.

preprint, 2021, arXiv

A Revisit of Shape Editing Techniques: from the Geometric to the Neural Viewpoint

3D shape editing is widely used in a range of applications such as movie production, computer games and computer aided design. It is also a popular research topic in computer graphics and computer vision. In past decades, researchers have developed a series of editing methods to make the editing process faster, more robust, and more reliable. Traditionally, the deformed shape is determined by the optimal transformation and weights for an energy term. With increasing availability of 3D shapes on the Internet, data-driven methods were proposed to improve the editing results. More recently as the deep neural networks became popular, many deep learning based editing methods have been developed in this field, which is naturally data-driven. We mainly survey recent research works from the geometric viewpoint to those emerging neural deformation techniques and categorize them into organic shape editing methods and man-made model editing methods. Both traditional methods and recent neural network based methods are reviewed.

preprint, 2021, arXiv

Deep Deformation Detail Synthesis for Thin Shell Models

In physics-based cloth animation, rich folds and detailed wrinkles are achieved at the cost of expensive computational resources and laborious manual tuning. Data-driven techniques aim to reduce the computation significantly by using a database. One type of method relies on human poses to synthesize fitted garments, which cannot be applied to general cloth. Another type of method adds details to the coarse meshes without such restrictions. However, existing works usually utilize coordinate-based representations, which cannot cope with large-scale deformation and require dense vertex correspondences between coarse and fine meshes. Moreover, as such methods only add details, they require coarse meshes to be close to fine meshes, which can be either impossible or require unrealistic constraints when generating fine meshes. To address these challenges, we develop a temporally and spatially as-consistent-as-possible deformation representation (named TS-ACAP) and a DeformTransformer network to learn the mapping from low-resolution meshes to detailed ones. This TS-ACAP representation is designed to ensure both spatial and temporal consistency for sequential large-scale deformations from cloth animations. With this representation, our DeformTransformer network first utilizes two mesh-based encoders to extract the coarse and fine features, respectively. To transduce the coarse features to the fine ones, we leverage the Transformer network that consists of frame-level attention mechanisms to ensure temporal coherence of the prediction. Experimental results show that our method is able to produce reliable and realistic animations in various datasets at high frame rates: 10 to 35 times faster than physics-based simulation, with superior detail synthesis abilities compared with existing methods.

preprint, 2020, arXiv

3D-FUTURE: 3D Furniture shape with TextURE

The 3D CAD shapes in current 3D benchmarks are mostly collected from online model repositories. Thus, they typically have insufficient geometric details and less informative textures, making them less attractive for comprehensive and subtle research in areas such as high-quality 3D mesh and texture recovery. This paper presents 3D Furniture shape with TextURE (3D-FUTURE): a richly annotated, large-scale repository of 3D furniture shapes in the household scenario. At the time of this technical report, 3D-FUTURE contains 20,240 clean and realistic synthetic images of 5,000 different rooms. There are 9,992 unique detailed 3D instances of furniture with high-resolution textures. Experienced designers developed the room scenes, and the 3D CAD shapes in the scene are used for industrial production. Given the well-organized 3D-FUTURE, we provide baseline experiments on several widely studied tasks, such as joint 2D instance segmentation and 3D object pose estimation, image-based 3D shape retrieval, 3D object reconstruction from a single image, and texture recovery for 3D shapes, to facilitate related future research on our database.

preprint, 2020, arXiv

5G mmWave Cooperative Positioning and Mapping using Multi-Model PHD Filter and Map Fusion

5G millimeter wave (mmWave) signals can enable accurate positioning in vehicular networks when the base station and vehicles are equipped with large antenna arrays. However, radio-based positioning suffers from multipath signals generated by different types of objects in the physical environment. Multipath can be turned into a benefit, by building up a radio map (comprising the number of objects, object type, and object state) and using this map to exploit all available signal paths for positioning. We propose a new method for cooperative vehicle positioning and mapping of the radio environment, comprising a multiple-model probability hypothesis density filter and a map fusion routine, which is able to consider different types of objects and different fields of views. Simulation results demonstrate the performance of the proposed method.

preprint, 2020, arXiv

A Node Embedding Framework for Integration of Similarity-based Drug Combination Prediction

Motivation: Drug combination is a sensible strategy for disease treatment, improving efficacy and reducing concomitant side effects. Due to the large number of possible combinations among candidate compounds, exhaustive screening is prohibitive. Currently, plenty of studies have focused on predicting potential drug combinations. However, these methods are not entirely satisfactory in performance and scalability. Results: In this paper, we proposed a Network Embedding framework in Multiplex Networks (NEMN) to predict synthetic drug combinations. Based on a multiplex drug similarity network, we offered alternative methods to integrate useful information from different aspects and to decide the quantitative importance of each network. To explain the feasibility of NEMN, we applied our framework to the data of drug-drug interactions, on which it showed better performance in terms of AUPR and ROC. For drug combination prediction, we found seven novel drug combinations which have been validated by external sources among the top-ranked predictions of our model.

preprint, 2020, arXiv

A Survey on Deep Geometry Learning: From a Representation Perspective

Researchers have now achieved great success in dealing with 2D images using deep learning. In recent years, 3D computer vision and Geometry Deep Learning have gained more and more attention. Many advanced techniques for 3D shapes have been proposed for different applications. Unlike 2D images, which can be uniformly represented by regular grids of pixels, 3D shapes have various representations, such as depth and multi-view images, voxel-based representation, point-based representation, mesh-based representation, implicit surface representation, etc. However, the performance for different applications largely depends on the representation used, and there is no unique representation that works well for all applications. Therefore, in this survey, we review recent developments in deep learning for 3D geometry from a representation perspective, summarizing the advantages and disadvantages of different representations in different applications. We also present existing datasets in these representations and further discuss future research directions.

preprint, 2020, arXiv

Crowd-MECS: A Novel Crowdsourcing Framework for Mobile Edge Caching and Sharing

Crowdsourced mobile edge caching and sharing (Crowd-MECS) is emerging as a promising content delivery paradigm by employing a large crowd of existing edge devices (EDs) to cache and share popular contents. The successful technology adoption of Crowd-MECS relies on a comprehensive understanding of the complicated economic interactions and strategic decision-making of different stakeholders. In this paper, we focus on studying the economic and strategic interactions between one content provider (CP) and a large crowd of EDs, where the EDs can decide whether to cache and share contents for the CP, and the CP can decide to share a certain revenue with EDs as the incentive for caching and sharing contents. We formulate such an interaction as a two-stage Stackelberg game. In Stage I, the CP aims to maximize its own profit by deciding the ratio of revenue shared with EDs. In Stage II, EDs aim to maximize their own payoffs by choosing to be agents who cache and share contents, and meanwhile gain a certain revenue from the CP, or requesters who do not cache but request contents in an on-demand fashion. We first analyze the EDs' best responses and prove the existence and uniqueness of the equilibrium in Stage II by using non-atomic game theory. Then, we identify the piece-wise structure and the unimodal feature of the CP's profit function, based on which we design a tailored low-complexity one-dimensional search algorithm to achieve the optimal revenue sharing ratio for the CP in Stage I. Simulation results show that both the CP's profit and the EDs' total welfare can be improved significantly (e.g., by 120% and 50%, respectively) by using the proposed Crowd-MECS, compared with the Non-MEC system where the CP serves all EDs directly.
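
The abstract above relies on the CP's profit function being unimodal in the revenue sharing ratio, which is what makes a one-dimensional search sufficient. The sketch below illustrates that idea with a generic ternary search; it is not the paper's tailored algorithm, and `toy_profit` is a hypothetical piece-wise curve invented here purely for illustration.

```python
from typing import Callable


def ternary_search_max(f: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-6) -> float:
    """Return an approximate maximizer of a strictly unimodal f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # the maximizer cannot lie to the left of m1
        else:
            hi = m2  # the maximizer cannot lie to the right of m2
    return (lo + hi) / 2.0


def toy_profit(ratio: float) -> float:
    """Hypothetical piece-wise, unimodal profit curve (an assumption, not from the paper)."""
    # Rises linearly up to ratio = 0.3, then falls linearly afterwards.
    if ratio <= 0.3:
        return 2.0 * ratio
    return 0.6 - 0.5 * (ratio - 0.3)


best_ratio = ternary_search_max(toy_profit, 0.0, 1.0)
print(round(best_ratio, 3))
```

Each iteration discards a third of the interval, so the number of profit evaluations grows only logarithmically with the required precision.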

preprint, 2020, arXiv

Deep Generation of Face Images from Sketches

Recent deep image-to-image translation techniques allow fast generation of face images from freehand sketches. However, existing solutions tend to overfit to sketches, thus requiring professional sketches or even edge maps as input. To address this issue, our key idea is to implicitly model the shape space of plausible face images and synthesize a face image in this space to approximate an input sketch. We take a local-to-global approach. We first learn feature embeddings of key face components, and push corresponding parts of input sketches towards underlying component manifolds defined by the feature vectors of face component samples. We also propose another deep neural network to learn the mapping from the embedded component features to realistic images with multi-channel feature maps as intermediate results to improve the information flow. Our method essentially uses input sketches as soft constraints and is thus able to produce high-quality face images even from rough and/or incomplete sketches. Our tool is easy to use even for non-artists, while still supporting fine-grained control of shape details. Both qualitative and quantitative evaluations show the superior generation ability of our system to existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.

preprint, 2020, arXiv

Deep Line Art Video Colorization with a Few References

Coloring line art images based on the colors of reference images is an important stage in animation production, which is time-consuming and tedious. In this paper, we propose a deep architecture to automatically color line art videos with the same color style as the given reference images. Our framework consists of a color transform network and a temporal constraint network. The color transform network takes the target line art images as well as the line art and color images of one or more reference images as input, and generates corresponding target color images. To cope with larger differences between the target line art image and reference color images, our architecture utilizes non-local similarity matching to determine the region correspondences between the target image and the reference images, which are used to transform the local color information from the references to the target. To ensure global color style consistency, we further incorporate Adaptive Instance Normalization (AdaIN) with the transformation parameters obtained from a style embedding vector that describes the global color style of the references, extracted by an embedder. The temporal constraint network takes the reference images and the target image together in chronological order, and learns the spatiotemporal features through 3D convolution to ensure the temporal consistency of the target image and the reference image. Our model can achieve even better coloring results by fine-tuning the parameters with only a small amount of samples when dealing with an animation of a new style. To evaluate our method, we build a line art coloring dataset. Experiments show that our method achieves the best performance on line art video coloring compared to the state-of-the-art methods and other baselines.
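
For readers unfamiliar with Adaptive Instance Normalization (AdaIN), mentioned in the abstract above, the following is a minimal sketch of the standard operation on a single feature channel: normalize the features to zero mean and unit variance, then apply a style-dependent scale and shift. How the paper derives these parameters from its style embedding vector is not described here, so `scale` and `shift` are hypothetical inputs, and the flat list stands in for a real feature map.

```python
import math
from typing import List


def adain_channel(features: List[float], scale: float, shift: float,
                  eps: float = 1e-5) -> List[float]:
    """Apply AdaIN to one channel: scale * (x - mean) / std + shift."""
    mean = sum(features) / len(features)
    var = sum((v - mean) ** 2 for v in features) / len(features)
    std = math.sqrt(var + eps)  # eps avoids division by zero on flat channels
    return [scale * (v - mean) / std + shift for v in features]


# Hypothetical content features and style parameters, for illustration only.
content = [1.0, 2.0, 3.0, 4.0]
styled = adain_channel(content, scale=2.0, shift=10.0)

styled_mean = sum(styled) / len(styled)
styled_std = math.sqrt(sum((v - styled_mean) ** 2 for v in styled) / len(styled))
print(styled_mean, styled_std)
```

After the transform, the channel's mean and standard deviation match the supplied shift and scale (up to the small `eps` term), which is how the global style statistics are transferred.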

preprint, 2020, arXiv

Monetizing Edge Service in Mobile Internet Ecosystem

In the mobile Internet ecosystem, Mobile Users (MUs) purchase wireless data services from an Internet Service Provider (ISP) to access the Internet and acquire content services of interest (e.g., online games) from a Content Provider (CP). The popularity of intelligent functions (e.g., AI and 3D modeling) increases the computation intensity of the content services, leading to growing computation pressure on the MUs' resource-limited devices. To this end, edge computing service is emerging as a promising approach to alleviate the MUs' computation pressure while keeping their quality-of-service, via offloading some computation tasks of MUs to edge (computing) servers deployed at the local network edge. Thus, the Edge Service Provider (ESP), who deploys the edge servers and offers the edge computing service, becomes an upcoming new stakeholder in the ecosystem. In this work, we study the economic interactions of MUs, ISP, CP, and ESP in the new ecosystem with edge computing service, where MUs can acquire the computation-intensive content services (offered by CP) and offload some computation tasks, together with the necessary raw input data, to edge servers (deployed by ESP) through ISP. We first study the MU's Joint Content Acquisition and Task Offloading (J-CATO) problem, which aims to maximize his long-term payoff. We derive the off-line solution with crucial insights, based on which we design an online strategy with provable performance. Then, we study the ESP's edge service monetization problem. We propose a pricing policy that can achieve a constant fraction of the ex-post optimal revenue with an extra constant loss for the ESP. Numerical results show that the edge computing service can stimulate the MUs' content acquisition and improve the payoffs of MUs, ISP, and CP.

preprint, 2020, arXiv

Realtime Simulation of Thin-Shell Deformable Materials using CNN-Based Mesh Embedding

We address the problem of accelerating thin-shell deformable object simulations by dimension reduction. We present a new algorithm to embed a high-dimensional configuration space of deformable objects in a low-dimensional feature space, where the configurations of objects and feature points have an approximate one-to-one mapping. Our key technique is a graph-based convolutional neural network (CNN) defined on meshes with arbitrary topologies and a new mesh embedding approach based on a physics-inspired loss term. We have applied our approach to accelerate high-resolution thin shell simulations corresponding to cloth-like materials, where the configuration space has tens of thousands of degrees of freedom. We show that our physics-inspired embedding approach leads to higher accuracy compared with prior mesh embedding methods. Finally, we show that the temporal evolution of the mesh in the feature space can also be learned using a recurrent neural network (RNN), leading to fully learnable physics simulators. After training, our learned simulator runs 500 to 10,000 times faster, and the accuracy is high enough for robot manipulation tasks.

preprint, 2020, arXiv

STD-Net: Structure-preserving and Topology-adaptive Deformation Network for 3D Reconstruction from a Single Image

3D reconstruction from a single-view image is a long-standing problem in computer vision. Various methods based on different shape representations (such as point cloud or volumetric representations) have been proposed. However, 3D shape reconstruction with fine details and complex structures is still challenging and has not yet been solved. Thanks to recent advances in deep shape representations, it has become promising to learn the structure and detail representation using deep neural networks. In this paper, we propose a novel method called STD-Net to reconstruct 3D models utilizing the mesh representation, which is well suited to characterizing complex structure and geometry details. To reconstruct complex 3D mesh models with fine details, our method consists of (1) an auto-encoder network for recovering the structure of an object with bounding box representation from a single image, (2) a topology-adaptive graph CNN for updating vertex positions for meshes of complex topology, and (3) a unified mesh deformation block that deforms the structural boxes into structure-aware meshed models. Experimental results on images from ShapeNet show that our proposed STD-Net has better performance than other state-of-the-art methods on reconstructing 3D objects with complex structures and fine geometric details.