Source author record

Guan Huang

Guan Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision math.DS Machine Learning Artificial Intelligence math.AP math-ph math.MP Computation and Language Information Retrieval math.PR

Catalog footprint

What is connected

28 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

RoboTransfer: Controllable Geometry-Consistent Video Diffusion for Manipulation Policy Transfer

The goal of general-purpose robotics is to create agents that can seamlessly adapt to and operate in diverse, unstructured human environments. Imitation learning has become a key paradigm for robotic manipulation, yet collecting large-scale and diverse demonstrations is prohibitively expensive. Simulators provide a cost-effective alternative, but the sim-to-real gap remains a major obstacle to scalability. We present RoboTransfer, a diffusion-based video generation framework for synthesizing robotic data. By leveraging cross-view feature interactions and globally consistent 3D geometry, RoboTransfer ensures multi-view geometric consistency while enabling fine-grained control over scene elements, such as background editing and object replacement. Extensive experiments demonstrate that RoboTransfer produces videos with superior geometric consistency and visual fidelity. Furthermore, policies trained on this synthetic data exhibit enhanced generalization to novel, unseen scenarios. Project page: https://horizonrobotics.github.io/robot_lab/robotransfer.

preprint · 2023 · arXiv

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields

Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.

preprint · 2022 · arXiv

A Comprehensive Review on Deep Supervision: Theories and Applications

Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at the hidden layers of a neural network. This technique has recently been applied more and more in deep neural network learning systems for various computer vision applications. There is a consensus that deep supervision helps improve neural network performance by alleviating the gradient vanishing problem, which is one of its many strengths. Moreover, deep supervision can be applied in different ways in different computer vision applications. How to make the best use of deep supervision to improve network performance in different applications has not been thoroughly investigated. In this paper, we provide a comprehensive in-depth review of deep supervision in both theories and applications. We propose a new classification of different deep supervision networks, and discuss advantages and limitations of current deep supervision networks in computer vision applications.
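As a rough illustration of "adding supervision at hidden layers", the sketch below combines the loss on the final output with weighted losses on auxiliary heads attached to hidden layers. The function names, the cross-entropy choice, and the `aux_weight` value are assumptions made for this example; the review itself covers many variants.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one example, given raw logits and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def deeply_supervised_loss(final_logits, aux_logits_list, target, aux_weight=0.3):
    """Final-layer loss plus a weighted sum of hidden-layer (auxiliary) losses.

    aux_logits_list holds the outputs of hypothetical auxiliary classifier
    heads attached to hidden layers; aux_weight is an illustrative value.
    """
    main = softmax_cross_entropy(final_logits, target)
    aux = sum(softmax_cross_entropy(a, target) for a in aux_logits_list)
    return main + aux_weight * aux


# One final head and two auxiliary heads, three classes, true class 0.
loss = deeply_supervised_loss(
    [2.0, 0.5, -1.0],
    [[0.3, 0.1, 0.0], [1.0, 0.2, -0.5]],
    target=0,
)
print(loss)
```

Because each auxiliary term is computed directly from a hidden layer, its gradient reaches early layers without passing through the full depth of the network, which is the intuition behind the gradient-vanishing argument in the abstract.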

preprint · 2022 · arXiv

A New Mechanism for Noncollision Singularities

In this paper, we prove the existence of noncollision singularities in a planar four-body problem in a model different from [J. Xue, Acta Math. 224(2), 253-388, 2020]. In this model, the acceleration can be arbitrarily fast and the masses can be comparable. This work provides a general principle to construct noncollision singularities as well as other related orbits with complicated dynamics. It not only answers a question in [J. Xue, Acta Math. 224(2), 253-388, 2020] but also solves an analogous version of a conjecture of Anosov.

preprint · 2022 · arXiv

A Simple Baseline for Multi-Camera 3D Object Detection

3D object detection with surrounding cameras has been a promising direction for autonomous driving. In this paper, we present SimMOD, a Simple baseline for Multi-camera Object Detection, to solve the problem. To incorporate multi-view information as well as build upon previous efforts on monocular 3D object detection, the framework is built on sample-wise object proposals and designed to work in a two-stage manner. First, we extract multi-scale features and generate the perspective object proposals on each monocular image. Second, the multi-view proposals are aggregated and then iteratively refined with multi-view and multi-scale visual features in the DETR3D style. The refined proposals are decoded end-to-end into the detection results. To further boost the performance, we incorporate auxiliary branches alongside the proposal generation to enhance the feature learning. We also design target filtering and teacher forcing methods to promote the consistency of two-stage training. We conduct extensive experiments on the nuScenes 3D object detection benchmark to demonstrate the effectiveness of SimMOD and achieve new state-of-the-art performance. Code will be available at https://github.com/zhangyp15/SimMOD.

preprint · 2022 · arXiv

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for fundamentally pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed. We merely reuse existing modules to build its framework but substantially improve its performance by constructing an exclusive data augmentation strategy and upgrading the Non-Maximum Suppression strategy. In the experiments, BEVDet offers an excellent trade-off between accuracy and time-efficiency. As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set. It is comparable with FCOS3D, but requires just 11% of the computational budget (215.3 GFLOPs) and runs 9.2 times faster (15.6 FPS). Another high-precision version, dubbed BEVDet-Base, scores 39.3% mAP and 47.2% NDS, significantly exceeding all published results. With a comparable inference speed, it surpasses FCOS3D by a large margin of +9.8% mAP and +10.0% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet.

preprint · 2022 · arXiv

BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

Single-frame data contains limited information, which constrains the performance of existing vision-based multi-camera 3D object detection paradigms. To fundamentally push the performance boundary in this area, a novel paradigm dubbed BEVDet4D is proposed to lift the scalable BEVDet paradigm from the spatial-only 3D space to the spatial-temporal 4D space. We upgrade the naive BEVDet framework with a few modifications just for fusing the feature from the previous frame with the corresponding one in the current frame. In this way, with negligible additional computing budget, we enable BEVDet4D to access the temporal cues by querying and comparing the two candidate features. Beyond this, we simplify the task of velocity prediction by removing the factors of ego-motion and time from the learning target. As a result, BEVDet4D, with robust generalization performance, reduces the velocity error by up to 62.9%. This makes vision-based methods, for the first time, comparable with those relying on LiDAR or radar in this aspect. On the challenging nuScenes benchmark, we report a new record of 54.5% NDS with the high-performance configuration dubbed BEVDet4D-Base, which surpasses the previous leading method BEVDet-Base by +7.3% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet.
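One plausible reading of "removing the factors of ego-motion and time from the learning target" is sketched below. The previous-frame object position is first mapped into the current ego frame, so the regression target becomes a plain positional offset, and velocity is recovered afterwards by dividing by the frame interval. The 2D setup, the function names, and the known rigid transform are all assumptions for illustration and are not taken from the BEVDet4D code.

```python
import math


def to_current_frame(point_prev, yaw, translation):
    """Map a 2D point from the previous ego frame into the current ego frame.

    yaw (radians) and translation describe the assumed-known rigid transform
    previous frame -> current frame.
    """
    x, y = point_prev
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + translation[0], s * x + c * y + translation[1])


def offset_target(obj_prev, obj_cur, yaw, translation):
    """Ego-motion-compensated displacement of an object between two frames."""
    aligned = to_current_frame(obj_prev, yaw, translation)
    return (obj_cur[0] - aligned[0], obj_cur[1] - aligned[1])


def velocity_from_offset(offset, dt):
    """Recover velocity from the offset once the frame interval dt is known."""
    return (offset[0] / dt, offset[1] / dt)


# The ego vehicle drives 2 m forward between frames without turning, so a
# point expressed in the previous frame shifts by (-2, 0) in the current one.
ego_yaw, ego_shift = 0.0, (-2.0, 0.0)

# A parked object: seen at (10, 0) before and (8, 0) now.
static_offset = offset_target((10.0, 0.0), (8.0, 0.0), ego_yaw, ego_shift)

# An object that itself moved 1 m forward: seen at (10, 0) before, (9, 0) now.
moving_offset = offset_target((10.0, 0.0), (9.0, 0.0), ego_yaw, ego_shift)
moving_velocity = velocity_from_offset(moving_offset, dt=0.5)
print(static_offset, moving_offset, moving_velocity)
```

With this target a parked object yields a zero offset regardless of how the ego vehicle moved, which is what makes the regression problem simpler than predicting raw apparent motion.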

preprint · 2022 · arXiv

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

In this paper, we present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems. Unlike existing studies focusing on the improvement of single-task approaches, BEVerse produces spatio-temporal Birds-Eye-View (BEV) representations from multi-camera videos and jointly reasons about multiple tasks for vision-centric autonomous driving. Specifically, BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp and multi-view images. After the ego-motion alignment, the spatio-temporal encoder is utilized for further feature extraction in BEV. Finally, multiple task decoders are attached for joint reasoning and prediction. Within the decoders, we propose the grid sampler to generate BEV features with different ranges and granularities for different tasks. We also design an iterative flow method for memory-efficient future prediction. We show that the temporal information improves 3D object detection and semantic map construction, while the multi-task learning can implicitly benefit motion prediction. With extensive experiments on the nuScenes dataset, we show that the multi-task BEVerse outperforms existing single-task methods on 3D object detection, semantic map construction, and motion prediction. Compared with the sequential paradigm, BEVerse is also significantly more efficient. The code and trained models will be released at https://github.com/zhangyp15/BEVerse.

preprint · 2022 · arXiv

Bike Sharing Demand Prediction based on Knowledge Sharing across Modes: A Graph-based Deep Learning Approach

Bike sharing is an increasingly popular part of urban transportation systems. Accurate demand prediction is the key to support timely re-balancing and ensure service efficiency. Most existing models of bike-sharing demand prediction are solely based on its own historical demand variation, essentially regarding bike sharing as a closed system and neglecting the interaction between different transport modes. This is particularly important because bike sharing is often used to complement travel through other modes (e.g., public transit). Despite some recent efforts, there is no existing method capable of leveraging spatiotemporal information from multiple modes with heterogeneous spatial units. To address this research gap, this study proposes a graph-based deep learning approach for bike sharing demand prediction (B-MRGNN) with multimodal historical data as input. The spatial dependencies across modes are encoded with multiple intra- and inter-modal graphs. A multi-relational graph neural network (MRGNN) is introduced to capture correlations between spatial units across modes, such as bike sharing stations, subway stations, or ride-hailing zones. Extensive experiments are conducted using real-world bike sharing, subway and ride-hailing data from New York City, and the results demonstrate the superior performance of our proposed approach compared to existing methods.

preprint · 2022 · arXiv

CAFE: Learning to Condense Dataset by Aligning Features

Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one. State-of-the-art approaches largely rely on learning the synthetic data by matching the gradients between the real and synthetic data batches. Despite the intuitive motivation and promising results, such gradient-based methods, by nature, easily overfit to a biased set of samples that produce dominant gradients, and thus lack global supervision of data distribution. In this paper, we propose a novel scheme to Condense dataset by Aligning FEatures (CAFE), which explicitly attempts to preserve the real-feature distribution as well as the discriminant power of the resulting synthetic set, lending itself to strong generalization capability to various architectures. At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales, while accounting for the classification of real samples. Our scheme is further backed up by a novel dynamic bi-level optimization, which adaptively adjusts parameter updates to prevent over-/under-fitting. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art: on the SVHN dataset, for example, the performance gain is up to 11%. Extensive experiments and analyses verify the effectiveness and necessity of proposed designs.

preprint · 2022 · arXiv

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

Self-supervised monocular methods can efficiently learn depth information of weakly textured surfaces or reflective objects. However, the depth accuracy is limited due to the inherent ambiguity in monocular geometric modeling. In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints. Unfortunately, MVS often suffers from texture-less regions, non-Lambertian surfaces, and moving objects, especially in real-world video sequences without known camera motion and depth supervision. Therefore, we propose MOVEDepth, which exploits the MOnocular cues and VElocity guidance to improve multi-frame Depth learning. Unlike existing methods that enforce consistency between MVS depth and monocular depth, MOVEDepth boosts multi-frame depth learning by directly addressing the inherent problems of MVS. The key of our approach is to utilize monocular depth as a geometric priority to construct MVS cost volume, and adjust depth candidates of cost volume under the guidance of predicted camera velocity. We further fuse monocular depth and MVS depth by learning uncertainty in the cost volume, which results in a robust depth estimation against ambiguity in multi-view geometry. Extensive experiments show MOVEDepth achieves state-of-the-art performance: Compared with Monodepth2 and PackNet, our method relatively improves the depth accuracy by 20% and 19.8% on the KITTI benchmark. MOVEDepth also generalizes to the more challenging DDAD benchmark, relatively outperforming ManyDepth by 7.2%. The code is available at https://github.com/JeffWang987/MOVEDepth.

preprint · 2022 · arXiv

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Recent progress has shown that large-scale pre-training using contrastive image-text pairs can be a promising alternative for high-quality visual representation learning from natural language supervision. Benefiting from a broader source of supervision, this new paradigm exhibits impressive transferability to downstream classification tasks and datasets. However, the problem of transferring the knowledge learned from image-text pairs to more complex dense prediction tasks has barely been explored. In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP. Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models. By further using the contextual information from the image to prompt the language model, we enable our model to better exploit the pre-trained knowledge. Our method is model-agnostic and can be applied to arbitrary dense prediction systems and various pre-trained visual backbones, including both CLIP models and ImageNet pre-trained models. Extensive experiments demonstrate the superior performance of our methods on semantic segmentation, object detection, and instance segmentation tasks. Code is available at https://github.com/raoyongming/DenseCLIP.
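The pixel-text matching step can be pictured as computing, for every pixel embedding, its cosine similarity with every class text embedding. The sketch below does this with plain nested lists. The function names, the tensor layout, and the use of cosine similarity without a temperature are assumptions for illustration, and the embeddings are assumed to be non-zero.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def pixel_text_score_map(pixel_feats, text_feats):
    """Cosine similarity between each pixel embedding and each text embedding.

    pixel_feats: H x W x C nested lists (hypothetical image-encoder output).
    text_feats:  K x C nested lists (hypothetical text-encoder output, one per class).
    Returns an H x W x K nested list of scores.
    """
    text_unit = [l2_normalize(t) for t in text_feats]
    score_map = []
    for row in pixel_feats:
        score_row = []
        for pixel in row:
            p = l2_normalize(pixel)
            score_row.append([sum(a * b for a, b in zip(p, t)) for t in text_unit])
        score_map.append(score_row)
    return score_map


# A 1 x 2 "image" with 2-dimensional embeddings and two class prompts.
scores = pixel_text_score_map(
    [[[1.0, 0.0], [0.0, 2.0]]],
    [[3.0, 0.0], [0.0, 1.0]],
)
print(scores)
```

Each pixel's highest-scoring class gives a coarse segmentation, which is how such score maps can act as a guidance signal for a dense prediction model.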

preprint · 2022 · arXiv

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

Autonomous driving requires accurate and detailed Bird's Eye View (BEV) semantic segmentation for decision making, which is one of the most challenging tasks for high-level scene perception. Feature transformation from frontal view to BEV is the pivotal technology for BEV semantic segmentation. Existing works can be roughly classified into two categories, i.e., Camera model-Based Feature Transformation (CBFT) and Camera model-Free Feature Transformation (CFFT). In this paper, we empirically analyze the vital differences between CBFT and CFFT. The former transforms features based on the flat-world assumption, which may cause distortion of regions lying above the ground plane. The latter is limited in segmentation performance due to the absence of geometric priors and its time-consuming computation. In order to reap the benefits and avoid the drawbacks of CBFT and CFFT, we propose a novel framework with a Hybrid Feature Transformation module (HFT). Specifically, we decouple the feature maps produced by HFT for estimating the layout of outdoor scenes in BEV. Furthermore, we design a mutual learning scheme to augment hybrid transformation by applying feature mimicking. Notably, extensive experiments demonstrate that with negligible extra overhead, HFT achieves a relative improvement of 13.3% on the Argoverse dataset and 16.8% on the KITTI 3D Object dataset compared to the best-performing existing method. The code is available at https://github.com/JiayuZou2020/HFT.

preprint · 2022 · arXiv

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

Learning-based Multi-View Stereo (MVS) methods warp source images into the reference camera frustum to form 3D volumes, which are fused as a cost volume to be regularized by subsequent networks. The fusing step plays a vital role in bridging 2D semantics and 3D spatial associations. However, previous methods utilize extra networks to learn 2D information as fusing cues, underusing 3D spatial correlations and bringing additional computation costs. Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently. Specifically, the epipolar Transformer utilizes a detachable monocular depth estimator to enhance 2D semantics and uses cross-attention to construct data-dependent 3D associations along the epipolar line. Additionally, MVSTER is built in a cascade structure, where entropy-regularized optimal transport is leveraged to propagate finer depth estimations in each stage. Extensive experiments show MVSTER achieves state-of-the-art reconstruction performance with significantly higher efficiency: compared with MVSNet and CasMVSNet, our MVSTER achieves 34% and 14% relative improvements on the DTU benchmark, with 80% and 51% relative reductions in running time. MVSTER also ranks first on Tanks&Temples-Advanced among all published works. Code is released at https://github.com/JeffWang987.

preprint · 2022 · arXiv

On averaging and mixing for stochastic PDEs

We examine the convergence in the Krylov--Bogolyubov averaging for nonlinear stochastic perturbations of linear PDEs with pure imaginary spectrum and show that if the involved effective equation is mixing, then the convergence is uniform in time.

preprint · 2022 · arXiv

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

Face benchmarks empower the research community to train and evaluate high-performance face recognition systems. In this paper, we contribute a new million-scale recognition benchmark, containing uncurated 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol. Firstly, we collect a list of 4M names and download 260M faces from the Internet. Then, a Cleaning Automatically utilizing Self-Training (CAST) pipeline, which is efficient and scalable, is devised to purify the tremendous WebFace260M. To the best of our knowledge, the cleaned WebFace42M is the largest public face recognition training set and we expect to close the data gap between academia and industry. With practical deployments in mind, the Face Recognition Under Inference Time conStraint (FRUITS) protocol and a new test set with rich attributes are constructed. Besides, we gather a large-scale masked face sub-set for biometrics assessment under COVID-19. For a comprehensive evaluation of face matchers, three recognition tasks are performed under standard, masked and unbiased settings, respectively. Equipped with this benchmark, we delve into million-scale face recognition problems. A distributed framework is developed to train face recognition models efficiently without tampering with the performance. Enabled by WebFace42M, we reduce the failure rate by 40% on the challenging IJB-C set and rank 3rd among 430 entries on NIST-FRVT. Even 10% of the data (WebFace4M) shows superior performance compared with the public training sets. Furthermore, comprehensive baselines are established under the FRUITS-100/500/1000 milliseconds protocols. The proposed benchmark shows enormous potential on standard, masked and unbiased face recognition scenarios. Our WebFace260M website is https://www.face-benchmark.org.

preprint · 2021 · arXiv

Joint Demand Prediction for Multimodal Systems: A Multi-task Multi-relational Spatiotemporal Graph Neural Network Approach

Dynamic demand prediction is crucial for the efficient operation and management of urban transportation systems. Extensive research has been conducted on single-mode demand prediction, ignoring the fact that the demands for different transportation modes can be correlated with each other. Despite some recent efforts, existing approaches to multimodal demand prediction are generally not flexible enough to account for multiplex networks with diverse spatial units and heterogeneous spatiotemporal correlations across different modes. To tackle these issues, this study proposes a multi-relational spatiotemporal graph neural network (ST-MRGNN) for multimodal demand prediction. Specifically, the spatial dependencies across modes are encoded with multiple intra- and inter-modal relation graphs. A multi-relational graph neural network (MRGNN) is introduced to capture cross-mode heterogeneous spatial dependencies, consisting of generalized graph convolution networks to learn the message passing mechanisms within relation graphs and an attention-based aggregation module to summarize different relations. We further integrate MRGNNs with temporal gated convolution layers to jointly model heterogeneous spatiotemporal correlations. Extensive experiments are conducted using real-world subway and ride-hailing datasets from New York City, and the results verify the improved performance of our proposed approach over existing methods across modes. The improvement is particularly large for demand-sparse locations. Further analysis of the attention mechanisms of ST-MRGNN also demonstrates its good interpretability for understanding cross-mode interactions.

preprint · 2021 · arXiv

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol. Firstly, we collect a list of 4M names and download 260M faces from the Internet. Then, a Cleaning Automatically utilizing Self-Training (CAST) pipeline, which is efficient and scalable, is devised to purify the tremendous WebFace260M. To the best of our knowledge, the cleaned WebFace42M is the largest public face recognition training set and we expect to close the data gap between academia and industry. With practical scenarios in mind, the Face Recognition Under Inference Time conStraint (FRUITS) protocol and a test set are constructed to comprehensively evaluate face matchers. Equipped with this benchmark, we delve into million-scale face recognition problems. A distributed framework is developed to train face recognition models efficiently without tampering with the performance. Empowered by WebFace42M, we reduce the failure rate by a relative 40% on the challenging IJB-C set and rank 3rd among 430 entries on NIST-FRVT. Even 10% of the data (WebFace4M) shows superior performance compared with public training sets. Furthermore, comprehensive baselines are established on our rich-attribute test set under the FRUITS-100ms/500ms/1000ms protocols, including the MobileNet, EfficientNet, AttentionNet, ResNet, SENet, ResNeXt and RegNet families. The benchmark website is https://www.face-benchmark.org.

preprint · 2020 · arXiv

Instance Scale Normalization for image understanding

Scale variation remains a challenging problem for object detection. Common paradigms usually adopt multi-scale training and testing (image pyramid) or FPN (feature pyramid network) to process objects in a wide scale range. However, multi-scale methods introduce even more scale variation, which even deep convolutional neural networks with FPN cannot handle well. In this work, we propose an innovative paradigm called Instance Scale Normalization (ISN) to resolve the above problem. ISN compresses the scale space of objects into a consistent range (ISN range), in both training and testing phases. This fundamentally alleviates the problem of scale variation and reduces the difficulty of network optimization. Experiments show that ISN significantly surpasses its multi-scale counterpart for object detection, instance segmentation, and multi-task human pose estimation, on several architectures. On COCO test-dev, our single model based on ISN achieves 46.5 mAP with a ResNet-101 backbone, which is among the state-of-the-art (SOTA) candidates for object detection.

preprint · 2020 · arXiv

On The Energy Transfer To High Frequencies In The Damped/Driven Nonlinear Schrödinger Equation (Extended Version)

We consider a damped/driven nonlinear Schrödinger equation in an $n$-cube $K^{n}\subset\mathbb{R}^n$, with $n$ arbitrary, under Dirichlet boundary conditions \[ u_t-νΔu+i|u|^2u=\sqrt{ν}\,η(t,x),\quad x\in K^{n},\quad u|_{\partial K^{n}}=0, \quad ν>0, \] where $η(t,x)$ is a random force that is white in time and smooth in space. It is known that the Sobolev norms of solutions satisfy $ \| u(t)\|_m^2 \le Cν^{-m}, $ uniformly in $t\ge0$ and $ν>0$. In this work we prove that for small $ν>0$ and any initial data, with large probability the Sobolev norms $\|u(t,\cdot)\|_m$ of the solutions with $m>2$ become large, at least of order $ν^{-κ_{n,m}}$ with $κ_{n,m}>0$, on time intervals of order $\mathcal{O}(\frac{1}{ν})$.

preprint · 2020 · arXiv

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

Although it is a fundamental component of training and inference, data processing has not, to the best of our knowledge, been systematically considered in the human pose estimation community. In this paper, we focus on this problem and find that the devil of human pose estimation evolution is in the biased data processing. Specifically, by investigating the standard data processing in state-of-the-art approaches, mainly including coordinate system transformation and keypoint format transformation (i.e., encoding and decoding), we find that the results obtained by the common flipping strategy are unaligned with the original ones in inference. Moreover, there is a statistical error in some keypoint format transformation methods. The two problems couple together, significantly degrading the pose estimation performance and thus laying a trap for the research community. This trap has given birth to many suboptimal remedies, which are always unreported, confusing but influential. By causing failures in reproduction and unfairness in comparison, the unreported remedies seriously impede technological development. To tackle this dilemma at the source, we propose Unbiased Data Processing (UDP), which consists of two techniques for the two aforementioned problems respectively (i.e., unbiased coordinate system transformation and unbiased keypoint format transformation). As a model-agnostic approach and a superior solution, UDP successfully pushes the performance boundary of human pose estimation and offers a higher and more reliable baseline for the research community. Code is publicly available at https://github.com/HuangJunJie2017/UDP-Pose.
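The flip misalignment mentioned in the abstract can be reproduced with one-dimensional arithmetic. The sketch below compares two orders of operations on an x coordinate (flip then rescale, and rescale then flip) under two scaling conventions. It assumes that the unbiased convention measures image extent in unit lengths (pixel count minus one); that detail is not stated in this record, and the function names and image sizes are made up for the example.

```python
def flip_x(x, width_px):
    """Mirror an x coordinate in an image whose pixel indices run 0..width_px-1."""
    return (width_px - 1) - x


def rescale_x(x, src_px, dst_px, unbiased):
    """Map an x coordinate from a src_px-wide image to a dst_px-wide one.

    unbiased=True scales by the ratio of unit lengths (pixel count minus one),
    the assumed unbiased convention; unbiased=False scales by the ratio of
    pixel counts.
    """
    if unbiased:
        scale = (dst_px - 1) / (src_px - 1)
    else:
        scale = dst_px / src_px
    return x * scale


def flip_mismatch(x, src_px, dst_px, unbiased):
    """Difference between flip-then-rescale and rescale-then-flip."""
    flipped_first = rescale_x(flip_x(x, src_px), src_px, dst_px, unbiased)
    flipped_last = flip_x(rescale_x(x, src_px, dst_px, unbiased), dst_px)
    return flipped_first - flipped_last


# Hypothetical sizes: a 256-pixel-wide input and a 64-pixel-wide heatmap.
biased_gap = flip_mismatch(100, 256, 64, unbiased=False)
unbiased_gap = flip_mismatch(100, 256, 64, unbiased=True)
print(biased_gap, unbiased_gap)
```

Under the pixel-count convention the two orders disagree by `1 - dst_px / src_px` heatmap pixels for every x, so averaging flipped and unflipped predictions mixes two shifted signals; under the unit-length convention the two orders agree up to floating-point error.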

preprint · 2015 · arXiv

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

The evolution of the Internet has produced massive numbers of Semi-Structured Documents (SSDs). These SSDs contain both unstructured features (e.g., plain text) and metadata (e.g., tags). Most previous works focused on modeling the unstructured text, and recently, some other methods have been proposed to model the unstructured text with specific tags. Building a general model for SSDs remains an important problem in terms of both model fitness and efficiency. We propose a novel method to model SSDs by a so-called Tag-Weighted Topic Model (TWTM). TWTM is a framework that leverages both tag and word information, not only to learn the document-topic and topic-word distributions, but also to infer the tag-topic distributions for text mining tasks. We present an efficient variational inference method with an EM algorithm for estimating the model parameters. Meanwhile, we propose three large-scale solutions for our model under the MapReduce distributed computing platform for modeling large-scale SSDs. The experimental results show the effectiveness, efficiency and robustness of our model in comparison with state-of-the-art methods in document modeling, tag prediction and text classification. We also show the performance of the three distributed solutions in terms of time and accuracy on document modeling.

preprint · 2015 · arXiv

Time-averaging for weakly nonlinear CGL equations with arbitrary potentials

Consider a weakly nonlinear complex Ginzburg--Landau (CGL) equation of the form $$ u_t+i(-Δu+V(x)u)=εμΔu+ε\mathcal{P}(u),\quad x\in \mathbb{R}^d, \quad(*) $$ under periodic boundary conditions, where $μ\geqslant0$ and $\mathcal{P}$ is a smooth function. Let $\{ζ_1(x),ζ_2(x),\dots\}$ be the $L_2$-basis formed by eigenfunctions of the operator $-Δ+V(x)$. For a complex function $u(x)$, write it as $u(x)=\sum_{k\geqslant1}v_kζ_k(x)$ and set $I_k(u)=\frac{1}{2}|v_k|^2$. Then for any solution $u(t,x)$ of the linear equation $(*)_{ε=0}$ we have $I(u(t,\cdot))=const$. In this work it is proved that if equation $(*)$ with a sufficiently smooth real potential $V(x)$ is well posed on time intervals $t\lesssim ε^{-1}$, then for any of its solutions $u^ε(t,x)$, the limiting behavior of the curve $I(u^ε(t,\cdot))$ on time intervals of order $ε^{-1}$, as $ε\to0$, can be uniquely characterized by a solution of a certain well-posed effective equation: $$ u_t=εμΔu+εF(u), $$ where $F(u)$ is a resonant averaging of the nonlinearity $\mathcal{P}(u)$. We also prove a similar result for the stochastically perturbed equation, when a random force of order $\sqrt{ε}$, white in time and smooth in $x$, is added to the right-hand side of the equation. The approach of this work is rather general. In particular, it applies to equations in bounded domains in $\mathbb{R}^d$ under Dirichlet boundary conditions.

preprint2014arXiv

Long-time dynamics of resonant weakly nonlinear CGL equations

Consider a weakly nonlinear CGL equation on the torus $\mathbb{T}^d$: \[u_t+iΔu=ε[μ(-1)^{m-1}Δ^{m} u+b|u|^{2p}u+ ic|u|^{2q}u]. \quad (*)\] Here $u=u(t,x)$, $x\in\mathbb{T}^d$, $0<ε\ll1$, $μ\geqslant0$, $b,c\in\mathbb{R}$ and $m,p,q\in\mathbb{N}$. Define $I(u)=(I_{\mathbf{k}},\mathbf{k}\in\mathbb{Z}^d)$, where $I_{\mathbf{k}}=v_{\mathbf{k}}\bar{v}_{\mathbf{k}}/2$ and $v_{\mathbf{k}}$, $\mathbf{k}\in\mathbb{Z}^d$, are the Fourier coefficients of the function $u$. Assume that equation $(*)$ is well posed on time intervals of order $ε^{-1}$ and that its solutions satisfy a priori bounds there, independent of the small parameter. Let $u(t,x)$ solve equation $(*)$. If $ε$ is small enough, then for $t\lesssim ε^{-1}$, the quantity $I(u(t,x))$ can be well described by solutions of an effective equation: \[u_t=ε[μ(-1)^{m-1}Δ^m u+ F(u)],\] where the term $F(u)$ can be constructed through a kind of resonant averaging of the nonlinearity $b|u|^{2p}u+ ic|u|^{2q}u$.
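
The role of the actions in the abstract above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes the one-dimensional torus of length $2π$, a uniform grid of 16 points, and hypothetical helper names (`fourier_coefficients`, `actions`, `linear_flow`). It checks that the actions $I_k=|v_k|^2/2$ stay constant along the unperturbed linear flow $u_t+iΔu=0$ (the case $ε=0$), for which each Fourier coefficient only rotates: $v_k(t)=e^{ik^2t}v_k(0)$.

```python
import cmath
import math


def fourier_coefficients(samples):
    """Discrete Fourier coefficients v_k of grid samples (plain O(n^2) DFT)."""
    n = len(samples)
    return [
        sum(samples[s] * cmath.exp(-2j * math.pi * k * s / n) for s in range(n)) / n
        for k in range(n)
    ]


def actions(samples):
    """Actions I_k = |v_k|^2 / 2, indexed like the DFT output."""
    return [0.5 * abs(v) ** 2 for v in fourier_coefficients(samples)]


def linear_flow(samples, t):
    """Exact solution at time t of u_t + i*Laplacian(u) = 0 on the grid.

    In Fourier space the Laplacian acts as -k^2, so v_k(t) = exp(i k^2 t) v_k(0).
    """
    n = len(samples)
    v = fourier_coefficients(samples)

    def wavenumber(k):
        # Map DFT index 0..n-1 to the signed integer wavenumber.
        return k if k <= n // 2 else k - n

    rotated = [cmath.exp(1j * wavenumber(k) ** 2 * t) * v[k] for k in range(n)]
    # Inverse DFT back to grid values.
    return [
        sum(rotated[k] * cmath.exp(2j * math.pi * k * s / n) for k in range(n))
        for s in range(n)
    ]


# Illustrative initial datum (an assumption): u0(x) = exp(ix) + 0.5 exp(-3ix).
n = 16
grid = [2 * math.pi * s / n for s in range(n)]
u0 = [cmath.exp(1j * x) + 0.5 * cmath.exp(-3j * x) for x in grid]

before = actions(u0)
after = actions(linear_flow(u0, 2.7))
conserved = all(abs(a - b) < 1e-12 for a, b in zip(before, after))
```

With $ε>0$ the actions are no longer constant but drift slowly, which is the regime the effective equation is meant to describe; the sketch covers only the $ε=0$ case.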

preprint2013arXiv

An averaging theorem for nonlinear Schrödinger equations with small nonlinearities

Consider nonlinear Schrödinger equations with small nonlinearities \[\frac{d}{dt}u+i(-\triangle u+V(x)u)=ε\mathcal{P}(\triangle u,u,x),\quad x\in \mathbb{T}^d. \quad (*)\] Let $\{ζ_1(x),ζ_2(x),\dots\}$ be the $L_2$-basis formed by eigenfunctions of the operator $-\triangle +V(x)$. For any complex function $u(x)$, write it as $u(x)=\sum_{k\geqslant1}v_kζ_k(x)$, set $I_k(u)=\frac{1}{2}|v_k|^2$ and $I(u)=(I_1(u),I_2(u),\dots)$. Then for any solution $u(t,x)$ of the linear equation $(*)_{ε=0}$ we have $I(u(t,\cdot))=\mathrm{const}$. In this work it is proved that if $(*)$ is well posed on time intervals $t\lesssim ε^{-1}$ and satisfies some mild a priori assumptions there, then for any of its solutions $u^ε(t,x)$, the limiting behavior of the curve $I(u^ε(t,\cdot))$ on time intervals of order $ε^{-1}$, as $ε\to0$, can be uniquely characterized by solutions of a certain well-posed effective equation.

preprint2013arXiv

An Averaging Theorem for Perturbed KdV Equation

We consider a perturbed KdV equation: \[\dot{u}+u_{xxx} - 6uu_x = εf(x,u(\cdot)), \quad x\in \mathbb{T}, \quad\int_\mathbb{T} u\,dx=0.\] For any periodic function $u(x)$, let $I(u)=(I_1(u),I_2(u),\dots)\in\mathbb{R}_+^{\infty}$ be the vector formed by the KdV integrals of motion, calculated for the potential $u(x)$. Assuming that the perturbation $εf(x,u(\cdot))$ is a smoothing mapping (e.g. it is a smooth function $εf(x)$, independent of $u$), and that solutions of the perturbed equation satisfy some mild a priori assumptions, we prove that for solutions $u(t,x)$ with typical initial data and for $0\leqslant t\lesssim ε^{-1}$, the vector $I(u(t))$ may be well approximated by a solution of the averaged equation.

preprint2013arXiv

KdV equation under periodic boundary conditions and its perturbations

In this paper we discuss properties of the KdV equation under periodic boundary conditions, especially those that are important for studying perturbations of the equation. We then review what is currently known about the long-time behaviour of solutions of perturbed KdV equations.

preprint2013arXiv

On long time dynamics of perturbed KdV equations

Consider perturbed KdV equations: \[u_t+u_{xxx}-6uu_x=εf(u(\cdot)),\quad x\in\mathbb{T}=\mathbb{R}/\mathbb{Z},\;\int_{\mathbb{T}}u(x,t)dx=0,\] where the nonlinearity defines analytic operators $u(\cdot)\mapsto f(u(\cdot))$ in sufficiently smooth Sobolev spaces. Assume that the equation has an $ε$-quasi-invariant measure $μ$ and satisfies some additional mild assumptions. Let $u^ε(t)$ be a solution. Then on time intervals of order $ε^{-1}$, as $ε\to0$, its actions $I(u^ε(t,\cdot))$ can be approximated by solutions of a certain well-posed averaged equation, provided that the initial datum is $μ$-typical.

28 published item(s)

RoboTransfer: Controllable Geometry-Consistent Video Diffusion for Manipulation Policy Transfer

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields

A Comprehensive Review on Deep Supervision: Theories and Applications

A New Mechanism for Noncollision Singularities

A Simple Baseline for Multi-Camera 3D Object Detection

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

Bike Sharing Demand Prediction based on Knowledge Sharing across Modes: A Graph-based Deep Learning Approach

CAFE: Learning to Condense Dataset by Aligning Features

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

On averaging and mixing for stochastic PDEs

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

Joint Demand Prediction for Multimodal Systems: A Multi-task Multi-relational Spatiotemporal Graph Neural Network Approach

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

Instance Scale Normalization for image understanding

On The Energy Transfer To High Frequencies In The Damped/Driven Nonlinear Schrödinger Equation (Extended Version)

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

Time-averaging for weakly nonlinear CGL equations with arbitrary potentials

Long-time dynamics of resonant weakly nonlinear CGL equations

An averaging theorem for nonlinear Schrödinger equations with small nonlinearities

An Averaging Theorem for Perturbed KdV Equation

KdV equation under periodic boundary conditions and its perturbations

On long time dynamics of perturbed KdV equations