Researcher profile

Liang Pan

Liang Pan contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
20 works
0 followers
7 topics
4 close collaborators

Published work

20 published items

preprint | 2026 | arXiv

Is Your Driving World Model an All-Around Player?

Today's driving world models can generate remarkably realistic dash-cam videos, yet no single model excels universally. Some generate photorealistic textures but violate basic physics; others maintain geometric consistency but fail when subjected to closed-loop planning. This disconnect exposes a critical gap: the field evaluates how real generated worlds appear, but rarely whether they behave realistically. We introduce WorldLens, a unified benchmark that measures world-model fidelity across the full spectrum, from pixel quality and 4D geometry to closed-loop driving and human perceptual alignment, through five complementary aspects and 24 standardized dimensions. Our evaluation of six representative models reveals that no existing approach dominates across all axes: texture-rich models violate geometry, geometry-aware models lack behavioral fidelity, and even the strongest performers achieve only 2-3 out of 10 on human realism ratings. To bridge algorithmic metrics with human perception, we further contribute WorldLens-26K, a 26,808-entry human-annotated preference dataset pairing numerical scores with textual rationales, and WorldLens-Agent, a vision-language evaluator distilled from these judgments that enables scalable, explainable auto-assessment. Together, the benchmark, dataset, and agent form a unified ecosystem for assessing generated worlds not merely by visual appeal, but by physical and behavioral fidelity.

preprint | 2022 | arXiv

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion

3D point cloud is an important 3D representation for capturing real world 3D objects. However, real-scanned 3D point clouds are often incomplete, and it is important to recover complete point clouds for downstream applications. Most existing point cloud completion methods use Chamfer Distance (CD) loss for training. The CD loss estimates correspondences between two point clouds by searching nearest neighbors, which does not capture the overall point density distribution on the generated shape, and therefore likely leads to non-uniform point cloud generation. To tackle this problem, we propose a novel Point Diffusion-Refinement (PDR) paradigm for point cloud completion. PDR consists of a Conditional Generation Network (CGNet) and a ReFinement Network (RFNet). The CGNet uses a conditional generative model called the denoising diffusion probabilistic model (DDPM) to generate a coarse completion conditioned on the partial observation. DDPM establishes a one-to-one pointwise mapping between the generated point cloud and the uniform ground truth, and then optimizes the mean squared error loss to realize uniform generation. The RFNet refines the coarse output of the CGNet and further improves quality of the completed point cloud. Furthermore, we develop a novel dual-path architecture for both networks. The architecture can (1) effectively and efficiently extract multi-level features from partially observed point clouds to guide completion, and (2) accurately manipulate spatial locations of 3D points to obtain smooth surfaces and sharp details. Extensive experimental results on various benchmark datasets show that our PDR paradigm outperforms previous state-of-the-art methods for point cloud completion. Remarkably, with the help of the RFNet, we can accelerate the iterative generation process of the DDPM by up to 50 times without much performance drop.
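The density limitation of the CD loss described in this abstract can be seen in a small, self-contained sketch. It assumes one common form of Chamfer Distance (mean squared nearest-neighbour distance, summed over both directions); the paper's exact variant may differ, and the function name and toy point sets below are hypothetical.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty lists of 3D points.

    Assumed variant: mean squared nearest-neighbour distance, summed over
    both directions. Other variants (unsquared, sum instead of mean) exist.
    """
    def one_sided(src, dst):
        # For every source point, take the squared distance to its nearest
        # neighbour in dst, then average over the source set.
        total = 0.0
        for s in src:
            total += min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
        return total / len(src)

    return one_sided(a, b) + one_sided(b, a)


# Toy data (hypothetical): four evenly spaced points on a line.
uniform = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
# Same locations, but with three extra copies piled onto the first point.
clustered = [(0.0, 0.0, 0.0)] * 3 + uniform
# Evenly spaced, but offset by half a unit along x.
shifted = [(x + 0.5, y, z) for (x, y, z) in uniform]

cd_clustered = chamfer_distance(uniform, clustered)
cd_shifted = chamfer_distance(uniform, shifted)
```

In this toy case `cd_clustered` is zero even though the second set is clearly non-uniform, because every point still has an exact nearest neighbour, while the evenly spaced but offset set is the one that gets penalized.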

preprint | 2022 | arXiv

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

3D avatar creation plays a crucial role in the digital age. However, the whole production process is prohibitively time-consuming and labor-intensive. To democratize this technology to a larger audience, we propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers layman users to customize a 3D avatar with the desired shape and texture, and drive the avatar with the described motions using solely natural languages. Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture and animation. Specifically, driven by natural language descriptions, we initialize 3D human geometry generation with a shape VAE network. Based on the generated 3D human shapes, a volume rendering model is utilized to further facilitate geometry sculpting and texture generation. Moreover, by leveraging the priors learned in the motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of AvatarCLIP on a wide range of avatars. Remarkably, AvatarCLIP can generate unseen 3D avatars with novel animations, achieving superior zero-shot capability.

preprint | 2022 | arXiv

Benchmarking and Analyzing Point Cloud Classification under Corruptions

3D perception, especially point cloud classification, has achieved substantial progress. However, in real-world deployment, point cloud corruptions are inevitable due to the scene complexity, sensor inaccuracy, and processing imprecision. In this work, we aim to rigorously benchmark and analyze point cloud classification under corruptions. To conduct a systematic investigation, we first provide a taxonomy of common 3D corruptions and identify the atomic corruptions. Then, we perform a comprehensive evaluation on a wide range of representative point cloud models to understand their robustness and generalizability. Our benchmark results show that although point cloud classification performance improves over time, the state-of-the-art methods are on the verge of being less robust. Based on the obtained observations, we propose several effective techniques to enhance point cloud classifier robustness. We hope our comprehensive benchmark, in-depth analysis, and proposed techniques could spark future research in robust 3D perception.

preprint | 2022 | arXiv

Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer

With the prevalence of LiDAR sensors in autonomous driving, 3D object tracking has received increasing attention. In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template. Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations. PTTR consists of three novel designs. 1) Instead of random sampling, we design Relation-Aware Sampling to preserve relevant points to the given template during subsampling. 2) We propose a Point Relation Transformer for effective feature aggregation and feature matching between the template and search region. 3) Based on the coarse tracking results, we employ a novel Prediction Refinement Module to obtain the final refined prediction through local feature pooling. In addition, motivated by the favorable properties of the Bird's-Eye View (BEV) of point clouds in capturing object motion, we further design a more advanced framework named PTTR++, which incorporates both the point-wise view and BEV representation to exploit their complementary effect in generating high-quality tracking results. PTTR++ substantially boosts the tracking performance on top of PTTR with low computational overhead. Extensive experiments over multiple datasets show that our proposed approaches achieve superior 3D tracking accuracy and efficiency.

preprint | 2022 | arXiv

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distribution and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts. Our experiments show MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation. Homepage: https://mingyuan-zhang.github.io/projects/MotionDiffuse.html

preprint | 2022 | arXiv

Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture

Due to the visual ambiguity, purely kinematic formulations on monocular human motion capture are often physically incorrect, biomechanically implausible, and cannot reconstruct accurate interactions. In this work, we focus on exploiting the high-precision and non-differentiable physics simulator to incorporate dynamical constraints in motion capture. Our key idea is to use real physical supervisions to train a target pose distribution prior for sampling-based motion control to capture physically plausible human motion. To obtain accurate reference motion with terrain interactions for the sampling, we first introduce an interaction constraint based on SDF (Signed Distance Field) to enforce appropriate ground contact modeling. We then design a novel two-branch decoder to avoid stochastic error from pseudo ground-truth and train a distribution prior with the non-differentiable physics simulator. Finally, we regress the sampling distribution from the current state of the physical character with the trained prior and sample satisfied target poses to track the estimated reference motion. Qualitative and quantitative results show that we can obtain physically plausible human motion with complex terrain interactions, human shape variations, and diverse behaviors. More information can be found at https://www.yangangwang.com/papers/HBZ-NM-2022-03.html

preprint | 2022 | arXiv

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud. Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations. PTTR consists of three novel designs. 1) Instead of random sampling, we design Relation-Aware Sampling to preserve relevant points to given templates during subsampling. 2) Furthermore, we propose a Point Relation Transformer (PRT) consisting of a self-attention and a cross-attention module. The global self-attention operation captures long-range dependencies to enhance encoded point features for the search area and the template, respectively. Subsequently, we generate the coarse tracking results by matching the two sets of point features via cross-attention. 3) Based on the coarse tracking results, we employ a novel Prediction Refinement Module to obtain the final refined prediction. In addition, we create a large-scale point cloud single object tracking benchmark based on the Waymo Open Dataset. Extensive experiments show that PTTR achieves superior point cloud tracking in both accuracy and efficiency.

preprint | 2022 | arXiv

Robust Partial-to-Partial Point Cloud Registration in a Full Range

Point cloud registration for 3D objects is a challenging task due to sparse and noisy measurements, incomplete observations and large transformations. In this work, we propose Graph Matching Consensus Network (GMCNet), which estimates pose-invariant correspondences for full-range Partial-to-Partial point cloud Registration (PPR) in the object-level registration scenario. To encode robust point descriptors, 1) we first comprehensively investigate transformation-robustness and noise-resilience of various geometric features. 2) Then, we employ a novel Transformation-robust Point Transformer (TPT) module to adaptively aggregate local features regarding the structural relations, which takes advantage of both handcrafted rotation-invariant (RI) features and noise-resilient spatial coordinates. 3) Based on a synergy of hierarchical graph networks and graphical modeling, we propose the Hierarchical Graphical Modeling (HGM) architecture to encode robust descriptors consisting of i) a unary term learned from RI features; and ii) multiple smoothness terms encoded from neighboring point relations at different scales through our TPT modules. Moreover, we construct a challenging PPR dataset (MVP-RG) based on the recent MVP dataset that features high-quality scans. Extensive experiments show that GMCNet outperforms previous state-of-the-art methods for PPR. Notably, GMCNet encodes point descriptors for each point cloud individually without using cross-contextual information, or ground truth correspondences for training. Our code and datasets are available at: https://github.com/paul007pl/GMCNet.

preprint | 2022 | arXiv

TAda! Temporally-Adaptive Convolutions for Video Understanding

Spatial convolutions are widely used in numerous deep video models. They fundamentally assume spatio-temporal invariance, i.e., using shared weights for every location in different frames. This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos. Specifically, TAdaConv empowers the spatial convolutions with temporal modelling abilities by calibrating the convolution weights for each frame according to its local and global temporal context. Compared to previous temporal modelling operations, TAdaConv is more efficient as it operates over the convolution kernels instead of the features, whose dimension is an order of magnitude smaller than the spatial resolutions. Further, the kernel calibration brings an increased model capacity. We construct TAda2D and TAdaConvNeXt networks by replacing the 2D convolutions in ResNet and ConvNeXt with TAdaConv, which leads to at least on par or better performance compared to state-of-the-art approaches on multiple video action recognition and localization benchmarks. We also demonstrate that as a readily plug-in operation with negligible computation overhead, TAdaConv can effectively improve many existing video models with a convincing margin.

preprint | 2022 | arXiv

TCTrack: Temporal Contexts for Aerial Tracking

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers. In this work, we present TCTrack, a comprehensive framework to fully exploit temporal contexts for aerial tracking. The temporal contexts are incorporated at two levels: the extraction of features and the refinement of similarity maps. Specifically, for feature extraction, an online temporally adaptive convolution is proposed to enhance the spatial features using temporal information, which is achieved by dynamically calibrating the convolution weights according to the previous frames. For similarity map refinement, we propose an adaptive temporal transformer, which first effectively encodes temporal knowledge in a memory-efficient way, before the temporal knowledge is decoded for accurate adjustment of the similarity map. TCTrack is effective and efficient: evaluation on four aerial tracking benchmarks shows its impressive performance; real-world UAV tests show its high speed of over 27 FPS on NVIDIA Jetson AGX Xavier.

preprint | 2022 | arXiv

Three-dimensional discontinuous Galerkin based high-order gas-kinetic scheme and GPU implementation

In this paper, the discontinuous Galerkin based high-order gas-kinetic schemes (DG-HGKS) are developed for the three-dimensional Euler and Navier-Stokes equations. Different from the traditional discontinuous Galerkin (DG) methods with Riemann solvers, the current method adopts a kinetic evolution process, which is provided by the integral solution of the Bhatnagar-Gross-Krook (BGK) model. In the weak formulation of the DG method, a time-dependent evolution function is provided, and both inviscid and viscous fluxes can be calculated uniformly. The temporal accuracy is achieved by the two-stage fourth-order discretization, and the second-order gas-kinetic solver is adopted for the fluxes over the cell interface and the fluxes inside a cell. Numerical examples, including accuracy tests and the Taylor-Green vortex problem, are presented to validate the efficiency and accuracy of DG-HGKS. Both optimal convergence and super-convergence are achieved by the current scheme. The comparison between DG-HGKS and the high-order gas-kinetic scheme with weighted essentially non-oscillatory reconstruction (WENO-HGKS) is also given, and the numerical performance is comparable at approximately the same number of degrees of freedom. To accelerate the computation, the DG-HGKS is implemented on the graphics processing unit (GPU) using compute unified device architecture (CUDA). The obtained results are also compared with those calculated by the central processing unit (CPU) code in terms of computational efficiency. The speedup of the GPU code suggests the potential of high-order gas-kinetic schemes for large-scale computation.

preprint | 2022 | arXiv

Three-dimensional third-order gas-kinetic scheme on hybrid unstructured meshes for Euler and Navier-Stokes equations

In this paper, a third-order gas-kinetic scheme is developed on three-dimensional hybrid unstructured meshes for the numerical simulation of compressible inviscid and viscous flows. A third-order WENO reconstruction is developed on the hybrid unstructured meshes, including tetrahedron, pyramid, prism and hexahedron. A simple strategy is adopted for the selection of big stencil and sub-stencils. Combined with the two-stage fourth-order temporal discretization and lower-upper symmetric Gauss-Seidel methods, both explicit and implicit high-order gas-kinetic schemes are developed. A variety of numerical examples, from subsonic to supersonic flows, are presented to validate the accuracy and robustness for both inviscid and viscous flows.

preprint | 2022 | arXiv

TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection

3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features to preserve instance details with contextual information that are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy to fuse multi-scale features progressively to effectively capture the motion of moving objects and guide the aggregation of fine features. Besides, a variant of deformable transformer is introduced to improve the effectiveness of cross-frame feature matching. Extensive experiments show that our proposed TransPillars achieves state-of-the-art performance compared to existing multi-frame detection approaches. Code will be released.

preprint | 2022 | arXiv

Versatile Multi-Modal Pre-Training for Human-Centric Perception

Human-centric perception plays a vital role in vision and graphics, but its data annotations are prohibitively expensive. Therefore, it is desirable to have a versatile pre-trained model that serves as a foundation for data-efficient transfer to downstream tasks. To this end, we propose the Human-Centric Multi-Modal Contrastive Learning framework HCMoCo that leverages the multi-modal nature of human data (e.g. RGB, depth, 2D keypoints) for effective representation learning. The objective comes with two main challenges: dense pre-training for multi-modality data and efficient usage of sparse human priors. To tackle the challenges, we design the novel Dense Intra-sample Contrastive Learning and Sparse Structure-aware Contrastive Learning targets by hierarchically learning a modal-invariant latent space featured with continuous and ordinal feature distribution and structure-aware semantic consistency. HCMoCo provides pre-training for different modalities by combining heterogeneous datasets, which allows efficient usage of existing task-specific human data. Extensive experiments on four downstream tasks of different modalities demonstrate the effectiveness of HCMoCo, especially under data-efficient settings (7.16% and 12% improvement on DensePose Estimation and Human Parsing). Moreover, we demonstrate the versatility of HCMoCo by exploring cross-modality supervision and missing-modality inference, validating its strong ability in cross-modal association and reasoning.

preprint | 2020 | arXiv

High-order gas-kinetic scheme with parallel computation for direct numerical simulation of turbulent flows

The performance of high-order gas-kinetic scheme (HGKS) has been investigated for the direct numerical simulation (DNS) of isotropic compressible turbulence up to the supersonic regime. Due to the multi-scale nature and coupled temporal-spatial evolution process, HGKS provides a valid tool for the numerical simulation of compressible turbulent flow. Based on the domain decomposition and message passing interface (MPI), a parallel HGKS code is developed for large-scale computation in this paper. The standard tests from the nearly incompressible flow to the supersonic one, including Taylor-Green vortex problem, turbulent channel flow and isotropic compressible turbulence, are presented to validate the parallel scalability, efficiency, accuracy and robustness of parallel implementation. The performance of HGKS for the nearly incompressible turbulence is comparable with the high-order finite difference scheme, including the resolution of flow structure and efficiency of computation. Based on the accuracy of the numerical solution, the numerical dissipation of the scheme in the turbulence simulation is quantitatively evaluated. As a mesoscopic method, HGKS performs better than both lattice Boltzmann method (LBM) and discrete unified gas-kinetic scheme (DUGKS), due to its high-order accuracy. Meanwhile, based on the kinetic formulation HGKS shows advantage for supersonic turbulent flow simulation with its accuracy and robustness. The current work demonstrates the capability of HGKS as a powerful DNS tool from the low speed to supersonic turbulence study, which is less reported under the framework of finite volume scheme.

preprint | 2020 | arXiv

Robust 6D Object Pose Estimation by Learning RGB-D Features

Accurate 6D object pose estimation is fundamental to robotic manipulation and grasping. Previous methods follow a local optimization approach which minimizes the distance between closest point pairs to handle the rotation ambiguity of symmetric objects. In this work, we propose a novel discrete-continuous formulation for rotation regression to resolve this local-optimum problem. We uniformly sample rotation anchors in SO(3), and predict a constrained deviation from each anchor to the target, as well as uncertainty scores for selecting the best prediction. Additionally, the object location is detected by aggregating point-wise vectors pointing to the 3D center. Experiments on two benchmarks: LINEMOD and YCB-Video, show that the proposed method outperforms state-of-the-art approaches. Our code is available at https://github.com/mentian/object-posenet.
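The discrete-continuous idea in this abstract (a set of rotation anchors, a small predicted deviation per anchor, and an uncertainty score used to pick one prediction) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: rotations are represented as unit quaternions, the four anchors are a hypothetical stand-in for a uniformly sampled anchor set, the deviation is composed on the left of the anchor, and the "predictions" are hard-coded numbers rather than network outputs.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about a unit `axis`."""
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


# Hypothetical anchor set: identity plus half-turns about x, y and z.
# A real system would use many more anchors spread evenly over SO(3).
ANCHORS = [
    (1.0, 0.0, 0.0, 0.0),
    axis_angle((1.0, 0.0, 0.0), math.pi),
    axis_angle((0.0, 1.0, 0.0), math.pi),
    axis_angle((0.0, 0.0, 1.0), math.pi),
]


def select_rotation(deviations, uncertainties):
    """Pick the anchor with the lowest uncertainty and apply its deviation.

    Assumption: the deviation is applied after the anchor (left-multiplied).
    """
    best = min(range(len(ANCHORS)), key=lambda i: uncertainties[i])
    return best, quat_mul(deviations[best], ANCHORS[best])


# Stand-in "network outputs": one small deviation and one score per anchor.
deviations = [axis_angle((0.0, 0.0, 1.0), 0.1)] * len(ANCHORS)
uncertainties = [0.9, 0.2, 0.7, 0.8]

best_index, rotation = select_rotation(deviations, uncertainties)
rotation_norm = math.sqrt(sum(c * c for c in rotation))
```

Because both factors are unit quaternions, the composed result stays a valid rotation; the discrete choice of anchor handles large ambiguities while the continuous deviation handles the fine adjustment.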

preprint | 2020 | arXiv

Three dimensional high-order gas-kinetic scheme for supersonic isotropic turbulence II: coarse-grained analysis of compressible $K_{sgs}$ budget

The direct numerical simulation (DNS) of compressible isotropic turbulence up to the supersonic regime $Ma_{t} = 1.2$ has been investigated by high-order gas-kinetic scheme (HGKS) [Computers & Fluids, 192, 2019]. In this study, the coarse-grained analysis of subgrid-scale (SGS) turbulent kinetic energy $K_{sgs}$ budget is fully analyzed for constructing one-equation SGS model in the compressible large eddy simulation (LES). The DNS on a much higher turbulent Mach number up to $Ma_{t} = 2.0$ has been obtained by HGKS, which confirms the super robustness of HGKS. Then, the exact compressible SGS turbulent kinetic energy $K_{sgs}$ transport equation is derived with density weighted filtering process. Based on the compressible $K_{sgs}$ transport equation, the coarse-grained processes are implemented on three sets of unresolved grids with the Box filter. The coarse-grained analysis of compressible $K_{sgs}$ budgets shows that all unresolved source terms are dominant terms in current system. Especially, the magnitude of SGS pressure-dilation term is in the order of SGS solenoidal dissipation term within the initial acoustic time scale. Therefore, it can be concluded that the SGS pressure-dilation term cannot be neglected as in previous work. The delicate coarse-grained analysis of SGS diffusion terms in compressible $K_{sgs}$ equation confirms that both the fluctuation velocity triple correlation term and the pressure-velocity correlation term are dominant terms. Current coarse-grained analysis gives an indication of the order of magnitude of all SGS terms in compressible $K_{sgs}$ budget, which provides a solid basis for compressible LES modeling in high Mach number turbulent flow.

preprint | 2019 | arXiv

High-order ALE gas-kinetic scheme with unstructured WENO reconstruction

In this paper, a high-order multi-dimensional gas-kinetic scheme is presented for both inviscid and viscous flows in arbitrary Lagrangian-Eulerian (ALE) formulation. Compared with the traditional ALE method, the flow variables are updated in the finite volume framework, and the rezoning and remapping steps are not required. The two-stage fourth-order method is used for the temporal discretization, and the second-order gas-kinetic solver is applied for the flux evaluation. In the two-stage method, the spatial reconstruction is performed at both initial and intermediate stage, and the computational mesh at the corresponding stage is given by the specified mesh velocity. In the moving mesh procedure, the mesh may distort severely and the mesh quality is reduced. To achieve the accuracy and improve the robustness, the newly developed WENO method on quadrilateral unstructured meshes is adopted at each stage. The Gaussian quadrature is used for flux calculation. For each Gaussian point, the reconstruction is performed in the local moving coordinate, where the variation of mesh velocity is taken into account. Therefore, the accuracy and geometric conservation law can be well preserved by the current scheme even with a largely deforming mesh. Numerical examples are presented to validate the performance of current scheme, where the mesh adaptation method and cell centered Lagrangian method are used to provide mesh velocity.

preprint | 2017 | arXiv

A Compact Fourth-order Gas-kinetic Scheme for the Euler and Navier-Stokes Solutions

In this paper, a fourth-order compact gas-kinetic scheme (GKS) is developed for the compressible Euler and Navier-Stokes equations under the framework of two-stage fourth-order temporal discretization and Hermite WENO (HWENO) reconstruction. Due to the high-order gas evolution model, the GKS provides a time dependent gas distribution function at a cell interface. This time evolution solution can be used not only for the flux evaluation across a cell interface and its time derivative, but also time accurate evolution solution at a cell interface. As a result, besides updating the conservative flow variables inside each control volume, the GKS can get the cell averaged slopes inside each control volume as well through the differences of flow variables at the cell interfaces. So, with the updated flow variables and their slopes inside each cell, the HWENO reconstruction can be naturally implemented for the compact high-order reconstruction at the beginning of next step. Therefore, a compact higher-order GKS, such as the two-stages fourth-order compact scheme can be constructed. This scheme is as robust as second-order one, but more accurate solution can be obtained. In comparison with compact fourth-order DG method, the current scheme has only two stages instead of four within each time step for the fourth-order temporal accuracy, and the CFL number used here can be on the order of $0.5$ instead of $0.11$ for the DG method. Through this research, it concludes that the use of high-order time evolution model rather than the first order Riemann solution is extremely important for the design of robust, accurate, and efficient higher-order schemes for the compressible flows.
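The stage-count and CFL comparison in this abstract can be turned into a rough back-of-the-envelope calculation. The sketch below assumes the usual explicit time-step rule dt = CFL * dx / (max wave speed), uses hypothetical mesh spacing and wave speed values, and counts only the number of stage evaluations; it deliberately ignores that a single stage may cost more or less in one scheme than in the other, so it is not a measured speedup.

```python
def time_step(cfl, dx, max_wave_speed):
    """Explicit time step from the CFL condition: dt = CFL * dx / max wave speed."""
    return cfl * dx / max_wave_speed


def stages_per_unit_time(cfl, stages_per_step, dx, max_wave_speed):
    """Stage evaluations needed to advance one unit of simulated time."""
    return stages_per_step / time_step(cfl, dx, max_wave_speed)


# Hypothetical mesh spacing and wave speed; both cancel in the ratio below.
dx, max_wave_speed = 0.01, 1.0

# Figures quoted in the abstract: two stages at CFL ~0.5 for the compact GKS,
# four stages at CFL ~0.11 for the compact fourth-order DG method.
gks = stages_per_unit_time(cfl=0.5, stages_per_step=2, dx=dx, max_wave_speed=max_wave_speed)
dg = stages_per_unit_time(cfl=0.11, stages_per_step=4, dx=dx, max_wave_speed=max_wave_speed)

ratio = dg / gks  # how many more stage evaluations DG needs per unit time
```

Under these assumptions the ratio is (4 / 0.11) / (2 / 0.5) = 100 / 11, so the DG method would need roughly nine times as many stage evaluations to cover the same simulated time on the same mesh.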