Source author record

Liang Peng

Liang Peng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, physics.optics, math.ST, Statistics Theory, Methodology, Applications, Machine Learning, math.PR, physics.app-ph, q-fin.ST, Social and Information Networks

Catalog footprint

What is connected

22 works

11 topics

4 close collaborators

preprint | 2026 | arXiv

Qwen-Image-2.0 Technical Report

We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex scenarios. Qwen-Image-2.0 addresses these challenges by coupling Qwen3-VL as the condition encoder with a Multimodal Diffusion Transformer for joint condition-target modeling, supported by large-scale data curation and a customized multi-stage training pipeline. This enables strong multimodal understanding while preserving flexible generation and editing capabilities. The model supports instructions of up to 1K tokens for generating text-rich content such as slides, posters, infographics, and comics, while significantly improving multilingual text fidelity and typography. It also enhances photorealistic generation with richer details, more realistic textures, and coherent lighting, and follows complex prompts more reliably across diverse styles. Extensive human evaluations show that Qwen-Image-2.0 substantially outperforms previous Qwen-Image models in both generation and editing, marking a step toward more general, reliable, and practical image generation foundation models.

preprint | 2026 | arXiv

Qwen-Image-VAE-2.0 Technical Report

We present Qwen-Image-VAE-2.0, a suite of high-compression Variational Autoencoders (VAEs) that achieve significant advances in both reconstruction fidelity and diffusability. To address the reconstruction bottlenecks of high compression, we adopt an improved architecture featuring Global Skip Connections (GSC) and expanded latent channels. Moreover, we scale training to billions of images and incorporate a synthetic rendering engine to improve performance in text-rich scenarios. To tackle the convergence challenges of high-dimensional latent space, we implement an enhanced semantic alignment strategy to make the latent space highly amenable to diffusion modeling. To optimize computational efficiency, we leverage an asymmetric and attention-free encoder-decoder backbone to minimize encoding overhead. We present a comprehensive evaluation of Qwen-Image-VAE-2.0 on public reconstruction benchmarks. To evaluate performance in text-rich scenarios, we propose OmniDoc-TokenBench, a new benchmark comprising a diverse collection of real-world documents coupled with specialized OCR-based evaluation metrics. Qwen-Image-VAE-2.0 achieves state-of-the-art reconstruction performance, demonstrating exceptional capabilities in both general domains and text-rich scenarios at high compression ratio. Furthermore, downstream DiT experiments reveal our models possess superior diffusability, significantly accelerating convergence compared to existing high-compression baselines. These establish Qwen-Image-VAE-2.0 as a leading model with high compression, superior reconstruction, and exceptional diffusability.

preprint | 2026 | arXiv

Towards Visual Query Localization in the 3D World

Visual query localization (VQL) aims to predict the spatio-temporal response of the most recent occurrence in a sequence given a query. Currently, most research focuses on visual query localization in 2D videos, while its counterpart in 3D space has received little attention. In this paper, we make the first attempt to address visual query localization in the 3D world by introducing a novel benchmark, dubbed 3DVQL. Specifically, 3DVQL contains 2,002 sequences with around 170,000 frames and 6.4K response track segments from 38 object categories. Each sequence in 3DVQL is provided with multiple modalities, including point clouds, RGB images, and depth images, to support flexible research. To ensure high-quality annotations, each sequence is manually annotated with multiple rounds of verification and refinement. To the best of our knowledge, 3DVQL is the first benchmark for 3D multimodal visual query localization. To facilitate comparison in subsequent research, we implement a series of representative 3D multimodal VQL baselines using point clouds and RGB images. The experimental results show that existing methods exhibit significant performance variations across different fusion modules. To encourage future research, we propose a lift-and-attention fusion algorithm named LaF, which significantly outperforms existing baseline models. Our benchmark and model will be publicly released at https://github.com/wuhengliangliang/3DVQL.

preprint | 2022 | arXiv

DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection

Monocular 3D detection has drawn much attention from the community due to its low cost and setup simplicity. It takes an RGB image as input and predicts 3D boxes in the 3D space. The most challenging sub-task lies in the instance depth estimation. Previous works usually use a direct estimation method. However, in this paper we point out that the instance depth on the RGB image is non-intuitive. It is coupled by visual depth clues and instance attribute clues, making it hard to be directly learned in the network. Therefore, we propose to reformulate the instance depth to the combination of the instance visual surface depth (visual depth) and the instance attribute depth (attribute depth). The visual depth is related to objects' appearances and positions on the image. By contrast, the attribute depth relies on objects' inherent attributes, which are invariant to the object affine transformation on the image. Correspondingly, we decouple the 3D location uncertainty into visual depth uncertainty and attribute depth uncertainty. By combining different types of depths and associated uncertainties, we can obtain the final instance depth. Furthermore, data augmentation in monocular 3D detection is usually limited due to the physical nature, hindering the boost of performance. Based on the proposed instance depth disentanglement strategy, we can alleviate this problem. Evaluated on KITTI, our method achieves new state-of-the-art results, and extensive ablation studies validate the effectiveness of each component in our method. The codes are released at https://github.com/SPengLiang/DID-M3D.

preprint | 2022 | arXiv

FedNI: Federated Graph Learning with Network Inpainting for Population-Based Disease Prediction

Graph Convolutional Neural Networks (GCNs) are widely used for graph analysis. Specifically, in medical applications, GCNs can be used for disease prediction on a population graph, where graph nodes represent individuals and edges represent individual similarities. However, GCNs rely on a vast amount of data, which is challenging to collect for a single medical institution. In addition, a critical challenge that most medical institutions continue to face is addressing disease prediction in isolation with incomplete data information. To address these issues, Federated Learning (FL) allows isolated local institutions to collaboratively train a global model without data sharing. In this work, we propose a framework, FedNI, to leverage network inpainting and inter-institutional data via FL. Specifically, we first federatively train missing node and edge predictor using a graph generative adversarial network (GAN) to complete the missing information of local networks. Then we train a global GCN node classifier across institutions using a federated graph learning platform. The novel design enables us to build more accurate machine learning models by leveraging federated learning and also graph learning approaches. We demonstrate that our federated model outperforms local and baseline FL methods with significant margins on two public neuroimaging datasets.

preprint | 2022 | arXiv

Multi-level Feature Learning for Contrastive Multi-view Clustering

Multi-view clustering can explore common semantics from multiple views and has attracted increasing attention. However, existing works punish multiple objectives in the same feature space, where they ignore the conflict between learning consistent common semantics and reconstructing inconsistent view-private information. In this paper, we propose a new framework of multi-level feature learning for contrastive multi-view clustering to address the aforementioned issue. Our method learns different levels of features from the raw features, including low-level features, high-level features, and semantic labels/features in a fusion-free manner, so that it can effectively achieve the reconstruction objective and the consistency objectives in different feature spaces. Specifically, the reconstruction objective is conducted on the low-level features. Two consistency objectives based on contrastive learning are conducted on the high-level features and the semantic labels, respectively. They make the high-level features effectively explore the common semantics and the semantic labels achieve the multi-view clustering. As a result, the proposed framework can reduce the adverse influence of view-private information. Extensive experiments on public datasets demonstrate that our method achieves state-of-the-art clustering effectiveness.

preprint | 2022 | arXiv

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion

Current LiDAR-only 3D detection methods inevitably suffer from the sparsity of point clouds. Many multi-modal methods are proposed to alleviate this issue, while different representations of images and point clouds make it difficult to fuse them, resulting in suboptimal performance. In this paper, we present a novel multi-modal framework SFD (Sparse Fuse Dense), which utilizes pseudo point clouds generated from depth completion to tackle the issues mentioned above. Different from prior works, we propose a new RoI fusion strategy 3D-GAF (3D Grid-wise Attentive Fusion) to make fuller use of information from different types of point clouds. Specifically, 3D-GAF fuses 3D RoI features from the couple of point clouds in a grid-wise attentive way, which is more fine-grained and more precise. In addition, we propose a SynAugment (Synchronized Augmentation) to enable our multi-modal framework to utilize all data augmentation approaches tailored to LiDAR-only methods. Lastly, we customize an effective and efficient feature extractor CPConv (Color Point Convolution) for pseudo point clouds. It can explore 2D image features and 3D geometric features of pseudo point clouds simultaneously. Our method holds the highest entry on the KITTI car 3D object detection leaderboard, demonstrating the effectiveness of our SFD. Codes are available at https://github.com/LittlePey/SFD.

preprint | 2022 | arXiv

Ultra-wideband Antireflection Assisted by Continuously Varying Temporal Medium

We demonstrate that reflectionless propagation of electromagnetic waves between two different materials can be achieved by designing an intermediate temporal medium, which can work in an ultra-wide frequency band. Such a temporal medium is designed with consideration of a multi-stage variation of the material's permittivity in the time domain. The multi-stage temporal permittivity is formed by a cascaded quarter-wave temporal coating, which is an extension of the antireflection temporal coating by Pacheco-Peña et al. [Optica 7, 323 (2020)]. The strategy to render ultra-wideband antireflection temporal medium is discussed analytically and verified numerically. In-depth analysis shows that the multi-stage design of the temporal media implies a continuous temporal variation of the material's constitutive parameters, thus an ultra-wideband antireflection temporal medium is reasonably obtained. As an illustrative example for application, the proposed temporal medium is adopted to realize impedance matching between a dielectric slab and free space, which validates our new findings.

preprint | 2022 | arXiv

WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection

Monocular 3D object detection is one of the most challenging tasks in 3D scene understanding. Due to the ill-posed nature of monocular imagery, existing monocular 3D detection methods highly rely on training with the manually annotated 3D box labels on the LiDAR point clouds. This annotation process is very laborious and expensive. To dispense with the reliance on 3D box labels, in this paper we explore the weakly supervised monocular 3D detection. Specifically, we first detect 2D boxes on the image. Then, we adopt the generated 2D boxes to select corresponding RoI LiDAR points as the weak supervision. Eventually, we adopt a network to predict 3D boxes which can tightly align with associated RoI LiDAR points. This network is learned by minimizing our newly-proposed 3D alignment loss between the 3D box estimates and the corresponding RoI LiDAR points. We will illustrate the potential challenges of the above learning problem and resolve these challenges by introducing several effective designs into our method. Codes will be available at https://github.com/SPengLiang/WeakM3D.
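The point-selection step mentioned in this abstract (using 2D boxes to pick the corresponding RoI LiDAR points) can be sketched roughly as follows. This is an illustration only, not the authors' code: it assumes the LiDAR points have already been transformed into the camera frame and that a simple pinhole model applies. The function name select_roi_points and the intrinsics fx, fy, cx, cy are hypothetical names introduced for this example.

```python
def select_roi_points(points, box, fx, fy, cx, cy):
    """Keep 3D points whose pinhole projection falls inside a 2D box.

    points: iterable of (x, y, z) in camera coordinates, z pointing forward
            (assumed; the abstract does not specify a coordinate convention).
    box:    (u_min, v_min, u_max, v_max) in pixels.
    fx, fy, cx, cy: hypothetical pinhole intrinsics.
    """
    u_min, v_min, u_max, v_max = box
    selected = []
    for x, y, z in points:
        if z <= 0:
            # Points behind the camera cannot project into the image.
            continue
        # Pinhole projection onto the image plane.
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            selected.append((x, y, z))
    return selected


if __name__ == "__main__":
    pts = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -1.0)]
    print(select_roi_points(pts, (40, 40, 60, 60), 100.0, 100.0, 50.0, 50.0))
```

The selected points would then serve as the weak supervision signal; how the 3D alignment loss uses them is not described in the abstract and is not sketched here.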

preprint | 2015 | arXiv

Dynamic Bivariate Normal Copula

Normal copula with a correlation coefficient between $-1$ and $1$ is tail independent and so it severely underestimates extreme probabilities. By letting the correlation coefficient in a normal copula depend on the sample size, Hüsler and Reiss (1989) showed that the tail can become asymptotically dependent. In this paper, we extend this result by deriving the limit of the normalized maximum of $n$ independent observations, where the $i$-th observation follows from a normal copula with its correlation coefficient being either a parametric or a nonparametric function of $i/n$. Furthermore, both parametric and nonparametric inference for this unknown function are studied, which can be employed to test the condition in Hüsler and Reiss (1989). A simulation study and real data analysis are presented too.

preprint | 2014 | arXiv

Inference for a Special Bilinear Time Series Model

It is well known that estimating bilinear models is quite challenging. Many different ideas have been proposed to solve this problem. However, there is not a simple way to do inference even for its simple cases. This paper studies the special bilinear model $$Y_t=\mu+\phi Y_{t-2}+bY_{t-2}\varepsilon_{t-1}+\varepsilon_t,$$ where $\{\varepsilon_t\}$ is a sequence of i.i.d. random variables with mean zero. We first give a sufficient condition for the existence of a unique stationary solution for the model and then propose a GARCH-type maximum likelihood estimator for estimating the unknown parameters. It is shown that the GMLE is consistent and asymptotically normal under only finite fourth moment of errors. Also a simple consistent estimator for the asymptotic covariance is provided. A simulation study confirms the good finite sample performance. Our estimation approach is novel and nonstandard and it may provide a new insight for future research in this direction.
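As a rough illustration of the model in this abstract, the recursion can be simulated directly. The sketch below is not taken from the paper: Gaussian errors, zero initial values, the burn-in length, and the function name simulate_bilinear are all assumptions made for this example, and it does not implement the proposed estimator.

```python
import random


def simulate_bilinear(n, mu, phi, b, sigma=1.0, burn_in=200, seed=0):
    """Simulate Y_t = mu + phi*Y_{t-2} + b*Y_{t-2}*eps_{t-1} + eps_t.

    The errors eps_t are drawn i.i.d. N(0, sigma^2); normality is an
    assumption of this sketch (the model only requires mean zero).
    """
    rng = random.Random(seed)
    total = n + burn_in
    eps = [rng.gauss(0.0, sigma) for _ in range(total)]
    # Y_0 and Y_1 are set to zero; the burn-in reduces their influence.
    y = [0.0] * total
    for t in range(2, total):
        y[t] = mu + phi * y[t - 2] + b * y[t - 2] * eps[t - 1] + eps[t]
    # Discard the burn-in and return the last n values.
    return y[burn_in:]


if __name__ == "__main__":
    # Small phi and b chosen purely for illustration.
    path = simulate_bilinear(n=500, mu=0.5, phi=0.3, b=0.2)
    print(len(path), sum(path) / len(path))
```

Setting sigma to zero removes the noise and reduces the recursion to Y_t = mu + phi*Y_{t-2}, which gives a quick sanity check on the implementation.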

preprint | 2014 | arXiv

Maxima of a triangular array of multivariate Gaussian sequence

It is known that the normalized maxima of a sequence of independent and identically distributed bivariate normal random vectors with correlation coefficient $\rho\in (-1,1)$ is asymptotically independent, which may seriously underestimate extreme probabilities in practice. By letting $\rho$ depend on the sample size and go to one with certain rate, Hüsler and Reiss (1989) showed that the normalized maxima can become asymptotically dependent. In this paper, we extend such a study to a triangular array of multivariate Gaussian sequence, which further generalizes the results in Hsing, Hüsler and Reiss (1996) and Hashorva and Weng (2013).

preprint | 2014 | arXiv

Predictive regressions for macroeconomic data

Researchers have constantly asked whether stock returns can be predicted by some macroeconomic data. However, it is known that macroeconomic data may exhibit nonstationarity and/or heavy tails, which complicates existing testing procedures for predictability. In this paper we propose novel empirical likelihood methods based on some weighted score equations to test whether the monthly CRSP value-weighted index can be predicted by the log dividend-price ratio or the log earnings-price ratio. The new methods work well both theoretically and empirically regardless of the predicting variables being stationary or nonstationary or having an infinite variance.

preprint | 2014 | arXiv

Test for a Mean Vector with Fixed or Divergent Dimension

It has been a long history in testing whether a mean vector with a fixed dimension has a specified value. Some well-known tests include the Hotelling $T^2$-test and the empirical likelihood ratio test proposed by Owen [Biometrika 75 (1988) 237-249; Ann. Statist. 18 (1990) 90-120]. Recently, Hotelling $T^2$-test has been modified to work for a high-dimensional mean, and the empirical likelihood method for a mean has been shown to be valid when the dimension of the mean vector goes to infinity. However, the asymptotic distributions of these tests depend on whether the dimension of the mean vector is fixed or goes to infinity. In this paper, we propose to split the sample into two parts and then to apply the empirical likelihood method to two equations instead of d equations, where d is the dimension of the underlying random vector. The asymptotic distribution of the new test is independent of the dimension of the mean vector. A simulation study shows that the new test has a very stable size with respect to the dimension of the mean vector, and is much more powerful than the modified Hotelling $T^2$-test.

preprint | 2013 | arXiv

Estimation of Extreme Quantiles for Functions of Dependent Random Variables

We propose a new method for estimating the extreme quantiles for a function of several dependent random variables. In contrast to the conventional approach based on extreme value theory, we do not impose the condition that the tail of the underlying distribution admits an approximate parametric form, and, furthermore, our estimation makes use of the full observed data. The proposed method is semiparametric as no parametric forms are assumed on all the marginal distributions. But we select appropriate bivariate copulas to model the joint dependence structure by taking the advantage of the recent development in constructing large dimensional vine copulas. Consequently a sample quantile resulted from a large bootstrap sample drawn from the fitted joint distribution is taken as the estimator for the extreme quantile. This estimator is proved to be consistent. The reliable and robust performance of the proposed method is further illustrated by simulation.

preprint | 2013 | arXiv

Parameter estimation and model testing for Markov processes via conditional characteristic functions

Markov processes are used in a wide range of disciplines, including finance. The transition densities of these processes are often unknown. However, the conditional characteristic functions are more likely to be available, especially for Lévy-driven processes. We propose an empirical likelihood approach, for both parameter estimation and model specification testing, based on the conditional characteristic function for processes with either continuous or discontinuous sample paths. Theoretical properties of the empirical likelihood estimator for parameters and a smoothed empirical likelihood ratio test for a parametric specification of the process are provided. Simulations and empirical case studies are carried out to confirm the effectiveness of the proposed estimator and test.

preprint | 2013 | arXiv

Tests for covariance matrix with fixed or divergent dimension

Testing covariance structure is of importance in many areas of statistical analysis, such as microarray analysis and signal processing. Conventional tests for finite-dimensional covariance cannot be applied to high-dimensional data in general, and tests for high-dimensional covariance in the literature usually depend on some special structure of the matrix. In this paper, we propose some empirical likelihood ratio tests for testing whether a covariance matrix equals a given one or has a banded structure. The asymptotic distributions of the new tests are independent of the dimension.

preprint | 2013 | arXiv

Weighted estimation of the dependence function for an extreme-value distribution

Bivariate extreme-value distributions have been used in modeling extremes in environmental sciences and risk management. An important issue is estimating the dependence function, such as the Pickands dependence function. Some estimators for the Pickands dependence function have been studied by assuming that the marginals are known. Recently, Genest and Segers [Ann. Statist. 37 (2009) 2990-3022] derived the asymptotic distributions of those proposed estimators with marginal distributions replaced by the empirical distributions. In this article, we propose a class of weighted estimators including those of Genest and Segers (2009) as special cases. We propose a jackknife empirical likelihood method for constructing confidence intervals for the Pickands dependence function, which avoids estimating the complicated asymptotic variance. A simulation study demonstrates the effectiveness of our proposed jackknife empirical likelihood method.

preprint | 2011 | arXiv

The scattering of a cylindrical invisibility cloak: reduced parameters and optimization

We investigate the scattering of 2D cylindrical invisibility cloaks with simplified constitutive parameters with the assistance of scattering coefficients. We show that the scattering of the cloaks originates not only from the boundary conditions but also from the spatial variation of the component of permittivity/permeability. According to our formulation, we propose some restrictions to the invisibility cloak in order to minimize its scattering after the simplification has taken place. With our theoretical analysis, it is possible to design a simplified cloak by using some peculiar composites like photonic crystals (PCs) which mimic an effective refractive index landscape rather than offering effective constitutives, meanwhile canceling the scattering from the inner and outer boundaries.

preprint | 2010 | arXiv

Achieving Anisotropy in Metamaterials made of Dielectric Cylindrical Rods

We show that anisotropic negative effective dispersion relation can be achieved in pure dielectric rod-type metamaterials by turning from the symmetry of a square lattice to that of a rectangular one, i.e. by breaking the rotation symmetry of effective homogeneous medium. Theoretical predictions and conclusions are verified by both numerical calculations and computer based simulations. The proposed anisotropic metamaterial is used to construct a refocusing slab-lens and a subdiffraction hyperlens. The all-dielectric origin makes it more straightforward to address loss and scaling, two major issues of metallic structures, thus facilitating future applications in both the terahertz and optical range.

preprint | 2010 | arXiv

Nearly-zero transmission through periodically modulated ultrathin metal films

Transmission of light through an optically ultrathin metal film with a thickness comparable to its skin depth is significant. We demonstrate experimentally nearly-zero transmission of light through a film periodically modulated by a one-dimensional array of subwavelength slits. The suppressed optical transmission is due to the excitation of surface plasmon polaritons and the zero-transmission phenomenon is strongly dependent on the polarization of the incident wave.

preprint | 2009 | arXiv

Enhanced transmission of transverse electric waves through periodic arrays of structured subwavelength apertures

Transmission through sub-wavelength apertures in perfect metals is expected to be strongly suppressed. However, by structural engineering of the apertures, we numerically demonstrate that the transmission of transverse electric waves through periodic arrays of subwavelength apertures in a thin metallic film can be significantly enhanced. Based on equivalent circuit theory analysis, periodic arrays of square structured subwavelength apertures are obtained with a 1900-fold transmission enhancement factor when the side length $a$ of the apertures is 10 times smaller than the wavelength ($a/\lambda=0.1$). By examining the induced surface currents and investigating the influence of the lattice constant and the incident angle to the resonant frequency, we show that the enhancement is due to the excitation of the strong localized resonant modes of the structured apertures.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv | confidence 95%

external id: arxiv:2605.01498:author:1:liang-peng

Imported May 20, 2026 | Synced May 20, 2026

arxiv | confidence 95%

external id: arxiv:2605.10730:author:12:liang-peng

Imported May 20, 2026 | Synced May 20, 2026

arxiv | confidence 95%

external id: arxiv:2605.13565:author:7:liang-peng

Imported May 20, 2026 | Synced May 20, 2026

3 works

Deng Cai

Researcher

Open to collaborate

3 works

Lixin Ran

Researcher

Open to collaborate

3 works

Niels Asger Mortensen

Researcher

Open to collaborate

2 works

Chenfei Wu

Researcher

Open to collaborate
