Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
17 works
0 followers
14 topics
4 close collaborators

Published work

17 published items

preprint, 2026, arXiv

AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State

Generating long-horizon music videos (MVs) is frequently constrained by prohibitive computational costs and difficulty maintaining cross-shot consistency. We propose AllocMV, a hierarchical framework formulating music video synthesis as a Multiple-Choice Knapsack Problem (MCKP). AllocMV represents the video's persistent state as a compact, structured object comprising character entities, scene priors, and sharing graphs, produced by a global planner prior to realization. By estimating segment saliency from multimodal cues, a group-level MCKP solver based on dynamic programming optimally allocates resources across High-Gen, Mid-Gen, and Reuse branches. For repetitive musical motifs, we implement a divergence-based forking strategy that reuses visual prefixes to reduce costs while ensuring motif-level continuity. Evaluated via the Cost-Quality Ratio (CQR), AllocMV achieves an optimal trade-off between perceived quality and resource expenditure under strict budgetary and rhythmic constraints.
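The abstract above names a dynamic-programming solver for a Multiple-Choice Knapsack Problem but does not give its details. The following is a minimal, generic MCKP sketch, not the AllocMV implementation: the function name `solve_mckp`, the integer costs, and the per-segment (cost, value) numbers are all hypothetical, and each group is assumed to stand for one segment with three options corresponding to High-Gen, Mid-Gen, and Reuse.

```python
from typing import List, Optional, Tuple

NEG_INF = float("-inf")


def solve_mckp(
    groups: List[List[Tuple[int, float]]], budget: int
) -> Optional[Tuple[float, List[int]]]:
    """Pick exactly one (cost, value) option per group, total cost <= budget.

    Returns (best total value, chosen option index per group),
    or None when no selection fits the budget.
    """
    # best[b] = best value achievable with total cost <= b for the groups seen so far
    best = [0.0] * (budget + 1)
    choices: List[List[int]] = []

    for options in groups:
        new_best = [NEG_INF] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b in range(budget + 1):
            for i, (cost, value) in enumerate(options):
                if cost <= b and best[b - cost] != NEG_INF:
                    candidate = best[b - cost] + value
                    if candidate > new_best[b]:
                        new_best[b] = candidate
                        pick[b] = i
        best = new_best
        choices.append(pick)

    if best[budget] == NEG_INF:
        return None

    # Walk backwards through the stored choices to recover the selection.
    picks: List[int] = []
    b = budget
    for g in range(len(groups) - 1, -1, -1):
        i = choices[g][b]
        picks.append(i)
        b -= groups[g][i][0]
    picks.reverse()
    return best[budget], picks


# Hypothetical example: two segments, options ordered High-Gen, Mid-Gen, Reuse.
example_groups = [
    [(5, 10.0), (2, 6.0), (1, 2.0)],  # a salient segment
    [(5, 4.0), (2, 3.0), (1, 1.0)],   # a less salient segment
]
result = solve_mckp(example_groups, budget=7)
```

With a budget of 7 in this toy instance, the solver spends the expensive option on the salient segment and the mid-cost option on the other one. The running time is proportional to the number of groups times the budget times the options per group, which is why costs are assumed to be small integers here.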

preprint, 2022, arXiv

A resource-efficient deep learning framework for low-dose brain PET image reconstruction and analysis

18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing high-quality full-dose PET (F-PET) images from low-dose PET (L-PET) images is an effective way to reduce radiation exposure while maintaining diagnostic accuracy. In this paper, we propose a resource-efficient deep learning framework for L-PET reconstruction and analysis, referred to as transGAN-SDAM, to generate F-PET from the corresponding L-PET and to quantify the standard uptake value ratios (SUVRs) of the generated F-PET across the whole brain. The transGAN-SDAM consists of two modules: a transformer-encoded Generative Adversarial Network (transGAN) and a Spatial Deformable Aggregation Module (SDAM). The transGAN generates higher quality F-PET images, and then the SDAM integrates the spatial information of a sequence of generated F-PET slices to synthesize whole-brain F-PET images. Experimental results demonstrate the superiority and rationality of our approach.

preprint, 2022, arXiv

Activate index: an integrated index to reveal disrupted brain network organizations of major depressive disorder patients

Altered functional brain networks have been a typical manifestation that distinguishes major depressive disorder (MDD) patients from healthy control (HC) subjects in functional magnetic resonance imaging (fMRI) studies. Recently, rich club and diverse club metrics have been proposed for network or network neuroscience analyses. The rich club defines a set of nodes that tend to be the hubs of specific communities, and the diverse club defines the nodes that span more communities and have edges diversely distributed across different communities. Considering the heterogeneity of rich clubs and diverse clubs, combining them to derive a novel indicator may reveal new evidence of brain functional integration and separation, which might provide new insights into MDD. This study is the first to examine the differences between MDD and HC using both rich club and diverse club metrics, and it finds that the two are complementary in analyzing brain networks. In addition, this study proposes a novel index, termed the "active index". The active index defines a group of nodes that tend to be diversely distributed across communities while avoiding being a hub of a community. Experimental results demonstrate the superiority of the active index in analyzing MDD brain mechanisms.

preprint, 2022, arXiv

Deep Decomposition Network for Image Processing: A Case Study for Visible and Infrared Image Fusion

Image decomposition is a crucial subject in the field of image processing. It can extract salient features from the source image. We propose a new image decomposition method based on a convolutional neural network. This method can be applied to many image processing tasks. In this paper, we apply the image decomposition network to the image fusion task. We take an infrared image and a visible light image as input and decompose each of them into three high-frequency feature images and one low-frequency feature image. The two sets of feature images are fused using a specific fusion strategy to obtain fusion feature images. Finally, the feature images are reconstructed to obtain the fused image. Compared with state-of-the-art fusion methods, this method achieves better performance in both subjective and objective evaluation.

preprint, 2022, arXiv

LDoS attack detection method based on traffic time-frequency characteristics

Traditional denial-of-service attack detection methods rely on complex algorithms with high computational overhead, which makes it difficult for them to meet the demands of online detection, and their experimental environments are mostly simulation platforms, which makes them difficult to deploy in real network environments. To address these problems, we propose an LDoS attack detection method that is oriented to real network environments and based on the time-frequency characteristics of traffic data. All the traffic data flowing through the Web server is obtained through the acquisition storage system, and the detection data set is constructed using pre-processing. The simple features of the flow fragments are used as input; a deep neural network learns the time-frequency domain features of normal traffic and generates reconstructed sequences; and the LDoS attack is discriminated based on the differences between the reconstructed sequences and the input data in the time-frequency domain. The experimental results show that the proposed method can accurately detect the attack features in the flow fragments in a very short time and achieves high detection accuracy for complex and diverse LDoS attacks. Because only the statistical features of the packets are used, there is no need to parse the packet data, so the method can be adapted to different network environments.

preprint, 2022, arXiv

Network Traffic Anomaly Detection Method Based on Multi scale Residual Feature

To address the problem that traditional network traffic anomaly detection algorithms do not sufficiently mine potential features in the long time domain, an anomaly detection method based on multi-scale residual features of network traffic is proposed. The original traffic is divided into subsequences of different time spans using sliding windows, and each subsequence is decomposed and reconstructed into data sequences of different levels using the wavelet transform technique; the stacked autoencoder (SAE) constructs a similar feature space using normal network traffic and generates a reconstructed error vector using the difference between reconstructed samples and input samples in the similar feature space; the multi-path residual group is used to learn the reconstructed error; and the traffic classification is completed by a lightweight classifier. The experimental results show that the detection performance of the proposed method for anomalous network traffic is significantly improved compared with traditional methods; they confirm that a longer time span and more S transformation scales have positive effects on discovering potential diversity information in the original network traffic.

preprint, 2022, arXiv

OTFPF: Optimal Transport-Based Feature Pyramid Fusion Network for Brain Age Estimation with 3D Overlapped ConvNeXt

The chronological age of a healthy brain can be predicted from T1-weighted magnetic resonance images (T1 MRIs) using deep neural networks, and the predicted brain age could serve as an effective biomarker for detecting aging-related diseases or disorders. In this paper, we propose an end-to-end neural network architecture, referred to as the optimal transport based feature pyramid fusion (OTFPF) network, for brain age estimation with T1 MRIs. The OTFPF consists of three types of modules: the Optimal Transport based Feature Pyramid Fusion (OTFPF) module, the 3D overlapped ConvNeXt (3D OL-ConvNeXt) module and the fusion module. These modules strengthen the OTFPF network's understanding of each brain's semi-multimodal and multi-level feature pyramid information, and significantly improve its estimation performance. Compared with recent state-of-the-art models, the proposed OTFPF converges faster and performs better. The experiments with 11,728 MRIs aged 3-97 years show that the OTFPF network could provide accurate brain age estimation, yielding a mean absolute error (MAE) of 2.097, a Pearson's correlation coefficient (PCC) of 0.993 and a Spearman's rank correlation coefficient (SRCC) of 0.989 between the estimated and chronological ages. Extensive quantitative experiments and ablation experiments demonstrate the superiority and rationality of the OTFPF network. The code and implementation details will be released on GitHub: https://github.com/ZJU-Brain/OTFPF after final decision.
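The three agreement metrics reported in this abstract (MAE, PCC, SRCC) have standard definitions, which can be sketched in plain Python as follows. This is an illustration only, not the authors' evaluation code: the helper names and the sample ages are hypothetical, and SRCC is computed here as the Pearson correlation of average ranks.

```python
from math import sqrt
from typing import List, Sequence


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predicted and true values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient (PCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation coefficient (SRCC)."""
    return pearson(ranks(x), ranks(y))


# Hypothetical chronological and estimated ages, for illustration only.
true_age = [10.0, 20.0, 30.0, 40.0]
pred_age = [12.0, 19.0, 33.0, 41.0]
scores = (mae(pred_age, true_age), pearson(pred_age, true_age), spearman(pred_age, true_age))
```

MAE is in the same unit as the ages (years), while PCC and SRCC are unitless and lie between -1 and 1; SRCC depends only on the ordering of the predictions, so it reaches 1 whenever the estimated ages are in the same order as the true ages.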

preprint, 2022, arXiv

PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion

The Transformer architecture has witnessed a rapid development in recent years, outperforming the CNN architectures in many computer vision tasks, as exemplified by the Vision Transformers (ViT) for image classification. However, existing visual transformer models aim to extract semantic information for high-level tasks, such as classification and detection. These methods ignore the importance of the spatial resolution of the input image, thus sacrificing the local correlation information of neighboring pixels. In this paper, we propose a Patch Pyramid Transformer (PPT) to effectively address the above issues. Specifically, we first design a Patch Transformer to transform the image into a sequence of patches, where transformer encoding is performed for each patch to extract local representations. In addition, we construct a Pyramid Transformer to effectively extract the non-local information from the entire image. After obtaining a set of multi-scale, multi-dimensional, and multi-angle features of the original image, we design the image reconstruction network to ensure that the features can be reconstructed into the original input. To validate the effectiveness, we apply the proposed Patch Pyramid Transformer to image fusion tasks. The experimental results demonstrate its superior performance, compared to the state-of-the-art fusion approaches, achieving the best results on several evaluation indicators. Thanks to the underlying representational capacity of the PPT network, it can directly be applied to different image fusion tasks without redesigning or retraining the network.

preprint, 2022, arXiv

Spectrum-Energy-Economy Efficiency Trade-off of Wireless Communication Systems with Separated Indoor/Outdoor Scenarios for 5G and B5G

In this paper, we consider a heterogeneous 5G cellular architecture that separates the outdoor and indoor scenarios and, in particular, study the trade-off between the spectrum efficiency (SE), energy efficiency (EE), and economy efficiency (ECE). Mathematical expressions for the system capacity, EE, SE, and ECE are derived using a proposed realistic power consumption model. The comparison of system performance in terms of SE, EE, and ECE shows that the proposed network architecture, which separates the outdoor and indoor scenarios, offers a promising solution for future communication systems that have strict requirements on the data rate and efficiency.

preprint, 2021, arXiv

A Dual-branch Network for Infrared and Visible Image Fusion

Deep learning is a rapidly developing approach in the field of infrared and visible image fusion. In this context, the use of dense blocks in deep networks significantly improves the utilization of shallow information, and the combination with the Generative Adversarial Network (GAN) also improves the fusion performance of two source images. We propose a new method based on dense blocks and GANs, and we directly insert the input image (the visible light image) into each layer of the entire network. We use SSIM and gradient loss functions, which are more consistent with perception, instead of the mean square error loss. After the adversarial training between the generator and the discriminator, we show that a trained end-to-end fusion network (the generator network) is finally obtained. Our experiments show that the fused images obtained by our approach achieve good scores on multiple evaluation indicators. Further, our fused images have better visual effects in multiple sets of contrasts, which are more satisfying to human visual perception.

preprint, 2021, arXiv

RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection

The novel 2019 Coronavirus (COVID-19) infection has spread worldwide and is currently a major healthcare challenge around the world. Chest Computed Tomography (CT) and X-ray images have been well recognized to be two effective techniques for clinical COVID-19 disease diagnoses. Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is preferred for efficient diagnosis, assessment and treatment. However, considering the similarity between COVID-19 and pneumonia, CXR samples with deep features distributed near category boundaries are easily misclassified by the hyper-planes learned from limited training data. Moreover, most existing approaches for COVID-19 detection focus on the accuracy of prediction and overlook the uncertainty estimation, which is particularly important when dealing with noisy datasets. To alleviate these concerns, we propose a novel deep network named RCoNet$^k_s$ for robust COVID-19 detection which employs Deformable Mutual Information Maximization (DeIM), Mixed High-order Moment Feature (MHMF) and Multi-expert Uncertainty-aware Learning (MUL). With DeIM, the mutual information (MI) between input data and the corresponding latent representations can be well estimated and maximized to capture compact and disentangled representational characteristics. Meanwhile, MHMF can fully explore the benefits of using high-order statistics and extract discriminative features of complex distributions in medical imaging. Finally, MUL creates multiple parallel dropout networks for each CXR image to evaluate uncertainty and thus prevent performance degradation caused by the noise in the data.

preprint, 2021, arXiv

Thermalization of non-abelian gauge theories at next-to-leading order

We provide the first next-to-leading-order (NLO) weak-coupling description of the thermalization process of far-from-equilibrium systems in non-abelian gauge theory. We study isotropic systems starting from either over- or under-occupied initial conditions and follow their time evolution towards thermal equilibrium by numerically solving the QCD effective kinetic theory at NLO accuracy. We find that the NLO corrections remain well under control for a wide range of couplings and that the overall effect of NLO corrections is to reduce the time needed to reach thermal equilibrium in the systems considered.

preprint, 2020, arXiv

BER Performance of Spatial Modulation Systems Under a Non-Stationary Massive MIMO Channel Model

In this paper, the bit error rate (BER) performance of spatial modulation (SM) systems is investigated both theoretically and by simulation in a non-stationary Kronecker-based massive multiple-input-multiple-output (MIMO) channel model in multi-user (MU) scenarios. Massive MIMO SM systems are considered in this paper using both a time-division multiple access (TDMA) scheme and a block diagonalization (BD) based precoding scheme, for different system settings. Their performance is compared with a vertical Bell labs layered space-time (V-BLAST) architecture based system and a conventional channel inversion system. It is observed that a higher cluster evolution factor can result in better BER performance of SM systems due to the low correlation among sub-channels. Compared with the BD-SM system, the SM system using the TDMA scheme obtains a better BER performance but with a much lower total system data rate. The BD-MU-SM system achieves the best trade-off between the data rate and the BER performance among all of the systems considered. When compared with the V-BLAST system and the channel inversion system, SM approaches offer advantages in performance for MU massive MIMO systems.

preprint, 2020, arXiv

End-to-End Energy Efficiency Evaluation for B5G Ultra Dense Networks

Energy efficiency (EE) is a major performance metric for fifth generation (5G) and beyond 5G (B5G) wireless communication systems, especially for ultra dense networks. This paper proposes an end-to-end (e2e) power consumption model and studies the energy efficiency for a heterogeneous B5G cellular architecture that separates the indoor and outdoor communication scenarios in ultra dense networks. In this work, massive multiple-input-multiple-output (MIMO) technologies at conventional sub-6 GHz frequencies are used for long-distance outdoor communications. Light-Fidelity (LiFi) and millimeter wave (mmWave) technologies are deployed to provide a high data rate service to indoor users. In the referenced non-separated system, by contrast, the indoor users communicate with the outdoor massive MIMO macro base station directly. The performance of these two systems is evaluated and compared in terms of the total power consumption and energy efficiency. The results show that the network architecture which separates indoor and outdoor communication can support a higher data rate transmission for less energy consumption, compared to the non-separated communication scenario. In addition, the results show that deploying LiFi and mmWave IAPs can enable users to transmit at a higher data rate and further improve the EE.

preprint, 2020, arXiv

Independence of the inverse spin Hall effect with the magnetic phase in thin NiCu films

Large spin Hall angles have been observed in 3d ferromagnets, but their origin, and especially their link with the ferromagnetic order, remain unclear. Here, we investigate the inverse spin Hall effect of Ni60Cu40 and Ni50Cu50 across their Curie temperature using spin pumping experiments. We show that the inverse spin Hall effect in these samples is comparable to that of platinum, and that it is insensitive to the magnetic order. These results point towards a Heisenberg localized model of the transition, and suggest that the large spin Hall effects in 3d ferromagnets can be independent of the magnetic phase.

preprint, 2020, arXiv

On Chen's biharmonic conjecture for hypersurfaces in $\mathbb R^5$

A longstanding conjecture on biharmonic submanifolds, proposed by Chen in 1991, is that any biharmonic submanifold in a Euclidean space is minimal. In the case of a hypersurface $M^n$ in $\mathbb R^{n+1}$, Chen's conjecture was settled for $n=2$ independently by Chen and Jiang around 1987. Hasanis and Vlachos settled Chen's conjecture for a hypersurface with $n=3$ in 1995. However, Chen's conjecture for a general hypersurface $M^n$ remains open for $n>3$. In this paper, we settle Chen's conjecture for hypersurfaces in $\mathbb R^{5}$, that is, for $n=4$.
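For readers unfamiliar with the terms, the statement of the conjecture can be written out using the standard definitions, which the abstract itself does not spell out. The notation here is an assumption: $x$ denotes the isometric immersion of $M^n$ into Euclidean space, $\vec H$ its mean curvature vector, and $\Delta$ the Laplacian of the induced metric.

```latex
% Biharmonic: the position vector field is annihilated by the bi-Laplacian.
% Since \Delta x is a constant multiple of \vec H (the sign depends on the
% convention chosen for \Delta), this is equivalent to \Delta \vec H = 0.
\Delta^2 x = 0 \quad\Longleftrightarrow\quad \Delta \vec H = 0 .

% Minimal: the mean curvature vector vanishes identically.
\vec H = 0 .

% Chen's conjecture for submanifolds of Euclidean space:
\Delta \vec H = 0 \quad\Longrightarrow\quad \vec H = 0 .
```

Every minimal submanifold is trivially biharmonic, so the content of the conjecture is the converse direction shown in the last line.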

preprint, 2019, arXiv

Deep Model with Siamese Network for Viability and Necrosis Tumor Assessment in Osteosarcoma

Osteosarcoma is the most common primary malignant bone tumor and has high mortality because it readily metastasizes to the lungs. Osteosarcoma is a highly anaplastic, pleomorphic tumor with a variety of tumor cell morphologies, including fusiform, oval, epithelial, lymphocyte-like, small round, and transparent cells. Due to the multiple patterns of osteosarcoma cell morphology, pathologists differ in their classification (viable tumor, necrotic tumor, non-tumor) of osteosarcoma. Therefore, automatic and accurate recognition algorithms can help pathologists greatly reduce time and improve diagnostic accuracy. In recent years, deep learning technology has made great progress in the field of natural images and medical images, and has achieved excellent results beyond human performance in classification. In this paper, we propose a Deep Model with Siamese Network (DS-Net) for automatic classification in Hematoxylin and Eosin (H&E) stained osteosarcoma histology images.