Source author record

Wanting Xu

Wanting Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Genomics; Machine Learning; physics.optics; quant-ph

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

Preprint · 2026 · arXiv

AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training

In video generation models, particularly world models, training large-scale video diffusion Transformers (such as DiT and MMDiT) poses significant computational challenges because sequence lengths vary enormously within mixed-mode datasets. Existing bucket-based data loading strategies typically rely on an "equal token length" constraint. This ignores the quadratic complexity of self-attention, which leads to severe load imbalance and underutilized GPU resources. This paper proposes AdaptiveLoad, an integrated optimization framework with two core components: (1) a dual-constraint adaptive load-balancing system that eliminates long-sequence bottlenecks by limiting memory consumption and computational load simultaneously ($B \times S^p \le M_{\text{comp}}$); (2) a fused LayerNorm-Modulate CUDA kernel that uses a D-tile coalesced reduction strategy to increase throughput and relieve memory pressure. Experiments on the Wan 2.1 world model show that the method reduces the computational imbalance rate from 39% to 18.9%, improves peak VRAM utilization efficiency by 22.7%, and increases overall training throughput by 27.2%.
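The difference between an equal-token rule and a dual constraint can be illustrated with a minimal sketch. It assumes that B is the number of samples in a batch, that S is the padded (longest) sequence length in that batch, and that p is an attention-cost exponent defaulting to 2. The helper `pack_batches` and its parameters `mem_budget` (a token budget standing in for memory) and `comp_budget` are hypothetical. The abstract does not describe how AdaptiveLoad actually forms batches, so this code uses greedy longest-first packing for illustration only.

```python
from typing import List

def pack_batches(seq_lens: List[int], mem_budget: int,
                 comp_budget: float, p: float = 2.0) -> List[List[int]]:
    """Greedily group sample indices into batches such that
    B * S    <= mem_budget  (token/memory constraint) and
    B * S**p <= comp_budget (attention-compute constraint),
    where B is the batch size and S the longest sequence in the batch.
    Hypothetical helper for illustration, not the AdaptiveLoad code."""
    # Visit samples from longest to shortest; sorted() is stable, so ties keep input order.
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    batches: List[List[int]] = []
    current: List[int] = []
    s_max = 0
    for i in order:
        s = seq_lens[i]
        if s > mem_budget or s ** p > comp_budget:
            raise ValueError(f"sample {i} (length {s}) exceeds a budget on its own")
        if current:
            b = len(current) + 1
            # s <= s_max here, so the batch's padded length stays s_max.
            if b * s_max <= mem_budget and b * s_max ** p <= comp_budget:
                current.append(i)
                continue
            batches.append(current)  # adding sample i would break a constraint
        current, s_max = [i], s  # start a new batch headed by sample i
    if current:
        batches.append(current)
    return batches

lens = [4096, 4096, 1024, 1024, 1024, 1024]
# Token budget only: the two long samples share one batch.
print(pack_batches(lens, mem_budget=8192, comp_budget=float("inf")))  # [[0, 1], [2, 3, 4, 5]]
# Adding the quadratic compute budget separates them.
print(pack_batches(lens, mem_budget=8192, comp_budget=4096 ** 2))  # [[0], [1], [2, 3, 4, 5]]
```

Both batchings satisfy the 8,192-token limit. Only the S^p term detects that a batch of two 4,096-token samples costs four times as much attention compute as a batch of four 1,024-token samples while holding the same number of tokens.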

Preprint · 2020 · arXiv

Two-photon superbunching effect of broadband chaotic stationary light at femtosecond timescale based on cascaded Michelson interferometer

Observing the superbunching effect with true chaotic light is challenging. Here we propose and demonstrate a method that achieves superbunching with a degree of second-order coherence of 2.42, using broadband stationary chaotic light and a cascaded Michelson interferometer (CMI). This exceeds the theoretical upper limit of 2 for two-photon bunching of chaotic light. The superbunching correlation peak is measured with an ultrafast two-photon absorption detector, and its full width at half maximum is about 95 fs. A theory of two-photon superbunching in a CMI is developed to interpret the effect and agrees with the experimental results. The theory also predicts that the degree of second-order coherence can be much greater than 2 if chaotic light propagates N times in a CMI. Finally, we propose a new type of weak-signal detection setup in which broadband chaotic light circulates in a CMI. Theoretically, it can increase weak-signal detection sensitivity by a factor of 79 after the chaotic light circulates 100 times in the CMI.
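The following standard illustration, which does not model the CMI, explains where the limit of 2 quoted above comes from. The zero-delay degree of second-order coherence is g2(0) = ⟨I²⟩/⟨I⟩². For chaotic (thermal) light, the field is a circular complex Gaussian, so the intensity is exponentially distributed and g2(0) = 2. For an ideal coherent source with constant intensity, g2(0) = 1. The function names below are hypothetical, and the simulation uses only Python's standard `random` module.

```python
import random

def g2_zero_delay(intensities):
    """Normalized second-order coherence at zero delay: <I^2> / <I>^2."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(x * x for x in intensities) / n
    return mean_i2 / (mean_i * mean_i)

def chaotic_intensities(n, seed=0):
    """Chaotic light: Re(E) and Im(E) are independent Gaussians, so the
    instantaneous intensity |E|^2 is exponentially distributed."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n)]

coherent = [1.0] * 1000
print(g2_zero_delay(coherent))  # 1.0
# Monte Carlo estimate; fluctuates slightly around the theoretical value 2.
print(round(g2_zero_delay(chaotic_intensities(200_000)), 2))
```

A measured value such as the 2.42 reported above therefore cannot come from ordinary chaotic-light intensity statistics alone, which is what "superbunching" refers to.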

Preprint · 2012 · arXiv

Differential Expression Analysis for A Mouse p53KO Microarray Dataset

Affymetrix GeneChip technology measures gene expression levels in cell samples under different conditions. In this project, we analyzed gene expression profiling data for mouse induced pluripotent stem cells (iPSCs) (Takahashi, 2006) obtained on the Affymetrix Mouse 430 2.0 GeneChip. Three biological conditions were present: p53KO, microRNA mir34aKO, and wild type (WT), each with three biological replicates. The first part of the project identified differentially expressed genes among roughly 45,000 and examined their biological meaning through pathway analysis. The second part dealt with repetitive elements represented in the mRNA pool, and we identified repetitive elements that differ significantly between two biological conditions. Both the p53KO versus WT and the mir34aKO versus WT comparisons were performed, with emphasis on the former. Laboratory validation with qPCR confirmed our findings. This work was done under the Overseas Research Fellowship (ORF) Scheme 2012 for Science Students of the Faculty of Science, The University of Hong Kong. Many thanks are due to the University for the fellowship, and to Professors Terry Speed and Lin He and Drs Chao-po Lin and Anne Biton of the University of California at Berkeley for their supervision and generous support.
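The abstract does not say which statistical test was used, so the following is only a generic sketch of a per-gene differential-expression step for three replicates per condition. It assumes the expression values are already on a log2 scale. It computes a log2 fold change and a Welch two-sample t-statistic, then applies a Benjamini-Hochberg false-discovery-rate adjustment to p-values that are assumed to have been computed elsewhere. With so few replicates, moderated statistics such as those in limma are often preferred over a plain t-test. All function names and input values here are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence

def log2_fold_change(ko: Sequence[float], wt: Sequence[float]) -> float:
    """Difference of group means for values already on a log2 scale."""
    return mean(ko) - mean(wt)

def welch_t(ko: Sequence[float], wt: Sequence[float]) -> float:
    """Welch's two-sample t-statistic (does not assume equal variances)."""
    se = sqrt(variance(ko) / len(ko) + variance(wt) / len(wt))
    return (mean(ko) - mean(wt)) / se

def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical log2 expression values for one gene, three replicates each.
ko = [8.0, 9.0, 10.0]
wt = [5.0, 6.0, 7.0]
print(log2_fold_change(ko, wt))   # 3.0
print(round(welch_t(ko, wt), 3))  # 3.674
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

In a full analysis, the t-statistics would be converted to p-values with the appropriate t distribution for every gene on the array before the adjustment step.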