Source author record

Siyu Wang

Siyu Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

14 works · 20 topics · 4 close collaborators

Preprint · 2026 · arXiv

HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer

The evolution of visual generative models has long been constrained by fragmented architectures relying on disjoint text encoders and external VAEs. In this report, we present HiDream-O1-Image, a natively unified generative foundation model built on a pixel-space Diffusion Transformer, which pioneers a paradigm shift from modular architectures to an end-to-end in-context visual generation engine. By mapping raw image pixels, text tokens, and task-specific conditions into a single shared token space, HiDream-O1-Image achieves a structural unification of multimodal inputs within a Unified Transformer (UiT) architecture. This native encoding paradigm eliminates the need for separate VAEs or disjoint pre-trained text encoders, allowing the model to treat diverse generation and editing tasks as a consistent in-context reasoning process. Extensive experiments show that HiDream-O1-Image excels across various generation tasks, including text-to-image generation, instruction-based editing, and subject-driven personalization. Notably, with only 8B parameters, HiDream-O1-Image (8B) matches or even surpasses established state-of-the-art models with significantly more parameters (e.g., the 27B Qwen-Image). Crucially, to validate the immense scalability of this paradigm, we scale the architecture up to over 200B parameters. Experimental results demonstrate that this massive-scale version, HiDream-O1-Image-Pro (200B+), unlocks unprecedented generative capabilities and superior performance, establishing new state-of-the-art benchmarks. Ultimately, HiDream-O1-Image highlights the immense potential of natively unified architectures and charts a highly scalable path toward next-generation multimodal AI.
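The central idea above, mapping raw pixels and text tokens into one shared token sequence without a VAE, can be illustrated with a minimal sketch. It assumes non-overlapping square patches flattened into vectors and a simple modality tag per token; the names `patchify` and `build_sequence` are hypothetical, and the learned projection into the model's hidden size that a real model would apply next is omitted. This is not HiDream-O1-Image's published tokenizer.

```python
from typing import List, Tuple

Image = List[List[List[float]]]  # H x W x C image stored as nested lists


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tokens.

    Each token is the flattened (row, column, channel) values of one patch,
    so raw pixels become tokens directly, with no VAE encoder in between.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            token = [
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            tokens.append(token)
    return tokens


def build_sequence(text_ids: List[int], image: Image, patch: int) -> List[Tuple[str, object]]:
    """Place text tokens and pixel-patch tokens in one sequence, tagged by modality."""
    sequence: List[Tuple[str, object]] = [("text", t) for t in text_ids]
    sequence += [("pixels", tok) for tok in patchify(image, patch)]
    return sequence
```

For a 4×4 RGB image and a patch size of 2, `patchify` yields four tokens of 12 values each, and `build_sequence([7, 8], image, 2)` returns two text entries followed by four pixel entries in a single list.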

Preprint · 2026 · arXiv

Intelligent Elastic Feature Fading: Enabling Model Retrain-Free Feature Efficiency Rollouts at Scale

Large-scale ranking systems depend on thousands of features derived from user behavior across multiple time horizons. Rolling out feature efficiency changes typically requires model retraining, resulting in long iteration cycles (3–6 months), substantial GPU resource consumption, and limited rollout throughput. We introduce Intelligent Elastic Feature Fading (IEFF), a production infrastructure system that enables retrain-free feature efficiency rollouts by elastically controlling feature coverage and distribution at serving time. IEFF supports incremental feature coverage adjustments while models adapt through recurring training, eliminating dependencies on explicit retraining cycles. The system incorporates strict safety guardrails, reversibility mechanisms, and comprehensive monitoring to ensure stability at scale. Across multiple production use cases, IEFF accelerates efficiency-related rollouts by 5×, eliminates retraining-related GPU overhead, and enables faster capacity recycling. Extensive offline and online experiments demonstrate that gradual feature fading prevents 50–55% of the online performance degradation seen with abrupt feature removal, while maintaining stable model behavior. These results establish elastic, system-level feature fading as a practical and scalable approach for managing feature efficiency in modern industrial ranking systems.
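A minimal sketch of serving-time feature fading, assuming a linear coverage schedule and hash-based request bucketing. The names `coverage_at`, `bucket`, and `serve_feature` are hypothetical, and the abstract does not say how IEFF actually schedules or samples coverage.

```python
import hashlib


def coverage_at(step: int, total_steps: int, start: float = 1.0, end: float = 0.0) -> float:
    """Linear fading schedule: coverage moves from `start` to `end` over `total_steps`."""
    if total_steps <= 0:
        return end
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + (end - start) * frac


def bucket(request_id: str) -> float:
    """Map a request id to a stable value in [0, 1) so decisions are reproducible."""
    digest = hashlib.sha256(request_id.encode()).digest()
    # Keep 53 bits so the quotient is exactly representable and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def serve_feature(request_id: str, value: float, coverage: float, default: float = 0.0) -> float:
    """Serve the real feature value for a `coverage` fraction of requests, else a default."""
    return value if bucket(request_id) < coverage else default
```

Because each request maps to a fixed bucket, lowering coverage only removes requests from the covered set, and raising it again restores exactly the same requests, which is one simple way to make a rollout reversible.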

Preprint · 2026 · arXiv

The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

Autonomous systems are increasingly deployed in open and dynamic environments, from city streets to aerial and indoor spaces, where perception models must remain reliable under sensor noise, environmental variation, and platform shifts. However, even state-of-the-art methods often degrade under unseen conditions, highlighting the need for robust and generalizable robot sensing. The RoboSense 2025 Challenge is designed to advance robustness and adaptability in robot perception across diverse sensing scenarios. It unifies five complementary research tracks spanning language-grounded decision making, socially compliant navigation, sensor configuration generalization, cross-view and cross-modal correspondence, and cross-platform 3D perception. Together, these tasks form a comprehensive benchmark for evaluating real-world sensing reliability under domain shifts, sensor failures, and platform discrepancies. RoboSense 2025 provides standardized datasets, baseline models, and unified evaluation protocols, enabling large-scale and reproducible comparison of robust perception methods. The challenge attracted 143 teams from 85 institutions across 16 countries, reflecting broad community engagement. By consolidating insights from 23 winning solutions, this report highlights emerging methodological trends, shared design principles, and open challenges across all tracks, marking a step toward building robots that can sense reliably, act robustly, and adapt across platforms in real-world environments.

Preprint · 2022 · arXiv

Fast Lossless Neural Compression with Integer-Only Discrete Flows

By applying entropy codecs with learned data distributions, neural compressors have significantly outperformed traditional codecs in terms of compression ratio. However, the high inference latency of neural networks hinders the deployment of neural compressors in practical applications. In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic. Our work is built upon integer discrete flows, which consist of invertible transformations between discrete random variables. We propose efficient invertible transformations with integer-only arithmetic based on 8-bit quantization. Our invertible transformation is equipped with learnable binary gates to remove redundant filters during inference. We deploy IODF with TensorRT on GPUs, achieving a 10× inference speedup over the fastest existing neural compressors while retaining high compression rates on ImageNet32 and ImageNet64.
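Integer discrete flows remain exactly invertible on integers because each coupling layer only adds an integer-valued shift computed from the untouched half of the input. A minimal sketch, assuming an additive coupling layer and a toy shift network with hypothetical int8-range weights and a power-of-two requantization step; it omits IODF's learnable binary gates and the entropy coder, and the function names are illustrative only.

```python
from typing import Callable, List

ShiftFn = Callable[[List[int]], List[int]]


def toy_int_shift(xa: List[int]) -> List[int]:
    """Hypothetical integer-only 'network': int8-range weights, integer
    accumulation, then requantization by an arithmetic right shift (floor of /4)."""
    weights = [[3, -2], [1, 5]]  # illustrative weights, shape (len(xb), len(xa))
    return [sum(w * v for w, v in zip(row, xa)) >> 2 for row in weights]


def coupling_forward(x: List[int], shift_fn: ShiftFn) -> List[int]:
    """y_a = x_a and y_b = x_b + shift(x_a); every value stays an integer."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    shift = shift_fn(xa)
    if len(shift) != len(xb):
        raise ValueError("shift must match the size of the second half")
    return xa + [b + s for b, s in zip(xb, shift)]


def coupling_inverse(y: List[int], shift_fn: ShiftFn) -> List[int]:
    """Recompute the same shift from the unchanged half and subtract it."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    shift = shift_fn(ya)
    return ya + [b - s for b, s in zip(yb, shift)]
```

With `x = [10, -7, 3, 250]`, the forward pass gives `[10, -7, 14, 243]`, and the inverse recovers `x` exactly, since no floating-point rounding is involved.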

Preprint · 2022 · arXiv

Model-agnostic Counterfactual Synthesis Policy for Interactive Recommendation

Interactive recommendation learns from the interaction process between users and the system in order to keep up with users' changing interests. Recent advances have shown that reinforcement learning (RL), with its ability to handle dynamic processes, can be applied effectively to interactive recommendation. However, the sparsity of interaction data can hamper system performance. We propose to train a Model-agnostic Counterfactual Synthesis Policy that generates counterfactual data and addresses the data sparsity problem by modelling both the observational and the counterfactual distributions. During training, the proposed policy identifies the trivial components of any state and replaces them with those from other agents, and it can be deployed in any RL-based algorithm. Experimental results demonstrate the effectiveness and generality of the proposed policy.
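As a rough illustration of replacing the trivial components of a state, the sketch below assumes a state is a fixed-length list of components with externally supplied importance scores, and swaps each low-importance component for the corresponding component of another agent's state. In the paper this selection is made by a trained policy; the fixed threshold and the name `synthesize_counterfactual` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def synthesize_counterfactual(
    state: Sequence[T],
    other_state: Sequence[T],
    importance: Sequence[float],
    threshold: float,
) -> List[T]:
    """Build a counterfactual state: keep important components, and replace
    components whose importance falls below `threshold` with the matching
    components from another agent's state."""
    if not (len(state) == len(other_state) == len(importance)):
        raise ValueError("state, other_state and importance must have equal length")
    return [
        other if score < threshold else own
        for own, other, score in zip(state, other_state, importance)
    ]
```

For example, with `state = [1, 2, 3, 4]`, `other_state = [9, 8, 7, 6]`, importance scores `[0.9, 0.1, 0.5, 0.05]` and a threshold of 0.2, the second and fourth components are swapped, giving `[1, 8, 3, 6]`.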

Preprint · 2022 · arXiv

Two weight inequality for Hankel form on weighted Bergman spaces induced by doubling weights

The boundedness of the small Hankel operator $h_f^\nu(g)=P_\nu(f\bar{g})$, induced by an analytic symbol $f$ and the Bergman projection $P_\nu$ associated with $\nu$, acting from the weighted Bergman space $A^p_\omega$ to $A^q_\nu$, is characterized on the full range $0<p,q<\infty$ when $\omega,\nu$ belong to the class $\mathcal{D}$ of radial weights satisfying certain two-sided doubling conditions. Some of the results obtained are equivalent to the boundedness of bilinear Hankel forms, which are in turn used to establish the weak factorization $A_\eta^{q}=A_\omega^{p_1}\odot A_\nu^{p_2}$, where $1<q,p_1,p_2<\infty$ with $q^{-1}=p_1^{-1}+p_2^{-1}$ and $\widetilde{\eta}^{1/q}\asymp\widetilde{\omega}^{1/p_1}\widetilde{\nu}^{1/p_2}$. Here $\widetilde{\tau}(r)=\int_r^1\tau(t)\,dt/(1-t)$ for all $0\le r<1$.

Preprint · 2021 · arXiv

Refractive index and extinction coefficient of vapor-deposited water ice in the UV-Vis range

Laboratory data on the optical properties of vapor-deposited water ice, specifically the refractive index and extinction coefficient, are available mainly for a selected set of wavelengths and a limited number of deposition temperatures. Experimental limitations are the main reason for the lack of broadband data, which is unfortunate because these quantities are needed to interpret and predict astronomical and planetary observations. The goal of this work is to fill this gap using an experimental broadband method that can rapidly provide reliable water ice data across the entire UV-visible range. The approach uses a monochromatic HeNe laser and a broadband Xe-arc lamp simultaneously to record interference fringes of water ice during deposition at astronomically relevant ice temperatures. The ice thickness is typically more than 20 μm. Analyzing the period and intensity of both the monochromatic and the broadband interference patterns allows the wavelength-dependent refractive index and extinction coefficient to be determined. We present accurate refractive index and extinction coefficient graphs for wavelengths between 250 and 750 nm and for ices deposited between 30 and 160 K. From our data, we find a possible structural change in the ice in the 110–130 K region that has not been reported before. We also discuss how the data presented in this paper can be used to interpret astronomical observations of icy surfaces.
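To show how fringe periods relate to the refractive index, here is a sketch under simplifying assumptions that the abstract does not state: normal incidence, a constant deposition rate r, and a known index n0 at the HeNe wavelength λ0 = 632.8 nm. One fringe at wavelength λ corresponds to a thickness increase of λ/(2n(λ)), so the fringe period is T(λ) = λ/(2n(λ)r); dividing by the HeNe period cancels r and gives n(λ) = n0 (λ/λ0)(T0/T(λ)). The function names are hypothetical, the extinction-coefficient analysis from fringe intensities is not shown, and no measured values are used.

```python
HENE_WAVELENGTH_NM = 632.8  # HeNe laser line used as the reference wavelength


def refractive_index_from_periods(
    wavelength_nm: float,
    period_s: float,
    ref_period_s: float,
    ref_index: float,
    ref_wavelength_nm: float = HENE_WAVELENGTH_NM,
) -> float:
    """Estimate n(lambda) from interference-fringe periods recorded during deposition.

    Assumes normal incidence and a constant growth rate r, so one fringe at
    wavelength lambda takes T = lambda / (2 * n * r) seconds. Taking the ratio
    with the reference (HeNe) period cancels the unknown growth rate.
    """
    if period_s <= 0 or ref_period_s <= 0:
        raise ValueError("fringe periods must be positive")
    return ref_index * (wavelength_nm / ref_wavelength_nm) * (ref_period_s / period_s)


def growth_rate_nm_per_s(
    ref_period_s: float, ref_index: float, ref_wavelength_nm: float = HENE_WAVELENGTH_NM
) -> float:
    """Thickness added per second: one reference fringe adds lambda0 / (2 * n0)."""
    return ref_wavelength_nm / (2.0 * ref_index * ref_period_s)
```

Multiplying the growth rate by the elapsed deposition time then gives the ice thickness under the same assumptions.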

Preprint · 2020 · arXiv

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads

The last decade has witnessed growth in the computational requirements for training deep neural networks. Current approaches (e.g., data/model parallelism, pipeline parallelism) parallelize training tasks across multiple devices. However, these approaches always rely on specific deep learning frameworks and require elaborate manual design, which makes them difficult to maintain and to share across different types of models. In this paper, we propose Auto-MAP, a framework for exploring distributed execution plans for DNN workloads that automatically discovers fast parallelization strategies through reinforcement learning at the IR level of deep learning models. Efficient exploration remains a major challenge for reinforcement learning, so we leverage DQN with task-specific pruning strategies to explore the search space of optimized strategies efficiently. Our evaluation shows that Auto-MAP can find the optimal solution within two hours while achieving better throughput on several NLP and convolutional models.
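DQN exploration over a pruned action space can be sketched with action masking: pruned actions are excluded both from ε-greedy selection and from the max in the temporal-difference target. The sketch assumes Q-values are supplied by some network and that pruning is expressed as a boolean mask; `select_action` and `td_target` are hypothetical names, and the abstract does not describe Auto-MAP's actual states, actions, or pruning rules.

```python
import random
from typing import Sequence


def select_action(
    q_values: Sequence[float],
    valid_mask: Sequence[bool],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Epsilon-greedy choice restricted to actions that survived pruning."""
    valid = [i for i, ok in enumerate(valid_mask) if ok]
    if not valid:
        raise ValueError("every action was pruned")
    if rng.random() < epsilon:
        return rng.choice(valid)  # explore, but only among unpruned actions
    return max(valid, key=lambda i: q_values[i])  # exploit: best unpruned action


def td_target(
    reward: float,
    next_q: Sequence[float],
    next_mask: Sequence[bool],
    gamma: float,
    done: bool,
) -> float:
    """One-step DQN target r + gamma * max over unpruned next actions."""
    if done:
        return reward
    valid_q = [q for q, ok in zip(next_q, next_mask) if ok]
    if not valid_q:
        return reward  # no legal continuation: treat the state as terminal
    return reward + gamma * max(valid_q)
```

Masking in both places keeps the learned values consistent with the pruned search space, so the agent never bootstraps from a strategy it is not allowed to pick.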

Preprint · 2020 · arXiv

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

Training large DNN models on sophisticated GPU platforms with diverse interconnect capabilities is challenging. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, several tricky issues remain: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner that solves the partition and placement problems and explores the optimal hybrid strategy of data and pipeline parallelism. We also propose a new runtime scheduling algorithm that reduces device memory usage; it is orthogonal to re-computation approaches and does not come at the expense of training throughput. Experiments show that the DAPPLE planner consistently outperforms strategies generated by PipeDream's planner by up to 3.23× under synchronous training scenarios, and the DAPPLE runtime achieves a 1.6× training-throughput speedup over GPipe while reducing memory consumption by 12%.
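One common way to cut pipeline memory is to start backward passes early instead of running every forward micro-batch first, as GPipe does. The sketch below generates a one-forward-one-backward (1F1B) order per pipeline stage, assuming a warm-up of `num_stages - stage - 1` forward micro-batches; treat it as an approximation in the spirit of DAPPLE's runtime scheduling, not its exact scheduler. All function names are hypothetical.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # ("F" or "B", micro-batch index)


def one_f_one_b_schedule(stage: int, num_stages: int, num_micro: int) -> List[Op]:
    """Warm-up forwards, then alternate forward/backward, then drain backwards."""
    warmup = min(num_stages - stage - 1, num_micro)
    ops: List[Op] = [("F", m) for m in range(warmup)]
    next_f, next_b = warmup, 0
    while next_f < num_micro:  # steady state: one forward, then one backward
        ops.append(("F", next_f))
        next_f += 1
        ops.append(("B", next_b))
        next_b += 1
    while next_b < num_micro:  # cool-down: remaining backwards
        ops.append(("B", next_b))
        next_b += 1
    return ops


def gpipe_schedule(num_micro: int) -> List[Op]:
    """All forwards first, then all backwards."""
    return [("F", m) for m in range(num_micro)] + [("B", m) for m in range(num_micro)]


def peak_in_flight(ops: List[Op]) -> int:
    """Maximum number of micro-batches whose activations are held at once."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak
```

With 4 stages and 8 micro-batches, the first stage holds at most 4 micro-batches of activations under this schedule versus 8 under `gpipe_schedule(8)`, and the last stage holds only 1; the peak is bounded by the pipeline depth rather than by the number of micro-batches.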

Preprint · 2020 · arXiv

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

This paper presents our modeling and architecture approaches for building a highly accurate, low-latency language identification system to support multilingual spoken queries for voice assistants. A common approach to multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component to detect the input language. Conventionally, LID relies on acoustic-only information to detect the input language. We propose an approach that learns and combines acoustic-level representations with embeddings estimated on ASR hypotheses, resulting in up to a 50% relative reduction in identification error rate compared to a model that uses acoustic-only features. Furthermore, to reduce processing cost and latency, we exploit a streaming architecture that identifies the spoken language early, once the system reaches a predetermined confidence level, removing the need to run multiple ASR systems until the end of the input query. The combined acoustic and text LID, coupled with our proposed streaming runtime architecture, identifies the language an average of 1500 ms early for more than 50% of utterances, with almost no degradation in accuracy. We also show improved results by adopting a semi-supervised learning (SSL) technique that uses the newly proposed model architecture as a teacher model.
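The early-exit rule described above, stopping as soon as the system reaches a predetermined confidence level, can be sketched as a loop over per-chunk language posteriors. The sketch assumes the posteriors are already computed for each incoming audio chunk (in the paper they come from the combined acoustic and ASR-hypothesis model); the function name `streaming_lid` is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def streaming_lid(
    chunk_posteriors: Iterable[Dict[str, float]], threshold: float
) -> Tuple[str, int]:
    """Return (language, chunk_index) as soon as the top posterior reaches
    `threshold`; if it never does, fall back to the decision on the last chunk."""
    last_lang: Optional[str] = None
    index = -1
    for index, posterior in enumerate(chunk_posteriors):
        lang, prob = max(posterior.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            return lang, index  # confident enough: stop the other ASR systems early
        last_lang = lang
    if last_lang is None:
        raise ValueError("no audio chunks were provided")
    return last_lang, index
```

A higher threshold trades latency for accuracy: with posteriors for English of 0.6, 0.8 and 0.95 over three chunks, a threshold of 0.75 decides after the second chunk, while 0.9 waits for the third.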

Preprint · 2020 · arXiv

Toeplitz operators between distinct Bergman spaces

For $-1<\alpha<\infty$, let $\omega_\alpha(z)=(1+\alpha)(1-|z|^2)^\alpha$ be the standard weight on the unit disk. In this note, we characterize the boundedness and compactness of the Toeplitz operators $T_{\mu,\beta}$ between distinct weighted Bergman spaces $L_{a}^{p}(\omega_\alpha)$ and $L_{a}^{q}(\omega_\beta)$ in two cases: $0<p\leq 1$, $q=1$, $-1<\alpha,\beta<\infty$; and $0<p\leq 1<q<\infty$, $-1<\beta\leq\alpha<\infty$. Our results can be viewed as extensions of Pau and Zhao's work \cite{Pau}. Moreover, some of the main results are new even in the unweighted setting.
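For reference, the operator $T_{\mu,\beta}$ is not restated in the abstract. A common definition in this literature, assumed here, takes a positive Borel measure $\mu$ on the unit disk $\mathbb{D}$ and integrates against the reproducing kernel $K^\beta_z(w)=(1-\bar{z}w)^{-(2+\beta)}$ of $L_a^2(\omega_\beta)$, with $dA$ the normalized area measure:

```latex
T_{\mu,\beta} f(z) = \int_{\mathbb{D}} f(w)\,\overline{K^{\beta}_{z}(w)}\,d\mu(w)
                   = \int_{\mathbb{D}} \frac{f(w)}{(1 - z\overline{w})^{2+\beta}}\,d\mu(w),
\qquad z \in \mathbb{D}.
```

When $d\mu = \omega_\beta\,dA$, this reduces to the Bergman projection $P_\beta$ applied to $f$.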

Preprint · 2016 · arXiv

First results of the new endcap TOF commissioning at BESIII

The current BESIII endcap TOF (ETOF) is being upgraded with Multi-gap Resistive Plate Chamber (MRPC) technology. Installation of the new ETOF was completed in October 2015. This paper reports the first results of the MRPC commissioning at BESIII.

Preprint · 2016 · arXiv

The cosmic ray test of MRPCs for the BESIII ETOF upgrade

To improve the particle identification capability of the Beijing Spectrometer III (BESIII), it is proposed to upgrade the current endcap time-of-flight (ETOF) detector with multi-gap resistive plate chamber (MRPC) technology. With the aim of achieving an overall ETOF time resolution better than 100 ps, the whole system, including the MRPC detectors, newly designed front-end electronics (FEE), a CLOCK module, fast control boards, and time-to-digital modules (TDIG), was assembled and operated online with cosmic rays for 3 months. The main purposes of the cosmic ray test are to check the construction quality of the detectors, to test the joint operation of all instruments, and to verify the performance of the system. The results show an MRPC time resolution better than 100 ps, an efficiency of about 98%, and a strip noise rate lower than 1 Hz/cm² within the normal threshold range; the details are discussed and analyzed in this paper. The test indicates that the whole ETOF system works well and satisfies the requirements of the upgrade.

Preprint · 2016 · arXiv

The upgrade system of BESIII ETOF with MRPC technology

It is proposed to upgrade the Beijing Spectrometer III (BESIII) endcap Time-of-Flight (ETOF) detector with Multigap Resistive Plate Chamber (MRPC) technology, replacing the current scintillator+PMT ETOF, in order to achieve a time resolution better than 80 ps and to enhance the particle identification capability to meet the higher precision requirements of physics. The ETOF system, including MRPC modules, front-end electronics (FEE), a CLOCK module, fast control boards, and time-to-digital modules (TDIG), has been designed and constructed, and its components have been tested separately. To examine the quality of the entire ETOF system and to exercise the joint operation of all its parts, a cosmic ray test system was built in the laboratory and operated for about three months to verify performance. This paper presents the results, which indicate that the entire ETOF system works well and satisfies the requirements of the upgrade.
