Source author record

Rui Zheng

Rui Zheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Artificial Intelligence, Computation and Language, Computer Vision, eess.IV, hep-ph, physics.med-ph, astro-ph.CO, cond-mat.mtrl-sci, eess.AS, gr-qc, hep-ex, hep-th, Information Theory, math.IT, Multimedia, physics.optics, physics.soc-ph, Social and Information Networks, Software Engineering

Catalog footprint

What is connected

13 works

19 topics

4 close collaborators

Preprint, 2026, arXiv

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, and execution-driven problem solving. However, current benchmarks predominantly evaluate code logic in static contexts, neglecting the dynamic, full-process requirements of real-world engineering, particularly in backend development, which demands rigorous environment configuration and service deployment. To address this gap, we introduce ABC-Bench, a benchmark explicitly designed to evaluate agentic backend coding within a realistic, executable workflow. Using a scalable automated pipeline, we curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories. Unlike previous evaluations, ABC-Bench requires agents to manage the entire development lifecycle, from repository exploration to instantiating containerized services and passing external end-to-end API tests. Our extensive evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks, highlighting a substantial disparity between current model capabilities and the demands of practical backend engineering. Our code is available at https://github.com/OpenMOSS/ABC-Bench.

Preprint, 2026, arXiv

HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer

The evolution of visual generative models has long been constrained by fragmented architectures relying on disjoint text encoders and external VAEs. In this report, we present HiDream-O1-Image, a natively unified generative foundation model built on a pixel-space Diffusion Transformer, which shifts from modular architectures to an end-to-end in-context visual generation engine. By mapping raw image pixels, text tokens, and task-specific conditions into a single shared token space, HiDream-O1-Image achieves a structural unification of multimodal inputs within a Unified Transformer (UiT) architecture. This native encoding paradigm eliminates the need for separate VAEs or disjoint pre-trained text encoders, allowing the model to treat diverse generation and editing tasks as a consistent in-context reasoning process. Extensive experiments show that HiDream-O1-Image excels across various generation tasks, including text-to-image generation, instruction-based editing, and subject-driven personalization. Notably, with only 8B parameters, HiDream-O1-Image (8B) matches or even surpasses established state-of-the-art models with significantly more parameters (e.g., the 27B Qwen-Image). To validate the scalability of this paradigm, we scale the architecture up to over 200B parameters. Experimental results demonstrate that this massive-scale version, HiDream-O1-Image-Pro (200B+), delivers superior generative performance and sets new state-of-the-art results. Ultimately, HiDream-O1-Image highlights the potential of natively unified architectures and charts a scalable path toward next-generation multimodal AI.

Preprint, 2026, arXiv

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the role of speech tokenizer designs in LLM-centric SLMs, augmented by speech heads and speaker modeling. We compare coupled, semi-decoupled, and fully decoupled speech tokenizers under a fair SLM framework and find that decoupled tokenization significantly improves alignment and synthesis quality. To address the information density mismatch between speech and text, we introduce multi-token prediction (MTP) into SLMs, enabling each hidden state to decode multiple speech tokens. This leads to up to 12$\times$ faster decoding and a substantial drop in word error rate (from 6.07 to 3.01). Furthermore, we propose a speaker-aware generation paradigm and introduce RoleTriviaQA, a large-scale role-playing knowledge QA benchmark with diverse speaker identities. Experiments demonstrate that our methods enhance both knowledge understanding and speaker consistency.

Preprint, 2022, arXiv

Automatic spinal curvature measurement on ultrasound spine images using Faster R-CNN

Ultrasound spine imaging has been applied to the assessment of spine deformity. However, manual measurement of scoliotic angles on ultrasound images is time-consuming and relies heavily on the rater's experience. The objectives of this study are to construct a fully automatic framework based on Faster R-CNN for detecting vertebral laminae and to measure the fitted spinal curves from the detected lamina pairs. The framework consists of two closely linked modules: 1) a lamina detector for identifying and locating each lamina pair on ultrasound coronal images, and 2) a spinal curvature estimator for calculating the scoliotic angles based on the chain of detected laminae. Two hundred ultrasound images obtained from patients with adolescent idiopathic scoliosis (AIS) were identified and used for the training and evaluation of the proposed method. The experimental results showed an AP of 0.76 on the test set, and the Mean Absolute Difference (MAD) between automatic and manual measurements was within the clinically accepted error. Meanwhile, the correlation between the automatic measurement and the Cobb angle from radiographs was 0.79. The results revealed that the proposed technique could provide accurate and reliable automatic curvature measurements on ultrasound spine images for spine deformities.

Preprint, 2022, arXiv

Decorrelate Irrelevant, Purify Relevant: Overcome Textual Spurious Correlations from a Feature Perspective

Natural language understanding (NLU) models tend to rely on spurious correlations (i.e., dataset bias) to achieve high performance on in-distribution datasets but perform poorly on out-of-distribution ones. Most existing debiasing methods identify and down-weight samples with biased features (i.e., superficial surface features that cause such spurious correlations). However, down-weighting these samples prevents the model from learning from their non-biased parts. To tackle this challenge, we propose to eliminate spurious correlations in a fine-grained manner from a feature-space perspective. Specifically, we introduce Random Fourier Features and weighted re-sampling to decorrelate the dependencies between features and thereby mitigate spurious correlations. After obtaining decorrelated features, we further design a mutual-information-based method to purify them, which forces the model to learn features that are more relevant to the task. Extensive experiments on two well-studied NLU tasks demonstrate that our method is superior to the compared approaches.

Preprint, 2022, arXiv

Hand-held 3D Photoacoustic Imager with GPS

As an emerging medical diagnostic technology, photoacoustic imaging has been implemented for both preclinical and clinical applications. For clinical convenience, a handheld free-scan photoacoustic tomography (PAT) system providing 3D imaging capability is needed, with potential for surgical navigation and disease diagnosis. In this paper, we propose a free-scan 3D PAT (fsPAT) system based on a handheld linear-array ultrasound probe. A global positioning system (GPS) is applied to acquire the ultrasound probe's coordinates. The proposed fsPAT can simultaneously realize real-time 2D imaging and large field-of-view 3D volumetric imaging, which is reconstructed from multiple 2D images with coordinate information acquired by the GPS. To form a high-quality 3D image, a dedicated space transformation method and reconstruction algorithm are used and validated on the proposed system. Both simulation and experimental studies have been performed to demonstrate the feasibility of the proposed fsPAT. To explore its clinical potential, in vivo 3D imaging of human wrist vessels is also conducted, showing a clear subcutaneous vessel network with high image contrast.

Preprint, 2022, arXiv

VertMatch: A Semi-supervised Framework for Vertebral Structure Detection in 3D Ultrasound Volume

Three-dimensional (3D) ultrasound imaging has been applied to scoliosis assessment, but the current assessment method uses only the coronal projection image and cannot show the 3D deformity and vertebral rotation. Vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two-step framework to detect vertebral structures in 3D ultrasound volumes by utilizing unlabeled data in a semi-supervised manner. The first step detects the possible positions of structures on transverse slices globally, and local patches are then cropped based on the detected positions. The second step determines whether the patches contain real vertebral structures and screens the positions predicted in the first step. VertMatch develops three novel components for semi-supervised learning. For position detection in the first step, (1) an anatomical prior is used to screen pseudo labels generated by the confidence threshold method, and (2) multi-slice consistency is used to exploit more unlabeled data by inputting multiple adjacent slices. For patch identification in the second step, (3) the categories are rebalanced in each batch to address the class imbalance problem. Experimental results demonstrate that VertMatch can detect vertebrae accurately in ultrasound volumes and outperforms state-of-the-art methods. VertMatch is also validated in a clinical application on forty ultrasound scans, and it can be a promising approach for 3D assessment of scoliosis.

Preprint, 2021, arXiv

Assessing Bone Quality of Spine on Children with Scoliosis Using Ultrasound Reflection FAI Method -- A Preliminary Study

Osteopenia is reported to be common in patients who have scoliosis. Quantitative ultrasound (QUS) has been used to assess skeletal status for decades, and recently ultrasound imaging using reflection signals from vertebrae has also been applied to measure spinal curvature in children with scoliosis. The objectives of this study are to develop a new method that can robustly extract a parameter from ultrasound spinal data for estimating the bone quality of scoliotic patients and to investigate the potential of the parameter for predicting curve progression. The frequency amplitude index (FAI) was calculated based on the spectrum of the original radio frequency (RF) signals reflected from the tissue-vertebra interface. The correlation between FAI and reflection coefficient was validated in vitro using decalcified bovine bone samples, and the FAIs of scoliotic subjects were investigated in vivo with reference to BMI, Cobb angles and curve progression status. The results showed that the intra-rater measures were highly reliable between different trials (ICC=0.997). The FAI value was strongly correlated with the reflection coefficient of bone tissue ($R^{2}=0.824$), and a lower FAI indicated a higher risk of curve progression for the non-mild cases. This preliminary study reported that the FAI method can provide a feasible and promising approach to assess bone quality and monitor curve progression in patients who have AIS.

Preprint, 2016, arXiv

750 GeV Diphoton Resonance in a Vector-like Extension of Hill Model

In this paper, we study the recent 750 GeV diphoton excess in the Hill Model with vector-like fermions, in which the singlet-like Higgs boson is chosen as the 750 GeV resonance and is mainly produced by gluon fusion through vector-like top and bottom quarks. Meanwhile, its diphoton decay rate is greatly enhanced by the vector-like lepton. Under the current experimental and theoretical constraints, we present the viable parameter space that fits the 750 GeV diphoton signal strength at the 13 TeV LHC. We find that the heavier the vector-like fermion masses are, the smaller the required mixing angle $θ$. The mixing angle of the singlet and doublet Higgs bosons is constrained to $|\sin θ| \lesssim 0.15$ under the condition of perturbative Yukawa couplings. In the allowed parameter space, the 750 GeV diphoton cross section can be enhanced to at most about 6 fb at the 13 TeV LHC.

Preprint, 2016, arXiv

Boosted scalar confronting 750 GeV di-photon excess

We consider a di-photon signal arising from two bunches of collimated photon jets emitted from a pair of highly boosted scalars. Following a discussion of how to detect the photon jets at a collider, we extend the two-Higgs-doublet model (2HDM) by adding a gauge singlet scalar. To explain the di-photon excess recently observed in the first 13 TeV run of the LHC, mixing between the heavy doublet state and the newly added singlet is crucial. After the mixing, one can have a heavy Higgs state $Y_2$ at 750 GeV and a very singlet-like sub-GeV scalar $Y_1$, which would be highly boosted through the $Y_2$ decay. Both real singlet and complex singlet extensions are studied. It turns out that only the complex model can yield a 1-10 fb cross section in the di-photon final state together with a decay length of the order of 1 m for the $Y_1$. This complex model parametrically predicts a width of the 750 GeV resonance of $\gtrsim 1$ GeV. In addition, the pseudoscalar component of the singlet in this model is naturally stable and hence could be a dark matter candidate.

Preprint, 2016, arXiv

Navigation by anomalous random walks on complex networks

Anomalous random walks with long-range jumps are an important class of dynamical processes on networks and can model a number of search and transport processes. However, traditional measurements based on mean first passage time are not useful because they fail to characterize the cost associated with each jump. Here we introduce the concept of mean first traverse distance (MFTD) to characterize anomalous random walks; it represents the expected distance traversed by walkers searching from a source node to a target node, and we provide a procedure for calculating the MFTD between two nodes. We use Levy walks on networks as an example and demonstrate that the proposed approach can unravel the interplay between the diffusion dynamics of Levy walks and the underlying network structure. Interestingly, applying our framework to the well-known PageRank search, we can explain why its damping factor is empirically chosen to be around 0.85. The framework for analyzing anomalous random walks on complex networks offers a useful new paradigm for understanding the dynamics of anomalous diffusion processes and provides a unified scheme to characterize search and transport processes on networks.
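The abstract does not spell out the MFTD procedure, so the following is only a minimal sketch under stated assumptions: jump probabilities are taken to be proportional to $d_{ij}^{-\alpha}$ (a common form for Levy walks on networks), each jump is charged its hop distance $d_{ij}$, and the expected distance to the target is obtained by fixed-point iteration. The function names and the parameter `alpha` are hypothetical and not taken from the paper.

```python
from collections import deque


def shortest_path_lengths(adj):
    """All-pairs hop distances of an unweighted, connected graph (BFS)."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[source][v] = d
    return dist


def levy_transition_matrix(dist, alpha):
    """Assumed Levy-walk rule: P[i][j] proportional to dist[i][j] ** -alpha."""
    n = len(dist)
    P = []
    for i in range(n):
        weights = [dist[i][j] ** (-alpha) if j != i else 0.0 for j in range(n)]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P


def mean_traverse_distance(adj, target, alpha, tol=1e-12, max_iter=100000):
    """Expected distance traversed before first reaching `target`.

    Solves D[i] = sum_j P[i][j] * (dist[i][j] + D[j]) with D[target] = 0
    by fixed-point iteration (an assumed reading of MFTD).
    """
    dist = shortest_path_lengths(adj)
    P = levy_transition_matrix(dist, alpha)
    n = len(adj)
    # Expected cost of a single jump out of each node.
    step_cost = [sum(P[i][j] * dist[i][j] for j in range(n)) for i in range(n)]
    D = [0.0] * n
    for _ in range(max_iter):
        new = [
            0.0 if i == target
            else step_cost[i] + sum(P[i][j] * D[j] for j in range(n) if j != target)
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, D)) < tol:
            return new
        D = new
    return D


# Path graph 0 - 1 - 2; a large alpha makes long jumps negligible.
path = [[1], [0, 2], [1]]
distances = mean_traverse_distance(path, target=2, alpha=50)
```

With a large `alpha` the walk reduces to an ordinary nearest-neighbour random walk, so on this three-node path the value for node 0 is close to 4, matching the usual mean first passage time with unit-length steps. Smaller `alpha` values allow long jumps, which is where charging each jump its distance starts to differ from simply counting steps.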

Preprint, 2011, arXiv

Growth factor in f(T) gravity

We derive the evolution equation of the growth factor for the matter over-density perturbation in $f(T)$ gravity. As an example, we investigate its behavior in a power-law model at small redshift and compare it to the predictions of $Λ$CDM and of dark energy with the same equation of state in the framework of Einstein's general relativity. We find that the perturbation in $f(T)$ gravity grows more slowly than that in Einstein's general relativity if $\partial f/\partial T>0$, due to the effectively weakened gravity.
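The record does not reproduce the evolution equation itself. As a hedged illustration only, a form commonly quoted in the $f(T)$ literature for sub-horizon matter perturbations (assuming the gravitational Lagrangian is written as $T + f(T)$, which is an assumption and not stated in this record) is:

$$
\ddot{\delta}_m + 2H\dot{\delta}_m = 4\pi G_{\mathrm{eff}}\,\rho_m\,\delta_m,
\qquad
G_{\mathrm{eff}} = \frac{G}{1 + \partial f/\partial T}.
$$

Under this assumed form, $\partial f/\partial T > 0$ gives $G_{\mathrm{eff}} < G$, which is consistent with the abstract's statement that the perturbation grows more slowly because gravity is effectively weakened.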

Preprint, 2011, arXiv

Metamaterials Mimicking Dynamic Spacetime, D-brane and Noncommutativity in String Theory

We propose a practical scheme to mimic the expanding cosmos in 1+2 dimensions in the laboratory. Furthermore, we develop a general procedure for using nonlinear metamaterials to mimic D-branes and noncommutativity in string theory.
