Source author record

Jianhui Liu

Jianhui Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Computation and Language, Materials Science (cond-mat.mtrl-sci), Networking and Internet Architecture, Biological Physics, Soft Condensed Matter (cond-mat.soft), Signal Processing (eess.SP), Graphics, Machine Learning

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators


Preprint · 2026 · arXiv

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

Preprint · 2026 · arXiv

TextLDM: Language Modeling with Continuous Latent Diffusion

Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next step toward a single architecture for both generation (visual synthesis) and understanding (text generation) is to apply this framework to language modeling. We propose TextLDM, which transfers the visual latent diffusion recipe to text generation with minimal architectural modification. A Transformer-based VAE maps discrete tokens to continuous latents, enhanced by Representation Alignment (REPA) with a frozen pretrained language model to produce representations effective for conditional denoising. A standard DiT then performs flow matching in this latent space, identical in architecture to its visual counterpart. The central challenge we address is obtaining high-quality continuous text representations: we find that reconstruction fidelity alone is insufficient, and that aligning latent features with a pretrained language model via REPA is critical for downstream generation quality. Trained from scratch on OpenWebText2, TextLDM substantially outperforms prior diffusion language models and matches GPT-2 under the same settings. Our results establish that the visual DiT recipe transfers effectively to language, taking a concrete step toward unified diffusion architectures for multimodal generation and understanding.
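
The abstract states that a standard DiT performs flow matching in the VAE latent space but does not specify the probability path or the sampler. Below is a minimal sketch of the training target and a sampler, assuming the common linear (rectified-flow) path from Gaussian noise at t = 0 to data latents at t = 1 and a plain Euler integrator. The function names are hypothetical, and `velocity_fn` stands in for the DiT; this is not TextLDM's code.

```python
import numpy as np

def flow_matching_pair(z_data: np.ndarray, rng: np.random.Generator):
    """Build one training example for linear-path flow matching.

    z_data: clean latents, shape (batch, seq_len, dim), e.g. VAE-encoded text.
    Returns the interpolated latent z_t, its time t and the velocity target.
    """
    batch = z_data.shape[0]
    noise = rng.standard_normal(z_data.shape)   # z_0 ~ N(0, I)
    t = rng.uniform(size=(batch, 1, 1))         # one time per sample, broadcast over tokens
    z_t = (1.0 - t) * noise + t * z_data        # point on the straight noise-to-data path
    v_target = z_data - noise                   # constant d(z_t)/dt along that path
    return z_t, t, v_target

def flow_matching_loss(v_pred: np.ndarray, v_target: np.ndarray) -> float:
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(velocity_fn, shape, steps: int, rng: np.random.Generator) -> np.ndarray:
    """Integrate dz/dt = velocity_fn(z, t) from noise (t=0) to latents (t=1)."""
    z = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        z = z + dt * velocity_fn(z, i * dt)
    return z
```

In training, the DiT would be regressed onto `v_target` given `z_t` and `t`; at inference, `euler_sample` integrates the learned velocity from noise, and the VAE decoder (not shown) would map the final latents back to tokens.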

Preprint · 2026 · arXiv

Thinking with Novel Views: A Systematic Analysis of Generative-Augmented Spatial Intelligence

Current Large Multimodal Models (LMMs) struggle with spatial reasoning tasks requiring viewpoint-dependent understanding, largely because they are confined to a single, static observation. We propose Thinking with Novel Views (TwNV), a paradigm that integrates generative novel-view synthesis into the reasoning loop: a Reasoner LMM identifies spatial ambiguity, instructs a Painter to synthesize an alternative viewpoint, and re-examines the scene with the additional evidence. Through systematic experiments we address three research questions. (1) Instruction format: numerical camera-pose specifications yield more reliable view control than free-form language. (2) Generation fidelity: synthesized view quality is tightly coupled with downstream spatial accuracy. (3) Inference-time visual scaling: iterative multi-turn view refinement further improves performance, echoing recent scaling trends in language reasoning. Across four spatial subtask categories and four LMM architectures (both closed- and open-source), TwNV consistently improves accuracy by +1.3 to +3.9 pp, with the largest gains on viewpoint-sensitive subtasks. These results establish novel-view generation as a practical lever for advancing spatial intelligence of LMMs.
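
The Reasoner/Painter loop described above can be sketched as plain control flow. Everything here is hypothetical: the class and method names, the yaw/pitch pose format, and the fixed turn budget are illustrative assumptions. In practice the Reasoner would be an LMM and the Painter a novel-view synthesis model.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Protocol, Tuple

@dataclass
class CameraPose:
    # Numerical pose request (hypothetical format); the abstract reports that
    # numerical camera-pose instructions control views better than free-form text.
    yaw_deg: float
    pitch_deg: float

@dataclass
class ReasonerStep:
    answer: Optional[str]          # set when the reasoner is confident
    request: Optional[CameraPose]  # set when it wants another viewpoint

class Reasoner(Protocol):
    def step(self, question: str, views: List[Any]) -> ReasonerStep: ...
    def force_answer(self, question: str, views: List[Any]) -> str: ...

class Painter(Protocol):
    def render(self, image: Any, pose: CameraPose) -> Any: ...

def think_with_novel_views(reasoner: Reasoner, painter: Painter, image: Any,
                           question: str, max_turns: int = 3) -> Tuple[str, List[Any]]:
    """Multi-turn loop: examine, request a novel view if ambiguous, re-examine."""
    views: List[Any] = [image]
    for _ in range(max_turns):
        step = reasoner.step(question, views)
        if step.answer is not None:
            return step.answer, views
        # Synthesize the requested viewpoint from the original observation.
        views.append(painter.render(image, step.request))
    # Turn budget exhausted: answer with the evidence gathered so far.
    return reasoner.force_answer(question, views), views
```

Raising `max_turns` corresponds to the inference-time visual scaling the abstract describes, at the cost of one extra generation per turn.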

Preprint · 2022 · arXiv

Stratified Transformer for 3D Point Cloud Segmentation

3D point cloud segmentation has made tremendous progress in recent years. Most current methods focus on aggregating local features, but fail to directly model long-range dependencies. In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance. Specifically, we first put forward a novel key sampling strategy. For each query point, we sample nearby points densely and distant points sparsely as its keys in a stratified way, which enables the model to enlarge the effective receptive field and enjoy long-range contexts at a low computational cost. Also, to combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information, which facilitates convergence and boosts performance. Besides, we adopt contextual relative position encoding to adaptively capture position information. Finally, a memory-efficient implementation is introduced to overcome the issue of varying point numbers in each window. Extensive experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets. Code is available at https://github.com/dvlab-research/Stratified-Transformer.
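
The stratified key sampling idea (dense nearby keys, sparse distant keys) can be illustrated with a simplified sketch. It assumes spherical neighborhoods with radii `r_small` and `r_large`, and strided subsampling in place of the paper's window partitioning and downsampling. The function and parameter names are hypothetical and do not come from the Stratified-Transformer repository.

```python
import numpy as np

def stratified_keys(points: np.ndarray, query_idx: int,
                    r_small: float, r_large: float, stride: int) -> np.ndarray:
    """Return sorted indices of key points for one query point.

    Dense keys: every point within r_small of the query.
    Sparse keys: points within r_large, drawn only from a subsampled set
    (every `stride`-th point), standing in for a downsampled cloud.
    """
    q = points[query_idx]
    dist = np.linalg.norm(points - q, axis=1)           # distance to every point
    dense = np.flatnonzero(dist <= r_small)              # all near neighbours
    sparse_pool = np.arange(0, len(points), stride)     # coarse subsample
    sparse = sparse_pool[dist[sparse_pool] <= r_large]  # far context, sparsely
    return np.union1d(dense, sparse)                    # merged, de-duplicated keys
```

For ten points spaced one unit apart along the x-axis, querying point 0 with `r_small=1.5`, `r_large=6.5` and `stride=3` yields keys `[0, 1, 3, 6]`: the close neighbour densely, then every third point out to the larger radius. The receptive field grows to the large radius while the key count grows only with the subsampled set.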

Preprint · 2021 · arXiv

Biomaterial design inspired by membraneless organelles

Compartmentalization is ubiquitous in the broad cellular context, especially in the formation of membraneless organelles (MOs). MOs are phase-separated liquid compartments that provide spatiotemporal control of biomolecules and metabolic activities inside a cell. While MOs exhibit intriguing properties such as efficient compositional regulation, thermodynamic metastability, environmental sensitivity and reversibility, their formation is driven by weak non-covalent interactions derived from simple motifs of intrinsically disordered proteins (IDPs). Understanding the natural design of IDPs and their liquid-liquid phase separation behavior will not only reveal insights into the contributions of MOs to cellular physiology and disease pathology, but also provide inspiration for the de novo design of dynamic biomolecule depots, self-regulated biochemical reactors, and stimuli-responsive systems. In this article, the sequence and structural features of IDPs that contribute to the organization of MOs are reviewed. Artificial MOs formed following these principles, including self-assembling peptides, synthetic IDPs, polyelectrolytes and peptide-polymer hybrids, are described. Finally, we illustrate the applications and discuss the potential of MO-inspired biomaterials, with examples spanning biochemical reactors, synthetic biology, drug discovery and drug delivery.

Preprint · 2021 · arXiv

Minimalist design of polymer-oligopeptide hybrid as intrinsically disordered protein-mimicking scaffold for artificial membraneless organelle

Liquid-liquid phase separation (LLPS) is an emerging and universal mechanism for intracellular biomolecule organization, particularly via the formation of membraneless organelles (MOs). Intrinsically disordered proteins (IDPs) are the main constituents of MOs, wherein multivalent interactions and low-complexity domains (LCDs) drive LLPS. Using short oligopeptides derived from LCDs as 'stickers' and dextran backbones as 'spacers', we designed polymer-oligopeptide hybrids that mimic the multivalent FUS protein as represented by the 'stickers-and-spacers' model. We demonstrated that the hybrids underwent LLPS and self-assembled into micron-sized compartments (mostly 1-10 microns, resembling LLPS in vitro and in living cells) displaying liquid-like properties. Furthermore, the resulting droplets were capable of recruiting proteins and RNAs while providing a favorable environment for enhanced biochemical reactions, thereby mimicking the function of natural MOs. We envision that this simple yet versatile model system will help elucidate the molecular interactions implicated in MO formation and pave the way to a new type of biomimetic material.

Preprint · 2020 · arXiv

Adaptive Task Partitioning at Local Device or Remote Edge Server for Offloading in MEC

Mobile edge computing (MEC) is a promising solution for processing computation-intensive tasks in emerging time-critical Internet-of-Things (IoT) use cases, e.g., virtual reality (VR), augmented reality (AR) and autonomous vehicles. Latency can be reduced further when a task is partitioned and computed collaboratively by multiple edge servers (ESs). However, state-of-the-art work studies MEC-enabled offloading within a static framework, which partitions tasks at either the local user equipment (UE) or the primary ES; dynamic selection between these two offloading schemes has not yet been well studied. In this paper, we investigate a dynamic offloading framework in a multi-user scenario, in which each UE decides where a task is partitioned according to the network status, e.g., channel quality and allocated computation resources. Based on this framework, we model the latency to complete a task and formulate an optimization problem that minimizes the average latency among UEs. The problem is solved by jointly optimizing task partitioning and the allocation of communication and computation resources. Numerical results show that, compared with static offloading schemes, the proposed algorithm achieves lower latency in all tested scenarios. Moreover, both mathematical derivation and simulation show that the difference in wireless channel quality between a UE and different ESs is an important criterion for choosing the right scheme.
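
The choice between partitioning at the UE and partitioning at the primary ES can be illustrated with a deliberately simple, hypothetical latency model. It is not the paper's formulation. The sketch makes the following assumptions:
- A task is divisible and needs `cycles_per_bit` CPU cycles per bit.
- In the UE scheme, the UE uploads each share to its ES in parallel over independent links.
- In the ES scheme, the UE uploads the whole task to `servers[0]`, which forwards shares over backhaul links.
- Result download, local UE computation, and multi-user resource sharing are ignored.

All names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class EdgeServer:
    uplink_bps: float    # wireless rate from the UE to this ES (bits/s)
    backhaul_bps: float  # link rate from the primary ES to this ES (bits/s)
    cpu_hz: float        # CPU cycles/s this ES grants the task

def split_latency(task_bits: float, per_bit_times: list) -> float:
    """Finish time of a divisible task split over parallel workers.

    A share of d bits on worker i takes d * per_bit_times[i] seconds.
    Equalizing finish times (d_i proportional to 1 / per_bit_times[i])
    minimizes the slowest share, giving task_bits / sum(1 / a_i).
    """
    return task_bits / sum(1.0 / a for a in per_bit_times)

def latency_ue_partition(task_bits, cycles_per_bit, servers):
    # UE splits the task and uploads each share directly to its ES.
    per_bit = [1.0 / s.uplink_bps + cycles_per_bit / s.cpu_hz for s in servers]
    return split_latency(task_bits, per_bit)

def latency_es_partition(task_bits, cycles_per_bit, servers):
    # UE uploads everything to the primary ES, which splits and forwards shares.
    primary, others = servers[0], servers[1:]
    upload = task_bits / primary.uplink_bps
    per_bit = [cycles_per_bit / primary.cpu_hz]  # share kept at the primary
    per_bit += [1.0 / s.backhaul_bps + cycles_per_bit / s.cpu_hz for s in others]
    return upload + split_latency(task_bits, per_bit)

def choose_scheme(task_bits, cycles_per_bit, servers):
    """Pick whichever partitioning point gives the lower modeled latency."""
    ue = latency_ue_partition(task_bits, cycles_per_bit, servers)
    es = latency_es_partition(task_bits, cycles_per_bit, servers)
    return ("UE", ue) if ue <= es else ("ES", es)
```

Consider a 1 Mbit task at 100 cycles/bit on two 1 GHz servers with 10 Mbit/s uplinks and 1 Gbit/s backhaul. Here the UE-side split wins (0.1 s versus about 0.15 s). Degrading the second server's uplink to 100 kbit/s flips the choice to the primary-ES split. This mirrors, in toy form, the abstract's point that channel-quality differences between ESs decide the scheme.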

Preprint · 2020 · arXiv

Computation Resource Allocation for Heterogeneous Time-Critical IoT Services in MEC

Mobile edge computing (MEC) is a promising solution for processing computation-intensive tasks with low latency in emerging Internet-of-Things (IoT) use cases, e.g., virtual reality (VR), augmented reality (AR) and autonomous vehicles. Because heterogeneous services coexist in an MEC system, task arrival intervals and required execution times vary across services, which makes it challenging to schedule computation resources at an edge server (ES) for services with stochastic arrivals and runtimes. In this paper, we propose a flexible computation offloading framework among users and ESs. Based on this framework, we propose a Lyapunov-based algorithm that dynamically allocates computation resources to heterogeneous time-critical services at the ES. The proposed algorithm minimizes the average timeout probability without any prior knowledge of the task arrival process or required runtimes. Numerical results show that, compared with the standard queuing models used at ESs, the proposed algorithm reduces the timeout probability by at least 35% and, under various scenarios, achieves computation-resource utilization efficiency close to that of a non-causal queuing model.
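
The abstract names a Lyapunov-based allocator but gives no details of it. As a generic illustration of the technique only (not the paper's algorithm), the sketch below keeps a backlog of pending CPU cycles per service and applies the standard queue update. In each slot it splits a fixed capacity so as to minimize the one-slot Lyapunov drift bound. It assumes a quadratic Lyapunov function and no penalty term, so it shows only the queue-stabilizing core and does not model timeouts. All names are hypothetical.

```python
def allocate_slot(backlogs: list, capacity: float) -> list:
    """Split one slot's CPU capacity to minimize the Lyapunov drift bound.

    For L(Q) = 0.5 * sum(Q_i^2), the one-slot drift is bounded by
    B + sum_i Q_i * (a_i - b_i), so the controllable part is -sum_i Q_i * b_i.
    Serving the largest backlogs first (never more than a backlog) minimizes it.
    """
    alloc = [0.0] * len(backlogs)
    remaining = capacity
    order = sorted(range(len(backlogs)), key=lambda i: backlogs[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        give = min(backlogs[i], remaining)
        alloc[i] = give
        remaining -= give
    return alloc

def update_queues(backlogs: list, alloc: list, arrivals: list) -> list:
    """Queue dynamics Q_i(t+1) = max(Q_i(t) - b_i(t), 0) + a_i(t)."""
    return [max(q - b, 0.0) + a for q, b, a in zip(backlogs, alloc, arrivals)]

def simulate(arrival_trace: list, capacity: float) -> list:
    """Run the allocator slot by slot and record the backlogs after each slot."""
    backlogs = [0.0] * len(arrival_trace[0])
    history = []
    for arrivals in arrival_trace:
        alloc = allocate_slot(backlogs, capacity)
        backlogs = update_queues(backlogs, alloc, arrivals)
        history.append(backlogs)
    return history
```

The allocation needs only the current backlogs, not the arrival statistics. This no-prior-knowledge property is what Lyapunov methods offer. A full drift-plus-penalty design would add a weighted penalty (for example, a timeout cost) to the per-slot objective.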

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the specific institution.

Topic footprint