Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
17 topics
4 close collaborators

Published work

14 published items

preprint, 2026, arXiv

Towards Customized Multimodal Role-Play

Unified multimodal understanding and generation models enable richer human-AI interaction. Yet jointly customizing a character's persona, dialogue style, and visual identity while maintaining output consistency across modalities remains largely unexplored. To mitigate this gap, we introduce a new task, Customized Multimodal Role-Play (CMRP). We construct the RoleScape-20 dataset comprising 20 characters, including training and evaluation data that cover persona, stylistic descriptions, visual/expressive cues, and text-image interactions. Building on a unified model, we devise UniCharacter, a two-stage training framework containing Unified Supervised Finetuning (Unified-SFT) and character-specific group relative policy optimization (Character-GRPO). Given only 10 images plus corresponding interaction examples, the model acquires the target character and exhibits coherent persona, style, and visual identity in both generated text and images. This process takes about 100 GPU hours. Experiments on the RoleScape-20 dataset show that the proposed method substantially outperforms prior approaches. Ablation studies further validate the effectiveness of our cross-modal consistency design and few-shot customization strategy. We argue that CMRP, coupled with unified modeling, provides a basis for next-generation characterful and immersive interactive agents.

preprint, 2026, arXiv

Transient learning dynamics drive escape from sharp valleys in Stochastic Gradient Descent

Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism governing solution selection. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and transition toward flatter regions of the loss landscape. By using a tractable physical model, we show that the SGD noise reshapes the landscape into an effective potential that favors flat solutions. Crucially, we uncover a transient freezing mechanism: as training proceeds, growing energy barriers suppress inter-valley transitions and ultimately trap the dynamics within a single basin. Increasing the SGD noise strength delays this freezing, which enhances convergence to flatter minima. Together, these results provide a unified physical framework linking learning dynamics, loss-landscape geometry, and generalization, and suggest principles for the design of more effective optimization algorithms.

preprint, 2022, arXiv

MorphoSim: An efficient and scalable phase-field framework for accurately simulating multicellular morphologies

The phase field model can accurately simulate the evolution of microstructures with complex morphologies, and it has been widely used for cell modeling over the last two decades. However, compared with other cellular models such as the coarse-grained model and the vertex model, its high computational cost, caused by three-dimensional spatial discretization, has hampered its application and scalability, especially for multicellular organisms. Recently, we built a phase field model coupled with in vivo imaging data to accurately reconstruct the embryonic morphogenesis of Caenorhabditis elegans from the 1- to 8-cell stages [Kuang et al., PLoS Comput. Biol., 2022]. In this work, we propose an improved phase field model that uses a stabilized numerical scheme and a modified volume constriction. We then present a scalable phase-field framework, MorphoSim, which is 100 times more efficient than the previous one and can simulate over 100 mechanically interacting cells. Finally, we demonstrate how MorphoSim can be applied to reproduce the assembly, self-repair, and dissociation of a synthetic multicellular system, the synNotch system.

preprint, 2022, arXiv

Primitive Shape Recognition for Object Grasping

Shape informs how an object should be grasped, both in terms of where and how. As such, this paper describes a segmentation-based architecture for decomposing objects sensed with a depth camera into multiple primitive shapes, along with a post-processing pipeline for robotic grasping. Segmentation employs a deep network, called PS-CNN, trained on synthetic data with 6 classes of primitive shapes and generated using a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape region. The grasps are rank ordered, with the first feasible one chosen for execution. For task-free grasping of individual objects, the method achieves a 94.2% success rate, placing it among the top-performing grasp methods when compared to top-down and SE(3)-based approaches. Additional tests involving variable viewpoints and clutter demonstrate robustness to setup. For task-oriented grasping, PS-CNN achieves a 93.0% success rate. Overall, the outcomes support the hypothesis that explicitly encoding shape primitives within a grasping pipeline should boost grasping performance, including task-free and task-relevant grasp prediction.

preprint, 2022, arXiv

Spontaneous mechanical and energetic state transitions during Caenorhabditis elegans gastrulation

Gastrulation, namely cell internalization, is a significant milestone in the development of metazoans from worm to human; it generates multiple embryonic layers with distinct cell fates and spatial organizations. Although many molecular activities are known to facilitate this process, in this paper we focus on gastrulation of the nematode Caenorhabditis elegans and theoretically demonstrate that even a group of cells with only isotropic repulsive and attractive interactions can exhibit such internalization behavior when dividing within a confined space. As the cell number increases and cell size decreases, the cells in contact with the eggshell move closer to each other and experience stronger lateral compression, and a cell that internalizes can effectively increase the distance between neighboring cells and lower the potential energy of the system. The multicellular structure transitions spontaneously from a single layer to a double layer, with bistable states existing from the 15- to 44-cell stages, near the timing of gastrulation in vivo. Specifically, cells that are larger or located near a boundary of smaller curvature internalize more easily. Actively regulating the internalization of a few cells can make morphogenesis noise-resistant. Our work recaptures the key characteristics of C. elegans gastrulation and provides a rational interpretation of how this phenomenon emerges and is optimally programmed.

preprint, 2021, arXiv

Efficient Frequency Doubling with Active Stabilization on Chip

Thin-film lithium niobate (TFLN) is superior for integrated nanophotonics due to its outstanding properties in nearly all aspects: strong second-order nonlinearity, fast and efficient electro-optic effects, a wide transparency window, and little two-photon absorption and free-carrier scattering. Together, they permit highly integrated nanophotonic circuits capable of complex photonic processing by incorporating disparate elements on the same chip. Yet a demonstration that synergizes those superior properties for system advantage has been lacking. Here we demonstrate such a chip, which capitalizes on TFLN's favorable ferroelectricity, high second-order nonlinearity, and strong electro-optic effects. It consists of a monolithic circuit integrating a Z-cut, quasi-phase-matched microring with a high quality factor and a phase modulator used in active feedback control. By Pound-Drever-Hall locking, it realizes stable frequency doubling at about 50% conversion with only milliwatt pump power, marking the highest by far among all nanophotonic platforms with milliwatt pumping. Our demonstration addresses a long-outstanding challenge facing cavity-based optical processing, including frequency conversion, frequency comb generation, and all-optical switching, whose stable performance is hindered by photorefractive or thermal effects. Our results further establish TFLN as an excellent material capable of optical multitasking, as is desirable for building multi-functional chip devices.

preprint, 2020, arXiv

Deciphering gene regulation from gene expression dynamics using deep neural network

Complex biological functions are carried out by the interaction of genes and proteins. Uncovering the gene regulation network behind a function is one of the central themes in biology. Typically, it involves extensive experiments in genetics, biochemistry and molecular biology. In this paper, we show that much of the inference task can be accomplished by a deep neural network (DNN), a form of machine learning or artificial intelligence. Specifically, the DNN learns from the dynamics of gene expression. The learnt DNN behaves like an accurate simulator of the system, on which one can perform in-silico experiments to reveal the underlying gene network. We demonstrate the method with two examples: biochemical adaptation and the gap-gene patterning in fruit fly embryogenesis. In the first example, the DNN can successfully find the two basic network motifs for adaptation - the negative feedback and the incoherent feed-forward. In the second and much more complex example, the DNN can accurately predict the behaviors of essentially all the mutants. Furthermore, the regulation network it uncovers is strikingly similar to the one inferred from experiments. In doing so, we develop methods for deciphering the gene regulation network hidden in the DNN "black box". Our interpretable DNN approach should have broad applications in genotype-phenotype mapping.

preprint, 2020, arXiv

Generative Adversarial Network-Based Sinogram Super-Resolution for Computed Tomography Imaging

Compared with the conventional 1*1 acquisition mode of projection in computed tomography (CT) image reconstruction, the 2*2 acquisition mode improves the collection efficiency of the projection and reduces the X-ray exposure time. However, the collected projection based on the 2*2 acquisition mode has low resolution (LR) and the reconstructed image quality is poor, thus limiting the use of this mode in CT imaging systems. In this study, a novel sinogram-super-resolution generative adversarial network (SSR-GAN) model is proposed to obtain high-resolution (HR) sinograms from LR sinograms, thereby improving the reconstruction image quality under the 2*2 acquisition mode. The proposed generator is based on the residual network for LR sinogram feature extraction and super-resolution (SR) sinogram generation. A relativistic discriminator is designed to render the network capable of obtaining more realistic SR sinograms. Moreover, we combine the cycle consistency loss, sinogram domain loss, and reconstruction image domain loss in the total loss function to supervise SR sinogram generation. Then, a trained model can be obtained by inputting the paired LR/HR sinograms into the network. Finally, the classic FBP reconstruction algorithm is used for CT image reconstruction based on the generated SR sinogram. The qualitative and quantitative results of evaluations on digital and real data illustrate that the proposed model not only obtains clean SR sinograms from noisy LR sinograms but also outperforms its counterparts.

preprint, 2020, arXiv

New structure candidates for the experimentally synthesized heptazine-based and triazine-based two dimensional graphitic carbon nitride

The widely used crystal structures for both heptazine-based and triazine-based two-dimensional (2D) graphitic carbon nitride (g-C$_3$N$_4$) are the flat P-6m2 configurations. However, experimentally synthesized 2D g-C$_3$N$_4$ has a thickness in the range of 0.2-0.5 nm, indicating that the flat P-6m2 configurations used in theoretical work are not the correct ground states. In this work, we propose three new corrugated structures, P321, P3m1 and Pca21, with energies 66 (86), 77 (87) and 78 (89) meV/atom lower, respectively, than that of the corresponding heptazine-based (triazine-based) g-C$_3$N$_4$ in the flat P-6m2 configuration. These corrugated structures have periodic patterns very similar to those of the flat P-6m2 ones, and they are difficult to distinguish from each other in top view. The optimized thicknesses of the three corrugated structures, which range from 1.347 to 3.142 Å, are in good agreement with the experimental results. The first-principles results show that these corrugated structural candidates are also semiconductors, with band gaps slightly larger than those of the corresponding flat P-6m2 ones. Furthermore, they also possess suitable band edge positions for sunlight-driven water splitting in both $pH=0$ and $pH=7$ environments. Our results show that these three new structures are more promising candidates for the experimentally synthesized g-C$_3$N$_4$.

preprint, 2020, arXiv

Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks

Affordance information about a scene provides important clues as to what actions may be executed in pursuit of meeting a specified goal state. Thus, integrating affordance-based reasoning into symbolic action planning pipelines would enhance the flexibility of robot manipulation. Unfortunately, the top-performing affordance recognition methods use object category priors to boost the accuracy of affordance detection and segmentation. Object priors limit generalization to unknown object categories. This paper describes an affordance recognition pipeline based on a category-agnostic region proposal network for proposing instance regions of an image across categories. To guide affordance learning in the absence of category priors, the training process includes the auxiliary task of explicitly inferring existing affordances within a proposal. Secondly, a self-attention mechanism trained to interpret each proposal learns to capture rich contextual dependencies through the region. Visual benchmarking shows that the trained network, called AffContext, reduces the performance gap between object-agnostic and object-informed affordance recognition. AffContext is linked to the Planning Domain Definition Language (PDDL) with an augmented state keeper for action planning across temporally spaced goal-oriented tasks. Manipulation experiments show that AffContext can successfully parse scene content to seed a symbolic planner problem specification, whose execution completes the target task. Additionally, task-oriented grasping for cutting and pounding actions demonstrates the exploitation of multiple affordances for a given object to complete specified tasks.

preprint, 2020, arXiv

Theoretical prediction of a low-energy Stone-Wales graphene with intrinsic type-III Dirac-cone

Based on first-principles calculations, we predict a new low-energy Stone-Wales graphene, SW40, which has an orthorhombic lattice with Pbam symmetry and 40 carbon atoms in its crystalline cell forming well-arranged Stone-Wales patterns. The calculated total energy of SW40 is only about 133 meV higher than that of graphene, indicating that its stability exceeds that of all previously proposed graphene allotropes. We find that SW40 possesses an intrinsic type-III Dirac cone (Phys. Rev. Lett., 120, 237403, 2018) formed by the crossing of a local linear band and a local flat band, which can result in highly anisotropic fermions in the system. Interestingly, this intrinsic type-III Dirac cone can be effectively tuned by inner-layer strains, and it transforms into type-II and type-I Dirac cones under tensile and compressive strains, respectively. Finally, a general tight-binding model is constructed to understand the electronic properties near the Fermi level in SW40. The results show that the type-III Dirac-cone feature can be well understood in terms of the $\pi$-electron interactions between adjacent Stone-Wales defects.

preprint, 2020, arXiv

Ultra-bright Quantum Photon Sources on Chip

Quantum photon sources of high rate, brightness, and purity are increasingly desirable as quantum information systems are quickly scaled up and applied to many fields. Using a periodically poled lithium niobate microresonator on chip, we demonstrate photon-pair generation at high rates of 8.5 MHz and 36.3 MHz using only 3.4-$\mu$W and 13.4-$\mu$W pump power, respectively, marking orders of magnitude improvement over the state-of-the-art. The measured coincidence to accidental ratio is well above 100 at those high rates and reaches $14{,}682 \pm 4427$ at a lower pump power. The same chip enables heralded single-photon generation at tens of megahertz rates, each with low auto-correlation $g^{(2)}_{H}(0)=0.008$ and $0.097$ for the microwatt pumps. Such distinct performance, facilitated by the chip device's noiseless and giant optical nonlinearity, will contribute to the forthcoming pervasive adoption of quantum optical information technologies.

preprint, 2020, arXiv

Ultra-efficient and highly-tunable second-harmonic generation in Z-cut periodically poled lithium niobate nanowaveguides

Thin-film lithium niobate on insulator (LNOI) has emerged as a superior integrated-photonics platform for linear, nonlinear, and electro-optics. Here we combine quasi-phase-matching, dispersion engineering, and tight mode confinement to realize nonlinear parametric processes with both high efficiency and wide wavelength tunability. On a millimeter-long, Z-cut LNOI waveguide, we demonstrate ultra-efficient ($1900 \pm 500\,\%\,\mathrm{W}^{-1}\,\mathrm{cm}^{-2}$) and highly tunable (-1.71 nm/K) second-harmonic generation from 1530 to 1583 nm by type-0 quasi-phase-matching. Our technique is applicable to optical harmonic generation, quantum light sources, frequency conversion, and many other photonic information processing applications across the visible to mid-IR spectral bands.

preprint, 2018, arXiv

Critical slowing down and attractive manifold: a mechanism for dynamic robustness in yeast cell-cycle process

Biological processes that execute multiple complex functions, such as the cell cycle, must ensure the order of sequential events and maintain dynamic robustness against various fluctuations. Here, we examine the dynamic mechanism and the fundamental structure that achieve these properties in the cell-cycle process of the budding yeast Saccharomyces cerevisiae. We show that the budding yeast cell-cycle process behaves like an excitable system containing three well-coupled saddle-node bifurcations to execute DNA replication and mitosis events. The yeast cell-cycle regulatory network can be separated into a G1/S phase module, an early M module and a late M phase module, where the positive feedbacks in each module and the interactions among the modules play an important role. If the cell-cycle process operates near the critical points of the saddle-node bifurcations, there is a critical slowing down or ghost effect. This can provide the cell-cycle process with a sufficient duration for each event and an attractive manifold for checking the completion of DNA replication and mitosis; moreover, fluctuations in an earlier module or event are prevented from propagating to a later module or event. Our results suggest both a fundamental structure of the cell-cycle regulatory network and a hint for the evolution of eukaryotic cell-cycle processes, from the dynamic checking mechanism to the molecular checkpoint pathway.