Source author record

Fan Shi

Fan Shi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Robotics Artificial Intelligence cond-mat.mes-hall cond-mat.mtrl-sci eess.AS Machine Learning physics.app-ph Sound

Catalog footprint

What is connected

6works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

SCRWKV: Ultra-Compact Structure-Calibrated Vision-RWKV for Topological Crack Segmentation

Achieving pixel-level accurate segmentation of structural cracks across diverse scenarios remains a formidable challenge. Existing methods face significant bottlenecks in balancing crack topology modeling with computational efficiency, often failing to reconcile high segmentation quality with low resource demands. To address these limitations, we propose the Ultra-Compact Structure-Calibrated Vision RWKV (SCRWKV), a network that achieves high-precision modeling via a novel Structure-Field Encoder (SFE) backbone while maintaining linear complexity. The SFE integrates the Adaptive Multi-scale Cascaded Modulator (AMCM) to enhance texture representation and utilizes the Structure-Calibrated Insight Unit (SCIU) as its core engine. Specifically, the SCIU employs the Geometry-guided Bidirectional Structure Transformation (GBST) to capture topological correlations and integrates the Dynamic Self-Calibrating Decay (DSCD) into Dy-WKV to suppress noise propagation. Furthermore, we introduce a lightweight Cross-Scale Harmonic Fusion (CSHF) decoder to achieve precise feature aggregation. Systematic evaluations on multiple benchmarks characterized by complex textures and severe interference demonstrate that SCRWKV, with only 1.22M parameters, significantly outperforms SOTA methods. Achieving an F1 score of 0.8428 and mIoU of 0.8512 on the TUT dataset, the model confirms its robust potential for efficient real-world deployment. The code is available at https://github.com/zhxhzy/SCRWKV.

preprint2026arXiv

Soft Responsive Materials Enhance Humanoid Safety

Humanoid robots are envisioned as general-purpose platforms in human-centered environments, yet their deployment is limited by vulnerability to falls and the risks posed by rigid metal-plastic structures to people and surroundings. We introduce a soft-rigid co-design framework that leverages non-Newtonian fluid-based soft responsive materials to enhance humanoid safety. The material remains compliant during normal interaction but rapidly stiffens under impact, absorbing and dissipating fall-induced forces. Physics-based simulations guide protector placement and thickness and enable learning of active fall policies. Applied to a 42 kg life-size humanoid, the protector markedly reduces peak impact and allows repeated falls without hardware damage, including drops from 3 m and tumbles down long staircases. Across diverse scenarios, the approach improves robot robustness and environmental safety. By uniting responsive materials, structural co-design, and learning-based control, this work advances interact-safe, industry-ready humanoid robots.

preprint2023arXiv

RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement

In this research, we introduce RefineNet, a novel architecture designed to address resolution limitations in text-to-image conversion systems. We explore the challenges of generating high-resolution images from textual descriptions, focusing on the trade-offs between detail accuracy and computational efficiency. RefineNet leverages a hierarchical Transformer combined with progressive and conditional refinement techniques, outperforming existing models in producing detailed and high-quality images. Through extensive experiments on diverse datasets, we demonstrate RefineNet's superiority in clarity and resolution, particularly in complex image categories like animals, plants, and human faces. Our work not only advances the field of image-to-text conversion but also opens new avenues for high-fidelity image generation in various applications.

preprint2023arXiv

Revolutionizing Personalized Voice Synthesis: The Journey towards Emotional and Individual Authenticity with DIVSE (Dynamic Individual Voice Synthesis Engine)

This comprehensive paper delves into the forefront of personalized voice synthesis within artificial intelligence (AI), spotlighting the Dynamic Individual Voice Synthesis Engine (DIVSE). DIVSE represents a groundbreaking leap in text-to-voice (TTS) technology, uniquely focusing on adapting and personalizing voice outputs to match individual vocal characteristics. The research underlines the gap in current AI-generated voices, which, while technically advanced, fall short in replicating the unique individuality and expressiveness intrinsic to human speech. It outlines the challenges and advancements in personalized voice synthesis, emphasizing the importance of emotional expressiveness, accent and dialect variability, and capturing individual voice traits. The architecture of DIVSE is meticulously detailed, showcasing its three core components: Voice Characteristic Learning Module (VCLM), Emotional Tone and Accent Adaptation Module (ETAAM), and Dynamic Speech Synthesis Engine (DSSE). The innovative approach of DIVSE lies in its adaptive learning capability, which evolves over time to tailor voice outputs to specific user traits. The paper presents a rigorous experimental setup, utilizing accepted datasets and personalization metrics like Mean Opinion Score (MOS) and Emotional Alignment Score, to validate DIVSE's superiority over mainstream models. The results depict a clear advancement in achieving higher personalization and emotional resonance in AI-generated voices.

preprint2020arXiv

Terahertz response of gadolinium gallium garnet (GGG) and gadolinium scandium gallium garnet (SGGG)

We report the magneto-optical response of Gadolinium Gallium Garnet (GGG) and Gadolinium Scandium Gallium Garnet (SGGG) at frequencies ranging from $300 \, \mathrm{GHz}$ to $1 \, \mathrm{THz}$, and determine the material response tensor. Within this frequency window, the materials exhibit nondispersive and low-loss optical responses. At low temperatures, significant THz Faraday rotations are found in the (S)GGG samples. Such strong gyroelectric response is likely associated with the high-spin paramagnetic state of the Gd$^{3+}$ ions. A model of the material response tensor is determined, together with the Verdet and magneto-optic constants.

preprint2020arXiv

Versatile Multilinked Aerial Robot with Tilting Propellers: Design, Modeling, Control and State Estimation for Autonomous Flight and Manipulation

Multilinked aerial robot is one of the state-of-the-art works in aerial robotics, which demonstrates the deformability benefiting both maneuvering and manipulation. However, the performance in outdoor physical world has not yet been evaluated because of the weakness in the controllability and the lack of the state estimation for autonomous flight. Thus we adopt tilting propellers to enhance the controllability. The related design, modeling and control method are developed in this work to enable the stable hovering and deformation. Furthermore, the state estimation which involves the time synchronization between sensors and the multilinked kinematics is also presented in this work to enable the fully autonomous flight in the outdoor environment. Various autonomous outdoor experiments, including the fast maneuvering for interception with target, object grasping for delivery, and blanket manipulation for firefighting are performed to evaluate the feasibility and versatility of the proposed robot platform. To the best of our knowledge, this is the first study for the multilinked aerial robot to achieve the fully autonomous flight and the manipulation task in outdoor environment. We also applied our platform in all challenges of the 2020 Mohammed Bin Zayed International Robotics Competition, and ranked third place in Challenge 1 and sixth place in Challenge 3 internationally, demonstrating the reliable flight performance in the fields.

Fan Shi

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

SCRWKV: Ultra-Compact Structure-Calibrated Vision-RWKV for Topological Crack Segmentation

Soft Responsive Materials Enhance Humanoid Safety

RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement

Revolutionizing Personalized Voice Synthesis: The Journey towards Emotional and Individual Authenticity with DIVSE (Dynamic Individual Voice Synthesis Engine)

Terahertz response of gadolinium gallium garnet (GGG) and gadolinium scandium gallium garnet (SGGG)

Versatile Multilinked Aerial Robot with Tilting Propellers: Design, Modeling, Control and State Estimation for Autonomous Flight and Manipulation