Source author record

Fan Feng

Fan Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · cond-mat.soft · Machine Learning · Robotics · Artificial Intelligence · eess.AS · eess.IV · Multimedia · Sound

Catalog footprint

What is connected

13 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

From Generalist to Specialist Representation

Given a generalist model, learning a task-relevant specialist representation is fundamental for downstream applications. Identifiability, the asymptotic guarantee of recovering the ground-truth representation, is critical because it sets the ultimate limit of any model, even with infinite data and computation. We study this problem in a completely nonparametric setting, without relying on interventions, parametric forms, or structural constraints. We first prove that the structure between time steps and tasks is identifiable in a fully unsupervised manner, even when sequences lack strict temporal dependence and may exhibit disconnections, and task assignments can follow arbitrarily complex and interleaving structures. We then prove that, within each time step, the task-relevant latent representation can be disentangled from the irrelevant part under a simple sparsity regularization, without any additional information or parametric constraints. Together, these results establish a hierarchical foundation: task structure is identifiable across time steps, and task-relevant latent representations are identifiable within each step. To our knowledge, each result provides a first general nonparametric identifiability guarantee, and together they mark a step toward provably moving from generalist to specialist models.

preprint · 2026 · arXiv

SCAR: Self-Supervised Continuous Action Representation Learning

Despite the central role of action in embodied intelligence, learning transferable action representations from visual transitions remains a fundamental challenge, particularly when world models must generalize across embodiments under limited data. We argue that action is not merely an auxiliary conditioning signal, but a distinct representational factor that decouples the controllable change from embodiment-specific actuation. In this work, we propose SCAR, a joint inverse-forward dynamics framework for learning unified action representations across embodiments from visual transitions. Built on a pretrained generative backbone, SCAR uses an inverse dynamics model (IDM) to infer latent actions from latent observation pairs and a forward dynamics model (FDM) to predict future dynamics conditioned on them. To make the latent space transferable rather than a generic visual bottleneck, we regularize the latent action posterior toward a standard Gaussian prior to limit arbitrary visual encoding, and introduce adversarial invariance to suppress embodiment- and environment-specific nuisance factors. Experiments on the Procgen and Robotwin datasets show that the learned unified latent action representation serves as a stronger conditioning interface for world modeling than embodiment-specific raw actions, yielding improved cross-embodiment low-data adaptation and cross-task transfer. Taken together, these results suggest that action can be learned as a shared representation of controllable change across embodiments, providing an interface for more transferable and generalizable world models.

preprint · 2022 · arXiv

A Benchmark and Empirical Analysis for Replay Strategies in Continual Learning

With the capacity of continual learning, humans can continuously acquire knowledge throughout their lifespan. However, computational systems are not, in general, capable of learning tasks sequentially. This long-standing challenge for deep neural networks (DNNs) is called catastrophic forgetting. Multiple solutions have been proposed to overcome this limitation. This paper makes an in-depth evaluation of the memory replay methods, exploring the efficiency, performance, and scalability of various sampling strategies when selecting replay data. All experiments are conducted on multiple datasets under various domains. Finally, a practical solution for selecting replay methods for various data distributions is provided.

preprint · 2022 · arXiv

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

One practical challenge in reinforcement learning (RL) is how to make quick adaptations when faced with new environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably and efficiently to changes across domains with a few samples from the target domain, even in partially observable environments. Specifically, we leverage a parsimonious graphical representation that characterizes structural relationships over variables in the RL system. Such graphical representations provide a compact way to encode what and where the changes across domains are, and furthermore inform us with a minimal set of changes that one has to consider for the purpose of policy adaptation. We show that by explicitly leveraging this compact representation to encode changes, we can efficiently adapt the policy to the target domain, in which only a few samples are needed and further policy optimization is avoided. We illustrate the efficacy of AdaRL through a series of experiments that vary factors in the observation, transition, and reward functions for Cartpole and Atari games.

preprint · 2022 · arXiv

Incremental Few-Shot Object Detection for Robotics

Incremental few-shot learning is highly desirable for practical robotics applications. On the one hand, a robot should learn new tasks quickly and flexibly using only a few annotated training samples; on the other hand, such new tasks should be learned in a continuous and incremental manner without dramatically forgetting previously learned knowledge. In this work, we propose a novel Class-Incremental Few-Shot Object Detection (CI-FSOD) framework that enables a deep object detection network to perform effective continual learning from just a few samples without re-accessing the previous training data. We achieve this by equipping the widely used Faster-RCNN detector with three elegant components. Firstly, to best preserve performance on the pre-trained base classes, we propose a novel Dual-Embedding-Space (DES) architecture which decouples the representation learning of base and novel categories into different spaces. Secondly, to mitigate catastrophic forgetting on the accumulated novel classes, we propose a Sequential Model Fusion (SMF) method, which is able to achieve long-term memory without additional storage cost. Thirdly, to promote inter-task class separation in feature space, we propose a novel regularization technique that extends the classification boundary further away from the previous classes to avoid misclassification. Overall, our framework is simple yet effective and outperforms the previous SOTA by a significant margin of 2.4 points in AP performance.

preprint · 2022 · arXiv

Interfacial metric mechanics: stitching patterns of shape change in active sheets

A flat sheet programmed with a planar pattern of spontaneous shape change will morph into a curved surface. Such metric mechanics is seen in growing biological sheets, and may be engineered in actuating soft matter sheets such as phase-changing liquid crystal elastomers (LCEs), swelling gels and inflating baromorphs. Here, we show how to combine multiple patterns in a sheet by stitching regions of different shape changes together piecewise along interfaces. This approach allows simple patterns to be used as building blocks, and enables the design of multi-material or active/passive sheets. We give a general condition for an interface to be geometrically compatible, and explore its consequences for LCE/LCE, gel/gel, and active/passive interfaces. In contraction/elongation systems such as LCEs, we find an infinite set of compatible interfaces between any pair of patterns along which the metric is discontinuous, and a finite number across which the metric is continuous. As an example, we find all possible interfaces between pairs of LCE logarithmic spiral patterns. In contrast, in isotropic systems such as swelling gels, only a finite number of continuous interfaces are available, greatly limiting the potential of stitching. In both continuous and discontinuous cases, we find the stitched interfaces generically carry singular Gaussian curvature, leading to intrinsically curved folds in the actuated surface. We give a general expression for the distribution of this curvature, and a more specialized form for interfaces in LCE patterns. The interfaces thus also have rich geometric and mechanical properties in their own right.

preprint · 2022 · arXiv

The CORSMAL benchmark for the prediction of the properties of containers

The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, specific tasks and performance measures. We conduct an in-depth comparative analysis of methods that used this framework and audio-only or vision-only baselines designed from related works. Based on this analysis, we can conclude that audio-only and audio-visual classifiers are suitable for the estimation of the type and amount of the content using different types of convolutional neural networks, combined with either recurrent neural networks or a majority voting strategy, whereas computer vision methods are suitable to determine the capacity of the container using regression and geometric approaches. Classifying the content type and level using only audio achieves a weighted average F1-score up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches and estimating the filling mass with audio-visual multi-stage approaches reach up to 65% weighted average capacity and mass scores. These results show that there is still room for improvement on the design of new methods. These new methods can be ranked and compared on the individual leaderboards provided by our open framework.

preprint · 2022 · arXiv

Theorem on the Compatibility of Spherical Kirigami Tessellations

We present a theorem on the compatibility upon deployment of kirigami tessellations restricted to a spherical surface with patterned slits forming freeform quadrilateral meshes. We show that the spherical kirigami tessellations have either one or two compatible states, i.e., there are at most two isolated strain-free configurations along the deployment path. The theorem further reveals that the rigid-to-floppy transition from spherical to planar kirigami tessellations is possible if and only if the slits form parallelogram voids along with vanishing Gaussian curvature, which is also confirmed by an energy analysis and simulations. On the application side, we show a design of a bistable spherical dome-like structure based on the theorem. Our study provides new insights into the rational design of morphable structures based on Euclidean and non-Euclidean geometries.

preprint · 2021 · arXiv

Origami spring-inspired shape morphing for flexible robotics

Flexible robotics are capable of achieving various functionalities by shape morphing, benefiting from their compliant bodies and reconfigurable structures. Here we construct and study a class of origami springs generalized from the known interleaved origami spring, as promising candidates for shape morphing in flexible robotics. These springs are found to exhibit nonlinear stretch-twist coupling and linear/nonlinear mechanical response in the compression/tension region, analyzed by the demonstrated continuum mechanics models, experiments, and finite element simulations. To improve the mechanical performance such as the damage resistance, we establish an origami rigidization method by adding additional creases to the spring system. Guided by the theoretical framework, we experimentally realize three types of flexible robotics -- origami spring ejectors, crawlers, and transformers. These robots show the desired functionality and outstanding mechanical performance. The proposed concept of origami-aided design is expected to pave the way to facilitate the diverse shape morphing of flexible robotics.

preprint · 2020 · arXiv

IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".

preprint · 2020 · arXiv

OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning

The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models each time a new task becomes available is infeasible due to computational, storage and sometimes privacy issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. It is crucial for the robots to operate continuously under open-set and detrimental conditions with adaptive visual perceptual systems, where lifelong learning is a fundamental capability. However, very few datasets and benchmarks are available to evaluate and compare emerging techniques. To fill this gap, we provide a new lifelong robotic vision dataset ("OpenLORIS-Object") collected via RGB-D cameras. The dataset embeds the challenges faced by a robot in the real-life application and provides new benchmarks for validating lifelong object recognition algorithms. Moreover, we have provided a testbed of 9 state-of-the-art lifelong learning algorithms. Each of them involves 48 tasks with 4 evaluation metrics over the OpenLORIS-Object dataset. The results demonstrate that the object recognition task in the ever-changing difficulty environments is far from being solved and the bottlenecks are at the forward/backward transfer designs. Our dataset and benchmark are publicly available at https://lifelong-robotic-vision.github.io/dataset/object.

preprint · 2020 · arXiv

The designs and deformations of rigidly and flat-foldable quadrilateral mesh origami

Rigidly and flat-foldable quadrilateral mesh origami is the class of quadrilateral mesh crease patterns with one fundamental property: the patterns can be folded from flat to fully-folded flat by a continuous one-parameter family of piecewise affine deformations that do not stretch or bend the mesh-panels. In this work, we explicitly characterize the designs and deformations of all possible rigidly and flat-foldable quadrilateral mesh origami. Our key idea is a rigidity theorem (Theorem 3.1) that characterizes compatible crease patterns surrounding a single panel and enables us to march from panel to panel to compute the pattern and its corresponding deformations explicitly. The marching procedure is computationally efficient. So we use it to formulate the inverse problem: to design a crease pattern to achieve a targeted shape along the path of its rigidly and flat-foldable motion. The initial results on the inverse problem are promising and suggest a broadly useful engineering design strategy for shape-morphing with origami.

preprint · 2019 · arXiv

Helical Miura Origami

We characterize the phase-space of all Helical Miura Origami. These structures are obtained by taking a partially folded Miura parallelogram as the unit cell, applying a generic helical or rod group to the cell, and characterizing all the parameters that lead to a globally compatible origami structure. When such compatibility is achieved, the result is cylindrical-type origami that can be manufactured from a suitably designed flat tessellation and "rolled-up" by a rigidly foldable motion into a cylinder. We find that the closed Helical Miura Origami are generically rigid to deformations that preserve cylindrical symmetry, but multistable. We are inspired by the ways atomic structures deform [1] to develop two broad strategies for reconfigurability: motion by slip, which involves relaxing the closure condition; and motion by phase transformation, which exploits multistability. Taken together, these results provide a comprehensive description of the phase-space of cylindrical origami, as well as quantitative design guidance for their use as actuators or metamaterials that exploit twist, axial extension, radial expansion, and symmetry.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint