Source author record

Kehan Wang

Kehan Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, Unclaimed source record

Computer Vision, eess.SY, Sound, Systems and Control

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators


Preprint, 2025, arXiv

AudioFab: Building A General and Intelligent Audio Factory through Tool Learning

Currently, artificial intelligence is profoundly transforming the audio domain; however, numerous advanced algorithms and tools remain fragmented, lacking a unified and efficient framework to unlock their full potential. Existing audio agent frameworks often suffer from complex environment configurations and inefficient tool collaboration. To address these limitations, we introduce AudioFab, an open-source agent framework aimed at establishing an open and intelligent audio-processing ecosystem. Compared to existing solutions, AudioFab's modular design resolves dependency conflicts, simplifying tool integration and extension. It also optimizes tool learning through intelligent selection and few-shot learning, improving efficiency and accuracy in complex audio tasks. Furthermore, AudioFab provides a user-friendly natural language interface tailored for non-expert users. As a foundational framework, AudioFab's core contribution lies in offering a stable and extensible platform for future research and development in audio and multimodal AI. The code is available at https://github.com/SmileHnu/AudioFab.

Preprint, 2022, arXiv

Composing MPC with LQR and Neural Network for Amortized Efficiency and Stable Control

Model predictive control (MPC) is a powerful control method that handles dynamical systems with constraints. However, solving MPC iteratively in real time, i.e., implicit MPC, remains a computational challenge. Common solutions include explicit MPC and function approximation. Both methods, whenever applicable, may improve the computational efficiency of implicit MPC by several orders of magnitude. Nevertheless, explicit MPC often requires expensive pre-computation and does not easily apply to higher-dimensional problems. Meanwhile, function approximation, although it scales better with dimension, still requires pre-training on a large dataset and generally cannot guarantee an accurate surrogate policy; an inaccurate policy often leads to closed-loop instability. To address these issues, we propose a triple-mode hybrid control scheme, named Memory-Augmented MPC, that combines a linear quadratic regulator, a neural network, and an MPC. From its standard form, we further derive two variants of this hybrid control scheme: one customized for chaotic systems and the other for slow systems. The proposed scheme does not require pre-computation and can improve the amortized running time of the composed MPC with a well-trained neural network. In addition, the scheme maintains closed-loop stability with any neural network of proper input and output dimensions, alleviating the need to certify the optimality of the neural network in safety-critical applications.

Preprint, 2022, arXiv

Misinformation Detection in Social Media Video Posts

With the growing adoption of short-form video by social media platforms, reducing the spread of misinformation through video posts has become a critical challenge for social media providers. In this paper, we develop methods to detect misinformation in social media posts, exploiting modalities such as video and text. Due to the lack of large-scale public data for misinformation detection in multi-modal datasets, we collect 160,000 video posts from Twitter, and leverage self-supervised learning to learn expressive representations of joint visual and textual data. In this work, we propose two new methods for detecting semantic inconsistencies within short-form social media video posts, based on contrastive learning and masked language modeling. We demonstrate that our new approaches outperform current state-of-the-art methods on both artificial data generated by random-swapping of positive samples and in the wild on a new manually-labeled test set for semantic misinformation.

Preprint, 2022, arXiv

Neural Face Identification in a 2D Wireframe Projection of a Manifold Object

In computer-aided design (CAD) systems, 2D line drawings are commonly used to illustrate 3D object designs. To reconstruct the 3D model depicted by a single 2D line drawing, a key step is finding the edge loops in the line drawing that correspond to the actual faces of the 3D object. In this paper, we approach the classical problem of face identification from a novel data-driven point of view. We cast it as a sequence generation problem: starting from an arbitrary edge, we adopt a variant of the popular Transformer model to predict the edges associated with the same face in a natural order. This allows us to avoid searching the space of all possible edge loops with various hand-crafted rules and heuristics, as most existing methods do; to deal with challenging cases such as curved surfaces and nested edge loops; and to leverage additional cues such as face types. We further discuss how possibly imperfect predictions can be used for 3D object reconstruction.
