Source author record

Kuan Wei Huang

Kuan Wei Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence cs.CY Robotics

Catalog footprint

What is connected

5works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction

Geometric models like DUSt3R have shown great advances in understanding the geometry of a scene from pairs of photos. However, they fail when the inputs are from vastly different viewpoints (e.g., aerial vs. ground) or modalities (e.g., photos vs. abstract drawings) compared to what was observed during training. This paper addresses a challenging version of this problem: predicting correspondences between ground-level photos and floor plans. Current datasets for joint photo-floor plan reasoning are limited, either lacking in varying modalities (VIGOR) or lacking in correspondences (WAFFLE). To address these limitations, we introduce a new dataset, C3, created by first reconstructing a number of scenes in 3D from Internet photo collections via structure-from-motion, then manually registering the reconstructions to floor plans gathered from the Internet, from which we can derive correspondences between images and floor plans. C3 contains 90K paired floor plans and photos across 597 scenes with 153M pixel-level correspondences and 85K camera poses. We find that state-of-the-art correspondence models struggle on this task. By training on our new data, we can improve on the best performing method by 34% in RMSE. We also use the predicted correspondences to estimate camera poses and evaluate performance using recall metrics. Lastly, we identify open challenges in cross-modal geometric reasoning that our dataset aims to help address.

preprint2022arXiv

Developing a Series of AI Challenges for the United States Department of the Air Force

Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are developing public challenge problems that address numerous Federal AI research priorities. These challenges target priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual use technologies that can stimulate further research. In this article, we describe these public challenges being developed and how their application contributes to scientific advances.

preprint2022arXiv

Learning Program Representations for Food Images and Cooking Recipes

In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured representation of the task, capturing cooking semantics and sequential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN. Code, data, and models are available online.

preprint2022arXiv

MultiEarth 2022 -- Multimodal Learning for Earth and Environment Workshop and Challenge

The Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022) will be the first competition aimed at the monitoring and analysis of deforestation in the Amazon rainforest at any time and in any weather conditions. The goal of the Challenge is to provide a common benchmark for multimodal information processing and to bring together the earth and environmental science communities as well as multimodal representation learning communities to compare the relative merits of the various multimodal learning methods to deforestation estimation under well-defined and strictly comparable conditions. MultiEarth 2022 will have three sub-challenges: 1) matrix completion, 2) deforestation estimation, and 3) image-to-image translation. This paper presents the challenge guidelines, datasets, and evaluation metrics for the three sub-challenges. Our challenge website is available at https://sites.google.com/view/rainforest-challenge.

preprint2020arXiv

Distributed Motion Control for Multiple Connected Surface Vessels

We propose a scalable cooperative control approach which coordinates a group of rigidly connected autonomous surface vessels to track desired trajectories in a planar water environment as a single floating modular structure. Our approach leverages the implicit information of the structure's motion for force and torque allocation without explicit communication among the robots. In our system, a leader robot steers the entire group by adjusting its force and torque according to the structure's deviation from the desired trajectory, while follower robots run distributed consensus-based controllers to match their inputs to amplify the leader's intent using only onboard sensors as feedback. To cope with the complex and highly coupled system dynamics in the water, the leader robot employs a nonlinear model predictive controller (NMPC), where we experimentally estimated the dynamics model of the floating modular structure in order to achieve superior performance for leader-following control. Our method has a wide range of potential applications in transporting humans and goods in many of today's existing waterways. We conducted trajectory and orientation tracking experiments in hardware with three custom-built autonomous modular robotic boats, called Roboat, which are capable of holonomic motions and onboard state estimation. Simulation results with up to 65 robots also prove the scalability of our proposed approach.

Kuan Wei Huang

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction

Developing a Series of AI Challenges for the United States Department of the Air Force

Learning Program Representations for Food Images and Cooking Recipes

MultiEarth 2022 -- Multimodal Learning for Earth and Environment Workshop and Challenge

Distributed Motion Control for Multiple Connected Surface Vessels