Source author record

Yi-Ting Chen

Yi-Ting Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision, Robotics, Artificial Intelligence, cond-mat.mes-hall, cond-mat.mtrl-sci, cond-mat.str-el, eess.IV, physics.atom-ph, quant-ph, Social and Information Networks

Catalog footprint

What is connected

12 works

10 topics

4 close collaborators


preprint · 2026 · arXiv

Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

In open-vocabulary mobile manipulation (OVMM), task success often hinges on the selection of an appropriate base placement for the robot. Existing approaches typically navigate to proximity-based regions without considering affordances, resulting in frequent manipulation failures. We propose Affordance-Guided Coarse-to-Fine Exploration, a zero-shot framework for base placement that integrates semantic understanding from vision-language models (VLMs) with geometric feasibility through an iterative optimization process. Our method constructs cross-modal representations, namely Affordance RGB and Obstacle Map+, to align semantics with spatial context. This enables reasoning that extends beyond the egocentric limitations of RGB perception. To ensure interaction is guided by task-relevant affordances, we leverage coarse semantic priors from VLMs to guide the search toward task-relevant regions and refine placements with geometric constraints, thereby reducing the risk of convergence to local optima. Evaluated on five diverse open-vocabulary mobile manipulation tasks, our system achieves an 85% success rate, significantly outperforming classical geometric planners and VLM-based methods. This demonstrates the promise of affordance-aware and multimodal reasoning for generalizable, instruction-conditioned planning in OVMM.

preprint · 2022 · arXiv

ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Age-related macular degeneration (AMD) is the leading cause of visual impairment among the elderly worldwide. Early detection of AMD is of great importance, as the vision loss caused by this disease is irreversible and permanent. Color fundus photography is the most cost-effective imaging modality to screen for retinal disorders. Cutting-edge deep-learning-based algorithms have recently been developed for automatically detecting AMD from fundus images. However, there is still a lack of a comprehensive annotated dataset and standard evaluation benchmarks. To deal with this issue, we set up the Automatic Detection challenge on Age-related Macular degeneration (ADAM), which was held as a satellite event of the ISBI 2020 conference. The ADAM challenge consisted of four tasks that cover the main aspects of detecting and characterizing AMD from fundus images: detection of AMD, detection and segmentation of the optic disc, localization of the fovea, and detection and segmentation of lesions. As part of the challenge, we have released a comprehensive dataset of 1200 fundus images with AMD diagnostic labels, pixel-wise segmentation masks for both the optic disc and AMD-related lesions (drusen, exudates, hemorrhages and scars, among others), as well as the coordinates corresponding to the location of the macular fovea. A uniform evaluation framework has been built to make a fair comparison of different models using this dataset. During the challenge, 610 results were submitted for online evaluation, with 11 teams finally participating in the onsite challenge. This paper introduces the challenge, the dataset and the evaluation methods, summarizes the participating methods, and analyzes their results for each task. In particular, we observed that the ensembling strategy and the incorporation of clinical domain knowledge were key to improving the performance of the deep learning models.

preprint · 2022 · arXiv

High-Performance Microwave Frequency Standard Based on Sympathetically Cooled Ions

The ion microwave frequency standard is a candidate for the next generation of microwave frequency standards, with the potential for very wide applications. The Dick effect and second-order Doppler frequency shift (SODFS) limit the performance of ion microwave frequency standards. The introduction of sympathetic cooling technology can suppress the Dick effect and SODFS and improve the stability and accuracy of the frequency standard. However, sympathetically cooled ion microwave frequency standards have seldom been studied. This paper reports the first sympathetically cooled ion microwave frequency standard in a Paul trap. Using laser-cooled ${}^{40}\mathrm{Ca}^{+}$ as coolant ions, a ${}^{113}\mathrm{Cd}^{+}$ ion crystal is cooled to below 100 mK and has a coherence lifetime of over 40 s. The short-term frequency stability reached $3.48 \times 10^{-13}/\tau^{1/2}$, which is comparable to that of the mercury ion frequency standard. Its uncertainty is $1.5\times 10^{-14}$, which is better than that of the directly laser-cooled cadmium ion frequency standard.
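As a small worked example of the stability figure quoted above, the following sketch evaluates $3.48 \times 10^{-13}/\tau^{1/2}$ at a few averaging times. The function names are hypothetical, and the second helper assumes the $\tau^{-1/2}$ law keeps holding at long averaging times, which the abstract does not state.

```python
import math

# Coefficient quoted in the abstract: stability = 3.48e-13 / sqrt(tau), tau in seconds.
COEFF = 3.48e-13


def stability_at(tau_s, coeff=COEFF):
    """Fractional frequency stability after averaging for tau_s seconds."""
    if tau_s <= 0:
        raise ValueError("averaging time must be positive")
    return coeff / math.sqrt(tau_s)


def averaging_time_for(target, coeff=COEFF):
    """Averaging time (s) at which the stability reaches `target`.

    Assumption: the tau^(-1/2) scaling continues to hold at that time.
    """
    if target <= 0:
        raise ValueError("target stability must be positive")
    return (coeff / target) ** 2


if __name__ == "__main__":
    for tau in (1, 100, 10_000):
        print(f"tau = {tau:>6} s  ->  {stability_at(tau):.3e}")
    print(f"time to reach 1.5e-14: {averaging_time_for(1.5e-14):.1f} s")
```

Under that assumption, the statistical stability would reach the quoted uncertainty level of $1.5\times 10^{-14}$ after roughly nine minutes of averaging.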

preprint · 2022 · arXiv

Multimodal Object Detection via Probabilistic Ensembling

Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs). Motivated by AVs that operate in both day and night, we study multimodal object detection with RGB and thermal cameras, since the latter provides much stronger object signatures under poor illumination. We explore strategies for fusing information from different modalities. Our key contribution is a probabilistic ensembling technique, ProbEn, a simple non-learned method that fuses detections from multiple modalities. We derive ProbEn from Bayes' rule and first principles that assume conditional independence across modalities. Through probabilistic marginalization, ProbEn elegantly handles missing modalities when detectors do not fire on the same object. Importantly, ProbEn also notably improves multimodal detection even when the conditional independence assumption does not hold, e.g., fusing outputs from other fusion methods (both off-the-shelf and trained in-house). We validate ProbEn on two benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal images, showing that ProbEn outperforms prior work by more than 13% in relative performance!
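The Bayes'-rule fusion described above can be illustrated with a generic sketch. Under conditional independence of the modalities given the class, the fused posterior is proportional to the product of the per-modality posteriors divided by the class prior raised to the power (number of modalities minus one). This is not the authors' ProbEn code: the function name and the uniform default prior are assumptions, and detection matching and box fusion are left out.

```python
from math import prod


def fuse_posteriors(posteriors, prior=None):
    """Fuse per-modality class posteriors for one matched detection.

    posteriors: list of class-probability lists, one per modality that fired.
                A modality that did not fire is simply left out of the list.
    prior:      class prior p(y); assumed uniform when not given.
    """
    if not posteriors:
        raise ValueError("need at least one modality")
    num_classes = len(posteriors[0])
    if any(len(p) != num_classes for p in posteriors):
        raise ValueError("all posteriors must cover the same classes")
    if prior is None:
        prior = [1.0 / num_classes] * num_classes

    m = len(posteriors)
    # p(y | x_1..x_m) is proportional to prod_i p(y | x_i) / p(y)^(m - 1)
    scores = [
        prod(p[c] for p in posteriors) / prior[c] ** (m - 1)
        for c in range(num_classes)
    ]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    rgb = [0.7, 0.3]      # hypothetical [object, background] posterior from RGB
    thermal = [0.8, 0.2]  # hypothetical posterior from thermal
    print(fuse_posteriors([rgb, thermal]))
    print(fuse_posteriors([rgb]))  # only one detector fired
```

With a single modality the exponent on the prior is zero, so the function returns that modality's posterior unchanged, which mirrors the missing-modality behavior the abstract describes.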

preprint · 2021 · arXiv

Nanoscale electronic transparency of wafer-scale hexagonal boron nitride

Monolayer hBN has attracted interest as a potentially weakly interacting 2D insulating layer in heterostructures. Recently, wafer-scale hBN growth on Cu(111) has been demonstrated for semiconductor chip fabrication processes and transistor action. For all these applications, the perturbation on the underlying electronically active layers is critical. For example, while hBN on Cu(111) has been shown to preserve the Cu(111) surface state 2D electron gas, it was previously unknown how this varies over the sample and how it is affected by local electronic corrugation. Here, we demonstrate that the Cu(111) surface state under wafer-scale hBN is robustly homogeneous in energy and spectral weight over nanometer length scales and over atomic terraces. We contrast this with a benchmark spectral feature associated with interaction between BN atoms and the Cu surface, which varies with the Moiré pattern of the hBN/Cu(111) sample and is dependent on atomic registry. This work demonstrates that fragile 2D electron systems and interface states are largely unperturbed by local variations created by the hBN due to atomic-scale interactions with the substrate, thus providing a remarkably transparent window on low-energy electronic structure below the hBN monolayer.

preprint · 2020 · arXiv

Boosting Standard Classification Architectures Through a Ranking Regularizer

We employ triplet loss as a feature embedding regularizer to boost classification performance. Standard architectures, like ResNet and Inception, are extended to support both losses with minimal hyper-parameter tuning. This promotes generality while fine-tuning pretrained networks. Triplet loss is a powerful surrogate for recently proposed embedding regularizers. Yet, it is avoided due to its large batch-size requirement and high computational cost. Through our experiments, we re-assess these assumptions. During inference, our network supports both classification and embedding tasks without any computational overhead. Quantitative evaluation highlights a steady improvement on five fine-grained recognition datasets. Further evaluation on an imbalanced video dataset achieves significant improvement. Triplet loss brings feature embedding characteristics like nearest neighbor to classification models. Code available at http://bit.ly/2LNYEqL.
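A minimal sketch of using triplet loss as a regularizer alongside a classification loss might look like the following. It is not the authors' implementation: the squared Euclidean distance, the margin of 0.2, the weight `lam`, and all function names are assumptions chosen for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def combined_loss(logits, label, anchor, positive, negative, lam=1.0, margin=0.2):
    """Classification loss plus a weighted triplet regularizer (lam is assumed)."""
    return cross_entropy(logits, label) + lam * triplet_loss(
        anchor, positive, negative, margin
    )


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [1.0, 0.0]
    print(combined_loss(logits, 0, anchor, positive, negative))
```

Because the triplet term acts only on the embedding during training, dropping it at inference leaves the classifier's forward pass unchanged, which is consistent with the abstract's claim of no inference overhead.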

preprint · 2020 · arXiv

Learning 3D-aware Egocentric Spatial-Temporal Interaction via Graph Convolutional Networks

To enable intelligent automated driving systems, a promising strategy is to understand how humans drive and interact with road users in complicated driving situations. In this paper, we propose a 3D-aware egocentric spatial-temporal interaction framework for automated driving applications. Graph convolutional networks (GCNs) are devised for interaction modeling. We introduce three novel concepts into GCNs. First, we decompose egocentric interactions into ego-thing and ego-stuff interaction, modeled by two GCNs. In both GCNs, ego nodes are introduced to encode the interaction between thing objects (e.g., car and pedestrian), and interaction between stuff objects (e.g., lane marking and traffic light). Second, objects' 3D locations are explicitly incorporated into the GCNs to better model egocentric interactions. Third, to implement ego-stuff interaction in the GCN, we propose a MaskAlign operation to extract features for irregular objects. We validate the proposed framework on tactical driver behavior recognition. Extensive experiments are conducted using the Honda Research Institute Driving Dataset, the largest dataset with diverse tactical driver behavior annotations. Our framework demonstrates a substantial performance boost over baselines in the two experimental settings, by 3.9% and 6.0%, respectively. Furthermore, we visualize the learned affinity matrices, which encode ego-thing and ego-stuff interactions, to show that the proposed framework can capture interactions effectively.

preprint · 2020 · arXiv

Low Rank Density Matrix Evolution for Noisy Quantum Circuits

In this work, we present an efficient rank-compression approach for the classical simulation of Kraus decoherence channels in noisy quantum circuits. The approximation is achieved through iterative compression of the density matrix based on its leading eigenbasis during each simulation step without the need to store, manipulate, or diagonalize the full matrix. We implement this algorithm in an in-house simulator, and show that the low-rank algorithm speeds up simulations by more than two orders of magnitude over an existing full-rank simulator implementation, with negligible error in the target noise and final observables. Finally, we demonstrate the utility of the low-rank method on representative problems of interest by using the algorithm to speed up noisy simulations of Grover's search algorithm and quantum chemistry solvers.
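One standard way to realize this kind of rank compression, given here as a hedged sketch rather than the authors' algorithm, is to carry a thin factor of the density matrix instead of the matrix itself. The symbols $L$, $\tilde{L}$, $K_j$, $d$, $r$, and $m$ are notation introduced for this illustration only.

```latex
% Low-rank factor in place of the full d x d density matrix
\rho \approx L L^{\dagger}, \qquad L \in \mathbb{C}^{d \times r}, \quad r \ll d

% A Kraus channel with operators K_1, ..., K_m acts on the factor by stacking columns
\mathcal{E}(\rho) = \sum_{j=1}^{m} K_j \rho K_j^{\dagger}
  \approx \tilde{L} \tilde{L}^{\dagger},
\qquad
\tilde{L} = \begin{bmatrix} K_1 L & K_2 L & \cdots & K_m L \end{bmatrix}
  \in \mathbb{C}^{d \times mr}

% Compress back to rank r with a thin SVD of the factor
\tilde{L} = U \Sigma V^{\dagger}
\quad \Longrightarrow \quad
L' = U_{:,\,1:r} \, \Sigma_{1:r,\,1:r}
```

Since $\tilde{L}\tilde{L}^{\dagger} = U \Sigma^{2} U^{\dagger}$, the product $L' L'^{\dagger}$ keeps the $r$ leading eigenpairs of the evolved state, and the full $d \times d$ matrix is never formed. Renormalizing $L'$ so that $\mathrm{tr}(L' L'^{\dagger}) = 1$ after truncation is a further assumption of this sketch.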

preprint · 2020 · arXiv

Uncertainty-aware Self-supervised 3D Data Association

3D object trackers usually require training on large amounts of annotated data that is expensive and time-consuming to collect. Instead, we propose leveraging vast unlabeled datasets by self-supervised metric learning of 3D object trackers, with a focus on data association. Large scale annotations for unlabeled data are cheaply obtained by automatic object detection and association across frames. We show how these self-supervised annotations can be used in a principled manner to learn point-cloud embeddings that are effective for 3D tracking. We estimate and incorporate uncertainty in self-supervised tracking to learn more robust embeddings, without needing any labeled data. We design embeddings to differentiate objects across frames, and learn them using uncertainty-aware self-supervised training. Finally, we demonstrate their ability to perform accurate data association across frames, towards effective and accurate 3D tracking. Project videos and code are at https://jianrenw.github.io/Self-Supervised-3D-Data-Association.

preprint · 2020 · arXiv

Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal Inference

A significant number of people die in road accidents due to driver errors. To reduce fatalities, intelligent driving systems that help drivers identify potential risks are urgently needed. Risky situations are generally defined based on collision prediction in existing work. However, collision is only one source of potential risk, and a more generic definition is required. In this work, we propose a novel driver-centric definition of risk, i.e., objects influencing drivers' behavior are risky. A new task called risk object identification is introduced. We formulate the task as a cause-effect problem and present a novel two-stage risk object identification framework based on causal inference with the proposed object-level manipulable driving model. We demonstrate favorable performance on risk object identification compared with strong baselines on the Honda Research Institute Driving Dataset (HDD). Our framework achieves a substantial average performance boost over a strong baseline by 7.5%.
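The cause-effect framing above can be illustrated with a toy intervention loop: remove one object at a time from the scene and check how much a driving model's predicted probability of "go" rises. This is only a sketch of the general idea under stated assumptions, not the paper's two-stage framework. The scene representation, `toy_predict_go`, and the scoring rule are all hypothetical.

```python
def identify_risk_object(objects, predict_go):
    """Return (object, gain) for the object whose removal most increases P(go).

    objects:    list of scene objects (any representation the model accepts).
    predict_go: hypothetical callable mapping a list of objects to P(go).
    """
    baseline = predict_go(objects)
    best_obj, best_gain = None, 0.0
    for i, obj in enumerate(objects):
        # Intervention: the same scene with object i removed.
        without = objects[:i] + objects[i + 1:]
        gain = predict_go(without) - baseline
        if gain > best_gain:
            best_obj, best_gain = obj, gain
    return best_obj, best_gain


def toy_predict_go(objects):
    """Stand-in driving model: each object independently blocks with some weight."""
    p_go = 1.0
    for _name, blocking in objects:
        p_go *= 1.0 - blocking
    return p_go


if __name__ == "__main__":
    scene = [
        ("parked car", 0.05),
        ("crossing pedestrian", 0.9),
        ("traffic cone", 0.1),
    ]
    print(identify_risk_object(scene, toy_predict_go))
```

In this toy scene, removing the crossing pedestrian raises the predicted probability of "go" the most, so it is returned as the risk object.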

preprint · 2019 · arXiv

Quantum Engineered Kondo Lattices

Recent advances in atomic manipulation techniques have provided a novel bottom-up approach to investigating the unconventional properties and complex phases of strongly correlated electron materials. By engineering artificial condensed matter systems containing tens to thousands of atoms with tailored electronic or magnetic properties, it has become possible to explore how quantum many-body effects, whose existence lies at the heart of strongly correlated materials, emerge as the size of a system is increased from the nanoscale to the mesoscale. Here we investigate both theoretically and experimentally the quantum engineering of nanoscopic Kondo lattices (Kondo droplets) that exemplify nanoscopic replicas of heavy-fermion materials. We demonstrate that by changing a droplet's real-space geometry, it is possible not only to create coherently coupled Kondo droplets whose properties asymptotically approach those of a quantum-coherent Kondo lattice, but also to markedly increase or decrease the droplet's Kondo temperature. Furthermore, we report on the discovery of a new quantum phenomenon, the Kondo echo, a signature of droplets containing Kondo holes functioning as direct probes of spatially extended, quantum-coherent Kondo cloud correlations.

preprint · 2011 · arXiv

On Sharing Viral Video over an Ad Hoc Wireless Network

We consider the problem of broadcasting a viral video (a large file) over an ad hoc wireless network (e.g., students on a campus). Many smartphones are GPS-enabled and equipped with a peer-to-peer (ad hoc) transmission mode, allowing them to wirelessly exchange files over short distances rather than use the carrier's WAN. The demand for the file, however, is transmitted through the social network (e.g., a YouTube link posted on Facebook). To address this coupled-network problem (demand on the social network; bandwidth on the wireless network) where the two networks have different topologies, we propose a file dissemination algorithm. In our scheme, users query their social network to find geographically nearby friends that have the desired file, and utilize the underlying ad hoc network to route the data via multi-hop transmissions. We show that for many popular models for social networks, the file dissemination time scales sublinearly with n, the number of users, compared to the linear scaling required if each user who wants the file must download it from the carrier's WAN.
