Source author record

Yifei Hu

Yifei Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · math.ST · Methodology · Robotics · Statistics Theory

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators


Preprint · 2026 · arXiv

CLAMP: Crowdsourcing a LArge-scale in-the-wild haptic dataset with an open-source device for Multimodal robot Perception

Robust robot manipulation in unstructured environments often requires understanding object properties that extend beyond geometry, such as material or compliance, which can be challenging to infer using vision alone. Multimodal haptic sensing provides a promising avenue for inferring such properties, yet progress has been constrained by the lack of large, diverse, and realistic haptic datasets. In this work, we introduce the CLAMP device, a low-cost (<$200) sensorized reacher-grabber designed to collect large-scale, in-the-wild multimodal haptic data from non-expert users in everyday settings. We deployed 16 CLAMP devices to 41 participants, resulting in the CLAMP dataset, the largest open-source multimodal haptic dataset to date, comprising 12.3 million datapoints across 5357 household objects. Using this dataset, we train a haptic encoder that can infer material and compliance properties of objects from multimodal haptic data. We leverage this encoder to create the CLAMP model, a visuo-haptic perception model for material recognition that generalizes to novel objects and to three robot embodiments with minimal finetuning. We also demonstrate the effectiveness of our model in three real-world robot manipulation tasks: sorting recyclable and non-recyclable waste, retrieving objects from a cluttered bag, and distinguishing overripe from ripe bananas. Our results show that large-scale, in-the-wild haptic data collection can unlock new capabilities for generalizable robot manipulation. Website: https://emprise.cs.cornell.edu/clamp/

Preprint · 2022 · arXiv

Weak Signal Inclusion Under Sparsity and Dependence

We consider the scenario where important signals are not strong enough to be separable from a large amount of noise. Such weak signals commonly exist in large-scale data analysis and play vital roles in many biomedical applications. Existing methods, however, are mostly underpowered for such weak signals. We address the challenge from the perspective of false negative control and develop a new method that efficiently regulates the false negative proportion at a user-specified level. The new method is developed in a realistic setting with arbitrary covariance dependence between variables. We calibrate the overall dependence through a parameter whose scale is compatible with the existing phase diagram in high-dimensional sparse inference. Utilizing the new calibration, we asymptotically explicate the joint effect of covariance dependence, signal sparsity, and signal intensity on the proposed method. We interpret the results using a new phase diagram, which shows that the proposed method can efficiently retain a high proportion of signals even when they cannot be well separated from noise. The finite-sample performance of the proposed method is compared with that of several existing methods in simulation studies. The proposed method outperforms the others in adapting to a user-specified false negative control level. We apply the new method to an fMRI dataset to locate voxels that are functionally relevant to saccadic eye movements. The new method strikes a good balance between identifying functionally relevant regions and avoiding excessive noise voxels.
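The quantity this abstract controls is the false negative proportion (FNP): the fraction of true signals that a selection misses. The sketch below is an oracle illustration only, assuming the true signal set is known (as it would be in a simulation); the paper's method estimates and controls FNP without that knowledge, and the function names, statistics and target level `gamma` here are hypothetical.

```python
def false_negative_proportion(true_signals, selected):
    """Fraction of true signals not in `selected`; 0.0 if there are no signals."""
    true_set = set(true_signals)
    if not true_set:
        return 0.0
    missed = true_set - set(selected)
    return len(missed) / len(true_set)


def smallest_topk_meeting_fnp(stats, true_signals, gamma):
    """Oracle rule (hypothetical): rank variables by |statistic| and return the
    smallest k whose top-k selection has FNP <= gamma."""
    order = sorted(range(len(stats)), key=lambda i: abs(stats[i]), reverse=True)
    for k in range(len(order) + 1):
        if false_negative_proportion(true_signals, order[:k]) <= gamma:
            return k
    return len(order)


# Toy data: variables 0, 2 and 3 are signals; 3 is weak (|-0.9| < 1.5 for noise 5).
stats = [3.1, 0.4, 2.2, -0.9, 0.1, 1.5]
signals = {0, 2, 3}
print(smallest_topk_meeting_fnp(stats, signals, 0.4))  # 2
print(smallest_topk_meeting_fnp(stats, signals, 0.0))  # 4
```

With `gamma = 0.0` the rule must also admit noise variable 5 to reach the weak signal 3, which is the trade-off between retaining weak signals and admitting noise that the abstract describes.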

Preprint · 2021 · arXiv

Misspelling Correction with Pre-trained Contextual Language Model

Spelling irregularities, now known as spelling mistakes, have been observed for several centuries. As humans, we are able to understand most misspelled words based on their location in the sentence, their perceived pronunciation, and their context. Unlike humans, computer systems do not possess the convenient autocomplete functionality of which human brains are capable. While many programs provide spelling correction, many of them do not take context into account. Moreover, Artificial Intelligence systems behave according to the data they are trained on. Because many current Natural Language Processing (NLP) systems are trained on grammatically correct text, many are vulnerable to adversarial examples, yet processing correctly spelled text is crucial for learning. In this paper, we investigate how spelling errors can be corrected in context with the pre-trained language model BERT. We present two experiments, based on BERT and the edit distance algorithm, for ranking and selecting candidate corrections. Our experiments demonstrate that, when combined properly, BERT's contextual word embeddings and edit distance can effectively correct spelling errors.
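The abstract pairs BERT with edit distance to rank candidate corrections but does not say how the two are combined. The sketch below assumes one simple filter-then-rank rule: keep candidates within `max_distance` Levenshtein edits of the misspelling, then order them by distance and break ties with a context score. The context scores are made-up numbers standing in for BERT masked-LM probabilities, and `rank_candidates` and `max_distance` are hypothetical names, not the paper's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_candidates(misspelled, context_scores, max_distance=2):
    """Hypothetical rule: drop candidates farther than max_distance edits,
    then sort by (edit distance ascending, context score descending)."""
    scored = []
    for word, score in context_scores.items():
        d = levenshtein(misspelled, word)
        if d <= max_distance:
            scored.append((d, -score, word))
    scored.sort()
    return [word for _, _, word in scored]


# "I walked to the stroe to buy bread": illustrative context scores only.
scores = {"store": 0.62, "shop": 0.20, "street": 0.05, "stove": 0.01}
print(rank_candidates("stroe", scores))  # ['store', 'street', 'stove']
```

Here "store", "street" and "stove" are all two edits from "stroe", so edit distance alone cannot choose among them; the context score decides, while "shop" (three edits away) is filtered out despite its higher score.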