Source author record

Ran Xu

Ran Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Machine Learning physics.comp-ph cond-mat.mtrl-sci Cryptography and Security Distributed, Parallel, and Cluster Computing Human-Computer Interaction Information Theory math.IT math.NA Networking and Internet Architecture Neurons and Cognition Numerical Analysis physics.atom-ph

Catalog footprint

What is connected

15works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

Despite extensive efforts on egocentric video datasets and benchmarks, understanding users' internal states, which is crucial for enabling seamless AI assistant experiences, remains largely overlooked. In this work, we introduce EgoIntrospect, the first egocentric dataset captured in user-driven scenarios with self-annotations that explicitly reveal users' interactive intentions with AI assistants. EgoIntrospect was collected using a cross-device setup, providing synchronized video, audio, gaze, motion, and physiological signals. It consists of 180 hours of recordings from 60 subjects, with an average recording duration of 3 hours per subject. Leveraging EgoIntrospect, we formalize a suite of tasks centered on user internal states, including affective experience, interactive intent, and cognitive memory. We further process the annotations to construct benchmarks that evaluate the ability of modern multimodal large language models to reason about users' internal states from egocentric observations. Experiments on our benchmark suggest that existing multimodal large language models struggle to effectively leverage multimodal signals to infer users' subjective internal states. The dataset and annotations will be made publicly available to advance research in egocentric vision and wearable AI assistants. Project page: https://ego-introspect.github.io/

preprint2022arXiv

Field Extraction from Forms with Unlabeled Data

We propose a novel framework to conduct field extraction from forms with unlabeled data. To bootstrap the training process, we develop a rule-based method for mining noisy pseudo-labels from unlabeled forms. Using the supervisory signal from the pseudo-labels, we extract a discriminative token representation from a transformer-based model by modeling the interaction between text in the form. To prevent the model from overfitting to label noise, we introduce a refinement module based on a progressive pseudo-label ensemble. Experimental results demonstrate the effectiveness of our framework.

preprint2022arXiv

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Despite great progress in object detection, most existing methods work only on a limited set of object categories, due to the tremendous human effort needed for bounding-box annotations of training data. To alleviate the problem, recent open vocabulary and zero-shot detection methods attempt to detect novel object categories beyond those seen during training. They achieve this goal by training on a pre-defined base categories to induce generalization to novel objects. However, their potential is still constrained by the small set of base categories available for training. To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs. Our method leverages the localization ability of pre-trained vision-language models to generate pseudo bounding-box labels and then directly uses them for training object detectors. Experimental results show that our method outperforms the state-of-the-art open vocabulary detector by 8% AP on COCO novel categories, by 6.3% AP on PASCAL VOC, by 2.3% AP on Objects365 and by 2.8% AP on LVIS. Code is available at https://github.com/salesforce/PB-OVD.

preprint2022arXiv

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks. In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes. We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint. The loss function is data driven and automatically adapts to arbitrary multi-label structures. Experiments on several datasets show that our relationship-preserving embedding performs well on a variety of tasks and outperform the baseline supervised and self-supervised approaches. Code is available at https://github.com/salesforce/hierarchicalContrastiveLearning.

preprint2022arXiv

Value Retrieval with Arbitrary Queries for Form-like Documents

We propose value retrieval with arbitrary queries for form-like documents to reduce human effort of processing forms. Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of the layout and semantics of a form. To further boost model performance, we propose a simple document language modeling (SimpleDLM) strategy to improve document understanding on large-scale model pre-training. Experimental results show that our method outperforms previous designs significantly and the SimpleDLM further improves our performance on value retrieval by around 17% F1 score compared with the state-of-the-art pre-training method. Code is available at https://github.com/salesforce/QVR-SimpleDLM.

preprint2020arXiv

New Frontiers in IoT: Networking, Systems, Reliability, and Security Challenges

The field of IoT has blossomed and is positively influencing many application domains. In this paper, we bring out the unique challenges this field poses to research in computer systems and networking. The unique challenges arise from the unique characteristics of IoT systems such as the diversity of application domains where they are used and the increasingly demanding protocols they are being called upon to run (such as, video and LIDAR processing) on constrained resources (on-node and network). We show how these open challenges can benefit from foundations laid in other areas, such as, 5G cellular protocols, ML model reduction, and device-edge-cloud offloading. We then discuss the unique challenges for reliability, security, and privacy posed by IoT systems due to their salient characteristics which include heterogeneity of devices and protocols, dependence on the physical environment, and the close coupling with humans. We again show how the open research challenges benefit from reliability, security, and privacy advancements in other areas. We conclude by providing a vision for a desirable end state for IoT systems.

preprint2020arXiv

Proposal Learning for Semi-Supervised Object Detection

In this paper, we focus on semi-supervised object detection to boost performance of proposal-based object detectors (a.k.a. two-stage object detectors) by training on both labeled and unlabeled data. However, it is non-trivial to train object detectors on unlabeled data due to the unavailability of ground truth labels. To address this problem, we present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data. The approach consists of a self-supervised proposal learning module and a consistency-based proposal learning module. In the self-supervised proposal learning module, we present a proposal location loss and a contrastive loss to learn context-aware and noise-robust proposal features respectively. In the consistency-based proposal learning module, we apply consistency losses to both bounding box classification and regression predictions of proposals to learn noise-robust proposal features and predictions. Our approach enjoys the following benefits: 1) encouraging more context information to delivered in the proposals learning procedure; 2) noisy proposal features and enforcing consistency to allow noise-robust object detection; 3) building a general and high-performance semi-supervised object detection framework, which can be easily adapted to proposal-based object detectors with different backbone architectures. Experiments are conducted on the COCO dataset with all available labeled and unlabeled data. Results demonstrate that our approach consistently improves the performance of fully-supervised baselines. In particular, after combining with data distillation, our approach improves AP by about 2.0% and 0.9% on average compared to fully-supervised baselines and data distillation baselines respectively.

preprint2019arXiv

TempoCave: Visualizing Dynamic Connectome Datasets to Support Cognitive Behavioral Therapy

We introduce TempoCave, a novel visualization application for analyzing dynamic brain networks, or connectomes. TempoCave provides a range of functionality to explore metrics related to the activity patterns and modular affiliations of different regions in the brain. These patterns are calculated by processing raw data retrieved functional magnetic resonance imaging (fMRI) scans, which creates a network of weighted edges between each brain region, where the weight indicates how likely these regions are to activate synchronously. In particular, we support the analysis needs of clinical psychologists, who examine these modular affiliations and weighted edges and their temporal dynamics, utilizing them to understand relationships between neurological disorders and brain activity, which could have a significant impact on the way in which patients are diagnosed and treated. We summarize the core functionality of TempoCave, which supports a range of comparative tasks, and runs both in a desktop mode and in an immersive mode. Furthermore, we present a real-world use case that analyzes pre- and post-treatment connectome datasets from 27 subjects in a clinical study investigating the use of cognitive behavior therapy to treat major depression disorder, indicating that TempoCave can provide new insight into the dynamic behavior of the human brain.

preprint2015arXiv

Collecting and Annotating the Large Continuous Action Dataset

We make available to the community a new dataset to support action-recognition research. This dataset is different from prior datasets in several key ways. It is significantly larger. It contains streaming video with long segments containing multiple action occurrences that often overlap in space and/or time. All actions were filmed in the same collection of backgrounds so that background gives little clue as to action class. We had five humans replicate the annotation of temporal extent of action occurrences labeled with their class and measured a surprisingly low level of intercoder agreement. A baseline experiment shows that recent state-of-the-art methods perform poorly on this dataset. This suggests that this will be a challenging dataset to foster advances in action-recognition research. This manuscript serves to describe the novel content and characteristics of the LCA dataset, present the design decisions made when filming the dataset, and document the novel methods employed to annotate the dataset.

preprint2015arXiv

Sequential Labeling with online Deep Learning

Deep learning has attracted great attention recently and yielded the state of the art performance in dimension reduction and classification problems. However, it cannot effectively handle the structured output prediction, e.g. sequential labeling. In this paper, we propose a deep learning structure, which can learn discriminative features for sequential labeling problems. More specifically, we add the inter-relationship between labels in our deep learning structure, in order to incorporate the context information from the sequential data. Thus, our model is more powerful than linear Conditional Random Fields (CRFs) because the objective function learns latent non-linear features so that target labeling can be better predicted. We pretrain the deep structure with stacked restricted Boltzmann machines (RBMs) for feature learning and optimize our objective function with online learning algorithm, a mixture of perceptron training and stochastic gradient descent. We test our model on different challenge tasks, and show that our model outperforms significantly over the completive baselines.

preprint2014arXiv

Compositional Structure Learning for Action Understanding

The focus of the action understanding literature has predominately been classification, how- ever, there are many applications demanding richer action understanding such as mobile robotics and video search, with solutions to classification, localization and detection. In this paper, we propose a compositional model that leverages a new mid-level representation called compositional trajectories and a locally articulated spatiotemporal deformable parts model (LALSDPM) for fully action understanding. Our methods is advantageous in capturing the variable structure of dynamic human activity over a long range. First, the compositional trajectories capture long-ranging, frequently co-occurring groups of trajectories in space time and represent them in discriminative hierarchies, where human motion is largely separated from camera motion; second, LASTDPM learns a structured model with multi-layer deformable parts to capture multiple levels of articulated motion. We implement our methods and demonstrate state of the art performance on all three problems: action detection, localization, and recognition.

preprint2013arXiv

A hybrid molecular dynamics/atomic-scale finite element method for quasi-static atomistic simulations at finite temperature

In this paper, a hybrid quasi-static atomistic simulation method at finite temperature is developed, which combines the advantages of MD for thermal equilibrium and atomic-scale finite element method (AFEM) for efficient equilibration. Some temperature effects are embedded in static AFEM simulation by applying the virtual and equivalent thermal disturbance forces extracted from MD. Alternatively performing MD and AFEM can quickly obtain a series of thermodynamic equilibrium configurations such that a quasi-static process is modeled. Moreover, a stirring-accelerated MD/AFEM fast relaxation approach is proposed, in which the atomic forces and velocities are randomly exchanged to artificially accelerate the "slow processes" such as mechanical wave propagation and thermal diffusion. The efficiency of the proposed methods is demonstrated by numerical examples on single wall carbon nanotubes.

preprint2012arXiv

Random Forests for Metric Learning with Implicit Pairwise Position Dependence

Metric learning makes it plausible to learn distances for complex distributions of data from labeled data. However, to date, most metric learning methods are based on a single Mahalanobis metric, which cannot handle heterogeneous data well. Those that learn multiple metrics throughout the space have demonstrated superior accuracy, but at the cost of computational efficiency. Here, we take a new angle to the metric learning problem and learn a single metric that is able to implicitly adapt its distance function throughout the feature space. This metric adaptation is accomplished by using a random forest-based classifier to underpin the distance function and incorporate both absolute pairwise position and standard relative position into the representation. We have implemented and tested our method against state of the art global and multi-metric methods on a variety of data sets. Overall, the proposed method outperforms both types of methods in terms of accuracy (consistently ranked first) and is an order of magnitude faster than state of the art multi-metric methods (16x faster in the worst case).

preprint2012arXiv

Scalable hierarchical parallel algorithm for the solution of super large-scale sparse linear equations

The parallel linear equations solver capable of effectively using 1000+ processors becomes the bottleneck of large-scale implicit engineering simulations. In this paper, we present a new hierarchical parallel master-slave-structural iterative algorithm for the solution of super large-scale sparse linear equations in distributed memory computer cluster. Through alternatively performing global equilibrium computation and local relaxation, our proposed algorithm will reach the specific accuracy requirement in a few of iterative steps. Moreover, each set/slave-processor majorly communicate with its nearest neighbors, and the transferring data between sets/slave-processors and master is always far below the set-neighbor communication. The corresponding algorithm for implicit finite element analysis has been implemented based on MPI library, and a super large 2-dimension square system of triangle-lattice truss structure under random static loads is simulated with over one billion degrees of freedom and up to 2001 processors on "Exploration 100" cluster in Tsinghua University. The numerical experiments demonstrate that this algorithm has excellent parallel efficiency and high scalability, and it may have broad application in other implicit simulations.

preprint2010arXiv

A Discrete Time Markov Chain Model for High Throughput Bidirectional Fano Decoders

The bidirectional Fano algorithm (BFA) can achieve at least two times decoding throughput compared to the conventional unidirectional Fano algorithm (UFA). In this paper, bidirectional Fano decoding is examined from the queuing theory perspective. A Discrete Time Markov Chain (DTMC) is employed to model the BFA decoder with a finite input buffer. The relationship between the input data rate, the input buffer size and the clock speed of the BFA decoder is established. The DTMC based modelling can be used in designing a high throughput parallel BFA decoding system. It is shown that there is a tradeoff between the number of BFA decoders and the input buffer size, and an optimal input buffer size can be chosen to minimize the hardware complexity for a target decoding throughput in designing a high throughput parallel BFA decoding system.

Ran Xu

What is connected

Connect this record

See the researcher in context

Building this map preview

15 published item(s)

EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

Field Extraction from Forms with Unlabeled Data

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

Value Retrieval with Arbitrary Queries for Form-like Documents

New Frontiers in IoT: Networking, Systems, Reliability, and Security Challenges

Proposal Learning for Semi-Supervised Object Detection

TempoCave: Visualizing Dynamic Connectome Datasets to Support Cognitive Behavioral Therapy

Collecting and Annotating the Large Continuous Action Dataset

Sequential Labeling with online Deep Learning

Compositional Structure Learning for Action Understanding

A hybrid molecular dynamics/atomic-scale finite element method for quasi-static atomistic simulations at finite temperature

Random Forests for Metric Learning with Implicit Pairwise Position Dependence

Scalable hierarchical parallel algorithm for the solution of super large-scale sparse linear equations

A Discrete Time Markov Chain Model for High Throughput Bidirectional Fano Decoders