Researcher profile

Wei Han

Wei Han contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
33 works
0 followers
15 topics
4 close collaborators

Published work

33 published items

preprint, 2026, arXiv

LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation

Remote sensing lithology interpretation is fundamental to geological surveys, mineral exploration, and regional geological mapping. Unlike general land-cover recognition, lithology interpretation is a knowledge-intensive task that requires experts to infer rock types from various features, e.g., subtle visual, spectral, textural, geomorphological, and contextual cues, making reliable automated interpretation highly challenging. Geological knowledge-guided large multimodal models offer new opportunities, yet their evaluation remains constrained by the lack of benchmarks that capture lithological annotations, multi-level geological semantics, and expert-informed assessment. Here, we propose LithoBench, a multi-level benchmark for evaluating geological semantic understanding in remote sensing lithology interpretation. LithoBench contains 10,000 expert-annotated interpretation instances across 12 representative lithological categories, including 4,000 multiple-choice and 6,000 open-ended tasks organized into five cognitive levels: Identification and Description, Comparative Analysis, Mechanism Explanation, Practical Application, and Comprehensive Reasoning. We further develop an expert-in-the-loop, knowledge-grounded semi-automated construction pipeline, coupling multiple sub-processes, e.g., structured geological image descriptions, to enhance geological validity and evaluation reliability. Experiments with multiple large vision-language models reveal substantial limitations in geological semantic understanding, particularly on higher-order explanation, application, and reasoning tasks.

preprint, 2023, arXiv

Electromagnetic-Compliant Channel Modeling and Performance Evaluation for Holographic MIMO

Recently, the concept of holographic multiple-input multiple-output (MIMO) is emerging as one of the promising technologies beyond massive MIMO. Many challenges need to be addressed to bring this novel idea into practice, including electromagnetic (EM)-compliant channel modeling and accurate performance evaluation. In this paper, an EM-compliant channel model is proposed for the holographic MIMO systems, which is able to model both the characteristics of the propagation channel and the non-ideal factors caused by mutual coupling at the transceivers, including the antenna pattern distortion and the decrease of antenna efficiency. Based on the proposed channel model, a more realistic performance evaluation is conducted to show the performance of the holographic MIMO system in both the single-user and the multi-user scenarios. Key challenges and future research directions are further provided based on the theoretical analyses and numerical results.

preprint, 2022, arXiv

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

Building inclusive speech recognition systems is a crucial step towards developing technologies that speakers of all language varieties can use. Therefore, ASR systems must work for everybody independently of the way they speak. To accomplish this goal, there should be available data sets representing language varieties, and also an understanding of model configuration that is the most helpful in achieving robust understanding of all types of speech. However, there are not enough data sets for accented speech, and for the ones that are already available, more training approaches need to be explored to improve the quality of accented speech recognition. In this paper, we discuss recent progress towards developing more inclusive ASR systems, namely, the importance of building new data sets representing linguistic diversity, and exploring novel training approaches to improve performance for all users. We address recent directions within benchmarking ASR systems for accented speech, measure the effects of wav2vec 2.0 pre-training on accented speech recognition, and highlight corpora relevant for diverse ASR evaluations.

preprint, 2022, arXiv

An Efficient Two-Stage SPARC Decoder for Massive MIMO Unsourced Random Access

In this paper, we study a concatenated coding scheme based on sparse regression code (SPARC) and tree code for unsourced random access in massive multiple-input and multiple-output systems. We focus on efficient decoding for the inner SPARC with practical concerns. A two-stage method is proposed to achieve near-optimal performance while maintaining low computational complexity. Specifically, a one-step thresholding-based algorithm is first used to reduce the large dimensions of the SPARC decoding, after which a relaxed maximum-likelihood estimator is employed for refinement. Simulation results are provided to validate the near-optimal performance and the low computational complexity. In addition, for the covariance-based sparse recovery method, theoretical analyses are given to characterize the upper bound on the number of active users supported when convex relaxation is considered, and the probability of successful dimension reduction by the one-step thresholding-based algorithm.

preprint, 2022, arXiv

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude of dataset sizes, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.

preprint, 2022, arXiv

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification. DoubleMix first leverages a couple of simple augmentation operations to generate several perturbed samples for each training data, and then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models. Concretely, it first mixes up the perturbed data to a synthetic sample and then mixes up the original data and the synthetic perturbed data. DoubleMix enhances models' robustness by learning the "shifted" features in hidden space. On six text classification benchmark datasets, our approach outperforms several popular text augmentation methods including token-level, sentence-level, and hidden-level data augmentation techniques. Also, experiments in low-resource settings show our approach consistently improves models' performance when the training data is scarce. Extensive ablation studies and case studies confirm that each component of our approach contributes to the final performance and show that our approach exhibits superior performance on challenging counterexamples. Additionally, visual analysis shows that text features generated by our approach are highly interpretable. Our code for this paper can be found at https://github.com/declare-lab/DoubleMix.git.
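The two-step interpolation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python lists stand in for hidden-space representations, and the helper names `mix` and `double_mix`, the random normalised weights in step 1, and the fixed coefficient `lam` in step 2 are all assumptions made for illustration.

```python
import random


def mix(vectors, weights):
    """Return the weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def double_mix(original, perturbed, lam=0.7, seed=0):
    """Two-step interpolation on plain lists (hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: blend the perturbed samples into one synthetic sample.
    # Assumption: random weights normalised so that they sum to 1.
    raw = [rng.random() + 1e-9 for _ in perturbed]
    total = sum(raw)
    synthetic = mix(perturbed, [r / total for r in raw])
    # Step 2: blend the original sample with the synthetic sample.
    # Assumption: a fixed coefficient lam that favours the original.
    return mix([original, synthetic], [lam, 1.0 - lam])


original = [1.0, 0.0]
perturbed = [[0.9, 0.1], [0.8, 0.2]]
mixed = double_mix(original, perturbed)
```

With `lam` above 0.5 the mixed vector stays closer to the original than to any perturbed sample, which is one plausible way to keep the original label meaningful for the interpolated point.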

preprint, 2022, arXiv

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-error-rates (WERs) 1.4%/2.6% on the LibriSpeech test/test-other sets against the current state-of-the-art WERs 1.7%/3.3%.

preprint, 2022, arXiv

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally tap into the rich body of prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple: First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens. Second, we achieve consistent quality improvements by scaling the encoder-decoder Transformer model up to 20B parameters, with a new state-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO. Our detailed analysis on Localized Narratives as well as PartiPrompts (P2), a new holistic benchmark of over 1600 English prompts, demonstrates the effectiveness of Parti across a wide variety of categories and difficulty aspects. We also explore and highlight limitations of our models in order to define and exemplify key areas of focus for further improvements. See https://parti.research.google/ for high-resolution images.

preprint, 2022, arXiv

Semantic Compression with Side Information: A Rate-Distortion Perspective

We consider the semantic rate-distortion problem motivated by task-oriented video compression. The semantic information corresponding to the task, which is not observable to the encoder, influences the observations through a joint probability distribution. The similarities among intra-frame segments and inter-frames in video compression are formulated as side information available at both the encoder and the decoder. The decoder is interested in recovering the observation and making an inference of the semantic information under certain distortion constraints. We establish the information-theoretic limits for the tradeoff between compression rates and distortions by fully characterizing the rate-distortion function. We further evaluate the rate-distortion function under specific Markov conditions for three scenarios: i) both the task and the observation are binary sources; ii) the task is a binary classification of an integer observation as even and odd; iii) Gaussian correlated task and observation. We also illustrate through numerical results that recovering only the semantic information can reduce the coding rate compared to recovering the source observation.

preprint, 2022, arXiv

Simulation and detection of Weyl fermions in ultracold Fermi gases with Raman-assisted spin-orbit coupling

The Weyl fermion, also referred to as a pseudo-magnetic monopole in momentum space, is an undiscovered massless elementary particle with half-integer spin predicted according to relativistic quantum field theory. Motivated by the recent experimental observation of Weyl semimetal band in ultracold Bose gases with Raman-assisted 3D spin-orbit coupling, we investigate the properties and possible observation of Weyl fermions in the low-energy quasi-particle excitations of ultracold Fermi gases. Following a previous suggestion that the existing Raman lattice scheme can be readily generalized to fermionic systems, here we discuss the movement of the Weyl points in the Brillouin Zone, as well as the creation and annihilation of Weyl fermions by adjusting the effective Zeeman field. The relevant topological properties are also demonstrated by calculating the Chern number. Furthermore, we propose how to experimentally verify the existence of the Weyl fermions and the associated quantum phase transition via density profile measurements.

preprint, 2022, arXiv

Spin Seebeck effect in quantum magnet Pb2V3O9

Spin Seebeck effect (SSE), the generation of spin current from heat, has been extensively studied in a large variety of magnetic materials, including ferromagnets, antiferromagnets, paramagnets, and quantum spin liquids. In this paper, we report the study of the SSE in the single crystalline Pb2V3O9, a spin-gapped quantum magnet candidate with quasi-one-dimensional spin-1/2 chain. Detailed temperature and magnetic field dependences of the SSE are investigated, and the temperature-dependent critical magnetic fields show a strong correlation to the Bose-Einstein condensation phase of the quantum magnet Pb2V3O9. This work shows the potential of using spin current as a probe to study the spin correlation and phase transition properties in quantum magnets.

preprint, 2022, arXiv

Unsupervised Data Selection via Discrete Speech Representation for ASR

Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple and effective unsupervised data selection method which selects acoustically similar speech to a target domain. It takes the discrete speech representation available in common self-supervised learning frameworks as input, and applies a contrastive data selection method on the discrete tokens. Through extensive empirical studies we show that our proposed method reduces the amount of required pre-training data and improves the downstream ASR performance. Pre-training on a selected subset of 6% of the general data pool results in 11.8% relative improvements in LibriSpeech test-other compared to pre-training on the full set. On Multilingual LibriSpeech French, German, and Spanish test sets, selecting 6% data for pre-training reduces word error rate by more than 15% relatively compared to the full set, and achieves competitive results compared to current state-of-the-art performances.

preprint, 2021, arXiv

A Better and Faster End-to-End Model for Streaming ASR

End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay the predictions towards the end and thus has much higher partial latency compared to a conventional ASR model. To address this issue, we look at encouraging the E2E model to emit words early, through an algorithm called FastEmit [3]. Naturally, improving on latency results in a quality degradation. To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR. Secondly, we also explore running a 2nd-pass beam search to improve quality. In order to ensure the 2nd-pass completes quickly, we explore non-causal Conformer layers that feed into the same 1st-pass RNN-T decoder, an algorithm called Cascaded Encoders [5]. Overall, we find that the Conformer RNN-T with Cascaded Encoders offers a better quality and latency tradeoff for streaming ASR.

preprint, 2021, arXiv

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses. In this work, we propose a unified framework, Dual-mode ASR, to train a single end-to-end ASR model with shared weights for both streaming and full-context speech recognition. We show that the latency and accuracy of streaming ASR significantly benefit from weight sharing and joint training of full-context ASR, especially with inplace knowledge distillation during the training. The Dual-mode ASR framework can be applied to recent state-of-the-art convolution-based and transformer-based ASR networks. We present extensive experiments with two state-of-the-art ASR networks, ContextNet and Conformer, on two datasets, a widely used public dataset LibriSpeech and a large-scale dataset MultiDomain. Experiments and ablation studies demonstrate that Dual-mode ASR not only simplifies the workflow of training and deploying streaming and full-context ASR models, but also significantly improves both emission latency and recognition accuracy of streaming ASR. With Dual-mode ASR, we achieve new state-of-the-art streaming ASR results on both LibriSpeech and MultiDomain in terms of accuracy and latency.

preprint, 2021, arXiv

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting fast without degrading quality, as measured by word error rate (WER), is highly challenging. Existing approaches including Early and Late Penalties and Constrained Alignments penalize emission delay by manipulating per-token or per-frame probability prediction in sequence transducer models. While being successful in reducing delay, these approaches suffer from significant accuracy regression and also require additional word alignment information from an existing model. In this work, we propose a sequence-level emission regularization method, named FastEmit, that applies latency regularization directly on per-sequence probability in training transducer models, and does not require any alignment. We demonstrate that FastEmit is more suitable to the sequence-level optimization of transducer models for streaming ASR by applying it to various end-to-end streaming ASR networks including RNN-Transducer, Transformer-Transducer, ConvNet-Transducer and Conformer-Transducer. We achieve 150-300 ms latency reduction with significantly better accuracy over previous techniques on a Voice Search test set. FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th percentile latency from 210 ms to only 30 ms on LibriSpeech.

preprint, 2021, arXiv

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

Streaming end-to-end automatic speech recognition (ASR) models are widely used on smart speakers and on-device applications. Since these models are expected to transcribe speech with minimal latency, they are constrained to be causal with no future context, compared to their non-streaming counterparts. Consequently, streaming models usually perform worse than non-streaming models. We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models. This way, we scale the training of streaming models to up to 3 million hours of YouTube audio. Experiments show that our approach can significantly reduce the word error rate (WER) of RNNT models not only on LibriSpeech but also on YouTube data in four languages. For example, in French, we are able to reduce the WER by 16.4% relative to a baseline streaming model by leveraging a non-streaming teacher model trained on the same amount of labeled data as the baseline.

preprint, 2021, arXiv

Structural Entropy of the Stochastic Block Models

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes for context-dependent data are extremely desirable. A particularly interesting direction is to compress the data while keeping the "structural information" only and ignoring the concrete labelings. Under this direction, Choi and Szpankowski introduced the structures (unlabeled graphs) which allowed them to compute the structural entropy of the Erdős–Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expectation in linear time. In this paper, we consider the Stochastic Block Models with an arbitrary number of parts. Indeed, we define a partitioned structural entropy for Stochastic Block Models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the Stochastic Block Models, and provide a compression scheme that asymptotically achieves this entropy limit.

preprint, 2021, arXiv

TAPL: Dynamic Part-based Visual Tracking via Attention-guided Part Localization

Holistic object representation-based trackers suffer from a performance drop under large appearance changes such as deformation and occlusion. In this work, we propose a dynamic part-based tracker and constantly update the target part representation to adapt to object appearance change. Moreover, we design an attention-guided part localization network to directly predict the target part locations, and determine the final bounding box with the distribution of target parts. Our proposed tracker achieves promising results on various benchmarks: VOT2018, OTB100 and GOT-10k.

preprint, 2020, arXiv

Conformer: Convolution-augmented Transformer for Speech Recognition

Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolutional neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this end, we propose the convolution-augmented transformer for speech recognition, named Conformer. Conformer significantly outperforms the previous Transformer and CNN based models, achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves WER of 2.1%/4.3% without using a language model and 1.9%/3.9% with an external language model on test/test-other. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.

preprint, 2020, arXiv

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel CNN-RNN-transducer architecture, which we call ContextNet. ContextNet features a fully convolutional encoder that incorporates global context information into convolution layers by adding squeeze-and-excitation modules. In addition, we propose a simple scaling method that scales the widths of ContextNet and achieves a good trade-off between computation and accuracy. We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without an external language model (LM), 1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets. This compares to the previous best published system of 2.0%/4.6% with LM and 3.9%/11.3% with 20M parameters. The superiority of the proposed ContextNet model is also verified on a much larger internal dataset.

preprint, 2020, arXiv

FFusionCGAN: An end-to-end fusion method for few-focus images using conditional GAN in cytopathological digital slides

Multi-focus image fusion technologies compress different focus depth images into an image in which most objects are in focus. However, although existing image fusion techniques, including traditional algorithms and deep learning-based algorithms, can generate high-quality fused images, they need multiple images with different focus depths in the same field of view. This criterion may not be met in some cases where time efficiency is required or the hardware is insufficient. The problem is especially prominent in large-size whole slide images. This paper focused on the multi-focus image fusion of cytopathological digital slide images, and proposed a novel method for generating fused images from single-focus or few-focus images based on conditional generative adversarial network (GAN). Through the adversarial learning of the generator and discriminator, the method is capable of generating fused images with clear textures and large depth of field. Combined with the characteristics of cytopathological images, this paper designs a new generator architecture combining U-Net and DenseBlock, which can effectively improve the network's receptive field and comprehensively encode image features. Meanwhile, this paper develops a semantic segmentation network that identifies the blurred regions in cytopathological images. By integrating the network into the generative model, the quality of the generated fused images is effectively improved. Our method can generate fused images from only single-focus or few-focus images, thereby avoiding the problem of collecting multiple images of different focus depths with increased time and hardware costs. Furthermore, our model is designed to learn the direct mapping of input source images to fused images without the need to manually design complex activity level measurements and fusion rules as in traditional methods.

preprint, 2020, arXiv

Rate-Splitting Multiple Access for Downlink Multi-Antenna Communications: Physical Layer Design and Link-level Simulations

Rate-Splitting Multiple Access (RSMA) is an emerging flexible, robust and powerful multiple access scheme for downlink multi-antenna wireless networks. RSMA relies on multi-antenna Rate-Splitting (RS) strategies at the transmitter and Successive Interference Cancellation (SIC) at the receivers, and has the unique ability to partially decode interference and partially treat interference as noise so as to softly bridge the two extremes of fully decoding interference (as in Non-Orthogonal Multiple Access, NOMA) and treating interference as noise (as in Space Division Multiple Access, SDMA or Multi-User Multiple-Input Multiple-Output, MU-MIMO). RSMA has been shown to provide significant room for spectral efficiency, energy efficiency, Quality-of-Service enhancements, robustness to Channel State Information (CSI) imperfections, as well as feedback overhead and complexity reduction, in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths and qualities). RSMA is also deeply rooted in and motivated by recent advances in understanding the fundamental limits of multi-antenna networks with imperfect CSI at the Transmitter (CSIT). In this work, we leverage recent results on the optimization of RSMA and design for the first time its physical layer, accounting for modulation, coding (using polar codes), message split, adaptive modulation and coding, and SIC receiver. Link-level evaluations confirm the significant throughput benefits of RSMA over various baselines such as SDMA and NOMA.

preprint, 2020, arXiv

Rate-Splitting Multiple Access: A New Frontier for the PHY Layer of 6G

To cope efficiently with the high throughput, reliability, heterogeneity of Quality-of-Service (QoS), and massive connectivity requirements of future 6G multi-antenna wireless networks, multiple access and multiuser communication system design needs to depart from conventional interference management strategies, namely fully treating interference as noise (as commonly used in 4G/5G, MU-MIMO, CoMP, Massive MIMO, millimetre wave MIMO) and fully decoding interference (as in Non-Orthogonal Multiple Access, NOMA). This paper is dedicated to the theory and applications of a more general and powerful transmission framework based on Rate-Splitting Multiple Access (RSMA), which splits messages into common and private parts so that interference is partially decoded and the remaining interference is treated as noise. This enables RSMA to softly bridge, and therefore reconcile, the two extreme strategies of fully decoding interference and treating interference as noise, and provides room for spectral efficiency, energy efficiency and QoS enhancements, robustness to imperfect Channel State Information at the Transmitter (CSIT), and complexity reduction. This paper provides an overview of RSMA and its potential to address the requirements of 6G.

preprint, 2020, arXiv

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.

preprint, 2020, arXiv

Streaming Object Detection for 3-D Point Clouds

Autonomous vehicles operate in a dynamic environment, where the speed with which a vehicle can perceive and react impacts the safety and efficacy of the system. LiDAR provides a prominent sensory modality that informs many existing perceptual systems including object detection, segmentation, motion estimation, and action recognition. The latency for perceptual systems based on point cloud data can be dominated by the amount of time for a complete rotational scan (e.g. 100 ms). This built-in data capture latency is artificial, and based on treating the point cloud as a camera image in order to leverage camera-inspired architectures. However, unlike camera sensors, most LiDAR point cloud data is natively a streaming data source in which laser reflections are sequentially recorded based on the precession of the laser beam. In this work, we explore how to build an object detector that removes this artificial latency constraint, and instead operates on native streaming data in order to significantly reduce latency. This approach has the added benefit of reducing the peak computational burden on inference hardware by spreading the computation over the acquisition time for a scan. We demonstrate a family of streaming detection systems based on sequential modeling through a series of modifications to the traditional detection meta-architecture. We highlight how this model may achieve competitive if not superior predictive performance with state-of-the-art, traditional non-streaming detection systems while achieving significant latency gains (e.g. 1/15th to 1/3rd of peak latency). Our results show that operating on LiDAR data in its native streaming formulation offers several advantages for self-driving object detection, advantages that we hope will be useful for any LiDAR perception system where minimizing latency is critical for safe and efficient operation.

preprint2010arXiv

Blow up of Solutions to Semilinear Wave Equations with variable coefficients and boundary

This paper is devoted to studying the following two initial-boundary value problems for semilinear wave equations with variable coefficients on an exterior domain with subcritical exponent in $n$ space dimensions: $u_{tt}-\partial_{i}(a_{ij}(x)\partial_{j}u)=|u|^{p}$, $(x,t)\in \Omega^{c}\times(0,+\infty)$, $n\geq 3$ (0.1), and $u_{tt}-\partial_{i}(a_{ij}(x)\partial_{j}u)=|u_{t}|^{p}$, $(x,t)\in \Omega^{c}\times(0,+\infty)$, $n\geq 1$ (0.2), where $a_{ij}(x)=\delta_{ij}$ when $|x|\geq R$. The exponent $p$ satisfies $1<p<p_{1}(n)$ in (0.1) and $p\leq p_{2}(n)$ in (0.2), where $p_{1}(n)$ is the larger root of the quadratic equation $(n-1)p^{2}-(n+1)p-2=0$ and $p_{2}(n)=\frac{2}{n-1}+1$. It is well known that the numbers $p_{1}(n)$ and $p_{2}(n)$ are the critical exponents. We establish two blowup results for the above two initial-boundary value problems: we prove that there can be no global solutions no matter how small the initial data are, and we also give lifespan estimates of solutions for these problems.
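As a quick numerical illustration of the two critical exponents defined above, the following minimal sketch evaluates the larger root of the quadratic and the closed-form expression for a given dimension. The function names `p1` and `p2` are chosen here for illustration and do not come from the paper. The sketch only handles $n \geq 2$, because both formulas as written divide by $n-1$.

```python
import math


def p1(n: int) -> float:
    """Larger root of (n-1) p^2 - (n+1) p - 2 = 0 (illustrative helper)."""
    if n < 2:
        raise ValueError("the quadratic degenerates for n < 2")
    a, b, c = n - 1, -(n + 1), -2
    disc = b * b - 4 * a * c  # equals n^2 + 10 n - 7
    return (-b + math.sqrt(disc)) / (2 * a)


def p2(n: int) -> float:
    """p2(n) = 2 / (n - 1) + 1 (illustrative helper)."""
    if n < 2:
        raise ValueError("the formula divides by zero for n = 1")
    return 2 / (n - 1) + 1


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"n={n}: p1={p1(n):.4f}, p2={p2(n):.4f}")
```

For $n=3$ the quadratic reduces to $2p^{2}-4p-2=0$, so $p_{1}(3)=1+\sqrt{2}$ and $p_{2}(3)=2$.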

preprint2010arXiv

Epitaxial EuO Thin Films on GaAs

We demonstrate the epitaxial growth of EuO on GaAs by reactive molecular beam epitaxy. Thin films are grown in an adsorption-controlled regime with the aid of an MgO diffusion barrier. Despite the large lattice mismatch, it is shown that EuO grows well on MgO(001) with excellent magnetic properties. Epitaxy on GaAs is cube-on-cube and longitudinal magneto-optic Kerr effect measurements demonstrate a large Kerr rotation of 0.57°, a significant remanent magnetization, and a Curie temperature of 69 K.

preprint2010arXiv

Manipulation of Spin Transport in Graphene by Surface Chemical Doping

The effects of surface chemical doping on spin transport in graphene are investigated by performing non-local measurements in ultrahigh vacuum while depositing gold adsorbates. We demonstrate manipulation of the gate-dependent non-local spin signal as a function of gold coverage. We discover that charged impurity scattering is not the dominant mechanism for spin relaxation in graphene, despite its importance for momentum scattering. Finally, unexpected enhancements of the spin lifetime illustrate the complex nature of spin relaxation in graphene.

preprint2010arXiv

The Effect of Cluster Formation on Graphene Mobility

We investigate the effect of gold (Au) atoms in the form of both point-like charged impurities and clusters on the transport properties of graphene. Cryogenic deposition (18 K) of Au decreases the mobility and shifts the Dirac point in a manner that is consistent with scattering from point-like charged impurities. Increasing the temperature to room temperature promotes the formation of clusters, which is verified with atomic force microscopy. We find that for a fixed amount of Au impurities, the formation of clusters enhances the mobility and causes the Dirac point to shift back towards zero.

preprint2009arXiv

Electrical Detection of Spin Precession in Single Layer Graphene Spin Valves with Transparent Contacts

Spin accumulation and spin precession in single-layer graphene are studied by non-local spin valve measurements at room temperature. The dependence of the non-local magnetoresistance on electrode spacing is investigated and the results indicate a spin diffusion length of ~1.6 microns and a spin injection/detection efficiency of 0.013. Electrical detection of the spin precession confirms that the non-local signal originates from spin injection and transport. Fitting of the Hanle spin precession data yields a spin relaxation time of ~84 ps and a spin diffusion length of ~1.5 microns, which is consistent with the value obtained through the spacing dependence.
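The Hanle-fit values quoted above can be related through the standard diffusive relation $\lambda = \sqrt{D\tau}$ between spin diffusion length, diffusion constant, and spin relaxation time. The abstract does not state this relation or a value for $D$, so the sketch below is an assumption-based consistency check rather than the authors' analysis, and the function names are hypothetical.

```python
import math


def diffusion_constant(lambda_m: float, tau_s: float) -> float:
    """Diffusion constant D = lambda^2 / tau, assuming lambda = sqrt(D * tau)."""
    return lambda_m ** 2 / tau_s


def diffusion_length(d_m2_per_s: float, tau_s: float) -> float:
    """Spin diffusion length lambda = sqrt(D * tau) under the same assumption."""
    return math.sqrt(d_m2_per_s * tau_s)


if __name__ == "__main__":
    tau = 84e-12   # spin relaxation time from the abstract, in seconds
    lam = 1.5e-6   # spin diffusion length from the Hanle fit, in metres
    d = diffusion_constant(lam, tau)
    print(f"implied D = {d:.4f} m^2/s")
    print(f"round trip lambda = {diffusion_length(d, tau) * 1e6:.2f} microns")
```

Under this assumption, the quoted values imply a diffusion constant of roughly 0.027 m²/s.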

preprint2009arXiv

Electron-Hole Asymmetry of Spin Injection and Transport in Single-Layer Graphene

Spin-dependent properties of single-layer graphene (SLG) have been studied by non-local spin valve measurements at room temperature. Gate voltage dependence shows that the non-local magnetoresistance (MR) is proportional to the conductivity of the SLG, which is the predicted behavior for transparent ferromagnetic/nonmagnetic contacts. While the electron and hole bands in SLG are symmetric, gate voltage and bias dependence of the non-local MR reveal an electron-hole asymmetry in which the non-local MR is roughly independent of bias for electrons, but varies significantly with bias for holes.

preprint2009arXiv

Electronic Doping and Scattering by Transition Metals on Graphene

We investigate the effects of transition metals (TM) on the electronic doping and scattering in graphene using molecular beam epitaxy combined with in situ transport measurements. The room temperature deposition of TM onto graphene produces clusters that dope n-type for all TM investigated (Ti, Fe, Pt). We also find that the scattering by TM clusters exhibits different behavior compared to 1/r Coulomb scattering. At high coverage, Pt films are able to produce doping that is either n-type or weakly p-type, which provides experimental evidence for a strong interfacial dipole favoring n-type doping as predicted theoretically.