Source author record

John Chen

John Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Machine Learning, Computation and Language, Data Structures and Algorithms, Fluid Dynamics (physics.flu-dyn)

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

Preprint, 2021, arXiv

Transformer-Based Models for Question Answering on COVID19

In response to Kaggle's COVID-19 Open Research Dataset (CORD-19) challenge, we have proposed three transformer-based question-answering systems using BERT, ALBERT, and T5 models. Since the CORD-19 dataset is unlabeled, we have evaluated the question-answering models' performance on two labeled question-answer datasets: CovidQA and CovidGQA. The BERT-based QA system achieved the highest F1 score (26.32), while the ALBERT-based QA system achieved the highest Exact Match (13.04). However, numerous challenges are associated with developing high-performance question-answering systems for the ongoing COVID-19 pandemic and future pandemics. At the end of this paper, we discuss these challenges and suggest potential solutions to address them.
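The abstract reports F1 and Exact Match but does not say how they were computed. The following is a minimal sketch of the SQuAD-style definitions commonly used for extractive question answering, offered as an assumption rather than the authors' scoring script. The function names and the normalization rules (lowercasing, dropping punctuation and English articles) are illustrative choices.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these definitions, a prediction of "about five days" against a reference of "five days" scores 0 on Exact Match but 0.8 on F1, which is why F1 is usually the higher of the two numbers. Scores are typically averaged over all questions and reported as percentages.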

Preprint, 2020, arXiv

Negative sampling in semi-supervised learning

We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy-to-tune algorithm for semi-supervised learning (SSL). NS3L is motivated by the success of negative sampling/contrastive estimation. We demonstrate that adding the NS3L loss to state-of-the-art SSL algorithms, such as Virtual Adversarial Training (VAT), significantly improves upon vanilla VAT and its variant, VAT with Entropy Minimization. By adding the NS3L loss to MixMatch, the current state-of-the-art approach on semi-supervised tasks, we observe significant improvements over vanilla MixMatch. We conduct extensive experiments on the CIFAR10, CIFAR100, SVHN and STL10 benchmark datasets.

Preprint, 2020, arXiv

Revisiting Consistent Hashing with Bounded Loads

Dynamic load balancing lies at the heart of distributed caching. Here, the goal is to assign objects (load) to servers (computing nodes) in a way that balances load while dynamically adjusting to the addition or removal of servers. One essential requirement is that the addition or removal of small servers should not require us to recompute the complete assignment. A popular and widely adopted solution is the two-decade-old Consistent Hashing (CH). Recently, an elegant extension was provided to account for server bounds. In this paper, we identify that existing methodologies for CH and its variants suffer from cascaded overflow, leading to poor load balancing. This cascading effect leads to decreasing performance of the hashing procedure with increasing load. To overcome the cascading effect, we propose a simple solution to CH based on recent advances in fast minwise hashing. We show, both theoretically and empirically, that our proposed solution is significantly superior for load balancing and is optimal in many senses. On the AOL search dataset and Indiana University Clicks dataset with real user activity, our proposed solution reduces cache misses by several orders of magnitude.
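To make the cascading effect concrete, here is a minimal sketch of the baseline the abstract refers to: consistent hashing with a per-server capacity bound, where an object that lands on a full server walks clockwise to the next server with room. This is not the paper's proposed solution. The class name `BoundedLoadRing`, the SHA-256 based hash, and the single ring position per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (illustrative choice)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class BoundedLoadRing:
    """Consistent hashing where no server holds more than `capacity` objects."""

    def __init__(self, servers, capacity):
        if not servers:
            raise ValueError("at least one server is required")
        self.capacity = capacity
        # One ring position per server, sorted clockwise.
        self.ring = sorted((_hash(s), s) for s in servers)
        self.points = [h for h, _ in self.ring]
        self.load = {s: 0 for s in servers}

    def assign(self, obj):
        """Place `obj` and return (server, number of full servers skipped)."""
        # First server at or after the object's position, wrapping around.
        start = bisect.bisect_left(self.points, _hash(obj)) % len(self.ring)
        for step in range(len(self.ring)):
            _, server = self.ring[(start + step) % len(self.ring)]
            if self.load[server] < self.capacity:
                self.load[server] += 1
                return server, step
        raise RuntimeError("all servers are at capacity")


if __name__ == "__main__":
    ring = BoundedLoadRing([f"server-{i}" for i in range(5)], capacity=25)
    skipped = [ring.assign(f"object-{i}")[1] for i in range(100)]
    print(ring.load, "max skips:", max(skipped))
```

The second value returned by `assign` counts how many full servers were skipped. Overflow from one full server spills onto its clockwise neighbor, which then fills sooner and spills in turn, so the skip count tends to grow as total load approaches total capacity. That growth is the cascade the abstract describes.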

Preprint, 2020, arXiv

STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge

Empirical risk minimization is perhaps the most influential idea in statistical learning, with applications to nearly all scientific and technical domains in the form of regression and classification models. To analyze massive streaming datasets in distributed computing environments, practitioners increasingly prefer to deploy regression models at the edge rather than in the cloud. By keeping data on edge devices, we minimize the energy, communication, and data security risk associated with the model. Although it is equally advantageous to train models at the edge, a common assumption is that the model was originally trained in the cloud, since training typically requires substantial computation and memory. To this end, we propose STORM, an online sketch for empirical risk minimization. STORM compresses a data stream into a tiny array of integer counters. This sketch is sufficient to estimate a variety of surrogate losses over the original dataset. We provide rigorous theoretical analysis and show that STORM can estimate a carefully chosen surrogate loss for the least-squares objective. In an exhaustive experimental comparison for linear regression models on real-world datasets, we find that STORM allows accurate regression models to be trained.

Preprint, 2010, arXiv

High-Frame-Rate Oil Film Interferometry

The fluid dynamics video to which this abstract relates contains visualization of the response of a laminar boundary layer to a sudden puff from a small hole. The boundary layer develops on a flat plate in a wind tunnel; the hole is located at a streamwise Reynolds number of 100,000. The visualization of the boundary layer response is accomplished using interferometry of a transparent, thin film of oil placed on the surface immediately downstream of the hole and with its leading edge perpendicular to the direction of flow. Through lubrication theory, it is understood that the rate of change of the spacing of the interference fringes is proportional to the skin friction at any instant. For reference, a small disk-shaped protrusion of the type often used to trip the boundary layer in wind tunnel model testing is also shown. Three cases with different puff strengths are included. Using a high-speed commercial camera, frame rates in excess of 1000/sec have been recorded; the video shown here was taken at 24 frames/sec to remain within prescribed file size limits.
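The proportionality between fringe-spacing growth and skin friction can be sketched numerically. The code below uses the standard thin-oil-film result, not anything stated in the abstract: under a constant wall shear stress tau, with pressure gradient and gravity neglected, lubrication theory gives a film thickness h = mu * x / (tau * t), and adjacent fringes differ in height by lambda / (2 n cos(theta_r)). The function and parameter names are hypothetical, and the refractive index of air is taken as 1.

```python
import math


def fringe_height_step(wavelength, oil_refractive_index, incidence_angle_rad=0.0):
    """Oil thickness difference between adjacent fringes, in metres."""
    # Snell's law gives the refraction angle inside the oil (air index ~ 1).
    sin_r = math.sin(incidence_angle_rad) / oil_refractive_index
    cos_r = math.sqrt(1.0 - sin_r ** 2)
    return wavelength / (2.0 * oil_refractive_index * cos_r)


def skin_friction(spacing_rate, oil_viscosity, wavelength,
                  oil_refractive_index, incidence_angle_rad=0.0):
    """Wall shear stress (Pa) from the growth rate of fringe spacing (m/s).

    With h = mu * x / (tau * t), the spacing between fringes is
    delta_x = delta_h * tau * t / mu, so tau = mu * d(delta_x)/dt / delta_h.
    """
    delta_h = fringe_height_step(wavelength, oil_refractive_index,
                                 incidence_angle_rad)
    return oil_viscosity * spacing_rate / delta_h


if __name__ == "__main__":
    # Illustrative values only: 589 nm light, oil with n = 1.4 and
    # dynamic viscosity 0.05 Pa*s, fringes spreading at 1 micrometre/second.
    tau = skin_friction(1e-6, 0.05, 589e-9, 1.4)
    print(f"estimated skin friction: {tau:.4f} Pa")
```

In a high-frame-rate recording, `spacing_rate` would come from differencing the measured fringe spacing between successive frames. For an unsteady event such as the puff, the constant-shear assumption holds only approximately over short time windows.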
