Source author record

Hang Qi

Hang Qi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Machine Learning; Artificial Intelligence; Computer Vision; Computer Science and Game Theory; Computation and Language; Cryptography and Security; Distributed, Parallel, and Cluster Computing; gr-qc; hep-ph; Information Retrieval; nucl-ex; nucl-th; Social and Information Networks

Catalog footprint

What is connected

10 works

13 topics

4 close collaborators

preprint, 2026, arXiv

Long Range Outlook for Short-Range Correlations

Short-range correlated (SRC) NN pairs are pairs of nucleons with high relative momentum ($p_{\mathrm{rel}} > k_F$, where $k_F \sim 250$ MeV/c is the Fermi momentum in medium to heavy nuclei) and lower center-of-mass momentum. The motivation for studying SRC pairs ranges from a desire to achieve a more comprehensive understanding of the many-body nuclear wave function at high resolution to searching for explicit QCD-dynamics effects within the nuclear medium, not to mention connections to many other open problems in nuclear physics. Exploring short-range correlations was one of the physics motivations for building CEBAF (now Jefferson Lab). Scientists used the high luminosity and high energy of this cutting-edge machine to find kinematics that cleanly showed the signals of short-range correlations. This paved the way in the last two decades for tremendous progress in understanding these correlations. This paper reviews recent progress and highlights outstanding questions and areas that need further study.

preprint, 2022, arXiv

Efficient Image Representation Learning with Federated Sampled Softmax

Learning image representations on decentralized data can bring many benefits in cases where data cannot be aggregated across data silos. Softmax cross entropy loss is highly effective and commonly used for learning image representations. Using a large number of classes has proven to be particularly beneficial for the descriptive power of such representations in centralized learning. However, doing so on decentralized data with Federated Learning is not straightforward as the demand on FL clients' computation and communication increases proportionally to the number of classes. In this work we introduce federated sampled softmax (FedSS), a resource-efficient approach for learning image representation with Federated Learning. Specifically, the FL clients sample a set of classes and optimize only the corresponding model parameters with respect to a sampled softmax objective that approximates the global full softmax objective. We examine the loss formulation and empirically show that our method significantly reduces the number of parameters transferred to and optimized by the client devices, while performing on par with the standard full softmax method. This work creates a possibility for efficiently learning image representations on decentralized data with a large number of classes under the federated setting.
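
The idea of optimizing against a sampled subset of classes can be sketched as follows. This is a minimal illustration, not the paper's formulation: uniform negative sampling without any sampling-bias correction is an assumption, and the names `sampled_softmax_loss`, `class_weights` and `num_negatives` are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax_cross_entropy(logits, target_index):
    """Cross entropy of a softmax over `logits` for the target position."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def sampled_softmax_loss(embedding, class_weights, label, num_negatives, rng):
    """Loss over the true class plus a uniform sample of negative classes.

    Only the rows of `class_weights` listed in the returned subset are read,
    which is what keeps the client-side cost independent of the class count.
    Assumption: negatives are sampled uniformly, with no bias correction.
    """
    negatives = [c for c in range(len(class_weights)) if c != label]
    subset = [label] + rng.sample(negatives, num_negatives)
    logits = [dot(embedding, class_weights[c]) for c in subset]
    return softmax_cross_entropy(logits, 0), subset
```

Because the sampled partition function contains fewer terms than the full one, this sampled loss never exceeds the full softmax loss for the same example, and the two coincide when every negative class is sampled.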

preprint, 2022, arXiv

FedLite: A Scalable Approach for Federated Learning on Resource-constrained Clients

In classical federated learning, the clients contribute to the overall training by communicating local updates for the underlying model on their private data to a coordinating server. However, updating and communicating the entire model becomes prohibitively expensive when resource-constrained clients collectively aim to train a large machine learning model. Split learning provides a natural solution in such a setting, where only a small part of the model is stored and trained on clients while the remaining large part of the model only stays at the servers. However, the model partitioning employed in split learning introduces a significant amount of communication cost. This paper addresses this issue by compressing the additional communication using a novel clustering scheme accompanied by a gradient correction method. Extensive empirical evaluations on image and text benchmarks show that the proposed method can achieve up to $490\times$ communication cost reduction with minimal drop in accuracy, and enables a desirable performance vs. communication trade-off.
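
To make the compression idea concrete, here is a rough sketch of clustering-based vector quantization: the client sends a small codebook of centroids plus one index per vector instead of the raw vectors. It uses plain k-means as a stand-in and omits gradient correction entirely, so it should not be read as the paper's scheme. The helpers `kmeans` and `payload_bits` and the 32-bit float size are assumptions made for illustration.

```python
import math


def kmeans(vectors, k, iters=10):
    """Plain k-means with a naive init (the first k vectors are the seeds)."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def payload_bits(n, d, k, float_bits=32):
    """Bits needed to send n vectors of size d raw versus quantized with k centroids."""
    index_bits = max(1, math.ceil(math.log2(k)))
    raw = n * d * float_bits
    quantized = k * d * float_bits + n * index_bits
    return raw, quantized
```

The receiver reconstructs each vector as `centroids[assign[i]]`. With these assumptions, 1024 vectors of dimension 64 quantized to 16 centroids drop from 2,097,152 bits to 36,864 bits, at the cost of quantization error.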

preprint, 2021, arXiv

Advances and Open Problems in Federated Learning

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

preprint, 2020, arXiv

Black hole production at lepton colliders

Production of black holes has been discussed in a variety of extensions of the Standard Model, and related bounds have been established from data taken at the Large Hadron Collider. We show that, if the Higgs particle has a fully gravitational content via the equivalence principle, enhanced cross-sections of black holes at colliders should be expected within the Standard Model itself. The case of black hole production by precision measurements at electron colliders is discussed. The Coulomb repulsion strongly suppresses the related cross-section with respect to the one based on the hoop conjecture, making the possible production of black holes still unfeasible with current beam technology. At the same time, this suggests the reanalysis of the bounds, based on the hoop conjecture, already determined in hadronic collisions for extra-dimensional models.

preprint, 2020, arXiv

Federated Visual Classification with Real-World Data Distribution

Federated Learning enables visual models to be trained on-device, bringing advantages for user privacy (data need never leave the device), but challenges in terms of data diversity and quality. Whilst typical models in the datacenter are trained using data that are independent and identically distributed (IID), data at source are typically far from IID. Furthermore, differing quantities of data are typically available at each device (imbalance). In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. To do so, we introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits that simulate real-world edge learning scenarios. We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training. The datasets are made available online.
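
For context, the aggregation step of the Federated Averaging baseline can be sketched as a weighted mean of client parameters, with each client weighted by its number of local examples. This is a minimal sketch of that one step only. Local training, client selection, and the FedVC and FedIR variants are not shown, and the function name `fedavg` is hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local example counts.

    client_weights: list of equal-length parameter lists, one per client.
    client_sizes:   number of training examples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two clients with imbalanced data: the larger client dominates the average.
global_weights = fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3])
```

The weighting by example count is where data imbalance enters: a client holding most of the data pulls the global model toward its own local optimum.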

preprint, 2016, arXiv

Lift-Based Bidding in Ad Selection

Real-time bidding (RTB) has become one of the largest online advertising markets in the world. Today the bid price per ad impression is typically decided by the expected value of how it can lead to a desired action event (e.g., registering an account or placing a purchase order) to the advertiser. However, this industry standard approach to decide the bid price does not consider the actual effect of the ad shown to the user, which should be measured based on the performance lift among users who have been or have not been exposed to a certain treatment of ads. In this paper, we propose a new bidding strategy and prove that if the bid price is decided based on the performance lift rather than absolute performance value, advertisers can actually gain more action events. We describe the modeling methodology to predict the performance lift and demonstrate the actual performance gain through blind A/B test with real ad campaigns in an industry-leading Demand-Side Platform (DSP). We also discuss the relationship between attribution models and bidding strategies. We prove that, to move the DSPs to bid based on performance lift, they should be rewarded according to the relative performance lift they contribute.
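
The contrast between the two bidding rules can be illustrated with a small sketch. It assumes the bid is simply proportional to the quantity being priced. The `scale` factor, the clamping of negative lift to zero, and the function names are hypothetical and are not taken from the paper.

```python
def value_based_bid(p_action_if_shown, action_value):
    """Bid on the absolute expected value of the impression."""
    return p_action_if_shown * action_value


def lift_based_bid(p_action_if_shown, p_action_if_not_shown, action_value, scale=1.0):
    """Bid on the incremental effect of showing the ad.

    Assumption: the bid is proportional to the lift, and a negative lift
    yields a zero bid.
    """
    lift = p_action_if_shown - p_action_if_not_shown
    return max(0.0, scale * lift * action_value)


# User A converts often but would do so anyway; user B is persuadable.
user_a = {"shown": 0.05, "not_shown": 0.049}
user_b = {"shown": 0.02, "not_shown": 0.005}
value = 100.0

value_bids = {
    "A": value_based_bid(user_a["shown"], value),
    "B": value_based_bid(user_b["shown"], value),
}
lift_bids = {
    "A": lift_based_bid(user_a["shown"], user_a["not_shown"], value),
    "B": lift_based_bid(user_b["shown"], user_b["not_shown"], value),
}
```

Under the value-based rule user A attracts the higher bid, while under the lift-based rule user B does, because the ad changes B's behavior far more than A's.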

preprint, 2015, arXiv

A Restricted Visual Turing Test for Deep Scene and Event Understanding

This paper presents a restricted visual Turing test (VTT) for story-line based deep understanding in long-term and multi-camera captured videos. Given a set of videos of a scene (such as a multi-room office, a garden, and a parking lot) and a sequence of story-line based queries, the task is to provide answers either simply in binary form "true/false" (to a polar query) or in an accurate natural language description (to a non-polar query). Queries, polar or non-polar, consist of view-based queries, which can be answered from a particular camera view, and scene-centered queries, which involve joint inference across different cameras. The story lines are collected to cover spatial, temporal and causal understanding of input videos. The data and queries distinguish our VTT from recently proposed visual question answering in images and video captioning. A vision system is proposed to perform joint video and query parsing, integrating different vision modules, a knowledge base and a query engine. The system provides unified interfaces for different modules so that individual modules can be reconfigured to test a new method. We provide a benchmark dataset and a toolkit for ontology-guided story-line query generation. The dataset consists of about 93.5 hours of video captured in four different locations and 3,426 queries split into 127 story lines. We also provide a baseline implementation and result analyses.

preprint, 2015, arXiv

Joint Image-Text News Topic Detection and Tracking with And-Or Graph Representation

In this paper, we aim to develop a method for automatically detecting and tracking topics in broadcast news. We present a hierarchical And-Or graph (AOG) to jointly represent the latent structure of both texts and visuals. The AOG embeds a context sensitive grammar that can describe the hierarchical composition of news topics by semantic elements about people involved, related places and what happened, and model contextual relationships between elements in the hierarchy. We detect news topics through a cluster sampling process which groups stories about closely related events. Swendsen-Wang Cuts (SWC), an effective cluster sampling algorithm, is adopted for traversing the solution space and obtaining optimal clustering solutions by maximizing a Bayesian posterior probability. Topics are tracked to deal with the continuously updated news streams. We generate topic trajectories to show how topics emerge, evolve and disappear over time. The experimental results show that our method can explicitly describe the textual and visual data in news videos and produce meaningful topic trajectories. Our method achieves superior performance compared to state-of-the-art methods on both a public dataset Reuters-21578 and a self-collected dataset named UCLA Broadcast News Dataset.

preprint, 2015, arXiv

Smart Pacing for Effective Online Ad Campaign Optimization

In targeted online advertising, advertisers seek to maximize campaign performance under delivery constraints within a budget schedule. Most advertisers prefer to impose a delivery constraint that spends the budget smoothly over time in order to reach a wider range of audiences and have a sustainable impact. Because many impressions are traded through public auctions for online advertising today, liquidity makes the price elasticity and bid landscape between demand and supply change quite dynamically. Therefore, it is challenging to perform smooth pacing control and maximize campaign performance simultaneously. In this paper, we propose a smart pacing approach in which the delivery pace of each campaign is learned from both offline and online data to achieve smooth delivery and optimal performance goals. The implementation of the proposed approach in a real DSP system is also presented. Experimental evaluations on both real online ad campaigns and offline simulations show that our approach can effectively improve campaign performance and achieve delivery goals.
