Source author record

Frank Li

Frank Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Cryptography and Security; Artificial Intelligence; Computer Vision; Databases; Networking and Internet Architecture; Neural and Evolutionary Computing

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators


preprint, 2026, arXiv

Improving Database Performance by Application-side Transaction Merging

This paper explores a new opportunity to improve the performance of transaction processing at the application side by merging structurally similar statements or transactions. Concretely, we re-write transactions to 1) merge similar statements using specific SQL semantics; 2) eliminate redundant reads; and 3) merge contending statements across transactions by pre-computing their aggregated effect. Following this idea, we present the design of TransactionMerger, a middleware to collect and merge transactions across different clients. We further present a static analysis tool to identify the merging opportunity without violating isolation as well as our experience of re-writing transactions in TPC-C and Spree, a popular real-world application. Our evaluation shows that such transaction merging can improve TPC-C throughput by up to 2.65X and Spree throughput by 3.52X.
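
The third rewrite in the abstract above, pre-computing the aggregated effect of contending statements, can be illustrated with a minimal sketch. It is not TransactionMerger's implementation: the `stock` table, the function names, and the batch of pending decrements are all hypothetical, and the sketch skips the isolation analysis that the paper assigns to its static analysis tool.

```python
import sqlite3
from collections import defaultdict

def merge_decrements(updates):
    """Collapse (item_id, amount) decrements into one total per item."""
    totals = defaultdict(int)
    for item_id, amount in updates:
        totals[item_id] += amount
    return dict(totals)

def apply_merged(conn, updates):
    """Issue one UPDATE per distinct item instead of one per request."""
    merged = merge_decrements(updates)
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "UPDATE stock SET qty = qty - ? WHERE id = ?",
            [(amount, item_id) for item_id, amount in merged.items()],
        )
    return len(merged)

# Hypothetical table and workload, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

# Four decrements from different clients, three contending on item 1.
pending = [(1, 3), (2, 1), (1, 4), (1, 2)]
statements_issued = apply_merged(conn, pending)
final_stock = dict(conn.execute("SELECT id, qty FROM stock ORDER BY id"))
```

Here four requested decrements become two UPDATE statements, so the contended row for item 1 is written once instead of three times.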

preprint, 2026, arXiv

MultiMedVision: Multi-Modal Medical Vision Framework

Multi-modal medical imaging enables comprehensive diagnostics, yet current foundation models process 2D (e.g. X-ray) and 3D (e.g. CT) data with separate, dimensionality-specific architectures. We present MultiMedVision, a unified framework for joint 2D/3D representation learning built on a Sparse Vision Transformer. Our model uses 3D Rotary Positional Embeddings and variable-length sequence packing to process mixed-modality batches natively within a shared latent space, without modality-specific adapters or treating 3D volumes as 2D slice sequences. Trained with a self-supervised objective on chest X-rays (MIMIC-CXR) and CT scans (CT-RATE), and using a single shared encoder with 5x less data, MultiMedVision achieves competitive performance on both 2D benchmarks (Macro AUROC 0.82 on MIMIC, 0.84 on CheXpert) and 3D tasks (0.85 on CT-RATE). Analysis of the learned representations reveals coexisting modality-specific and shared feature subspaces, demonstrating that unified cross-dimensional representation learning is feasible without sacrificing modality-specific performance.

preprint, 2024, arXiv

Web Neural Network with Complete DiGraphs

This paper introduces a new neural network model that aims to mimic the biological brain more closely by structuring the network as a complete directed graph that processes continuous data for each timestep. Current neural networks have structures that vaguely mimic the brain structure, such as neurons, convolutions, and recurrence. The model proposed in this paper adds additional structural properties by introducing cycles into the neuron connections and removing the sequential nature commonly seen in other network layers. Furthermore, the model has continuous input and output, inspired by spiking neural networks, which allows the network to learn a process of classification, rather than simply returning the final result.

preprint, 2020, arXiv

Cleaning the NVD: Comprehensive Quality Assessment, Improvements, and Analyses

Vulnerability databases are vital sources of information on emergent software security concerns. Security professionals, from system administrators to developers to researchers, heavily depend on these databases to track vulnerabilities and analyze security trends. How reliable and accurate are these databases though? In this paper, we explore this question with the National Vulnerability Database (NVD), the U.S. government's repository of vulnerability information that arguably serves as the industry standard. Through a systematic investigation, we uncover inconsistent or incomplete data in the NVD that can impact its practical uses, affecting information such as the vulnerability publication dates, names of vendors and products affected, vulnerability severity scores, and vulnerability type categorizations. We explore the extent of these discrepancies and identify methods for automated corrections. Finally, we demonstrate the impact that these data issues can pose by comparing analyses using the original and our rectified versions of the NVD. Ultimately, our investigation of the NVD not only produces an improved source of vulnerability information, but also provides important insights and guidance for the security community on the curation and use of such data sources.

preprint, 2019, arXiv

IoT Inspector: Crowdsourcing Labeled Network Traffic from Smart Home Devices at Scale

The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices from within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Since April 2019, 4,322 users have installed IoT Inspector, allowing us to collect labeled network traffic from 44,956 smart home devices across 13 categories and 53 vendors. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors use outdated TLS versions and advertise weak ciphers. Second, we discover about 350 distinct third-party advertiser and tracking domains on smart TVs. We also highlight other research areas, such as network management and healthcare, that can take advantage of IoT Inspector's dataset. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.

preprint, 2015, arXiv

Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners

The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system. Prior works have all assumed the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios, there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions on prior single data owner systems.
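
For readers unfamiliar with the substitute classifier named in the abstract above, the following is a plaintext sketch of classification by Gaussian kernel density estimation. It contains none of the paper's privacy-preserving machinery; the function names, the `bandwidth` default, and the toy data are assumptions made for illustration. Each class is scored by an unnormalized sum of kernel weights, so every training point contributes a smooth weight rather than a hard vote among the k nearest.

```python
import math
from collections import defaultdict

def gaussian_kernel(x, y, bandwidth):
    """Gaussian weight that decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))

def kde_classify(query, points, labels, bandwidth=1.0):
    """Return the label with the largest summed kernel weight, plus all scores."""
    scores = defaultdict(float)
    for point, label in zip(points, labels):
        scores[label] += gaussian_kernel(query, point, bandwidth)
    return max(scores, key=scores.get), dict(scores)

# Hypothetical toy data: two well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]

near_a, scores_a = kde_classify((0.5, 0.5), points, labels)
near_b, scores_b = kde_classify((5.5, 5.0), points, labels)
```

The bandwidth plays a role loosely comparable to k in k-NN: a larger value lets more distant points influence the decision.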

preprint, 2012, arXiv

SybilControl: Practical Sybil Defense with Computational Puzzles

Many distributed systems are subject to the Sybil attack, where an adversary subverts system operation by emulating behavior of multiple distinct nodes. Most recent work to address this problem leverages social networks to establish trust relationships between users. However, the use of social networks is not appropriate in all systems, as they can be subverted by social engineering techniques, require nodes in a P2P network to maintain and be aware of social network information, and may require overly optimistic assumptions about the fast-mixing nature of social links. This paper explores an alternate approach. We present SybilControl, a novel, decentralized scheme for controlling the extent of Sybil attacks. SybilControl is an admission control mechanism for nodes in a distributed system that requires them to periodically solve computational puzzles. SybilControl consists of a distributed protocol to allow nodes to collectively verify the computational work of other nodes, and mechanisms to prevent the malicious influence of misbehaving nodes that do not perform the computational work. We investigate the practical issues involved with deploying SybilControl into existing DHTs, particularly with resilient lookup protocols. We evaluate SybilControl through simulations and find that SybilControl retains low overhead and latency. Additionally, even when the adversary controls 20% of the system's computational resources, SybilControl-enabled DHTs can be configured to maintain lookup performance at over 99% success rate using low communication overhead.
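
The abstract above does not specify how SybilControl constructs its puzzles, so the sketch below shows only the general idea of a computational puzzle, using a generic hash-based construction: find a nonce such that the SHA-256 digest of the challenge and nonce starts with a given number of zero bits. The challenge string, the difficulty value, and all function names are hypothetical. Solving takes many hash evaluations on average, while verifying takes one.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def puzzle_digest(challenge: bytes, nonce: int) -> bytes:
    """Hash the challenge together with a fixed-width encoding of the nonce."""
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()

def solve(challenge: bytes, difficulty: int) -> int:
    """Search nonces in order until the digest has enough leading zero bits."""
    for nonce in count():
        if leading_zero_bits(puzzle_digest(challenge, nonce)) >= difficulty:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Check a claimed solution with a single hash evaluation."""
    return leading_zero_bits(puzzle_digest(challenge, nonce)) >= difficulty

# Hypothetical challenge; a real scheme would derive it so it cannot be precomputed.
challenge = b"node-42|epoch-7"
nonce = solve(challenge, 12)  # about 2**12 hash evaluations on average
```

Because the cost of solving scales with the number of identities an adversary wants to maintain, a periodic puzzle of this kind ties admission to computational resources rather than to social links.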