Source author record

Lizy K. John

Lizy K. John appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Hardware Architecture Distributed, Parallel, and Cluster Computing Machine Learning

Catalog footprint

What is connected

4works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Pitfalls of Unlabeled Disagreement-Based Drift Detection in Streaming Tree Ensembles

Detecting concept drift in high-speed data streams remains challenging, particularly when models must operate on unlabeled data and avoid false alarms caused by benign shifts. While disagreement-based uncertainty has shown promise in neural networks, its adaptation to ensembles of incremental decision trees (IDTs) remains largely unexplored. We investigate this approach by constructing batch-specific disagreement measures via label flipping in ensemble members and evaluating their effectiveness for drift detection in tabular data streams. Our experiments show that, although this method performs well in ensembles of multi-layer perceptrons (MLPs), it consistently underperforms loss-based detectors when applied to IDTs. We attribute this behavior to the intrinsic rigidity of IDTs: learning primarily through structural expansion, with limited parameter adaptation, restricts model plasticity and prevents disagreement from reliably reflecting learning potential. Recent work on restructuring IDTs using their intrinsic decomposition into non-overlapping rules offers a promising direction for improving adaptability.

preprint2022arXiv

CoMeFa: Compute-in-Memory Blocks for FPGAs

Block RAMs (BRAMs) are the storage houses of FPGAs, providing extensive on-chip memory bandwidth to the compute units implemented using Logic Blocks (LBs) and Digital Signal Processing (DSP) slices. We propose modifying BRAMs to convert them to CoMeFa (Compute-In-Memory Blocks for FPGAs) RAMs. These RAMs provide highly-parallel compute-in-memory by combining computation and storage capabilities in one block. CoMeFa RAMs utilize the true dual port nature of FPGA BRAMs and contain multiple programmable single-bit bit-serial processing elements. CoMeFa RAMs can be used to compute in any precision, which is extremely important for evolving applications like Deep Learning. Adding CoMeFa RAMs to FPGAs significantly increases their compute density. We explore and propose two architectures of these RAMs: CoMeFa-D (optimized for delay) and CoMeFa-A (optimized for area). Compared to existing proposals, CoMeFa RAMs do not require changing the underlying SRAM technology like simultaneously activating multiple rows on the same port, and are practical to implement. CoMeFa RAMs are versatile blocks that find applications in numerous diverse parallel applications like Deep Learning, signal processing, databases, etc. By augmenting an Intel Arria-10-like FPGA with CoMeFa-D (CoMeFa-A) RAMs at the cost of 3.8% (1.2%) area, and with algorithmic improvements and efficient mapping, we observe a geomean speedup of 2.5x (1.8x), across several representative benchmarks. Replacing all or some BRAMs with CoMeFa RAMs in FPGAs can make them better accelerators of modern compute-intensive workloads.

preprint2021arXiv

Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

Cross-core communication is increasingly a bottleneck as the number of processing elements increase per system-on-chip. Typical hardware solutions to cross-core communication are often inflexible; while software solutions are flexible, they have performance scaling limitations. A key problem, as we will show, is that of shared state in software-based message queue mechanisms. This paper proposes Virtual-Link (VL), a novel light-weight communication mechanism with hardware support to facilitate M:N lock-free data movement. VL reduces the amount of coherent shared state, which is a bottleneck for many approaches, to zero. VL provides further latency benefit by keeping data on the fast path (i.e., within the on-chip interconnect). VL enables directed cache-injection (stashing) between PEs on the coherence bus, reducing the latency for core-to-core communication. VL is particularly effective for fine-grain tasks on streaming data. Evaluation on a full system simulator with 7 benchmarks shows that VL achieves a 2.09x speedup over state-of-the-art software-based communication mechanisms, while reducing memory traffic by 61%.

preprint2015arXiv

BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads

Long-running service workloads (e.g. web search engine) and short-term data analysis workloads (e.g. Hadoop MapReduce jobs) co-locate in today's data centers. Developing realistic benchmarks to reflect such practical scenario of mixed workload is a key problem to produce trustworthy results when evaluating and comparing data center systems. This requires using actual workloads as well as guaranteeing their submissions to follow patterns hidden in real-world traces. However, existing benchmarks either generate actual workloads based on probability models, or replay real-world workload traces using basic I/O operations. To fill this gap, we propose a benchmark tool that is a first step towards generating a mix of actual service and data analysis workloads on the basis of real workload traces. Our tool includes a combiner that enables the replaying of actual workloads according to the workload traces, and a multi-tenant generator that flexibly scales the workloads up and down according to users' requirements. Based on this, our demo illustrates the workload customization and generation process using a visual interface. The proposed tool, called BigDataBench-MT, is a multi-tenant version of our comprehensive benchmark suite BigDataBench and it is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion/.