Source author record

Tsung-Wei Huang

Tsung-Wei Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing physics.data-an quant-ph

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient implementations of heterogeneous decomposition strategies. Our new CPU-GPU programming model allows users to express a problem in a way that adapts to effective separation of concerns and expertise encapsulation. Compared with existing libraries, Heteroflow is more cost-efficient in performance scaling, programming productivity, and solution generality. We have evaluated Heteroflow on two real applications in VLSI design automation and demonstrated the performance scalability across different CPU-GPU numbers and problem sizes. At a particular example of VLSI timing analysis with million-scale tasking, Heteroflow achieved 7.7x runtime speed-up (99 vs 13 minutes) over a baseline on a machine of 40 CPU cores and 4 GPUs.

preprint2022arXiv

Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient for algorithms that only exploit task parallelism in pipeline. As a result, we introduce a new task-parallel pipeline programming framework called Pipeflow. Pipeflow does not design yet another data abstraction but focuses on the pipeline scheduling itself, enabling more efficient implementation of task-parallel pipeline algorithms than existing frameworks. We have evaluated Pipeflow on both micro-benchmarks and real-world applications. As an example, Pipeflow outperforms oneTBB 24% and 10% faster in a VLSI placement and a timing analysis workloads that adopt pipeline parallelism to speed up runtimes, respectively.

preprint2020arXiv

Mermin's Inequalities of Multiple qubits with Orthogonal Measurements on IBM Q 53-qubit system

Entanglement properties of IBM Q 53 qubit quantum computer are carefully examined with the noisy intermediate-scale quantum (NISQ) technology. We study GHZ-like states with multiple qubits (N=2 to N=7) on IBM Rochester and compare their maximal violation values of Mermin polynomials with analytic results. A rule of N-qubits orthogonal measurements is taken to further justify the entanglement less than maximal values of local realism (LR). The orthogonality of measurements is another reliable criterion for entanglement except the maximal values of LR. Our results indicate that the entanglement of IBM 53-qubits is reasonably good when N <= 4 while for the longer entangle chains the entanglement is only valid for some special connectivity.