Source author record

Tianyu Fu

Tianyu Fu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · Hardware Architecture · physics.app-ph · physics.flu-dyn · physics.ins-det

Catalog footprint

What is connected

2 works

5 topics

4 close collaborators

Preprint · 2024 · arXiv

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs

Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLMs' computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads. This paper proposes FlightLLM, enabling efficient LLM inference with a complete mapping flow on FPGAs. In FlightLLM, we highlight an innovative solution in which the computation and memory overheads of LLMs are addressed by utilizing FPGA-specific resources (e.g., DSP48 and heterogeneous memory hierarchy). First, we propose a configurable sparse DSP chain to support different sparsity patterns with high computation efficiency. Second, we propose an always-on-chip decode scheme to boost memory bandwidth with mixed-precision support. Finally, to make FlightLLM available for real-world LLMs, we propose a length adaptive compilation method to reduce the compilation overhead. Implemented on the Xilinx Alveo U280 FPGA, FlightLLM achieves 6.0× higher energy efficiency and 1.8× better cost efficiency against commercial GPUs (e.g., NVIDIA V100S) on modern LLMs (e.g., LLaMA2-7B) using vLLM and SmoothQuant at a batch size of one. FlightLLM beats the NVIDIA A100 GPU with 1.2× higher throughput using the latest Versal VHK158 FPGA.

Preprint · 2020 · arXiv

Measurement of Liquid Flow Rate among the Annular Flow in Vertical Tee Junction

Since the liquid flow rate of annular flow is closely related to heat exchange efficiency, measuring the liquid flow rate of annular flow in a vertical tee junction is of great significance. To acquire this flow rate, a measurement method has been designed that uses the digital subtraction method to measure the thickness of the liquid film under visible light and applies an image feature matching algorithm to obtain the liquid velocity field. Moreover, the accuracy of the liquid film velocity field, as well as the spatial and temporal stability of the mass flow rate, is tested using the algorithms proposed in this study. Experimental results show that the measurement error of our method is approximately 5% in the lower section of the main pipe and in the branch pipe, and lower than 15% in the upper section of the main pipe. Therefore, this method has high accuracy in comparison with other measurement approaches. Our method can be applied to measure and analyse the shape and properties of annular flow in a vertical tee junction.