Researcher profile

Tianyu Fu

Tianyu Fu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2024arXiv

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs

Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLM's computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads. This paper proposes FlightLLM, enabling efficient LLMs inference with a complete mapping flow on FPGAs. In FlightLLM, we highlight an innovative solution that the computation and memory overhead of LLMs can be solved by utilizing FPGA-specific resources (e.g., DSP48 and heterogeneous memory hierarchy). We propose a configurable sparse DSP chain to support different sparsity patterns with high computation efficiency. Second, we propose an always-on-chip decode scheme to boost memory bandwidth with mixed-precision support. Finally, to make FlightLLM available for real-world LLMs, we propose a length adaptive compilation method to reduce the compilation overhead. Implemented on the Xilinx Alveo U280 FPGA, FlightLLM achieves 6.0$\times$ higher energy efficiency and 1.8$\times$ better cost efficiency against commercial GPUs (e.g., NVIDIA V100S) on modern LLMs (e.g., LLaMA2-7B) using vLLM and SmoothQuant under the batch size of one. FlightLLM beats NVIDIA A100 GPU with 1.2$\times$ higher throughput using the latest Versal VHK158 FPGA.

preprint2020arXiv

Measurement of Liquid Flow Rate among the Annular Flow in Vertical Tee Junction

Since the liquid flow rate of the annular flow is closely related to the heat exchange efficiency, it has great significance to measure the liquid flow rate of the annular flow in vertical tee junction. In order to acquire the liquid flow rate of the annular flow in vertical tee junction, a measurement method has been designed, which implements the digital subtraction method to measure the thickness of the liquid film under the visible light and to apply the image feature matching algorithm to obtain the liquid velocity field. Moreover, the accuracy of the liquid film velocity field as well as the spatial and temporal stability of the mass flow rate is tested by proposed algorithms in this study. Experimental results show that the measurement error of our method is approximately 5% in the lower section of the main pipe and the branch pipe, and lower than 15% in the upper section of the main pipe. Therefore, this method has a high accuracy in comparison with other measurement approaches. Our method can be applied to measure and analyse the shape and property of the annular flow in the vertical tee junction.