Source author record

Minghao Li

Minghao Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computation and Language, Computer Vision, cond-mat.mtrl-sci, Cryptography and Security, Machine Learning, physics.flu-dyn, cond-mat.mes-hall, Human-Computer Interaction, Software Engineering

Catalog footprint

12 works, 9 topics, 4 close collaborators


Preprint, 2026, arXiv

Hint Tuning: Less Data Makes Better Reasoners

Large reasoning models achieve high accuracy through extended chain-of-thought but generate 5–8× more tokens than necessary, applying verbose reasoning uniformly regardless of problem difficulty. We propose Hint Tuning, a data-efficient approach that teaches models to calibrate their reasoning depth. Our key insight is that the corresponding instruct model serves as an ideal difficulty probe. By testing what the instruct model can solve with varying amounts of guidance, we automatically construct training data across three states: No-Hint (direct answer), Sparse-Hint (minimal prefix), and Full-Hint (complete reasoning). This converts the abstract challenge of difficulty labeling into a measurable consistency check between the instruct and reasoning models. With only 1K self-annotated samples, Hint Tuning achieves a 24–66% token reduction (31.5% on average) across mainstream reasoning models (Qwen3-Thinking, DeepSeek-R1-Distill) at multiple scales (4B–32B) while maintaining competitive accuracy on five benchmarks. Unlike methods that require massive distillation datasets or expensive RL, we achieve superior efficiency through simple alignment with the instruct model's capabilities.
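
The three-state labeling can be sketched as a cascade that stops at the cheapest hint under which the instruct model succeeds. This is a minimal illustration, not the authors' pipeline: `instruct_solves` is a hypothetical probe (a real one would prompt the instruct model and compare its answer with the reference), and `sparse_fraction`, the share of reasoning steps kept as the sparse prefix, is an assumed parameter.

```python
from enum import Enum
from typing import Callable, Sequence


class HintState(Enum):
    NO_HINT = "No-Hint"          # train the model to answer directly
    SPARSE_HINT = "Sparse-Hint"  # train on a minimal reasoning prefix
    FULL_HINT = "Full-Hint"      # keep the complete reasoning trace


def label_difficulty(
    reasoning_steps: Sequence[str],
    instruct_solves: Callable[[Sequence[str]], bool],
    sparse_fraction: float = 0.2,
) -> HintState:
    """Return the cheapest hint state under which the instruct model succeeds.

    reasoning_steps: the reasoning model's chain of thought, split into steps.
    instruct_solves: hypothetical probe that returns True when the instruct model
        reaches the reference answer given the supplied steps as a hint prefix.
    sparse_fraction: hypothetical share of steps kept in the sparse hint.
    """
    # No guidance needed: the problem is easy for the instruct model.
    if instruct_solves([]):
        return HintState.NO_HINT
    # A short prefix suffices: moderate difficulty.
    k = max(1, int(len(reasoning_steps) * sparse_fraction))
    if instruct_solves(reasoning_steps[:k]):
        return HintState.SPARSE_HINT
    # Otherwise keep full reasoning; filtering of unsolvable samples is not modeled.
    return HintState.FULL_HINT
```

For example, a probe that succeeds only when it is given every step, `label_difficulty(steps, lambda p: len(p) == len(steps))`, yields `HintState.FULL_HINT`, so that sample would keep its complete reasoning in the training data.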

Preprint, 2026, arXiv

Metric-Normalized Posterior Leakage (mPL): Attacker-Aligned Privacy for Joint Consumption

Metric differential privacy (mDP) strengthens local differential privacy (LDP) by scaling noise to semantic distance, but the outputs of many machine learning (ML) systems are consumed under joint observation, where model-agnostic, per-record guarantees can miss leakage from evidence aggregation. We introduce metric-normalized posterior leakage (mPL), an attacker-aligned, distance-calibrated measure of the posterior-odds shift induced by releases, and show that for single or independent releases, uniformly bounding mPL is equivalent to mDP. Under joint observation, however, satisfying mDP may still leave mPL high, because learned aggregators compound evidence across correlated items. To make control practical, we formalize probabilistically bounded mPL (PBmPL), which limits how often mPL may exceed a target budget, and we operationalize it via Adaptive mPL (AmPL), a trust-and-verify framework that perturbs data, audits the result with a learned attacker, and adapts parameters (with optional Bayesian remapping) to balance privacy and utility. In a word-embedding case study, neural adversaries violate mPL under joint consumption despite per-record mDP perturbations, whereas AmPL substantially lowers the frequency of such violations with little utility loss, indicating that PBmPL is a practical, certifiable protection for joint-consumption settings.
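
The single-release equivalence can be checked numerically. This is a minimal sketch assuming one plausible reading of mPL: the absolute shift from prior log-odds to posterior log-odds between two secrets x and x2 after observing output y, divided by their distance d(x, x2). All function names and the toy mechanism are hypothetical, the uniform weighting over triples in `violation_rate` is an assumption standing in for PBmPL's frequency bound, and the joint-observation setting the paper targets is not modeled.

```python
import math
from itertools import product
from typing import Callable, Dict, Sequence

Likelihood = Callable[[int, int], float]  # likelihood(x, y) = Pr[M(x) = y]


def posterior(prior: Dict[int, float], likelihood: Likelihood, y: int) -> Dict[int, float]:
    """Bayes' rule: Pr[x | y] is proportional to prior(x) * Pr[M(x) = y]."""
    joint = {x: p * likelihood(x, y) for x, p in prior.items()}
    total = sum(joint.values())
    return {x: v / total for x, v in joint.items()}


def mpl(prior, likelihood, dist, x, x2, y) -> float:
    """Log posterior-odds shift between x and x2 after observing y, divided by d(x, x2)."""
    post = posterior(prior, likelihood, y)
    shift = math.log(post[x] / post[x2]) - math.log(prior[x] / prior[x2])
    return abs(shift) / dist(x, x2)


def violation_rate(prior, likelihood, dist, outputs: Sequence[int], budget: float) -> float:
    """Fraction of (x, x2, y) triples with x != x2 whose mPL exceeds the budget.
    Uniform weighting over triples is an illustrative assumption."""
    xs = list(prior)
    triples = [(x, x2, y) for x, x2 in product(xs, xs) if x != x2 for y in outputs]
    exceed = sum(mpl(prior, likelihood, dist, x, x2, y) > budget for x, x2, y in triples)
    return exceed / len(triples)


# Toy single-release mechanism on {0, 1, 2, 3} with d(x, x2) = |x - x2|.
EPS = 1.0
DOMAIN = range(4)
PRIOR = {x: 1 / len(DOMAIN) for x in DOMAIN}


def abs_dist(a: int, b: int) -> float:
    return abs(a - b)


def exp_mechanism(x: int, y: int) -> float:
    """Pr[M(x) = y] proportional to exp(-EPS * |x - y| / 2); this satisfies EPS-mDP."""
    norm = sum(math.exp(-EPS * abs(x - t) / 2) for t in DOMAIN)
    return math.exp(-EPS * abs(x - y) / 2) / norm


print(violation_rate(PRIOR, exp_mechanism, abs_dist, list(DOMAIN), EPS + 1e-9))  # 0.0
```

Because the posterior-odds shift for a single release reduces to the log-likelihood ratio, an EPS-mDP mechanism keeps every mPL value at or below EPS, while any budget tighter than the mechanism allows produces a nonzero violation rate.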

Preprint, 2022, arXiv

Automation Slicing and Testing for in-App Deep Learning Models

Intelligent Apps (iApps), equipped with in-App deep learning (DL) models, are emerging to offer stable DL inference services. However, App marketplaces have trouble testing iApps automatically because the in-App model is a black box and is coupled with ordinary code. In this work, we propose ASTM, an automated tool that enables large-scale testing of in-App models. ASTM takes an iApp as input, and its outputs can replace the in-App model as the test object. ASTM uses two reconstruction techniques: one translates the in-App model into a backpropagation-enabled version, and the other reconstructs the IO processing code for DL inference. With ASTM's help, we perform a large-scale study on the robustness of 100 unique commercial in-App models and find that 56% of them are vulnerable to robustness issues in our setting. ASTM also detects physical attacks against three representative iApps that may cause economic losses and security issues.

Preprint, 2022, arXiv

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Text recognition is a long-standing research problem for document digitalization. Existing approaches are usually built on CNNs for image understanding and RNNs for character-level text generation. In addition, a separate language model is usually needed as a post-processing step to improve overall accuracy. In this paper, we propose TrOCR, an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on printed, handwritten and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.

Preprint, 2022, arXiv

Understanding the Challenges of Team-Based Live Streaming for First-person Shooter Games

First-person shooter (FPS) game tournaments take place across the globe, and a growing number of people choose to watch FPS games online instead of attending the events in person. However, live streaming can miss critical highlight moments in the game, such as kills and tactics. We call such mistakes jarring observations, and to reduce them we identify how and why live streaming teams fail to capture highlight moments. We conducted a field study of live streaming competitions of Game For Peace, a popular FPS mobile game, to summarize five typical jarring observations and identify three primary causes. We further studied how to improve the live streaming system to prevent jarring observations by conducting semi-structured interviews with two professional streaming teams for Game For Peace. The study showed that a better system should (1) add a new sub-team role to share the director's responsibility of managing observers; (2) provide interfaces customized for the three roles of live streamers in the team; (3) abstract more geographical information; (4) predict the priority of observation targets; and (5) provide non-verbal interfaces for sync-up between sub-teams. Our work provides insights for esports streaming system researchers and developers seeking to improve the system for a smoother audience experience.

Preprint, 2020, arXiv

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Pre-training techniques have proven successful in a variety of NLP tasks in recent years. Despite the widespread use of pre-trained models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which benefits a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. LayoutLM achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
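
LayoutLM pairs each text token with its position on the page. A minimal sketch of the usual box preprocessing, assuming pixel coordinates (x0, y0, x1, y1) from an OCR engine and the 0–1000 coordinate scale used by publicly available LayoutLM implementations; the function name and the sample words are hypothetical.

```python
from typing import List, Sequence


def normalize_bbox(box: Sequence[float], page_width: float, page_height: float) -> List[int]:
    """Map a pixel-space box (x0, y0, x1, y1) onto a 0-1000 grid.

    Normalizing by page size makes layout coordinates comparable across
    scans of different resolutions before they are embedded.
    """
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical OCR output: each word with its pixel box on a 1700 x 2200 scan.
words = [("Invoice", (120, 80, 410, 140)), ("Total:", (120, 1900, 300, 1950))]
print([(w, normalize_bbox(b, 1700, 2200)) for w, b in words])
```

Integer grid coordinates keep the layout inputs within a fixed range, so they can index position-embedding tables regardless of the original page resolution.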

Preprint, 2020, arXiv

SegAttnGAN: Text to Image Generation with Segmentation Attention

In this paper, we propose a novel generative network (SegAttnGAN) that uses additional segmentation information for the text-to-image synthesis task. Because the segmentation data introduced to the model provides useful guidance for generator training, the proposed model generates more realistic images and achieves higher quantitative scores than previous state-of-the-art methods. We achieved an Inception Score of 4.84 on the CUB dataset and 3.52 on the Oxford-102 dataset. In addition, we tested a self-attention variant of SegAttnGAN, which uses generated segmentation data instead of masks from the datasets for attention, and achieved similarly high-quality results, suggesting that our model can be adapted to the text-to-image synthesis task.

Preprint, 2020, arXiv

TableBank: A Benchmark Dataset for Table Detection and Recognition

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. Existing research on image-based table detection and recognition usually fine-tunes pre-trained models on out-of-domain data with a few thousand human-labeled examples, which generalizes poorly to real-world applications. Using TableBank, which contains 417K high-quality labeled tables, we build several strong baselines with state-of-the-art deep neural network models. We make TableBank publicly available and hope it will empower more deep learning approaches to table detection and recognition. The dataset and models are available at https://github.com/doc-analysis/TableBank.

Preprint, 2019, arXiv

Ship resistance when operating in floating ice floes: a combined CFD&DEM approach

Whilst climate change is transforming the Arctic into a navigable ocean where small ice floes float on the sea surface, the effect of such ice conditions on ship performance has yet to be understood. The present work combines a set of numerical methods to simulate ship-wave-ice interaction in such ice conditions. In particular, Computational Fluid Dynamics (CFD) provides fluid solutions for the floes and is coupled with the Discrete Element Method (DEM), which governs ice motions and accounts for ship-ice and ice-ice collisions; in this way, the proposed approach innovatively includes wave effects in the interaction. In addition, this work introduces two algorithms that generate computational models of natural ice-floe fields, taking randomness into account and thus achieving high-fidelity modelling of the problem. Following validation against experiments, the model is shown to predict the ice-floe resistance of a ship accurately, and a series of simulations is then performed to investigate how the resistance is influenced by ship speed, ice concentration, ice thickness and floe diameter. This paper presents a useful approach that can provide power estimates for Arctic shipping and has the potential to support other polar engineering applications.
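
To illustrate what generating a random ice-floe field with a prescribed concentration can involve, here is a minimal sketch using random sequential addition of non-overlapping circular floes. This is a generic technique swapped in for illustration, not either of the paper's two algorithms; circular floes, the uniform diameter distribution, and all names and parameters are assumptions.

```python
import math
import random
from typing import List, Tuple

Floe = Tuple[float, float, float]  # (x centre, y centre, radius)


def generate_floe_field(
    length: float,
    width: float,
    concentration: float,
    diameter_range: Tuple[float, float],
    seed: int = 0,
    max_attempts: int = 200_000,
) -> List[Floe]:
    """Randomly place non-overlapping circular floes inside a length x width
    domain until the ice-covered area fraction reaches `concentration`."""
    rng = random.Random(seed)  # seeded so a field can be reproduced
    target_area = concentration * length * width
    floes: List[Floe] = []
    covered = 0.0
    for _ in range(max_attempts):
        if covered >= target_area:
            break
        r = rng.uniform(*diameter_range) / 2
        # Keep the whole floe inside the domain.
        x = rng.uniform(r, length - r)
        y = rng.uniform(r, width - r)
        # Reject candidates that overlap an already placed floe.
        if all(math.hypot(x - fx, y - fy) >= r + fr for fx, fy, fr in floes):
            floes.append((x, y, r))
            covered += math.pi * r * r
    return floes


def concentration_of(floes: List[Floe], length: float, width: float) -> float:
    """Ice concentration as covered area divided by domain area."""
    return sum(math.pi * r * r for _, _, r in floes) / (length * width)


field = generate_floe_field(20.0, 20.0, 0.3, (0.8, 1.2), seed=1)
print(len(field), round(concentration_of(field, 20.0, 20.0), 3))
```

Random sequential addition stalls near its jamming limit (roughly 55% coverage for equal discs), so reproducing high natural concentrations would need a different placement strategy.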

Preprint, 2018, arXiv

Fluid-structure interaction of a large ice sheet in waves

With global warming, ice-covered areas in the Arctic are being transformed into open water. This provides increased impetus for extensive maritime activities and attracts research interest in sea ice modelling. In the polar regions, ice sheets can be several kilometres long and subjected to the effects of ocean waves. Because its thickness-to-length ratio is very small, the wave response of such a large ice sheet, known as its hydroelastic response, is dominated by elastic deformation rather than rigid-body motion. Over the past 25 years, sea ice hydroelasticity has been widely studied with theoretical models; however, recent experiments indicate that the idealized assumptions used in these models can cause considerable inaccuracies. This work proposes a numerical approach based on OpenFOAM to simulate hydroelastic wave-ice interaction, with the Navier-Stokes equations describing the fluid domain, the St. Venant–Kirchhoff solid model governing the ice deformation, and a coupling scheme achieving the fluid-structure interaction. Following validation against experiments, the proposed model is shown to capture phenomena that are not included in current theoretical models. In particular, it can predict overwash, a ubiquitous polar phenomenon that has been reported as a key gap. The present model has the potential to be used to study wave-ice behaviour and the coupled wave-ice effect on marine structures.
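
The St. Venant–Kirchhoff model named above is the standard hyperelastic law that extends linear elasticity to large rotations. For reference, its textbook form is given below, with u the displacement (gradient taken in the reference configuration), S the second Piola–Kirchhoff stress, σ the Cauchy stress, and λ, μ generic Lamé parameters written in terms of Young's modulus E_Y and Poisson's ratio ν. These symbols are standard notation, not values or notation taken from the paper.

```latex
\begin{gathered}
\mathbf{F} = \mathbf{I} + \nabla \mathbf{u}, \qquad
\mathbf{E} = \tfrac{1}{2}\left(\mathbf{F}^{\mathsf{T}} \mathbf{F} - \mathbf{I}\right), \qquad
\mathbf{S} = \lambda \operatorname{tr}(\mathbf{E})\, \mathbf{I} + 2\mu\, \mathbf{E}, \\
\boldsymbol{\sigma} = J^{-1}\, \mathbf{F} \mathbf{S} \mathbf{F}^{\mathsf{T}}, \quad J = \det \mathbf{F}, \qquad
\lambda = \frac{\nu E_Y}{(1+\nu)(1-2\nu)}, \quad \mu = \frac{E_Y}{2(1+\nu)}
\end{gathered}
```

For small strains, E reduces to the linearized strain and S to Hooke's law. The model therefore differs from linear elasticity mainly in remaining frame-indifferent under large rotations, which suits a thin sheet that bends substantially in waves.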

Preprint, 2011, arXiv

Lattice relaxation of dimer islands on Ge(001) during homoepitaxy by pulsed laser deposition

In low-temperature pulsed growth, two-dimensional islands form and coarsen into ~10 nm features. The islands produce well-defined, displaced x-ray diffraction peaks due to relaxation of the anisotropic surface stress of the (2×1) reconstruction, with expansion and contraction present in orthogonal directions. The relaxation carries over into multilevel islands, suggesting that domains in subsequent layers form metastable stress domains. We infer that the island distribution differs from that produced by continuous deposition, enhancing the population of monodisperse islands exhibiting anisotropic relaxation.
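
The link between a displaced diffraction peak and lattice relaxation follows from the reciprocal relation between peak position and real-space period. A worked first-order form, assuming a peak at q = 2π/a for a generic in-plane period a (notation chosen for illustration, not taken from the paper):

```latex
q = \frac{2\pi}{a}
\;\Longrightarrow\;
\Delta q \approx -\frac{2\pi}{a^{2}}\,\Delta a
\;\Longrightarrow\;
\varepsilon \equiv \frac{\Delta a}{a} \approx -\frac{\Delta q}{q}
```

A peak displaced to smaller q along one in-plane direction therefore signals local expansion (ε > 0), and a shift to larger q along the orthogonal direction signals contraction (ε < 0). This matches the anisotropic relaxation described above.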

Preprint, 2010, arXiv

Pressure-dependent transition from atoms to nanoparticles in magnetron sputtering: Effect on WSi2 film roughness and stress

We report on the transition between two regimes, from several-atom clusters to much larger nanoparticles, in Ar magnetron sputter deposition of WSi2, and on the effect of nanoparticles on the properties of amorphous thin films and multilayers. Sputter deposition of thin films is monitored by in situ x-ray scattering, including x-ray reflectivity and grazing-incidence small-angle x-ray scattering. The results show an abrupt transition at an Ar background pressure Pc; the transition is associated with the threshold for energetic particle thermalization, which is known to scale as the product of the Ar pressure and the working distance between the magnetron source and the substrate surface. Below Pc, smooth films are produced, while above Pc roughness increases abruptly, consistent with a model in which particles aggregate in the deposition flux before reaching the growth surface. The results from WSi2 films are correlated with in situ measurement of stress in WSi2/Si multilayers, which exhibit a corresponding transition from compressive to tensile stress at Pc. The tensile stress is attributed to coalescence of nanoparticles and the elimination of nano-voids.
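
A hard-sphere kinetic-theory estimate shows why the thermalization threshold depends on the product of pressure and working distance. It is offered as a sketch, not as the paper's model. The mean free path shrinks as 1/P, so the number of gas collisions a sputtered particle undergoes over the source-to-substrate distance d grows as P·d. Here σ is an effective collision diameter, T the gas temperature, and (P d)_c a threshold constant; all are generic symbols, and the √2 factor strictly applies to collisions between like molecules, while the 1/P scaling is what matters here.

```latex
\lambda_{\mathrm{mfp}} = \frac{k_B T}{\sqrt{2}\,\pi \sigma^{2} P},
\qquad
N_{\mathrm{coll}} \approx \frac{d}{\lambda_{\mathrm{mfp}}} \propto P\,d,
\qquad
P_c = \frac{(P\,d)_c}{d}
```

With (P d)_c held fixed, doubling the working distance halves the critical pressure P_c at which the film roughness and stress transitions would be expected.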
