Researcher profile

Jihao Liu

Jihao Liu contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
16 works
0 followers
6 topics
4 close collaborators

Published work

16 published items

Preprint, 2026, arXiv

The minimal volume of surfaces of log general type with non-empty non-klt locus

We show that the minimal volume of surfaces of log general type, with non-empty non-klt locus on the ample model, is $\frac{1}{825}$. Furthermore, the ample model $V$ achieving the minimal volume is determined uniquely up to isomorphism. The canonical embedding presents $V$ as a degree $86$ hypersurface of $\mathbb P(6,11,25,43)$. This motivates a one-parameter deformation of $V$ to klt stable surfaces within the weighted projective space. Consequently, we identify a $\textit{complete}$ rational curve in the corresponding moduli space $M_{\frac{1}{825}}$. As an important application, we deduce that the smallest accumulation point of the set of volumes for projective log canonical surfaces equals $\frac{1}{825}$.
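The numbers in this abstract can be cross-checked with a short computation. This is a sketch, not taken from the paper: it assumes that $V$ is a well-formed hypersurface, so that adjunction in weighted projective space applies, and that the volume in question is $K_V^2$.

```latex
\[
K_V \sim \mathcal{O}_V\bigl(86 - (6+11+25+43)\bigr) = \mathcal{O}_V(1),
\qquad
K_V^2 = \frac{86}{6\cdot 11\cdot 25\cdot 43} = \frac{86}{70950} = \frac{1}{825}.
\]
```

Under these assumptions the degree and weights reproduce the stated volume $\frac{1}{825}$, and $K_V = \mathcal{O}_V(1)$ is consistent with the embedding being canonical.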

Preprint, 2023, arXiv

On effective log Iitaka fibrations and existence of complements

We study the relationship between Iitaka fibrations and the conjecture on the existence of complements, assuming the good minimal model conjecture. In one direction, we show that the conjecture on the existence of complements implies the effective log Iitaka fibration conjecture. As a consequence, the effective log Iitaka fibration conjecture holds in dimension $3$. In the other direction, for any Calabi-Yau type variety $X$ such that $-K_X$ is nef, we show that $X$ has an $n$-complement for some universal constant $n$ depending only on the dimension of $X$ and two natural invariants of a general fiber of an Iitaka fibration of $-K_X$. We also formulate the decomposable Iitaka fibration conjecture, a variation of the effective log Iitaka fibration conjecture which is closely related to the structure of ample models of pairs with non-rational coefficients, and study its relationship with the aforementioned conjectures.

Preprint, 2022, arXiv

ACC for minimal log discrepancies of terminal threefolds

We prove that the ACC conjecture for minimal log discrepancies holds for threefolds in $[1-\delta,+\infty)$, where $\delta>0$ only depends on the coefficient set. We also study Reid's general elephant for pairs, and show Shokurov's conjecture on the existence of $(\epsilon,n)$-complements for threefolds for any $\epsilon\geq 1$. As a key step, we prove the uniform boundedness of divisors computing minimal log discrepancies for terminal threefolds. We also show the ACC for threefold canonical thresholds, and that the set of accumulation points of threefold canonical thresholds is equal to $\{0\}\cup\{\frac{1}{n}\}_{n\in\mathbb Z_{\ge 2}}$.

Preprint, 2022, arXiv

INTERN: A New Learning Paradigm Towards General Vision

Enormous waves of technological innovation over the past several years, marked by advances in AI technologies, are profoundly reshaping industry and society. However, a key challenge awaits us down the road: our capability to meet rapidly growing scenario-specific demands is severely limited by the cost of acquiring a commensurate amount of training data. This difficulty is in essence due to limitations of the mainstream learning paradigm: we need to train a new model for each new scenario, based on a large quantity of well-annotated data and commonly from scratch. To tackle this fundamental problem, we develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability. We evaluate our model on 26 well-known datasets that cover four categories of tasks in computer vision. In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data, often by a significant margin. This is an important step towards a promising prospect where such a model with general vision capability can dramatically reduce our reliance on data, thus expediting the adoption of AI technologies. Furthermore, revolving around our new paradigm, we also introduce a new data system, a new architecture, and a new benchmark, which, together, form a general vision ecosystem to support its future development in an open and inclusive manner. See the project website at https://opengvlab.shlab.org.cn.

Preprint, 2022, arXiv

Meta Knowledge Distillation

Recent studies pointed out that knowledge distillation (KD) suffers from two degradation problems, the teacher-student gap and the incompatibility with strong data augmentations, making it inapplicable to training state-of-the-art models, which are trained with advanced augmentations. However, we observe that a key factor, i.e., the temperatures in the softmax functions for generating probabilities of both the teacher and student models, was mostly overlooked in previous methods. With properly tuned temperatures, such degradation problems of KD can be largely mitigated. However, instead of relying on a naive grid search, which shows poor transferability, we propose Meta Knowledge Distillation (MKD) to meta-learn the distillation with learnable meta temperature parameters. The meta parameters are adaptively adjusted during training according to the gradients of the learning objective. We validate that MKD is robust to different dataset scales, different teacher/student architectures, and different types of data augmentation. With MKD, we achieve the best performance with popular ViT architectures among compared methods that use only ImageNet-1K as training data, ranging from tiny to large models. With ViT-L, we achieve 86.5% with 600 epochs of training, 0.6% better than MAE that trains for 1,650 epochs.
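To make the role of the softmax temperatures concrete, here is a minimal sketch of a standard temperature-scaled distillation loss with separate teacher and student temperatures. It is not the MKD method from the abstract: the temperatures are fixed inputs rather than meta-learned, and the function names `softmax_with_temperature` and `distillation_loss` are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, teacher_temp, student_temp):
    """KL(teacher || student) between the two temperature-softened distributions.

    Assumption: plain KL divergence with fixed temperatures; MKD as described
    in the abstract would instead learn the temperatures during training.
    """
    p = softmax_with_temperature(teacher_logits, teacher_temp)
    q = softmax_with_temperature(student_logits, student_temp)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    # Compare the loss under a few hand-picked temperature pairs.
    for t_teacher, t_student in [(1.0, 1.0), (4.0, 4.0), (4.0, 1.0)]:
        loss = distillation_loss(teacher, student, t_teacher, t_student)
        print(t_teacher, t_student, loss)
```

A higher temperature flattens the distribution, so the choice of the two temperatures changes how much of the teacher's relative class ranking reaches the student. That sensitivity is what the abstract proposes to tune automatically.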

Preprint, 2022, arXiv

On generalized lc pairs with $\mathrm{\textbf b}$-log abundant nef part

We study the behavior of generalized lc pairs with $\mathrm{\textbf b}$-log abundant nef part, a meticulously designed structure on algebraic varieties. We show that this structure is preserved under the canonical bundle formula and sub-adjunction formulas, and is also compatible with the non-vanishing conjecture and the abundance conjecture in the classical minimal model program.

Preprint, 2022, arXiv

On the fixed part of pluricanonical systems for surfaces

We show that $|mK_X|$ defines a birational map and has no fixed part for some bounded positive integer $m$ for any $\frac{1}{2}$-lc surface $X$ such that $K_X$ is big and nef. For every positive integer $n\geq 3$, we construct a sequence of projective surfaces $X_{n,i}$, such that $K_{X_{n,i}}$ is ample, ${\rm{mld}}(X_{n,i})>\frac{1}{n}$ for every $i$, $\lim_{i\rightarrow+\infty}{\rm{mld}}(X_{n,i})=\frac{1}{n}$, and for any positive integer $m$, there exists $i$ such that $|mK_{X_{n,i}}|$ has non-zero fixed part. These results answer the surface case of a question of Xu.

Preprint, 2022, arXiv

Uniform rational polytopes for Iitaka dimensions

In this paper, we continue to develop the theories on functional pairs and uniform rational polytopes. We show that there is a uniform perturbation for Iitaka dimensions of pseudo-effective lc pairs of fixed dimension with DCC coefficients assuming the non-vanishing conjecture. We also show the existence of uniform rational polytopes for Iitaka dimensions of pseudo-effective lc pairs assuming the non-vanishing conjecture.

Preprint, 2022, arXiv

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks. However, how to effectively combine those operators to form high-performance hybrid visual architectures remains a challenge. In this work, we study the learnable combination of convolution, transformer, and MLP by proposing a novel unified architecture search approach. Our approach contains two key designs to achieve the search for high-performance networks. First, we model the very different searchable operators in a unified form, and thus enable the operators to be characterized with the same set of configuration parameters. In this way, the overall search space size is significantly reduced, and the total search cost becomes affordable. Second, we propose context-aware downsampling modules (DSMs) to mitigate the gap between the different types of operators. Our proposed DSMs are able to better adapt features from different types of operators, which is important for identifying high-performance hybrid architectures. Finally, we integrate configurable operators and DSMs into a unified search space and search with a Reinforcement Learning-based search algorithm to fully explore the optimal combination of the operators. To this end, we search a baseline network and scale it up to obtain a family of models, named UniNets, which achieve much better accuracy and efficiency than previous ConvNets and Transformers. In particular, our UniNet-B5 achieves 84.9% top-1 accuracy on ImageNet, outperforming EfficientNet-B7 and BoTNet-T7 with 44% and 55% fewer FLOPs respectively. By pretraining on ImageNet-21K, our UniNet-B6 achieves 87.4%, outperforming Swin-L with 51% fewer FLOPs and 41% fewer parameters. Code is available at https://github.com/Sense-X/UniNet.

Preprint, 2020, arXiv

Effective birationality for sub-pairs with real coefficients

For $\epsilon$-lc Fano type varieties $X$ of dimension $d$ and a given finite set $\Gamma$, we show that there exists a positive integer $m_0$ which only depends on $\epsilon,d$ and $\Gamma$, such that both $|-mK_X-\sum_i\lceil mb_i\rceil B_i|$ and $|-mK_X-\sum_i\lfloor mb_i\rfloor B_i|$ define birational maps for any $m\ge m_0$ provided that $B_i$ are pseudo-effective Weil divisors, $b_i\in\Gamma$, and $-(K_X+\sum_ib_iB_i)$ is big. When $\Gamma\subset[0,1]$ satisfies the DCC but is not finite, we construct an example to show that the effective birationality may fail even if $X$ is fixed, $B_i$ are fixed prime divisors, and $(X,B)$ is $\epsilon'$-lc for some $\epsilon'>0$.

Preprint, 2020, arXiv

Learning Where to Focus for Efficient Video Object Detection

Transferring existing image-based detectors to video is non-trivial since the quality of frames is always degraded by partial occlusion, rare poses, and motion blur. Previous approaches propagate and aggregate features across video frames using optical-flow warping. However, directly applying image-level optical flow to high-level features might not establish accurate spatial correspondences. Therefore, a novel module called Learnable Spatio-Temporal Sampling (LSTS) is proposed to accurately learn semantic-level correspondences among adjacent frame features. The sampled locations are first randomly initialized, then updated iteratively to find better spatial correspondences, progressively guided by detection supervision. In addition, a Sparsely Recursive Feature Updating (SRFU) module and a Dense Feature Aggregation (DFA) module are introduced to model temporal relations and enhance per-frame features, respectively. Without bells and whistles, the proposed method achieves state-of-the-art performance on the ImageNet VID dataset with less computational complexity and real-time speed. Code will be made available at https://github.com/jiangzhengkai/LSTS.

Preprint, 2020, arXiv

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods. Current generative models rely heavily on datasets with multi-view images of the same person. Thus, their generated results are restricted by the scale and domain of the data source. To overcome these challenges, we propose a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild. Our key insight is that rotating faces in 3D space back and forth, and re-rendering them to the 2D plane, can serve as strong self-supervision. We leverage recent advances in 3D face modeling and high-resolution GANs as our building blocks. Since the 3D rotation-and-render on faces can be applied to arbitrary angles without losing details, our approach is extremely suitable for in-the-wild scenarios (i.e. no paired data are available), where existing methods fall short. Extensive experiments demonstrate that our approach has superior synthesis quality as well as identity preservation over the state-of-the-art methods, across a wide range of poses and domains. Furthermore, we validate that our rotate-and-render framework can naturally act as an effective data augmentation engine for boosting modern face recognition systems even on strong baseline models.