Researcher profile

Zhifei Zhang

Zhifei Zhang contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging. Verification L1. Unclaimed author.
19 works
0 followers
11 topics
4 close collaborators


Published work

19 published items

preprint 2025 arXiv

OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions

Existing feedforward subject-driven video customization methods mainly study single-subject scenarios due to the difficulty of constructing multi-subject training data pairs. Another challenging problem, how to use signals such as depth, mask, camera, and text prompts to control and edit the subject in the customized video, is still less explored. In this paper, we first propose a data construction pipeline, VideoCus-Factory, to produce training data pairs for multi-subject customization from raw videos without labels and control signals such as depth-to-video and mask-to-video pairs. Based on our constructed data, we develop an Image-Video Transfer Mixed (IVTM) training with image editing data to enable instructive editing for the subject in the customized video. Then we propose a diffusion Transformer framework, OmniVCus, with two embedding mechanisms, Lottery Embedding (LE) and Temporally Aligned Embedding (TAE). LE enables inference with more subjects by using the training subjects to activate more frame embeddings. TAE encourages the generation process to extract guidance from temporally aligned control signals by assigning the same frame embeddings to the control and noise tokens. Experiments demonstrate that our method significantly surpasses state-of-the-art methods in both quantitative and qualitative evaluations. Video demos are at our project page: https://caiyuanhao1998.github.io/project/OmniVCus/. Our code, models, and data are released at https://github.com/caiyuanhao1998/Open-OmniVCus

preprint 2022 arXiv

A Multi-Implicit Neural Representation for Fonts

Fonts are ubiquitous across documents and come in a variety of styles. They are either represented in a native vector format or rasterized to produce fixed resolution images. In the first case, the non-standard representation prevents benefiting from the latest network architectures for neural representations, while in the latter case, the rasterized representation, when encoded via networks, results in loss of data fidelity, as font-specific discontinuities like edges and corners are difficult to represent using neural networks. Based on the observation that complex fonts can be represented by a superposition of a set of simpler occupancy functions, we introduce multi-implicits to represent fonts as a permutation-invariant set of learned implicit functions, without losing features (e.g., edges and corners). However, while multi-implicits locally preserve font features, obtaining supervision in the form of ground truth multi-channel signals is a problem in itself. Instead, we propose how to train such a representation with only local supervision, while the proposed neural architecture directly finds globally consistent multi-implicits for font families. We extensively evaluate the proposed representation for various tasks including reconstruction, interpolation, and synthesis to demonstrate clear advantages over existing alternatives. Additionally, the representation naturally enables glyph completion, wherein a single characteristic font is used to synthesize a whole font family in the target style.

preprint 2022 arXiv

Dynamics near Couette flow for the $β$-plane equation

In this paper, we study stationary structures near the planar Couette flow in Sobolev spaces on a channel $\mathbb{T}\times[-1,1]$, and asymptotic behavior of Couette flow in Gevrey spaces on $\mathbb{T}\times\mathbb{R}$ for the $β$-plane equation. Let $T>0$ be the horizontal period of the channel and $α={2π\over T}$ be the wave number. We obtain a sharp region $O$ in the whole $(α,β)$ half-plane such that non-parallel steadily traveling waves do not exist for $(α,β)\in O$ and such traveling waves exist for $(α,β)$ in the remaining regions, near Couette flow for $H^{\geq5}$ velocity perturbation. The borderlines between the region $O$ and the remaining regions are determined by two curves of the principal eigenvalues of singular Rayleigh-Kuo operators. Our results reveal that there exists $β_*>0$ such that if $|β|\leq β_*$, then non-parallel traveling waves do not exist for any $T>0$, while if $|β|>β_*$, then there exists a critical period $T_β>0$ so that such traveling waves exist for $T\in \left[T_β,\infty\right)$ and do not exist for $T\in \left(0,T_β\right)$, near Couette flow for $H^{\geq5}$ velocity perturbation. These contrasting dynamics play an important role in studying the long time dynamics near Couette flow with Coriolis effects. Moreover, for any $β\neq0$ and $T>0$, there exist no non-parallel traveling waves with speeds converging in $(-1,1)$ near Couette flow for $H^{\geq5}$ velocity perturbation. In contrast, we construct non-shear stationary solutions near Couette flow for $H^{<{5\over2}}$ velocity perturbation, which is a generalization of Theorem 1 in [22], but the construction is more difficult due to the $β$'s term. Finally, we prove nonlinear inviscid damping for Couette flow in some Gevrey spaces by extending the method of [4] to the $β$-plane equation on $\mathbb{T}\times\mathbb{R}$.

preprint 2022 arXiv

Feature Importance-aware Transferable Adversarial Attacks

Transferability of adversarial examples is of central importance for attacking an unknown model, which facilitates adversarial attacks in more practical scenarios, e.g., black-box attacks. Existing transferable attacks tend to craft adversarial examples by indiscriminately distorting features to degrade prediction accuracy in a source model, without awareness of the intrinsic features of objects in the images. We argue that such brute-force degradation would introduce model-specific local optima into adversarial examples, thus limiting the transferability. By contrast, we propose the Feature Importance-aware Attack (FIA), which disrupts important object-aware features that dominate model decisions consistently. More specifically, we obtain feature importance by introducing the aggregate gradient, which averages the gradients with respect to feature maps of the source model, computed on a batch of random transforms of the original clean image. The gradients will be highly correlated to objects of interest, and such correlation presents invariance across different models. Besides, the random transforms will preserve intrinsic features of objects and suppress model-specific information. Finally, the feature importance guides the search for adversarial examples towards disrupting critical features, achieving stronger transferability. Extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed FIA, i.e., improving the success rate by 9.5% against normally trained models and 12.8% against defense models as compared to the state-of-the-art transferable attacks. Code is available at: https://github.com/hcguoO0/FIA
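The aggregate gradient described in this abstract can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the "source model" is a hypothetical two-layer network (logit = w2 . relu(W1 @ x)), the random transform is assumed to be random pixel masking with a made-up keep probability, and all function and parameter names are invented for illustration.

```python
import numpy as np

def feature_grad(x, W1, w2):
    """Gradient of the toy logit w2 . relu(W1 @ x) with respect to the
    intermediate features f = W1 @ x (hypothetical stand-in for a feature map)."""
    f = W1 @ x
    return w2 * (f > 0)

def aggregate_gradient(x, W1, w2, n_transforms=30, keep_prob=0.7, seed=0):
    """Sum feature gradients over randomly masked copies of x, then L2-normalize.

    Random pixel masking is an assumed choice of transform; the abstract only
    says "random transforms". Normalizing the sum gives the same direction as
    normalizing the average.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(W1.shape[0])
    for _ in range(n_transforms):
        mask = rng.random(x.shape) < keep_prob  # keep each pixel with prob keep_prob
        total += feature_grad(x * mask, W1, w2)
    norm = np.linalg.norm(total)
    return total / norm if norm > 0 else total

# Tiny deterministic example: every feature sees the same input sum, so the
# aggregate gradient points along w2 / ||w2||.
x_demo = np.ones(4)
W1_demo = np.ones((3, 4))
w2_demo = np.array([3.0, 0.0, -4.0])
importance = aggregate_gradient(x_demo, W1_demo, w2_demo)
```

In this toy setting the resulting `importance` vector plays the role of per-feature weights; in an attack of the kind the abstract describes, such weights would guide which features to disrupt.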

preprint 2022 arXiv

GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing

Compositing-aware object search aims to find the most compatible objects for compositing given a background image and a query bounding box. Previous works focus on learning compatibility between the foreground object and background, but fail to learn other important factors from large-scale data, i.e. geometry and lighting. To move a step further, this paper proposes GALA (Geometry-and-Lighting-Aware), a generic foreground object search method with discriminative modeling on geometry and lighting compatibility for open-world image compositing. Remarkably, it achieves state-of-the-art results on the CAIS dataset and generalizes well on large-scale open-world datasets, i.e. Pixabay and Open Images. In addition, our method can effectively handle non-box scenarios, where users only provide background images without any input bounding box. A web demo (see supplementary materials) is built to showcase applications of the proposed method for compositing-aware search and automatic location/scale prediction for the foreground object.

preprint 2022 arXiv

The number of traveling wave families in a running water with Coriolis force

In this paper, we study the number of traveling wave families near a shear flow under the influence of Coriolis force, where the traveling speeds lie outside the range of the flow $u$. Under the $β$-plane approximation, if the flow $u$ has a critical point at which $u$ attains its minimal (resp. maximal) value, then a unique transitional $β$ value exists in the positive (resp. negative) half-line such that the number of traveling wave families near the shear flow changes suddenly from finite to infinite when $β$ passes through it. On the other hand, if $u$ has no such critical points, then the number is always finite for positive (resp. negative) $β$ values. This is true for general shear flows under mildly technical assumptions, and for a large class of shear flows including a cosine jet $u(y) = {1+\cos(πy)\over 2}$ (i.e. the sinus profile) and analytic monotone flows unconditionally. The sudden change of the number of traveling wave families indicates that long time dynamics around the shear flow is much richer than the non-rotating case, where no such traveling wave families exist.

preprint 2022 arXiv

The unconditional uniqueness for the energy-supercritical NLS

We consider the cubic and quintic nonlinear Schrödinger equations (NLS) under the $\mathbb{R}^{d}$ and $\mathbb{T}^{d}$ energy-supercritical setting. Via a newly developed unified scheme, we prove the unconditional uniqueness for solutions to NLS at critical regularity for all dimensions. Thus, together with [18,19], the unconditional uniqueness problems for $H^{1}$-critical and $H^{1}$-supercritical cubic and quintic NLS are completely and uniformly resolved at critical regularity for these domains. One application of our theorem is to prove that defocusing blowup solutions of the type in [54] are the only possible $C([0,T);\dot{H}^{s_{c}})$ solutions, if they exist, in these domains.

preprint 2022 arXiv

Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition

Artistic text recognition is an extremely challenging task with a wide range of applications. However, current scene text recognition methods mainly focus on irregular text and have not explored artistic text specifically. The challenges of artistic text recognition include the varied appearance with specially designed fonts and effects, the complex connections and overlaps between characters, and the severe interference from background patterns. To alleviate these problems, we propose to recognize the artistic text at three levels. Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to appearance and shape. In this way, the discreteness of the corner points cuts off the connection between characters, and their sparsity improves the robustness to background interference. Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification. Thirdly, we utilize a Transformer to learn the global feature at the image level and model the global relationship of the corner points, with the assistance of a corner-query cross-attention mechanism. Besides, we provide an artistic text dataset to benchmark the performance. Experimental results verify the significant superiority of our proposed method on artistic text recognition, and the method also achieves state-of-the-art performance on several blurred and perspective datasets.

preprint 2022 arXiv

Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial Training

Adversarial training has been widely explored for mitigating attacks against deep models. However, most existing works are still trapped in the dilemma between higher accuracy and stronger robustness since they tend to fit a model towards robust features (not easily tampered with by adversaries) while ignoring those non-robust but highly predictive features. To achieve a better robustness-accuracy trade-off, we propose the Vanilla Feature Distillation Adversarial Training (VFD-Adv), which conducts knowledge distillation from a pre-trained model (optimized towards high accuracy) to guide adversarial training towards higher accuracy, i.e., preserving those non-robust but predictive features. More specifically, both adversarial examples and their clean counterparts are forced to be aligned in the feature space by distilling predictive representations from the pre-trained/clean model, while previous works barely utilize predictive features from clean models. Therefore, the adversarial training model is updated towards maximally preserving accuracy while gaining robustness. A key advantage of our method is that it can be universally adapted to and boost existing works. Exhaustive experiments on various datasets, classification models, and adversarial training algorithms demonstrate the effectiveness of our proposed method.

preprint 2021 arXiv

On the $L^\infty$ stability of Prandtl expansions in Gevrey class

In this paper, we prove the $L^\infty\cap L^2$ stability of Prandtl expansions of shear flow type as $\big(U(y/\sqrt{ν}),0\big)$ for the initial perturbation in the Gevrey class, where $U(y)$ is a monotone and concave function and $ν$ is the viscosity coefficient. To this end, we develop the direct resolvent estimate method for the linearized Orr-Sommerfeld operator instead of the Rayleigh-Airy iteration method introduced by Grenier, Guo and Nguyen.

preprint 2020 arXiv

Scaling invariant Serrin criterion via one velocity component for the Navier-Stokes equations

In this paper, we prove that the Leray weak solution $u : \mathbb{R}^3\times (0, T)\rightarrow\mathbb{R}^3 $ of the Navier-Stokes equations is regular in $\mathbb{R}^3\times (0,T)$ under the scaling invariant Serrin condition imposed on one component of the velocity $u_3\in L^{q,1}(0, T;L^p(\mathbb{R}^3))$ with \[ \frac{2}{q}+\frac{3}{p}\leq 1,\quad 3<p<+\infty. \] This result is an immediate consequence of a new local regularity criterion in terms of one velocity component for suitable weak solutions.
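As a brief aside on why the condition above is called scaling invariant, the standard scaling computation for the Navier-Stokes equations can be written out as follows. This is a sketch that is not taken from the abstract: it uses the plain $L^q$ norm in time rather than the Lorentz norm $L^{q,1}$ for simplicity, and the rescaled field $u_\lambda$ is introduced here for illustration.

```latex
% If u solves the Navier-Stokes equations on R^3 x (0,T), so does the rescaling
\[
  u_\lambda(x,t) = \lambda\, u(\lambda x, \lambda^2 t), \qquad \lambda > 0,
\]
% on R^3 x (0, T/\lambda^2). Changing variables y = \lambda x at fixed time gives
\[
  \|u_\lambda(\cdot,t)\|_{L^p(\mathbb{R}^3)}
    = \lambda^{1-\frac{3}{p}}\, \|u(\cdot,\lambda^2 t)\|_{L^p(\mathbb{R}^3)} ,
\]
% and changing variables s = \lambda^2 t in the time integral gives
\[
  \|u_\lambda\|_{L^q(0,T/\lambda^2;\,L^p(\mathbb{R}^3))}
    = \lambda^{\,1-\frac{3}{p}-\frac{2}{q}}\, \|u\|_{L^q(0,T;\,L^p(\mathbb{R}^3))} .
\]
% The norm is therefore unchanged under the scaling exactly when
\[
  \frac{2}{q}+\frac{3}{p} = 1 .
\]
```

The same exponent count applies component by component, which is why a condition on $u_3$ alone with $\frac{2}{q}+\frac{3}{p}\leq 1$ is at the scaling-critical level when equality holds.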

preprint 2020 arXiv

Texture Hallucination for Large-Factor Painting Super-Resolution

We aim to super-resolve digital paintings, synthesizing realistic details from high-resolution reference painting materials for very large scaling factors (e.g., 8X, 16X). However, previous single image super-resolution (SISR) methods would either lose textural details or introduce unpleasing artifacts. On the other hand, reference-based SR (Ref-SR) methods can transfer textures to some extent, but are still impractical for handling very large factors while keeping fidelity to the original input. To solve these problems, we propose an efficient high-resolution hallucination network for very large scaling factors with an efficient network structure and feature transferring. To transfer more detailed textures, we design a wavelet texture loss, which helps to enhance more high-frequency components. At the same time, to reduce the smoothing effect brought by the image reconstruction loss, we further relax the reconstruction constraint with a degradation loss which ensures the consistency between downscaled super-resolution results and low-resolution inputs. We also collected a high-resolution (e.g., 4K resolution) painting dataset PaintHD by considering both physical size and image resolution. We demonstrate the effectiveness of our method with extensive experiments on PaintHD by comparing with SISR and Ref-SR state-of-the-art methods.
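The degradation loss mentioned in this abstract (consistency between the downscaled super-resolution result and the low-resolution input) can be sketched in a few lines of NumPy. The sketch makes assumptions the abstract does not confirm: downscaling is done by average pooling, the distance is a mean absolute error, and images are single-channel arrays. The function names are hypothetical.

```python
import numpy as np

def downscale_avg(img, factor):
    """Downscale a 2D image by averaging non-overlapping factor x factor blocks
    (an assumed degradation model)."""
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the scaling factor")
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def degradation_loss(sr, lr, factor):
    """Mean absolute difference between the downscaled SR output and the LR input."""
    return float(np.abs(downscale_avg(sr, factor) - lr).mean())

# Example with an 8X factor: an SR image that is a blocky upsampling of the LR
# input is perfectly consistent with it; adding a constant offset is not.
lr_demo = np.arange(6, dtype=float).reshape(2, 3)
sr_consistent = np.kron(lr_demo, np.ones((8, 8)))  # nearest-neighbor style upsampling
sr_shifted = sr_consistent + 0.5

loss_consistent = degradation_loss(sr_consistent, lr_demo, 8)
loss_shifted = degradation_loss(sr_shifted, lr_demo, 8)
```

Because the loss only constrains block averages, it leaves the network free to add high-frequency texture inside each block, which matches the abstract's description of it as a relaxation of the reconstruction constraint.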

preprint 2020 arXiv

Transition threshold for the 3D Couette flow in a finite channel

In this paper, we study nonlinear stability of the 3D plane Couette flow $(y,0,0)$ at high Reynolds number ${Re}$ in a finite channel $\mathbb{T}\times [-1,1]\times \mathbb{T}$. It is well known that the plane Couette flow is linearly stable for any Reynolds number. However, it could become nonlinearly unstable and transition to turbulence for small but finite perturbations at high Reynolds number. This is the so-called Sommerfeld paradox. One resolution of this paradox is to study the transition threshold problem, which is concerned with how much disturbance will lead to the instability of the flow and the dependence of disturbance on the Reynolds number. This work shows that if the initial velocity $v_0$ satisfies $\|v_0-(y,0,0)\|_{H^2}\le c_0{Re}^{-1}$ for some $c_0>0$ independent of $Re$, then the solution of the 3D Navier-Stokes equations is global in time and does not transition away from the Couette flow in the $L^\infty$ sense, and rapidly converges to a streak solution for $t\gg Re^{\frac 13}$ due to the mixing-enhanced dissipation effect. This result confirms the transition threshold conjecture proposed by Trefethen et al. (Science, 261 (1993), 578-584). To this end, we develop the resolvent estimate method to establish the space-time estimates for the full linearized Navier-Stokes system around the flow $(V(t,y,z), 0,0)$, where $V(t,y,z)$ is a small perturbation (but independent of $Re$) of the Couette flow $y$.