Source author record

Can Zhang

Can Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; math.OC; Computation and Language; cond-mat.supr-con; physics.med-ph; Artificial Intelligence; cond-mat.mtrl-sci; Information Retrieval; math.AP; physics.ins-det; physics.optics; Software Engineering

Catalog footprint

What is connected

19 works

12 topics

4 close collaborators

preprint, 2025, arXiv

PediaMind-R1: A Temperament-Aware Language Model for Personalized Early Childhood Care Reasoning via Cognitive Modeling and Preference Alignment

This paper presents PediaMind-R1, a domain-specialized large language model designed to achieve active personalization in intelligent parenting scenarios. Unlike conventional systems that provide generic suggestions, PediaMind-R1 draws on insights from developmental psychology. It introduces temperament theory from the Thomas-Chess framework and builds a temperament knowledge graph for infants and toddlers (0-3 years). Our two-stage training pipeline first uses supervised fine-tuning to teach structured chain-of-thought reasoning, and then applies a GRPO-based alignment stage to reinforce logical consistency, domain expertise, and empathetic caregiving strategies. We further design an evaluation framework comprising temperament-sensitive multiple-choice tests and human assessments. The results demonstrate that PediaMind-R1 can accurately interpret early childhood temperament profiles and proactively engage in individualized reasoning. This work highlights the value of integrating vertical-domain modeling with psychological theory. It offers a novel approach to developing user-centered LLMs that advance the practice of active personalization in sensitive caregiving contexts.

preprint, 2022, arXiv

CA-UDA: Class-Aware Unsupervised Domain Adaptation with Optimal Assignment and Pseudo-Label Refinement

Recent works on unsupervised domain adaptation (UDA) focus on the selection of good pseudo-labels as surrogates for the missing labels in the target data. However, source domain bias that deteriorates the pseudo-labels can still exist, since the shared network of the source and target domains is typically used for pseudo-label selection. Suboptimal source-to-target domain alignment in the feature space can also result in unsatisfactory performance. In this paper, we propose CA-UDA to improve the quality of the pseudo-labels and UDA results with optimal assignment, a pseudo-label refinement strategy and class-aware domain alignment. We use an auxiliary network to mitigate the source domain bias for pseudo-label refinement. Our intuition is that the underlying semantics in the target domain can be fully exploited to help refine the pseudo-labels that are inferred from the source features under domain shift. Furthermore, our optimal assignment can optimally align features between the source and target domains, and our class-aware domain alignment can simultaneously close the domain gap while preserving the classification decision boundaries. Extensive experiments on several benchmark datasets show that our method can achieve state-of-the-art performance in the image classification task.

preprint, 2022, arXiv

Deep Motion Prior for Weakly-Supervised Temporal Action Localization

Weakly-Supervised Temporal Action Localization (WSTAL) aims to localize actions in untrimmed videos with only video-level labels. Currently, most state-of-the-art WSTAL methods follow a Multi-Instance Learning (MIL) pipeline: producing snippet-level predictions first and then aggregating them into the video-level prediction. However, we argue that existing methods have overlooked two important drawbacks: 1) inadequate use of motion information and 2) the incompatibility of the prevailing cross-entropy training loss. In this paper, our analysis shows that the motion cues behind the optical flow features carry complementary information. Inspired by this, we propose to build a context-dependent motion prior, termed motionness. Specifically, a motion graph is introduced to model motionness based on the local motion carrier (e.g., optical flow). In addition, to highlight more informative video snippets, a motion-guided loss is proposed to modulate the network training conditioned on motionness scores. Extensive ablation studies confirm that motionness efficaciously models the actions of interest, and the motion-guided loss leads to more accurate results. Besides, our motion-guided loss is a plug-and-play loss function and is compatible with existing WSTAL methods. Without loss of generality, based on the standard MIL pipeline, our method achieves new state-of-the-art performance on three challenging benchmarks, including THUMOS'14, ActivityNet v1.2 and v1.3.
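
As a rough illustration of the MIL pipeline described in this abstract, the following sketch aggregates snippet-level class scores into a video-level prediction. The top-k mean pooling rule, the function name `video_level_scores` and the toy scores are assumptions made for illustration; the abstract does not specify the aggregation rule, and the motion-guided loss is not modeled here.

```python
import numpy as np


def video_level_scores(snippet_scores: np.ndarray, k: int) -> np.ndarray:
    """Aggregate (T, C) snippet scores into (C,) video scores.

    Assumption: each class score is the mean of its k highest snippet scores,
    a common MIL pooling choice, not necessarily the one used in the paper.
    """
    # Sort each class column in ascending order and keep the k largest rows.
    top_k = np.sort(snippet_scores, axis=0)[-k:]
    return top_k.mean(axis=0)


# Hypothetical scores for 4 snippets and 2 action classes.
scores = np.array([
    [0.1, 0.9],
    [0.8, 0.2],
    [0.7, 0.1],
    [0.2, 0.3],
])
video = video_level_scores(scores, k=2)
print(video)
```

Only the video-level vector is compared against the video-level label during training, which is why the pipeline needs no snippet-level annotation.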

preprint, 2022, arXiv

LocVTP: Video-Text Pre-training for Temporal Localization

Video-Text Pre-training (VTP) aims to learn transferable representations for various downstream tasks from large-scale web videos. To date, almost all existing VTP methods are limited to retrieval-based downstream tasks, e.g., video retrieval, whereas their transfer potential on localization-based tasks, e.g., temporal grounding, is under-explored. In this paper, we experimentally analyze and demonstrate the incompatibility of current VTP methods with localization tasks, and propose a novel Localization-oriented Video-Text Pre-training framework, dubbed LocVTP. Specifically, we perform fine-grained contrastive alignment as a complement to the coarse-grained one by a clip-word correspondence discovery scheme. To further enhance the temporal reasoning ability of the learned feature, we propose a context projection head and a temporal-aware contrastive loss to perceive the contextual relationships. Extensive experiments on four downstream tasks across six datasets demonstrate that our LocVTP achieves state-of-the-art performance on both retrieval-based and localization-based tasks. Furthermore, we conduct comprehensive ablation studies and thorough analyses to explore the optimum model designs and training strategies.

preprint, 2022, arXiv

MISS: Multi-Interest Self-Supervised Learning Framework for Click-Through Rate Prediction

CTR prediction is essential for modern recommender systems. Ranging from early factorization machines to deep learning based models in recent years, existing CTR methods focus on capturing useful feature interactions or mining important behavior patterns. Despite their effectiveness, we argue that these methods suffer from the risk of label sparsity (i.e., the user-item interactions are highly sparse with respect to the feature space), label noise (i.e., the collected user-item interactions are usually noisy), and the underuse of domain knowledge (i.e., the pairwise correlations between samples). To address these challenging problems, we propose a novel Multi-Interest Self-Supervised learning (MISS) framework which enhances the feature embeddings with interest-level self-supervision signals. With the help of two novel CNN-based multi-interest extractors, self-supervision signals are discovered with full consideration of different interest representations (point-wise and union-wise), interest dependencies (short-range and long-range), and interest correlations (inter-item and intra-item). Based on that, contrastive learning losses are further applied to the augmented views of interest representations, which effectively improves the feature representation learning. Furthermore, our proposed MISS framework can be used as a plug-in component with existing CTR prediction models and further boost their performance. Extensive experiments on three large-scale datasets show that MISS significantly outperforms the state-of-the-art models, by up to 13.55% in AUC, and also enjoys good compatibility with representative deep CTR models.

preprint, 2022, arXiv

On Pursuit of Designing Multi-modal Transformer for Video Grounding

Video grounding aims to localize the temporal segment corresponding to a sentence query from an untrimmed video. Almost all existing video grounding methods fall into two frameworks: 1) Top-down model: It predefines a set of segment candidates and then conducts segment classification and regression. 2) Bottom-up model: It directly predicts frame-wise probabilities of the referential segment boundaries. However, none of these methods is end-to-end, i.e., they always rely on some time-consuming post-processing steps to refine predictions. To this end, we reformulate video grounding as a set prediction task and propose a novel end-to-end multi-modal Transformer model, dubbed GTR. Specifically, GTR has two encoders for video and language encoding, and a cross-modal decoder for grounding prediction. To facilitate the end-to-end training, we use a Cubic Embedding layer to transform the raw videos into a set of visual tokens. To better fuse these two modalities in the decoder, we design a new Multi-head Cross-Modal Attention. The whole GTR is optimized via a Many-to-One matching loss. Furthermore, we conduct comprehensive studies to investigate different model design choices. Extensive results on three benchmarks have validated the superiority of GTR. All three typical GTR variants achieve record-breaking performance on all datasets and metrics, with inference several times faster.

preprint, 2022, arXiv

SpatioTemporal Focus for Skeleton-based Action Recognition

Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition due to their powerful ability to model data topology. We argue that the performance of recently proposed skeleton-based action recognition methods is limited by the following factors. First, the predefined graph structures are shared throughout the network, lacking the flexibility and capacity to model multi-grain semantic information. Second, the relations among global joints are not fully exploited by local graph convolution, which may lose implicit joint relevance. For instance, actions such as running and waving are performed by the co-movement of body parts and joints, e.g., legs and arms, which are nevertheless far apart in terms of physical connection. Inspired by recent attention mechanisms, we propose a multi-grain contextual focus module, termed MCF, to capture action-associated relation information from the body joints and parts. As a result, more explainable representations for different skeleton action sequences can be obtained by MCF. In this study, we follow the common practice of densely sampling the input skeleton sequences, which brings much redundancy since the number of instances has nothing to do with the actions. To reduce the redundancy, a temporal discrimination focus module, termed TDF, is developed to capture the local sensitive points of the temporal dynamics. MCF and TDF are integrated into the standard GCN network to form a unified architecture, named STF-Net. STF-Net is able to capture robust movement patterns from these skeleton topology structures, based on multi-grain context aggregation and temporal dependency. Extensive experimental results show that our STF-Net achieves state-of-the-art results on three challenging benchmarks: NTU RGB+D 60, NTU RGB+D 120, and Kinetics-skeleton.

preprint, 2022, arXiv

Unsupervised Pre-training for Temporal Action Localization Tasks

Unsupervised video representation learning has made remarkable achievements in recent years. However, most existing methods are designed and optimized for video classification. These pre-trained models can be sub-optimal for temporal localization tasks due to the inherent discrepancy between video-level classification and clip-level localization. To bridge this gap, we make the first attempt to propose a self-supervised pretext task, coined Pseudo Action Localization (PAL), to Unsupervisedly Pre-train feature encoders for Temporal Action Localization tasks (UP-TAL). Specifically, we first randomly select temporal regions, each of which contains multiple clips, from one video as pseudo actions and then paste them onto different temporal positions of the other two videos. The pretext task is to align the features of the pasted pseudo action regions from the two synthetic videos and maximize the agreement between them. Compared to existing unsupervised video representation learning approaches, our PAL adapts better to downstream TAL tasks by introducing a temporal equivariant contrastive learning paradigm in a temporally dense and scale-aware manner. Extensive experiments show that PAL can utilize large-scale unlabeled video data to significantly boost the performance of existing TAL methods. Our code and models will be made publicly available at https://github.com/zhang-can/UP-TAL.
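
The cut-and-paste step of the pretext task can be sketched as follows. This is a minimal illustration under stated assumptions: videos are represented as arrays of per-clip features, the helper `paste_pseudo_action` and all sizes are hypothetical, and the scale changes and contrastive objective mentioned in the abstract are omitted.

```python
import numpy as np


def paste_pseudo_action(source, background, start, length, position):
    """Return a copy of `background` with source[start:start+length]
    pasted over background[position:position+length] along the time axis."""
    region = source[start:start + length]
    synthetic = background.copy()
    synthetic[position:position + length] = region
    return synthetic


rng = np.random.default_rng(0)
num_clips, feat_dim, length = 16, 4, 4

# Three hypothetical videos, each a (num_clips, feat_dim) array of clip features.
source, background_a, background_b = (
    rng.random((num_clips, feat_dim)) for _ in range(3)
)

# Pick one pseudo-action region and two independent paste positions.
start = int(rng.integers(0, num_clips - length + 1))
pos_a = int(rng.integers(0, num_clips - length + 1))
pos_b = int(rng.integers(0, num_clips - length + 1))

synthetic_a = paste_pseudo_action(source, background_a, start, length, pos_a)
synthetic_b = paste_pseudo_action(source, background_b, start, length, pos_b)
```

The two synthetic videos share the same pseudo action at different temporal positions, so an encoder can be trained to produce matching features for the two pasted regions.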

preprint, 2020, arXiv

Dirac surface states in superconductors: a dual topological proximity effect

In this paper we present scanning tunneling microscopy of Bi$_2$Se$_3$ with superconducting Nb deposited on the surface. We find that the topologically protected surface states of the Bi$_2$Se$_3$ leak into the superconducting over-layer, suggesting a dual topological proximity effect. Coupling between these states and the Nb states leads to an effective pairing mechanism for the surface states, leading to a modified model for a topological superconductor in these systems. This model is consistent with fits between the experimental data and the theory.

preprint, 2020, arXiv

PAN: Towards Fast Action Recognition via Learning Persistence of Appearance

Efficiently modeling dynamic motion information in videos is crucial for the action recognition task. Most state-of-the-art methods heavily rely on dense optical flow as motion representation. Although combining optical flow with RGB frames as input can achieve excellent recognition performance, the optical flow extraction is very time-consuming. This undoubtedly counts against real-time action recognition. In this paper, we shed light on fast action recognition by lifting the reliance on optical flow. Our motivation lies in the observation that small displacements of motion boundaries are the most critical ingredients for distinguishing actions, so we design a novel motion cue called Persistence of Appearance (PA). In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries. Also, it is more efficient because it only accumulates pixel-wise differences in feature space, instead of using an exhaustive patch-wise search of all the possible motion vectors. Our PA is over 1000x faster (8196fps vs. 8fps) than conventional optical flow in terms of motion modeling speed. To further aggregate the short-term dynamics in PA to long-term dynamics, we also devise a global temporal fusion strategy called Various-timescale Aggregation Pooling (VAP) that can adaptively model long-range temporal relationships across various timescales. We finally incorporate the proposed PA and VAP to form a unified framework called Persistent Appearance Network (PAN) with strong temporal modeling ability. Extensive experiments on six challenging action recognition benchmarks verify that our PAN outperforms recent state-of-the-art methods at low FLOPs. Code and models are available at: https://github.com/zhang-can/PAN-PyTorch.
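
To make the idea of accumulating pixel-wise differences in feature space concrete, here is a minimal sketch. It assumes per-frame feature maps are already available as an array, and it measures the difference between adjacent frames with a Euclidean norm over channels; the function name, the norm and the shapes are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np


def persistence_of_appearance(features: np.ndarray) -> np.ndarray:
    """Compute a motion cue from adjacent-frame feature differences.

    features: (T, C, H, W) feature maps for T consecutive frames.
    returns:  (T - 1, H, W) map, one per adjacent frame pair.
    """
    # Pixel-wise difference between each frame and its successor.
    diffs = features[1:] - features[:-1]
    # Collapse the channel axis so each pixel gets a single magnitude.
    return np.linalg.norm(diffs, axis=1)


rng = np.random.default_rng(0)
frames = rng.random((4, 8, 5, 5))   # hypothetical features: 4 frames, 8 channels
pa = persistence_of_appearance(frames)
print(pa.shape)
```

No search over candidate motion vectors is involved: the cost is one subtraction and one norm per pixel, which is the kind of saving over optical flow that the abstract points to.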

preprint, 2016, arXiv

2-Approximation Algorithms for Perishable Inventory Control When FIFO Is an Optimal Issuing Policy

We consider a periodic-review, fixed-lifetime perishable inventory control problem where demand is a general stochastic process. The optimal solution for this problem is intractable due to the "curse of dimensionality". In this paper, we first present a computationally efficient algorithm that we call the marginal-cost dual-balancing policy for the perishable inventory control problem. We then prove that a myopic policy under the so-called marginal-cost accounting scheme provides a lower bound on the optimal ordering quantity. By combining the specific lower bound we derive and any upper bound on the optimal ordering quantity with the marginal-cost dual-balancing policy, we present a more general class of algorithms that we call the truncated-balancing policy. We prove that when first-in-first-out (FIFO) is an optimal issuing policy, both of our proposed algorithms admit a worst-case performance guarantee of two, i.e., the expected total cost of our policy is at most twice that of an optimal ordering policy. We further present sufficient conditions that ensure the optimality of the FIFO issuing policy. Finally, we conduct numerical analyses based on real data and show that both of our algorithms perform much better than the worst-case performance guarantee, and the truncated-balancing policy has a significant performance improvement over the balancing policy.
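
The worst-case guarantee stated in this abstract can be written compactly as follows. The symbols are notation introduced here for illustration, not taken from the paper: $\pi$ denotes either proposed policy, $\pi^{*}$ an optimal ordering policy, and $C(\cdot)$ the total cost over the planning horizon.

```latex
\mathbb{E}\left[ C(\pi) \right] \;\le\; 2\, \mathbb{E}\left[ C(\pi^{*}) \right]
```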

preprint, 2016, arXiv

Steady-state and periodic exponential turnpike property for optimal control problems in Hilbert spaces

In this work, we study the steady-state (or periodic) exponential turnpike property of optimal control problems in Hilbert spaces. The turnpike property, which is essentially due to the hyperbolic feature of the Hamiltonian system resulting from the Pontryagin maximum principle, reflects the fact that, in large time, the optimal state, control and adjoint vector remain most of the time close to an optimal steady-state. A similar statement holds true as well when replacing an optimal steady-state by an optimal periodic trajectory. To establish the result, we design an appropriate dichotomy transformation, based on solutions of the algebraic Riccati and Lyapunov equations. We illustrate our results with examples including linear heat and wave equations with periodic tracking terms.
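
For orientation, an exponential turnpike estimate of the steady-state kind is commonly written in the form below. This is a sketch in assumed notation, not the paper's exact statement: $(y_T, u_T, \lambda_T)$ is the optimal state, control and adjoint on $[0, T]$, $(y_s, u_s, \lambda_s)$ is the optimal steady-state triple, and $C, \mu > 0$ are constants independent of $T$.

```latex
\| y_T(t) - y_s \| + \| u_T(t) - u_s \| + \| \lambda_T(t) - \lambda_s \|
  \;\le\; C \left( e^{-\mu t} + e^{-\mu (T - t)} \right),
  \qquad t \in [0, T].
```

The two exponentials are small except near $t = 0$ and $t = T$, which matches the statement that the optimal triple stays close to the steady state most of the time.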

preprint, 2015, arXiv

Preliminary Research on Dual-Energy X-Ray Phase-Contrast Imaging

Dual-energy X-ray absorptiometry (DEXA) has been widely applied to measure bone mineral density (BMD) and soft-tissue composition of the human body. However, the use of DEXA is greatly limited for low-Z materials such as soft tissues due to their weak absorption, whereas X-ray phase-contrast imaging (XPCI) shows significantly improved contrast for soft tissues in comparison with conventional absorption-based X-ray imaging. In this paper, we propose a novel X-ray phase-contrast method to measure the area density of low-Z materials, including a single-energy method and a dual-energy method. The single-energy method calculates the area density of one low-Z material, while the dual-energy method calculates the area densities of two low-Z materials simultaneously. Comparison of the experimental and simulation results with the theoretical ones shows that the new method has the potential to replace DEXA in area density measurement. The new method lays the groundwork for future precise and low-dose area density calculation for low-Z materials.

preprint, 2015, arXiv

Research on the background correction method in x-ray phase contrast imaging with Talbot-Lau interferometer

The X-ray Talbot-Lau interferometer has been widely used to conduct phase-contrast imaging with a conventional low-brilliance x-ray source. Typically, in this technique, background correction has to be performed in order to obtain the pure signal of the sample under inspection. In this study, we report on background correction strategies within this technique. In particular, we introduce a new phase unwrapping solution for one conventional background correction method; the key point of this solution is to change the initial phase of each pixel by a cyclic shift operation on the raw images collected in the phase-stepping scan. Experimental results and numerical analysis show that the new phase unwrapping algorithm can successfully subtract the contribution of the system's background without error. Moreover, a potential advantage of this phase unwrapping strategy is that its effective phase measuring range can be tuned flexibly to some degree, for example to (-pi+3, pi+3], so it may be useful in certain cases because the measuring range of the currently widely used background correction method is fixed at (-pi, pi].
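
The effect of a cyclic shift on the retrieved phase can be illustrated with a small numerical sketch. It assumes an ideal sinusoidal phase-stepping curve and a standard first-harmonic (Fourier) phase retrieval; the function `retrieve_phase`, the step count and the sample values are hypothetical and are not the paper's algorithm. Under these assumptions, cyclically shifting the stack of N raw images by s steps adds 2*pi*s/N to the retrieved phase of every pixel.

```python
import numpy as np


def retrieve_phase(stack: np.ndarray) -> np.ndarray:
    """Return the first-harmonic phase of a phase-stepping stack.

    stack: (N, ...) intensities recorded at N equally spaced steps
           covering one full period. The result is a principal value.
    """
    n = stack.shape[0]
    # Step index k, shaped to broadcast against any trailing pixel axes.
    k = np.arange(n).reshape((n,) + (1,) * (stack.ndim - 1))
    return np.angle(np.sum(stack * np.exp(-2j * np.pi * k / n), axis=0))


num_steps = 8
true_phase = 1.0                      # hypothetical initial phase of one pixel
steps = np.arange(num_steps)
# Assumed ideal stepping curve: offset plus a single cosine harmonic.
stack = 1.0 + 0.5 * np.cos(true_phase + 2 * np.pi * steps / num_steps)

# Cyclic shift by s = 2 steps: image k is replaced by image k + 2 (mod N).
shifted = np.roll(stack, -2, axis=0)

phase_original = retrieve_phase(stack)
phase_shifted = retrieve_phase(shifted)
print(phase_original, phase_shifted)
```

Because the shift only moves the initial phase by a known constant, a pixel whose phase sits close to the wrapping boundary can be moved away from it before retrieval and the constant removed afterwards, which is one way to read the tunable measuring range mentioned above.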

preprint, 2014, arXiv

A LabVIEW based user-friendly X-ray phase-contrast imaging system software platform

X-ray phase-contrast imaging can provide greatly improved contrast over conventional absorption-based imaging for weakly absorbing samples, such as biological soft tissues and fibre composites. In this manuscript, we introduce an easy and fast way to develop a user-friendly software platform dedicated to the new grating-based X-ray phase-contrast imaging setup recently built at the National Synchrotron Radiation Laboratory of the University of Science and Technology of China. Unified management and control of 21 motorized positioning stages, of an ultra-precision piezoelectric translation stage and of the X-ray tube are achieved with this platform. The software package also covers the automatic image acquisition of the phase-stepping scanning with a flat panel detector. Moreover, a data post-processing module for signal retrieval and other custom features are in principle available. With a seamless integration of all necessary functions in a single package, this software platform will greatly support the user activity during experimental runs.

preprint, 2014, arXiv

Observability inequalities from measurable sets for some evolution equations

In this paper, we build up two observability inequalities from measurable sets in time for some evolution equations in Hilbert spaces from two different settings. The equation reads: $u'=Au,\; t>0$, and the observation operator is denoted by $B$. In the first setting, we assume that $A$ generates an analytic semigroup, $B$ is an admissible observation operator for this semigroup (cf. \cite{TG}), and the pair $(A,B)$ verifies some observability inequality from time intervals. With the help of the propagation estimate of analytic functions (cf. \cite{V}) and a telescoping series method provided in the current paper, we establish an observability inequality from measurable sets in time. In the second setting, we suppose that $A$ generates a $C_0$ semigroup, $B$ is a linear and bounded operator, and the pair $(A, B)$ verifies some spectral-like condition. With the aid of methods developed in \cite{AEWZ} and \cite{PW2} respectively, we first obtain an interpolation inequality at one time, and then derive an observability inequality from measurable sets in time. These two observability inequalities are applied to get the bang-bang property for some time optimal control problems.

preprint, 2013, arXiv

Uniform Oxygen Doping Leads to Superconductivity in FeTe Films

FeTe is known to become a superconductor when doped with oxygen. Using layer by layer growth of single crystal films by molecular beam epitaxy (MBE), we have studied how oxygen incorporates. If oxygen is supplied during growth of a layer, it substitutes for tellurium inhomogeneously in oxygen domains that are not associated with superconductivity. When oxygen is supplied after growth, it diffuses homogeneously into the crystalline film and incorporates interstitially. Only the interstitial oxygen causes superconductivity to emerge. This suggests that the superconductivity observed in this material is spatially uniform and not filamentary.

preprint, 2012, arXiv

The time optimal control problem with control constraints of the rectangular type for a class of ODEs

This paper studies a time optimal control problem with control constraints of the rectangular type for the linear multi-input time-varying ordinary differential equations. The aims of this study are to establish certain necessary and sufficient conditions for the optimal time and time optimal control, and to build up an algorithm for the optimal time and time optimal control.

preprint, 2012, arXiv

Unique Continuation and Observability Estimates for 2-D Stokes Equations with the Navier Slip Boundary Condition

This paper presents a unique continuation estimate for 2-D Stokes equations with the Navier slip boundary condition in a bounded and simply connected domain. Consequently, an observability estimate for this equation from a subset of positive measure in time follows from the aforementioned unique continuation estimate and the new strategy developed in [16]. Several applications of the above-mentioned observability estimate to control problems of the Stokes equations are given.
