Source author record

Xing Wu

Xing Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Artificial Intelligence, hep-th, math.AP, physics.atom-ph, Applications, cond-mat.mes-hall, cond-mat.mtrl-sci, eess.AS, Genomics, gr-qc, Machine Learning, physics.chem-ph, physics.ins-det, quant-ph, Sound

Catalog footprint

What is connected

23 works

16 topics

4 close collaborators


Preprint, 2026, arXiv

LongBench Pro: A More Realistic and Comprehensive Bilingual Long-Context Evaluation Benchmark

The rapid expansion of context length in large language models (LLMs) has outpaced existing evaluation benchmarks. Current long-context benchmarks often trade off scalability and realism: synthetic tasks underrepresent real-world complexity, while fully manual annotation is costly to scale to extreme lengths and diverse scenarios. We present LongBench Pro, a more realistic and comprehensive bilingual benchmark of 1,500 naturally occurring long-context samples in English and Chinese spanning 11 primary tasks and 25 secondary tasks, with input lengths from 8k to 256k tokens. LongBench Pro supports fine-grained analysis with task-specific metrics and a multi-dimensional taxonomy of context requirement (full vs. partial dependency), length (six levels), and difficulty (four levels calibrated by model performance). To balance quality with scalability, we propose a Human-Model Collaborative Construction pipeline: frontier LLMs draft challenging questions and reference answers, along with design rationales and solution processes, to reduce the cost of expert verification. Experts then rigorously validate correctness and refine problematic cases. Evaluating 46 widely used long-context LLMs on LongBench Pro yields three findings: (1) long-context optimization contributes more to long-context comprehension than parameter scaling; (2) effective context length is typically shorter than the claimed context length, with pronounced cross-lingual misalignment; and (3) the "thinking" paradigm helps primarily models trained with native reasoning, while mixed-thinking designs offer a promising Pareto trade-off. In summary, LongBench Pro provides a robust testbed for advancing long-context understanding.

Preprint, 2026, arXiv

MiLe Loss: a New Entropy-Weighed Loss for Mitigating the Bias of Learning Difficulties in Large Language Models

Generative language models are usually pretrained on a large text corpus by predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpora during training, i.e., the imbalance between frequent tokens and infrequent ones. This imbalance can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate this, we propose the MiLe Loss function for mitigating the bias of learning difficulties across tokens. During training, it dynamically assesses the learning difficulty of a to-be-learned token according to the information entropy of the corresponding predicted probability distribution over the vocabulary. It then scales the training loss adaptively, leading the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at scales of 468M, 1.2B, and 6.7B parameters. Experiments reveal that models incorporating the proposed MiLe Loss gain consistent performance improvements on downstream benchmarks.
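To make the entropy-based scaling concrete, here is a minimal sketch in plain Python. The abstract only states that the loss is scaled according to the entropy of the predicted distribution; the specific form used here (cross-entropy multiplied by entropy raised to a power `gamma`) and the names `entropy_weighted_loss` and `gamma` are assumptions for illustration, not the paper's confirmed formula.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_loss(logits, target, gamma=1.0):
    """Cross-entropy for one token, scaled by predictive entropy.

    Assumption: the weight is entropy ** gamma. A flat (uncertain)
    prediction gets a larger weight than a peaked (confident) one.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    weight = entropy(probs) ** gamma
    return weight * cross_entropy
```

With `gamma=0` the weight is 1 and the sketch reduces to ordinary cross-entropy, which gives a simple sanity check against an unweighted baseline.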

Preprint, 2026, arXiv

Put the Space of LoRA Initialization to the Extreme to Preserve Pre-trained Knowledge

Low-Rank Adaptation (LoRA) is the leading parameter-efficient fine-tuning method for Large Language Models (LLMs), but it still suffers from catastrophic forgetting. Recent work has shown that specialized LoRA initialization can alleviate catastrophic forgetting. There are currently two approaches to LoRA initialization aimed at preventing knowledge forgetting during fine-tuning: (1) making residual weights close to pre-trained weights, and (2) ensuring the space of LoRA initialization is orthogonal to pre-trained knowledge. The former is what current methods strive to achieve, while the importance of the latter is not sufficiently recognized. We find that the space of LoRA initialization, rather than the residual weights, is the key to preserving pre-trained knowledge. Existing methods like MiLoRA propose making the LoRA initialization space orthogonal to pre-trained weights. However, MiLoRA utilizes the null space of pre-trained weights. Compared to pre-trained weights, the input activations of pre-trained knowledge take into account the parameters of all previous layers as well as the input data, while pre-trained weights only contain information from the current layer. Moreover, we find that the effective ranks of input activations are much smaller than those of pre-trained weights. Thus, the null space of activations is more accurate and contains less pre-trained knowledge information than that of weights. Based on these observations, we introduce LoRA-Null, a method that initializes LoRA in the null space of activations. Extensive experiments show that LoRA-Null effectively preserves the pre-trained world knowledge of LLMs while achieving good fine-tuning performance. Code is available at https://github.com/HungerPWAY/LoRA-Null.

Preprint, 2026, arXiv

Toward Scalable Terminal Task Synthesis via Skill Graphs

Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. However, they primarily focus on scaling the number of tasks while providing limited control over the diversity of execution trajectories that agents actually experience during training. In this paper, we present SkillSynth, an automated framework for terminal task synthesis built on a scenario-mediated skill graph. SkillSynth first constructs a large-scale skill graph, where scenarios serve as intermediate transition nodes that connect diverse command-line skills. It then samples paths from this graph as abstractions of real-world workflows, and uses a multi-agent harness to instantiate them into executable task instances. By grounding task synthesis in graph-sampled workflow paths, SkillSynth explicitly controls the diversity of minimal execution trajectories required to solve the synthesized tasks. Experiments on Terminal-Bench demonstrate the effectiveness of SkillSynth. Moreover, task instances synthesized by SkillSynth have been adopted to train Hy3 Preview, contributing to its enhanced agentic capabilities in terminal-based settings.

Preprint, 2022, arXiv

ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

Contrastive learning has been attracting much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings to build a positive pair. Because the length of a sentence is generally encoded into its embedding through the position embeddings in the Transformer, each positive pair in unsup-SimCSE contains the same length information. Unsup-SimCSE trained with these positive pairs is therefore probably biased, tending to consider sentences of the same or similar length as more similar in semantics. Through statistical observations, we find that unsup-SimCSE does have this problem. To alleviate it, we apply a simple repetition operation to modify the input sentence, and then pass the input sentence and its modified counterpart to the pre-trained Transformer encoder, respectively, to get the positive pair. Additionally, we draw inspiration from the computer vision community and introduce momentum contrast, enlarging the number of negative pairs without additional calculations. The two proposed modifications are applied to positive and negative pairs separately, and together build a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE). We evaluate the proposed ESimCSE on several benchmark datasets for the semantic textual similarity (STS) task. Experimental results show that ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.
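The repetition operation can be sketched as follows. The abstract says only that a "simple repetition operation" modifies the input sentence; the sampling scheme below (duplicating a random subset of tokens in place, bounded by a rate) and the names `repeat_words` and `dup_rate` are assumptions made for illustration.

```python
import random


def repeat_words(tokens, dup_rate=0.3, rng=None):
    """Return a copy of `tokens` with some tokens duplicated in place.

    Assumption: between 1 and max(1, dup_rate * len(tokens)) positions
    are chosen uniformly at random, and each chosen token is repeated
    once. The result is longer than the input but keeps its word order.
    """
    rng = rng or random.Random()
    if not tokens:
        return []
    max_dup = min(len(tokens), max(1, int(dup_rate * len(tokens))))
    n_dup = rng.randint(1, max_dup)
    dup_positions = set(rng.sample(range(len(tokens)), n_dup))
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i in dup_positions:
            out.append(tok)  # the repeated copy changes sentence length
    return out
```

The original sentence and the output of `repeat_words` would then be encoded separately to form a positive pair whose two members differ in length.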

Preprint, 2022, arXiv

High-sensitivity low-noise photodetector using large-area silicon photomultiplier

The application of silicon photomultiplier (SiPM) technology for weak-light detection at a single photon level has expanded thanks to its better photon detection efficiency in comparison to a conventional photomultiplier tube (PMT). SiPMs with large detection area have recently become commercially available, enabling applications where the photon flux is low both temporally and spatially. On the other hand, several drawbacks exist in the usage of SiPMs such as a higher dark count rate, many readout channels, slow response time, and optical crosstalk; therefore, users need to carefully consider the trade-offs. This work presents a SiPM-embedded compact large-area photon detection module. Various techniques are adopted to overcome the disadvantages of SiPMs so that it can be generally utilized as an upgrade from a PMT. A simple cooling component and a recently developed optical crosstalk suppression method are adopted to reduce the noise, which is more serious for larger-area SiPMs. A dedicated readout circuit increases the response frequency and reduces the number of readout channels. We favorably compare this design with a conventional PMT and obtain both higher photon detection efficiency and larger-area acceptance.

Preprint, 2022, arXiv

Ill-posedness for a two-component Novikov system in Besov space

In this paper, we consider the Cauchy problem for a two-component Novikov system on the line. By specially constructed initial data $(ρ_0, u_0)$ in $B_{p, \infty}^{s-1}(\mathbb{R})\times B_{p, \infty}^s(\mathbb{R})$ with $s>\max\{2+\frac{1}{p}, \frac{5}{2}\}$ and $1\leq p \leq \infty$, we show that any energy-bounded solution starting from $(ρ_0, u_0)$ does not converge back to $(ρ_0, u_0)$ in the metric of $B_{p, \infty}^{s-1}(\mathbb{R})\times B_{p, \infty}^s(\mathbb{R})$ as time goes to zero, which results in discontinuity of the data-to-solution map and hence ill-posedness.

Preprint, 2022, arXiv

Measurement of the H$^3Δ_1$ Radiative Lifetime in ThO

The best limit on the electron electric dipole moment (eEDM) comes from the ACME II experiment [Nature 562, 355-360 (2018)], which probes physics beyond the Standard Model at energy scales well above 1 TeV. ACME II measured the eEDM by monitoring electron spin precession in a cold beam of the metastable H$^3Δ_1$ state of thorium monoxide (ThO) molecules, with an observation time $τ\approx 1$ ms for each molecule. We report here a new measurement of the lifetime of the ThO (H$^3Δ_1$) state, $τ_H = 4.2\pm 0.5$ ms. Using an apparatus within which $τ\approx τ_H$ will enable a substantial reduction in uncertainty of an eEDM measurement.

Preprint, 2022, arXiv

Smoothed Contrastive Learning for Unsupervised Sentence Embedding

Contrastive learning has been gradually applied to learn high-quality unsupervised sentence embeddings. Among previous unsupervised methods, the latest state-of-the-art method, as far as we know, is unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE uses the InfoNCE loss function in the training stage, pulling semantically similar sentences together and pushing apart dissimilar ones. Theoretically, we expect larger batches in unsup-SimCSE to give more adequate comparisons among samples and to avoid overfitting. However, increasing the batch size does not always lead to improvements, and can even lead to performance degradation when the batch size exceeds a threshold. Through statistical observation, we find that this is probably due to the introduction of low-confidence negative pairs after increasing the batch size. To alleviate this problem, we introduce a simple smoothing strategy on top of the InfoNCE loss function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE). Specifically, we add random Gaussian noise vectors as negative samples, which act as a smoothing of the negative sample space. Though simple, the proposed smoothing strategy brings substantial improvements to unsup-SimCSE. We evaluate GS-InfoNCE on the standard semantic text similarity (STS) task. GS-InfoNCE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%, 0.72%, 1.17% and 0.28% on BERT-base, BERT-large, RoBERTa-base and RoBERTa-large, respectively.
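A minimal single-anchor sketch of the idea, in plain Python: standard InfoNCE, plus a variant that appends random Gaussian vectors to the negatives. The noise distribution (zero mean, unit variance), the number of noise vectors, the temperature value, and the function names are all assumptions for illustration; the abstract does not specify them.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0.0 and nv > 0.0 else 0.0


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss for one anchor with one positive and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def gs_info_nce(anchor, positive, negatives, num_noise=4,
                temperature=0.05, rng=None):
    """InfoNCE with extra Gaussian noise vectors used as negatives.

    Assumption: noise is drawn i.i.d. from N(0, 1) per dimension.
    """
    rng = rng or random.Random()
    dim = len(anchor)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_noise)]
    return info_nce(anchor, positive, list(negatives) + noise, temperature)
```

Because each noise vector adds a positive term to the denominator, the smoothed loss is never smaller than the plain InfoNCE loss on the same inputs.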

Preprint, 2022, arXiv

Stacked Autoencoder Based Multi-Omics Data Integration for Cancer Survival Prediction

Cancer survival prediction is important for developing personalized treatments and inducing disease-causing mechanisms. Multi-omics data integration is attracting widespread interest in cancer research for providing information for understanding cancer progression at multiple genetic levels. Many works, however, are limited because of the high dimensionality and heterogeneity of multi-omics data. In this paper, we propose a novel method to integrate multi-omics data for cancer survival prediction, called Stacked AutoEncoder-based Survival Prediction Neural Network (SAEsurv-net). In the cancer survival prediction for TCGA cases, SAEsurv-net addresses the curse of dimensionality with a two-stage dimensionality reduction strategy and handles multi-omics heterogeneity with a stacked autoencoder model. The two-stage dimensionality reduction strategy achieves a balance between computation complexity and information exploiting. The stacked autoencoder model removes most heterogeneities such as data's type and size in the first group of autoencoders, and integrates multiple omics data in the second autoencoder. The experiments show that SAEsurv-net outperforms models based on a single type of data as well as other state-of-the-art methods.

Preprint, 2022, arXiv

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

Before entering the neural network, a token is generally converted to the corresponding one-hot representation, which is a discrete distribution over the vocabulary. A smoothed representation is the probability distribution over candidate tokens obtained from a pre-trained masked language model, which can be seen as a more informative substitute for the one-hot representation. We propose an efficient data augmentation method, termed text smoothing, which converts a sentence from its one-hot representation to a controllable smoothed representation. We evaluate text smoothing on different benchmarks in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with those data augmentation methods to achieve better performance.
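One way to picture a "controllable" smoothed representation is as a blend of the one-hot vector with the masked language model's predicted distribution. The sketch below assumes a linear interpolation with a weight `lam`; that form, the function name `smooth_token`, and the toy three-word vocabulary are illustrative assumptions, not details taken from the abstract.

```python
def smooth_token(token_index, mlm_probs, lam=0.5):
    """Blend a one-hot vector with an MLM distribution over the vocabulary.

    Assumption: smoothed = lam * one_hot + (1 - lam) * mlm_probs.
    lam = 1.0 recovers the one-hot vector; lam = 0.0 uses only the
    model's distribution.
    """
    return [
        lam * (1.0 if i == token_index else 0.0) + (1.0 - lam) * p
        for i, p in enumerate(mlm_probs)
    ]


# Toy example: vocabulary of 3 tokens, observed token has index 1.
smoothed = smooth_token(1, [0.2, 0.5, 0.3], lam=0.5)
```

Since both inputs sum to 1, the blended vector is also a valid distribution for any `lam` between 0 and 1.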

Preprint, 2020, arXiv

Buffer-gas cooling of molecules in the low-density regime: Comparison between simulation and experiment

Cryogenic buffer gas cells have been a workhorse for the cooling of molecules in the last decades. The straightforward sympathetic cooling principle makes them applicable to a huge variety of different species. Notwithstanding this success, detailed simulations of buffer gas cells are rare, and have never been compared to experimental data in the regime of low to intermediate buffer gas densities. Here, we present a numerical approach based on a trajectory analysis, with molecules performing a random walk in the cell due to collisions with a homogeneous buffer gas. This method can reproduce experimental flux and velocity distributions of molecules emerging from the buffer gas cell for varying buffer gas densities. This includes the strong decrease in molecule output from the cell for increasing buffer gas density and the so-called boosting effect, when molecules are accelerated by buffer-gas atoms after leaving the cell. The simulations provide various insights which could substantially improve buffer-gas cell design.

Preprint, 2020, arXiv

Data Augmentation for Copy-Mechanism in Dialogue State Tracking

While several state-of-the-art approaches to dialogue state tracking (DST) have shown promising performance on several benchmarks, there is still a significant performance gap between seen slot values (i.e., values that occur in both the training set and the test set) and unseen ones (values that occur in the test set but not in the training set). Recently, the copy-mechanism has been widely used in DST models to handle unseen slot values; it copies slot values directly from the user utterance. In this paper, we aim to identify the factors that influence the generalization ability of a common copy-mechanism model for DST. Our key observations include: 1) the copy-mechanism tends to memorize values rather than infer them from contexts, which is the primary reason for unsatisfactory generalization performance; 2) greater diversity of slot values in the training set increases the performance on unseen values but slightly decreases the performance on seen values. Moreover, we propose a simple but effective data augmentation algorithm for training copy-mechanism models, which augments the input dataset by copying user utterances and replacing the real slot values with randomly generated strings. Users can use two hyper-parameters to realize a trade-off between the performance on seen values and unseen ones, as well as a trade-off between overall performance and computational cost. Experimental results on three widely used datasets (WoZ 2.0, DSTC2, and Multi-WoZ 2.0) show the effectiveness of our approach.
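The augmentation step described above (copy an utterance, swap the real slot value for a random string) can be sketched as follows. The data layout (a list of `(utterance, slot_value)` pairs) and the parameters `copies` and `length` are hypothetical choices for this sketch; they are not claimed to be the two hyper-parameters the abstract refers to.

```python
import random
import string


def random_value(rng, length=8):
    """Generate a random lowercase string to stand in for a slot value."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def augment(examples, copies=1, length=8, rng=None):
    """Return the original examples plus copies with randomized slot values.

    Each example is an (utterance, slot_value) pair. For every example
    whose slot value appears verbatim in the utterance, `copies` new
    examples are added in which that value is replaced by a random string.
    """
    rng = rng or random.Random()
    out = list(examples)
    for utterance, slot_value in examples:
        if not slot_value or slot_value not in utterance:
            continue  # nothing to copy from the utterance
        for _ in range(copies):
            fake = random_value(rng, length)
            out.append((utterance.replace(slot_value, fake), fake))
    return out
```

A model trained on the augmented set cannot succeed by memorizing slot values, because the random strings never repeat in a meaningful way; it has to rely on the surrounding context to decide what to copy.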

Preprint, 2020, arXiv

Distilling Knowledge from Pre-trained Language Models via Text Smoothing

This paper studies compressing pre-trained language models, like BERT (Devlin et al., 2019), via teacher-student knowledge distillation. Previous works usually force the student model to strictly mimic the smoothed labels predicted by the teacher BERT. As an alternative, we propose a new method for BERT distillation, i.e., asking the teacher to generate smoothed word ids, rather than labels, for teaching the student model in knowledge distillation. We call this method TextSmoothing. Practically, we use the softmax prediction of the Masked Language Model (MLM) in BERT to generate word distributions for given texts and smooth those input texts using the predicted soft word ids. We assume that both the smoothed labels and the smoothed texts can implicitly augment the input corpus, while text smoothing is intuitively more efficient since it can generate more instances in one neural network forward step. Experimental results on GLUE and SQuAD demonstrate that our solution can achieve competitive results compared with existing BERT distillation methods.

Preprint, 2020, arXiv

Non-uniform continuity of the generalized Camassa-Holm equation in Besov spaces

In this paper, we consider the Cauchy problem for the generalized Camassa-Holm equation proposed by Hakkaev and Kirchev (2005). We prove that the solution map of the generalized Camassa-Holm equation is not uniformly continuous on the initial data in Besov spaces. Our result includes recent work (2020) on the Camassa-Holm equation with $Q=1$ and extends the previous non-uniform continuity result in Sobolev spaces (2015) to Besov spaces. In addition, the non-uniform continuity in the critical space $B_{2, 1}^{\frac{3}{2}}(\mathbb{R})$ is considered here for the first time.

Preprint, 2020, arXiv

Non-uniform dependence on initial data for the Camassa-Holm equation in the critical Besov space

Whether the data-to-solution map of the Cauchy problem for the Camassa-Holm equation and the Novikov equation in the critical Besov space $B_{2,1}^{3/2}(\mathbb{R})$ fails to be uniformly continuous has remained open. In this paper, we resolve the question left open by previous works and give a positive answer to this problem.

Preprint, 2020, arXiv

TransSent: Towards Generation of Structured Sentences with Discourse Marker

Structured sentences are important expressions in human writings and dialogues. Previous works on neural text generation fused semantic and structural information by encoding the entire sentence into a mixed hidden representation. However, when a generated sentence becomes complicated, the structure is difficult to maintain properly. To alleviate this problem, we explicitly separate the modeling of semantic and structural information. Intuitively, humans generate structured sentences by directly connecting discourses with discourse markers (such as and, but, etc.). Therefore, we propose a task that mimics this process, called discourse transfer. This task represents a structured sentence as (head discourse, discourse marker, tail discourse), and aims at generating the tail discourse from the head discourse and the discourse marker. We also propose a corresponding model called TransSent, which interprets the relationship between two discourses as a translation from the head discourse to the tail discourse in the embedding space. We evaluate TransSent not only on the discourse transfer task but also on free text generation and dialogue generation tasks. Automatic and human evaluation results show that TransSent can generate high-quality structured sentences and scales to different tasks to a certain degree.

Preprint, 2019, arXiv

Higher curvature corrections to pole-skipping

Recent developments have revealed a new phenomenon, i.e. the residues of the poles of the holographic retarded two point functions of generic operators vanish at certain complex values of the frequency and momentum. This so-called pole-skipping phenomenon can be determined holographically by the near horizon dynamics of the bulk equations of the corresponding fields. In particular, the pole-skipping point in the upper half plane of complex frequency has been shown to be closely related to many-body chaos, while those in the lower half plane also place universal and nontrivial constraints on the two point functions. In this paper, we study the effect of higher curvature corrections, i.e. the stringy correction and Gauss-Bonnet correction, on the (lower half plane) pole-skipping phenomenon for generic scalar, vector, and metric perturbations. We find that at the pole-skipping points, the frequencies $ω_n=-i2πnT$ are not explicitly influenced by either the $R^2$ or the $R^4$ corrections, while the momenta $k_n$ receive corresponding corrections.

Preprint, 2019, arXiv

Room temperature 2D ferromagnetism in few-layered 1$T$-CrTe$_{2}$

Spin-related electronics using two dimensional (2D) van der Waals (vdW) materials as a platform are believed to hold great promise for revolutionizing next-generation spintronics. Although many emerging phenomena have been uncovered in 2D electronic systems with long-range spin ordering, the scarcity of reported room temperature magnetic vdW materials has thus far hindered the related applications. Here, we show that intrinsic ferromagnetically aligned spin polarization can persist up to 316 K in a metallic phase of 1$T$-CrTe$_{2}$ in the few-layer limit. This room temperature 2D long-range spin interaction may benefit from an itinerant enhancement. Spin transport measurements indicate an in-plane room temperature negative anisotropic magnetoresistance (AMR) in few-layered CrTe$_{2}$, but a sign change in the AMR at lower temperature, with -0.6% at 300 K and +5% at 10 K, respectively. This behavior may originate from the specific spin polarized band structure of CrTe$_{2}$. Our findings provide insights into magnetism in few-layered CrTe$_{2}$, suggesting potential for future room temperature spintronic applications of such 2D vdW magnets.

Preprint, 2015, arXiv

Notes on holographic Schwinger effect

We use the method of evaluating the decay rate in terms of the imaginary part of a probe brane action to study the holographic Schwinger effect. In the confining D3-branes case, we find that the Schwinger effect occurs at energy scales higher than the Kaluza-Klein mass, indicating the absence of such an effect when the dual gauge field theory can be regarded as a 2+1 dimensional theory. This property is independent of the configuration of the probe brane. In the case of D3-branes with a B field dual to a noncommutative super Yang-Mills theory, we study how the decay rate is affected by the noncommutative effect.

Preprint, 2014, arXiv

Holographic entanglement entropy and thermodynamic instability of planar R-charged black holes

The holographic entanglement entropy of an infinite strip subsystem on the asymptotic AdS boundary is used as a probe to study the thermodynamic instabilities of planar R-charged black holes (or their dual field theories). We focus on the single-charge AdS black holes in $D=5$, which correspond to spinning D3-branes with one non-vanishing angular momentum. Our results show that the holographic entanglement entropy indeed exhibits the thermodynamic instability associated with the divergence of the specific heat. When the width of the strip is large enough, the finite part of the holographic entanglement entropy as a function of the temperature resembles the thermal entropy, as is expected. As the width becomes smaller, however, the two entropies behave differently. In particular, there exists a critical value for the width of the strip, below which the finite part of the holographic entanglement entropy as a function of the temperature develops a self-intersection. We also find similar behavior in the single-charge black holes in $D=4$ and $7$.

Preprint, 2013, arXiv

Synthesis and optical properties of large-scale single-crystalline two-dimensional semiconductor WS2 monolayer from chemical vapor deposition

Two-dimensional (2D) transition metal dichalcogenides (TMDs), especially MoS2 and WS2, have recently attracted extensive attention due to their rich physics and great potential for applications. Unlike graphene, MS2 (M = Mo/W) monolayers have a native direct energy gap in the visible frequency range, which makes MS2 promising for optoelectronics. To exploit these properties and develop further applications, a facile method for producing large-scale single crystals of MS2 is in high demand. Here, we report the synthesis of large-scale triangular single crystals of WS2 monolayer by a chemical vapor deposition process, together with systematic optical studies of these WS2 monolayers. The observations of a high yield of light emission and valley-selective circular dichroism provide experimental evidence of the high optical quality of the WS2 monolayers. This work paves the way to fabricating large-scale single-crystalline 2D semiconductors and studying their fundamental properties, and to exploiting the potential of WS2 for future optoelectronics.

Preprint, 2012, arXiv

Massless Scalar Field Vacuum in de Sitter Spacetime

As a spacetime with compact spatial sections, de Sitter spacetime does not have a de Sitter-invariant ground state for a minimally-coupled massless scalar field that gives definite expectation values for any observables not invariant under constant shifts of the field. However, if one restricts to observables that are shift invariant, as the action is, then there is a unique vacuum state. Here we calculate the shift-invariant four-point function that is the vacuum expectation value of the product of the difference of the field values at one pair of points and of the difference of the field values at a second pair of points. We show that this vacuum expectation value obeys a cluster-decomposition property of vanishing in the limit that the one pair of points is moved arbitrarily far from the other pair. We also calculate the shift-invariant correlation of the gradient of the scalar field at two different points and show that it also obeys a cluster-decomposition property. Possible relevance to a putative de Sitter-invariant quantum state for gravity is discussed.
