Source author record

Alexander Panchenko

Alexander Panchenko appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, physics.comp-ph, cond-mat.soft, cond-mat.dis-nn, math-ph, math.CA, math.MP, math.NA, Artificial Intelligence, Biological Physics, physics.flu-dyn, cond-mat.stat-mech, Information Retrieval, Machine Learning, math.AP

Catalog footprint

What is connected

21 works

15 topics

4 close collaborators

preprint · 2023 · arXiv

Active Learning for Abstractive Text Summarization

Construction of human-curated annotated datasets for abstractive text summarization (ATS) is very time-consuming and expensive because creating each instance requires a human annotator to read a long document and compose a shorter summary that preserves the key information relayed by the original document. Active Learning (AL) is a technique developed to reduce the amount of annotation required to achieve a certain level of machine learning model performance. In information extraction and text classification, AL can reduce the amount of labor severalfold. Despite its potential for aiding expensive annotation, as far as we know, there have been no effective AL query strategies for ATS. This stems from the fact that many AL strategies rely on uncertainty estimation, while, as we show in our work, uncertain instances are usually noisy, and selecting them can degrade the model performance compared to passive annotation. We address this problem by proposing the first effective query strategy for AL in ATS based on diversity principles. We show that, given a certain annotation budget, using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores. Additionally, we analyze the effect of self-learning and show that it can further increase the performance of the model.

preprint · 2022 · arXiv

Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language

Toxicity on the Internet, such as hate speech, offenses towards particular users or groups of people, or the use of obscene words, is an acknowledged problem. However, there also exist other types of inappropriate messages which are usually not viewed as toxic, e.g. because they do not contain explicit offences. Such messages can contain covered toxicity or generalizations, incite harmful actions (crime, suicide, drug use), or provoke "heated" discussions. Such messages are often related to particular sensitive topics, e.g. politics, sexual minorities, or social injustice, which more often than other topics, e.g. cars or computing, yield toxic emotional reactions. At the same time, clearly not all messages within such flammable topics are inappropriate. To this end, in this work, we present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topic. Assuming that the notion of inappropriateness is common among people of the same culture, we base our approach on the intuitive human understanding of what is not acceptable and harmful. To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing. Namely, we run a large-scale annotation study asking workers whether a given chatbot textual statement could harm the reputation of the company that created it. Acceptably high values of inter-annotator agreement suggest that the notion of inappropriateness exists and can be uniformly understood by different people. To define the notion of sensitive topics in an objective way, we rely on guidelines in which specialists of the legal and PR departments of a large public company commonly flag topics as potentially harmful.

preprint · 2022 · arXiv

Error syntax aware augmentation of feedback comment generation dataset

This paper presents a solution to the GenChal 2022 shared task dedicated to feedback comment generation for writing learning. In this task, given a text with an error and the span of the error, a system generates an explanatory note that helps the writer (a language learner) to improve their writing skills. Our solution is based on fine-tuning the T5 model on the initial dataset augmented according to the syntactic dependencies of the words located within the indicated error span. The solution of our team "nigula" obtained second place according to manual evaluation by the organizers.

preprint · 2022 · arXiv

Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

Detoxification is the task of generating text in a polite style while preserving the meaning and fluency of the original toxic text. Existing detoxification methods are designed to work in one exact language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models like in this setting. Unlike previous works, we aim to make large language models able to perform detoxification without direct fine-tuning in a given language. Experiments show that multilingual models are capable of performing multilingual style transfer. However, the models are not able to perform cross-lingual detoxification, and direct fine-tuning on the exact language is unavoidable.

preprint · 2022 · arXiv

MEKER: Memory Efficient Knowledge Embedding Representation for Link Prediction and Question Answering

Knowledge Graphs (KGs) are symbolically structured storages of facts. A KG embedding contains concise data used in NLP tasks requiring implicit information about the real world. Furthermore, the KGs that may be useful in actual NLP assignments are enormous, and creating embeddings over them raises memory cost issues. We represent a KG as a 3rd-order binary tensor and move beyond the standard CP decomposition by using a data-specific generalized version of it. The generalization of the standard CP-ALS algorithm allows obtaining optimization gradients without a backpropagation mechanism. It reduces the memory needed in training while providing computational benefits. We propose MEKER, a memory-efficient KG embedding model, which yields SOTA-comparable performance on link prediction tasks and KG-based Question Answering.

preprint · 2022 · arXiv

Neural Entity Linking: A Survey of Models Based on Deep Learning

This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in natural language processing. Its goal is to systemize design features of neural entity linking systems and compare their performance to the remarkable classic methods on common benchmarks. This work distills a generic architecture of a neural EL system and discusses its components, such as candidate generation, mention-context encoding, and entity ranking, summarizing prominent methods for each of them. The vast variety of modifications of this general architecture are grouped by several common themes: joint entity mention detection and disambiguation, models for global linking, domain-independent techniques including zero-shot and distant supervision methods, and cross-lingual approaches. Since many neural models take advantage of entity and mention/context embeddings to represent their meaning, this work also overviews prominent entity embedding techniques. Finally, the survey touches on applications of entity linking, focusing on the recently emerged use-case of enhancing deep pre-trained masked language models based on the Transformer architecture.

preprint · 2022 · arXiv

RuArg-2022: Argument Mining Evaluation

Argumentation analysis is a field of computational linguistics that studies methods for extracting arguments from texts and the relationships between them, as well as building argumentation structure of texts. This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts within the framework of the Dialogue conference. During the competition, the participants were offered two tasks: stance detection and argument classification. A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic (vaccination, quarantine, and wearing masks) was prepared, annotated, and used for training and testing. The system that won the first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture, automatic translation into English to apply a specialized BERT model, retrained on Twitter posts discussing COVID-19, as well as additional masking of target entities. This system showed the following results: for the stance detection task an F1-score of 0.6968, for the argument classification task an F1-score of 0.7404. We hope that the prepared dataset and baselines will help to foster further research on argument mining for the Russian language.

preprint · 2022 · arXiv

Studying the role of named entities for content preservation in text style transfer

Text style transfer techniques are gaining popularity in Natural Language Processing, finding various applications such as text detoxification, sentiment, or formality transfer. However, the majority of the existing approaches were tested on such domains as online communications on public platforms, music, or entertainment yet none of them were applied to the domains which are typical for task-oriented production systems, such as personal plans arrangements (e.g. booking of flights or reserving a table in a restaurant). We fill this gap by studying formality transfer in this domain. We noted that the texts in this domain are full of named entities, which are very important for keeping the original sense of the text. Indeed, if for example, someone communicates the destination city of a flight it must not be altered. Thus, we concentrate on the role of named entities in content preservation for formality text style transfer. We collect a new dataset for the evaluation of content similarity measures in text style transfer. It is taken from a corpus of task-oriented dialogues and contains many important entities related to realistic requests that make this dataset particularly useful for testing style transfer models before using them in production. Besides, we perform an error analysis of a pre-trained formality transfer model and introduce a simple technique to use information about named entities to enhance the performance of baseline content similarity measures used in text style transfer.

preprint · 2022 · arXiv

Taxonomy Enrichment with Text and Graph Vector Representations

Knowledge graphs such as DBpedia, Freebase or Wikidata always contain a taxonomic backbone that allows the arrangement and structuring of various concepts in accordance with the hypo-hypernym ("class-subclass") relationship. With the rapid growth of lexical resources for specific domains, the problem of automatic extension of the existing knowledge bases with new words is becoming more and more widespread. In this paper, we address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy. We present a new method that allows achieving high results on this task with little effort. It uses the resources which exist for the majority of languages, making the method universal. We extend our method by incorporating deep representations of graph structures like node2vec, Poincaré embeddings, GCN etc. that have recently demonstrated promising results on various NLP tasks. Furthermore, combining these representations with word embeddings allows us to beat the state of the art. We conduct a comprehensive study of the existing approaches to taxonomy enrichment based on word and graph vector representations and their fusion approaches. We also explore the ways of using deep learning architectures to extend the taxonomic backbones of knowledge graphs. We create a number of datasets for taxonomy extension for English and Russian. We achieve state-of-the-art results across different datasets and provide an in-depth error analysis of mistakes.

preprint · 2021 · arXiv

Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates

Annotating training data for sequence tagging of texts is usually very time-consuming. Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget. We are the first to thoroughly investigate this powerful combination for the sequence tagging task. We conduct an extensive empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework and find the best combinations for different types of models. Besides, we also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance and reduces obstacles for applying deep active learning in practice.

preprint · 2021 · arXiv

Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation

Not all topics are equally "flammable" in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) an inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian: a topic-labeled dataset and an appropriateness-labeled dataset. We also release pre-trained classification models trained on this data.

preprint · 2020 · arXiv

A Comparative Study of Lexical Substitution Approaches based on Neural Language Models

Lexical substitution in context is an extremely powerful technology that can be used as a backbone of various NLP applications, such as word sense induction, lexical relation extraction, data augmentation, etc. In this paper, we present a large-scale comparative study of popular neural language and masked language models (LMs and MLMs), such as context2vec, ELMo, BERT, XLNet, applied to the task of lexical substitution. We show that already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly, and compare several target injection methods. In addition, we provide analysis of the types of semantic relations between the target and substitutes generated by different models providing insights into what kind of words are really generated or given by annotators as substitutes.

preprint · 2020 · arXiv

RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the Russian language

This paper describes the results of the first shared task on taxonomy enrichment for the Russian language. The participants were asked to extend an existing taxonomy with previously unseen words: for each new word their systems should provide a ranked list of possible (candidate) hypernyms. In comparison to the previous tasks for other languages, our competition has a more realistic task setting: new words were provided without definitions. Instead, we provided a textual corpus where these new terms occurred. For this evaluation campaign, we developed a new evaluation dataset based on unpublished RuWordNet data. The shared task features two tracks: "nouns" and "verbs". 16 teams participated in the task demonstrating high results with more than half of them outperforming the provided baseline.

preprint · 2020 · arXiv

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised and/or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. Models and system are available online.

preprint · 2014 · arXiv

Kinetic equation for spatially averaged molecular dynamics

We obtain a kinetic description of spatially averaged dynamics of particle systems. Spatial averaging is one of the three types of averaging relevant within the Irving-Kirkwood procedure (IKP), a general method for deriving macroscopic equations from molecular models. The other two types, ensemble averaging and time averaging, have been extensively studied, while spatial averaging is relatively less understood. We show that the average density, linear momentum, and kinetic energy used in IKP can be obtained from a single average quantity, called the generating function. A kinetic equation for the generating function is obtained and tested numerically on Lennard-Jones oscillator chains.

preprint · 2014 · arXiv

Motility versus fluctuations in mixtures of self-propelled and passive agents

Many biological systems consist of self-motile and passive agents both of which contribute to overall functionality. However, there are very few studies of the properties of such mixtures. Here we formulate a model for mixtures of self-motile and passive agents and show that the model gives rise to three different dynamical phases: a disordered mesoturbulent phase, a polar flocking phase, and a vortical phase characterized by large-scale counterrotating vortices. We use numerical simulations to construct a phase diagram and compare the statistical properties of the different phases of the model with self-motile bacterial suspensions. Our findings afford specific insights regarding the interaction of microorganisms and passive particles and provide novel strategic guidance for efficient technological realizations of artificial active matter.

preprint · 2013 · arXiv

Effective models for nematic liquid crystals composites with ferromagnetic inclusions

Molecules of a nematic liquid crystal respond to an applied magnetic field by reorienting themselves in the direction of the field. Since the dielectric anisotropy of a nematic is small, it takes relatively large fields to elicit a significant liquid crystal response. The interaction may be enhanced in colloidal suspensions of ferromagnetic particles in a liquid crystalline matrix (ferronematics), as proposed by Brochard and de Gennes in 1970. The ability of these particles to align with the field and, simultaneously, cause reorientation of the nematic molecules greatly increases the magnetic response of the mixture. Essentially, the particles provide an easy axis of magnetization that interacts with the liquid crystal via surface anchoring. We derive an expression for the effective energy of a ferronematic in the dilute limit, that is, when the number of particles tends to infinity while their total volume fraction tends to zero. The total energy of the mixture is assumed to be the sum of the bulk elastic liquid crystal contribution, the anchoring energy of the liquid crystal on the surfaces of the particles, and the magnetic energy of interaction between the particles and the applied magnetic field. The homogenized limiting ferronematic energy is obtained rigorously using a variational approach. It generalizes formal expressions previously reported in the physics literature.

preprint · 2013 · arXiv

Optimizing performance of the deconvolution model reduction for large ODE systems

We investigate the numerical performance of the regularized deconvolution closure introduced recently by the authors. The purpose of the closure is to furnish constitutive equations for the Irving-Kirkwood-Noll procedure, a well-known method for deriving continuum balance equations from Newton's equations of particle dynamics. The version of this procedure used in the paper relies on spatial averaging developed by Hardy, and independently by Murdoch and Bedeaux. The constitutive equations for the stress are given as a sum of several operator terms acting on the mesoscale average density and velocity. Each term is a "convolution sandwich" containing the deconvolution operator, a composition or a product operator, and the convolution (averaging) operator. Deconvolution is constructed using filtered regularization methods from the theory of ill-posed problems. The purpose of regularization is to ensure numerical stability. The particular technique used for numerical experiments is truncated singular value decomposition (SVD). The accuracy of the constitutive equations depends on several parameters: the choice of the averaging window function, the value of the mesoscale resolution parameter, scale separation, the level of truncation of singular values, and the level of spectral filtering of the averages. We conduct numerical experiments to determine the effect of each parameter on the accuracy and efficiency of the method. Partial error estimates are also obtained.
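
As a generic illustration of the truncated SVD regularization named in this abstract, the sketch below solves a linear system A x = b while discarding all but the largest singular values. It is not the authors' code: the function name `tsvd_solve`, the matrix `A`, and the `rank` parameter are hypothetical, and the averaging operator of the paper is replaced by an arbitrary matrix.

```python
import numpy as np

def tsvd_solve(A, b, rank):
    """Regularized solution of A x = b using only the `rank` largest singular values.

    Small singular values amplify noise when inverted, so dropping them
    trades some accuracy for numerical stability.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Thin SVD: A = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = max(1, min(rank, len(s)))
    # Project b onto the leading left singular vectors and invert only those modes.
    # Assumes the k retained singular values are nonzero.
    coeffs = (U[:, :k].T @ b) / s[:k]
    return Vt[:k].T @ coeffs

# Illustrative use: the third mode is nearly singular and is truncated away.
A = np.diag([2.0, 1.0, 1e-8])
b = np.array([2.0, 1.0, 1e-8])
x = tsvd_solve(A, b, rank=2)
```

The truncation level plays the role of the regularization parameter: a larger `rank` keeps more detail but lets more noise through.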

preprint · 2013 · arXiv

Particle-based simulations of self-motile suspensions

A simple model for simulating flows of active suspensions is investigated. The approach is based on dissipative particle dynamics. While the model is potentially applicable to a wide range of self-propelled particle systems, the specific class of self-motile bacterial suspensions is considered as a modeling scenario. To mimic the rod-like geometry of a bacterium, two dissipative particle dynamics particles are connected by a stiff harmonic spring to form an aggregate dissipative particle dynamics molecule. Bacterial motility is modeled through a constant self-propulsion force applied along the axis of each such aggregate molecule. The model accounts for hydrodynamic interactions between self-propelled agents through the pairwise dissipative interactions conventional to dissipative particle dynamics. Numerical simulations are performed using a customized version of the open-source LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) software package. Detailed studies of the influence of agent concentration, pairwise dissipative interactions, and Stokes friction on the statistics of the system are provided. The simulations are used to explore the influence of hydrodynamic interactions in active suspensions. For high agent concentrations in combination with dominating pairwise dissipative forces, strongly correlated motion patterns and fluid-like spectral distributions of kinetic energy are found. In contrast, systems dominated by Stokes friction exhibit weaker spatial correlations of the velocity field. These results indicate that hydrodynamic interactions may play an important role in the formation of spatially extended structures in active suspensions.

preprint · 2011 · arXiv

Deconvolution closure for mesoscopic continuum models of particle systems

The paper introduces a general framework for derivation of continuum equations governing meso-scale dynamics of large particle systems. The balance equations for spatial averages such as density, linear momentum, and energy were previously derived by a number of authors. These equations are not in closed form because the stress and the heat flux cannot be evaluated without the knowledge of particle positions and velocities. We propose a closure method for approximating fluxes in terms of other meso-scale averages. The main idea is to rewrite the non-linear averages as linear convolutions that relate micro- and meso-scale dynamical functions. The convolutions can be approximately inverted using regularization methods developed for solving ill-posed problems. This yields closed form constitutive equations that can be evaluated without solving the underlying ODEs. We test the method numerically on Fermi-Pasta-Ulam chains with two different potentials: the classical Lennard-Jones, and the purely repulsive potential used in granular materials modeling. The initial conditions incorporate velocity fluctuations on scales that are smaller than the size of the averaging window. The results show very good agreement between the exact stress and its closed form approximation.

preprint · 2010 · arXiv

Closure method for spatially averaged dynamics of particle chains

We study the closure problem for continuum balance equations that model mesoscale dynamics of large ODE systems. The underlying microscale model consists of classical Newton equations of particle dynamics. As a mesoscale model we use the balance equations for spatial averages obtained earlier by a number of authors: Murdoch and Bedeaux, Hardy, Noll and others. The momentum balance equation contains a flux (stress), which is given by an exact function of particle positions and velocities. We propose a method for approximating this function by a sequence of operators applied to average density and momentum. The resulting approximate mesoscopic models are systems in closed form. The closed form property allows one to work directly with the mesoscale equations without the need to calculate underlying particle trajectories, which is useful for modeling and simulation of large particle systems. The proposed closure method utilizes the theory of ill-posed problems, in particular iterative regularization methods for solving first order linear integral equations. The closed form approximations are obtained in two steps. First, we use Landweber regularization to (approximately) reconstruct the interpolants of relevant microscale quantities from the average density and momentum. Second, these reconstructions are substituted into the exact formulas for stress. The developed general theory is then applied to non-linear oscillator chains. We conduct a detailed study of the simplest zero-order approximation, and show numerically that it works well as long as fluctuations of velocity are nearly constant.
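
For readers unfamiliar with the Landweber regularization mentioned in this abstract, a minimal generic sketch follows. It is an assumption-laden illustration rather than the paper's implementation: the function name `landweber`, the matrix `A` standing in for a discretized averaging operator, and the default step size and iteration count are all hypothetical.

```python
import numpy as np

def landweber(A, b, step=None, iterations=200):
    """Approximate a solution of A x = b by Landweber iteration.

    Each step moves x along the negative gradient of 0.5 * ||A x - b||^2.
    Stopping after finitely many iterations acts as the regularization.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    if step is None:
        # Convergence requires 0 < step < 2 / sigma_max(A)^2; this choice is safe.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        x = x + step * (A.T @ (b - A @ x))
    return x

# Illustrative use on a small well-conditioned system.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 1.0])
x = landweber(A, b)
```

With noisy data the iteration count is the knob to tune: few iterations give a smooth, stable reconstruction, while many iterations fit the data more closely and amplify noise.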
