Source author record

Saikat Chakraborty

Saikat Chakraborty appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Software Engineering cond-mat.stat-mech Artificial Intelligence Machine Learning cond-mat.soft gr-qc Programming Languages math-ph math.MP

Catalog footprint

What is connected

12 works

9 topics

4 close collaborators


preprint, 2022, arXiv

A note on the dynamical system formulations in $f(R)$ gravity

A number of dynamical system formulations have been proposed over the last few years to analyse cosmological solutions in $f(R)$ gravity. The aim of this article is to provide a brief introduction to the different approaches, presenting them in chronological order as they appeared in the history of the relevant scientific literature. In this way, we illuminate how the shortcoming(s) of an existing formulation encouraged the development of an alternative formulation. Whenever possible, a 2-dimensional phase portrait is given for a better visual representation of the dynamics of phase space. We also touch upon how cosmological perturbations can be analyzed using the phase space language.

preprint, 2022, arXiv

Can $f(R)$ gravity isotropize a pre-bounce contracting universe?

We address the important issue of isotropisation of a pre-bounce contracting phase in $f(R)$ gravity, which is relevant to constructing any viable nonsingular bouncing scenario in $f(R)$ gravity. The main motivation behind this work is to investigate whether $f(R)$ gravity, by itself, can isotropise a contracting universe that starts with small anisotropy, without incorporating a super-stiff or non-ideal fluid, something that is impossible in general relativity. Considering Bianchi I cosmology and employing a dynamical system analysis, we see that this is not possible for the $R^n$ ($n>1$) and $R+αR^2$ ($α>0$) theories, but is possible for the $\frac{1}{α}e^{αR}$ ($α>0$) theory. On the other hand, if one does not specify an $f(R)$ theory a priori but demands a cosmology smoothly connecting an ekpyrotic contraction phase to a nonsingular bounce, the ekpyrotic phase may not fulfil the conditions for isotropisation and physical viability simultaneously.

preprint, 2022, arXiv

NatGen: Generative pre-training by "Naturalizing" source code

Pre-trained generative language models (e.g., PLBART, CodeT5, SPT-Code) for source code have yielded strong results on several tasks in the past few years, including code generation and translation. These models have adopted varying pre-training objectives to learn statistics of code construction from very large-scale corpora in a self-supervised fashion; the success of pre-trained models largely hinges on these pre-training objectives. This paper proposes a new pre-training objective, "Naturalizing" of source code, exploiting code's bimodal, dual-channel (formal & natural channels) nature. Unlike natural language, code's bimodal, dual-channel nature allows us to generate semantically equivalent code at scale. We introduce six classes of semantic-preserving transformations to introduce un-natural forms of code, and then force our model to produce the more natural original programs written by developers. Learning to generate equivalent, but more natural code, at scale, over large corpora of open-source code, without explicit manual supervision, helps the model learn to both ingest & generate code. We fine-tune our model on three generative Software Engineering tasks: code generation, code translation, and code refinement with limited human-curated labeled data and achieve state-of-the-art performance rivaling CodeT5. We show that our pre-trained model is especially competitive at zero-shot and few-shot learning, and better at learning code properties (e.g., syntax, data flow).
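To make the idea of a semantic-preserving transformation concrete, here is a minimal sketch in Python. It is not the NatGen implementation: the abstract does not list its six transformation classes, so the comparison-operand swap below, and the names `SwapComparison` and `denaturalize`, are hypothetical illustrations. It produces an "un-natural" but equivalent variant of a snippet, which is the kind of input a naturalizing model would be trained to map back to the original.

```python
import ast


class SwapComparison(ast.NodeTransformer):
    """Rewrite `a < b` as `b > a` (and the analogous cases).

    Assumption: operands are side-effect free and use ordinary ordering,
    so swapping them does not change the program's behaviour.
    """

    _mirror = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}

    def visit_Compare(self, node):
        self.generic_visit(node)
        # Only handle simple binary comparisons, not chains such as a < b < c.
        if len(node.ops) == 1 and type(node.ops[0]) in self._mirror:
            mirrored = self._mirror[type(node.ops[0])]()
            swapped = ast.Compare(
                left=node.comparators[0], ops=[mirrored], comparators=[node.left]
            )
            return ast.copy_location(swapped, node)
        return node


def denaturalize(source: str) -> str:
    """Return an equivalent but less natural-looking version of `source`."""
    tree = SwapComparison().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


original = "def clamp(x, lo):\n    if x < lo:\n        return lo\n    return x"
unnatural = denaturalize(original)
print(unnatural)
```

A training pair would then be (`unnatural`, `original`): the model reads the transformed code and is asked to generate the developer-written form.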

preprint, 2022, arXiv

Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

Understanding the functional (dis)-similarity of source code is significant for code modeling tasks such as software vulnerability and code clone detection. We present DISCO (DIS-similarity of COde), a novel self-supervised model focusing on identifying (dis)similar functionalities of source code. Unlike existing work, our approach does not require a huge amount of randomly collected data. Rather, we design structure-guided code transformation algorithms to generate synthetic code clones and inject real-world security bugs, augmenting the collected datasets in a targeted way. We propose to pre-train the Transformer model with such automatically generated program contrasts to better identify similar code in the wild and differentiate vulnerable programs from benign ones. To better capture the structural features of source code, we propose a new cloze objective to encode the local tree-based context (e.g., parents or sibling nodes). We pre-train our model with a much smaller dataset, the size of which is only 5% of the state-of-the-art models' training datasets, to illustrate the effectiveness of our data augmentation and the pre-training approach. The evaluation shows that, even with much less data, DISCO can still outperform the state-of-the-art models in vulnerability and code clone detection tasks.

preprint, 2020, arXiv

A Transformer-based Approach for Source Code Summarization

Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representation for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies. In this work, we show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., the absolute encoding of source code tokens' positions hinders, while relative encoding significantly improves, the summarization performance. We have made our code publicly available to facilitate future research.
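The contrast between absolute and relative position encoding can be illustrated with a small sketch. The abstract does not say which relative scheme is used, so the clipped pairwise-offset indexing below is an assumption, and `relative_position_ids` and `max_distance` are hypothetical names. The point it shows is that a relative scheme indexes a learned embedding by the offset `j - i` between two tokens, so the same pair of tokens gets the same position signal wherever it appears in the code.

```python
def relative_position_ids(length: int, max_distance: int) -> list[list[int]]:
    """Return ids[i][j] = clip(j - i, -k, k) + k with k = max_distance.

    Each id selects one of 2k + 1 relative-position embeddings that a
    self-attention layer could add when token i attends to token j.
    """
    k = max_distance
    return [
        [max(-k, min(k, j - i)) + k for j in range(length)]
        for i in range(length)
    ]


ids = relative_position_ids(4, 2)
for row in ids:
    print(row)
# → [2, 3, 4, 4]
# → [1, 2, 3, 4]
# → [0, 1, 2, 3]
# → [0, 0, 1, 2]
```

Every diagonal of the matrix carries a single id, which is the shift invariance that an absolute encoding (one id per token index) does not have.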

preprint, 2020, arXiv

CODIT: Code Editing with Tree-Based Neural Models

The way developers edit day-to-day code tends to be repetitive, often using existing code elements. Many researchers have tried to automate repetitive code changes by learning from specific change templates, which are applied in a limited scope. The advancement of deep neural networks and the availability of vast open-source evolutionary data open up the possibility of automatically learning those templates from the wild. However, deep neural network based modeling for code changes, and for code in general, introduces some specific problems that need specific attention from the research community. For instance, compared to natural language, source code vocabulary can be significantly larger. Further, good changes in code do not break its syntactic structure. Thus, deploying state-of-the-art neural network models without adapting the methods to the source code domain yields sub-optimal results. To this end, we propose a novel tree-based neural network system to model source code changes and learn code change patterns from the wild. Specifically, we propose a tree-based neural machine translation model to learn the probability distribution of changes in code. We realize our model with a change suggestion engine, CODIT, train the model with more than 24k real-world changes, and evaluate it on 5k patches. Our evaluation shows the effectiveness of CODIT in learning and suggesting patches. CODIT can also learn specific bug-fix patterns from bug-fixing patches and can fix 25 out of 80 bugs in Defects4J.

preprint, 2020, arXiv

Deep Learning based Vulnerability Detection: Are We There Yet?

Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques suffer from either high false positives or high false negatives. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL for automated vulnerability detection. Several recent studies have demonstrated promising results, achieving an accuracy of up to 95% at detecting vulnerabilities. In this paper, we ask, "how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?". To our surprise, we find that their performance drops by more than 50%. A systematic investigation of what causes such a precipitous performance drop reveals that existing DL-based vulnerability prediction approaches suffer from challenges with the training data (e.g., data duplication, unrealistic distribution of vulnerable classes, etc.) and with the model choices (e.g., simple token-based models). As a result, these approaches often do not learn features related to the actual cause of the vulnerabilities. Instead, they learn unrelated artifacts from the dataset (e.g., specific variable/function names, etc.). Leveraging these empirical findings, we demonstrate how a more principled approach to data collection and model design, based on realistic settings of vulnerability prediction, can lead to better solutions. The resulting tools perform significantly better than the studied baseline: up to 33.57% boost in precision and 128.38% boost in recall compared to the best performing model in the literature. Overall, this paper elucidates existing DL-based vulnerability prediction systems' potential issues and draws a roadmap for future DL-based vulnerability prediction research. In that spirit, we make available all the artifacts supporting our results: https://git.io/Jf6IA.

preprint, 2020, arXiv

Relaxation in a Phase-separating Two-dimensional Active Matter System with Alignment Interaction

Via computer simulations we study the kinetics of pattern formation in a two-dimensional active matter system. Self-propulsion in our model is incorporated via Vicsek-like activity, i.e., particles tend to align their velocities with the average directions of motion of their neighbors. In addition to this dynamic or active interaction, there exists a passive inter-particle interaction in the model, for which we have chosen the standard Lennard-Jones form. Following quenches of homogeneous configurations to a point deep inside the region of coexistence between high and low density phases, as the systems exhibit formation and evolution of particle-rich clusters, we investigate properties related to the morphology, growth and aging. A focus of our study is on the understanding of the effects of structure on growth and aging. To quantify the latter we use the two-time order-parameter autocorrelation function. This correlation, as well as the growth, is observed to follow power-law time dependence, qualitatively similar to the scaling behavior reported for passive systems. The values of the exponents have been estimated and discussed by comparing with the previously obtained numbers for other dimensions as well as with the new results for the passive limit of the considered model. We have also presented results on the effects of temperature on the activity-mediated phase separation.
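The Vicsek-like alignment rule mentioned in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the model of the paper: it omits the Lennard-Jones interaction, thermal noise, periodic boundaries and the actual equations of motion, and the function name `align_directions` and the interaction `radius` are assumptions made for the example. It only shows the core step, in which each particle adopts the average direction of motion of the particles within its neighbourhood (itself included).

```python
import math


def align_directions(positions, angles, radius):
    """Return new heading angles after one Vicsek-style alignment step.

    positions: list of (x, y) tuples; angles: headings in radians.
    Each particle takes the direction of the summed unit velocity vectors
    of all particles within `radius` of it, including itself.
    """
    new_angles = []
    for xi, yi in positions:
        sum_x = sum_y = 0.0
        for (xj, yj), theta in zip(positions, angles):
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
                sum_x += math.cos(theta)
                sum_y += math.sin(theta)
        new_angles.append(math.atan2(sum_y, sum_x))
    return new_angles


# Two nearby particles moving along +x and +y, and one isolated particle.
positions = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
angles = [0.0, math.pi / 2, math.pi]
aligned = align_directions(positions, angles, radius=1.0)
print(aligned)
```

In this example the two neighbouring particles end up sharing the intermediate heading, while the isolated particle keeps its own direction.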

preprint, 2019, arXiv

Initial Correlation Dependence of Aging in Phase Separating Solid Binary Mixtures and Ordering Ferromagnets

Following quenches of initial configurations having long-range spatial correlations, prepared at the demixing critical point, to points inside the miscibility gap, we study aging phenomena in solid binary mixtures. Results on the decay of the two-time order-parameter autocorrelation functions, obtained from Monte Carlo simulations of the two-dimensional Ising model with Kawasaki exchange kinetics, are analyzed via state-of-the-art methods. The outcome is compared with that obtained for the ordering in uniaxial ferromagnets. For the latter, we have performed Monte Carlo simulations of the same model using the Glauber mechanism. For both types of systems we provide a comparative discussion of our results with reference to those concerning quenches with configurations having no spatial correlation. We also discuss the role of structure in the decay of these correlations.
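The difference between the two kinetics named in this abstract is that Kawasaki exchange swaps a pair of unlike neighbouring spins, so the composition of the mixture is conserved, whereas Glauber dynamics flips single spins. The sketch below shows one Kawasaki sweep on a periodic square lattice. It is an illustration under stated assumptions: the Metropolis acceptance rule, coupling J = 1, and the name `kawasaki_sweep` are choices made here and are not taken from the paper.

```python
import math
import random


def kawasaki_sweep(spins, temperature, rng):
    """Attempt size*size nearest-neighbour exchanges on a periodic lattice.

    spins: square list of lists with entries +1 / -1, modified in place.
    Acceptance uses the Metropolis rule (an assumption of this sketch).
    """
    size = len(spins)

    def local_energy(i, j):
        # Energy of the four bonds attached to site (i, j), with J = 1.
        return -spins[i][j] * (
            spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
            + spins[i][(j + 1) % size] + spins[i][(j - 1) % size]
        )

    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        di, dj = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        k, l = (i + di) % size, (j + dj) % size
        if spins[i][j] == spins[k][l]:
            continue  # exchanging equal spins changes nothing
        before = local_energy(i, j) + local_energy(k, l)
        spins[i][j], spins[k][l] = spins[k][l], spins[i][j]
        delta = local_energy(i, j) + local_energy(k, l) - before
        if delta > 0 and rng.random() >= math.exp(-delta / temperature):
            # Rejected: undo the exchange.
            spins[i][j], spins[k][l] = spins[k][l], spins[i][j]


rng = random.Random(1)
lattice = [[rng.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
magnetization_before = sum(map(sum, lattice))
for _ in range(20):
    kawasaki_sweep(lattice, temperature=1.0, rng=rng)
magnetization_after = sum(map(sum, lattice))
```

The total magnetization (the composition of the binary mixture) is the same before and after the sweeps, which is the conservation law that distinguishes phase separation from ferromagnetic ordering.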

preprint, 2016, arXiv

Fractality in Persistence Decay and Domain Growth during Ferromagnetic Ordering: Dependence upon initial correlation

The dynamics of ordering in the Ising model, following a quench to zero temperature, have been studied via Glauber spin-flip Monte Carlo simulations in space dimensions $d=2$ and $3$. One of the primary objectives has been to understand phenomena associated with the persistent spins, viz., time decay in the number of unaffected spins, growth of the corresponding pattern and its fractal dimensionality, for varying correlation length in the initial configurations, prepared at different temperatures, at and above the critical value. It is observed that the fractal dimensionality and the exponent describing the power-law decay of the persistence probability are strongly dependent upon the relative values of the nonequilibrium domain size and the initial equilibrium correlation length. Via appropriate scaling analyses, these quantities have been estimated for quenches from infinite and critical temperatures. The above-mentioned dependence is observed to be less pronounced in the higher dimension. In addition to these findings for the local persistence, we present results for the global persistence as well. Further, important observations on the standard domain growth problem are reported. For the latter, a controversy in $d=3$, related to the value of the exponent for the power-law growth of the average domain size with time, has been resolved.

preprint, 2015, arXiv

Persistence in Ferromagnetic Ordering: Dependence upon initial configuration

We study the dynamics of ordering in ferromagnets via Monte Carlo simulations of the Ising model, employing the Glauber spin-flip mechanism, in space dimensions $d=2$ and $3$. Results for the persistence probability and the domain growth are discussed for quenches to various temperatures ($T_f$) below the critical one ($T_{c}$), from different initial temperatures $T_{i} \geq T_{c}$. In the long-time limit, for $T_{i} > T_{c}$, the persistence probability exhibits power-law decay with exponents $θ\simeq 0.22$ and $\simeq 0.18$ in $d=2$ and $3$, respectively. For finite $T_i$, the early-time behavior is a different power law whose lifetime diverges and whose exponent decreases as $T_{i} \rightarrow T_{c}$. The crossover length between the two steps diverges as the equilibrium correlation length. $T_i=T_c$ is expected to provide a new universality class, for which we obtain $θ\simeq 0.035$ in $d=2$ and $\simeq 0.10$ in $d=3$. The time dependence of the average domain size $\ell$, however, is observed to be rather insensitive to the choice of $T_i$.
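The persistence probability discussed here is the fraction of spins that have never flipped since the quench. A minimal sketch of how it can be measured follows. It covers only one special case, a quench from $T_i = \infty$ (random initial spins) to $T_f = 0$ in $d=2$, with coupling J = 1 and periodic boundaries; the function name `persistence_after_quench` and its parameters are hypothetical, and the lattice sizes used in the paper are far larger than in this toy run.

```python
import random


def persistence_after_quench(size, sweeps, seed=0):
    """Zero-temperature Glauber dynamics on a size x size periodic lattice.

    Returns the fraction of never-flipped spins after each Monte Carlo sweep.
    """
    rng = random.Random(seed)
    # Random initial state corresponds to an infinite initial temperature.
    spins = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
    never_flipped = [[True] * size for _ in range(size)]
    history = []
    for _ in range(sweeps):
        for _ in range(size * size):
            i, j = rng.randrange(size), rng.randrange(size)
            neighbours = (
                spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                + spins[i][(j + 1) % size] + spins[i][(j - 1) % size]
            )
            delta_e = 2 * spins[i][j] * neighbours  # energy cost of flipping
            # T = 0 Glauber rule: always accept if the energy drops,
            # accept with probability 1/2 if it is unchanged, else reject.
            if delta_e < 0 or (delta_e == 0 and rng.random() < 0.5):
                spins[i][j] = -spins[i][j]
                never_flipped[i][j] = False
        survivors = sum(row.count(True) for row in never_flipped)
        history.append(survivors / size ** 2)
    return history


history = persistence_after_quench(size=16, sweeps=10)
print(history)
```

Plotting `history` against the sweep number on log-log axes for a much larger lattice is how a power-law decay exponent such as $θ$ would be estimated; the small system here is only meant to show the bookkeeping.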

preprint, 1999, arXiv

Analysis of radial segregation of granular mixtures in a rotating drum

This paper considers the segregation of a granular mixture in a rotating drum. Extending a recent kinematic model for grain transport on sandpile surfaces to the case of rotating drums, an analysis is presented for radial segregation in the rolling regime, where a thin layer is avalanching down while the rest of the material follows rigid body rotation. We argue that segregation is driven not just by differences in the angle of repose of the species, as has been assumed in earlier investigations, but also by differences in the size and surface properties of the grains. The cases of grains differing only in size (slightly or widely) and only in surface properties are considered, and the predictions are in qualitative agreement with observations. The model yields results inconsistent with the assumptions for more general cases, and we speculate on how this may be corrected.
