Researcher profile

Jesús Sánchez Cuadrado

Jesús Sánchez Cuadrado contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
6topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models

The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the linguistic properties of hidden vector spaces, previous works have shown that these pre-trained language models encode simple linguistic properties in their hidden representations. However, none of the previous work assessed whether these models encode the whole grammatical structure of a programming language. In this paper, we prove the existence of a syntactic subspace, lying in the hidden representations of pre-trained language models, which contain the syntactic information of the programming language. We show that this subspace can be extracted from the models' representations and define a novel probing method, the AST-Probe, that enables recovering the whole abstract syntax tree (AST) of an input code snippet. In our experimentations, we show that this syntactic subspace exists in five state-of-the-art pre-trained language models. In addition, we highlight that the middle layers of the models are the ones that encode most of the AST information. Finally, we estimate the optimal size of this syntactic subspace and show that its dimension is substantially lower than those of the models' representation spaces. This suggests that pre-trained language models use a small part of their representation spaces to encode syntactic information of the programming languages.

preprint2020arXiv

MAR: A structure-based search engine for models

The availability of shared software models provides opportunities for reusing, adapting and learning from them. Public models are typically stored in a variety of locations, including model repositories, regular source code repositories, web pages, etc. To profit from them developers need effective search mechanisms to locate the models relevant for their tasks. However, to date, there has been little success in creating a generic and efficient search engine specially tailored to the modelling domain. In this paper we present MAR, a search engine for models. MAR is generic in the sense that it can index any type of model if its meta-model is known. MAR uses a query-by-example approach, that is, it uses example models as queries. The search takes the model structure into account using the notion of bag of paths, which encodes the structure of a model using paths between model elements and is a representation amenable for indexing. MAR is built over HBase using a specific design to deal with large repositories. Our benchmarks show that the engine is efficient and has fast response times in most cases. We have also evaluated the precision of the search engine by creating model mutants which simulate user queries. A REST API is available to perform queries and an Eclipse plug-in allows end users to connect to the search engine from model editors. We have currently indexed more than 50.000 models of different kinds, including Ecore meta-models, BPMN diagrams and UML models. MAR is available at http://mar-search.org.