Researcher profile

Vince Grolmusz

Vince Grolmusz contributes to research discovery and scholarly infrastructure.

Published work

24 published items

preprint, 2022, arXiv

Succinct Amyloid and Non-Amyloid Patterns in Hexapeptides

Hexapeptides are widely applied as a model system for studying the amyloid-forming properties of polypeptides, including proteins. Recently, large experimental databases have become publicly available with amyloidogenic labels. Using these datasets for training and testing purposes, one may build artificial intelligence (AI)-based classifiers for predicting the amyloid state of peptides. In our previous work (Biomolecules, 11(4) 500, (2021)) we described the Support Vector Machine (SVM)-based Budapest Amyloid Predictor (https://pitgroup.org/bap). Here we apply the Budapest Amyloid Predictor to discover numerous amyloidogenic and non-amyloidogenic hexapeptide patterns with accuracy between 80% and 84%, as surprising and succinct novel rules for further understanding the amyloid state of peptides. For example, we have shown that for any independently mutated residue (position marked by "x"), the patterns CxFLWx, FxFLFx, and xxIVIV are predicted to be amyloidogenic, while the patterns PxDxxx, xxKxEx, and xxPQxx are predicted to be non-amyloidogenic. We note that each amyloidogenic pattern with two x's (e.g., CxFLWx) succinctly describes $20^2=400$ hexapeptides, while each non-amyloidogenic pattern with four x's (e.g., PxDxxx) describes $20^4=160,000$ hexapeptides. To our knowledge, no similar applications of artificial intelligence tools or succinct amyloid patterns were described before the present work.
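The counting argument in this abstract can be checked with a short sketch. It assumes the 20 standard amino acids in one-letter code and treats a lowercase "x" as a free position; the function names are hypothetical and not part of the Budapest Amyloid Predictor.

```python
from itertools import product

# The 20 standard residues, one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def count_matches(pattern: str) -> int:
    """Number of peptides a pattern describes; each 'x' is a free position."""
    return len(AMINO_ACIDS) ** pattern.count("x")

def expand(pattern: str):
    """Yield every concrete peptide that matches the pattern."""
    free = [i for i, c in enumerate(pattern) if c == "x"]
    for combo in product(AMINO_ACIDS, repeat=len(free)):
        peptide = list(pattern)
        for i, residue in zip(free, combo):
            peptide[i] = residue
        yield "".join(peptide)

print(count_matches("CxFLWx"))  # two free positions
print(count_matches("PxDxxx"))  # four free positions
```

Counting is done arithmetically, so `expand` is only needed when the concrete peptides themselves are wanted.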

preprint, 2020, arXiv

On the Border of the Amyloidogenic Sequences: Prefix Analysis of the Parallel Beta Sheets in the PDB_Amyloid Collection

The Protein Data Bank (PDB) today contains more than 153,000 entries with the 3-dimensional structures of biological macromolecules. Using the rich resources of this repository, it is possible to identify subsets with specific, interesting properties for different applications. Our research group prepared an automatically updated list of amyloid and probably amyloidogenic molecules, the PDB_Amyloid collection, which is freely available at http://pitgroup.org/amyloid. This resource relies exclusively on the geometric properties of the steric structures for identifying amyloids. In the present contribution, we analyze the starting (i.e., prefix) subsequences of the characteristic, parallel beta-sheets of the structures in the PDB_Amyloid collection, and identify further appearances of these length-5 prefix subsequences in the whole PDB data set. In this way, simply by searching for these prefixes, we have identified numerous proteins whose normal or irregular functions involve amyloid formation, structural misfolding, or anti-coagulant properties, including the T-cell receptor (TCR), bound with the major histocompatibility complexes MHC-1 and MHC-2; the p53 tumor suppressor protein; a mycobacterial RNA polymerase transcription initiation complex; the human bridging integrator protein BIN-1; and the tick anti-coagulant peptide TAP.

preprint, 2020, arXiv

The braingraph.org Database with more than 1000 Robust Human Structural Connectomes in Five Resolutions

The human brain is the most complex object of study we encounter today. Mapping the neuronal-level connections between the more than 80 billion neurons in the brain is a hopeless task for science. With recent advances in magnetic resonance imaging (MRI), we are able to map the macroscopic connections between about 1000 brain areas. The MRI data acquisition and the subsequent algorithmic workflow contain several complex steps, where errors can occur. In the present contribution, we describe and publish 1064 human connectomes, computed from the public release of the Human Connectome Project. Each connectome is available in 5 resolutions, with 83, 129, 234, 463, and 1015 anatomically labeled nodes. For error correction, we follow an averaging and extreme value deleting strategy for each edge and for each connectome. The resulting 5320 braingraphs can be downloaded from the https://braingraph.org site. This dataset makes these graphs accessible to scientists unfamiliar with neuroimaging- and connectome-related tools: mathematicians, physicists, and engineers can use their expertise and ideas in the analysis of the connections of the human brain. Brain scientists also have a robust and large, multi-resolution set for connectomical studies.

preprint, 2020, arXiv

The Graph of Our Mind

In the last two decades, graph theory has penetrated sociology, molecular biology, genetics, chemistry, computer engineering, and numerous other fields of science. One of the more recent areas of its applications is the study of the connections of the human brain. With the development of diffusion magnetic resonance imaging (diffusion MRI), it is possible today to map the connections between the 1-1.5 cm$^2$ regions of the gray matter of the human brain. These connections can be viewed as a graph: the vertices are the anatomically identified regions of the gray matter, and two vertices are connected by an edge if the diffusion MRI-based workflow finds neuronal fiber tracts between these areas. This way we can compute 1015-vertex graphs with tens of thousands of edges. In a previous work, we have analyzed the male and female braingraphs graph-theoretically, and we have found statistically significant differences in numerous parameters between the sexes: the female braingraphs are better expanders, have more edges, larger bipartition widths, and larger vertex cover than the braingraphs of the male subjects. Our previous study used the data of 96 subjects; here we present a much larger study of 426 subjects. Our data source is the public data release of an NIH-funded project, the "Human Connectome Project (HCP)". As a service to the community, we have also made all of the braingraphs computed by us from the HCP data publicly available at http://braingraph.org for independent validation and further investigations.

preprint, 2019, arXiv

The Frequent Complete Subgraphs in the Human Connectome

While it is still not possible to describe the neural-level connections of the human brain, we can map the human connectome with several hundred vertices by applying diffusion MRI-based techniques. In these graphs, the nodes correspond to anatomically identified gray matter areas of the brain, while the edges correspond to the axonal fibers connecting these areas. In our previous contributions, we have described numerous graph-theoretical phenomena of the human connectomes. Here we map the frequent complete subgraphs of the human brain networks: in these subgraphs, every pair of vertices is connected by an edge. We also examine sex differences in the results. The mapping of the frequent subgraphs gives robust substructures in the graph: if a subgraph is present in 80% of the graphs, then it is most probably not an artifact of the measurement or the data processing workflow. We list here the frequent complete subgraphs of the human braingraphs of 414 subjects, each with 463 nodes, with a frequency threshold of 80%, and identify 812 complete subgraphs that are more frequent in male connectomes and 224 complete subgraphs that are more frequent in female connectomes.
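The notion of a frequent complete subgraph can be illustrated with a brute-force sketch. It is not the authors' mining procedure: it simply tests every vertex subset of a given size, which is only feasible for toy inputs. Each graph is assumed to be a set of undirected edges stored as `frozenset` pairs, and the function name is hypothetical.

```python
from itertools import combinations

def frequent_cliques(graphs, size, threshold=0.8):
    """Return vertex tuples that form a complete subgraph in at least
    `threshold` of the graphs (all graphs share one vertex labeling)."""
    vertices = sorted({v for g in graphs for e in g for v in e})
    result = []
    for subset in combinations(vertices, size):
        pairs = [frozenset(p) for p in combinations(subset, 2)]
        # A clique is present in a graph only if all of its edges are.
        support = sum(all(p in g for p in pairs) for g in graphs)
        if support / len(graphs) >= threshold:
            result.append(subset)
    return result

# Toy data: triangle A-B-C occurs in 4 of 5 graphs, edge C-D in all 5.
triangle = {frozenset("AB"), frozenset("BC"), frozenset("AC")}
cd = frozenset("CD")
graphs = [triangle | {cd} for _ in range(4)] + [{cd}]
print(frequent_cliques(graphs, 3))
```

Note that a subgraph counts as present only when all of its edges occur in the same individual graph; edges that are each frequent on their own do not guarantee a frequent clique.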

preprint, 2016, arXiv

High-Resolution Directed Human Connectomes and the Consensus Connectome Dynamics

Here we show a method of directing the edges of the connectomes, prepared from diffusion tensor imaging (DTI) datasets from the human brain. Before the present work, no high-definition directed braingraphs (or connectomes) were published, because the tractography methods in use are not capable of assigning directions to the neural tracts discovered. Previous work on the functional connectomes applied low-resolution functional MRI-detected statistical causality for the assignment of directions of connectomes of typically several dozens of vertices. Our method is based on the phenomenon of the "Consensus Connectome Dynamics" (CCD), described earlier by our research group. In this contribution, we apply the method to 423 braingraphs, each with 1015 vertices, computed from the public release of the Human Connectome Project, and we also made the directed connectomes publicly available at the site http://braingraph.org. We also show the robustness of our edge directing method in four independently chosen connectome datasets: we have found that 86% of the edges that were present in all four datasets get the very same directions in all datasets; therefore the direction method is robust: it does not depend on the particular choice of the dataset. We think that our present contribution opens up new possibilities in the analysis of the high-definition human connectome: from now on we can work with a robust assignment of directions of the connections of the human brain.

preprint, 2016, arXiv

How to Direct the Edges of the Connectomes: Dynamics of the Consensus Connectomes and the Development of the Connections in the Human Brain

The human connectome is the object of intensive research today. In these graphs, the vertices correspond to the small areas of the gray matter, and two vertices are connected by an edge if a diffusion MRI-based workflow finds connections between those areas. One main question of the field is discovering the directions of the edges. In a previous work we have reported the construction of the Budapest Reference Connectome Server http://connectome.pitgroup.org from the data recorded in the Human Connectome Project of the NIH. After the server had been published, we recognized a surprising and unforeseen property of it: The server can generate the braingraph of connections that are present in at least $k$ graphs out of the 418, for any value of $k=1,2,...,418$. When the value of $k$ is changed from $k=418$ through 1 by moving a slider at the webserver from right to left, more and more edges appear in the consensus graph. The astonishing observation is that the appearance of the new edges is not random: it is similar to a growing tree. We hypothesize that this movement of the slider in the webserver may copy the development of the connections in the human brain in the following sense: the connections that are present in all subjects are the oldest ones, and those that are present in a decreasing fraction of subjects are gradually the newer connections in the individual brain development. An animation of the phenomenon is available at https://youtu.be/EnWwIf_HNjw. Based on this hypothesis, we can assign directions to the edges of the connectome as follows: Let $G_i$ denote the consensus connectome where each edge is present in at least $i$ graphs. Suppose that vertex $v$ is isolated in $G_{k+1}$, and becomes connected to a vertex $u$ in $G_k$, where $u$ was connected to other vertices already in $G_{k+1}$. Then we direct this $(v,u)$ edge from $v$ to $u$.
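The direction rule at the end of this abstract can be sketched in a few lines. The sketch assumes that each individual connectome is given as a set of undirected edges (`frozenset` pairs, no self-loops) over a shared vertex labeling; the function names are hypothetical and the code is an illustration of the stated rule, not the server's implementation.

```python
from collections import Counter

def consensus_edges(graphs, k):
    """Edges present in at least k of the graphs (the graph G_k)."""
    counts = Counter(e for g in graphs for e in g)
    return {e for e, c in counts.items() if c >= k}

def direct_edges(graphs):
    """Return a set of (v, u) pairs, meaning the edge is directed v -> u."""
    n = len(graphs)
    directed = set()
    prev = consensus_edges(graphs, n)  # G_n
    for k in range(n - 1, 0, -1):
        cur = consensus_edges(graphs, k)  # G_k
        # Vertices that already had at least one edge in G_{k+1}.
        connected_before = {v for e in prev for v in e}
        for e in cur - prev:
            a, b = tuple(e)
            if a not in connected_before and b in connected_before:
                directed.add((a, b))
            elif b not in connected_before and a in connected_before:
                directed.add((b, a))
            # Otherwise the rule does not apply; the edge stays undirected.
        prev = cur
    return directed

def edge(a, b):
    return frozenset((a, b))

graphs = [
    {edge("A", "B"), edge("B", "C")},
    {edge("A", "B"), edge("B", "C"), edge("C", "D")},
    {edge("A", "B"), edge("C", "D")},
]
print(direct_edges(graphs))
```

In the toy data, A-B is in all three graphs, so when B-C appears at k=2 the previously isolated C is directed toward B. The edge C-D appears at the same step between two vertices that were both isolated in G_3, so this sketch leaves it undirected.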

preprint, 2016, arXiv

Human Sexual Dimorphism of the Relative Cerebral Area Volumes in the Data of the Human Connectome Project

The average human brain volume of the males is larger than that of the females. Several MRI voxel-based morphometry studies show that the gray matter/white matter ratio is larger in females. Here we have analyzed the recent public release of the Human Connectome Project, and by using the diffusion MRI data of 511 subjects (209 men and 302 women), we have found that the relative volumes of numerous subcortical areas and the gray matter of most cortical areas are significantly larger in women than in men. Additionally, we have discovered differences of the strengths of the sexual correlations between the same structures in different hemispheres.

preprint, 2016, arXiv

Mapping Correlations of Psychological and Connectomical Properties of the Dataset of the Human Connectome Project with the Maximum Spanning Tree Method

We analyzed correlations between more than 700 psychological, anatomical, and connectome properties originating from the Human Connectome Project's (HCP) 500-subject dataset. Apart from numerous natural correlations, which describe parameters computable or approximable from one another, we have discovered numerous significant correlations in the dataset that were never described before. We have also found correlations described very recently, independently of the HCP dataset: e.g., between gambling behavior and the number of the connections leaving the insula.

preprint, 2016, arXiv

Parameterizable Consensus Connectomes from the Human Connectome Project: The Budapest Reference Connectome Server v3.0

Connections of the living human brain, on a macroscopic scale, can be mapped by a diffusion MR imaging-based workflow. Since the same anatomic regions can be matched across distinct brains, one can compare the presence or the absence of the edges connecting the very same two anatomic regions among multiple cortices. Previously, we have constructed the consensus braingraphs on 1015 vertices first in five, then in 96 subjects in the Budapest Reference Connectome Server v1.0 and v2.0, respectively. Here we report the construction of version 3.0 of the server, generating the common edges of the connectomes of variously parameterizable subsets of the 1015-vertex connectomes of 477 subjects of the Human Connectome Project's 500-subject release. The consensus connectomes are downloadable in csv and GraphML formats, and they are also visualized on the server's page. The consensus connectomes of the server can be considered as the "average, healthy" human connectome since all of their connections are present in at least $k$ subjects, where the default value is $k=209$, which can be modified freely at the web server. The webserver is available at http://connectome.pitgroup.org.

preprint, 2016, arXiv

The braingraph.org Database of High Resolution Structural Connectomes and the Brain Graph Tools

Based on the data of the NIH-funded Human Connectome Project, we have computed structural connectomes of 426 human subjects in five different resolutions of 83, 129, 234, 463 and 1015 nodes and several edge weights. The graphs are given in anatomically annotated GraphML format, which facilitates further processing and visualization. For 96 subjects, the anatomically classified sub-graphs can also be accessed, formed from the vertices corresponding to distinct lobes or even smaller regions of interest of the brain. For example, one can easily download and study the connectomes, restricted to the frontal lobes or just to the left precuneus of 96 subjects using the data. Partially directed connectomes of 423 subjects are also available for download. We also present a GitHub-deposited set of tools, called the Brain Graph Tools, for several processing tasks of the connectomes on the site http://braingraph.org.

preprint, 2016, arXiv

The Dorsal Striatum and the Dynamics of the Consensus Connectomes in the Frontal Lobe of the Human Brain

In the applications of graph theory it is unusual that one considers numerous, pairwise different graphs on the very same set of vertices. In the case of human braingraphs or connectomes, however, this is the standard situation: the nodes correspond to anatomically identified cerebral regions, and two vertices are connected by an edge if a diffusion MRI-based workflow identifies a fiber of axons running between the two regions corresponding to the two vertices. Therefore, if we examine the braingraphs of $n$ subjects, then we have $n$ graphs on the very same, anatomically identified vertex set. It is a natural idea to describe the $k$-frequently appearing edges in these graphs: the edges that are present between the same two vertices in at least $k$ out of the $n$ graphs. Based on the NIH-funded large Human Connectome Project's public data release, we have reported the construction of the Budapest Reference Connectome Server http://connectome.pitgroup.org that generates and visualizes these $k$-frequently appearing edges. We call the graphs of the $k$-frequently appearing edges "$k$-consensus connectomes" since an edge is included only if it is present in at least $k$ graphs out of $n$. Considering the whole human brain, we have reported a surprising property of these consensus connectomes earlier. In the present work we focus on the frontal lobe of the brain, and we report a similarly surprising dynamical property of the consensus connectomes when $k$ is gradually changed from $k=n$ to $k=1$: the connections between the nodes of the frontal lobe seemingly emanate from those nodes that were connected to sub-cortical structures of the dorsal striatum: the caudate nucleus and the putamen. We hypothesize that this dynamic behavior copies the axonal fiber development of the frontal lobe.

preprint, 2016, arXiv

The Robustness and the Doubly-Preferential Attachment Simulation of the Consensus Connectome Dynamics of the Human Brain

The increasing quantity and quality of the publicly available human cerebral diffusion MRI data make it possible to study the brain in ways that were unimaginable before. The Consensus Connectome Dynamics (CCD) is a remarkable phenomenon that was discovered by continuously decreasing the minimum confidence-parameter at the graphical interface of the Budapest Reference Connectome Server (http://connectome.pitgroup.org). The Budapest Reference Connectome Server depicts the cerebral connections of $n=418$ subjects with a frequency-parameter $k$: For any $k=1,2,...,n$ one can view the graph of the edges that are present in at least $k$ connectomes. If parameter $k$ is decreased one-by-one from $k=n$ through $k=1$ then more and more edges appear in the graph, since the inclusion condition is relaxed. The surprising observation is that the appearance of the edges is far from random: it resembles a growing, complex structure, like a tree or a shrub (visualized on https://www.youtube.com/watch?v=yxlyudPaVUE). Here we examine the robustness of the CCD phenomenon, and we show that it is almost independent of the particular choice of the set of underlying individual connectomes, yielding the CCD phenomenon. This result shows that the CCD phenomenon is very likely a biological property of the human brain and not just a property of the data sets examined. We also present a simulation that describes the growth of the CCD structure well: in our random graph model a doubly-preferential attachment distribution is found to mimic the CCD: a new edge appears with a probability proportional to the sum of the degrees of the endpoints of the new edge.
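The attachment rule in the last sentence can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' model code: the vertex set is fixed, growth starts from a hypothetical seed edge list (needed because all-zero degrees would give every candidate edge zero weight), and the function name `grow` is invented here.

```python
import random
from itertools import combinations

def grow(n_vertices, seed_edges, n_new_edges, rng):
    """Add edges one at a time; a missing edge (u, v) is chosen with
    probability proportional to degree[u] + degree[v]."""
    edges = {frozenset(e) for e in seed_edges}
    degree = [0] * n_vertices
    for e in edges:
        for v in e:
            degree[v] += 1
    order = []  # edges in the order they appeared
    for _ in range(n_new_edges):
        candidates = [p for p in combinations(range(n_vertices), 2)
                      if frozenset(p) not in edges]
        weights = [degree[u] + degree[v] for u, v in candidates]
        if not candidates or sum(weights) == 0:
            break  # graph is complete, or no candidate has positive weight
        u, v = rng.choices(candidates, weights=weights, k=1)[0]
        edges.add(frozenset((u, v)))
        degree[u] += 1
        degree[v] += 1
        order.append((u, v))
    return order

rng = random.Random(0)
order = grow(6, [(0, 1)], 5, rng)
print(order)
```

Because an edge between two degree-zero vertices has weight zero, every new edge touches the already grown structure, which is one way to read the tree-like or shrub-like growth described above.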

preprint, 2015, arXiv

Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain

The human braingraph, or connectome, is a description of the connections of the brain: the nodes of the graph correspond to small areas of the gray matter, and two nodes are connected by an edge if a diffusion MRI-based workflow finds fibers between those brain areas. We have constructed 1015-vertex graphs from the diffusion MRI brain images of 395 human subjects and compared the individual graphs with respect to several different areas of the brain. The inter-individual variability of the graphs within different brain regions was discovered and described. We have found that the frontal and the limbic lobes are more conservative, while the edges in the temporal and occipital lobes are more diverse. Interestingly, a "hybrid" conservative and diverse distribution was found in the paracentral lobule and the fusiform gyrus. Smaller cortical areas were also evaluated: precentral gyri were found to be more conservative, and the postcentral and the superior temporal gyri to be very diverse.

preprint, 2015, arXiv

Graph Theoretical Analysis Reveals: Women's Brains are Better Connected than Men's

Deep graph-theoretic ideas in the context of the graph of the World Wide Web led to the definition of Google's PageRank and the subsequent rise of the most popular search engine to date. Brain graphs, or connectomes, are being widely explored today. We believe that non-trivial graph-theoretic concepts, as happened in the case of the World Wide Web, will lead to discoveries illuminating the structural and also the functional details of the animal and human brains. When scientists examine large networks of tens or hundreds of millions of vertices, only fast algorithms can be applied because of the size constraints. In the case of diffusion MRI-based structural human brain imaging, the effective vertex number of the connectomes, or brain graphs, derived from the data is on the scale of several hundred today. That size facilitates applying strict mathematical graph algorithms even for some hard-to-compute (or NP-hard) quantities like vertex cover or balanced minimum cut. In the present work we have examined brain graphs, computed from the data of the Human Connectome Project, recorded from male and female subjects between ages 22 and 35. Significant differences were found between the male and female structural brain graphs: we show that the average female connectome has more edges, is a better expander graph, has larger minimal bisection width, and has more spanning trees than the average male connectome. Since the average female brain weighs less than the brain of males, these properties show that the female brain is more "well-connected" or perhaps, more "efficient" in a sense than the brain of males.

preprint, 2015, arXiv

Life without dUTPase

Fine-tuned regulation of the cellular nucleotide pools is indispensable for faithful replication of DNA. The genetic information is also safeguarded by DNA damage recognition and repair processes. Uracil is one of the most frequently occurring erroneous bases in DNA; it can arise from cytosine deamination or thymine-replacing incorporation. Two enzyme families are primarily involved in keeping DNA uracil-free: dUTPases that prevent thymine-replacing incorporation and uracil-DNA glycosylases that excise uracil from DNA and initiate uracil-excision repair. Both dUTPase and the most efficient uracil-DNA glycosylase UNG are thought to be ubiquitous in free-living organisms. In the present work, we have systematically investigated the genotype of deposited fully sequenced bacterial and archaeal genomes. Surprisingly, we have found that in contrast to the generally held opinion, a large number of bacterial and archaeal species lack the dUTPase gene(s). The dut- genotype is present in diverse bacterial phyla, indicating that loss of this (or these) gene(s) has occurred multiple times during evolution. We have identified several survival strategies in the absence of dUTPases: i) simultaneous lack or inhibition of UNG, ii) acquisition of a less dUTP-specific sanitizing nucleotide pyrophosphatase, and iii) supply of dUTPase from bacteriophages. Our data indicate that several unicellular microorganisms may efficiently cope with a dut- genotype, potentially leading to an unusual uracil-enrichment in their genomic DNA.

preprint, 2015, arXiv

Nucleotide 9-mers Characterize the Type II Diabetic Gut Metagenome

Discoveries of new biomarkers for frequently occurring diseases are of special importance in today's medicine. While fully developed type II diabetes (T2D) can be detected easily, the early identification of high-risk individuals is an area of interest in T2D, too. Metagenomic analysis of the human bacterial flora has shown subtle changes in diabetic patients, but no specific microbes are known to cause or promote the disease. Moderate changes were also detected in the microbial gene composition of the metagenomes of diabetic patients, but again, no specific gene was found that is present in disease-related and missing in healthy metagenomes. However, these fine differences in microbial taxon and gene composition are difficult to apply as quantitative biomarkers for diagnosing or predicting type II diabetes. In the present work we report some nucleotide 9-mers with significantly differing frequencies in diabetic and healthy intestinal flora. To our knowledge, it is the first time such short DNA fragments have been associated with T2D. The automated, quantitative analysis of the frequencies of short nucleotide sequences seems to be more feasible than accurate phylogenetic and functional analysis, and thus it might be a promising direction of diagnostic research.
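Computing 9-mer frequencies from sequencing reads is simple to sketch. The abstract does not specify the counting conventions or the statistical test used, so the choices below (overlapping windows, skipping windows with ambiguous bases, relative frequencies) are assumptions, and the function name is hypothetical.

```python
from collections import Counter

def kmer_frequencies(reads, k=9):
    """Relative frequency of each k-mer over all reads (overlapping windows).
    Windows containing characters other than A, C, G, T are skipped."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}

freqs = kmer_frequencies(["ACGTACGTACGT"])
print(freqs)
```

Comparing such frequency tables between two groups of samples would additionally need a significance test with multiple-testing correction, which is left out here.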

preprint, 2015, arXiv

The "Giant Virus Finder" Discovers an Abundance of Giant Viruses in the Antarctic Dry Valleys

The first giant virus was identified in 2003 from a biofilm of an industrial water-cooling tower in England. Later, numerous new giant viruses were found in oceans and freshwater habitats, some of them with as many as 2,500 genes. We have demonstrated their very likely presence in four soil samples taken from the Kutch Desert (Gujarat, India). Here we describe a bioinformatics workflow, called the "Giant Virus Finder", that is capable of discovering the very likely presence of the genomes of giant viruses in metagenomic shotgun-sequenced datasets. The new tool is applied to numerous hot and cold desert soil samples as well as some tundra and forest soils. We show that most of these samples contain giant viruses, and particularly many were found in the Antarctic dry valleys. The results imply that giant viruses could be frequent not only in aqueous habitats, but in a wide spectrum of soils on our planet.

preprint, 2015, arXiv

The Advantage is at the Ladies: Brain Size Bias-Compensated Graph-Theoretical Parameters are Also Better in Women's Connectomes

In our previous study we have shown that the female connectomes have significantly better, deep graph-theoretical parameters, related to superior "connectivity", than the connectome of the males. Since the average female brain is smaller than the average male brain, one cannot rule out that the significant advantages are due to the size- and not to the sex-differences in the data. To filter out the possible brain-volume related artifacts, we have chosen 36 small male and 36 large female brains such that all the brains in the female set are larger than all the brains in the male set. For the sets, we have computed the corresponding braingraphs and computed numerous graph-theoretical parameters. We have found that (i) the small male brains lack the better connectivity advantages shown in our previous study for female brains in general; (ii) in numerous parameters, the connectomes computed from the large-brain females, still have the significant, deep connectivity advantages, demonstrated in our previous study.

preprint, 2015, arXiv

The Budapest Reference Connectome Server v2.0

The connectomes of different human brains are pairwise distinct: we cannot talk about an abstract "graph of the brain". Two typical connectomes, however, have quite a few common graph edges that may describe the same connections between the same cortical areas. The Budapest Reference Connectome Server Ver. 2.0 (http://connectome.pitgroup.org) generates the common edges of the connectomes of 96 distinct cortexes, each with 1015 vertices, computed from 96 MRI data sets of the Human Connectome Project. The user may set numerous parameters for the identification and filtering of common edges, and the graphs are downloadable in both csv and GraphML formats; both formats carry the anatomical annotations of the vertices, generated by the Freesurfer program. The resulting consensus graph is also automatically visualized in a 3D rotating brain model on the website. The consensus graphs, generated with various parameter settings, can be used as reference connectomes based on different, independent MRI images, therefore they may serve as reduced-error, low-noise, robust graph representations of the human brain.

preprint, 2014, arXiv

Giant Viruses of the Kutch Desert

The Kutch desert (Great Rann of Kutch, Gujarat, India) is a unique ecosystem: in the larger part of the year it is a hot, salty desert that is flooded regularly in the Indian monsoon season. In the dry season, the crystallized salt deposits form the "white desert" in large regions. The first metagenomic analysis of the soil samples of Kutch was published in 2013, and the data was deposited in the NCBI Sequence Read Archive. The sequences were analyzed at the same time phylogenetically for prokaryotes, especially for bacterial taxa. In the present work, we are searching for the DNA sequences of the recently discovered giant viruses in the soil samples of the Kutch desert. Since most giant viruses were discovered in biofilms in industrial cooling towers, ocean water and freshwater ponds, we were surprised to find their DNA sequences in the soil samples of a seasonally very hot and arid, salty environment.

preprint, 2013, arXiv

An Intuitive Graphical Webserver for Multiple-Choice Protein Sequence Search

Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" has become a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, the query formation on the most frequently used webservers will be difficult. Some servers support the formation of queries with regular expressions, but most of the users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple-choice queries by simply drawing the residues to their positions; if more than one residue is drawn to the same position, then they will be nicely stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes it possible to form queries requiring not just certain amino acids in the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org.
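For comparison, the regular-expression form of a multiple-choice query that the abstract alludes to might look as follows. This is only an illustration of the query type, not the server's implementation; the residue groupings are an assumed, simplified classification, and `query_to_regex` is a hypothetical helper.

```python
import re

# Illustrative residue groups (assumed; classifications vary by source).
GROUPS = {"hydrophobic": "AVLIMFW", "polar": "STNQ"}

def query_to_regex(positions):
    """positions: one string of allowed residues per position;
    an empty string means any residue is accepted there."""
    parts = [f"[{allowed}]" if allowed else "[A-Z]" for allowed in positions]
    return re.compile("".join(parts))

# Hydrophobic residue, then any residue, then a polar residue.
query = query_to_regex([GROUPS["hydrophobic"], "", GROUPS["polar"]])
match = query.search("GGLKSGG")
print(match.group() if match else None)
```

Each position becomes one character class, which is exactly the syntax the graphical interface is meant to spare its users.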

preprint, 2013, arXiv

Dimension reduction of clustering results in bioinformatics

OPTICS is a density-based clustering algorithm that performs well in a wide variety of applications. For a set of input objects, the algorithm creates a so-called reachability plot that can be either used to produce cluster membership assignments, or interpreted itself as an expressive two-dimensional representation of the density-based clustering structure of the input set, even if the input set is embedded in higher dimensions. The main focus of this work is a visualization method that can be used to assign colours to all entries of the input database, based on hierarchically represented a-priori knowledge available for each of these objects. Based on two different, bioinformatics-related applications we illustrate how the proposed method can be efficiently used to identify clusters with proven real-life relevance.

preprint, 2013, arXiv

Fast and Exact Sequence Alignment with the Smith-Waterman Algorithm: The SwissAlign Webserver

It has been demonstrated earlier that the exact Smith-Waterman algorithm yields more accurate results than the members of the heuristic BLAST family of algorithms. Unfortunately, the Smith-Waterman algorithm is much slower than BLAST and its clones. Here we present a technique and a webserver that use the exact Smith-Waterman algorithm and are approximately as fast as the BLAST algorithm. The technique unites earlier methods of extensive preprocessing of the target sequence database and CPU-specific coding of the Smith-Waterman algorithm. The SwissAlign webserver is available at http://swissalign.pitgroup.org.
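For readers unfamiliar with the algorithm, the textbook Smith-Waterman recurrence can be sketched as below. It uses a linear gap penalty and simple match/mismatch scores chosen for illustration; it contains none of the database preprocessing or CPU-specific optimizations the abstract describes, and it returns only the best local alignment score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

The table is filled row by row, so the running time is proportional to the product of the two sequence lengths, which is the cost that heuristic tools such as BLAST avoid.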