Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
34works
0followers
15topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

34 published item(s)

preprint2012arXiv

A Comparative Study on the Performance of Permutation Algorithms

Permutation is the different arrangements that can be made with a given number of things taking some or all of them at a time. The notation P(n,r) is used to denote the number of permutations of n things taken r at a time. Permutation is used in various fields such as mathematics, group theory, statistics, and computing, to solve several combinatorial problems such as the job assignment problem and the traveling salesman problem. In effect, permutation algorithms have been studied and experimented for many years now. Bottom-Up, Lexicography, and Johnson-Trotter are three of the most popular permutation algorithms that emerged during the past decades. In this paper, we are implementing three of the most eminent permutation algorithms, they are respectively: Bottom-Up, Lexicography, and Johnson-Trotter algorithms. The implementation of each algorithm will be carried out using two different approaches: brute-force and divide and conquer. The algorithms codes will be tested using a computer simulation tool to measure and evaluate the execution time between the different implementations.

preprint2012arXiv

A Comparative Study on the Performance of the Top DBMS Systems

Database management systems are today's most reliable mean to organize data into collections that can be searched and updated. However, many DBMS systems are available on the market each having their pros and cons in terms of reliability, usability, security, and performance. This paper presents a comparative study on the performance of the top DBMS systems. They are mainly MS SQL Server 2008, Oracle 11g, IBM DB2, MySQL 5.5, and MS Access 2010. The testing is aimed at executing different SQL queries with different level of complexities over the different five DBMSs under test. This would pave the way to build a head-to-head comparative evaluation that shows the average execution time, memory usage, and CPU utilization of each DBMS after completion of the test.

preprint2012arXiv

A Data Warehouse Design for a Typical University Information System

Presently, large enterprises rely on database systems to manage their data and information. These databases are useful for conducting daily business transactions. However, the tight competition in the marketplace has led to the concept of data mining in which data are analyzed to derive effective business strategies and discover better ways in carrying out business. In order to perform data mining, regular databases must be converted into what so called informational databases also known as data warehouse. This paper presents a design model for building data warehouse for a typical university information system. It is based on transforming an operational database into an informational warehouse useful for decision makers to conduct data analysis, predication, and forecasting. The proposed model is based on four stages of data migration: Data extraction, data cleansing, data transforming, and data indexing and loading. The complete system is implemented under MS Access 2010 and is meant to serve as a repository of data for data mining operations.

preprint2012arXiv

A Generation-based Text Steganography Method using SQL Queries

Cryptography and Steganography are two techniques commonly used to secure and safely transmit digital data. Nevertheless, they do differ in important ways. In fact, cryptography scrambles data so that they become unreadable by eavesdroppers; while, steganography hides the very existence of data so that they can be transferred unnoticed. Basically, steganography is a technique for hiding data such as messages into another form of data such as images. Currently, many types of steganography are in use; however, there is yet no known steganography application for query languages such as SQL. This paper proposes a new steganography method for textual data. It encodes input text messages into SQL carriers made up of SELECT queries. In effect, the output SQL carrier is dynamically generated out of the input message using a dictionary of words implemented as a hash table and organized into 65 categories, each of which represents a particular character in the language. Generally speaking, every character in the message to hide is mapped to a random word from a corresponding category in the dictionary. Eventually, all input characters are transformed into output words which are then put together to form an SQL query. Experiments conducted, showed how the proposed method can operate on real examples proving the theory behind it. As future work, other types of SQL queries are to be researched including INSERT, DELETE, and UPDATE queries, making the SQL carrier quite puzzling for malicious third parties to recuperate the secret message that it encodes.

preprint2012arXiv

A Simulation Model for the Waterfall Software Development Life Cycle

Software development life cycle or SDLC for short is a methodology for designing, building, and maintaining information and industrial systems. So far, there exist many SDLC models, one of which is the Waterfall model which comprises five phases to be completed sequentially in order to develop a software solution. However, SDLC of software systems has always encountered problems and limitations that resulted in significant budget overruns, late or suspended deliveries, and dissatisfied clients. The major reason for these deficiencies is that project directors are not wisely assigning the required number of workers and resources on the various activities of the SDLC. Consequently, some SDLC phases with insufficient resources may be delayed; while, others with excess resources may be idled, leading to a bottleneck between the arrival and delivery of projects and to a failure in delivering an operational product on time and within budget. This paper proposes a simulation model for the Waterfall development process using the Simphony.NET simulation tool whose role is to assist project managers in determining how to achieve the maximum productivity with the minimum number of expenses, workers, and hours. It helps maximizing the utilization of development processes by keeping all employees and resources busy all the time to keep pace with the arrival of projects and to decrease waste and idle time. As future work, other SDLC models such as spiral and incremental are to be simulated, giving project executives the choice to use a diversity of software development methodologies.

preprint2012arXiv

A Survey on Information Retrieval, Text Categorization, and Web Crawling

This paper is a survey discussing Information Retrieval concepts, methods, and applications. It goes deep into the document and query modelling involved in IR systems, in addition to pre-processing operations such as removing stop words and searching by synonym techniques. The paper also tackles text categorization along with its application in neural networks and machine learning. Finally, the architecture of web crawlers is to be discussed shedding the light on how internet spiders index web documents and how they allow users to search for items on the web.

preprint2012arXiv

A Text Steganography Method Using Pangram and Image Mediums

Steganography is the art and science of writing hidden messages in such a way that no one apart from the sender and the receiver would realize that a secret communicating is taking place. Unlike cryptography which only scrambles secret data keeping them overt, steganography covers secret data into medium files such as image files and transmits them in total secrecy avoiding drawing eavesdroppers suspicions. However, considering that the public channel is monitored by eavesdroppers, it is vulnerable to stego-attacks which refer to randomly trying to break the medium file and recover the secret data out of it. That is often true because steganalysts assume that the secret data are encoded into a single medium file and not into multiple ones that complement each other. This paper proposes a text steganography method for hiding secret textual data using two mediums; a Pangram sentence containing all the characters of the alphabet, and an uncompressed image file. The algorithm tries to search for every character of the secret message into the Pangram text. The search starts from a random index called seed and ends up on the index of the first occurrence of the character being searched for. As a result, two indexes are obtained, the seed and the offset indexes. Together they are embedded into the three LSBs of the color channels of the image medium. Ultimately, both mediums mainly the Pangram and the image are sent to the receiver. The advantage of the proposed method is that it makes the covert data hard to be recovered by unauthorized parties as it uses two mediums, instead of one, to deliver the secret data. Experiments conducted, illustrated an example that explained how to encode and decode a secret text message using the Pangram and the image mediums.

preprint2012arXiv

A Two Intermediates Audio Steganography Technique

On the rise of the Internet, digital data became openly public which has driven IT industries to pay special consideration to data confidentiality. At present, two main techniques are being used: Cryptography and Steganography. In effect, cryptography garbles a secret message turning it into a meaningless form; while, steganography hides the very existence of the message by embedding it into an intermediate such as a computer file. In fact, in audio steganography, this computer file is a digital audio file in which secret data are concealed, predominantly, into the bits that make up its audio samples. This paper proposes a novel steganography technique for hiding digital data into uncompressed audio files using a randomized algorithm and a context-free grammar coupled with a lexicon of words. Furthermore, the proposed technique uses two intermediates to transmit the secret data between communicating parties: The first intermediate is an audio file whose audio samples, which are selected randomly, are used to conceal the secret data; whereas, the second intermediate is a grammatically correct English text that is generated at runtime using a context-free grammar and it encodes the location of the random audio samples in the audio file. The proposed technique is stealthy and irrecoverable in a sense that it is difficult for unauthorized third parties to detect the presence of and recover the secret data. Experiments conducted showed how the covering and the uncovering processes of the proposed technique work. As future work, a semantic analyzer is to be developed so as to make the intermediate text not only grammatically correct but also semantically plausible.

preprint2012arXiv

An Image Steganography Scheme using Randomized Algorithm and Context-Free Grammar

Currently, cryptography is in wide use as it is being exploited in various domains from data confidentiality to data integrity and message authentication. Basically, cryptography shuffles data so that they become unreadable by unauthorized parties. However, clearly visible encrypted messages, no matter how unbreakable, will arouse suspicions. A better approach would be to hide the very existence of the message using steganography. Fundamentally, steganography conceals secret data into innocent-looking mediums called carriers which can then travel from the sender to the receiver safe and unnoticed. This paper proposes a novel steganography scheme for hiding digital data into uncompressed image files using a randomized algorithm and a context-free grammar. Besides, the proposed scheme uses two mediums to deliver the secret data: a carrier image into which the secret data are hidden into random pixels, and a well-structured English text that encodes the location of the random carrier pixels. The English text is generated at runtime using a context-free grammar coupled with a lexicon of English words. The proposed scheme is stealthy, and hard to be noticed, detected, and recovered. Experiments conducted showed how the covering and the uncovering processes of the proposed scheme work. As future work, a semantic analyzer is to be developed so as to make the English text medium semantically correct, and consequently safer to be transmitted without drawing any attention.

preprint2012arXiv

ASR Context-Sensitive Error Correction Based on Microsoft N-Gram Dataset

At the present time, computers are employed to solve complex tasks and problems ranging from simple calculations to intensive digital image processing and intricate algorithmic optimization problems to computationally-demanding weather forecasting problems. ASR short for Automatic Speech Recognition is yet another type of computational problem whose purpose is to recognize human spoken speech and convert it into text that can be processed by a computer. Despite that ASR has many versatile and pervasive real-world applications,it is still relatively erroneous and not perfectly solved as it is prone to produce spelling errors in the recognized text, especially if the ASR system is operating in a noisy environment, its vocabulary size is limited, and its input speech is of bad or low quality. This paper proposes a post-editing ASR error correction method based on MicrosoftN-Gram dataset for detecting and correcting spelling errors generated by ASR systems. The proposed method comprises an error detection algorithm for detecting word errors; a candidate corrections generation algorithm for generating correction suggestions for the detected word errors; and a context-sensitive error correction algorithm for selecting the best candidate for correction. The virtue of using the Microsoft N-Gram dataset is that it contains real-world data and word sequences extracted from the web which canmimica comprehensive dictionary of words having a large and all-inclusive vocabulary. Experiments conducted on numerous speeches, performed by different speakers, showed a remarkable reduction in ASR errors. Future research can improve upon the proposed algorithm so much so that it can be parallelized to take advantage of multiprocessor and distributed systems.

preprint2012arXiv

Autonomic html interface generator for web applications

Recent advances in computing systems have led to a new digital era in which every area of life is nearly interrelated with information technology. However, with the trend towards large-scale IT systems, a new challenge has emerged. The complexity of IT systems is becoming an obstacle that hampers the manageability, operability, and maintainability of modern computing infrastructures. Autonomic computing popped up to provide an answer to these ever-growing pitfalls. Fundamentally, autonomic systems are self-configuring, self-healing, self-optimizing, and self-protecting; hence, they can automate all complex IT processes without human intervention. This paper proposes an autonomic HTML web-interface generator based on XML Schema and Style Sheet specifications for self-configuring graphical user interfaces of web applications. The goal of this autonomic generator is to automate the process of customizing GUI web-interfaces according to the ever-changing business rules, policies, and operating environment with the least IT labor involvement. The conducted experiments showed a successful automation of web interfaces customization that dynamically self-adapts to keep with the always-changing business requirements. Future research can improve upon the proposed solution so that it supports the selfconfiguring of not only web applications but also desktop applications.

preprint2012arXiv

Autonomic Model for Self-Configuring C#.NET Applications

With the advances in computational technologies over the last decade, large organizations have been investing in Information Technology to automate their internal processes to cut costs and efficiently support their business projects. However, this comes to a price. Business requirements always change. Likewise, IT systems constantly evolves as developers make new versions of them, which require endless administrative manual work to customize and configure them, especially if they are being used in different contexts, by different types of users, and for different requirements. Autonomic computing was conceived to provide an answer to these ever-changing requirements. Essentially, autonomic systems are self-configuring, self-healing, self-optimizing, and self-protecting; hence, they can automate all complex IT processes without human intervention. This paper proposes an autonomic model based on Venn diagram and set theory for self-configuring C#.NET applications, namely the self-customization of their GUI, event-handlers, and security permissions. The proposed model does not require altering the source-code of the original application; rather, it uses an XML-based customization file to turn on and off the internal attributes of the application. Experiments conducted on the proposed model, showed a successful automatic customization for C# applications and an effective self-adaption based on dynamic business requirements. As future work, other programming languages such as Java and C++ are to be supported, in addition to other operating systems such as Linux and Mac so as to provide a standard platform-independent autonomic self-configuring model.

preprint2012arXiv

Building sustainable ecosystem-oriented architectures

Currently, organizations are transforming their business processes into e-services and service-oriented architectures to improve coordination across sales, marketing, and partner channels, to build flexible and scalable systems, and to reduce integration-related maintenance and development costs. However, this new paradigm is still fragile and lacks many features crucial for building sustainable and progressive computing infrastructures able to rapidly respond and adapt to the always-changing market and environmental business. This paper proposes a novel framework for building sustainable Ecosystem- Oriented Architectures (EOA) using e-service models. The backbone of this framework is an ecosystem layer comprising several computing units whose aim is to deliver universal interoperability, transparent communication, automated management, self-integration, self-adaptation, and security to all the interconnected services, components, and devices in the ecosystem. Overall, the proposed model seeks to deliver a comprehensive and a generic sustainable business IT model for developing agile e-enterprises that are constantly up to new business constraints, trends, and requirements. Future research can improve upon the proposed model so much so that it supports computational intelligence to help in decision making and problem solving.

preprint2012arXiv

Communication Language Specifications For Digital Ecosystems

Service-based IT infrastructures are today's trend and the future for every enterprise willing to support dynamic and agile business to contend with the ever changing e-demands and requirements. A digital ecosystem is an emerging business IT model for developing agile e-enterprises made out of self-adaptable, self-manageable, self-organizing, and sustainable service components. This paper defines the specifications of a communication language for exchanging data between connecting entities in digital ecosystems. It is called ECL short for Ecosystem Communication Language and is based on XML to format its request and response messages. An ECU short for Ecosystem Communication Unit is also presented which interprets, validates, parses ECL messages and routes them to their destination entities. ECL is open and provides transparent, portable, and interoperable communication between the different heterogeneous distributed components to send requests, and receive responses from each other, regardless of their incompatible protocols, standards, and technologies. As future research, digital signature for ECL is to be investigated so as to deliver data integrity as well as message authenticity for the digital ecosystem.

preprint2012arXiv

Context-sensitive Spelling Correction Using Google Web 1T 5-Gram Information

In computing, spell checking is the process of detecting and sometimes providing spelling suggestions for incorrectly spelled words in a text. Basically, a spell checker is a computer program that uses a dictionary of words to perform spell checking. The bigger the dictionary is, the higher is the error detection rate. The fact that spell checkers are based on regular dictionaries, they suffer from data sparseness problem as they cannot capture large vocabulary of words including proper names, domain-specific terms, technical jargons, special acronyms, and terminologies. As a result, they exhibit low error detection rate and often fail to catch major errors in the text. This paper proposes a new context-sensitive spelling correction method for detecting and correcting non-word and real-word errors in digital text documents. The approach hinges around data statistics from Google Web 1T 5-gram data set which consists of a big volume of n-gram word sequences, extracted from the World Wide Web. Fundamentally, the proposed method comprises an error detector that detects misspellings, a candidate spellings generator based on a character 2-gram model that generates correction suggestions, and an error corrector that performs contextual error correction. Experiments conducted on a set of text documents from different domains and containing misspellings, showed an outstanding spelling error correction rate and a drastic reduction of both non-word and real-word errors. In a further study, the proposed algorithm is to be parallelized so as to lower the computational cost of the error detection and correction processes.

preprint2012arXiv

Distributed, Cross-Platform, and Regression Testing Architecture for Service-Oriented Architecture

As per leading IT experts, today's large enterprises are going through business transformations. They are adopting service-based IT models such as SOA to develop their enterprise information systems and applications. In fact, SOA is an integration of loosely-coupled interoperable components, possibly built using heterogeneous software technologies and hardware platforms. As a result, traditional testing architectures are no more adequate for verifying and validating the quality of SOA systems and whether they are operating to specifications. This paper first discusses the various state-of-the-art methods for testing SOA applications, and then it proposes a novel automated, distributed, cross-platform, and regression testing architecture for SOA systems. The proposed testing architecture consists of several testing units which include test engine, test code generator, test case generator, test executer, and test monitor units. Experiments conducted showed that the proposed testing architecture managed to use parallel agents to test heterogeneous web services whose technologies were incompatible with the testing framework. As future work, testing non-functional aspects of SOA applications are to be investigated so as to allow the testing of such properties as performance, security, availability, and scalability.

preprint2012arXiv

Expert PC Troubleshooter With Fuzzy-Logic And Self-Learning Support

Expert systems use human knowledge often stored as rules within the computer to solve problems that generally would entail human intelligence. Today, with information systems turning out to be more pervasive and with the myriad advances in information technologies, automating computer fault diagnosis is becoming so fundamental that soon every enterprise has to endorse it. This paper proposes an expert system called Expert PC Troubleshooter for diagnosing computer problems. The system is composed of a user interface, a rule-base, an inference engine, and an expert interface. Additionally, the system features a fuzzy-logic module to troubleshoot POST beep errors, and an intelligent agent that assists in the knowledge acquisition process. The proposed system is meant to automate the maintenance, repair, and operations (MRO) process, and free-up human technicians from manually performing routine, laborious, and timeconsuming maintenance tasks. As future work, the proposed system is to be parallelized so as to boost its performance and speed-up its various operations.

preprint2012arXiv

Hybrid Information Retrieval Model For Web Images

The Bing Bang of the Internet in the early 90's increased dramatically the number of images being distributed and shared over the web. As a result, image information retrieval systems were developed to index and retrieve image files spread over the Internet. Most of these systems are keyword-based which search for images based on their textual metadata; and thus, they are imprecise as it is vague to describe an image with a human language. Besides, there exist the content-based image retrieval systems which search for images based on their visual information. However, content-based type systems are still immature and not that effective as they suffer from low retrieval recall/precision rate. This paper proposes a new hybrid image information retrieval model for indexing and retrieving web images published in HTML documents. The distinguishing mark of the proposed model is that it is based on both graphical content and textual metadata. The graphical content is denoted by color features and color histogram of the image; while textual metadata are denoted by the terms that surround the image in the HTML document, more particularly, the terms that appear in the tags p, h1, and h2, in addition to the terms that appear in the image's alt attribute, filename, and class-label. Moreover, this paper presents a new term weighting scheme called VTF-IDF short for Variable Term Frequency-Inverse Document Frequency which unlike traditional schemes, it exploits the HTML tag structure and assigns an extra bonus weight for terms that appear within certain particular HTML tags that are correlated to the semantics of the image. Experiments conducted to evaluate the proposed IR model showed a high retrieval precision rate that outpaced other current models.

preprint2012arXiv

Image Steganography based on a Parameterized Canny Edge Detection Algorithm

Steganography is the science of hiding digital information in such a way that no one can suspect its existence. Unlike cryptography which may arouse suspicions, steganography is a stealthy method that enables data communication in total secrecy. Steganography has many requirements, the foremost one is irrecoverability which refers to how hard it is for someone apart from the original communicating parties to detect and recover the hidden data out of the secret communication. A good strategy to guaranteeirrecoverability is to cover the secret data not usinga trivial method based on a predictable algorithm, but using a specific random pattern based on a mathematical algorithm. This paper proposes an image steganography technique based on theCanny edge detection algorithm.It is designed to hide secret data into a digital image within the pixels that make up the boundaries of objects detected in the image. More specifically, bits of the secret data replace the three LSBs of every color channel of the pixels detected by the Canny edge detection algorithm as part of the edges in the carrier image. Besides, the algorithm is parameterized by three parameters: The size of the Gaussian filter, a low threshold value, and a high threshold value. These parameters can yield to different outputs for the same input image and secret data. As a result, discovering the inner-workings of the algorithm would be considerably ambiguous, misguiding steganalysts from the exact location of the covert data. Experiments showed a simulation tool codenamed GhostBit, meant to cover and uncover secret data using the proposed algorithm. As future work, examining how other image processing techniques such as brightness and contrast adjustment can be taken advantage of in steganography with the purpose ofgiving the communicating parties more preferences tomanipulate their secret communication.

preprint2012arXiv

Image Steganography Method Based on Brightness Adjustment

Steganography is an information hiding technique in which secret data are secured by covering them into a computer carrier file without damaging the file or changing its size. The difference between steganography and cryptography is that steganography is a stealthy method of communication that only the communicating parties are aware of; while, cryptography is an overt method of communication that anyone is aware of, despite its payload is scribbled. Typically, an irrecoverable steganography algorithm is the algorithm that makes it hard for malicious third parties to discover how it works and how to recover the secret data out of the carrier file. One popular way to achieve irrecoverability is to digitally process the carrier file after hiding the secret data into it. However, such process is irreversible as it would destroy the concealed data. This paper proposes a new image steganography method for textual data, as well as for any form of digital data, based on adjusting the brightness of the carrier image after covering the secret data into it. The algorithm used is parameterized as it can be configured using three different parameters defined by the communicating parties. They include the amount of brightness to apply on the carrier image after the completion of the covering process, the color channels whose brightness should be adjusted, and the bytes that should carry in the secret data. The novelty of the proposed method is that it embeds bits of the secret data into the three LSBs of the bytes that compose the carrier image in such a way that does not destroy the secret data when restoring back the original brightness of the carrier image. The simulation conducted proved that the proposed algorithm is valid and correct.

preprint2012arXiv

Management Language Specifications For Digital Ecosystems

This paper defines the specifications of a management language intended to automate the control and administration of various service components connected to a digital ecosystem. It is called EML short for Ecosystem Management Language and it is based on proprietary syntax and notation and contains a set of managerial commands issued by the system's administrator via a command console. Additionally, EML is shipped with a collection of self-adaptation procedures called SAP. Their purpose is to provide self-adaptation properties to the ecosystem allowing it to self-optimize itself based on the state of its execution environment. On top of that, there exists the EMU short for Ecosystem Management Unit which interprets, validates, parses, and executes EML commands and SAP procedures. Future research can improve upon EML so much so that it can be extended to support a larger set of commands in addition to a larger set of SAP procedures.

preprint2012arXiv

MyProLang - My Programming Language: A Template-Driven Automatic Natural Programming Language

Modern computer programming languages are governed by complex syntactic rules. They are unlike natural languages; they require extensive manual work and a significant amount of learning and practicing for an individual to become skilled at and to write correct programs. Computer programming is a difficult, complicated, unfamiliar, non-automated, and a challenging discipline for everyone; especially, for students, new programmers and end-users. This paper proposes a new programming language and an environment for writing computer applications based on source-code generation. It is mainly a template-driven automatic natural imperative programming language called MyProLang. It harnesses GUI templates to generate proprietary natural language source-code, instead of having computer programmers write the code manually. MyProLang is a blend of five elements. A proprietary natural programming language with unsophisticated grammatical rules and expressive syntax; automation templates that automate the generation of instructions and thereby minimizing the learning and training time; an NLG engine to generate natural instructions; a source-to-source compiler that analyzes, parses, and build executables; and an ergonomic IDE that houses diverse functions whose role is to simplify the software development process. MyProLang is expected to make programming open to everyone including students, programmers and end-users. In that sense, anyone can start programming systematically, in an automated manner and in natural language; without wasting time in learning how to formulate instructions and arrange expressions, without putting up with unfamiliar structures and symbols, and without being annoyed by syntax errors. In the long run, this increases the productivity, quality and time-to-market in software development.

preprint2012arXiv

Neural Network Model for Path-Planning of Robotic Rover Systems

Today, robotics is an auspicious and fast-growing branch of technology that involves the manufacturing, design, and maintenance of robot machines that can operate in an autonomous fashion and can be used in a wide variety of applications including space exploration, weaponry, household, and transportation. More particularly, in space applications, a common type of robots has been of widespread use in the recent years. It is called planetary rover which is a robot vehicle that moves across the surface of a planet and conducts detailed geological studies pertaining to the properties of the landing cosmic environment. However, rovers are always impeded by obstacles along the traveling path which can destabilize the rover's body and prevent it from reaching its goal destination. This paper proposes an ANN model that allows rover systems to carry out autonomous path-planning to successfully navigate through challenging planetary terrains and follow their goal location while avoiding dangerous obstacles. The proposed ANN is a multilayer network made out of three layers: an input, a hidden, and an output layer. The network is trained in offline mode using back-propagation supervised learning algorithm. A software-simulated rover was experimented and it revealed that it was able to follow the safest trajectory despite existing obstacles. As future work, the proposed ANN is to be parallelized so as to speed-up the execution time of the training process.

preprint2012arXiv

OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely being published at that time; and hence, there was a need to convert them into digital format. OCR, short for Optical Character Recognition was conceived to translate paper-based books into digital e-books. Regrettably, OCR systems are still erroneous and inaccurate as they produce misspellings in the recognized text, especially when the source document is of low printing quality. This paper proposes a post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors. The cornerstone of this proposed approach is the use of Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text. The Google data set incorporates a very large vocabulary and word statistics entirely reaped from the Internet, making it a reliable source to perform dictionary-based error correction. The core of the proposed solution is a combination of three algorithms: The error detection, candidate spellings generator, and error correction algorithms, which all exploit information extracted from Google Web 1T 5-gram data set. Experiments conducted on scanned images written in different languages showed a substantial improvement in the OCR error correction rate. As future developments, the proposed algorithm is to be parallelised so as to support parallel and distributed computing architectures.

preprint2012arXiv

OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion

With the advent of digital optical scanners, a lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into an electronic version that can be manipulated by a computer. For this purpose, OCR, short for Optical Character Recognition was developed to translate scanned graphical text into editable computer text. Unfortunately, OCR is still imperfect as it occasionally mis-recognizes letters and falsely identifies scanned text, leading to misspellings and linguistics errors in the OCR output text. This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors. The proposed algorithm is based on Google's online spelling suggestion which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web, convenient to suggest possible replacements for words that have been misspelled during the OCR process. Experiments carried out revealed a significant improvement in OCR error correction rate. Future research can improve upon the proposed algorithm so much so that it can be parallelized and executed over multiprocessing platforms.

preprint2012arXiv

Parallel Spell-Checking Algorithm Based on Yahoo! N-Grams Dataset

Spell-checking is the process of detecting and sometimes providing suggestions for incorrectly spelled words in a text. Basically, the larger the dictionary of a spell-checker is, the higher is the error detection rate; otherwise, misspellings would pass undetected. Unfortunately, traditional dictionaries suffer from out-of-vocabulary and data sparseness problems as they do not encompass large vocabulary of words indispensable to cover proper names, domain-specific terms, technical jargons, special acronyms, and terminologies. As a result, spell-checkers will incur low error detection and correction rate and will fail to flag all errors in the text. This paper proposes a new parallel shared-memory spell-checking algorithm that uses rich real-world word statistics from Yahoo! N-Grams Dataset to correct non-word and real-word errors in computer text. Essentially, the proposed algorithm can be divided into three sub-algorithms that run in a parallel fashion: The error detection algorithm that detects misspellings, the candidates generation algorithm that generates correction suggestions, and the error correction algorithm that performs contextual error correction. Experiments conducted on a set of text articles containing misspellings, showed a remarkable spelling error correction rate that resulted in a radical reduction of both non-word and real-word errors in electronic text. In a further study, the proposed algorithm is to be optimized for message-passing systems so as to become more flexible and less costly to scale over distributed machines.

preprint2012arXiv

Post-Editing Error Correction Algorithm for Speech Recognition using Bing Spelling Suggestion

ASR short for Automatic Speech Recognition is the process of converting a spoken speech into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise especially if used in a harsh surrounding wherein the input speech is of low quality. This paper proposes a post-editing ASR error correction method and algorithm based on Bing's online spelling suggestion. In this approach, the ASR recognized output text is spell-checked using Bing's spelling suggestion technology to detect and correct misrecognized words. More specifically, the proposed algorithm breaks down the ASR output text into several word-tokens that are submitted as search queries to Bing search engine. A returned spelling suggestion implies that a query is misspelled; and thus it is replaced by the suggested correction; otherwise, no correction is performed and the algorithm continues with the next token until all tokens get validated. Experiments carried out on various speeches in different languages indicated a successful decrease in the number of ASR errors and an improvement in the overall error correction rate. Future research can improve upon the proposed algorithm so much so that it can be parallelized to take advantage of multiprocessor computers.

preprint2012arXiv

Semantic-Sensitive Web Information Retrieval Model for HTML Documents

With the advent of the Internet, a new era of digital information exchange has begun. Currently, the Internet encompasses more than five billion online sites and this number is exponentially increasing every day. Fundamentally, Information Retrieval (IR) is the science and practice of storing documents and retrieving information from within these documents. Mathematically, IR systems are at the core based on a feature vector model coupled with a term weighting scheme that weights terms in a document according to their significance with respect to the context in which they appear. Practically, Vector Space Model (VSM), Term Frequency (TF), and Inverse Term Frequency (IDF) are among other long-established techniques employed in mainstream IR systems. However, present IR models only target generic-type text documents, in that, they do not consider specific formats of files such as HTML web documents. This paper proposes a new semantic-sensitive web information retrieval model for HTML documents. It consists of a vector model called SWVM and a weighting scheme called BTF-IDF, particularly designed to support the indexing and retrieval of HTML web documents. The chief advantage of the proposed model is that it assigns extra weights for terms that appear in certain pre-specified HTML tags that are correlated to the semantics of the document. Additionally, the model is semantic-sensitive as it generates synonyms for every term being indexed and later weights them appropriately to increase the likelihood of retrieving documents with similar context but different vocabulary terms. Experiments conducted, revealed a momentous enhancement in the precision of web IR systems and a radical increase in the number of relevant documents being retrieved. As further research, the proposed model is to be upgraded so as to support the indexing and retrieval of web images in multimedia-rich web documents.

preprint2012arXiv

Sequential & Parallel Algorithms for Big-Integer Numbers Subtraction

Many emerging computer applications require the processing of large numbers, larger than what a CPU can handle. In fact, the top of the line PCs can only manipulate numbers not longer than 32 bits or 64 bits. This is due to the size of the registers and the data-path inside the CPU. As a result, performing arithmetic operations such as subtraction on big-integer numbers is to some extend limited. Different algorithms were designed in an attempt to solve this problem; they all operate on big-integer numbers by first converting them into a binary representation then performing bitwise operations on single bits. Such algorithms are of complexity O(n) where n is the total number of bits in each operand. This paper proposes two new algorithms for performing arithmetic subtraction on big-integer numbers. The two algorithms are different in that one is sequential while the other is parallel. The similarity between them is that both follow the same concept of dividing the big-integer inputs into several blocks or tokens of 60 bits (18 digits) each; thus reducing the input size n in O(n) by a factor of 60. Subtraction of corresponding tokens, one from each operand, is performed as humans perform subtraction, using a pencil and a paper in the decimal system. Both algorithms are to be implemented using MS C#.NET 2005 and tested over a multiple processor system. Further studies can be done on other arithmetic operations such as addition and multiplication.

preprint2012arXiv

Sequential and Parallel Algorithms for the Addition of Big-Integer Numbers

Today's PCs can directly manipulate numbers not longer than 64 bits because the size of the CPU registers and the data-path are limited. Consequently, arithmetic operations such as addition, can only be performed on numbers of that length. To solve the problem of computation on big-integer numbers, different algorithms were developed. However, these algorithms are considerably slow because they operate on individual bits; and are only designed to run over single-processor computers. In this paper, two algorithms for handling arithmetic addition on big-integer numbers are presented. The first algorithm is sequential while the second is parallel. Both algorithms, unlike existing ones, perform addition on blocks or tokens of 60 bits (18 digits), and thus boosting the execution time by a factor of 60.

preprint2012arXiv

Service-Oriented Architecture for Space Exploration Robotic Rover Systems

Currently, industrial sectors are transforming their business processes into e-services and component-based architectures to build flexible, robust, and scalable systems, and reduce integration-related maintenance and development costs. Robotics is yet another promising and fast-growing industry that deals with the creation of machines that operate in an autonomous fashion and serve for various applications including space exploration, weaponry, laboratory research, and manufacturing. It is in space exploration that the most common type of robots is the planetary rover which moves across the surface of a planet and conducts a thorough geological study of the celestial surface. This type of rover system is still ad-hoc in that it incorporates its software into its core hardware making the whole system cohesive, tightly-coupled, more susceptible to shortcomings, less flexible, hard to be scaled and maintained, and impossible to be adapted to other purposes. This paper proposes a service-oriented architecture for space exploration robotic rover systems made out of loosely-coupled and distributed web services. The proposed architecture consists of three elementary tiers: the client tier that corresponds to the actual rover; the server tier that corresponds to the web services; and the middleware tier that corresponds to an Enterprise Service Bus which promotes interoperability between the interconnected entities. The niche of this architecture is that rover's software components are decoupled and isolated from the rover's body and possibly deployed at a distant location. A service-oriented architecture promotes integrate-ability, scalability, reusability, maintainability, and interoperability for client-to-server communication.

preprint2012arXiv

Service-Oriented Architecture for Weaponry and Battle Command and Control Systems in Warfighting

Military is one of many industries that is more computer-dependent than ever before, from soldiers with computerized weapons, and tactical wireless devices, to commanders with advanced battle management, command and control systems. Fundamentally, command and control is the process of planning, monitoring, and commanding military personnel, weaponry equipment, and combating vehicles to execute military missions. In fact, command and control systems are revolutionizing as war fighting is changing into cyber, technology, information, and unmanned warfare. As a result, a new design model that supports scalability, reusability, maintainability, survivability, and interoperability is needed to allow commanders, hundreds of miles away from the battlefield, to plan, monitor, evaluate, and control the war events in a dynamic, robust, agile, and reliable manner. This paper proposes a service-oriented architecture for weaponry and battle command and control systems, made out of loosely-coupled and distributed web services. The proposed architecture consists of three elementary tiers: the client tier that corresponds to any computing military equipment; the server tier that corresponds to the web services that deliver the basic functionalities for the client tier; and the middleware tier that corresponds to an enterprise service bus that promotes interoperability between all the interconnected entities. A command and control system was simulated and experimented and it successfully exhibited the desired features of SOA. Future research can improve upon the proposed architecture so much so that it supports encryption for securing the exchange of data between the various communicating entities of the system.

preprint2012arXiv

TCP Congestion Control Scheme for Wireless Networks based on TCP Reserved Field and SNR Ratio

Currently, TCP is the most popular and widely used network transmission protocol. In actual fact, about 90% of connections on the internet use TCP to communicate. Through several upgrades and improvements, TCP became well optimized for the very reliable wired networks. As a result, TCP considers all packet timeouts in wired networks as due to network congestion and not to bit errors. However, with networking becoming more heterogeneous, providing wired as well as wireless topologies, TCP suffers from performance degradation over error-prone wireless links as it has no mechanism to differentiate error losses from congestion losses. It therefore considers all packet losses as due to congestion and consequently reduces the burst of packet, diminishing at the same time the network throughput. This paper proposes a new TCP congestion control scheme appropriate for wireless as well as wired networks and is capable of distinguishing congestion losses from error losses. The proposed scheme is based on using the reserved field of the TCP header to indicate whether the established connection is over a wired or a wireless link. Additionally, the proposed scheme leverages the SNR ratio to detect the reliability of the link and decide whether to reduce packet burst or retransmit a timed-out packet. Experiments conducted, revealed that the proposed scheme proved to behave correctly in situations where timeouts were due to error and not to congestion. Future work can improve upon the proposed scheme so much so that it can leverage CRC and HEC errors so as to better determine the cause of transmission timeouts in wireless networks.

preprint2012arXiv

Windows And Linux Operating Systems From A Security Perspective

Operating systems are vital system software that, without them, humans would not be able to manage and use computer systems. In essence, an operating system is a collection of software programs whose role is to manage computer resources and provide an interface for client applications to interact with the different computer hardware. Most of the commercial operating systems available today on the market have buggy code and they exhibit security flaws and vulnerabilities. In effect, building a trusted operating system that can mostly resist attacks and provide a secure computing environment to protect the important assets of a computer is the goal of every operating system manufacturer. This paper deeply investigates the various security features of the two most widespread and successful operating systems, Microsoft Windows and Linux. The different security features, designs, and components of the two systems are to be covered elaborately, pin-pointing the key similarities and differences between them. In due course, a head-to-head comparison is to be drawn for each security aspect, exposing the advantage of one system over the other.