Source author record

Mehwish Aziz

Mehwish Aziz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Information Retrieval Computation and Language Databases

Catalog footprint

What is connected

3works

4topics

3close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2014arXiv

Efficiency Analysis of Materialized views in DataWarehouse Using selfmaintenance

A data warehouse is a large data repository for the purpose of analysis and decision making in organizations. To improve the query performance and to get fast access to the data, data is stored as materialized views (MV) in the data warehouse. When data at source gets updated, the materialized views also need to be updated. In this paper, we focus on the problem of maintenance of these materialized views and address the issue of finding such auxiliary views (AV) that together with the materialized views make the data self-maintainable and take minimal space. We propose an algorithm that uses key and referential constraints which reduces the total number of tuples in auxiliary views and uses idea of information sharing between these auxiliary views to further reduce number of auxiliary views.

preprint2012arXiv

Pbm: A new dataset for blog mining

Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Now Researchers have growing interest in text mining methods for discovering knowledge. Text mining researchers come from variety of areas like: Natural Language Processing, Computational Linguistic, Machine Learning, and Statistics. A typical text mining application involves preprocessing of text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, evaluating and interpreting the results. There are numerous approaches for performing text mining tasks, like: clustering, categorization, sentimental analysis, and summarization. There is a growing need to standardize the evaluation of these tasks. One major component of establishing standardization is to provide standard datasets for these tasks. Although there are various standard datasets available for traditional text mining tasks, but there are very few and expensive datasets for blog-mining task. Blogs, a new genre in web 2.0 is a digital diary of web user, which has chronological entries and contains a lot of useful knowledge, thus offers a lot of challenges and opportunities for text mining. In this paper, we report a new indigenous dataset for Pakistani Political Blogosphere. The paper describes the process of data collection, organization, and standardization. We have used this dataset for carrying out various text mining tasks for blogosphere, like: blog-search, political sentiments analysis and tracking, identification of influential blogger, and clustering of the blog-posts. We wish to offer this dataset free for others who aspire to pursue further in this domain.

preprint2012arXiv

Sentence based semantic similarity measure for blog-posts

Blogs-Online digital diary like application on web 2.0 has opened new and easy way to voice opinion, thoughts, and like-dislike of every Internet user to the World. Blogosphere has no doubt the largest user-generated content repository full of knowledge. The potential of this knowledge is still to be explored. Knowledge discovery from this new genre is quite difficult and challenging as it is totally different from other popular genre of web-applications like World Wide Web (WWW). Blog-posts unlike web documents are small in size, thus lack in context and contain relaxed grammatical structures. Hence, standard text similarity measure fails to provide good results. In this paper, specialized requirements for comparing a pair of blog-posts is thoroughly investigated. Based on this we proposed a novel algorithm for sentence oriented semantic similarity measure of a pair of blog-posts. We applied this algorithm on a subset of political blogosphere of Pakistan, to cluster the blogs on different issues of political realm and to identify the influential bloggers.

Mehwish Aziz

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Efficiency Analysis of Materialized views in DataWarehouse Using selfmaintenance

Pbm: A new dataset for blog mining

Sentence based semantic similarity measure for blog-posts