Modeling and Data Mining in Blogosphere

Synthesis Lectures on Data Mining and Knowledge Discovery

2009, 109 pages (doi: 10.2200/S00213ED1V01Y200907DMK001)

Nitin Agarwal

University of Arkansas at Little Rock

Huan Liu

Arizona State University




About the Book:

This book offers a comprehensive overview of the concepts and research issues surrounding blogs, or weblogs. It introduces techniques and approaches, tools and applications, and evaluation methodologies, with examples and case studies. Blogs allow people to express their thoughts, voice their opinions, and share their experiences and ideas. Blogs also facilitate interactions among individuals, creating a network with unique characteristics. Through these interactions, individuals experience a sense of community. We elaborate on approaches that extract communities and cluster blogs based on information about the bloggers. Open standards and a low barrier to publication in the Blogosphere have transformed information consumers into producers, generating an overwhelming and ever-increasing amount of knowledge about the members, their environment, and their symbiosis. We elaborate on approaches that sift through very large blog data sources to identify influential and trustworthy bloggers by leveraging content and network information. Spam blogs, or “splogs”, are an increasing concern in the Blogosphere and are discussed in detail, along with approaches that leverage supervised machine learning algorithms and interaction patterns. We elaborate on data collection procedures, provide resources for blog data repositories, mention various visualization and analysis tools for the Blogosphere, and explain conventional and novel evaluation methodologies to help readers perform research in the Blogosphere.


According to Morgan & Claypool Publishers, this lecture is currently the most read of the lectures on Data Mining and Knowledge Discovery.



Table of Contents:

1.    Modeling Blogosphere (Slides) (Figures)

2.    Blog Clustering and Community Discovery (Slides) (Figures)

3.    Influence and Trust (Slides) (Figures)

4.    Spam Filtering in Blogosphere (Slides) (Figures)

5.    Data Collection and Evaluation (Slides) (Figures)



Bibliographic entry (Please use the following to cite this book):

Nitin Agarwal and Huan Liu. "Modeling and Data Mining in Blogosphere", Synthesis Lectures on Data Mining and Knowledge Discovery #1, Morgan & Claypool Publishers, Robert Grossman (Editor), pp. 1-109. August 2009. ISBN: 9781598299083 (paperback) ISBN: 9781598299090 (ebook)



Download a digital copy of the book from the Morgan & Claypool Publishers website



Order the printed copy of the book from:

1.    Amazon.com

2.    Morgan & Claypool Publishers




This effort was supported in part by the U.S. Office of Naval Research (Grant number: N000141010091). We gratefully acknowledge this support.



See other and upcoming lectures by the authors.

See other lectures in the Data Mining and Knowledge Discovery series.



Last updated: July 22, 2011