Social Media Mining: An Introduction

but not all. The model uses the labeled information and the feature distribution of the unlabeled data to learn a model. Expectation maximization (EM) is a well-established technique from this area. In short, EM first learns a model from the labeled portion of the data. It then uses this model to predict labels for the unlabeled instances (expectation step) and uses the predicted labels to refine the learned model (maximization step). The two steps are repeated, revising the predictions for the unlabeled instances, until convergence is reached.

In addition to the supervised methods covered in this chapter, neural networks deserve mention [Haykin and Network, 2004]. More on regression techniques is available in [Neter et al., 1996; Bishop, 2006].

Clustering is one of the most popular areas in the field of machine learning research. A taxonomy of clustering algorithms can be found in [Berkhin, 2006; Jain et al., 1999; Xu and Wunsch, 2005; Mirkin, 2005]. Among clustering algorithms, some of which cluster data based on density, DBSCAN [Ester et al., 1996], GDBSCAN [Sander et al., 1998], CLARANS [Ng and Han, 1994], and OPTICS [Ankerst et al., 1999] are some of the most well-known and widely practiced.

Most previous contributions in the area of clustering treat the number of clusters as an input parameter. Early clustering literature attempted to address this by running algorithms for several values of K (the number of clusters) and selecting the K that optimizes some coefficient [Milligan and Cooper, 1985; Berkhin, 2006]. For example, the distance between two cluster centroids normalized by a cluster’s standard deviation could be used as a coefficient. After the coefficient is selected, its values are plotted as a function of K and the best K is selected.
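The procedure of trying several values of K and keeping the one with the best coefficient can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of any cited paper: the clustering routine is a plain k-means with a deterministic farthest-point initialization, the coefficient is one possible reading of the example above (the smallest distance between two centroids divided by the largest within-cluster spread), and the names `kmeans`, `separation_coefficient`, and the toy `points` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, iters=50):
    """Plain k-means. Initialization is farthest-point and deterministic
    (an assumption made here so the example is reproducible)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

def separation_coefficient(centroids, clusters):
    """Assumed coefficient: smallest centroid-to-centroid distance divided by
    the largest within-cluster spread (RMS distance to the centroid)."""
    min_sep = min(dist(a, b)
                  for i, a in enumerate(centroids) for b in centroids[i + 1:])
    spreads = [math.sqrt(sum(dist(p, c) ** 2 for p in pts) / len(pts))
               for c, pts in zip(centroids, clusters) if pts]
    max_spread = max(spreads)
    return min_sep / max_spread if max_spread > 0 else float("inf")

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]

# Evaluate the coefficient for several values of K and keep the best one.
scores = {}
for k in range(2, 6):
    centroids, clusters = kmeans(points, k)
    scores[k] = separation_coefficient(centroids, clusters)

best_k = max(scores, key=scores.get)
for k in sorted(scores):
    print(k, round(scores[k], 2))
print("best K:", best_k)
```

On this toy data the coefficient should peak at K = 3, which matches the three groups the points were drawn from. Printing the values stands in for the plot; with real data the curve is rarely this clear-cut, and the choice of coefficient matters.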
An interesting application of data mining is sentiment analysis, in which the level of subjective content in information is quantified; for example, identifying the polarity (i.e., being positive/negative) of a digital camera review. General references for sentiment analysis can be found in [Pang and Lee, 2008; Liu, 2007], and examples of recent developments in social media are available in [Hu et al., 2013a,b].

5.8 Exercises

Describe how methods from this chapter can be applied in social
media.

Outline a framework for using the supervised learning algorithm for
unsupervised learning.
