Social Media Mining: An Introduction





but not all. The model uses the labeled information and the feature
distribution of the unlabeled data to learn a model. Expectation maximization
(EM) is a well-established technique from this area. In short, EM first
learns a model from the labeled part of the data. Then, it uses this model
to predict labels for the unlabeled instances (expectation step) and
refits the model using both the labeled instances and these predicted
labels (maximization step). The two steps are repeated, revising the
predictions for unlabeled instances in an iterative fashion until
convergence is reached. In addition to supervised methods
covered, neural networks deserve mention [Haykin and Network, 2004].
More on regression techniques is available in [Neter et al., 1996; Bishop,
2006].
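
The iterative procedure described above can be illustrated with a minimal
sketch. It is not taken from the cited references; it assumes
one-dimensional features and one Gaussian per class, and all function and
variable names are hypothetical. Following the usual convention, the
expectation step estimates soft labels for the unlabeled instances and the
maximization step refits the model.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gaussians(xs, resp, n_classes):
    """Maximization step: weighted prior, mean, and variance per class.

    Assumes every class receives nonzero total weight, e.g., because it
    has at least one labeled instance.
    """
    params = []
    for c in range(n_classes):
        weight = sum(r[c] for r in resp)
        mean = sum(r[c] * x for r, x in zip(resp, xs)) / weight
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, xs)) / weight
        # Floor the variance so a tight class cannot collapse to zero width.
        params.append((weight / len(xs), mean, max(var, 1e-6)))
    return params


def semi_supervised_em(labeled, unlabeled, n_classes=2, max_iter=100, tol=1e-6):
    """labeled: list of (x, class_index); unlabeled: list of x values."""
    xs_l = [x for x, _ in labeled]
    # Labeled instances keep fixed 0/1 responsibilities throughout.
    resp_l = [[1.0 if c == y else 0.0 for c in range(n_classes)]
              for _, y in labeled]
    # Initial model learned from the labeled instances only.
    params = fit_gaussians(xs_l, resp_l, n_classes)
    resp_u = []
    for _ in range(max_iter):
        # Expectation step: soft labels for the unlabeled instances.
        resp_u = []
        for x in unlabeled:
            joint = [p * gaussian_pdf(x, m, v) for p, m, v in params]
            total = sum(joint)
            if total == 0.0:  # numerical underflow: fall back to uniform
                resp_u.append([1.0 / n_classes] * n_classes)
            else:
                resp_u.append([j / total for j in joint])
        # Maximization step: refit on labeled plus soft-labeled data.
        new_params = fit_gaussians(xs_l + list(unlabeled),
                                   resp_l + resp_u, n_classes)
        shift = max(abs(a - b)
                    for new, old in zip(new_params, params)
                    for a, b in zip(new, old))
        params = new_params
        if shift < tol:  # parameters stopped changing: convergence
            break
    labels = [max(range(n_classes), key=lambda c: r[c]) for r in resp_u]
    return params, labels


if __name__ == "__main__":
    labeled = [(1.0, 0), (1.4, 0), (5.0, 1), (5.6, 1)]
    unlabeled = [0.8, 1.2, 1.9, 4.7, 5.3, 6.1]
    params, labels = semi_supervised_em(labeled, unlabeled)
    print(labels)
    print(params)
```

The sketch treats the labels of labeled instances as fixed and revises only
the soft labels of the unlabeled ones; other variants weight the two groups
differently.
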
Clustering is one of the most popular areas in the field of machine
learning research. A taxonomy of clustering algorithms can be found in
[Berkhin, 2006; Jain et al., 1999; Xu and Wunsch, 2005; Mirkin, 2005].
Among clustering algorithms, some of which use the density of the data
to form clusters, DBSCAN [Ester et al., 1996], GDBSCAN [Sander et al.,
1998], CLARANS [Ng and Han, 1994], and OPTICS [Ankerst et al., 1999] are
some of the most well-known and widely used algorithms. Most of the previous
contributions in the area of clustering consider the number of clusters as an
input parameter. Early literature in clustering attempted to solve this
by running algorithms for several values of K (number of clusters) and
selecting the best K that optimizes some coefficient [Milligan and Cooper,
1985; Berkhin, 2006]. For example, the distance between two cluster
centroids normalized by a cluster’s standard deviation could be used as a
coefficient. After the coefficient is selected, the coefficient values are
plotted as a function of K (number of clusters) and the best K is selected.
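
As a rough illustration of this selection procedure, the sketch below runs
a small k-means implementation for several values of K and keeps the K with
the largest coefficient. The particular coefficient is an assumption made
for the example (the smallest centroid distance over all cluster pairs,
divided by the sum of the two clusters' spreads), as are the deterministic
farthest-point initialization and all function names.

```python
import math
from itertools import combinations


def kmeans(points, k, max_iter=100):
    """Plain k-means on tuples, with deterministic farthest-point init."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl
            else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def spread(cluster, centroid):
    """Root-mean-square distance of the members to their centroid."""
    if not cluster:
        return 0.0
    return math.sqrt(
        sum(math.dist(p, centroid) ** 2 for p in cluster) / len(cluster))


def separation(centroids, clusters):
    """Assumed coefficient: worst-case normalized centroid distance."""
    spreads = [spread(cl, c) for c, cl in zip(centroids, clusters)]
    eps = 1e-9  # avoids division by zero when both clusters are singletons
    return min(
        math.dist(centroids[i], centroids[j]) / (spreads[i] + spreads[j] + eps)
        for i, j in combinations(range(len(centroids)), 2)
    )


def best_k(points, candidate_ks):
    """Return the K (each K >= 2) with the largest coefficient, plus scores."""
    scores = {}
    for k in candidate_ks:
        centroids, clusters = kmeans(points, k)
        scores[k] = separation(centroids, clusters)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Three well-separated groups of four points each.
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 0), (10, 1), (11, 0), (11, 1),
              (5, 8), (5, 9), (6, 8), (6, 9)]
    k, scores = best_k(points, range(2, 6))
    print(k)
    for candidate, score in sorted(scores.items()):
        print(candidate, round(score, 2))
```

The values in scores are what would be plotted against K. A coefficient of
this kind can be inflated when K approaches the number of points, because
singleton clusters have zero spread, so the candidate range should stay
well below that.
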
An interesting application of data mining is sentiment analysis, in which
the level of subjective content in information is quantified; for example,
identifying the polarity (i.e., being positive/negative) of a digital
camera review. General references for sentiment analysis can be found
in [Pang and Lee, 2008; Liu, 2007], and examples of recent developments
in social media are available in [Hu et al., 2013a,b].

5.8 Exercises


  1. Describe how methods from this chapter can be applied in social
    media.

  2. Outline a framework for using a supervised learning algorithm for
    unsupervised learning.
