Social Media Mining: An Introduction


CUUS2079-05 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 19:23



5 females and 5 males can be selected using stratified sampling
from this set.
In social media, a large amount of information is represented in
network form. These networks can be sampled by selecting a subset
of their nodes and edges. These nodes and edges can be selected
using the aforementioned sampling methods. We can also sample
these networks by starting with a small set of nodes (seed nodes) and
sampling
(a) the connected components they belong to;
(b) the set of nodes (and edges) connected to them directly; or
(c) the set of nodes and edges that are within n-hop distance from
them.
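The three seed-based options above can be sketched with a breadth-first search. This is a minimal illustration, not a method prescribed by the text: the function name `sample_from_seeds`, the adjacency-dictionary representation, and the toy graph are all assumptions made for the example. Passing `n=None` follows option (a), `n=1` follows option (b), and any larger `n` follows option (c).

```python
from collections import deque


def sample_from_seeds(adjacency, seeds, n=None):
    """Sample an undirected network starting from seed nodes.

    adjacency: dict mapping each node to an iterable of its neighbors
               (hypothetical representation chosen for this sketch).
    seeds:     iterable of starting nodes.
    n:         maximum hop distance; None means no limit, which yields
               the full connected components of the seeds.
    Returns (nodes, edges), where edges is a set of frozenset pairs.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit at the hop limit.
        if n is not None and distance[node] == n:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    nodes = set(distance)
    # Keep only the edges whose two endpoints were both sampled.
    edges = {
        frozenset((u, v))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Toy network (assumed for illustration): a path a-b-c-d and a separate edge e-f.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["f"],
    "f": ["e"],
}

component_nodes, component_edges = sample_from_seeds(toy_graph, ["a"])
direct_nodes, direct_edges = sample_from_seeds(toy_graph, ["a"], n=1)
two_hop_nodes, two_hop_edges = sample_from_seeds(toy_graph, ["a"], n=2)
```

Because the search starts from all seeds at distance zero, a node reachable from several seeds is assigned its distance to the nearest one.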

After preprocessing is performed, the data is ready to be mined. Next,
we discuss two general categories of data mining algorithms and how each
can be evaluated.

5.3 Data Mining Algorithms

Data mining algorithms can be divided into several categories. Here, we
discuss two well-established categories: supervised learning and unsupervised
learning. In supervised learning, the class attribute exists, and the
task is to predict the class attribute value. Our previous example of predicting
the class attribute “will buy” is an example of supervised learning.
In unsupervised learning, the dataset has no class attribute, and our task is
to find similar instances in the dataset and group them. By grouping these
similar instances, one can find significant patterns in a dataset. For example,
unsupervised learning can be used to identify events on Twitter, because the
frequency of tweeting is different for various events. By using unsupervised
learning, tweets can be grouped based on the times at which they appear,
and each group can then be matched to its corresponding real-world event.
Other categories of data mining algorithms exist; interested readers can
refer to the bibliographic notes for pointers to these categories.
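As a rough illustration of grouping tweets by time without any class attribute, the sketch below splits a list of timestamps wherever the gap between consecutive tweets exceeds a threshold. This gap-based grouping is a simple stand-in chosen for the example, not an algorithm named in the text; the function name `group_by_time`, the `max_gap` parameter, and the sample timestamps are assumptions.

```python
def group_by_time(timestamps, max_gap):
    """Group timestamps so consecutive members are at most max_gap apart.

    timestamps: iterable of numbers (e.g., seconds since some reference time).
    max_gap:    largest allowed gap between neighboring tweets in one group.
    Returns a list of groups, each a sorted list of timestamps.
    """
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= max_gap:
            # Close enough to the previous tweet: same burst of activity.
            groups[-1].append(t)
        else:
            # Gap too large (or first tweet): start a new group.
            groups.append([t])
    return groups


# Hypothetical tweet times: two bursts of activity far apart in time.
tweet_times = [100, 5, 1, 3, 104, 102]
bursts = group_by_time(tweet_times, max_gap=10)
```

Each resulting group is a candidate event; deciding which real-world event it corresponds to would still require inspecting the tweets in that group.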

5.4 Supervised Learning
Supervised learning algorithms, the first category, are those
for which the class attribute values for the dataset are known before running
the algorithm. This data is called labeled data or training data. Instances in