Social Media Mining: An Introduction

(Axel Boer) #1

P1: Sqe Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-05 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 19:23


5.4 Supervised Learning 115

Table 5.1. A Sample Dataset. In this dataset, features are
characteristics of individuals on Twitter, and the class attribute
denotes whether they are influential or not

ID Celebrity Verified Account # Followers Influential?
1 Yes No 1.25 M No
2 No Yes 1 M No
3 No Yes 600 K No
4 Yes Unknown 2.2 M No
5 No No 850 K Yes
6 No Yes 750 K No
7 No No 900 K Yes
8 No No 700 K No
9 Yes Yes 1.2 M No
10 No Unknown 950 K Yes

5.4.1 Decision Tree Learning

Consider the dataset shown in Table5.1. The last attribute represents the
class attribute, and the other attributes represent the features. In decision tree
classification, a decision tree is learned from the training dataset, and that
tree is later used to predict the class attribute value for instances in the test
dataset. As an example, two learned decision trees from the dataset shown
in Table5.1are provided in Figure5.3. As shown in this figure, multiple
decision trees can be learned from the same dataset, and these decision trees
can both correctly predict the class attribute values for all instances in the
dataset. Construction of decision trees is based on heuristics, as different
heuristics generate different decision trees from the same dataset.

Splitting Attributes

Ye s Celebrity N o

Celebrity

No AccountVerified

Verified
Account

No, Unknown

No, Unknown

Number of
Followers

Number of
<800 K >800 K <800 K Followers >800 K
No

No

No

No

No

Ye s No

Ye s

Ye s

Ye s

Ye s

(a) Learned Decision Tree 1 (b) Learned Decision Tree 2
Figure 5.3. Decision Trees Learned from Data Provided in Table5.1.
Free download pdf