Social Media Mining: An Introduction

CUUS2079-05 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 19:23

5.4 Supervised Learning

Table 5.1. A Sample Dataset. In this dataset, the features are characteristics of individuals on Twitter, and the class attribute denotes whether each individual is influential.

ID  Celebrity  Verified Account  # Followers  Influential?
1   Yes        No                1.25 M       No
2   No         Yes               1 M          No
3   No         Yes               600 K        No
4   Yes        Unknown           2.2 M        No
5   No         No                850 K        Yes
6   No         Yes               750 K        No
7   No         No                900 K        Yes
8   No         No                700 K        No
9   Yes        Yes               1.2 M        No
10  No         Unknown           950 K        Yes
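To experiment with Table 5.1, it helps to have it in machine-readable form. The sketch below is one possible encoding. The names `TABLE_5_1`, `parse_followers` and `load_instances` are hypothetical helpers introduced here for illustration. It converts follower counts such as "1.25 M" into integers and tallies the class attribute.

```python
from collections import Counter

# Table 5.1 rows: (ID, Celebrity, Verified Account, # Followers, Influential?).
TABLE_5_1 = [
    (1, "Yes", "No", "1.25 M", "No"),
    (2, "No", "Yes", "1 M", "No"),
    (3, "No", "Yes", "600 K", "No"),
    (4, "Yes", "Unknown", "2.2 M", "No"),
    (5, "No", "No", "850 K", "Yes"),
    (6, "No", "Yes", "750 K", "No"),
    (7, "No", "No", "900 K", "Yes"),
    (8, "No", "No", "700 K", "No"),
    (9, "Yes", "Yes", "1.2 M", "No"),
    (10, "No", "Unknown", "950 K", "Yes"),
]


def parse_followers(text: str) -> int:
    """Convert a count such as '1.25 M' or '600 K' to an integer (hypothetical helper)."""
    number, suffix = text.split()
    multiplier = {"K": 1_000, "M": 1_000_000}[suffix]
    # round() absorbs floating-point noise, e.g. 2.2 * 1_000_000.
    return round(float(number) * multiplier)


def load_instances():
    """Return one dict per row, with the follower count stored as an int."""
    return [
        {
            "id": row_id,
            "celebrity": celebrity,
            "verified": verified,
            "followers": parse_followers(followers),
            "influential": label,
        }
        for row_id, celebrity, verified, followers, label in TABLE_5_1
    ]


instances = load_instances()
class_counts = Counter(inst["influential"] for inst in instances)
print(class_counts)  # Counter({'No': 7, 'Yes': 3})
```

The class attribute is imbalanced: three influential individuals versus seven non-influential ones.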

5.4.1 Decision Tree Learning

Consider the dataset shown in Table 5.1. The last attribute represents the class attribute, and the other attributes represent the features. In decision tree classification, a decision tree is learned from the training dataset, and that tree is later used to predict the class attribute value for instances in the test dataset. As an example, two decision trees learned from the dataset in Table 5.1 are provided in Figure 5.3. As the figure shows, multiple decision trees can be learned from the same dataset, and both of these trees correctly predict the class attribute values for all instances in the dataset. Decision tree construction is based on heuristics, and different heuristics can generate different decision trees from the same dataset.
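To make the claim that both trees fit the data concrete, the following sketch hand-codes two trees in the spirit of Figure 5.3. The first splits on Celebrity first, and the second splits on Verified Account first. Both then fall back to Number of Followers with an 800 K threshold. The exact branch layout is our reading of the figure, and the function names `tree_1` and `tree_2` are hypothetical. The sketch then measures each tree's accuracy on the rows of Table 5.1.

```python
# Table 5.1 rows: (celebrity, verified, followers, influential?).
DATA = [
    ("Yes", "No", 1_250_000, "No"),
    ("No", "Yes", 1_000_000, "No"),
    ("No", "Yes", 600_000, "No"),
    ("Yes", "Unknown", 2_200_000, "No"),
    ("No", "No", 850_000, "Yes"),
    ("No", "Yes", 750_000, "No"),
    ("No", "No", 900_000, "Yes"),
    ("No", "No", 700_000, "No"),
    ("Yes", "Yes", 1_200_000, "No"),
    ("No", "Unknown", 950_000, "Yes"),
]

THRESHOLD = 800_000  # split point on Number of Followers, taken from Figure 5.3


def tree_1(celebrity, verified, followers):
    """Hypothetical tree rooted at Celebrity."""
    if celebrity == "Yes":
        return "No"
    if verified == "Yes":
        return "No"
    # Verified Account is "No" or "Unknown": decide by follower count.
    return "Yes" if followers > THRESHOLD else "No"


def tree_2(celebrity, verified, followers):
    """Hypothetical tree rooted at Verified Account."""
    if verified == "Yes":
        return "No"
    # Verified Account is "No" or "Unknown": test Celebrity next.
    if celebrity == "Yes":
        return "No"
    return "Yes" if followers > THRESHOLD else "No"


def training_accuracy(tree):
    """Fraction of Table 5.1 rows whose label the tree predicts correctly."""
    correct = sum(tree(c, v, f) == label for c, v, f, label in DATA)
    return correct / len(DATA)


for tree in (tree_1, tree_2):
    print(tree.__name__, training_accuracy(tree))
# tree_1 1.0
# tree_2 1.0
```

Both trees reach perfect accuracy on the training data. On this dataset they encode the same rule, reached through different split orders: predict "Yes" only for non-celebrities without a verified account and with more than 800 K followers.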

[Figure 5.3: (a) Learned Decision Tree 1; (b) Learned Decision Tree 2. Splitting attributes: Celebrity (Yes / No), Verified Account (Yes / No, Unknown), Number of Followers (<800 K / >800 K); leaves give Influential? = Yes or No.]
Figure 5.3. Decision Trees Learned from Data Provided in Table 5.1.
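As an illustration of how a heuristic chooses a splitting attribute, the sketch below scores each attribute of Table 5.1 by information gain. Information gain is the entropy reduction used by ID3-style learners, and it is only one of several possible heuristics. Binarizing Number of Followers at 800 K is an assumption borrowed from Figure 5.3, and the helper names are hypothetical.

```python
import math
from collections import Counter

# Table 5.1 rows: (celebrity, verified, followers, influential?).
DATA = [
    ("Yes", "No", 1_250_000, "No"),
    ("No", "Yes", 1_000_000, "No"),
    ("No", "Yes", 600_000, "No"),
    ("Yes", "Unknown", 2_200_000, "No"),
    ("No", "No", 850_000, "Yes"),
    ("No", "Yes", 750_000, "No"),
    ("No", "No", 900_000, "Yes"),
    ("No", "No", 700_000, "No"),
    ("Yes", "Yes", 1_200_000, "No"),
    ("No", "Unknown", 950_000, "Yes"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy reduction obtained by partitioning `labels` on `feature_values`."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


labels = [row[3] for row in DATA]
candidates = {
    "Celebrity": [row[0] for row in DATA],
    "Verified Account": [row[1] for row in DATA],
    # Numeric attribute binarized at 800 K (assumption taken from Figure 5.3).
    "Number of Followers": [row[2] > 800_000 for row in DATA],
}
gains = {name: information_gain(values, labels) for name, values in candidates.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("root split:", best)
# Celebrity: 0.192
# Verified Account: 0.281
# Number of Followers: 0.192
# root split: Verified Account
```

Under this heuristic, Verified Account is the preferred root split. Celebrity and Number of Followers tie for second place, which shows how the choice of heuristic and its tie-breaking rule shape the resulting tree.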
