Social Media Mining: An Introduction


Data Mining Essentials

Data

Describe methods that can be used to deal with missing data.
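One simple method is mean imputation: each missing value of a numeric attribute is replaced with the mean of the observed values. The sketch below assumes missing entries are encoded as None; the function name impute_mean is hypothetical. Other options include dropping instances that have missing values or predicting the missing value from the other attributes.

```python
from statistics import mean

def impute_mean(column):
    """Replace missing entries (None) in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    # Observed values are kept; only the None entries are filled in.
    return [fill if v is None else v for v in column]

ages = [23, None, 31, 40, None]  # hypothetical attribute with two missing values
filled = impute_mean(ages)
```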

Given a continuous attribute, how can we convert it to a discrete
attribute? How can we convert discrete attributes to continuous ones?
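As one possible illustration, a continuous attribute can be discretized with equal-width binning, and a discrete attribute can be given a numeric encoding with one binary (0/1) column per distinct value, often called one-hot encoding. The sketch below is a minimal version of both under those assumptions; the function names are hypothetical, and equal-frequency binning or ordinal codes are equally valid alternatives.

```python
def equal_width_bins(values, k):
    """Map each continuous value to a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        if width == 0:
            bins.append(0)  # all values identical: a single bin
        else:
            # min() keeps the maximum value in the last bin instead of bin k
            bins.append(min(int((v - lo) / width), k - 1))
    return bins

def one_hot(values):
    """Encode a discrete attribute as 0/1 columns, one column per distinct value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

bins = equal_width_bins([0, 2.5, 5, 7.5, 10], 4)
categories, rows = one_hot(["red", "blue", "red"])
```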

If you could choose either instance selection or feature selection,
which one would you choose? Justify your answer.

Given two vectorized text documents, how can we measure their
similarity?
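One common choice is cosine similarity, the cosine of the angle between the two document vectors. The sketch below assumes both documents are already represented as equal-length numeric vectors (for example, term counts or TF-IDF scores over a shared vocabulary); the function name is hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length document vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document shares nothing with any other
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1, 2], [2, 4])
no_shared_terms = cosine_similarity([1, 0], [0, 1])
partial_overlap = cosine_similarity([1, 0, 1], [1, 1, 0])
```

Because the score depends only on direction, a document and a copy of it repeated twice receive a similarity of 1.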

In the TF-IDF example (Example 5.1), the word “orange” received a
score of zero. Is this desirable? What does a high TF-IDF value
indicate?
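The sketch below shows how a word can receive a zero score. It assumes tf is the raw count of a term in a document and idf = log2(N / df), where N is the number of documents and df is the number of documents containing the term; the documents are hypothetical and are not the data of Example 5.1. A term that appears in every document gets idf = log2(1) = 0, so its TF-IDF is zero however often it occurs.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document, with tf = raw count, idf = log2(N / df)."""
    n_docs = len(documents)
    token_lists = [doc.lower().split() for doc in documents]
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term at most once per document
    scores = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores.append({t: count * math.log2(n_docs / df[t]) for t, count in tf.items()})
    return scores

# Hypothetical corpus: "orange" appears in all three documents.
docs = ["orange juice", "orange peel", "orange tree grows"]
scores = tf_idf(docs)
```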

Supervised Learning

Provide pseudocode for decision tree induction.

How many decision trees containing n attributes and a binary class can
be generated?

What does zero entropy mean?
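To experiment with this question, the entropy of a set of class labels, H = -sum_i p_i log2 p_i, can be computed directly. The sketch below assumes base-2 logarithms; the function name is hypothetical. A set whose labels are all identical has entropy 0, while an even split between two classes has entropy 1.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of the class distribution in a list of labels."""
    n = len(labels)
    counts = Counter(labels)
    # Summing the positive terms -(p * log2 p) avoids returning -0.0 for pure sets.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

pure = entropy(["yes", "yes", "yes", "yes"])
even_split = entropy(["yes", "no", "yes", "no"])
```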

What is the time complexity of learning a naive Bayes classifier?
What is the time complexity of classifying with a naive Bayes
classifier?
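A counting-based sketch can help when reasoning about these costs. It assumes discrete attribute values, estimates P(y) and P(x_j | y) from raw frequencies, and applies no smoothing; the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Count classes and (attribute, value, class) co-occurrences in one pass over the data."""
    class_counts = Counter(y)
    value_counts = defaultdict(int)  # key: (attribute index, value, class)
    for row, label in zip(X, y):          # loops over the n instances
        for j, value in enumerate(row):   # and over the m attributes of each
            value_counts[(j, value, label)] += 1
    return class_counts, value_counts

def predict(class_counts, value_counts, row):
    """Return the class maximizing P(y) * prod_j P(x_j | y), estimated from counts."""
    n = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():       # loops over the classes
        score = count / n                             # P(y)
        for j, value in enumerate(row):               # and over the m attributes
            score *= value_counts.get((j, value, label), 0) / count  # P(x_j | y)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
model = train_naive_bayes(X, y)
```

In this sketch, training visits each of the n × m attribute values once, and prediction loops over every class and every attribute of the instance being classified.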
Linear separability: Two sets of two-dimensional instances are
linearly separable if they can be completely separated by a single
line. In n-dimensional space, two sets of instances are linearly
separable if they can be separated by a hyperplane. A classical
example of nonlinearity is the XOR function. In this function, the
two instance sets are the black and the white instances (see
Figure 5.9), which cannot be separated by a single line. This is an
example of a nonlinear binary function. Can a naive Bayes classifier
learn nonlinear binary functions? Provide details.
What about linear separability and k-NN? Are k-NNs capable of
solving such problems?
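The definition of linear separability can be checked empirically with a perceptron, a linear classifier that is not discussed in this section and is used here only as an illustration. A perceptron is guaranteed to reach an epoch with zero mistakes when the data are linearly separable, and it can never do so otherwise, because a mistake-free epoch means its current line separates the data. The sketch below assumes XOR labels (0,0), (1,1) → 0 and (0,1), (1,0) → 1, uses AND as a separable contrast, and caps training at a hypothetical 100 epochs.

```python
def perceptron_converges(points, labels, epochs=100, lr=1.0):
    """Train a 2-D perceptron; return True if some epoch finishes with zero mistakes."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in zip(points, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            if pred != target:
                # Move the separating line toward the misclassified point.
                update = lr * (target - pred)
                w[0] += update * x1
                w[1] += update * x2
                b += update
                mistakes += 1
        if mistakes == 0:
            return True  # the current line w.x + b = 0 separates the two sets
    return False

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_separable = perceptron_converges(inputs, [0, 0, 0, 1])
xor_separable = perceptron_converges(inputs, [0, 1, 1, 0])
```

A False result after a finite number of epochs is only evidence, not proof; for XOR, the proof is the geometric argument given with Figure 5.9.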

[Figure 5.9. Nonlinearity of XOR Function. The figure plots the points (0,0), (1,0), (0,1), and (1,1).]
