Social Media Mining: An Introduction


5.1 Data

                    Attributes                          | Class
Name    Money Spent    Bought Similar    Visits         | Will Buy
John    High           Yes               Frequently     | ?
Mary    High           Yes               Rarely         | Yes

A dataset is represented using a set of features, and an instance (also called a point, data point, or observation) is represented using values assigned to these features. Features are also known as measurements or attributes. In this example, the features are Name, Money Spent, Bought Similar, and Visits; the feature values for the first instance are John, High, Yes, and Frequently. Given the feature values for one instance, one tries to predict its class (or class attribute) value. In our example, the class attribute is Will Buy, and our class value prediction for the first instance is Yes. An instance such as John, in which the class attribute value is unknown, is called an unlabeled instance. Similarly, a labeled instance is an instance in which the class attribute value is known.
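As a minimal sketch, the example table can be encoded as a list of records, one per instance. The dictionary layout, the use of None for the unknown class value ("?"), and the helper names is_labeled and feature_values are assumptions made for illustration; they are not part of the text.

```python
# Hypothetical encoding of the example table.
# None stands for an unknown class value (shown as "?" in the table).
FEATURES = ["Name", "Money Spent", "Bought Similar", "Visits"]
CLASS_ATTRIBUTE = "Will Buy"

dataset = [
    {"Name": "John", "Money Spent": "High", "Bought Similar": "Yes",
     "Visits": "Frequently", "Will Buy": None},
    {"Name": "Mary", "Money Spent": "High", "Bought Similar": "Yes",
     "Visits": "Rarely", "Will Buy": "Yes"},
]


def is_labeled(instance):
    """An instance is labeled when its class attribute value is known."""
    return instance.get(CLASS_ATTRIBUTE) is not None


def feature_values(instance):
    """Return the feature values of an instance, excluding the class attribute."""
    return [instance[name] for name in FEATURES]


labeled = [inst for inst in dataset if is_labeled(inst)]
unlabeled = [inst for inst in dataset if not is_labeled(inst)]

print(feature_values(dataset[0]))            # ['John', 'High', 'Yes', 'Frequently']
print([inst["Name"] for inst in labeled])    # ['Mary']
print([inst["Name"] for inst in unlabeled])  # ['John']
```

A prediction task would fit a model on the labeled instances and apply it to the unlabeled ones.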

Mary in this dataset represents a labeled instance. The class attribute is optional in a dataset and is only necessary for prediction purposes. One can have a dataset in which no class attribute is present, such as a list of customers and their characteristics.

There are different types of features based on the characteristics of the feature and the values they can take. For instance, Money Spent can be represented using numeric values, such as $25. In that case, we have a continuous feature, whereas in our example it is a discrete feature, which can take a number of ordered values: {High, Normal, Low}. Different types of features were first introduced by psychologist Stanley Smith Stevens [1946] as "levels of measurement" in the theory of scales. He claimed that there are four types of features. For each feature type, there exists a set of permissible operations (statistics) that can be computed using the feature values and a set of transformations that are allowed.

Nominal (categorical). These features take values that are often represented as strings. For instance, a customer's name is a nominal feature. In general, only a few statistics can be computed on nominal features. Examples are the chi-square statistic (χ²) and the mode (most common feature value). For example, one can find the most common first name among customers. The only possible transformation on the data is comparison. For example, we can check whether our customer's name is John or not. Nominal feature values are often presented in a set format.
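The operations on nominal features described above can be sketched in a few lines. The list of names below is made-up illustrative data, and the helper name mode is an assumption; only the mode, the equality comparison, and the set representation come from the text.

```python
from collections import Counter

# Hypothetical customer first names (illustrative data, not from the text).
names = ["John", "Mary", "John", "Alice", "Mary", "John"]


def mode(values):
    """Return the most common value in a list of nominal values."""
    return Counter(values).most_common(1)[0][0]


# Permissible statistic: the mode (most common feature value).
most_common_name = mode(names)

# Permissible transformation: comparison for equality.
is_john = [name == "John" for name in names]

# Nominal values presented in a set format (distinct values, no order implied).
distinct_names = set(names)

print(most_common_name)        # John
print(is_john)                 # [True, False, True, False, False, True]
print(sorted(distinct_names))  # ['Alice', 'John', 'Mary']
```

Sorting the set in the last line only makes the printed output stable; nominal values themselves carry no order, so operations such as a mean or a median are not meaningful here.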
