Chapter 4.2

What Is Big Data?

Abstract

There are different definitions of big data. The definition used here is that big data
encompasses a lot of data, is based on inexpensive storage, manages data by the “Roman
census” method, and stores data in an unstructured format. There are two major types of
big data—repetitive big data and nonrepetitive big data. Only a small fraction of
repetitive big data has business value, whereas almost all of nonrepetitive big data has
business value. In order to achieve business value, the context of data in big data must be
determined. Contextualization of repetitive big data is easily achieved, whereas
contextualization of nonrepetitive big data is done by means of textual disambiguation.

Keywords

Big data; Roman census method; Unstructured data; Repetitive data; Nonrepetitive data;
Contextualization; Textual disambiguation

The definition of big data given by Gartner Group is

volume, velocity, variety.

While this definition is often quoted and widely used, it is not a definition at all. The
load handled by a semitruck going down the highway fits this definition, as does the cargo
of an ocean liner. In fact, many things other than big data fit this definition.

Another Definition

The problem with the Gartner definition is that it describes some of the characteristics of
big data, but it does not disclose the identifying characteristics.

The definition of big data that we will use for this book is as follows:
