Open Source For You — December 2017


Insight For U & Me

billion at the start of 2015 and around 1.65 billion at the
start of 2016. On average, there were approximately 1.32
billion daily active users as of June 2017. Every day, around
4.3 billion Facebook messages are posted and there are
around 5.75 billion Facebook likes.


  1. Mobile text messages: There are almost 22 billion text
    messages sent every day (for personal and commercial
    purposes).

  2. Google: In 2017, more than 5.2 billion Google searches
    were made on an average day.

  3. IoT devices: Devices are a huge source of the 2.5
    quintillion bytes of data that we create every day. This
    includes not only mobile devices but also smart TVs,
    airplanes, cars, etc. Hence, the Internet of Things is
    producing an increasing amount of data.
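
To get a feel for the velocity behind these numbers, the daily figures quoted in this article can be converted into average per-second rates. The following is only a rough sketch: the dictionary and function names are made up for illustration, and the inputs are simply the daily totals cited in the article.

```python
# Daily totals as quoted in this article; they are not independently verified here.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

daily_counts = {
    "text messages": 22_000_000_000,
    "Google searches": 5_200_000_000,
    "tweets": 656_000_000,
    "Instagram posts": 67_305_600,
    "new social media users": 1_209_600,
}

# 2.5 quintillion bytes of data created per day, as quoted above.
DAILY_BYTES = 2_500_000_000_000_000_000


def per_second(daily_total):
    """Return the average per-second rate implied by a daily total."""
    return daily_total / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, total in daily_counts.items():
        print(f"{name}: about {per_second(total):,.0f} per second")
    terabytes = per_second(DAILY_BYTES) / 1e12  # decimal terabytes
    print(f"data created: about {terabytes:.1f} TB per second")
```

The Instagram and social media user totals divide evenly into 779 posts and 14 new users per second, which suggests that the source derived those daily totals from per-second rates.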


Characteristics of Big Data
Big Data has several characteristics, as listed below.
Volume: This refers to the quantity of data generated and
stored. The size of the data helps determine its value and the
insights it can yield, and therefore whether a specific set of
data can be considered Big Data at all.
Variety: This property deals with the different types and
nature of the data. Knowing what kinds of data a set contains
helps the people who analyse large data sets to make effective
use of the resulting insights. If a specific set of data contains
different varieties of data, we can consider it Big Data.
Velocity: The speed of data generation also plays a big
role when we classify something as Big Data. The speed at
which data is generated and then processed into results
that can be analysed for further use is one of the major
properties of Big Data.
Variability: When we talk about Big Data, there is always
some inconsistency associated with it. We consider a data
set inconsistent if it does not have a specific pattern or
structure. This can hamper the processes required to
handle and manage the data.
Veracity: The quality of the captured data can also vary
a lot, which affects how accurately large data sets can be
analysed. If the quality of the captured data is not good
enough for analysis, the data needs to be processed first.
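
Variability and veracity can be made concrete with a small check on a batch of records. The following is a minimal sketch, not a standard method: the sample records, field names and the two helper functions are all hypothetical. It counts how many distinct record structures appear (variability) and drops records that fail a basic quality rule before analysis (veracity).

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"user": "a", "likes": 10},
    {"user": "b", "likes": None},            # missing value: a veracity problem
    {"user": "c", "likes": 7, "tags": []},   # extra field: a variability problem
    {"user": "a", "likes": 10},              # repeats the first record
]


def structure_counts(rows):
    """Count how many rows share each set of field names."""
    return Counter(tuple(sorted(row)) for row in rows)


def clean(rows, required):
    """Drop rows with a missing required field, and rows that repeat
    the required-field values of a row already kept."""
    seen = set()
    kept = []
    for row in rows:
        if any(row.get(field) is None for field in required):
            continue
        key = tuple((field, row[field]) for field in required)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


if __name__ == "__main__":
    shapes = structure_counts(records)
    print("distinct structures:", len(shapes))
    print("rows kept after cleaning:", len(clean(records, ["user", "likes"])))
```

More than one distinct structure signals that the batch does not follow a single pattern, and the cleaning step stands in for the processing that low-quality data needs before it is analysed.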


  1. The data is of very high volume.

  2. It is generated, stored and processed very quickly.

  3. The data cannot be categorised into regular
    relational databases.

Big Data has a lot of potential in business applications.
It plays a role in areas such as the manufacture of healthcare
machines, social media, banking transactions and satellite imaging.
Traditionally, data is stored in a structured format so that it
can be easily retrieved and analysed. However, present data
volumes comprise both unstructured and semi-structured
data. Hence, end-to-end processing can be impeded during the
translation between the structured data in a relational database
management system and the unstructured data used for analytics.
Among the problems linked to the staggering volumes of data
being generated are the transfer speed of data, the diversity of
data, and security issues. There have been several advances
in data storage and mining technologies, which enable the
preservation of such increased amounts of data. Also, during
this preservation process, the nature of the original data
generated by organisations is modified.
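
The translation between semi-structured data and a relational table can be illustrated with a small example. This is only a sketch under assumed inputs: the JSON posts, the table layout and the load_posts helper are hypothetical, and SQLite stands in for a relational database management system. Fields shared by every record become columns, while the remaining fields are kept as JSON text because they have no fixed place in the table.

```python
import json
import sqlite3

# Hypothetical semi-structured input: the two posts do not share the same fields.
raw_posts = [
    '{"id": 1, "author": "asha", "body": "hello", "tags": ["intro"]}',
    '{"id": 2, "author": "ravi", "body": "big data", "location": {"city": "Pune"}}',
]


def load_posts(lines):
    """Store the shared fields as columns and keep the rest as JSON text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts "
        "(id INTEGER PRIMARY KEY, author TEXT, body TEXT, extra TEXT)"
    )
    for line in lines:
        doc = json.loads(line)
        fixed = (doc.pop("id"), doc.pop("author"), doc.pop("body"))
        # Whatever is left has no fixed column, so it is kept as a JSON string.
        conn.execute(
            "INSERT INTO posts VALUES (?, ?, ?, ?)",
            (*fixed, json.dumps(doc, sort_keys=True)),
        )
    return conn


conn = load_posts(raw_posts)
rows = conn.execute("SELECT id, author, extra FROM posts ORDER BY id").fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

The extra column shows where the translation loses structure: the relational side can query id and author directly, but anything stored in extra has to be parsed again before it can be analysed.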


Some big sources of Big Data
Let’s have a quick look at some of the main sources of data
along with some statistics (Data source: http://microfocus.com).


  1. Social media: There are around 1,209,600 (1.2 million)
    new data-producing social media users every day.

  2. Twitter: There are approximately 656 million tweets per day!

  3. YouTube: There are more than 4 million hours of content
    uploaded to YouTube every day, with all its users watching
    around 5.97 billion hours of YouTube videos each day.

  4. Instagram: There are approximately 67,305,600 (67.3
    million) Instagram posts uploaded each day.

  5. Facebook: There have been more than 2 billion monthly
    active Facebook users in 2017 so far, compared to 1.44


Figure 1: Challenges of Big Data (Image source: googleimages.com)


Figure 2: Major sources of Big Data (Image source: googleimages.com)

[Figure 2 labels: business systems, transactions, Facebook, Twitter, blogs, social media, sensor data, unstructured data]
