Open Source For You — December 2017


For U & Me Insight


the test. All the users in the group exposed to Variation
B are referred to as the treatment group. This technique
is used to optimise a conversion rate by measuring the
performance of the treatment against that of the control,
using standard statistical calculations.
This testing methodology removes the guesswork from the
website optimisation process, and hence enables data-informed
decisions that shift business conversations from what
'we think' to what 'we know'. By measuring the impact that
each change has on our metrics, we can make sure it
produces positive results.
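The comparison of treatment against control described above is usually a two-proportion significance test. The sketch below (a minimal illustration, with made-up conversion counts) checks whether Variation B's conversion rate differs significantly from the control's:

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: does Variation B's conversion rate
    differ significantly from Variation A's (the control)?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical data: control converts 200 of 5000 users, treatment 260 of 5000
z, p = ab_test_z(200, 5000, 260, 5000)
print(round(z, 2), p < 0.05)  # a small p-value means the lift is unlikely to be chance
```

If the p-value falls below the chosen significance level (commonly 0.05), the observed lift is treated as real rather than noise.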


  1. Natural language processing: This area of computational
    linguistics deals with the interactions between
    computers and human languages. In particular, it is
    concerned with programming computers to process
    large natural language corpora. The main challenges
    in natural language processing are natural language
    generation, natural language understanding, connecting
    machine and language perception, or some combination
    of these. Natural language processing research has mostly
    relied on machine learning. Initially, many
    language-processing tasks involved the direct hand-coding
    of rules. Nowadays, the machine learning
    paradigm instead uses statistical inference to
    automatically learn such rules by analysing
    large sets of real-life examples. Many different
    classes of machine learning algorithms have been applied to
    NLP tasks. These algorithms take large sets of 'features'
    as inputs; the features are developed from the input data
    set. Recent research has focused more on statistical models,
    which take probabilistic decisions based on attaching
    real-valued weights to each input feature. Such models
    have the edge because they can express the
    relative certainty of many different possible
    answers rather than committing to only one, producing more
    reliable results when such a model is included
    as one component of a larger system.


How can Big Data benefit your business?
Big Data may seem out of reach for non-profit and
government agencies that do not have the funds to buy
into this new trend. We all assume that 'big'
usually means expensive, but Big Data is not really about
using more resources; rather, it is about the effective use
of the resources at hand. Hence, organisations with limited
financial resources can also stay competitive and grow. For

How is Big Data analysed?
Big Data cannot be analysed manually; doing so would be
a highly challenging and tedious task. To make this
task easier, several techniques help us analyse
large data sets. Let us look at some of the
popular techniques used for data analysis.


  1. Association rule learning: This is a rule-based Big
    Data analysis technique used to discover
    interesting relations between variables
    in large databases. It is intended to identify strong
    rules discovered in databases using different
    measures of what is considered 'interesting'.
    A variety of algorithms is used to
    generate and then test candidate rules. One of the
    most common applications is market basket analysis,
    which helps a retailer determine which products are
    frequently bought together and use that information for
    more focused marketing (like the famous discovery that many
    supermarket shoppers who buy diapers also buy beer).
    Association rules are widely used today in
    continuous production, Web usage mining, bioinformatics
    and intrusion detection. These rules do not take into
    consideration the order of items, either within the
    same transaction or across different transactions.
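The support and confidence measures behind association rules can be sketched in a few lines. The brute-force miner below (a minimal illustration with invented baskets, not a production Apriori implementation) finds single-antecedent rules A -> B whose support and confidence clear given thresholds:

```python
from itertools import combinations

def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force rules A -> B, scored by the two classic
    'interestingness' measures: support and confidence."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        # fraction of transactions containing the whole itemset
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        for ant, con in ((a, b), (b, a)):
            s = support({ant, con})
            if s >= min_support:
                conf = s / support({ant})  # P(con | ant)
                if conf >= min_confidence:
                    rules.append((ant, con, round(s, 2), round(conf, 2)))
    return rules

# Toy market baskets (the classic diapers-and-beer pattern)
baskets = [{"diapers", "beer", "milk"}, {"diapers", "beer"},
           {"diapers", "bread"}, {"milk", "bread"},
           {"diapers", "beer", "bread"}]
for rule in association_rules(baskets):
    print(rule)  # (antecedent, consequent, support, confidence)
```

Note that the sets used for transactions deliberately discard item order, mirroring the point above that association rules ignore ordering within and across transactions.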

  2. A/B testing: This is a technique that compares two
    versions of an application to determine which
    one performs better. It is also called split testing or
    bucket testing. It refers to a specific type of
    randomised experiment in which a set of users
    is presented with two variations of the same product
    (advertisements, emails, Web pages, etc); let us call
    them Variation A and Variation B. All the users exposed
    to Variation A are referred to as the control group,
    since its performance is the baseline against
    which any improvement observed from
    presenting Variation B is measured. At times,
    Variation A is the original version of the
    product, tested against what existed before


Figure 3: Different types of Big Data (Image source: googleimages.com)


Figure 4: Different processes involved in a Big Data system
(Image source: googleimages.com)