102 | DECEMBER 2017 | OPEN SOURCE FOR YOU | http://www.OpenSourceForU.com
For U & Me Insight
the test. All the users in the group exposed to Variation
B are referred to as the treatment group. This technique
is used to optimise a conversion rate by measuring the
performance of the treatment against that of the control
and checking statistically whether the difference is significant.
This testing methodology removes guesswork from
the website optimisation process, and hence enables
data-informed decisions which shift business conversations
from what ‘we think’ to what ‘we know’. By measuring
the impact that changes have on our metrics, we can check
whether each change actually produces positive results.
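The comparison of treatment against control described above is commonly done with a two-proportion z-test. The following is a minimal sketch in Python, assuming made-up visitor and conversion counts; the function name and the numbers are illustrative and do not come from the article.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare control (A) and treatment (B) conversion rates.

    Returns (lift, z, p_value), where lift is rate_B - rate_A and
    p_value is the two-sided p-value under a normal approximation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return rate_b - rate_a, z, p_value


# Hypothetical data: 5,000 users per variation.
lift, z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value (below 0.05, by a common convention) suggests that the difference between the two variations is unlikely to be due to chance alone.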
- Natural language processing: This area of computational
linguistics deals with the interactions between computers
and human languages. In particular, it is concerned with
programming computers to process large natural language
corpora. The challenges in natural language processing
include natural language generation, natural language
understanding, connecting language and machine
perception, or some combination thereof. Natural language
processing research has come to rely mostly on machine
learning. Initially, many language-processing tasks involved
direct hand coding of rules. Nowadays, machine learning
approaches instead use statistical inference to
automatically learn such rules by analysing large sets of
data from real-life examples. Many different classes of
machine learning algorithms have been used for NLP tasks.
These algorithms take large sets of ‘features’, derived from
the input data set, as inputs. Recent research has focused
more on statistical models, which make probabilistic
decisions based on attaching real-valued weights to each
input feature. Such models have the edge because they can
express the relative certainty of several possible answers
rather than only one, which produces more reliable results
when such a model is included as one component of a
larger system.
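The idea of attaching real-valued weights to input features and reporting a relative certainty for each possible answer can be sketched as follows. This is only an illustration: the labels, the words and the weights are hypothetical and set by hand here, whereas a real system would learn the weights from a large corpus.

```python
from math import exp

# Hypothetical weights: one real value per (label, feature) pair.
# In practice these would be learned from training data.
WEIGHTS = {
    "positive": {"good": 1.5, "great": 2.0, "bad": -1.0},
    "negative": {"good": -1.0, "great": -1.5, "bad": 2.0},
}


def extract_features(text):
    """Turn the input text into bag-of-words counts."""
    features = {}
    for token in text.lower().split():
        features[token] = features.get(token, 0) + 1
    return features


def predict_proba(text):
    """Return a probability for every label rather than a single answer."""
    features = extract_features(text)
    scores = {
        label: sum(w.get(name, 0.0) * count for name, count in features.items())
        for label, w in WEIGHTS.items()
    }
    # Softmax turns the weighted scores into relative certainties.
    top = max(scores.values())
    exps = {label: exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


probs = predict_proba("good film but bad ending")
print(probs)
```

Because the output is a probability per label, a larger system that uses this model as one component can take the uncertainty into account instead of trusting a single hard decision.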
How can Big Data benefit your business?
Big Data may seem to be out of reach for non-profit
and government agencies that do not have the funds to buy
into this new trend. We all have the impression that ‘big’
usually means expensive, but Big Data is not really about
using more resources; rather, it’s about the effective usage
of the resources at hand. Hence, organisations with limited
financial resources can also stay competitive and grow. For
How is Big Data analysed?
Big Data cannot be analysed manually, as that would be
a highly challenging and tedious task. To make this
task easier, there are several techniques that help us to analyse
large sets of data. Let us look at some of the
well-known techniques being used for data analysis.
- Association rule learning: This is a rule-based Big
Data analysis technique which is used to discover
interesting relations between variables in large
databases. It is intended to identify strong rules in
the databases using different measures of what is
considered ‘interesting’; the relationships it finds are
called ‘association rules’.
All such techniques use a variety of algorithms to
generate and then test possible rules. One of the
most common applications is market basket analysis.
This helps a retailer to determine which products are
frequently bought together and use that information for
more focused marketing (like the often-cited example that
supermarket shoppers who buy diapers also tend to buy beer).
Association rules are also widely used nowadays in
continuous production, Web usage mining, bioinformatics
and intrusion detection. These rules do not take into
consideration the order of items either within the
same transaction or across different transactions.
- A/B testing: This is a technique that compares the two
different versions of an application to determine which
one performs better. It is also called split testing or
bucket testing. It refers to a specific type of
randomised experiment in which a set of users
is presented with two variations of the same product
(advertisements, emails, Web pages, etc) – let’s call
them Variation A and Variation B. The users exposed
to Variation A are often referred to as the control group,
since its performance is taken as the baseline against
which any improvement in performance observed from
presenting Variation B is measured. Often,
Variation A is simply the original version of the
product being tested, that is, what existed before
Figure 3: Different types of Big Data (Image source: googleimages.com)
Figure 4: Different processes involved in a Big Data system
(Image source: googleimages.com)
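The market basket analysis described under association rule learning can be sketched in a few lines of Python. This is a simplified illustration that only tests rules between pairs of items, using ‘support’ and ‘confidence’ as the measures of what is interesting. The transactions, thresholds and function names are made up for this example.

```python
from itertools import permutations

# Made-up shopping baskets; each transaction is an unordered set of items.
TRANSACTIONS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def find_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Generate candidate rules lhs -> rhs and keep the strong ones."""
    items = sorted(set().union(*transactions))
    rules = []
    for lhs, rhs in permutations(items, 2):
        pair_support = support({lhs, rhs}, transactions)
        if pair_support < min_support:
            continue
        # Confidence: how often rhs appears in baskets that contain lhs.
        confidence = pair_support / support({lhs}, transactions)
        if confidence >= min_confidence:
            rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = find_rules(TRANSACTIONS)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Since each basket is a set, the sketch ignores the order of items, which matches the description of association rules above. Production systems typically rely on algorithms such as Apriori or FP-Growth to handle larger itemsets efficiently.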