Open Source For You — December 2017

(Steven Felgate) #1

Admin How To


44 | DECEMBER 2017 | OPEN SOURCE FOR YOU | http://www.OpenSourceForU.com

The importance of Hive in Hadoop
Apache Hive is a data warehouse infrastructure built on top
of the Hadoop framework. It is well placed to query,
analyse and summarise large volumes of data. An integral
part of Hive is HiveQL, an SQL-like query language that is
used extensively to query the stored data.
Thanks to its SQL-like features, Hive makes it easier to read,
write and manage large data sets that are distributed across
multiple locations. It imposes a structure on data that is
already stored. Users can connect to Hive using a command
line tool or a JDBC driver.
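
The two connection routes mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming a HiveServer2 instance and the Beeline command line client; the host, the port 10000 and the database name are assumed defaults rather than values from this article, and the helper names are hypothetical.

```python
def hive_jdbc_url(host="localhost", port=10000, database="default"):
    """Return a HiveServer2-style JDBC URL: jdbc:hive2://host:port/database."""
    # host, port and database are assumptions; adjust them to your own cluster.
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url, user=None):
    """Return the argument list for connecting with the Beeline command line tool."""
    cmd = ["beeline", "-u", url]
    if user:
        # -n passes the user name to connect as.
        cmd += ["-n", user]
    return cmd


url = hive_jdbc_url()
print(url)
print(" ".join(beeline_command(url)))
```

The same URL string is what a JDBC client would pass to its driver, so building it in one place keeps the command line and programmatic routes consistent.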

How to implement Hive
First, open the Hive download directory at
http://apache.claz.org/hive/stable/. Next, download
apache-hive-1.2.1-bin.tar.gz (listed there as 26-Jun-2015
13:34, 89M). Extract the archive manually and rename the
extracted folder to hive.
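
For readers who prefer to script the extract-and-rename step, here is a minimal sketch. It assumes the archive has already been downloaded and that it contains a single top-level folder; the function name and the paths in the usage comment are hypothetical.

```python
import os
import tarfile


def extract_and_rename(archive_path, dest_dir, new_name="hive"):
    """Extract a .tar.gz archive into dest_dir and rename its top-level folder."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the first path component of every member, ignoring "." entries.
        tops = set()
        for name in tar.getnames():
            parts = [p for p in name.split("/") if p not in ("", ".")]
            if parts:
                tops.add(parts[0])
        if len(tops) != 1:
            raise ValueError("expected exactly one top-level folder in the archive")
        # Only extract archives you trust, such as an official release download.
        tar.extractall(dest_dir)
    target = os.path.join(dest_dir, new_name)
    os.rename(os.path.join(dest_dir, tops.pop()), target)
    return target


# Example usage (paths are assumptions):
# extract_and_rename("apache-hive-1.2.1-bin.tar.gz", ".")
```

The function fails if a folder called hive already exists and is not empty, which avoids silently overwriting an earlier installation.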

Figure 1: Hive architecture (user interfaces: Web UI, Hive command line and HD Insight; meta store; HiveQL process engine; execution engine; MapReduce; HDFS or HBase data storage)

Figure 2: Hive configuration

Table 1
Unit name: Operation

User interface: Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HD Insight (in Windows Server).

Meta store: Hive chooses a respective database server to store the schema or metadata of tables and databases, the columns in each table along with their data types, and the HDFS mapping.

HiveQL process engine: HiveQL is similar to SQL and queries against the schema information in the meta store. It replaces the traditional approach of writing a MapReduce program. Instead of writing the MapReduce program in Java, we can write a query for a MapReduce job and process it.

Execution engine: The Hive execution engine links the HiveQL process engine and MapReduce. It processes the query and generates results that are the same as MapReduce results.

HDFS or HBase: The Hadoop Distributed File System (HDFS) or HBase is the data storage layer in which the data is stored.
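
To make the HiveQL process engine row more concrete, the sketch below contrasts a one-line query with the map, shuffle and reduce steps it stands in for. It is an in-memory illustration only and does not show how Hive actually plans or runs a job; the table name employees, the column dept and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical HiveQL equivalent of the steps below (names are made up):
HIVEQL_QUERY = "SELECT dept, COUNT(*) FROM employees GROUP BY dept"


def map_phase(rows):
    """Map: emit a (key, 1) pair for every input row."""
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


rows = [
    {"name": "a", "dept": "sales"},
    {"name": "b", "dept": "ops"},
    {"name": "c", "dept": "sales"},
]
counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts)  # {'sales': 2, 'ops': 1}
```

The three functions are what a hand-written MapReduce program would have to spell out, whereas the query string expresses the same grouping and counting declaratively.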

Figure 3: Getting started with Hive