Open Source For You — December 2017

Admin How To

44 | DECEMBER 2017 | OPEN SOURCE FOR YOU | http://www.OpenSourceForU.com

The importance of Hive in Hadoop Apache Hive lets you work with Hadoop in a very efficient manner. It is a complete data warehouse infrastructure that is built on top of the Hadoop framework. Hive is uniquely placed to query data, and perform powerful analysis and data summarisation while working with large volumes of data. An integral part of Hive is the HiveQL query, which is an SQL-like interface that is used extensively to query what is stored in databases. Hive has the distinct advantage of deploying high-speed data reads and writes within the data warehouses while managing large data sets that are distributed across multiple locations, all thanks to its SQL-like features. It provides a structure to the data that is already stored in the database. The users are able to connect with Hive using a command line tool and a JDBC driver.

How to implement Hive First, download Hive from http://apache.claz.org/hive/ stable/. Next, download apache-hive-1.2.1-bin.tar. gz 26-Jun-2015 13:34 89M. Extract it manually and rename the folder as hive.

Figure 1: Hive architecture

HDFS or HBASE Data Storage

Meta Store Map Reduce

USER INTERFACES WEB UI HIVE COMMAND LINE HD Insight

Hive QL Process Engine Execution Engine

Figure 2: Hive configuration

Table 1 Unit name Operation

User interface Hive is data warehouse infrastructure software that can create interactions between the user and HDFS. The user interfaces that Hive supports are Hive Web UI, the Hive command line, and Hive HD Insight (in Windows Server). Meta store Hive chooses respective database servers to store the schema or metadata of tables, databases and columns in a table, along with their data types, and HDFS mapping.

HiveQL process engine

HiveQL is similar to SQL for querying on schema information on the meta store. It replaces the traditional ap- proach of the MapReduce program. Instead of writing the MapReduce program in Java, we can write a query for a MapReduce job and process it.

Execution engine

The conjunction part of the HiveQL Process engine and MapReduce is the Hive Execution engine, which processes the query and generates results that are the same as MapRe- duce results. HDFS or HBASE

Hadoop distributed file system or HBASE comprises the data storage techniques for storing data into the file system.

Figure 3: Getting started with Hive

Open Source For You — December 2017

Get our desktop app

Company

Features

Documentation

Resources