Friday, December 5, 2014

Hive: How to install it on top of Hadoop in Ubuntu

What is Apache Hive?

Apache Hive is a data warehouse infrastructure for querying and managing large data sets that reside in a distributed storage system. It is built on top of Hadoop and was originally developed at Facebook. Hive lets you query the data with a SQL-like language called HiveQL (Hive Query Language).

Internally, a compiler translates HiveQL statements into MapReduce jobs, which are then submitted to the Hadoop framework for execution.

Difference between Hive and a traditional SQL database?

Hive looks very similar to a traditional database with SQL access. However, because Hive is based on Hadoop and MapReduce operations, there are several key differences:

Because Hadoop is intended for long sequential scans and Hive is based on Hadoop, you should expect queries to have very high latency. Hive is therefore not appropriate for applications that need the fast response times you can expect from a traditional RDBMS.

Hive is also read-based and therefore not appropriate for transaction processing, which typically involves a high percentage of write operations.

Hive Installation on Ubuntu:

Follow the steps below to install Apache Hive on Ubuntu:

Step 1: Download the Hive tarball.

Download the latest Hive release from the Apache Hive downloads page.

Step 2: Untar the file.
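
As a minimal sketch of this step, the shell function below unpacks a tarball into a target directory. The function name extract_hive and the archive name in the usage comment are illustrative assumptions; substitute the file you actually downloaded.

```shell
# Unpack a gzip-compressed tarball into a destination directory.
# extract_hive is a hypothetical helper name, not part of Hive or Hadoop.
extract_hive() {
    archive="$1"   # path to the downloaded .tar.gz file
    dest="$2"      # directory to unpack into
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example call (the archive name is a placeholder):
#   extract_hive "$HOME/Downloads/hive-0.14.0-bin.tar.gz" "$HOME"
```

Unpacking into your home directory keeps the result in line with the HIVE_HOME path set in Step 3, provided the top-level folder inside the archive has the same name.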

Step 3: Edit the “.bashrc” file in your home directory to set the environment variables for your user. You own this file, so sudo is not needed.

   $gedit ~/.bashrc

Add the following at the end of the file:

export HADOOP_HOME=/home/user/hadoop-2.4.0
export HIVE_HOME=/home/user/hive-0.14.0-bin
export PATH=$PATH:$HIVE_HOME/bin
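
After saving the file, reload it with `source ~/.bashrc` (or open a new terminal) so the variables take effect. The function below is a small sketch for checking the result; hive_env_ok is a hypothetical helper name, and it only inspects the variables, not whether the directories actually exist.

```shell
# Report whether the variables from .bashrc are visible in this shell.
# hive_env_ok is an illustrative name, not a Hive or Hadoop command.
hive_env_ok() {
    [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set"; return 1; }
    [ -n "${HIVE_HOME:-}" ] || { echo "HIVE_HOME is not set"; return 1; }
    # Wrap PATH in colons so the match works at the start, middle, or end.
    case ":$PATH:" in
        *":$HIVE_HOME/bin:"*) echo "Hive environment looks OK" ;;
        *) echo "$HIVE_HOME/bin is not on PATH"; return 1 ;;
    esac
}
```

Run `hive_env_ok` in the same terminal where you plan to start Hive; a "not set" message usually means .bashrc has not been reloaded yet.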

Step 4:  Create Hive directories within HDFS.

NOTE: Run the commands from the bin folder of your Hadoop installation.

$hadoop fs -mkdir -p /user/hive/warehouse

The directory ‘warehouse’ is the location where Hive stores its tables and their data; it is the same path that gets its permissions set in Step 5. The -p option also creates the parent directories if they do not exist yet.

$hadoop fs -mkdir /tmp

The directory ‘tmp’ is the temporary location for the intermediate results of processing.
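
If you prefer to create both directories in one go, a sketch like the following may help. It assumes the warehouse path /user/hive/warehouse that Step 5 sets permissions on, and uses `-mkdir -p` so that a directory that already exists is not treated as an error. The function name create_hive_dirs and the HADOOP_CMD variable are illustrative assumptions.

```shell
# Command used to talk to HDFS; override HADOOP_CMD if hadoop is not on PATH.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Create the HDFS directories Hive needs (hypothetical helper).
create_hive_dirs() {
    for dir in /user/hive/warehouse /tmp; do
        # -p creates missing parents and does not fail if the directory exists
        "$HADOOP_CMD" fs -mkdir -p "$dir" || return 1
    done
}
```

Call `create_hive_dirs` once HDFS is up and running; it stops at the first directory that cannot be created.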

Step 5: Set read/write permissions for the Hive directories.

These commands give the owner and the group read, write, and execute permission, and everyone else read-only permission:

$hadoop fs -chmod 774 /user/hive/warehouse

$hadoop fs -chmod 774 /tmp
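
To see what mode 774 means, the sketch below applies it to a throwaway local directory rather than to HDFS; HDFS uses a similar POSIX-style owner/group/other scheme. The variable names are illustrative only.

```shell
# Local demonstration only: nothing here touches HDFS.
demo_dir=$(mktemp -d)          # temporary directory to experiment on
chmod 774 "$demo_dir"          # 7 = rwx (owner), 7 = rwx (group), 4 = r-- (others)
mode=$(ls -ld "$demo_dir" | cut -c1-10)
echo "$mode"                   # prints drwxrwxr--
rmdir "$demo_dir"
```

On HDFS itself, `hadoop fs -ls /user/hive` should show the same kind of permission string for the warehouse directory.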

Step 6: Set the Hadoop path in Hive.

cd $HIVE_HOME/bin    # the bin folder of the Hive installation from Step 3
gedit hive-config.sh

In the configuration file, add the following line:

export HADOOP_HOME=/home/user/hadoop-2.4.0

Step 7: Launch Hive.

NOTE: Run the command from the bin folder of Hive.

Command: $hive

Step 8: Test your setup

hive> show tables;

This command is typed at the hive> prompt, not in the regular shell. Don't forget to put a semicolon after it :P
Type quit; or press Ctrl+C to exit Hive.
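
The same check can also be run without entering the interactive prompt, since the Hive command line accepts a query through its -e option. The wrapper below is only a sketch: hive_smoke_test and HIVE_CMD are hypothetical names, and the messages are my own wording rather than Hive output.

```shell
# Command used to start Hive; override HIVE_CMD if hive is not on PATH.
HIVE_CMD="${HIVE_CMD:-hive}"

# Run "show tables;" non-interactively and report whether Hive answered.
hive_smoke_test() {
    if "$HIVE_CMD" -e 'show tables;'; then
        echo "Hive responded"
    else
        echo "Hive did not respond; check HADOOP_HOME and HIVE_HOME"
        return 1
    fi
}
```

On a fresh installation the table list is empty, so a run that finishes without errors is the sign that the setup works.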

Happy Hadooping! ;) 
