Apache Hive is a data warehouse infrastructure for querying and managing large data sets that reside in distributed storage. It is built on top of Hadoop and was originally developed by Facebook. Hive lets you query the data using a SQL-like language called HiveQL (Hive Query Language).
Internally, a compiler translates HiveQL statements into MapReduce jobs, which are then submitted to the Hadoop framework for execution.
Differences between Hive and a traditional SQL database:
Hive looks very much like a traditional database with SQL access. However, because Hive is built on Hadoop and MapReduce, there are several key differences:
Because Hadoop is designed for long sequential scans and Hive runs on top of it, queries have very high latency. Hive is therefore not appropriate for applications that need the fast response times you can expect from a traditional RDBMS.
Finally, Hive is read-oriented and therefore not appropriate for transaction processing, which typically involves a high percentage of write operations.
Hive Installation on Ubuntu:
Follow the steps below to install Apache Hive on Ubuntu:
Step 1: Download the Hive tarball.
Download the latest Hive version from here
Step 2: Untar the file.
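No command is given for this step, so here is a minimal sketch, assuming the download is a gzip-compressed tarball. The helper name `extract_hive` and the archive name in the usage line are placeholders, not values from this post; substitute the file you actually downloaded.

```shell
# extract_hive ARCHIVE DEST
# Unpack a gzip-compressed tarball into DEST, creating DEST if needed.
# Both arguments are placeholders; pass the file you downloaded.
extract_hive() {
    archive=$1
    dest=$2
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example call (hypothetical archive name):
#   extract_hive "$HOME/Downloads/hive-0.14.0-bin.tar.gz" /home/user
```

Whatever directory the archive unpacks to is the one HIVE_HOME should point at in the next step.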
Step 3: Edit the “.bashrc” file in your home directory to update the environment variables for your user. The file belongs to you, so sudo is not needed.
$gedit ~/.bashrc
Add the following at the end of the file:
export HADOOP_HOME=/home/user/hadoop-2.4.0
export HIVE_HOME=/home/user/hive-0.14.0-bin
export PATH=$PATH:$HIVE_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin
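After saving the file, open a new terminal (or run `source ~/.bashrc`) so the variables take effect. A quick sanity check could look like the sketch below; `path_contains` and `check_hive_env` are hypothetical helper names made up for this example, not part of Hive or Hadoop.

```shell
# path_contains DIR: succeed if DIR is one of the colon-separated PATH entries.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# check_hive_env: report which of the .bashrc settings are missing.
# Returns 0 when both variables are set and their bin folders are on PATH.
check_hive_env() {
    status=0
    for var in HADOOP_HOME HIVE_HOME; do
        eval "value=\${$var:-}"        # read the variable named by $var
        if [ -z "$value" ]; then
            echo "$var is not set"
            status=1
        elif ! path_contains "$value/bin"; then
            echo "$value/bin is not on PATH"
            status=1
        fi
    done
    return $status
}
```

Running `check_hive_env` prints nothing when everything is in place and one line per problem otherwise.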
Step 4: Create Hive directories within HDFS.
NOTE: Run these commands from the bin folder of your Hadoop installation.
$hadoop fs -mkdir -p /user/hive/warehouse
The ‘warehouse’ directory is where Hive stores its tables and their data. The -p flag also creates the parent directories if they do not exist yet.
$hadoop fs -mkdir /tmp
The temporary directory ‘tmp’ is where the intermediate results of processing are stored.
Step 5: Set read/write permissions for the Hive directories.
These commands give write permission to the group:
$hadoop fs -chmod 774 /user/hive/warehouse
$hadoop fs -chmod 774 /tmp
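The mode 774 is three octal digits: one for the owner, one for the group, and one for everyone else. The helper below is purely illustrative (`mode_to_rwx` is a made-up name, not a Hadoop command); it spells out what each digit grants, which shows that the group gets read, write, and execute while others get read only.

```shell
# mode_to_rwx MODE: print the rwx string for an octal permission mode.
mode_to_rwx() {
    mode=$1
    out=""
    while [ -n "$mode" ]; do
        rest=${mode#?}            # everything after the first digit
        digit=${mode%"$rest"}     # the first digit itself
        mode=$rest
        case $digit in
            0) out="${out}---" ;;
            1) out="${out}--x" ;;
            2) out="${out}-w-" ;;
            3) out="${out}-wx" ;;
            4) out="${out}r--" ;;
            5) out="${out}r-x" ;;
            6) out="${out}rw-" ;;
            7) out="${out}rwx" ;;
            *) echo "not an octal digit: $digit" >&2; return 1 ;;
        esac
    done
    echo "$out"
}

mode_to_rwx 774   # prints rwxrwxr--
```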
Step 6: Set the Hadoop path in hive-config.sh.
cd hadoop   # the directory where my Hadoop and Hive folders are stored
cd hive*-bin
cd bin
sudo gedit hive-config.sh
In the configuration file, add the following:
export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JAR_PATH=$HIVE_AUX_JAR_PATH
export HADOOP_HOME=/home/user/hadoop-2.4.0
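If you would rather script this edit than open an editor, something like the following could work. `ensure_line` is a hypothetical helper, and the usage line assumes the HIVE_HOME and HADOOP_HOME values from Step 3; depending on who owns the Hive folder you may need to run it with sufficient permissions.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
# Safe to run more than once; the line is never duplicated.
ensure_line() {
    file=$1
    line=$2
    touch "$file" || return 1
    # -x matches whole lines only, -F treats LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example call (paths taken from Step 3):
#   ensure_line "$HIVE_HOME/bin/hive-config.sh" 'export HADOOP_HOME=/home/user/hadoop-2.4.0'
```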
Step 7: Launch Hive.
NOTE: Run this command from the bin folder of Hive.
$hive
Step 8: Test your setup.
hive> show tables;
Don't forget the semicolon at the end of this command :P
Press Ctrl+C to exit Hive
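The same check can also be run without entering the interactive prompt, using the Hive CLI's `-e` option to execute a single statement. The wrapper below is only a sketch; `smoke_test_hive` is a hypothetical name, and it assumes `hive` is reachable through the PATH set up in Step 3.

```shell
# smoke_test_hive: run "show tables;" non-interactively.
# Returns 127 with a message if the hive command cannot be found,
# otherwise returns whatever exit status hive itself reports.
smoke_test_hive() {
    if ! command -v hive >/dev/null 2>&1; then
        echo "hive not found on PATH; check HIVE_HOME and PATH in .bashrc" >&2
        return 127
    fi
    hive -e 'show tables;'
}
```

On a fresh installation the statement should finish without errors and list no tables.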
Happy Hadooping! ;)