The following are the steps for installing Hadoop. I have listed the steps with only brief explanations in a few places, so treat this as a set of reference notes for installation. I wrote them down when I was installing Hadoop on my system for the very first time.
Please let me know if you need any specific details.
Installing HDFS (Hadoop Distributed File System)
OS : Ubuntu
Installing Sun Java on Ubuntu
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
$ sudo update-java-alternatives -s java-7-oracle
Create hadoop user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
Install the SSH server if it is not already present. Hadoop needs it because its start scripts SSH into localhost to launch the daemons.
$ sudo apt-get install openssh-server
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
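Before moving on, it can help to confirm that the key really landed in authorized_keys. The snippet below is only a small sketch of such a check; key_is_authorized is a helper name I made up for illustration, not something that ships with OpenSSH or Hadoop.

```shell
# Hypothetical helper: succeeds (exit status 0) if the public key stored in
# file $1 appears as a complete line in the authorized_keys file $2.
key_is_authorized() {
    pub_file=$1
    auth_file=$2
    # Both files must be readable, otherwise report failure.
    [ -r "$pub_file" ] && [ -r "$auth_file" ] || return 1
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -qxF -- "$(cat "$pub_file")" "$auth_file"
}

# Typical use as hduser, with the default paths created by ssh-keygen:
#   key_is_authorized "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys" \
#       && echo "public key is installed"
#   ssh localhost    # the first connection asks you to accept the host key
```

If the check passes but ssh localhost still asks for a password, the permissions on the .ssh directory and on authorized_keys are the usual place to look.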
Installing Hadoop
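Download Hadoop from Apache Downloads.
The download link for Hadoop 2.6.0 (the latest release at the time of writing) can be found here
Download hadoop-2.6.0.tar.gz from the link, extract it in /home/hduser, and rename the extracted directory:
$ tar -xzf hadoop-2.6.0.tar.gz
$ mv hadoop-2.6.0 hadoop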
Edit .bashrc
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
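After reloading .bashrc (for example with source ~/.bashrc), a quick sanity check of the three settings can save some confusion later. This is just a sketch under the assumption that a usable JDK has bin/java and a Hadoop install has a bin/ directory; check_hadoop_env is a hypothetical helper name of my own.

```shell
# Hypothetical helper: verifies the variables exported in .bashrc.
# Prints one message per problem to stderr and returns non-zero if any is found.
check_hadoop_env() {
    env_rc=0
    # JAVA_HOME should contain an executable bin/java
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME does not point at a JDK: '${JAVA_HOME:-}'" >&2
        env_rc=1
    fi
    # HADOOP_HOME should contain the bin/ directory with the hadoop launcher
    if [ ! -d "${HADOOP_HOME:-}/bin" ]; then
        echo "HADOOP_HOME has no bin/ directory: '${HADOOP_HOME:-}'" >&2
        env_rc=1
    fi
    # PATH should include $HADOOP_HOME/bin as one of its entries
    case ":$PATH:" in
        *":${HADOOP_HOME:-}/bin:"*) ;;
        *) echo "PATH does not contain \$HADOOP_HOME/bin" >&2; env_rc=1 ;;
    esac
    return "$env_rc"
}

# Usage:
#   check_hadoop_env && echo "environment looks fine"
```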
Update hadoop-env.sh
We only need to update the JAVA_HOME variable in this file. In Hadoop 2.x the configuration files live under etc/hadoop rather than conf, so open the file in a text editor with the following command:
$ gedit /home/hduser/hadoop/etc/hadoop/hadoop-env.sh
Set the following
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Temp directory for hadoop
$ mkdir /home/hduser/tmp
Configurations for hadoop
$ cd /home/hduser/hadoop/etc/hadoop/
Then add the following configurations between the <configuration> and </configuration> XML elements of each file:
core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri’s scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri’s authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
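To make it clear where the two properties sit inside the file, here is a sketch that prints a complete, minimal core-site.xml. The function name print_core_site and its two optional arguments are my own illustration; the default values are the ones used in this post, and the descriptions are left out for brevity.

```shell
# Hypothetical helper: prints a minimal core-site.xml to standard output.
#   $1  value for hadoop.tmp.dir  (default: /home/hduser/tmp)
#   $2  default file system URI   (default: hdfs://localhost:54310)
print_core_site() {
    tmp_dir=${1:-/home/hduser/tmp}
    fs_uri=${2:-hdfs://localhost:54310}
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${tmp_dir}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>${fs_uri}</value>
  </property>
</configuration>
EOF
}

# Usage: review the output first, then redirect it into core-site.xml in
# your Hadoop configuration directory (this overwrites the existing file):
#   print_core_site
#   print_core_site /home/hduser/tmp hdfs://localhost:54310 > core-site.xml
```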
hdfs-site.xml
Open hadoop/etc/hadoop/hdfs-site.xml using a text editor and add the following configurations:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
Formatting NameNode
Format the NameNode before starting HDFS for the first time. Do not run this step while the system is running, and do not repeat it on an installation that already holds data, because formatting erases the existing HDFS metadata. It is normally done once, at installation time.
Run the following command
$ /home/hduser/hadoop/bin/hdfs namenode -format
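Since formatting twice by accident is an easy mistake, a small guard can be useful. The sketch below assumes the NameNode storage directory was left at its default location, dfs/name under hadoop.tmp.dir, where a formatted NameNode keeps a current/VERSION file. The helper name namenode_already_formatted is hypothetical.

```shell
# Hypothetical helper: succeeds if a formatted NameNode directory already
# exists under the given hadoop.tmp.dir (default: /home/hduser/tmp).
# Assumes dfs.namenode.name.dir is left at its default, <hadoop.tmp.dir>/dfs/name.
namenode_already_formatted() {
    name_dir="${1:-/home/hduser/tmp}/dfs/name"
    # A formatted NameNode stores its metadata version in current/VERSION
    [ -f "$name_dir/current/VERSION" ]
}

# Usage:
#   if namenode_already_formatted /home/hduser/tmp; then
#       echo "NameNode storage already exists; formatting would erase it" >&2
#   else
#       echo "safe to run the format command"
#   fi
```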
Starting Hadoop Cluster
From hadoop/sbin
$ ./start-dfs.sh
$ ./start-yarn.sh
To check which processes are running, use:
$ jps
If jps is not on your PATH, register it from the JDK installed above:
$ sudo update-alternatives --install /usr/bin/jps jps /usr/lib/jvm/java-7-oracle/bin/jps 1
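The following daemons should be running (Hadoop 2.x runs YARN, so ResourceManager and NodeManager take the place of the Hadoop 1.x JobTracker and TaskTracker):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
NOTE: This is for a single-node setup. If you configure a multi-node cluster, the daemons will show up on their respective servers.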
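Comparing the jps output with the expected list by eye gets tedious, so here is a sketch that does it for you. It relies on jps printing one "pid name" pair per line; missing_daemons is a hypothetical helper of mine, and the daemon names are passed as arguments so that you can match them to the list above.

```shell
# Hypothetical helper: reads jps output on standard input and prints every
# daemon name given as an argument that does NOT appear in that output.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <name>" per line; compare the second field exactly
        # so that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$daemon" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage (extend the list with the processing daemons of your Hadoop version):
#   jps | missing_daemons NameNode DataNode SecondaryNameNode
```

No output means every daemon you asked about was found.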
Stopping Hadoop Cluster
From hadoop/sbin
$ ./stop-dfs.sh
$ ./stop-yarn.sh
Example application to test the installation:
Follow this post of mine to check whether Hadoop is configured successfully :)
http://dipayandev.blogspot.in/2014/09/benchmark-testing-of-hadoop-cluster.html
For any other questions, feel free to comment in the thread below. :)
Happy Hadooping.
9 comments:
Thanks Dipayan . This is really helpful.
Welcome :) Do comment with your own name.
tip - use syntax highlighter ;) :) (y)
Sure ! Thanks for the suggestion. :)
the directory : /home/hduser/hadoop/conf/
is missing :)
/home/hduser/hadoop/conf/ is not true :)
you can use $HADOOP_HOME/etc/hadoop/