Install Hadoop on AWS Ubuntu Instance
Step 1: Create an Ubuntu 14.04 LTS instance on AWS.
Step 2: Connect to the instance.
chmod 400 yourKey.pem
ssh -i yourKey.pem ubuntu@your_instance_ip
Step 3: Install Java.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java6-installer
sudo update-java-alternatives -s java-6-oracle
sudo apt-get install oracle-java6-set-default
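To check that the JDK is active before moving on (the exact version string printed depends on the build the PPA installs):
java -version
javac -version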
Step 4: Add a Hadoop user.
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Step 5: Create an SSH key for password-free login.
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Step 6: Test the connection.
ssh localhost
exit
Step 7: Download and Install Hadoop.
cd /usr/local
sudo wget http://apache.01link.hk/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
sudo tar -xzvf hadoop-1.2.1.tar.gz
sudo mv hadoop-1.2.1 hadoop
sudo chown -R hduser:hadoop hadoop
sudo rm hadoop-1.2.1.tar.gz
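As a quick sanity check that the archive unpacked correctly, you can print the Hadoop version. Note that bin/hadoop needs JAVA_HOME to be visible; if it complains that JAVA_HOME is not set, finish Steps 8 and 9 first and retry:
/usr/local/hadoop/bin/hadoop version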
Step 8: Update .bashrc.
su - hduser
vim $HOME/.bashrc
# Add the following content to the end of the file:
export HADOOP_PREFIX=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH=$PATH:$HADOOP_PREFIX/bin
Save the file with :wq, then reload .bashrc so the new settings take effect:
source ~/.bashrc
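To confirm that the new PATH and aliases are active (the expected path assumes the install location from Step 7):
which hadoop     # should print /usr/local/hadoop/bin/hadoop
type fs          # should report that fs is aliased to "hadoop fs"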
Step 9: Configure Hadoop while logged in as hduser.
cd /usr/local/hadoop/conf
vim hadoop-env.sh
# Add the following lines to the file:
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export HADOOP_CLASSPATH=/usr/local/hadoop
Save and exit with :wq.
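The JAVA_HOME value above matches the directory the Oracle Java 6 installer normally creates; if you are unsure, list the installed JVMs and adjust the path accordingly:
ls /usr/lib/jvm/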
Step 10: Create a temporary directory for Hadoop.
exit
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
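A quick check that the directory now belongs to hduser with the intended permissions:
ls -ld /app/hadoop/tmp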
Step 11: Add configuration snippets.
su - hduser
cd /usr/local/hadoop/conf
vim core-site.xml
# Put your site-specific properties between the <configuration> ... </configuration> tags
# (a sample single-node configuration is sketched after this step).
# Save and exit with :wq
Repeat the same edit for mapred-site.xml and hdfs-site.xml if your setup needs them.
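For reference, here is a minimal pseudo-distributed (single-node) configuration in the spirit of this guide. The property names are standard Hadoop 1.x settings; the ports 54310 and 54311 are conventional choices rather than requirements, so adapt the values to your own setup. In core-site.xml, between the <configuration> tags:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>Base for Hadoop's temporary directories (created in Step 10).</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>URI of the default file system (the NameNode).</description>
</property>
In mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>Host and port of the MapReduce JobTracker.</description>
</property>
In hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication; 1 is sufficient on a single node.</description>
</property>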
Step 12: Format HDFS (as hduser). Do this only once; formatting erases any data already stored in HDFS.
/usr/local/hadoop/bin/hadoop namenode -format
Step 13: Start Hadoop.
/usr/local/hadoop/bin/start-all.sh
Step 14: Check that all processes are up and running.
jps
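With this single-node setup, jps should list the following daemons alongside Jps itself (each preceded by a process ID that will differ on your instance): NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. If one of them is missing, check the corresponding log file under /usr/local/hadoop/logs.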
Step 15: To stop Hadoop, type the following command:
/usr/local/hadoop/bin/stop-all.sh
Step 16: To start Hadoop again, type the following command:
/usr/local/hadoop/bin/start-all.sh
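As a final smoke test, you can run one of the example jobs that ship with the release (the jar name below matches the 1.2.1 tarball downloaded in Step 7; adjust it if you use another version):
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 2 10
The job submits two map tasks and prints an estimate of Pi when it completes.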
You are now ready to rock! Have fun :)