Changes Done in Ubuntu 12.04 64-bit: Post Installation
1. Access To Root:
eagroup@BI-Lab:~$ sudo su
[sudo] password for eagroup:
root@BI-Lab:/home/eagroup# sudo passwd
Enter new UNIX password: password
Retype new UNIX password: password
passwd: password updated successfully
root@BI-Lab:/home/eagroup#
2. Add and manage users and groups:
a. Add user from GUI - hduser
Username    Password    Privilege
root        password    root
eagroup     password    admin
hduser      hduser      hadoop user
3. Making hduser a sudoer:
a. login as root
$ su
b. nano /etc/sudoers (or use visudo, which syntax-checks the file before saving)
c. add the following line:
hduser ALL=(ALL:ALL) ALL
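d. (Optional) To verify the entry took effect, sudo can list a user's privileges; this quick check is an addition, not part of the original steps:
$ sudo -l -U hduser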
4. Create Share Folder - Install & Configure Samba : (http://rbgeek.wordpress.com/2012/04/25/how-to-install-samba-server-on-ubuntu-12-04/)
a. Install the samba package - $ sudo apt-get install samba samba-common
b. Check the version - $ smbd --version
c. Suggested packages for samba - $ sudo apt-get install python-glade2 system-config-samba
-> Go to your Windows machine and run this command in cmd:
d. net config workstation
Note the workstation domain
-> Back on the Ubuntu system:
e. Back up the smb.conf file, then delete it and create a new one:
sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.bak
sudo rm /etc/samba/smb.conf
sudo touch /etc/samba/smb.conf
sudo nano /etc/samba/smb.conf
f. Add this to your smb.conf file (or adjust it to your requirements):
#======================= Global Settings =====================================
[global]
workgroup = INFICS
server string = Samba Server %v
netbios name = ubuntu
security = user
map to guest = bad user
dns proxy = no
#============================ Share Definitions ==============================
[MyShare]
path = /Share
browsable = yes
writable = yes
guest ok = yes
read only = no
g. Save the smb.conf file and restart the service:
sudo service smbd restart
h. Check the current permissions on the samba share (the configured path is /Share):
ls -ld /Share
i. Change the permissions so that everyone can read and write it (check whether this is allowed in your environment):
sudo chmod -R 0777 /Share
ls -ld /Share
j. Verify newly created files on the Samba server:
cd /Share
ls -l
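k. (Optional) Before testing from Windows, the share can be sanity-checked locally with smbclient; this check is an addition to the original steps and assumes smbclient is installed:
$ sudo apt-get install smbclient
$ smbclient -L localhost -N
$ smbclient //localhost/MyShare -N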
5. Install xrdp to access remotely from Windows:
a. sudo apt-get install xrdp
b. sudo apt-get install gnome-session-fallback
c. sudo /etc/init.d/xrdp restart
d. login as the particular user (the one you use to access from Windows)
e. echo gnome-session --session=gnome-fallback > ~/.xsession
6. Install SSH:
a. sudo apt-get install ssh
b. sudo service ssh restart
7. Install RPM:
a. sudo apt-get install rpm
8. Install Telnet
a. sudo apt-get install telnetd
9. Install ORACLE JAVA:(http://www.liberiangeek.net/2012/11/install-oracle-java-jrejdk-6-in-ubuntu-12-10-quantal-quetzal/)
a. sudo add-apt-repository ppa:webupd8team/java
b. sudo apt-get update && sudo apt-get install oracle-java6-installer
10. Setting JAVA_HOME, JRE_HOME and PATH:
a. Login as root.
b. Create a .bash_profile file
c. Add Following in .bash_profile:
JAVA_HOME=/usr/lib/jvm/java-6-oracle
JRE_HOME=/usr/lib/jvm/java-6-oracle
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH
d. Source .bash_profile
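e. (Optional) A quick verification of the new environment, added here as a suggested check:
$ source ~/.bash_profile
$ echo $JAVA_HOME
$ java -version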
11. Test Java:
a. Create a file named HelloWorld.java and enter the following:
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}
b. Compile HelloWorld.java
$ javac HelloWorld.java
c. Run HelloWorld
$ java HelloWorld
12. Install Eclipse(Indigo):
a. Download and install Eclipse from 'Ubuntu Software Center'
b. Go to terminal and run following:(http://askubuntu.com/questions/138019/unable-to-open-eclipse-ide-due-to-missing-symlink)
ln -s /usr/lib/jni/libswt-* ~/.swt/lib/linux/x86_64/
12. (Revision) Removed Eclipse Indigo and installed Eclipse Juno:
a. Remove Eclipse Indigo from Ubuntu Software Center
b. Download Eclipse Juno from http://www.eclipse.org/downloads/
c. extract the gz file
$ tar xvf eclipse-jee-juno-SR2-linux-gtk-x86_64.tar.gz
this extracts into an eclipse folder
d. login as root
$ su
e. copy the extracted eclipse folder to /opt
$ cp -r /home/tejeswar/Downloads/eclipse /opt
f. create a desktop icon
$ ln -s /opt/eclipse/eclipse /home/tejeswar/Desktop
13. Test Eclipse:
a. Create a java project 'Sample'
b. Create a package 'sample'
c. Create a class 'HelloWorld' and enter the following (the package declaration is needed since the class lives in the 'sample' package):
package sample;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}
d. Right click on 'HelloWorld.java' and run as 'Java Application'
---->>>>> Hadoop and its ecosystem downloaded from the following link <<<<<<<-----------
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDHTarballs/3.25.2013/CDH3-Downloadable-Tarballs/CDH3-Downloadable-Tarballs.html
14. Install Hadoop:
********Hadoop Prerequisites***********
a. Switch to hduser and generate an SSH key for the hduser user.
$ su - hduser
$ ssh-keygen -t rsa -P ""
b. Enable SSH access to your local machine with this newly created key
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
c. Test the SSH setup by connecting to your local machine with the hduser user
$ ssh localhost
d. Disabling IPv6
-> Open /etc/sysctl.conf
-> Enter following:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
e. Reboot your machine in order to make the changes take effect
f. Check whether IPv6 is enabled on your machine with the following command:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A return value of 0 means IPv6 is enabled; a value of 1 means it is disabled
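g. (Optional) Instead of rebooting, sysctl can usually apply the new settings immediately; this is a standard alternative, not part of the original steps:
$ sudo sysctl -p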
************Hadoop Installation****************
a. Download Apache Hadoop From Apache Mirror
b. Go to /usr/local
$ cd /usr/local
c. Copy the Hadoop tarball to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/hadoop-0.20.2-cdh3u6.tar.gz /usr/local/
d. Extract Hadoop tar file
$ sudo tar xzf hadoop-0.20.2-cdh3u6.tar.gz
e. Rename Folder hadoop-0.20.2-cdh3u6 to hadoop
$ sudo mv hadoop-0.20.2-cdh3u6 hadoop
f. Delete the tar file
$ rm hadoop-0.20.2-cdh3u6.tar.gz
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop hadoop
h. Update $HOME/.bashrc
-> Copy .bashrc from /root to /home/hduser (if it does not exist there)
$ cp .bashrc /home/hduser/
-> Give permissions and change ownership of .bashrc file of hduser
$ chown hduser:hadoop /home/hduser/.bashrc
$ chmod 755 /home/hduser/.bashrc
-> switch to hduser
$ su - hduser
-> Edit $HOME/.bashrc and add the following lines:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
i. Configuring Hadoop
-> login as root
$ su
-> Open and edit /usr/local/hadoop/conf/hadoop-env.sh
-> set the JAVA_HOME environment variable
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
-> Create Temp directory and set the required ownerships and permissions
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
-> And if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
-> Edit /usr/local/hadoop/conf/core-site.xml and add the following between the <configuration> tags:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
-> Edit /usr/local/hadoop/conf/mapred-site.xml and add the following between the <configuration> tags:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
-> Edit /usr/local/hadoop/conf/hdfs-site.xml and add the following between the <configuration> tags:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.
  </description>
</property>
j. Formatting the HDFS filesystem via the NameNode
-> Login as hduser
$ su - hduser
-> To format the filesystem run the command
$ /usr/local/hadoop/bin/hadoop namenode -format
k. Starting your single-node cluster
-> Run the command:
$ /usr/local/hadoop/bin/start-all.sh
-> After running start-all.sh, the output is as follows:
hduser@BI-Lab:~$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-BI-Lab.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-BI-Lab.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-BI-Lab.out
starting jobtracker, logging to /usr/local/hadoop/logs/hadoop-hduser-jobtracker-BI-Lab.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-BI-Lab.out
l. Check whether the expected Hadoop processes are running with 'jps':
hduser@BI-Lab:~$ jps
3855 JobTracker
3780 SecondaryNameNode
4139 Jps
3551 DataNode
3295 NameNode
4079 TaskTracker
m. Check with 'netstat' whether Hadoop is listening on the configured ports:
$ sudo netstat -plten | grep java
n. To browse the HDFS file system:
$ hadoop fs -ls /
o. To check the Hadoop logs for errors:
$ ls /usr/local/hadoop/logs/
p. Stopping your single-node cluster
-> Run the command
$ /usr/local/hadoop/bin/stop-all.sh
q. Hadoop Web Interfaces
-> http://localhost:50070/ – web UI of the NameNode daemon
-> http://localhost:50030/ – web UI of the JobTracker daemon
-> http://localhost:50060/ – web UI of the TaskTracker daemon
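r. (Optional) A quick end-to-end smoke test is to run the bundled WordCount example on the Hadoop config files; the examples jar name below matches the CDH3u6 tarball layout and is an assumption, so adjust it if your build differs:
$ hadoop fs -mkdir /user/hduser/input
$ hadoop fs -put /usr/local/hadoop/conf/*.xml /user/hduser/input
$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.2-cdh3u6.jar wordcount /user/hduser/input /user/hduser/output
$ hadoop fs -cat /user/hduser/output/part-* | head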
15. Installing Flume:
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local/lib
$ cp flume-0.9.4-cdh3u6.tar.gz /usr/local/lib/
c. Go to /usr/local/lib/
$ cd /usr/local/lib/
d. Extract flume-0.9.4-cdh3u6.tar.gz
$ tar xzf flume-0.9.4-cdh3u6.tar.gz
e. Rename the extracted folder flume-0.9.4-cdh3u6 to flume
$ mv flume-0.9.4-cdh3u6 flume
f. Delete tar file
$ rm flume-0.9.4-cdh3u6.tar.gz
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop flume
************Configuring Flume**************
a. Set the Flume environment variables ($FLUME_CONF_DIR, $FLUME_HOME) in .bashrc
-> login as hduser
$ su - hduser
-> Open and Add the following in .bashrc
export FLUME_CONF_DIR=/usr/local/lib/flume/conf
export FLUME_HOME=/usr/local/lib/flume
export PATH=$PATH:$HADOOP_HOME/bin:$FLUME_HOME/bin:.
-> Source .bashrc
$ source .bashrc
************Flume Service*******************
a. To start the Flume node:
$ flume node
b. To start the Flume master:
$ flume master
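c. (Optional) flume-og also has a one-shot 'dump' mode that prints a source to the console, which is handy for verifying the install; the source syntax below follows the 0.9.x user guide and may vary by build:
$ flume dump 'text("/etc/services")'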
16. Installing HBase:
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/hbase-0.90.6-cdh3u6.tar /usr/local
c. Go to /usr/local/
$ cd /usr/local
d. Extract hbase-0.90.6-cdh3u6.tar
$ tar -xvf hbase-0.90.6-cdh3u6.tar
e. Rename the extracted folder hbase-0.90.6-cdh3u6 to hbase
$ mv hbase-0.90.6-cdh3u6 hbase
f. Delete tar file
$ rm hbase-0.90.6-cdh3u6.tar
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop hbase
***********HBase Configuration************
a. login as hduser
$ su - hduser
b. Navigate to the hbase conf directory and edit hbase-site.xml
-> Go to /usr/local/hbase/conf
$ nano /usr/local/hbase/conf/hbase-site.xml
<property>
  <name>hbase.rootdir</name>
  <value>file:///usr/local/hbase/temp</value>
</property>
<property>
  <name>hbase.master</name>
  <value>localhost:60000</value>
  <description>The host and port that the HBase master runs at.</description>
</property>
<property>
  <name>hbase.regionserver.port</name>
  <value>60020</value>
  <description>The port the HBase RegionServer binds to.</description>
</property>
<!--<property>
  <name>hbase.master.port</name>
  <value>60000</value>
  <description>The port that the HBase master runs at.</description>
</property>-->
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/usr/local/hbase/temp</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg.
  The port at which the clients will connect.
  </description>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
  <description>Property from ZooKeeper's config zoo.cfg.
  The directory where the snapshot is stored.
  </description>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>1800000</value>
  <description>Session timeout.</description>
</property>
<property>
  <name>hbase.client.scanner.caching</name>
  <value>500</value>
</property>
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>240000</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
c. Add log directory:
$ mkdir /usr/local/hbase/logs
d. Add temp directory:
$ mkdir /usr/local/hbase/temp
e. Add pid directory:
$ mkdir /usr/local/hbase/pid
f. Open the hbase-env.sh file in the hbase conf directory and add the following lines:
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_LOG_DIR=/usr/local/hbase/logs
export HBASE_PID_DIR=/usr/local/hbase/pid
export HBASE_MANAGES_ZK=false
export HBASE_CONF_DIR=/usr/local/hbase/conf
g. Edit the hduser .bashrc file and add the following lines:
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HADOOP_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
**************HBase Service*****************
a. $ start-hbase.sh
b. $ stop-hbase.sh
c. $ hbase master start (if HMaster doesn't start from start-hbase.sh)
d. Browse HBase files: file:///usr/local/hbase/temp
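e. (Optional) Smoke-test the install from the HBase shell using standard shell commands, added here as a suggested check:
$ hbase shell
hbase> create 'test', 'cf'
hbase> put 'test', 'row1', 'cf:a', 'value1'
hbase> scan 'test'
hbase> disable 'test'
hbase> drop 'test'
hbase> exit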
17. Installing Zookeeper:
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/zookeeper-3.3.5-cdh3u6.tar /usr/local
c. Go to /usr/local/
$ cd /usr/local
d. Extract zookeeper-3.3.5-cdh3u6.tar
$ tar -xvf zookeeper-3.3.5-cdh3u6.tar
e. Rename the extracted folder zookeeper-3.3.5-cdh3u6 to zookeeper
$ mv zookeeper-3.3.5-cdh3u6 zookeeper
f. Delete tar file
$ rm zookeeper-3.3.5-cdh3u6.tar
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop zookeeper
*************Configuring Zookeeper****************
a. Login as hduser and go to /usr/local/zookeeper/conf
b. Edit zoo.cfg and add the following lines (make sure dataDir in zoo.cfg points to /var/zookeeper, where the myid file is created below):
$ nano zoo.cfg
#server.0=localhost:2888:3888
server.1=zoo1:2888:3888
c. Create dir /var/zookeeper and change ownership and permission
$ mkdir /var/zookeeper
$ chmod 755 /var/zookeeper
$ chown hduser:hadoop /var/zookeeper
d. Add the following lines to the /etc/hosts file (hosts entries take an IP address first):
$ nano /etc/hosts
127.0.0.1 zoo1
127.0.0.1 zoo2
127.0.0.1 zoo3
e. Change '127.0.1.1 BI-Lab' to '127.0.0.1 BI-Lab' in the same file
f. Create a file 'myid' in /var/zookeeper and add the entry '1' to it:
$ touch /var/zookeeper/myid
$ nano /var/zookeeper/myid
g. Edit the hduser .bashrc file and add the following lines:
export ZOOCFGDIR=/usr/local/zookeeper/conf
export ZOOBINDIR=/usr/local/zookeeper/bin
export PATH=$PATH:$HADOOP_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
h. $ source $HOME/.bashrc
************Zookeeper Service***************
a. $ zkServer.sh start
b. $ zkServer.sh stop
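c. (Optional) Verify the server answers ZooKeeper's four-letter 'ruok' command (it should reply 'imok'); this check is an addition to the original notes:
$ echo ruok | nc localhost 2181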
18. Installing Sqoop:
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/sqoop-1.3.0-cdh3u6.tar.gz /usr/local
c. Go to /usr/local/
$ cd /usr/local
d. Extract sqoop-1.3.0-cdh3u6.tar.gz
$ tar xzf sqoop-1.3.0-cdh3u6.tar.gz
e. Rename the extracted folder sqoop-1.3.0-cdh3u6 to sqoop
$ mv sqoop-1.3.0-cdh3u6 sqoop
f. Delete tar file
$ rm sqoop-1.3.0-cdh3u6.tar.gz
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop sqoop
**********Configure Sqoop******************
a. Add the following lines to $HOME/.bashrc:
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
*******Sqoop Commands*********
a. $sqoop help
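b. (Optional) Once MySQL is installed (step 19) and the MySQL Connector/J jar is in $SQOOP_HOME/lib (the jar from step 22 works), connectivity can be checked with:
$ sqoop list-databases --connect jdbc:mysql://localhost/ --username root -P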
19. Installing MySQL:
a. From 'Ubuntu Software Center' install the MySQL server
b. From 'Ubuntu Software Center' install the MySQL client
c. Installing MySQL Workbench:
-> $ sudo add-apt-repository ppa:olivier-berten/misc
-> $ sudo apt-get update
-> $ sudo apt-get install mysql-workbench
20. Installing Hive:
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/hive-0.7.1-cdh3u6.tar.gz /usr/local
c. Go to /usr/local/
$ cd /usr/local
d. Extract hive-0.7.1-cdh3u6.tar.gz
$ tar xzf hive-0.7.1-cdh3u6.tar.gz
e. Rename the extracted folder hive-0.7.1-cdh3u6 to hive
$ mv hive-0.7.1-cdh3u6 hive
f. Delete tar file
$ rm hive-0.7.1-cdh3u6.tar.gz
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop hive
**********Configure Hive******************
a. Add the following lines to $HOME/.bashrc:
export HIVE_PORT=10000
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
*******Hive Commands*********
a. $hive
b. hive> show databases;
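c. (Optional) A minimal end-to-end check in standard HiveQL, added as a suggested test (the count query exercises MapReduce):
hive> create table test (id int, name string);
hive> show tables;
hive> select count(*) from test;
hive> drop table test;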
21. Installing Pig:
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/pig-0.8.1-cdh3u6.tar.gz /usr/local
c. Go to /usr/local/
$ cd /usr/local
d. Extract pig-0.8.1-cdh3u6.tar.gz
$ tar xzf pig-0.8.1-cdh3u6.tar.gz
e. Rename the extracted folder pig-0.8.1-cdh3u6 to pig
$ mv pig-0.8.1-cdh3u6 pig
f. Delete tar file
$ rm pig-0.8.1-cdh3u6.tar.gz
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop pig
**********Configure Pig******************
a. Add the following lines to $HOME/.bashrc:
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$HADOOP_HOME/bin:$PIG_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
*******Pig Commands*********
a. $pig
b. grunt> help
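c. (Optional) A minimal grunt session to confirm Pig can read from HDFS, added as a suggested test; it assumes /etc/passwd is first copied into the hduser HDFS home:
$ hadoop fs -put /etc/passwd passwd
grunt> A = LOAD 'passwd' USING PigStorage(':');
grunt> B = FOREACH A GENERATE $0 AS user;
grunt> DUMP B;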
22. Installing Oozie:
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/oozie-2.3.2-cdh3u6.tar.gz /usr/local
c. Go to /usr/local/
$ cd /usr/local
d. Extract oozie-2.3.2-cdh3u6.tar.gz
$ tar xzf oozie-2.3.2-cdh3u6.tar.gz
e. Rename the extracted folder oozie-2.3.2-cdh3u6 to oozie
$ mv oozie-2.3.2-cdh3u6 oozie
f. Delete tar file
$ rm oozie-2.3.2-cdh3u6.tar.gz
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop oozie
**********Configure oozie******************
a. Add the following lines to $HOME/.bashrc:
export OOZIE_HOME=/usr/local/oozie
export PATH=$PATH:$HADOOP_HOME/bin:$PIG_HOME/bin:$OOZIE_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
b. Create the Oozie database and Oozie MySQL user.
$ mysql -u root -p
Enter password: ******
mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> exit
Bye
c. Configure Oozie to use MySQL
-> go to /usr/local/oozie/conf and edit oozie-site.xml file as follows:
<property>
  <name>oozie.service.StoreService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.StoreService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
  <name>oozie.service.StoreService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.StoreService.jdbc.password</name>
  <value>oozie</value>
</property>
d. Download MySQL Connector/J 5.1.25 from http://dev.mysql.com/downloads/connector/j/
e. Download the ExtJS version 2.2 library from http://extjs.com/deploy/ext-2.2.zip
f. Extract the Connector/J archive and copy mysql-connector-java-5.1.25-bin.jar to the /usr/local/oozie/lib folder
g. Add the MySQL JDBC driver JAR to Oozie (go to /usr/local/oozie/lib):
$ sudo -u hduser /usr/local/oozie/bin/oozie-setup.sh -jars mysql-connector-java-5.1.25-bin.jar -extjs /home/hduser/Downloads/ext-2.2.zip
*******oozie Services*********
a. $ oozie-start.sh
b. $ oozie-stop.sh
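c. (Optional) Check that the Oozie server came up (11000 is the default Oozie port, so adjust if configured differently); this check is an addition to the original notes:
$ oozie admin -oozie http://localhost:11000/oozie -status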
23. Installing Hue: (http://archive.cloudera.com/cdh/3/hue/manual.html#_further_hadoop_configuration_and_caveats)
a. Login as root
$ su
b. Copy tar from Hadoop_EcoSystem folder to /usr/local
$ cp /home/eagroup/Hadoop_EcoSystem/hue-1.2.0.0-cdh3u6.tar.gz /usr/local
c. Go to /usr/local/
$ cd /usr/local
d. Extract hue-1.2.0.0-cdh3u6.tar.gz
$ tar xzf hue-1.2.0.0-cdh3u6.tar.gz
e. Rename the extracted folder hue-1.2.0.0-cdh3u6 to hue
$ mv hue-1.2.0.0-cdh3u6 hue
f. Delete tar file
$ rm hue-1.2.0.0-cdh3u6.tar.gz
g. Give ownership to hduser
$ sudo chown -R hduser:hadoop hue
**********Hue Prerequisites******************
-> login as hduser
a. Install Python 2.7 and the build dependencies (the dpkg -l line below just checks which of these packages are already installed):
$ dpkg -l gcc libxml2 libxslt libsasl2 libmysqlclient python python-setuptools python-simplejson libsqlite3 ant
$ sudo add-apt-repository ppa:fkrull/deadsnakes
$ sudo apt-get update
$ sudo apt-get install python-dev
$ sudo apt-get install libxml2-dev
$ sudo apt-get install libxslt-dev
$ sudo apt-get install libmysqlclient-dev
$ sudo apt-get install libsqlite3-dev
$ sudo apt-get build-dep python-ldap
**********Configure Hue******************
a. Add the following lines to $HOME/.bashrc:
export HUE_HOME=/usr/local/hue-1.2.0.0-cdh3u6/hue/build/env
export PATH=$PATH:$HADOOP_HOME/bin:$HUE_HOME/bin:$OOZIE_HOME/bin:$PIG_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:/usr/local/Talend:.
b. Configure $HADOOP_HOME and $PREFIX with the path of your Hadoop installation and the path where you want to install Hue by running:
-> Go to /usr/local/hue-1.2.0.0-cdh3u6
-> HADOOP_HOME=/usr/local/hadoop PREFIX=/usr/local/hue-1.2.0.0-cdh3u6 make install
c. Install plug-ins:
$ cd /usr/local/hadoop/lib
$ ln -s /usr/local/hue-1.2.0.0-cdh3u6/hue/desktop/libs/hadoop/java-lib/hue-plugins-1.2.0-cdh3u6.jar .
d. Restart Hadoop
$ stop-all.sh
$ start-all.sh
e. Edit hadoop-metrics.properties (in /usr/local/hadoop/conf) and add the following. [To enable full monitoring in the Health application, the metrics contexts must not be NullContext.]
# Exposes /metrics URL endpoint for metrics information.
dfs.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
mapred.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
jvm.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
rpc.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
f. sudo apt-get install libsasl2-modules-gssapi-mit
g. Edit hadoop-env.sh and add:
export HADOOP_CLASSPATH="/usr/local/hadoop/lib/:$HADOOP_CLASSPATH"
h. ln -s /usr/local/hive/conf/hive-default.xml /usr/local/hive/src/data/conf
i. Configure Hadoop
Edit hdfs-site.xml:
<property>
  <name>dfs.namenode.plugins</name>
  <value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
  <description>Comma-separated list of namenode plug-ins to be activated.
  </description>
</property>
<property>
  <name>dfs.datanode.plugins</name>
  <value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
  <description>Comma-separated list of datanode plug-ins to be activated.
  </description>
</property>
j. Edit mapred-site.xml:
<property>
  <name>mapred.jobtracker.plugins</name>
  <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
  <description>Comma-separated list of jobtracker plug-ins to be activated.
  </description>
</property>
k. Edit hue.ini (path: /usr/local/hue-1.2.0.0-cdh3u6/hue/desktop/conf) and set:
hadoop_home=/usr/local/hadoop
http_host=127.0.0.1
l. Edit hue-beeswax.ini and set:
hive_home_dir=/usr/local/hive
hive_conf_dir=/usr/local/hive/conf
m. Start Hue using build/env/bin/supervisor under the Hue install directory, then browse:
-> http://localhost:8088
-> http://localhost:8088/dump_config
n. Run:
$ /usr/local/hue/build/env/bin/supervisor
24. Installing Talend Open Studio: (http://www.talendforge.org/wiki/doku.php?id=doc:installation_on_ubuntu)
1. Prerequisites
Install Java on Linux
Database client
2. Install Talend Open Studio
Download Talend for Big Data from http://www.talend.com/download
Get the archive file from the download section of the Talend website.
Note that the TOS_BD-r95165-V5.2.1.zip file contains binaries for ALL platforms (Linux/Unix, Windows and MacOS).
Once the download is complete, extract the archive files on your hard drive.
unzip TOS_BD-r95165-V5.2.1.zip -d /usr/local
mv /usr/local/TOS_BD-r95165-V5.2.1 /usr/local/Talend
Edit the TOS_BD-linux-gtk-x86_64.ini file and add the following lines:
-vmargs
-Xms64m
-Xmx1536m
-XX:MaxPermSize=512m
-Dfile.encoding=UTF-8
Give execute permission to the file
chmod +x /usr/local/Talend/TOS_BD-linux-gtk-x86_64
Create a script start_talend.sh with the following lines to start Talend:
vi start_talend.sh
#!/bin/sh
export GDK_NATIVE_WINDOWS=1
/usr/local/Talend/TOS_BD-linux-gtk-x86_64
Execute start_talend.sh to start:
./start_talend.sh
***************************************************************************************************************************
--------------------------Hadoop & Its Eco System Services Guidelines------------------------------------------------------
1. Login as hduser: (password - hduser)
$ su - hduser
2. To check running services:
$ jps
------------Starting Services-----------
1. To run Hadoop and its daemons: (NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker)
$ start-all.sh
2. To start Zookeeper: (ZooKeeper must be started before HBase)
$ zkServer.sh start
3. To start HBase: (HMaster and HRegionServer services)
$ start-hbase.sh
$ hbase master start (if HMaster doesn't start from start-hbase.sh)
4. To start the Flume node:
$ flume node
5. To start the Flume master:
$ flume master
6. To start Oozie service:
$ oozie-start.sh
7. To start hive:
$ hive
$ hive --service metastore
8. To start the Hive Server:
$ $HIVE_HOME/bin/hive --service hiveserver
9. To start Pig:
$ pig
10. To start Sqoop:
$ sqoop <command> (e.g. $ sqoop help)
------------Stop Services-----------
1. To stop Hadoop and its daemons:
$ stop-all.sh
2. To stop Zookeeper: (stop HBase before stopping ZooKeeper)
$ zkServer.sh stop
3. To stop HBase: (HMaster and HRegionServer services)
$ stop-hbase.sh
$ hbase master stop (if HMaster wasn't started from start-hbase.sh)
4. To stop the Flume node:
Ctrl+C in the node's terminal
5. To stop the Flume master:
Ctrl+C in the master's terminal
6. To stop Oozie service:
$ oozie-stop.sh
7. To stop Hive:
hive> quit;
8. To stop Pig:
grunt> quit;
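------------Convenience Script-----------
The start order above can be captured in a small script; this is a sketch assuming the PATH entries from the earlier .bashrc edits, not part of the original setup:
$ nano ~/start_stack.sh
#!/bin/sh
# Start the stack in dependency order: Hadoop, then ZooKeeper, then HBase, then Oozie.
start-all.sh
zkServer.sh start
start-hbase.sh
oozie-start.sh
jps
$ chmod +x ~/start_stack.sh
$ ~/start_stack.sh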