
Installing the Hadoop ecosystem in pseudo-distributed mode on Ubuntu 12.04 LTS

Changes Done in Ubuntu 12.04 64-bit: Post Installation

1. Access To Root:
    eagroup@BI-Lab:~$ sudo su
    [sudo] password for eagroup:
    root@BI-Lab:/home/eagroup# sudo passwd
    Enter new UNIX password: password
    Retype new UNIX password: password
    passwd: password updated successfully
    root@BI-Lab:/home/eagroup#

2.  Add and manage users and groups:
    a. Add user from GUI - hduser
    Username    Password    Privilege
    root        password    root
    eagroup     password    admin
    hduser      hduser      hadoop user


3. Making hduser a sudoer:
    a. login as root
        $ su
    b. nano /etc/sudoers
    c. add the following:
        hduser    ALL=(ALL:ALL) ALL
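
    Note: an alternative (a sketch, not part of the original steps) is to add hduser to Ubuntu's 'sudo' group instead of editing /etc/sudoers by hand:
        # the sudo group already has full sudo rights on Ubuntu
        $ sudo adduser hduser sudo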

4. Create Share Folder - Install & Configure Samba : (http://rbgeek.wordpress.com/2012/04/25/how-to-install-samba-server-on-ubuntu-12-04/)
       a. Install the samba package - $ sudo apt-get install samba samba-common
    b. Check the version - $ smbd --version
    c. Suggested packages for samba - $ sudo apt-get install python-glade2 system-config-samba

    ->Go to your Windows machine and use this command in cmd
    d. net config workstation
    Note the workstation domain

    ->Go to Ubuntu system:
    e. Back up the smb.conf file, then delete it and create a new one:
        sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.bak
        sudo rm /etc/samba/smb.conf
        sudo touch /etc/samba/smb.conf
        sudo nano /etc/samba/smb.conf   
    f. Add the following to your smb.conf file (or change it according to your requirements):
        #======================= Global Settings =====================================
            [global]
            workgroup = INFICS
            server string = Samba Server %v
            netbios name = ubuntu
            security = user
            map to guest = bad user
            dns proxy = no
        #============================ Share Definitions ==============================
            [MyShare]
            path = /Share
            browsable = yes
            writable = yes
            guest ok = yes
            read only = no

    g. Save the smb.conf file and restart the service:
        sudo service smbd restart
    h. Check the current permissions on the samba share directory (/Share as configured above; see the note after this list if it does not exist yet):
        ls -ld /Share
    i. Change them so that everyone can read and write the share (check whether this is acceptable in your environment):
        sudo chmod -R 0777 /Share
        ls -ld /Share
    j. Verify the share by creating a file in it from a Windows client and listing it:
        cd /Share
        ls -l
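
    Note: the /Share directory referenced in smb.conf must exist before clients can use the share. A minimal sketch (run it before restarting smbd) in case it was not created earlier:
        $ sudo mkdir -p /Share
        $ sudo chmod 0777 /Share    # world-writable; tighten this if your environment requires it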

5. Install xrdp to access remotely from Windows:
    a. sudo apt-get install xrdp
    b. sudo apt-get install gnome-session-fallback
    c. sudo /etc/init.d/xrdp restart
    d. login as the particular user (the user you will use to connect from Windows)
    e. echo gnome-session --session=gnome-fallback > ~/.xsession

6. Install SSH:
    a. sudo apt-get install ssh
    b. sudo service ssh restart

7. Install RPM:
    a. sudo apt-get install rpm

8. Install Telnet
    a. sudo apt-get install telnetd

9. Install ORACLE JAVA:(http://www.liberiangeek.net/2012/11/install-oracle-java-jrejdk-6-in-ubuntu-12-10-quantal-quetzal/)
    a. sudo add-apt-repository ppa:webupd8team/java
    b. sudo apt-get update && sudo apt-get install oracle-java6-installer

10. Setting JAVA_HOME, JRE_HOME and PATH:
    a. Login as root.
    b. Create a .bash_profile file (if it does not exist)
    c. Add the following in .bash_profile:
        JAVA_HOME=/usr/lib/jvm/java-6-oracle
        PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
        JRE_HOME=/usr/lib/jvm/java-6-oracle
        PATH=$PATH:$HOME/bin:$JRE_HOME/bin
        export JAVA_HOME
        export JRE_HOME
        export PATH
    d. Source .bash_profile
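    e. Verify the settings (a quick check; the exact java version output depends on the JDK actually installed):
        $ source ~/.bash_profile
        $ echo $JAVA_HOME
        $ java -version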
   
11. Test Java:
    a. Create a file named HelloWorld.java and enter the following:
        public class HelloWorld {
           public static void main(String[] args) {
                  System.out.println("Hello, World");
               }
        }
    b. Compile HelloWorld.java
        $javac HelloWorld.java
    c. Run HelloWorld
        $java HelloWorld

12. Install Eclipse(Indigo):
    a. Download and install Eclipse from 'Ubuntu Software Center'
    b. Go to terminal and run following:(http://askubuntu.com/questions/138019/unable-to-open-eclipse-ide-due-to-missing-symlink)
        ln -s /usr/lib/jni/libswt-* ~/.swt/lib/linux/x86_64/

12b. Removed Eclipse Indigo and installed Eclipse Juno
    a. Remove Eclipse Indigo from Ubuntu Software Center
    b. Download Eclipse Juno from http://www.eclipse.org/downloads/
    c. extract the gz file
        $ tar xvf eclipse-jee-juno-SR2-linux-gtk-x86_64.tar.gz
        this extracts into an eclipse folder
    d. login as root
        $ su
    e. copy the extracted eclipse folder in /opt
        $ cp -r /home/tejeswar/Downloads/eclipse /opt
    f. create a desktop icon
        $ ln -s /opt/eclipse/eclipse /home/tejeswar/Desktop

13. Test Eclipse:
    a. Create a java project 'Sample'
    b. Create a package 'sample'
    c. Create a class 'HelloWorld' and enter following:
        public class HelloWorld {
           public static void main(String[] args) {
                  System.out.println("Hello, World");
               }
        }
    d. Right click on 'HelloWorld.java' and run as 'Java Application'

---->>>>> Hadoop and its ecosystem were downloaded from the following link <<<<<<<-----------
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDHTarballs/3.25.2013/CDH3-Downloadable-Tarballs/CDH3-Downloadable-Tarballs.html

14. Install Hadoop:
    ********Hadoop Prerequisites***********
    a. Go to hduser - Generate an SSH key for the hduser user.
        $ su - hduser
        $ ssh-keygen -t rsa -P ""
    b. Enable SSH access to your local machine with this newly created key
        $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
    c. Test the SSH setup by connecting to your local machine with the hduser user
        $ ssh localhost
    d. Disabling IPv6
        -> Open /etc/sysctl.conf
        -> Enter following:
            # disable ipv6
            net.ipv6.conf.all.disable_ipv6 = 1
            net.ipv6.conf.default.disable_ipv6 = 1
            net.ipv6.conf.lo.disable_ipv6 = 1
    e. Reboot your machine in order to make the changes take effect
    f. Check whether IPv6 is enabled on your machine with the following command:
        $ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
        A return value of 0 means IPv6 is enabled, a value of 1 means disabled
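    g. Alternatively (a shortcut, not in the original notes), the new sysctl settings from step d can be applied without a reboot:
        $ sudo sysctl -p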
   
    ************Hadoop Installation****************
    a. Download Apache Hadoop from an Apache mirror
    b. Go to /usr/local
        $ cd /usr/local
    c. Copy the Hadoop tar file to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/hadoop-0.20.2-cdh3u6.tar.gz /usr/local/
    d. Extract Hadoop tar file
        $ sudo tar xzf hadoop-0.20.2-cdh3u6.tar.gz
    e. Rename Folder hadoop-0.20.2-cdh3u6 to hadoop
        $ sudo mv hadoop-0.20.2-cdh3u6 hadoop
    f. Delete the tar file
        $ rm hadoop-0.20.2-cdh3u6.tar.gz
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop hadoop
    h. Update $HOME/.bashrc
        -> Copy .bashrc from /root to /home/hduser (if it does not exist)
            $ cp .bashrc /home/hduser/
        -> Give permissions and change ownership of .bashrc file of hduser
            $ chown hduser:hadoop /home/hduser/.bashrc
            $ chmod 755 /home/hduser/.bashrc
        -> switch to hduser
            $ su - hduser
        -> Edit $HOME/.bashrc and add the following lines
            # Set Hadoop-related environment variables
            export HADOOP_HOME=/usr/local/hadoop

            # Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
            export JAVA_HOME=/usr/lib/jvm/java-6-oracle

            # Some convenient aliases and functions for running Hadoop-related commands
            unalias fs &> /dev/null
            alias fs="hadoop fs"
            unalias hls &> /dev/null
            alias hls="fs -ls"

            # If you have LZO compression enabled in your Hadoop cluster and
            # compress job outputs with LZOP (not covered in this tutorial):
            # Conveniently inspect an LZOP compressed file from the command
            # line; run via:
            #
            # $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
            #
            # Requires installed 'lzop' command.
            #
            lzohead () {
                hadoop fs -cat $1 | lzop -dc | head -1000 | less
            }

            # Add Hadoop bin/ directory to PATH
            export PATH=$PATH:$HADOOP_HOME/bin
    i. Configuring Hadoop
        -> login as root
            $ su
        -> Open and edit /usr/local/hadoop/conf/hadoop-env.sh
        -> set the JAVA_HOME environment variable
            export JAVA_HOME=/usr/lib/jvm/java-6-oracle
        -> Create Temp directory and set the required ownerships and permissions
            $ sudo mkdir -p /app/hadoop/tmp
            $ sudo chown hduser:hadoop /app/hadoop/tmp
        -> And if you want to tighten up security, chmod from 755 to 750...
            $ sudo chmod 750 /app/hadoop/tmp

        -> Edit /usr/local/hadoop/conf/core-site.xml
            <property>
              <name>hadoop.tmp.dir</name>
              <value>/app/hadoop/tmp</value>
              <description>A base for other temporary directories.</description>
            </property>

            <property>
              <name>fs.default.name</name>
              <value>hdfs://localhost:54310</value>
              <description>The name of the default file system.  A URI whose
              scheme and authority determine the FileSystem implementation.  The
              uri's scheme determines the config property (fs.SCHEME.impl) naming
              the FileSystem implementation class.  The uri's authority is used to
              determine the host, port, etc. for a filesystem.</description>
            </property>

        -> Edit /usr/local/hadoop/conf/mapred-site.xml
            <property>
              <name>mapred.job.tracker</name>
              <value>localhost:54311</value>
              <description>The host and port that the MapReduce job tracker runs
              at.  If "local", then jobs are run in-process as a single map
              and reduce task.
              </description>
            </property>

        -> Edit /usr/local/hadoop/conf/hdfs-site.xml
            <property>
              <name>dfs.replication</name>
              <value>1</value>
              <description>Default block replication.
              The actual number of replications can be specified when the file is created.
              The default is used if replication is not specified in create time.
              </description>
            </property>

    j. Formatting the HDFS filesystem via the NameNode
        -> Login as hduser
            $ su - hduser
        -> To format the filesystem run the command
            $ /usr/local/hadoop/bin/hadoop namenode -format

    k. Starting your single-node cluster
        -> Run the command:
            $ /usr/local/hadoop/bin/start-all.sh

        -> After running start-all.sh O/P is as follows:
            hduser@BI-Lab:~$ /usr/local/hadoop/bin/start-all.sh
            starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-BI-Lab.out
            localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-BI-Lab.out
            localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-BI-Lab.out
            starting jobtracker, logging to /usr/local/hadoop/logs/hadoop-hduser-jobtracker-BI-Lab.out
            localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-BI-Lab.out

    l. Check whether the expected Hadoop processes are running with 'jps':
        hduser@BI-Lab:~$ jps
        3855 JobTracker
        3780 SecondaryNameNode
        4139 Jps
        3551 DataNode
        3295 NameNode
        4079 TaskTracker

    m. Check with 'netstat' if Hadoop is listening on the configured ports.
        $ sudo netstat -plten | grep java

    n. To browse the HDFS file system:
        $ hadoop fs -ls /

    o. To check the Hadoop logs for errors:
        $ ls /usr/local/hadoop/logs/
       
    p. Stopping your single-node cluster
        -> Run the command
            $ /usr/local/hadoop/bin/stop-all.sh

    q. Hadoop Web Interfaces       
         -> http://localhost:50070/ – web UI of the NameNode daemon
         -> http://localhost:50030/ – web UI of the JobTracker daemon
         -> http://localhost:50060/ – web UI of the TaskTracker daemon
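
    r. Optional smoke test of HDFS (a sketch; the local file and HDFS paths below are only examples):
        $ hadoop fs -mkdir /user/hduser/test
        $ hadoop fs -put /etc/hosts /user/hduser/test/
        $ hadoop fs -ls /user/hduser/test
        $ hadoop fs -cat /user/hduser/test/hosts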

15. Installing Flume:
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local/lib
        $ cp flume-0.9.4-cdh3u6.tar.gz /usr/local/lib/
    c. Go to /usr/local/lib/
        $ cd /usr/local/lib/
    d. Extract flume-0.9.4-cdh3u6.tar.gz
        $ tar -xzvf flume-0.9.4-cdh3u6.tar.gz
    e. Rename the extracted folder flume-0.9.4-cdh3u6 to flume
        $ mv flume-0.9.4-cdh3u6 flume
    f. Delete tar file
        $ rm flume-0.9.4-cdh3u6.tar.gz
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop flume

    ************Configuring Flume**************
    a. Set the environment variable $FLUME_CONF_DIR in .bashrc
        -> login as hduser
            $ su - hduser
        -> Open and Add the following in .bashrc
            export FLUME_CONF_DIR=/usr/local/lib/flume/conf
            export FLUME_HOME=/usr/local/lib/flume
            export PATH=$PATH:$HADOOP_HOME/bin:$FLUME_HOME/bin:.
        -> Run .bashrc
            $ source .bashrc

    ************Flume Service*******************
     a. To start the Flume node:
        $ flume node
    b. To start the Flume master:
        $ flume master

16. Installing HBase:
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/hbase-0.90.6-cdh3u6.tar /usr/local
    c. Go to /usr/local/
        $ cd /usr/local
    d. Extract hbase-0.90.6-cdh3u6.tar
        $ tar -xvf hbase-0.90.6-cdh3u6.tar
    e. Rename the extracted folder hbase-0.90.6-cdh3u6 to hbase
        $ mv hbase-0.90.6-cdh3u6 hbase
    f. Delete tar file
        $ rm hbase-0.90.6-cdh3u6.tar
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop hbase

    ***********HBase Configuration************
    a. login as hduser
        $ su - hduser
    b. Navigate to hbase conf directory and set the hbase-site.xml
        -> Go to /usr/local/hbase/conf
        $ nano /usr/local/hbase/conf/hbase-site.xml
       
        <property>
        <name>hbase.rootdir</name>
        <value>file:///usr/local/hbase/temp</value>
        </property>
        <property>
        <name>hbase.master</name>
        <value>localhost:60000</value>
        <description>The host and port that the HBase master runs at.</description>
        </property>
        <property>
        <name>hbase.regionserver.port</name>
        <value>60020</value>
        <description>The port the HBase RegionServer binds to.</description>
        </property>
        <!--<property>
        <name>hbase.master.port</name>
        <value>60000</value>
        <description>The host and port that the HBase master runs at.</description>
        </property>-->
        <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
        </property>
        <property>
        <name>hbase.tmp.dir</name>
        <value>/usr/local/hbase/temp</value>
        </property>
        <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
        </property>
        <property>
        <name>dfs.replication</name>
        <value>2</value>
        </property>
        <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
        <description>Property from ZooKeeper's config zoo.cfg.
        The port at which the clients will connect.
        </description>
        </property>
        <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/usr/local/zookeeper</value>
        <description>Property from ZooKeeper's config zoo.cfg.
        The directory where the snapshot is stored.
        </description>
        </property>
        <property>
        <name>zookeeper.session.timeout</name>
        <value>1800000</value>
        <description>Session Time out.</description>
        </property>
        <property>
        <name>hbase.client.scanner.caching</name>
        <value>500</value>
        </property>
        <property>
        <name>hbase.regionserver.lease.period</name>
        <value>240000</value>
        </property>
        <property>
        <name>dfs.support.append</name>
        <value>true</value>
        </property>

    c. Add log directory:
        $ mkdir /usr/local/hbase/logs

    d. Add temp directory:
        $ mkdir /usr/local/hbase/temp

    e. Add pid directory:
        $ mkdir /usr/local/hbase/pid   
   
    f. Open the hbase-env.sh file in the hbase conf directory and add the following lines:
        export JAVA_HOME=/usr/lib/jvm/java-6-oracle
        export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
        export HBASE_LOG_DIR=/usr/local/hbase/logs
        export HBASE_PID_DIR=/usr/local/hbase/pid
        export HBASE_MANAGES_ZK=false
        export HBASE_CONF_DIR=/usr/local/hbase/conf

    g. Edit the hduser .bashrc file and add the following lines
        export HBASE_HOME=/usr/local/hbase
        export PATH=$PATH:$HADOOP_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
   
    **************HBase Service*****************
    a. $ start-hbase.sh
    b. $ stop-hbase.sh
    c. $ hbase master start (if HMaster doesn't start from start-hbase.sh)
    d. Browse Hbase files: file:///usr/local/hbase/temp
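    e. Optional smoke test from the HBase shell (a sketch; table and column family names are only examples):
        $ hbase shell
        hbase> create 't1', 'f1'
        hbase> put 't1', 'r1', 'f1:c1', 'value1'
        hbase> scan 't1'
        hbase> disable 't1'
        hbase> drop 't1'
        hbase> exit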

17. Installing Zookeeper:
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/zookeeper-3.3.5-cdh3u6.tar /usr/local
    c. Go to /usr/local/
        $ cd /usr/local
    d. Extract zookeeper-3.3.5-cdh3u6.tar
        $ tar -xvf zookeeper-3.3.5-cdh3u6.tar
    e. Rename the extracted folder zookeeper-3.3.5-cdh3u6 to zookeeper
        $ mv zookeeper-3.3.5-cdh3u6 zookeeper
    f. Delete tar file
        $ rm zookeeper-3.3.5-cdh3u6.tar
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop zookeeper
   
    *************Configuring Zookeeper****************
    a. login as hduser and Go to /usr/local/zookeeper/conf
    b. Edit and add the following lines in zoo.cfg
        $ nano zoo.cfg                                              
        #server.0=localhost:2888:3888   
        server.1=zoo1:2888:3888                            
          
    c. Create dir /var/zookeeper and change ownership and permission
        $ mkdir /var/zookeeper
        $ chmod 755 /var/zookeeper
        $ chown hduser:hadoop /var/zookeeper
    d. Add the following lines to the /etc/hosts file
        $ nano /etc/hosts
        127.0.0.1   zoo1
        127.0.0.1   zoo2
        127.0.0.1   zoo3
    e. Change '127.0.1.1       BI-Lab' to '127.0.0.1       BI-Lab' in the above file
    f. Create a file 'myid' in /var/zookeeper and add entry '1' in it.
        $ touch /var/zookeeper/myid
        $ nano /var/zookeeper/myid
    g. Edit the hduser .bashrc file and add the following lines
        export ZOOCFGDIR=/usr/local/zookeeper/conf
        export ZOOBINDIR=/usr/local/zookeeper/bin
        export PATH=$PATH:$HADOOP_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
    h. $ source $HOME/.bashrc
   
    ************Zookeeper Service***************
    a. $ zkServer.sh start
    b. $ zkServer.sh stop
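    c. To verify ZooKeeper is answering (a quick check with the bundled CLI; 2181 is the client port assumed in hbase-site.xml above):
        $ zkCli.sh -server localhost:2181
        then run 'ls /' at the zk prompt and 'quit' to exit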

18. Installing Sqoop:
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/sqoop-1.3.0-cdh3u6.tar.gz /usr/local
    c. Go to /usr/local/
        $ cd /usr/local
    d. Extract sqoop-1.3.0-cdh3u6.tar.gz
        $ tar xzf sqoop-1.3.0-cdh3u6.tar.gz
    e. Rename the extracted folder sqoop-1.3.0-cdh3u6 to sqoop
        $ mv sqoop-1.3.0-cdh3u6 sqoop
    f. Delete tar file
        $ rm sqoop-1.3.0-cdh3u6.tar.gz
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop sqoop

    **********Configure Sqoop******************
    a. Add the following line in $HOME/.bashrc
        export SQOOP_HOME=/usr/local/sqoop
        export PATH=$PATH:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.

    *******Sqoop Commands*********
    a. $sqoop help
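    b. Example: list the databases on the local MySQL server (a sketch; assumes MySQL is installed as in step 19 and its JDBC driver jar has been copied into $SQOOP_HOME/lib):
        $ sqoop list-databases --connect jdbc:mysql://localhost/ --username root -P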

19. Installing MySql:
    a. From 'Ubuntu Software Center' install mysql server
    b. From 'Ubuntu Software Center' install mysql client
    c. Installing mysql workbench:
        -> $ sudo add-apt-repository ppa:olivier-berten/misc
        -> $ sudo apt-get update
        -> $ sudo apt-get install mysql-workbench   

20. Installing Hive:
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/hive-0.7.1-cdh3u6.tar.gz /usr/local
    c. Go to /usr/local/
        $ cd /usr/local
    d. Extract hive-0.7.1-cdh3u6.tar.gz
        $ tar xzf hive-0.7.1-cdh3u6.tar.gz
    e. Rename the extracted folder hive-0.7.1-cdh3u6 to hive
        $ mv hive-0.7.1-cdh3u6 hive
    f. Delete tar file
        $ rm hive-0.7.1-cdh3u6.tar.gz
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop hive

    **********Configure Hive******************
    a. Add the following line in $HOME/.bashrc
        export HIVE_PORT=10000

        export HIVE_HOME=/usr/local/hive
        export PATH=$PATH:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.

    *******Hive Commands*********
    a. $hive
    b. hive> show databases;
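    c. Example: create and query a small table (a sketch; the table and column names are only examples):
        hive> create table test_tbl (id int, name string);
        hive> show tables;
        hive> drop table test_tbl;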

21. Installing Pig:
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/pig-0.8.1-cdh3u6.tar.gz /usr/local
    c. Go to /usr/local/
        $ cd /usr/local
    d. Extract pig-0.8.1-cdh3u6.tar.gz
        $ tar xzf pig-0.8.1-cdh3u6.tar.gz
    e. Rename the extracted folder pig-0.8.1-cdh3u6 to pig
        $ mv pig-0.8.1-cdh3u6 pig
    f. Delete tar file
        $ rm pig-0.8.1-cdh3u6.tar.gz
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop pig

    **********Configure Pig******************
    a. Add the following line in $HOME/.bashrc
        export PIG_HOME=/usr/local/pig
        export PATH=$PATH:$HADOOP_HOME/bin:$PIG_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.

    *******Pig Commands*********
    a. $pig
    b. grunt> help
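    c. Example: a minimal Pig Latin session (a sketch; assumes /etc/passwd was copied into HDFS first, e.g. with 'hadoop fs -put /etc/passwd /user/hduser/'):
        grunt> A = LOAD '/user/hduser/passwd' USING PigStorage(':');
        grunt> B = FOREACH A GENERATE $0;
        grunt> DUMP B;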
   

22. Installing Oozie:
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/oozie-2.3.2-cdh3u6.tar.gz /usr/local
    c. Go to /usr/local/
        $ cd /usr/local
    d. Extract oozie-2.3.2-cdh3u6.tar.gz
        $ tar xzf oozie-2.3.2-cdh3u6.tar.gz
    e. Rename the extracted folder oozie-2.3.2-cdh3u6 to oozie
        $ mv oozie-2.3.2-cdh3u6 oozie
    f. Delete tar file
        $ rm oozie-2.3.2-cdh3u6.tar.gz
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop oozie

    **********Configure oozie******************
    a. Add the following line in $HOME/.bashrc
      export OOZIE_HOME=/usr/local/oozie
      export PATH=$PATH:$HADOOP_HOME/bin:$PIG_HOME/bin:$OOZIE_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:.
    b. Create the Oozie database and Oozie MySQL user.
        $ mysql -u root -p
        Enter password: ******

        mysql> create database oozie;
        Query OK, 1 row affected (0.03 sec)

        mysql>  grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
        Query OK, 0 rows affected (0.03 sec)

        mysql>  grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
        Query OK, 0 rows affected (0.03 sec)

        mysql> exit
        Bye
    c. Configure Oozie to use MySQL
        -> go to /usr/local/oozie/conf and edit oozie-site.xml file as follows:
            <property>
            <name>oozie.service.StoreService.jdbc.driver</name>
            <value>com.mysql.jdbc.Driver</value>
            </property>
            <property>
            <name>oozie.service.StoreService.jdbc.url</name>
            <value>jdbc:mysql://localhost:3306/oozie</value>
            </property>
            <property>
            <name>oozie.service.StoreService.jdbc.username</name>
            <value>oozie</value>
            </property>
            <property>
            <name>oozie.service.StoreService.jdbc.password</name>
            <value>oozie</value>
            </property>
    d. Download the MySQL Connector/J JAR (mysql-connector-java-5.1.25 is used in the steps below) from http://dev.mysql.com/downloads/connector/j/
    e. Download the ExtJS version 2.2 library from http://extjs.com/deploy/ext-2.2.zip
    f. Extract the Connector/J archive and copy mysql-connector-java-5.1.25-bin.jar to the /usr/local/oozie/lib folder
    g. Add the MySQL JDBC driver JAR to Oozie. (go to /usr/local/oozie/lib)
       $ sudo -u hduser /usr/local/oozie/bin/oozie-setup.sh -jars mysql-connector-java-5.1.25-bin.jar -extjs /home/hduser/Downloads/ext-2.2.zip

    *******oozie Services*********
    a. $ oozie-start.sh
    b. $ oozie-stop.sh
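    c. To check that the Oozie server is up (a sketch; 11000 is Oozie's default port, adjust if it was changed in oozie-site.xml):
        $ oozie admin -oozie http://localhost:11000/oozie -status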

23. Installing Hue: (http://archive.cloudera.com/cdh/3/hue/manual.html#_further_hadoop_configuration_and_caveats)
    a. Login as root
        $ su
    b. Copy tar from Hadoop_EcoSystem folder to /usr/local
        $ cp /home/eagroup/Hadoop_EcoSystem/hue-1.2.0.0-cdh3u6.tar.gz /usr/local
    c. Go to /usr/local/
        $ cd /usr/local
    d. Extract hue-1.2.0.0-cdh3u6.tar.gz
        $ tar xzf hue-1.2.0.0-cdh3u6.tar.gz
    e. Rename the extracted folder hue-1.2.0.0-cdh3u6 to hue
        $ mv hue-1.2.0.0-cdh3u6 hue
    f. Delete tar file
        $ rm hue-1.2.0.0-cdh3u6.tar.gz
    g. Give ownership to hduser
        $ sudo chown -R hduser:hadoop hue

    **********hue Prerequisites******************
    -> login as hduser
    a. Install Python2.7
        $ dpkg -l gcc libxml2 libxslt libsasl2 libmysqlclient python python-setuptools python-simplejson libsqlite3 ant
        $ sudo add-apt-repository ppa:fkrull/deadsnakes
        $ sudo apt-get update
        $ sudo apt-get install python-dev
        $ sudo apt-get install libxml2-dev
        $ sudo apt-get install libxslt-dev
        $ sudo apt-get install libmysqlclient-dev
        $ sudo apt-get install libsqlite3-dev
        $ sudo apt-get build-dep python-ldap

    **********Install & Configure hue******************
    a. Add the following line in $HOME/.bashrc
        export HUE_HOME=/usr/local/hue-1.2.0.0-cdh3u6/hue/build/env
        export PATH=$PATH:$HADOOP_HOME/bin:$HUE_HOME/bin:$OOZIE_HOME/bin:$PIG_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$ZOOBINDIR:$HBASE_HOME/bin:/usr/local/Talend:.

        b. Configure $HADOOP_HOME and $PREFIX with the path of your Hadoop installation and the path where you want to install Hue by running:
            -> Go to /usr/local/hue-1.2.0.0-cdh3u6
            -> HADOOP_HOME=/usr/local/hadoop PREFIX=/usr/local/hue-1.2.0.0-cdh3u6 make install
        c. Install plug-ins
            $ cd /usr/local/hadoop/lib
            $ ln -s /usr/local/hue-1.2.0.0-cdh3u6/hue/desktop/libs/hadoop/java-lib/hue-plugins-1.2.0-cdh3u6.jar .
        d. Restart Hadoop
            $ stop-all.sh
            $ start-all.sh
        e. Edit hadoop-metrics.properties (in /usr/local/hadoop/conf) and add the following. To enable full
            monitoring in the Health application, the metrics contexts must not be NullContext:
                # Exposes /metrics URL endpoint for metrics information.
                dfs.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
                mapred.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
                jvm.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
                rpc.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
        f. sudo apt-get install libsasl2-modules-gssapi-mit
        g. edit hadoop-env.sh and add
            export HADOOP_CLASSPATH="/usr/local/hadoop/lib/:$HADOOP_CLASSPATH"
        h. ln -s /usr/local/hive/conf/hive-default.xml /usr/local/hive/src/data/conf


        i. Configure Hadoop
            Edit hdfs-site.xml:

            <property>
              <name>dfs.namenode.plugins</name>
              <value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
              <description>Comma-separated list of namenode plug-ins to be activated.
              </description>
            </property>
            <property>
              <name>dfs.datanode.plugins</name>
              <value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
              <description>Comma-separated list of datanode plug-ins to be activated.
              </description>
            </property>

        j.Edit mapred-site.xml:

            <property>
              <name>mapred.jobtracker.plugins</name>
              <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
              <description>Comma-separated list of jobtracker plug-ins to be activated.
              </description>
            </property>


        k. edit hue.ini (located in /usr/local/hue-1.2.0.0-cdh3u6/hue/desktop/conf) and add
            hadoop_home=/usr/local/hadoop
            http_host=127.0.0.1

        l. edit hue-beeswax.ini
            hive_home_dir=/usr/local/hive
            hive_conf_dir=/usr/local/hive/conf
        m. Start Hue with build/env/bin/supervisor under the Hue install directory; the web UI is then at:
        -> http://localhost:8088
        -> http://localhost:8088/dump_config
        n. Run:
        $ /usr/local/hue/build/env/bin/supervisor




   

24. Installing Talend Open Studio: (http://www.talendforge.org/wiki/doku.php?id=doc:installation_on_ubuntu)
    1. Prerequisites
        Install Java on Linux
        Database client


    2. Install Talend Open Studio
        Download Talend for Big Data from http://www.talend.com/download
        Get the archive file from the download section of the Talend website.
        Note that the TOS_BD-r95165-V5.2.1.zip file contains binaries for ALL platforms (Linux/Unix, Windows and MacOS).
        Once the download is complete, extract the archive files on your hard drive.
        unzip TOS_BD-r95165-V5.2.1.zip -d /usr/local
        mv /usr/local/TOS_BD-r95165-V5.2.1 /usr/local/Talend
        Edit TOS_BD-linux-gtk-x86_64.ini file and add following lines
            -vmargs
            -Xms64m
            -Xmx1536m
            -XX:MaxPermSize=512m
            -Dfile.encoding=UTF-8
        Give execute permission for file
            chmod +x /usr/local/Talend/TOS_BD-linux-gtk-x86_64
        create script start_talend.sh and add following lines to start Talend.
            vi start_talend.sh
            #!/bin/sh
            export GDK_NATIVE_WINDOWS=1
            /usr/local/Talend/TOS_BD-linux-gtk-x86_64
        execute start_talend.sh to start
            ./start_talend.sh
   





       


       

   




***************************************************************************************************************************
--------------------------Hadoop & Its Eco System Services Guidelines------------------------------------------------------
1. Login as hduser: (password - hduser)
    $ su - hduser

2. To check running services:
    $ jps

------------Starting Services-----------
1. To run Hadoop and its daemons: (name node, secondary name node, data node, job tracker and task tracker)
    $ start-all.sh

2. To start Zookeeper: (should start zookeeper before HBase)
    $ zkServer.sh start

3. To start HBase: (HMaster and HRegionServer services)
    $ start-hbase.sh
    $ hbase master start (if HMaster doesn't start from start-hbase.sh)

4. To start the Flume node:
    $ flume node

5. To start the Flume master:
    $ flume master

6. To start Oozie service:
    $ oozie-start.sh

7. To start hive:
    $ hive
    $ hive --service metastore

8. To start Hive Server
    $HIVE_HOME/bin/hive --service hiveserver

9. To start Pig:
    $ pig

10. To run sqoop:
    $ sqoop <command>

------------Stop Services-----------
1. To stop Hadoop and its daemons:
    $ stop-all.sh

2. To stop Zookeeper: (stop HBase before stopping zookeeper)
    $ zkServer.sh stop

3. To stop HBase: (HMaster and HRegionServer services)
    $ stop-hbase.sh
    $ hbase master stop (if HMaster doesn't start from start-hbase.sh)

4. To stop the Flume node:
    Press Ctrl+C in its terminal

5. To stop the Flume master:
    Press Ctrl+C in its terminal

6. To stop Oozie service:
    $ oozie-stop.sh

7. To stop hive:
    hive>quit;

8. To stop Pig:
    grunt>quit;

I expect we have Master and Slave machines having MySQL installed  on both with server-id as 1 and 2 on Master and Slave . Mysql Replication steps: On Master: stop all transactions. mysql> FLUSH TABLES WITH READ LOCK; mysql> show master status ; +---------------+----------+--------------+------------------+ | File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | +---------------+----------+--------------+------------------+ | binlog.000005 |  4913710 |              |                  | +---------------+----------+--------------+------------------+ take mysql dump of Master $ mysqldump -u root -p --all-databases --master-data > dbdump.sql mysql> unlock tables; transfer dump file  to slave host scp dbdump.sql  usr@slave:/tmp/ On Slave: [usr@slave ~]$ ls -ltr -rwx------ 1 usr usr 57319214 Nov  6 06:06 dbdump.sql...