
Hadoop YARN MR (MapReduce) streaming using a shell script

Hello friends,
Let's see how to run a simple MapReduce streaming job in a Linux environment, with plain shell scripts as the mapper and reducer.
It's a word-count program.



1. Create a file words.txt with a few words, as shown below.

words.txt
--------------------------------
cow india japan
america japan
hindu muslim christian
india cow
america america america
china
india
china pakistan

2. Copy words.txt to HDFS (give an appropriate path).
hadoop fs -copyFromLocal words.txt /user/cloudera/words.txt
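
You can quickly confirm the file landed in HDFS before going further:

$ hadoop fs -cat /user/cloudera/words.txt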

3. Create the mapper script.
wc_mapper.sh
--------------------------
#!/bin/bash
# Mapper: read input lines from stdin and emit "<word> 1" for each word.
# Hadoop streaming feeds each line of the input split to this script.
while read line
do
  for word in $line
  do
    echo "$word 1"
  done
done
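
Since a streaming mapper is just a script that reads stdin and writes stdout, you can sanity-check it locally before submitting anything to the cluster (a quick test on the sample file, assuming it sits in your current directory):

$ chmod +x wc_mapper.sh
$ cat words.txt | ./wc_mapper.sh
cow 1
india 1
japan 1
america 1
japan 1
...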


4. Create the reducer script.
wc_reducer.sh
------------------------
#!/bin/bash
# Reducer: the Hadoop shuffle delivers mapper output sorted by key, so
# identical words arrive on consecutive lines. Count each run of equal
# keys and print "<word>\t<count>" whenever the key changes.
cnt=0
old=''
new=''
start=0
while read line
do
  new=`echo $line | cut -d' ' -f1`
  if [ "$new" != "$old" ]; then
    # Key changed: flush the previous word's count (skip on the first line).
    [ $start -ne 0 ] && echo -e "$old\t$cnt"
    old=$new
    cnt=1
    start=1
  else
    cnt=$(( cnt + 1 ))
  fi
done
# Flush the final word, provided any input was read at all.
[ $start -ne 0 ] && echo -e "$old\t$cnt"
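
The reducer assumes its input arrives sorted by key, which is exactly what the Hadoop shuffle guarantees between the map and reduce phases. You can simulate the whole job locally by inserting a sort between the two scripts; on the sample input this should print:

$ chmod +x wc_reducer.sh
$ cat words.txt | ./wc_mapper.sh | sort | ./wc_reducer.sh
america    4
china    2
christian    1
cow    2
hindu    1
india    3
japan    2
muslim    1
pakistan    1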
  
5. Invoke the streaming job using the following command (give the proper path to the streaming jar).

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.3.0-mr1-cdh5.1.0.jar \
  -input /user/cloudera/words.txt \
  -output /user/cloudera/op_wc \
  -mapper wc_mapper.sh \
  -reducer wc_reducer.sh \
  -file wc_mapper.sh \
  -file wc_reducer.sh
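
Note: the streaming jar path above is specific to this CDH 5.1 setup; the jar name and location vary between Hadoop versions and distributions. If you are not sure where yours lives, search for it:

$ find / -name 'hadoop-streaming*.jar' 2>/dev/null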

6. Check the files created in HDFS.
$ hadoop fs -ls -R  /user/cloudera/op_wc
-rw-r--r--   1 cloudera cloudera          0 2015-02-18 03:27 /user/cloudera/op_wc/_SUCCESS
-rw-r--r--   1 cloudera cloudera         80 2015-02-18 03:27 /user/cloudera/op_wc/part-00000


$ hadoop fs -cat  /user/cloudera/op_wc/part-00000
america    4
china    2
christian    1
cow    2
hindu    1
india    3
japan    2
muslim    1
pakistan    1
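
One gotcha when re-running the job: MapReduce refuses to write into an output directory that already exists, so remove it first:

$ hadoop fs -rm -r /user/cloudera/op_wc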

------------------------------------





Done!!! Enjoy. :)

If you run into trouble, ping me at dhanooj.world@gmail.com





