
Posts

Showing posts from 2015

Hadoop Yarn MR(MapReduce) streaming using Shell script part 2

Friends, this is a streaming MapReduce job (shell scripts) that reads any text input and computes the average length of the words that start with each character.

---------------------------------------------------------------
$ cat avg_ln_mpr.sh
#! /bin/bash
while read line
do
  for word in `echo $line`
  do
    c=`expr substr $word 1 1`
    l=`expr length $word`
    echo $c $l
  done
done
---------------------------------------------------------------
$ cat avg_ln_rdr.sh
#! /bin/bash
old=''
new=''
val=''
cnt=1
sum=0
avg=0
start=0
while read line
do
new=`echo $line|cut -d' ' -f1`
val=`echo $line|cut -d' ' -f2`
if [ "$old" != "$new" ]; then
[ $start -ne 0 ] &...
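The excerpt above is cut off partway through the reducer, so the following is only a minimal sketch of the same idea, not the original avg_ln_rdr.sh. It assumes the mapper emits "first-character length" pairs and that the pairs reach the reducer sorted by key, which a plain `sort` stands in for here. The function names `avg_map` and `avg_reduce` are hypothetical, and the average uses integer division for simplicity.

```shell
# Sketch only: simulates the map -> sort -> reduce flow locally, without Hadoop.
# Function bodies run in subshells "( ... )" so that "set -f" and the working
# variables do not leak into the calling shell.

# Mapper: emit "<first-char> <length>" for every whitespace-separated word.
avg_map() (
  set -f                                # no filename expansion when splitting $line
  while read -r line; do
    for word in $line; do
      first=${word%"${word#?}"}         # strip everything after the first character
      echo "$first ${#word}"
    done
  done
)

# Reducer: expects input sorted by key; prints "<char> <integer average>".
avg_reduce() (
  old='' sum=0 cnt=0
  while read -r key len; do
    [ -n "$key" ] || continue           # ignore blank lines
    if [ "$key" != "$old" ]; then
      # Key changed: flush the totals of the previous key, if there was one.
      if [ "$cnt" -gt 0 ]; then echo "$old $((sum / cnt))"; fi
      old=$key sum=0 cnt=0
    fi
    sum=$((sum + len))
    cnt=$((cnt + 1))
  done
  # Flush the last key after the input ends.
  if [ "$cnt" -gt 0 ]; then echo "$old $((sum / cnt))"; fi
)
```

For example, `printf 'apple avocado banana\nblue cat\n' | avg_map | LC_ALL=C sort | avg_reduce` prints `a 6`, `b 5` and `c 3` on separate lines. The final flush after the loop matters: without it the last key would never be printed.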

Hadoop Yarn MR(MapReduce) streaming using Shell script

Hello friends, let's check how to run a simple MapReduce program in a Linux environment. It's a word count program.

1. Create the file words.txt with a few words, as shown below.

words.txt
--------------------------------
cow india japan america japan hindu muslim christian india cow america america america china india china pakistan

2. Copy words.txt to HDFS (give the appropriate path).

hadoop fs -copyFromLocal words.txt /user/cloudera/words.txt

3. Create the mapper script.

wc_mapper.sh
--------------------------
#! /bin/bash
while read line
do
  for word in $line
  do
    echo $word 1
  done
done

4. Create the reducer script.

wc_reducer.sh
------------------------
#! /bin/bash
cnt=0
old=''
new=''
start=0
while read line
do
new=`echo $line|cut -d' ' -f1`
if [ "$new" != "$old" ]; then
[ $start -ne 0 ] && echo -e "$old\t$cnt"
old=$new
cnt=1
start=1
else
cnt=$(( $cnt + 1 ))
fi;
done
echo -e ...
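Before submitting a job like this to the cluster, it can help to check the mapper and reducer logic locally. The sketch below assumes the same "word 1" record format and tab-separated output as the scripts above, and uses a plain `sort` in place of the shuffle phase. The function names `wc_map` and `wc_reduce` are hypothetical stand-ins, not the original wc_mapper.sh and wc_reducer.sh.

```shell
# Sketch only: a local stand-in for the word-count mapper and reducer.
# Function bodies run in subshells "( ... )" to keep variables and "set -f" local.

# Mapper: emit "<word> 1" for every whitespace-separated word.
wc_map() (
  set -f                                # no filename expansion when splitting $line
  while read -r line; do
    for word in $line; do
      echo "$word 1"
    done
  done
)

# Reducer: expects input sorted by word; prints "<word><TAB><count>".
wc_reduce() (
  old='' cnt=0
  while read -r key one; do
    [ -n "$key" ] || continue           # ignore blank lines
    if [ "$key" != "$old" ]; then
      # Word changed: print the finished count for the previous word.
      if [ -n "$old" ]; then printf '%s\t%s\n' "$old" "$cnt"; fi
      old=$key cnt=0
    fi
    cnt=$((cnt + one))
  done
  # Print the last word after the input ends.
  if [ -n "$old" ]; then printf '%s\t%s\n' "$old" "$cnt"; fi
)
```

With the sample words from step 1 on standard input, `wc_map | LC_ALL=C sort | wc_reduce` produces nine lines, starting with `america` and a count of 4.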