
Posts

Hive sentry group and role mapping report generation

All, let's try to make our Hadoop admin work easier by setting up an automated Sentry group-to-role mapping report that lists groups and roles in an easy way. It's very simple: create one script to generate an HTML report and use httpd/Tomcat to view it. Sample code is shown below:

beeline --silent=true --showHeader=false --outputformat=csv2 \
  -u "jdbc:hive2://edge_node:10000/default;principal=hive/_HOST@Domain.COM" \
  -e "show roles;" | awk '{ print "show grant role " $0 ";" }' > $RLE

beeline --silent=true --showHeader=false --outputformat=csv2 \
  -u "jdbc:hive2://edge_node:10000/default;principal=hive/_HOST@ADomain" \
  -f $RLE | sort | grep -v '^$' | awk -F',' '{ print $5","$1","$2","$7","$9 }' > $GRT

mysql sentry -u sentry -p"sentry_password" -e "SELECT SENTRY_ROLE.ROLE_NAME,SENTRY_GROUP.GROUP_NAME FROM SENTRY_ROLE_GROUP_MAP JOIN SENTRY_ROLE ON SENT...
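A minimal sketch of the HTML step (the report path and the column headers are assumptions; the fields follow the awk extraction above):

#!/bin/bash
# Minimal sketch: wrap the comma-separated grant list ($GRT) into an HTML
# table that httpd/Tomcat can serve. RPT path and headers are assumptions.
RPT=/var/www/html/sentry_report.html
{
  echo "<html><body><h2>Sentry Role Grants</h2><table border=1>"
  echo "<tr><th>Role</th><th>Database</th><th>Table</th><th>Privilege</th><th>Grant Time</th></tr>"
  # emit one table row per CSV line
  awk -F',' '{ printf "<tr><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>\n", $1, $2, $3, $4, $5 }' "$GRT"
  echo "</table></body></html>"
} > "$RPT"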
Recent posts
Hive Database Schema usage report UI using html

All, here we are trying to generate a Hive schema usage report using shell scripting and HTML.

Prerequisites: an edge node with access to Hive; an httpd/Tomcat service running on this node; the Google Charts API.

Create one script, Hive_Rpt.sh. The high-level steps:

1. Declare variables for log/stage/tmp files.
2. Get the database list using beeline/CM API calls. If you have a single Hive serving multiple environments like dev/sit/prod, please make the necessary changes in the code to identify the environment.
3. DESCRIBE DATABASE <database name> will provide the database location in HDFS; capture that into a variable (see the sketch after this post).
4. You can also get Hive table partition details using the query below and format it into HTML code.

SELECT DBS.NAME DB_NAME, TBLS.TBL_NAME TABLE_NAME, COUNT(*) NO_OF_PARTITIONS
FROM PARTITIONS
JOIN TBLS ON (TBLS.TBL_ID = PARTITIONS.TBL_ID)
JOIN DBS ON (DBS.DB_ID = TBLS.DB_ID)
GROUP BY TBLS.TBL_ID
ORDER BY NO_OF_PARTITIONS DESC;

eg: mysql ...
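A minimal sketch of step 3, assuming beeline connectivity as in the other posts (the JDBC URL is an assumption, and the DESCRIBE DATABASE column order can vary by Hive version):

#!/bin/bash
# For each database, capture its HDFS location and space usage for the report.
URL="jdbc:hive2://edge_node:10000/default;principal=hive/_HOST@Domain.COM"
for db in $(beeline --silent=true --showHeader=false --outputformat=csv2 -u "$URL" -e "show databases;"); do
  # csv2 DESCRIBE DATABASE output: db_name,comment,location,owner_name,owner_type
  loc=$(beeline --silent=true --showHeader=false --outputformat=csv2 -u "$URL" \
        -e "describe database $db;" | awk -F',' '{ print $3 }')
  # hdfs dfs -du -s prints: <size> <disk-space-consumed> <path>
  size=$(hdfs dfs -du -s "$loc" | awk '{ print $1 }')
  echo "$db,$loc,$size"
done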

How to get Hive table size from the metastore (MySQL)

select
    d.name                            as db_name,
    t.tbl_name                        as tbl_name,
    from_unixtime(min(t.create_time)) as create_time,
    min(t.owner)                      as owner,
    min(case when tp.param_key = 'COLUMN_STATS_ACCURATE' then tp.param_value                end) as COLUMN_STATS_ACCURATE,
    min(case when tp.param_key = 'last_modified_by'      then tp.param_value                end) as last_modified_by,
    min(case when tp.param_key = 'last_modified_time'    then from_unixtime(tp.param_value) end) as last_modified_time,
    min(case when tp.param_key = 'numFiles'              then tp.param_value                end) as numFiles,
    min(case when tp.param_key = 'numRows'               th...
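If you only need the size, here is a minimal sketch against the metastore (the metastore database name, user, and password are assumptions; totalSize is in bytes and is populated only when table stats exist):

mysql metastore -u hive -p"hive_password" -e "
SELECT d.NAME AS db_name, t.TBL_NAME AS tbl_name, tp.PARAM_VALUE AS total_size_bytes
FROM DBS d
JOIN TBLS t          ON t.DB_ID  = d.DB_ID
JOIN TABLE_PARAMS tp ON tp.TBL_ID = t.TBL_ID
WHERE tp.PARAM_KEY = 'totalSize'
ORDER BY CAST(tp.PARAM_VALUE AS UNSIGNED) DESC;"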

MySQL replication - Master Slave Easy way with Crash test sample

I assume we have Master and Slave machines with MySQL installed on both, and server-id set to 1 on the Master and 2 on the Slave.

MySQL replication steps:

On Master: stop all transactions.

mysql> FLUSH TABLES WITH READ LOCK;
mysql> show master status;
+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| binlog.000005 |  4913710 |              |                  |
+---------------+----------+--------------+------------------+

Take a mysqldump of the Master:

$ mysqldump -u root -p --all-databases --master-data > dbdump.sql
mysql> unlock tables;

Transfer the dump file to the slave host:

scp dbdump.sql usr@slave:/tmp/

On Slave:

[usr@slave ~]$ ls -ltr
-rwx------ 1 usr usr 57319214 Nov  6 06:06 dbdump.sql...
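The excerpt cuts off here; the usual remaining slave-side steps are sketched below (master_host and the repl_user account are assumptions; the binlog file and position come from the SHOW MASTER STATUS output above). Because the dump was taken with --master-data, the coordinates are already embedded in dbdump.sql, so the explicit MASTER_LOG_FILE/MASTER_LOG_POS values are needed only if that option was skipped.

$ mysql -u root -p < /tmp/dbdump.sql

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='master_host',
    ->   MASTER_USER='repl_user',
    ->   MASTER_PASSWORD='repl_password',
    ->   MASTER_LOG_FILE='binlog.000005',
    ->   MASTER_LOG_POS=4913710;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G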
Connect to Impala from an Ubuntu VM using a Kerberos keytab

On the client machine (Ubuntu VM):

1) Download the unixODBC driver: http://www.unixodbc.org/unixODBC-2.3.4.tar.gz
   ./configure
   make
   make install

2) Install the Cloudera Impala ODBC driver (clouderaimpalaodbc_2.5.32.1002-2_amd64.deb):
   http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-32.html
   dpkg -i clouderaimpalaodbc_2.5.32.1002-2_amd64.deb

3) Copy /etc/odbc.ini and /etc/odbcinst.ini to the home directory.

4) cat /home/user_impala/odbc.ini

[ODBC DATA Sources]
Impala_DSN=Cloudera Impala ODBC Driver 64-bit

[Impala_DSN]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=Impala.dev.Server.com
Port=21050
Database=iris
AuthMech=1
KrbRealm=DEV.SERVER.COM
KrbFQDN=Impala.dev.Server.com
KrbServiceName=impala
UID=user_impala
UseKeytab=1
#UPNKeytabMappingFile=/home/user_i...
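To verify the DSN, a minimal sketch (the keytab path and principal are assumptions; isql ships with unixODBC):

# get a Kerberos ticket from the keytab, then open the DSN
$ kinit -kt /home/user_impala/user_impala.keytab user_impala@DEV.SERVER.COM
$ klist
$ isql -v Impala_DSN
SQL> select 1;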

Hadoop Yarn MR(MapReduce) streaming using Shell script part 2

Friends, this is a streaming MapReduce job (shell script) that reads any text input and computes the average length of all words that start with each character.
---------------------------------------------------------------------------
$ cat avg_ln_mpr.sh
#! /bin/bash
# mapper: emit "<first character> <word length>" for every word
while read line
do
  for word in `echo $line`
  do
    c=`expr substr $word 1 1`
    l=`expr length $word`
    echo $c $l
  done
done
---------------------------------------------------------------------------
$ cat avg_ln_rdr.sh
#! /bin/bash
# reducer: input arrives sorted by key, so average each run of equal keys
old=''
new=''
val=''
cnt=1
sum=0
avg=0
start=0
while read line
do
  new=`echo $line|cut -d' ' -f1`
  val=`echo $line|cut -d' ' -f2`
  if [ "$old" != "$new" ]; then
    [ $start -ne 0 ] &...
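The excerpt cuts off inside the reducer. Since the streaming framework sorts mapper output by key before it reaches the reducer, the pair can be smoke-tested locally by putting a sort between the two scripts:

$ cat words.txt | ./avg_ln_mpr.sh | sort | ./avg_ln_rdr.sh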

Hadoop Yarn MR(MapReduce) streaming using Shell script

Hello friends, let's check how to run one simple MapReduce program in a Linux environment. It's a word count program.

1. Create file words.txt with a few words like shown below.

words.txt
--------------------------------
cow india japan america japan hindu muslim christian india cow
america america america china india china pakistan

2. cp words.txt to HDFS (give the appropriate path):

hadoop fs -copyFromLocal words.txt /user/cloudera/words.txt

3. Create the mapper:

wc_mapper.sh
--------------------------
#! /bin/bash
# mapper: emit "<word> 1" for every word
while read line
do
  for word in $line
  do
    echo $word 1
  done
done

4. Create the reducer:

wc_reducer.sh
------------------------
#! /bin/bash
# reducer: keys arrive sorted, so count each run of equal words
cnt=0
old=''
new=''
start=0
while read line
do
  new=`echo $line|cut -d' ' -f1`
  if [ "$new" != "$old" ]; then
    [ $start -ne 0 ] && echo -e "$old\t$cnt"
    old=$new
    cnt=1
    start=1
  else
    cnt=$(( $cnt + 1 ))
  fi;
done
echo -e ...
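To submit it, a minimal sketch of the streaming invocation (the hadoop-streaming jar path is an assumption for a Cloudera-style layout and varies by distribution):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -input  /user/cloudera/words.txt \
  -output /user/cloudera/wc_out \
  -mapper  wc_mapper.sh \
  -reducer wc_reducer.sh \
  -file wc_mapper.sh \
  -file wc_reducer.sh

# view the result
hadoop fs -cat /user/cloudera/wc_out/part-*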