All, Lets try to make our hadoop admin work easy by setting up automated sentry group role mapping which will help to list groups and roles in a easy way. It's very simple. create one script to generate html report and use httpd/tomcat to view it. Sample code would be like shown below.: beeline --silent=true --showHeader=false --outputformat=csv2 -u "jdbc:hive2://edge_node:10000/default;principal=hive/_HOST@Domain.COM" -e "show roles;"|awk '{ print "show grant role " $0 ";" }' >$RLE beeline --silent=true --showHeader=false --outputformat=csv2 -u "jdbc:hive2://edge_node:10000/default;principal=hive/_HOST@ADomain" -f $RLE |sort |grep -v '^$' | awk -F',' '{ print $5","$1","$2","$7","$9}' >$GRT mysql sentry -u sentry -p"sentry_password" -e "SELECT SENTRY_ROLE.ROLE_NAME,SENTRY_GROUP.GROUP_NAME FROM SENTRY_ROLE_GROUP_MAP JOIN SENTRY_ROLE ON SENT...
Hive Database Schema usage report UI using html All, Here we are trying to generate Hive schema usage report using shell scripting and html. Prerequisites: edge node with access to hive. httpd /tomcat service is running on this node. Google chart api Create one script Hive_Rpt.sh , Let put high level steps here. declare variables for Log/stage/tmp files Get database list using beeline/cmp Api calls , if you have single hive for multiple environments like dev/sit/prod , Please make necessary changes in code to identify the environment. Describe database <database name> will provide the database location in hdfs capture that to variable Also you can get hive table partition details using below query and format to html code. SELECT DBS.NAME DB_NAME, TBLS.TBL_NAME TABLE_NAME, COUNT(*) NO_OF_PARTITIONS FROM PARTITIONS JOIN TBLS ON (TBLS.TBL_ID = PARTITIONS.TBL_ID) JOIN DBS ON (DBS.DB_ID = TBLS.DB_ID) GROUP BY TBLS.TBL_ID ORDER BY NO_OF_PARTITIONS DESC; eg: mysql ...