Skip to main content

Posts

Showing posts from 2021

Hive sentry group and role mapping report generation

 All, Lets try to make our hadoop admin work easy by setting up automated sentry group role mapping which will help to list groups and roles in a easy way. It's very simple. create one script to generate html report and use httpd/tomcat to view it. Sample code would be like shown below.: beeline --silent=true --showHeader=false --outputformat=csv2 -u "jdbc:hive2://edge_node:10000/default;principal=hive/_HOST@Domain.COM" -e "show roles;"|awk '{ print "show grant role " $0 ";" }' >$RLE beeline --silent=true --showHeader=false  --outputformat=csv2 -u "jdbc:hive2://edge_node:10000/default;principal=hive/_HOST@ADomain" -f $RLE |sort |grep -v '^$' | awk -F',' '{ print $5","$1","$2","$7","$9}' >$GRT mysql sentry -u sentry -p"sentry_password" -e "SELECT SENTRY_ROLE.ROLE_NAME,SENTRY_GROUP.GROUP_NAME FROM SENTRY_ROLE_GROUP_MAP JOIN SENTRY_ROLE ON SENT...
Hive Database Schema usage report UI using html All, Here we are trying to generate Hive schema usage report using shell scripting and html. Prerequisites: edge node with access to hive. httpd /tomcat service is running on this node. Google chart api Create one script Hive_Rpt.sh , Let put high level steps here. declare variables for Log/stage/tmp files Get database list  using beeline/cmp Api calls , if you have single hive for multiple environments like dev/sit/prod , Please make necessary changes in  code to identify the environment. Describe database <database name> will provide the database location in hdfs capture that to variable  Also you can get hive table partition details using below query and format to html code. SELECT DBS.NAME DB_NAME, TBLS.TBL_NAME TABLE_NAME, COUNT(*) NO_OF_PARTITIONS FROM PARTITIONS JOIN TBLS ON (TBLS.TBL_ID = PARTITIONS.TBL_ID) JOIN DBS ON (DBS.DB_ID = TBLS.DB_ID) GROUP BY TBLS.TBL_ID ORDER BY NO_OF_PARTITIONS DESC; eg: mysql ...