
HUE - Hadoop User Experience

This initialization action installs the latest version of Hue on the master node of a Google Cloud Dataproc cluster.

Using this initialization action

⚠️ NOTICE: See the best practices for using initialization actions in production.

You can use this initialization action to create a new Dataproc cluster with Hue installed:

  1. Use the gcloud command to create a new cluster with this initialization action.

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/hue/hue.sh
  2. Once the cluster has been created, Hue runs on port 8888 of the master node. To connect to the Hue web interface, create an SSH tunnel and use a SOCKS5 proxy with your web browser, as described in the Dataproc web interfaces documentation; a simpler port-forwarding alternative is sketched below. In the opened browser, go to 'localhost:8888' and you should see the Hue UI.
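
A minimal sketch of the port-forwarding alternative (plain SSH local forwarding instead of the SOCKS5 proxy flow), assuming the default master node name ${CLUSTER_NAME}-m; replace <zone> with your cluster's zone:

    # Forward local port 8888 to Hue on the master node (leave this running)
    gcloud compute ssh ${CLUSTER_NAME}-m \
        --zone <zone> -- -L 8888:localhost:8888 -N

With the tunnel open, browse to http://localhost:8888 directly; no proxy configuration is needed.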

Important notes

  • If you wish to use Oozie in Hue, Oozie must be installed before this initialization action runs, e.g., place the Oozie initialization action before this one in the --initialization-actions list passed to gcloud dataproc clusters create, as in the sketch below.
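
A minimal sketch, assuming the Oozie initialization action is published at the usual oozie/oozie.sh path in the regional bucket (verify the path before use):

    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/oozie/oozie.sh,gs://goog-dataproc-initialization-actions-${REGION}/hue/hue.sh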

Example: Hue Hive (SQL UI) - basic SQL queries

  1. Create a warehouse bucket:

    export REGION=<REGION>
    export WAREHOUSE_BUCKET=<BUCKET_NAME>
    gsutil mb -l ${REGION} gs://${WAREHOUSE_BUCKET}
    
  2. Prepare the dataset: copy the sample dataset to your warehouse bucket:

    gsutil cp gs://hive-solution/part-00000.parquet \
        gs://${WAREHOUSE_BUCKET}/datasets/transactions/part-00000.parquet
    
  3. Create a Dataproc cluster using the hue.sh init script:

    export CLUSTER_NAME=<CLUSTER_NAME>
    export INIT_BUCKET=<INIT_BUCKET_NAME>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://${INIT_BUCKET}/hue.sh \
        --properties "hive:hive.metastore.warehouse.dir=gs://${WAREHOUSE_BUCKET}/datasets"
    
  4. Configure the Hue editor in the Dataproc cluster: SSH into the master node and edit hue.ini (typically /etc/hue/conf/hue.ini) as follows:

  5. Under the [beeswax] section, configure the Apache Hive connector by uncommenting hive_server_host and hive_server_port, as sketched below.
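
    A minimal sketch of the resulting [beeswax] section, assuming HiveServer2 runs on the master node and listens on the Hive default port 10000; replace the host with your master node's FQDN:

    [beeswax]
      # Host where HiveServer2 is running (the master node FQDN)
      hive_server_host=<master-node-fqdn>
      # Port where HiveServer2 listens (Hive default)
      hive_server_port=10000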

  6. Under the [[interpreters]] section, add the following block:

    [[[Beeswax]]]
      name=Hive
      interface=hiveserver2

  7. Restart Hue, as shown below (see also "Changes made to hue.ini are not effective after restart using systemctl" under Common issues):
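
    # Clean restart; systemctl restart can leave orphaned Hue processes (see Common issues)
    sudo /etc/init.d/hue force-stop && sudo /etc/init.d/hue start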

  8. Exit the master node.

  9. Connect to the Hue web interface: create an SSH tunnel and use a SOCKS5 proxy with your web browser, as described in the Dataproc web interfaces documentation (or use the port-forwarding sketch in "Using this initialization action" above). In the opened browser, go to 'localhost:8888' and you should see the Hue UI.

  10. In the Hue UI, create an external Hive table for the dataset. The Hue SQL editor does not expand shell variables, so substitute your bucket name for ${WAREHOUSE_BUCKET}:

    CREATE EXTERNAL TABLE transactions
    (SubmissionDate DATE, TransactionAmount DOUBLE, TransactionType STRING)
    STORED AS PARQUET
    LOCATION 'gs://${WAREHOUSE_BUCKET}/datasets/transactions';
    


  11. Run the following simple HiveQL query to select the first 10 transactions:

    SELECT *
    FROM transactions
    LIMIT 10;
    


  12. Run another query, counting transactions by type on a single day:

    SELECT TransactionType, COUNT(TransactionType) AS Count
    FROM transactions
    WHERE SubmissionDate = '2017-08-22'
    GROUP BY TransactionType;
    

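To sanity-check the table outside of Hue, you can run the same query with beeline from an SSH session on the master node; a minimal sketch, assuming HiveServer2 listens on the default port 10000:

    beeline -u jdbc:hive2://localhost:10000/default \
        -e "SELECT TransactionType, COUNT(TransactionType) AS Count
            FROM transactions
            WHERE SubmissionDate = '2017-08-22'
            GROUP BY TransactionType;"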


Common issues

  1. Missing configurations for the distributed Hadoop services


The hue.ini configuration file assumes a single-node cluster, with hostnames set to localhost and port numbers set to their defaults. For a distributed system, the sections for each Hadoop service must be updated with the correct hostnames and ports of the servers on which those services run.

The hue.sh script performs some generic updates to the hostnames, such as replacing all occurrences of localhost with the fully qualified domain name (FQDN), but additional configuration may be required depending on the available services, as described in the documentation for How to configure Hue for your Hadoop cluster; a sketch of such a manual fix-up follows.
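
For illustration, a minimal sketch of pointing the HDFS and YARN sections of hue.ini at their actual hosts; the exact keys and ports depend on your Hue and Hadoop versions, so treat the values below as placeholders:

    [hadoop]
      [[hdfs_clusters]]
        [[[default]]]
          # NameNode RPC endpoint
          fs_defaultfs=hdfs://<namenode-fqdn>:8020
          # WebHDFS endpoint (port 9870 on Hadoop 3, 50070 on Hadoop 2)
          webhdfs_url=http://<namenode-fqdn>:9870/webhdfs/v1
      [[yarn_clusters]]
        [[[default]]]
          resourcemanager_host=<resourcemanager-fqdn>
          # ResourceManager web UI, default port 8088
          resourcemanager_api_url=http://<resourcemanager-fqdn>:8088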

  2. Changes made to hue.ini are not effective after restart using systemctl

    systemctl restart hue.service leaves behind orphaned processes; use the following command instead to perform a clean restart of Hue:

    sudo /etc/init.d/hue force-stop && sudo /etc/init.d/hue start

  3. "User [hue] not defined as proxyuser" when integrated with Oozie

    This is caused by a missing proxy-user (impersonation) configuration in Oozie for the logged-in user.
    hue.sh only adds the default hue user as a proxy user, so all other users must be added as well.

    1. Identify the impacted user. For example, in the error message below, the doAs user is `hueuser`:

        401 Client Error: Unauthorized for url: http://<redacted>:11000/oozie/v1/jobs?len=100&doAs=hueuser&filter=user=hueadmin;startcreatedtime=-7d&user.name=hue&offset=1&timezone=America/Los_Angeles&jobtype=wf {"errorMessage":"User [hue] not defined as proxyuser","httpStatusCode":401} (error 401)

    2. Add the following properties to oozie-site.xml (typically /etc/oozie/conf/oozie-site.xml on the master node), replacing #USER# with the impacted user identified in step 1:

    <property>
        <name>oozie.service.ProxyUserService.proxyuser.#USER#.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.#USER#.groups</name>
        <value>*</value>
    </property>
    
    3. Restart Oozie (the full command sequence is sketched after this list):

      1. Identify the Oozie process PID
      2. Stop the Oozie process with sudo systemctl stop oozie.service; if it does not stop successfully, use kill -9 <oozie-process-PID> to kill the process
      3. Confirm that the process is not running: ps -ef | grep oozie
      4. Restart the Oozie service: sudo systemctl restart oozie.service
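
Put together, a sketch of the restart sequence as shell commands; the kill -9 step is only needed if the stop hangs:

    # Identify the Oozie process PID
    ps -ef | grep oozie
    # Stop the service
    sudo systemctl stop oozie.service
    # If the process is still running, kill it (replace <oozie-process-PID>)
    # sudo kill -9 <oozie-process-PID>
    # Confirm the process is gone, then start Oozie again
    ps -ef | grep oozie
    sudo systemctl restart oozie.service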