Friday, March 1, 2013

Oozie Example: Hive Actions

Running Hive through Oozie is pretty straightforward, and it keeps getting simpler. The first time I used it (with older versions), I ran into several issues, mostly related to the classpath, though I resolved them. With more recent versions (Hive 0.8+, Oozie 3.3.2+), I hit only one or two issues at most.

In this example, I'm going to execute a very simple Hive script through Oozie. I have a Hive table "temp" that is currently empty. The script will load some data from HDFS into that Hive table.
Here is the content of script.hql:
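A minimal sketch of what such a script could look like. This is an assumption based on the description above, not necessarily the exact script: a single LOAD DATA statement whose source path comes from a variable named INPUT_PATH. It is wrapped in a Python string so the substitution can be tried locally; `string.Template` only imitates what Hive's own variable substitution does at run time.

```python
from string import Template

# Hypothetical contents of script.hql: load files from an HDFS path into "temp".
# ${INPUT_PATH} is expected to be supplied by the Hive action in the workflow.
SCRIPT_HQL = "LOAD DATA INPATH '${INPUT_PATH}' INTO TABLE temp;"


def render_script(script: str, params: dict) -> str:
    """Imitate variable substitution locally; Hive does this itself at run time."""
    return Template(script).substitute(params)


print(render_script(SCRIPT_HQL, {"INPUT_PATH": "/hive-input/temp"}))
# prints: LOAD DATA INPATH '/hive-input/temp' INTO TABLE temp;
```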
Now you need to set up your Oozie workflow app folder. One file is essential for executing a Hive action through Oozie: hive-site.xml. When Oozie executes a Hive action, it needs Hive's configuration file. You can provide multiple configuration files in a single action. You can find your Hive configuration file at "/etc/hive/conf.dist/hive-site.xml" (default location). Copy that file and put it inside your workflow application path in HDFS. Here is the list of files that I have in my Oozie Hive action's workflow application folder.
And here is my workflow.xml file:
Look at the <job-xml> tag: since I put hive-site.xml in my application path, I pass just the file name rather than the full location. If you want to keep that file somewhere else in HDFS, you can pass the full HDFS path there instead. In older versions of Hive, users had to provide the hive-default.xml file through the property key oozie.hive.defaults when running an Oozie Hive action, but from Hive 0.8 onward this is no longer required.

Here I'm also using the <param> tag, which is not required; I'm using it just to show how to pass a parameter between the Hive script, the job properties, and the workflow. Any parameter variable used inside your Hive script has to be passed through the Hive action. You can do either:
  • <param>INPUT_PATH=${inputPath}</param> (where inputPath can be passed through job properties), or
  • <param>INPUT_PATH=/user/ambari-qa/input/temp</param>
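To make the <job-xml>, <script> and <param> elements concrete, here is a minimal sketch of a workflow.xml with one Hive action, embedded in Python so it can be parsed and checked. The app name, node names and schema versions (`uri:oozie:workflow:0.4`, `uri:oozie:hive-action:0.2`) are assumptions for illustration and may differ from the workflow used in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow.xml: app name, node names and schema versions are assumptions.
WORKFLOW_XML = """<workflow-app name="hive-example-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <script>script.hql</script>
            <param>INPUT_PATH=${inputPath}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def hive_action_summary(xml_text: str) -> dict:
    """Collect the job-xml files, script name and params of the Hive action."""
    summary = {"job_xml": [], "script": None, "params": {}}
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        text = (element.text or "").strip()
        if name == "job-xml":
            summary["job_xml"].append(text)
        elif name == "script":
            summary["script"] = text
        elif name == "param":
            key, _, value = text.partition("=")
            summary["params"][key] = value
    return summary


print(hive_action_summary(WORKFLOW_XML))
```

Parsing the file this way is only a local sanity check that the XML is well-formed and that the parameter name matches the one used in the Hive script; it does not validate against the Oozie schema.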

In my HDFS, the "/hive-input/temp" folder contains the files that need to be loaded into the Hive table:
And here is my job.properties file:
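As an illustration of how a value from job.properties ends up in the <param> element, here is a small sketch. The property values (host names, ports, application path) are placeholders I am assuming, and the `resolve` helper only mimics Oozie's `${name}` substitution in a simplified, single-pass way.

```python
import re

# Hypothetical job.properties; host names, ports and the application path are placeholders.
JOB_PROPERTIES = """\
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/hive-action
inputPath=/hive-input/temp
"""

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value: str, props: dict) -> str:
    """Replace ${name} with props[name]; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), value)


props = parse_properties(JOB_PROPERTIES)
print(resolve("INPUT_PATH=${inputPath}", props))
# prints: INPUT_PATH=/hive-input/temp
```

Names the sketch does not know, such as `${user.name}`, are left as they are; Oozie fills those in itself.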
That's it! You can now run your Hive workflow by executing this on the client node:
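For reference, submitting a workflow with the Oozie command-line client usually takes the form `oozie job -oozie <server-url> -config job.properties -run`. The sketch below just assembles that command; the server URL (localhost with port 11000) is an assumption and should be replaced with your own Oozie server.

```python
import shlex


def oozie_run_command(oozie_url: str, config: str = "job.properties") -> list:
    """Build the argument list for submitting and starting a workflow."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config, "-run"]


# The URL is a placeholder for your Oozie server.
print(shlex.join(oozie_run_command("http://localhost:11000/oozie")))
# prints: oozie job -oozie http://localhost:11000/oozie -config job.properties -run
```

The list form can be handed to `subprocess.run` on the client node if you prefer to launch the job from a script.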

Two common issues:
You might run into problems if the required jar files are not present in the "/user/oozie/share/lib/hive" folder (HDFS). One common issue is that the hcatalog* jar files are missing from that folder. In that case you will see something like this in the log:
In that case, you need to manually copy the required jar files into that folder. You can do that as follows:
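One way to do the copy is with `hadoop fs -put`, which accepts one or more local files followed by an HDFS destination. The sketch below only builds that command; the local jar path is hypothetical, so point it at wherever the hcatalog jars live on your node.

```python
import shlex

SHARELIB_HIVE = "/user/oozie/share/lib/hive"  # share lib folder named in this post


def put_jars_command(local_jars: list, dest: str = SHARELIB_HIVE) -> list:
    """Build a 'hadoop fs -put' command that uploads the given jars to dest."""
    if not local_jars:
        raise ValueError("no jar files given")
    return ["hadoop", "fs", "-put", *local_jars, dest]


# Hypothetical local jar location; adjust to your installation.
print(shlex.join(put_jars_command(["/tmp/jars/hcatalog-core.jar"])))
# prints: hadoop fs -put /tmp/jars/hcatalog-core.jar /user/oozie/share/lib/hive
```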

Another common issue you might face is:
SemanticException [Error 10001]: Table not found
Even though you can see that your table exists, you might get this error when running through Oozie. Most of the time it happens because Hive is not pointing to the right metastore, and the problem goes away when you copy the correct hive-site.xml into the Hive lib folder in HDFS. Check your hive-site.xml to make sure all properties are set correctly, for example "hive.metastore.uris", "javax.jdo.option.ConnectionURL", and "javax.jdo.option.ConnectionDriverName". However, other users and I (see "Hive action failing in oozie") also found that this error message is ambiguous and doesn't give much insight: if the expected jar files are not present in the share lib folder, Hive throws the same error message! So be careful about what you have on the classpath when running Hive through Oozie.
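The property check mentioned above can be sketched as a small script that reads a hive-site.xml and reports which of the three properties are missing or empty. The sample configuration and its values are made up for illustration, and which properties you really need depends on whether you use a remote or a local metastore.

```python
import xml.etree.ElementTree as ET

REQUIRED = (
    "hive.metastore.uris",
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
)


def read_hadoop_conf(xml_text: str) -> dict:
    """Read <property><name>/<value> pairs from a Hadoop-style configuration file."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def missing_or_empty(conf: dict, required=REQUIRED) -> list:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not conf.get(key)]


# Made-up sample; in practice pass the text of your own hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>"""

print(missing_or_empty(read_hadoop_conf(SAMPLE)))
# prints: ['javax.jdo.option.ConnectionURL']
```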

Note: For privacy reasons, I had to modify several lines in this post from my original post. If you find that something is not working or you run into any issues, please do not hesitate to contact me.

8 comments:

  1. Hi,
    I want to run multiple create table scripts using a Hive action in Oozie. I have the job properties, the script, and the workflow.xml ready. After validating the XML, when I run the oozie command to run the workflow, I get the following error:
    4407 [main] ERROR org.apache.hadoop.hive.ql.exec.Task - FAILED: Error in metadata: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction)
    org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction)
    Please help.

    Replies
    1. Hi..
      Sorry for my late response. Please post/email your job.properties, workflow.xml and other relevant files. I can check your files and get back to you soon. Thank you.

      - Tanzir

  2. Asalaamu-Alaikum Tanzir,

    Is there any blog post on how to run Oozie on a YARN cluster? If yes, please share it or guide me.

    Regards, Mohammed Niaz

    Replies
    1. WS Niaz,
      I didn't get a chance to write about that, but we do run Oozie on YARN every day. It's almost the same. What issues are you facing? Feel free to send me an email or post your error log here.

  3. Hi,
    I have tried to create my own Oozie workflow as well. I'm currently using Hive actions, but I am getting an error on the Oozie job. It always shows the following:
    <<< Invocation of Hive command completed <<<

    Hadoop Job IDs executed by Hive:

    Intercepting System.exit(12)

    <<< Invocation of Main class completed <<<

    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [12]

    Oozie Launcher failed, finishing Hadoop job gracefully



    Please help! :(

  4. Here is the error I am getting when I try to run an Oozie workflow for a simple Hive action.
    Please help me out here.
    [IllegalArgumentException: Wrong FS: hdfs://nameservice1/user/oozie/share/lib/lib_20151217111633/hive/ST4-4.0.4.jar, expected: hdfs://dbplidenn01.us.dnb.com:8020]
    org.apache.oozie.action.ActionExecutorException: IllegalArgumentException: Wrong FS: hdfs://nameservice1/user/oozie/share/lib/lib_20151217111633/hive/ST4-4.0.4.jar, expected: hdfs://dbplidenn01.us.dnb.com:8020
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:401)
