Cloud for Beginners: Oozie Example: Java Action / MapReduce Jobs

Sunday, March 3, 2013

Oozie Example: Java Action / MapReduce Jobs

Running a Java action through Oozie is easy, but there are a few things to consider before you run one. In this tutorial, I'm going to execute a very simple Java action. I have a JAR file, TestMR.jar, which is a MapReduce application, so it will be executed on the Hadoop cluster as a MapReduce job.

TestMR.jar contains a class, TestMR, whose public static void main(String[] args) method starts the whole application. To run a Java action, you pass the name of this main class through the <main-class> tag.

This is the workflow.xml file for a Java action with the minimum number of parameters:
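
A minimal sketch of such a file, written out with a shell heredoc. The workflow and action names, the schema version, the output path, and the ${jobTracker}, ${nameNode} and ${outputDir} parameter names are assumptions for illustration, not values from the original application. The sketch also assumes TestMR lives in the default package, and it passes -r and 6 as two separate <arg> elements because Oozie hands each <arg> to main() as one argument; if your application expects the single string "-r 6", use one <arg> instead.

```shell
# Create the workflow application folder locally (name taken from this post).
mkdir -p java-action

# The quoted 'EOF' stops the shell from expanding Oozie's ${...} expressions.
cat > java-action/workflow.xml <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-action-wf">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <!-- remove the output folder left over from a previous run -->
                <delete path="${nameNode}/user/${wf:user()}/${outputDir}"/>
            </prepare>
            <main-class>TestMR</main-class>
            <!-- each <arg> becomes one element of args[] in main() -->
            <arg>-r</arg>
            <arg>6</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
```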

Your Java action has to be configured with <job-tracker> and <name-node>. As you know, Hadoop throws an exception if the output folder already exists. That's why I'm using the <prepare> tag, which deletes the output folder before execution. My JAR also takes command-line arguments. One of the arguments is "-r 6", which sets the number of reducers I want to use for the MR job, so I'm using the <arg> tag to pass command-line arguments. You can have multiple <arg> tags in a single Java action. As with other actions, the workflow takes the "ok" transition when the main Java application completes without any error. If it throws an exception, the workflow takes the "error" transition.

Now for the folder structure inside HDFS. When Oozie executes an action, it automatically adds all JAR files and native libraries from the "/lib" folder to the classpath. Here, "/lib" is a sub-folder of the Oozie workflow application path. So, if "java-action" is the workflow application path, the structure would be:
- java-action
- java-action/workflow.xml
- java-action/lib
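
The layout above can be staged locally and then copied to HDFS. This is only a sketch: it assumes TestMR.jar and workflow.xml sit in the current directory, that the hadoop client is on the PATH, and that the relative HDFS path resolves to your HDFS home directory.

```shell
APP_DIR=java-action

# Build the folder structure locally: workflow.xml at the top, JARs in lib/.
mkdir -p "$APP_DIR/lib"
if [ -f workflow.xml ]; then cp workflow.xml "$APP_DIR/"; fi
if [ -f TestMR.jar ]; then cp TestMR.jar "$APP_DIR/lib/"; fi

# Upload and list the result only when a Hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
    # Assumes the target folder does not exist in HDFS yet.
    hadoop fs -put "$APP_DIR" "$APP_DIR"
    # Older Hadoop releases use "hadoop fs -lsr" instead of "-ls -R".
    hadoop fs -ls -R "$APP_DIR"
else
    echo "hadoop client not found on PATH; skipping upload" >&2
fi
```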

In my HDFS, I have:

And here is my job.properties file:
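
A minimal sketch of a job.properties file, again written with a heredoc. The host names, ports and the outputDir parameter are placeholders I am assuming for illustration; only oozie.wf.application.path, which tells Oozie where the workflow application lives in HDFS, is a fixed Oozie property name.

```shell
# The quoted 'EOF' keeps ${nameNode} and ${user.name} literal for Oozie to resolve.
cat > job.properties <<'EOF'
# Hypothetical endpoints: replace with your cluster's NameNode and JobTracker
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021

# Hypothetical parameter: the output folder that <prepare> deletes
outputDir=java-action-output

# HDFS folder that holds workflow.xml and lib/
oozie.wf.application.path=${nameNode}/user/${user.name}/java-action
EOF
```
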
That's pretty much it! Now you can execute your workflow with:
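
A sketch of the submit command using the Oozie command-line client. The server URL is an assumption (11000 is the usual default port); point it at your own Oozie server.

```shell
# Hypothetical Oozie server URL; override by exporting OOZIE_URL beforehand.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

if command -v oozie >/dev/null 2>&1; then
    # Submits the workflow described by job.properties and starts it immediately.
    oozie job -oozie "$OOZIE_URL" -config job.properties -run
else
    echo "oozie client not found on PATH" >&2
fi
```

The command prints the ID of the new workflow job, which you can then pass to "oozie job -info" to check its status.
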
Remember, this is a very basic workflow for running a Java action through Oozie. You can do a lot more with the other options Oozie provides. Once you are able to run a simple workflow, I recommend going through the Oozie documentation and trying some workflows with different settings.

Consideration: Be careful about what you put inside your "/lib" folder. If the version of a library your application uses conflicts with the version of the same library that ships with Hadoop, the job will throw errors, and those errors are hard to track down. To avoid them, match your library files with the versions inside the "/usr/lib/[hadoop/hive/hbase/oozie]/lib" folders on your client node.
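
One rough way to spot such mismatches is to compare JAR file names. The helper below is a hypothetical sketch, not an Oozie or Hadoop tool: it assumes JARs are named artifact-version.jar and that a local copy of the workflow "lib" folder exists under java-action/lib. It only compares names, so it cannot detect conflicts between differently named JARs.

```shell
# Print JARs in the application lib folder whose artifact name also appears
# in a cluster lib folder under a different file name (a different version).
report_version_mismatches() {
    app_lib=$1
    cluster_lib=$2
    for jar in "$app_lib"/*.jar; do
        if [ ! -e "$jar" ]; then continue; fi        # folder has no JARs
        name=$(basename "$jar")
        base=${name%-[0-9]*}                         # commons-lang-2.6.jar -> commons-lang
        if [ "$base" = "$name" ]; then continue; fi  # no version suffix to compare
        for other in "$cluster_lib/$base"-[0-9]*.jar; do
            if [ ! -e "$other" ]; then continue; fi  # glob matched nothing
            other_name=$(basename "$other")
            if [ "$other_name" != "$name" ]; then
                echo "$name differs from $other_name"
            fi
        done
    done
}

# Check the local copy of the workflow lib folder against each client-side folder.
for component in hadoop hive hbase oozie; do
    report_version_mismatches java-action/lib "/usr/lib/$component/lib"
done
```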

Note: For privacy reasons, I had to modify several lines in this post from my original version. So if you find that something is not working, or you run into any issues, please do not hesitate to contact me.

11 comments:

Unknown, July 13, 2015 at 1:42 AM
As a beginner, I really find your blog very useful. Thanks for sharing.

ardeshana ankit k., January 19, 2016 at 6:53 AM
I got the EJ001 error code with "could not locate sharelib". It is compulsory to use sharelib to run a MapReduce job with Oozie.
Anonymous, March 24, 2016 at 1:08 AM

This site is excellent. I think it is the best one compared to the others, and there is more information to get after reading this post.

Uday, October 18, 2016 at 7:29 AM
Hi,

Did you create the .jar file and upload it to HDFS? If so, please let me know how you created the jar file.
Unknown, March 11, 2017 at 12:38 AM
Nice post. Thanks for sharing it.
srihariparu, June 16, 2017 at 6:38 AM
Thanks for sharing this informative article.

Unknown, July 6, 2017 at 12:48 AM
The information shared was very useful. My sincere thanks for sharing this post. Please continue to share posts like this.
nancy, September 23, 2017 at 9:00 AM
This is really an amazing blog with smart and cute content. Thanks for sharing an informative article.
Unknown, October 15, 2017 at 2:23 AM
This is excellent information. It is amazing and wonderful to visit your site. Thanks for sharing this information; it is useful to me.
venusha, December 7, 2017 at 1:35 AM
Nice post. Thanks for sharing your wonderful information. Keep updating.