Thursday, March 21, 2013

A basic Oozie coordinator job

Suppose you want to run your workflow every two hours or once a day. This is where a coordinator job comes in very handy. There are several other use cases for the Oozie coordinator, but today I'm just showing you how to write a very basic Oozie coordinator job.

I'm assuming that you are already familiar with Oozie and have a workflow ready to be used in a coordinator job. For this tutorial, my Oozie workflow is a shell-action workflow. I want to execute a shell script every two hours, starting today and continuing for the next 10 days. My workflow.xml is already in an HDFS directory.
Without the coordinator, I'm currently running it like this:
Here is my job.properties file:
Now I want to run this workflow with a coordinator. The Oozie Coordinator Engine is responsible for coordinator jobs, and its input is a Coordinator App. Each Coordinator App requires at least two files:
  1. coordinator.xml - The coordinator job is defined in this file. What triggers your workflow (time based or input based), how long it keeps running, the workflow wait time: all of this information needs to be written in the coordinator.xml file.
  2. coordinator.properties - Contains the properties for the coordinator job; it behaves the same as the job.properties file.
Based on my requirement, here is my coordinator.xml file:
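As a rough illustration (not the exact file from this setup), a coordinator.xml for a run every two hours over ten days might look like the sketch below, written here with a shell heredoc. The app name `shell-coord`, the `UTC` timezone, and the `startTime`, `endTime` and `workflowPath` properties are assumptions chosen for illustration; they would have to be defined in coordinator.properties.

```shell
# Write a minimal, hypothetical coordinator.xml into the current directory.
# The quoted 'EOF' keeps the shell from expanding the ${...} expressions,
# which are meant for Oozie, not for the shell.
cat > coordinator.xml <<'EOF'
<coordinator-app name="shell-coord"
                 frequency="${coord:hours(2)}"
                 start="${startTime}" end="${endTime}"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <!-- HDFS directory that holds workflow.xml (assumed property name) -->
      <app-path>${workflowPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
EOF
```

The generated file still has to be uploaded to the HDFS directory used as the coordinator application path, for example with `hadoop fs -put`.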
Since I need to pass a coordinator.properties file to the coordinator job, I cannot pass the previous job.properties file at the same time. That's why I need to move all the properties from job.properties to coordinator.properties. Remember one thing: the coordinator.properties file must have a property that specifies the location of the coordinator.xml file (similar to oozie.wf.application.path in job.properties). After moving those properties, my coordinator.properties file became:
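A coordinator.properties file along these lines could serve as a starting point. Only `oozie.coord.application.path` is taken from this post; the host names, ports, HDFS paths, dates, and the `workflowPath`, `startTime` and `endTime` property names are placeholders assumed for illustration.

```shell
# Write a hypothetical coordinator.properties into the current directory.
# The quoted 'EOF' leaves ${...} untouched so that Oozie resolves it later.
cat > coordinator.properties <<'EOF'
# Cluster endpoints (placeholders; use your own hosts and ports)
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default

# HDFS directory containing coordinator.xml
oozie.coord.application.path=${nameNode}/user/${user.name}/coordinator-app

# Values a coordinator.xml could reference as ${workflowPath},
# ${startTime} and ${endTime} (assumed names and values)
workflowPath=${nameNode}/user/${user.name}/shell-workflow
startTime=2013-03-21T00:00Z
endTime=2013-03-31T00:00Z
EOF
```

Note that this file sets the coordinator application path and leaves out oozie.wf.application.path, since the workflow location is given inside coordinator.xml.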

As you may have noticed, I set the application path with oozie.coord.application.path, and that path contains the coordinator.xml file.
I'm pretty much set now. When I execute a coordinator job, it runs the coordinator app located at the coordinator application path. The coordinator app has a tag <workflow><app-path>.... </app-path></workflow> which specifies the actual workflow location. At that location, I have my workflow.xml file. So that workflow.xml will be triggered according to how I define the job in the coordinator.xml file.

I'm submitting my coordinator job by:
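A typical submission with the Oozie command-line client looks roughly like the following. The server URL is a placeholder assumed for illustration, and the check for the `oozie` binary is only there so the snippet fails gently on a machine without the client installed.

```shell
# Hypothetical Oozie server URL; replace it with your own.
OOZIE_URL="http://localhost:11000/oozie"

# Submit and start the coordinator job described by coordinator.properties.
if command -v oozie >/dev/null 2>&1; then
  oozie job -oozie "$OOZIE_URL" -config coordinator.properties -run
else
  echo "oozie CLI not found on PATH" >&2
fi
```

On success the client prints a job ID, which can be passed to `oozie job -info` to check the status of the coordinator.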
Once your coordinator job is running successfully, I highly recommend that you go through this document and try out some other use cases and alternatives.


Note: For privacy reasons, I had to modify several lines in this post from my original version. So if you find that something is not working or you face any issues, please do not hesitate to contact me.

4 comments:

  1. Hi Tanzir,
    very nice article...
    We are trying to download Twitter data using Flume through a shell script. Can we schedule the shell script using Oozie? Can we set a start time and end time for the script without a frequency?

  2. Hi Vinita,
    Thank you for stopping by. Yes, you can definitely schedule the shell script using Oozie. If you check this post:

    http://www.tanzirmusabbir.com/2013/05/chunk-data-import-incremental-import-in.html

    You will see that I'm executing a shell script through Oozie.

    An Oozie coordinator job is meant for scheduling an Oozie workflow; it can be either time based or event based. If you do not want repetition, you can use a frequency that is longer than the gap between the start time and the end time; in that case the workflow will execute only once.

    Example:
    start time: 1:00 PM
    end time: 3:00 PM
    frequency: 130 (minutes)
    Any frequency greater than 120 minutes (2 hours) puts the second run past the end time, so the second job will not be executed.
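
    The arithmetic can be checked with a small shell sketch. It assumes that one action is created at the start time and then one every `frequency` minutes, for as long as the nominal time falls before the end time; the variable names are made up for illustration.

    ```shell
    # Times expressed as minutes since midnight.
    start_min=$((13 * 60))   # 1:00 PM
    end_min=$((15 * 60))     # 3:00 PM
    frequency=130            # minutes between runs

    # Count the nominal times that fall before the end time.
    runs=0
    t=$start_min
    while [ "$t" -lt "$end_min" ]; do
      runs=$((runs + 1))
      t=$((t + frequency))
    done

    echo "runs=$runs"   # prints "runs=1"
    ```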

    "frequency" is required if you use a start/end time, as per the Oozie schema:

    https://oozie.apache.org/docs/3.1.3-incubating/CoordinatorFunctionalSpec.html#Oozie_Coordinator_Schema_0.2

    Hope it helps. Let me know if you have questions.

    Thank you.

    Replies
    1. Hi Tanzir
      Apologies for the late reply,
      Thanks a lot.....
