Oozie is a workflow scheduler system for managing Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Spark, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts). Oozie is a scalable, reliable and extensible system.
In this blog, we will learn how to transfer data between an on-premises Hadoop cluster and Amazon S3, how to perform advanced data analytics on that data in Databricks Cloud, and how to orchestrate the resulting data pipelines end to end using Apache Oozie.
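To make the orchestration concrete, here is a minimal sketch of an Oozie workflow definition that copies data from HDFS to S3 with a DistCp action via the s3a connector. The bucket name (my-bucket), the HDFS source path, and the workflow name are hypothetical placeholders for illustration; a real deployment would also need the S3 credentials and s3a configuration set up on the cluster.

```xml
<!-- Minimal sketch: copy an HDFS directory to a hypothetical S3 bucket.
     Assumes jobTracker/nameNode are supplied via job.properties. -->
<workflow-app name="hdfs-to-s3-copy" xmlns="uri:oozie:workflow:0.5">
    <start to="copy-to-s3"/>

    <!-- DistCp action: source is the on-prem HDFS path, target is S3 (s3a) -->
    <action name="copy-to-s3">
        <distcp xmlns="uri:oozie:distcp-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <arg>${nameNode}/data/events</arg>
            <arg>s3a://my-bucket/events</arg>
        </distcp>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <fail name="fail">
        <message>DistCp failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </fail>
    <end name="end"/>
</workflow-app>
```

A Coordinator definition could then trigger this workflow on a schedule or when new input data lands, which is the pattern explored in the rest of this post.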