YARN

YARN (Yet Another Resource Negotiator) is the resource manager for Hadoop.

  • It is a framework that provides computational resources to execution engines.

    • Here, computational resources refers to CPU, memory, disk I/O, and network bandwidth.

    • Example execution engines: Spark, MapReduce, Storm, Solr, Tez.
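To make the multi-dimensional notion of "computational resources" concrete, here is a minimal Python sketch of a resource vector and a check for whether a request fits on a node. The `Resources` class and its field names are hypothetical and are not YARN's API; in practice, stock YARN schedulers account mainly for memory and virtual cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; field names are illustrative, not YARN's API."""
    vcores: int
    memory_mb: int
    disk_io_mbps: int
    network_mbps: int

    def fits_in(self, available: "Resources") -> bool:
        # A request fits on a node only if every dimension fits.
        return (
            self.vcores <= available.vcores
            and self.memory_mb <= available.memory_mb
            and self.disk_io_mbps <= available.disk_io_mbps
            and self.network_mbps <= available.network_mbps
        )

    def __sub__(self, used: "Resources") -> "Resources":
        # Resources left on a node after granting `used`.
        return Resources(
            self.vcores - used.vcores,
            self.memory_mb - used.memory_mb,
            self.disk_io_mbps - used.disk_io_mbps,
            self.network_mbps - used.network_mbps,
        )


node = Resources(vcores=8, memory_mb=32768, disk_io_mbps=500, network_mbps=1000)
request = Resources(vcores=4, memory_mb=8192, disk_io_mbps=100, network_mbps=200)
print(request.fits_in(node))  # True
print(node - request)  # Resources(vcores=4, memory_mb=24576, disk_io_mbps=400, network_mbps=800)
```

Treating resources as a vector means a request can be rejected because of any single dimension, even when the others have plenty of room.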

Components of YARN: the following two daemons run on the cluster.

  • Resource Manager - one per cluster - the master service

    • Typically deployed in a high-availability (active/standby) configuration.

  • Node Manager - one per data node - the slave (worker) service

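    • The Node Manager is responsible for launching and monitoring containers. A container is an allocation of resources on a node in which a process runs; on Linux, the Node Manager can use control groups (cgroups) to limit and isolate a container's resource usage (CPU, memory, disk I/O, network, etc.).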
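The split between one cluster-wide Resource Manager and one Node Manager per node can be sketched as follows: each Node Manager tracks its free memory, its free vcores, and the containers it runs, and the Resource Manager places a request on the first node with enough room. This is a hypothetical model. The class and method names are made up, and first-fit placement stands in for YARN's real schedulers (such as the Capacity and Fair Schedulers), which are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeManager:
    """Hypothetical model of a Node Manager (one per data node)."""
    host: str
    free_memory_mb: int
    free_vcores: int
    # container_id -> (memory_mb, vcores) of each running container
    containers: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def start_container(self, container_id: str, memory_mb: int, vcores: int) -> None:
        if memory_mb > self.free_memory_mb or vcores > self.free_vcores:
            raise ValueError(f"not enough free resources on {self.host}")
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        self.containers[container_id] = (memory_mb, vcores)

    def release_container(self, container_id: str) -> None:
        # Return the finished container's resources to the node's free pool.
        memory_mb, vcores = self.containers.pop(container_id)
        self.free_memory_mb += memory_mb
        self.free_vcores += vcores


class ResourceManager:
    """Hypothetical model of the single, cluster-wide Resource Manager."""

    def __init__(self, node_managers: List[NodeManager]) -> None:
        self.node_managers = node_managers
        self._next_id = 1

    def allocate(self, memory_mb: int, vcores: int) -> Optional[Tuple[str, str]]:
        # First-fit placement: use the first node with enough free memory and vcores.
        for nm in self.node_managers:
            if nm.free_memory_mb >= memory_mb and nm.free_vcores >= vcores:
                container_id = f"container_{self._next_id:02d}"
                self._next_id += 1
                # Simplification: allocation and launch happen in one call here;
                # in YARN, the Application Master asks the Node Manager to launch
                # task containers that the Resource Manager has allocated.
                nm.start_container(container_id, memory_mb, vcores)
                return nm.host, container_id
        return None  # no node can host this request right now


cluster = [NodeManager("node1", 4096, 2), NodeManager("node2", 8192, 4)]
rm = ResourceManager(cluster)
print(rm.allocate(3072, 1))   # ('node1', 'container_01')
print(rm.allocate(2048, 1))   # ('node2', 'container_02'): node1 has only 1024 MB left
print(rm.allocate(16384, 2))  # None: no node has 16 GB free
```

Keeping per-node bookkeeping in the Node Manager and placement decisions in the Resource Manager mirrors the master/worker split described in the list above.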

When a job is submitted:

  • The job goes to the Resource Manager.

  • The Resource Manager picks one Node Manager and asks it to start a container. This first container runs the Application Master.

  • The Application Master takes responsibility for executing the job, and it behaves differently for different execution frameworks. The next steps are for MapReduce:

    • The Application Master asks the Resource Manager for more containers so it can run map and reduce tasks.

    • Once the containers are allocated, the Application Master asks the Node Managers to launch them and execute the tasks.

    • Each task reports its status directly to the Application Master.

    • Once all tasks are complete, all containers, including the Application Master, perform the necessary clean-up and terminate.
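The job flow above can be traced with a small, single-process Python simulation that only records who does what, and in which order. `run_mapreduce_job` and the event strings are hypothetical. No YARN client API or protocol is involved, and tasks run one after another for simplicity.

```python
from typing import List


def run_mapreduce_job(num_maps: int, num_reduces: int) -> List[str]:
    """Hypothetical simulation of the MapReduce-on-YARN job flow.

    Each step appends an event to a log so the order of responsibilities
    (client, Resource Manager, Application Master, tasks) is visible.
    """
    log = ["client: submit job to Resource Manager"]
    # The Resource Manager asks one Node Manager to start the first container,
    # which runs the Application Master.
    log.append("Resource Manager: ask a Node Manager to start container_01 (Application Master)")

    tasks = [f"map_{i}" for i in range(num_maps)] + [f"reduce_{i}" for i in range(num_reduces)]
    # The Application Master requests one container per task.
    log.append(f"Application Master: request {len(tasks)} containers from Resource Manager")

    pending = set(tasks)
    # Simplification: tasks run sequentially; in a real job, map tasks run in
    # parallel and reduce tasks consume the map output.
    for n, task in enumerate(tasks, start=2):
        container = f"container_{n:02d}"
        log.append(f"Application Master: ask Node Manager to launch {container} for {task}")
        # Each task reports its status directly to the Application Master.
        log.append(f"{task}: report SUCCEEDED to Application Master")
        pending.discard(task)

    if not pending:
        # Once every task is complete, all containers clean up and terminate.
        log.append("all containers (including Application Master): clean up and terminate")
    return log


for event in run_mapreduce_job(num_maps=2, num_reduces=1):
    print(event)
```

The Resource Manager appears only in the first steps: after the Application Master is running, per-job coordination and task status tracking stay with the Application Master.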

