Sunday, September 16, 2012

hadoop jobs waterflow

Hi all, sometimes we can solve our task with just one map-reduce job. However, most real-world tasks need a flow of several jobs. Here is a simple example of how you can create such a flow. Let's examine the following scheme:
So, we are going to start two M/R jobs at the same time, then accumulate their results in a third job and pass them on to a 4th. Awesome, let's do it. I'll use Google Guava to deal with Java collections. args are the arguments for the job driver; feel free to use Options or whatever.
    import java.util.List;

    import com.google.common.collect.Lists;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    JobControl flow = new JobControl("MR flow");

    // A and B have no dependencies, but the API requires a list of them
    List<ControlledJob> emptyList = Lists.newArrayList();
    // and we'll pass this list as dependencies to job C
    List<ControlledJob> dependencies = Lists.newArrayList();

    ControlledJob aJob = new ControlledJob(getAJob(args), emptyList);
    flow.addJob(aJob);
    dependencies.add(aJob);

    ControlledJob bJob = new ControlledJob(getBJob(args), emptyList);
    flow.addJob(bJob);
    dependencies.add(bJob);

    // C starts only after both A and B have succeeded
    ControlledJob cJob = new ControlledJob(getCJob(args), dependencies);
    flow.addJob(cJob);

    // D starts only after C has succeeded
    ControlledJob dJob = new ControlledJob(getDJob(args), Lists.newArrayList(cJob));
    flow.addJob(dJob);

    // hurray! start the waterflow!
    // run() keeps looping until stop() is called, so run it on its own thread
    new Thread(flow, "MR flow").start();
    while (!flow.allFinished()) {
        Thread.sleep(1000);
    }
    flow.stop();
So, the last question for discussion: what is the mysterious getXJob method? In this method we just return a configured Job, as an instance of your Job class. However, any other solution is fine too. Enjoy!
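
For the curious, here is a minimal sketch of what one of those getXJob methods could look like. It is only an illustration, not the code from my project: the class name JobFactory, the job name "job A" and the convention that args[0] is the input path and args[1] is the output path are all assumptions. To keep it self-contained it uses Hadoop's base Mapper and Reducer classes, which simply pass records through, so with the default text input the output keys are LongWritable offsets and the values are Text lines. It also assumes a Hadoop version that has Job.getInstance(Configuration, String).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical holder class for the getXJob methods used by the driver.
public class JobFactory {

    // Assumption: args[0] is the input directory, args[1] is the output directory.
    static Job getAJob(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "job A");
        job.setJarByClass(JobFactory.class);

        // The base Mapper and Reducer are identity implementations;
        // replace them with your own classes in a real job.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        // Matches what the identity classes emit for the default text input.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job;
    }
}
```

getBJob, getCJob and getDJob would follow the same pattern. For the flow to make sense, job C would read the output directories of A and B as its input, and D would read the output directory of C.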
