Sunday, September 16, 2012

hadoop jobs waterflow

Hi all, sometimes we can solve our task with just one map-reduce job. However, most real-world tasks need a flow of several jobs. Here is a simple example of how you can create such a flow. Let's examine the following scheme:
So, we are going to start two M/R jobs (A and B) at the same time, then accumulate their results in a third job (C) and pass them on to a fourth (D). Awesome, let's do it. I'll use Google Guava to deal with Java collections. args is the array of arguments for the job driver; feel free to use Options or whatever.
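    // imports needed by this snippet
    import java.util.List;
    import com.google.common.collect.Lists;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    JobControl flow = new JobControl("MR flow");

    // jobs A and B have no dependencies, but the API requires a list anyway
    List<ControlledJob> emptyList = Lists.newArrayList();
    // A and B are collected here and passed as dependencies to job C
    List<ControlledJob> dependencies = Lists.newArrayList();

    ControlledJob aJob = new ControlledJob(getAJob(args), emptyList);
    flow.addJob(aJob);
    dependencies.add(aJob);

    ControlledJob bJob = new ControlledJob(getBJob(args), emptyList);
    flow.addJob(bJob);
    dependencies.add(bJob);

    // C starts only after both A and B have finished
    ControlledJob cJob = new ControlledJob(getCJob(args), dependencies);
    flow.addJob(cJob);

    // D starts only after C has finished
    ControlledJob dJob = new ControlledJob(getDJob(args), Lists.newArrayList(cJob));
    flow.addJob(dJob);

    // hurray! start waterflow!
    // note: run() keeps looping until flow.stop() is called, so it is usually
    // started in its own thread and stopped once flow.allFinished() returns true
    flow.run();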
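If it helps to see the order in which the four jobs become runnable, here is a minimal, Hadoop-free sketch of the idea. It is only an illustration of dependency ordering and not how JobControl is actually implemented; FlowSketch, Node, waves and demo are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlowSketch {

    // Hypothetical stand-in for a controlled job: a name plus the jobs it waits for.
    static final class Node {
        final String name;
        final List<Node> dependencies;
        boolean done = false;

        Node(String name, List<Node> dependencies) {
            this.name = name;
            this.dependencies = dependencies;
        }

        // A job is ready when it has not run yet and all its dependencies are done.
        boolean isReady() {
            if (done) {
                return false;
            }
            for (Node dep : dependencies) {
                if (!dep.done) {
                    return false;
                }
            }
            return true;
        }
    }

    // Groups jobs into "waves": all jobs in one wave could run at the same time.
    static List<List<String>> waves(List<Node> jobs) {
        List<List<String>> result = new ArrayList<>();
        int remaining = jobs.size();
        while (remaining > 0) {
            // First collect everything that is ready, then mark it as done,
            // so a job never lands in the same wave as its own dependency.
            List<Node> ready = new ArrayList<>();
            for (Node job : jobs) {
                if (job.isReady()) {
                    ready.add(job);
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("cyclic dependencies");
            }
            List<String> names = new ArrayList<>();
            for (Node job : ready) {
                job.done = true;
                names.add(job.name);
            }
            result.add(names);
            remaining -= ready.size();
        }
        return result;
    }

    // Same shape as the flow in this post: A and B first, then C, then D.
    static List<List<String>> demo() {
        Node a = new Node("A", new ArrayList<Node>());
        Node b = new Node("B", new ArrayList<Node>());
        Node c = new Node("C", Arrays.asList(a, b));
        Node d = new Node("D", Arrays.asList(c));
        return waves(Arrays.asList(a, b, c, d));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[A, B], [C], [D]]
    }
}
```

A and B share the first wave because neither waits for anything, while C and D each get a wave of their own. That is the same ordering the dependency lists passed to ControlledJob describe.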
So, one last question for discussion: what is the mysterious getXJob method? It simply returns a configured Job instance for the corresponding step. However, any other solution will do as well. Enjoy!

perfect blog about java ee

Hi guys, if you deal with Java EE stuff, SOA and other parts of hell... I'd like to introduce the samolisov.blogspot.com blog (in Russian). It has a lot of useful and up-to-date information about enterprise development and integration. I believe it will be useful reading for every Java EE developer.