Wednesday, December 26, 2012

surprise in Hadoop log

When I started working with Hadoop I was confused by the following message in the logs:
  DEBUG conf.Configuration: java.io.IOException: config(config)
  at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:225)
  at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:183)
It was Hadoop 1.0.3 and I didn't understand what I was doing wrong. It was the newest Hadoop release at the time and Google didn't turn up any more information, so I had to check the source code... surprise! Look at it (line 4!):
  1. public Configuration(boolean loadDefaults) {
  2.   this.loadDefaults = loadDefaults;
  3.   if (LOG.isDebugEnabled()) {
  4.     LOG.debug(StringUtils.stringifyException(new IOException("config()")));
  5.   }
  6.   synchronized(Configuration.class) {
  7.     REGISTRY.put(this, null);
  8.   }
  9.   this.storeResource = false;
  10. }

I can't believe they always log an exception... a strange way to get a stack trace? Maybe...
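
By the way, constructing an exception just to capture the current stack trace is a real (if surprising) technique: a Throwable records the stack at creation time, no throw needed. A minimal sketch of friendlier alternatives, assuming a commons-logging/slf4j-style LOG:

  // pass the Throwable to the logger instead of stringifying a fake IOException
  if (LOG.isDebugEnabled()) {
      LOG.debug("config()", new Throwable("who created this Configuration?"));
  }

  // or inspect the frames directly, without any exception in the log output
  StackTraceElement[] frames = Thread.currentThread().getStackTrace();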

Wednesday, December 12, 2012

Raspberry Pi cluster for Hadoop?

I just thought: a Raspberry Pi cluster for Hadoop? Is it possible? Does it make any sense?

Let's think... Hadoop uses the hard drive very, very intensively. Memory... it's good to have enough memory, but it isn't critical; I believe 512 MB will be enough. CPU... it depends on your code, but usually it's not the critical point for map-reduce in general.

So, with Raspberry Pi you get (just for $35!):

  • RAM 512 MB
  • CPU ARM11 700 MHz
  • SD card with Linux, 4-16 GB (you need to buy it separately)



Some time ago there was a nice article about a Raspberry Pi supercomputer: 64 Raspberry Pi computers were connected into one cluster (via Ethernet); each had a 16 GB SD card, which means 1 TB of storage for the whole cluster (!), and it cost about $4000.
One concern: the access speed of the SD card. It isn't good enough, so you would need to buy an external SSD. I assume each Raspberry Pi has to have its own SSD (32-64 GB should be enough). So this solution will be more expensive than $4000, but cheaper than whole PCs or cloud instances.

Let's try to calculate: 64 Raspberry Pi * $35 = $2240; 64 * 64 GB SSDs = 4 TB for about $4500; so the whole solution will cost $6500-$7000 for 64 physical nodes :)

So, does it make sense to build a Hadoop-oriented cluster? I believe so; what do you think?
At least it would be a great experiment!

PS. Maybe someone wants to donate money for this experiment? Kickstarter sounds reasonable here.

Wednesday, November 21, 2012

Project management: communications in distributed team

During my career I have worked on several projects in a distributed team, or at least communicated with a remote customer. The first type of collaboration is typical for out-staffing and some kinds of product companies (do you remember "Rework"?); the second type is typical for outsourcing companies and some kinds of product companies.

As you know, one of the most important factors in a distributed team is communication. Communication is really important for any team in general, and the main challenge is the question: how do you organize high-quality communication inside a distributed team? Friendship within a team matters too; how can we reach it in a distributed team?

Saturday, November 17, 2012

Fix microphone problem in Ubuntu

I'm using Ubuntu/Kubuntu on a Dell laptop, and it doesn't seem to recognize my headset's microphone. What can I do?

That question worried me for the last week. My microphone suddenly stopped working last weekend, and it worked neither in Skype nor in Gtalk; it was awful!
I was looking for a solution for several hours and found it just seconds ago! Fortunately, it is described here http://preprocess.me/skype-microphone-does-not-work-on-ubuntu-heres-a-fix for Ubuntu (it works under Kubuntu too).

Just the main steps (a note to my future self):

  1. Run alsamixer
  2. Go to Capture devices
  3. Select a capture device (I had two) and highlight it; then press Space to enable it - your capture device must show the red "CAPTURE" label

Monday, November 12, 2012

Helping Java build friendship with Common Lisp

When I was a 3rd-year student I had functional programming and AI classes where we studied the Common Lisp programming language. At the end of the AI course I had a coursework assignment: write a program to differentiate an expression, something like f(x) = x*x then f'(x) = 2*x. It was pretty simple to do in Lisp...


As the business layer was implemented in Lisp, I decided to create the user interface in Java. I had to use Swing, because JavaFX didn't exist at that time. So, I guess I started hating Swing from that moment...
But here I'd like to show how you can execute Common Lisp from Java (from my personal experience, with a code sample and some other options). So, how to use Lisp from Java?
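
One option is ABCL (Armed Bear Common Lisp), a Common Lisp implementation that runs on the JVM. A minimal sketch, assuming abcl.jar is on the classpath (not necessarily the exact code from my coursework):

  import org.armedbear.lisp.Interpreter;
  import org.armedbear.lisp.LispObject;

  public class LispFromJava {
      public static void main(String[] args) {
          // boot an embedded Common Lisp interpreter inside the JVM
          Interpreter interpreter = Interpreter.createInstance();
          // evaluate a Lisp form and get the result back as a LispObject
          LispObject result = interpreter.eval("(* 6 7)");
          System.out.println(result.writeToString()); // prints 42
      }
  }

Another option is to run an external Lisp process (e.g. SBCL) and talk to it over stdin/stdout or a socket.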

Sunday, November 11, 2012

Push existing source code to GitHub

There is a lot of advice about that, but most of it doesn't work. Here is a recipe that really works:


  1. Create the remote repository, and get the URL, such as git://github.com/youruser/somename.git
    If your local Git repo is already set up, skip steps 2 and 3


  2. Locally, at the root directory of your source, run git init
  3. Locally, add and commit what you want in your initial repo (for everything: git add . then git commit -m 'initial commit comment')

  4. Attach your remote repo with the name 'origin' (like cloning would do):
    git remote add origin [URL From Step 1]
  5. Push up your master branch (change master to something else for a different branch):
    git push origin master

Monday, November 5, 2012

Build a fast distributed cache with Google Guava

There is a class of applications that require a distributed cache to work. The classic example is several front-end servers that share one distributed cache or database engine. And usually you have to send a request over the network each time you want to get data.
Depending on your data, most of the requests may come from a small part of the keys. I mean, for example, you have 1M different key-value pairs, but you usually process only 1,000 of them. So 1% of the keys generate 99% of the requests, as on the image.
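
This access pattern is exactly what a small local cache in front of the remote store is good at, and Guava's CacheBuilder makes it a few lines of code. A minimal sketch; remoteGet() is a hypothetical call to your distributed cache or database:

  import java.util.concurrent.TimeUnit;
  import com.google.common.cache.CacheBuilder;
  import com.google.common.cache.CacheLoader;
  import com.google.common.cache.LoadingCache;

  // local cache for the "hot" 1% of keys in front of the remote store
  LoadingCache<String, String> hotKeys = CacheBuilder.newBuilder()
          .maximumSize(1000)                      // roughly the hot-keys working set
          .expireAfterWrite(1, TimeUnit.MINUTES)  // tolerate slightly stale values
          .build(new CacheLoader<String, String>() {
              @Override
              public String load(String key) throws Exception {
                  return remoteGet(key); // network call happens only on a miss
              }
          });

  String value = hotKeys.get("some-key"); // served locally while the entry is fresh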


Friday, November 2, 2012

What you must know about EclipseLink ORM

A year ago I worked with the EclipseLink ORM on a project where we were using an Oracle DBMS. I had solid experience with Hibernate before that.
So I want to share my experience and compare EclipseLink and Hibernate as ORM frameworks for Oracle.
Well, I don't like ORMs in general, but they are often required by management and so on... and in this case, if you have vendor lock-in, and this vendor is Oracle, you have to think about EclipseLink as a perfect alternative to Hibernate.

Let's start.
EclipseLink is better because... It was chosen by Sun Microsystems as the reference JPA 2 implementation. It has better support and bugs are fixed much more quickly; let's look at the statistics (as of July 26, 2011):
  Name         Number of bugs   Number of blockers   The oldest blocker
  Hibernate    1564             9                    29/Sep/2006
  EclipseLink  153              0                    4/Jun/2010

Then there are the Oracle-oriented additions, which are incredibly useful if you create Oracle-oriented software and want to have all the power of Oracle in your ORM.
The most important Oracle features in EclipseLink are:
1) Built-in Oracle hints support
Hints are important for SQL optimization in Oracle, and with EclipseLink you can add them through the API. Also, EclipseLink will automatically insert hints when needed, and their usage is transparent to the ORM.
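
For illustration, a sketch of attaching an Oracle hint to a JPA query; the Employee entity and the index name are made up, and if I remember the API correctly the hint key is QueryHints.HINT:

  import java.util.List;
  import javax.persistence.Query;
  import org.eclipse.persistence.config.QueryHints;

  // em is your EntityManager; the hint text goes into the generated SQL statement
  Query query = em.createQuery("SELECT e FROM Employee e WHERE e.salary > 1000");
  query.setHint(QueryHints.HINT, "/*+ INDEX(t0 EMP_SALARY_IDX) */");
  List<?> result = query.getResultList();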

2) Hierarchical queries support
Oracle SQL has a useful feature, hierarchical queries (CONNECT BY), which can help you a lot if you work with hierarchical structures intensively. So you don't need to write native SQL anymore, just use the API.

3) Oracle Flashback Transaction Query
Historical queries during a transaction are available right from the ORM.

4) Stored procedures/functions support
Well, Hibernate/JPA have it too. However, in EclipseLink you can use another way (the JPA way is still available) with the StoredFunctionCall class and its related API. In other words, you can build a dynamic stored procedure/function invocation. I guess it is the easiest way to call a stored function from your Java code.
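
A sketch of such a dynamic invocation; the PL/SQL function CALC_BONUS and its argument are made up, and the exact executeQuery signature may differ between EclipseLink versions:

  import java.util.Arrays;
  import org.eclipse.persistence.queries.StoredFunctionCall;
  import org.eclipse.persistence.queries.ValueReadQuery;

  // describe the function CALC_BONUS(EMP_ID) that returns a number
  StoredFunctionCall call = new StoredFunctionCall();
  call.setProcedureName("CALC_BONUS");
  call.addNamedArgument("EMP_ID");
  call.setResult("RESULT", Long.class);

  ValueReadQuery query = new ValueReadQuery();
  query.setCall(call);
  query.addArgument("EMP_ID");

  // session is a native EclipseLink Session (it can be unwrapped from an EntityManager)
  Long bonus = (Long) session.executeQuery(query, Arrays.asList((Object) 42L));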

The next cool feature is report queries: you can build queries with "reporting" capabilities (AVG, SUM, ROLLUP) through the API instead of writing native SQL.

To summarize: I can recommend EclipseLink as a perfect ORM for an Oracle lock-in project, because a lot of extra features become available in this case. However, you must keep in mind that Hibernate has a lot of its own features (versioning, shards, search, etc.), so think about your project requirements and pick the most convenient tool.

Sunday, October 28, 2012

H2 vs HSQLDB

It is usual to develop an application that works with a database, often an RDBMS. And if you are a good developer, you write unit tests. As a result, you have to use a lightweight RDBMS for testing purposes, like H2 or HSQLDB.
So you may ask yourself: what is the right choice? Which should you pick for your project? What is the main difference between HSQLDB, H2 and Derby?
In this article I'd like to compare H2 and HSQLDB and pick one of them for testing purposes. If you don't feel like reading the full article - H2 wins.

So, let's start: H2 vs HSQLDB

1) H2 has a compatibility mode for Oracle, DB2 and MS SQL as well as for HSQLDB, Derby, MySQL and so on. On the other hand, the quality of that support seems limited: for example, I'm not sure that H2 supports hierarchical queries the way Oracle does (needs checking), and it definitely doesn't have materialized views. So this compatibility mode is quite superficial and limited (see the sketch after this list)
2) H2 claims to be faster than HSQLDB
3) Hibernate support for HSQLDB is more stable; on this point HSQLDB wins, and if you prefer Hibernate, you should probably pick HSQLDB rather than H2. Also, I have experience using EclipseLink with H2, and there were no problems
4) As for me, H2 is easier to use than HSQLDB
5) Both have views, subqueries, triggers, user-defined functions and scrollable result sets
6) Indexes available in H2: hash index and full-text search
7) H2 is the next project of HSQLDB's creator :)
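
Regarding item 1, the compatibility mode is just a flag in the JDBC URL. A minimal sketch; the table is made up:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;

  public class H2OracleModeDemo {
      public static void main(String[] args) throws Exception {
          // in-memory H2 database pretending to be Oracle; DB_CLOSE_DELAY=-1
          // keeps it alive between connections within one JVM (handy for tests)
          Class.forName("org.h2.Driver");
          Connection conn = DriverManager.getConnection(
                  "jdbc:h2:mem:test;MODE=Oracle;DB_CLOSE_DELAY=-1", "sa", "");
          Statement st = conn.createStatement();
          // Oracle-flavoured DDL accepted thanks to MODE=Oracle
          st.execute("CREATE TABLE users (id NUMBER PRIMARY KEY, name VARCHAR2(100))");
          conn.close();
      }
  }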

Licensing (if you care about it): H2 has the Mozilla Public License, HSQLDB has a BSD license.

Also, I've seen XML format support in H2 mentioned on several sites, but I haven't found a detailed description or a link proving that this feature really exists.

As you can see, it's very simple to switch between these databases: usually you only have to change the connection string, unless some special functionality is used.

The full list of H2 features is available here: http://www.h2database.com/html/features.html

Also, you can compare both engines and make your own decision:
H2  http://database-management-systems.findthebest.com/l/16/H2
HSQLDB  http://database-management-systems.findthebest.com/l/15/HSQLDB

Sunday, September 16, 2012

Hadoop jobs workflow

Hi all! Sometimes we can solve our task with just one map-reduce job. However, to solve most real-world tasks we have to create a flow of jobs. Here is a simple example of how you can create such a flow; let's examine the following scheme:
So, we are going to start two M/R jobs at the same time, then accumulate their results in a third job and pass them to the fourth. Awesome, let's do it. I'll use Google Guava to deal with Java collections; args holds the arguments for the job driver, feel free to use Options or whatever.
  JobControl flow = new JobControl("MR flow");

  // we don't have dependencies yet, but the API requires a list of them
  List<ControlledJob> emptyList = Lists.newArrayList();
  // and we'll pass this list as dependencies to job C
  List<ControlledJob> dependencies = Lists.newArrayList();

  ControlledJob aJob = new ControlledJob(getAJob(args), emptyList);
  flow.addJob( aJob );
  dependencies.add( aJob );

  ControlledJob bJob = new ControlledJob(getBJob(args), emptyList);
  flow.addJob( bJob );
  dependencies.add( bJob );

  ControlledJob cJob = new ControlledJob(getCJob(args), dependencies);
  flow.addJob( cJob );

  ControlledJob dJob = new ControlledJob(getDJob(args), Lists.newArrayList(cJob));
  flow.addJob( dJob );

  // hurray! start the workflow! JobControl.run() doesn't return on its own,
  // so run it in a background thread and stop it when all jobs are finished
  Thread flowThread = new Thread(flow);
  flowThread.start();
  while (!flow.allFinished()) {
      Thread.sleep(1000);
  }
  flow.stop();
So, the last question for discussion: what is the mysterious getXJob method? In this method we just return a configured Job instance of your job class. However, any other approach works too. Enjoy!
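
To be concrete, here is what getAJob could look like - a sketch with made-up mapper/reducer classes and path arguments, using the new org.apache.hadoop.mapreduce API to match ControlledJob:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  private static Job getAJob(String[] args) throws IOException {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "job A");
      job.setJarByClass(FlowDriver.class);   // hypothetical driver class
      job.setMapperClass(AMapper.class);     // hypothetical mapper
      job.setReducerClass(AReducer.class);   // hypothetical reducer
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      return job;
  }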

perfect blog about Java EE

Hi guys, if you deal with Java EE stuff, SOA and other parts of hell... I'd like to introduce the samolisov.blogspot.com blog (in Russian). There is a lot of useful and up-to-date information about enterprise development and integration there. I believe it will be a useful read for every Java EE developer.

Monday, July 16, 2012

Oracle ADF

Guys, I've been completely disappointed in Oracle Java Middleware: awkward WebLogic on one hand, and awful ADF on the other.
Let's start with WL. I got into trouble simply using XPath: I wasn't able to use Xerces or any other implementation, so I used the standard WL implementation from the weblogic.* package. OMG! My code worked with the library of the Sun JVM, but on WL it crashed all the time. Crazy!
ADF... what can I say. The worst framework in the world. The slowest software development. Incredibly slow in production. No good documentation, no books, no examples, a lot of guys from India. Never ever ever ever use this framework. Don't believe me? Check out this article and its comments.

Thursday, June 21, 2012

node.js impression

Well, I've had some dealings with node.js over the last weeks. It's nice. Node.js is a high-performance web server; your application is written in JavaScript and runs on the back end. Most developers, especially web developers, know JavaScript. Well, I agree that usually people know jQuery or only the basics of JavaScript; however, it's a very easy and powerful language.
Unfortunately, node.js is still badly documented. It's sad, but this is a poorly documented area, so please pay attention to that when choosing node.js.
OK, let's go further. JavaScript is a nice language for parsing pages (or JSON and so on - so node.js can be a good choice for a high-performance REST service). I was happy to work with this technology, but it has some problems... For example, an error on the page means an error in your server code - that can be a problem. The next one is "infinite pages" - I guess you know what I mean. So, surprise: the easiest way to work with an "infinite page" is Selenium WebDriver :) If you don't need a high-performance parser, you should have a look at this testing tool.
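
For the "infinite page" case, the usual WebDriver trick is to scroll to the bottom in a loop until the page height stops growing. A minimal Java sketch; the URL and the crude fixed wait are just placeholders:

  import org.openqa.selenium.JavascriptExecutor;
  import org.openqa.selenium.WebDriver;
  import org.openqa.selenium.firefox.FirefoxDriver;

  public class InfinitePageScroller {
      public static void main(String[] args) throws InterruptedException {
          WebDriver driver = new FirefoxDriver();
          driver.get("http://example.com/endless-feed"); // hypothetical page
          JavascriptExecutor js = (JavascriptExecutor) driver;

          long height = (Long) js.executeScript("return document.body.scrollHeight;");
          while (true) {
              js.executeScript("window.scrollTo(0, document.body.scrollHeight);");
              Thread.sleep(2000); // wait for the next chunk to load
              long newHeight = (Long) js.executeScript("return document.body.scrollHeight;");
              if (newHeight == height) break; // nothing new was loaded
              height = newHeight;
          }
          // the fully expanded DOM is now available via driver.getPageSource()
          driver.quit();
      }
  }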

Monday, April 16, 2012

Jersey as your REST impl in a standalone app

Just this morning I wrote a simple standalone application which works with REST as a server (and a client) - well, a server-client application via REST. In general, the idea is pretty simple, but I was disappointed when I got the following exception:

A message body reader for Java class java.lang.String, and Java type class java.lang.String, and MIME media type text/plain was not found.

WTF? I googled this problem and found that there are problems with packaging META-INF/services/*. I'm not sure, but it seems to happen when you use Maven to assemble your Jersey application. Fortunately, the solution is very simple and fast: you have to add the ServicesResourceTransformer plugin configuration to your pom.xml. After that, everything works :)
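
If I remember the configuration correctly, the transformer belongs to the maven-shade-plugin and merges the META-INF/services files from all jars instead of letting them overwrite each other. A sketch:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>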

Wednesday, March 14, 2012

Execute commands in parallel via SSH

When you need to manage one web server (or application server) there are no problems, because you perform every operation just once. But what about the following situation: you have several servers and need to do the same operations several times? Of course, you can repeat your set of commands on each server. However, there is a much better solution - Capistrano. In fact, it was developed in Ruby and for the needs of Ruby projects, but don't worry, you can use it for non-Ruby applications too :)
Capistrano uses a special DSL for specifying commands.
Yep, it's a perfect solution for executing commands in parallel.

Tuesday, March 6, 2012

how to add a logger to a class in one click

Hi! Just a simple and genius way to add a logger to any Java class in one click (in Eclipse).
Here is an Eclipse template; to add it, go to Window - Preferences - Java - Editor - Templates - New.
And the text of the template:
${:import(org.slf4j.Logger, org.slf4j.LoggerFactory)}
private static final Logger log = LoggerFactory.getLogger(${primary_type_name}.class);

Wednesday, February 15, 2012

design patterns in Ukrainian

A book on design patterns by a developer from Lviv:
http://designpatterns.andriybuday.com/
All the patterns are described in a witty and funny way; it's worth reading (and even recommended) despite the examples being in C# (and the correspondingly non-canonical code formatting).

Tuesday, February 14, 2012

calc median in MySQL

Aloha! As you know, MySQL is a rather poor RDBMS, but a cheap one. It doesn't have many built-in functions, especially for statistical processing...
But imagine you need to calculate a median. It seems very easy, doesn't it? Well, I have bad news for you - MySQL doesn't have this aggregate function.
I googled it and found several solutions, but all of them have performance problems and an ugly plan... So I created my own stored procedure (inspired by a comment on http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html). Note that for an even number of rows it returns the upper of the two middle values rather than their average:

CREATE PROCEDURE calcMedian(IN tbl CHAR(64), IN col CHAR(64))
BEGIN
SET @counter := 0;
SET @s1 = CONCAT('select floor(count(',col,')/2) into @counter from ',tbl);
PREPARE rowsCounterStmt FROM @s1;
EXECUTE rowsCounterStmt;
SET @s2 = CONCAT('select ',col,' from ',tbl,' order by ',col,' asc limit ',@counter,', 1');
PREPARE stmt FROM @s2;
EXECUTE stmt;
END
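
Usage - the table and column names here are just hypothetical examples:

CALL calcMedian('payments', 'amount');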


Another way: you can use a user-defined function as explained here: http://mysql-udf.sourceforge.net/. In fact, it's a much more powerful solution, but it isn't suitable for everyone.

Friday, February 10, 2012

funny trap: JUnit vs TestNG

I found a funny trap just several minutes ago :) If you have a strong background in JUnit and then start coding in TestNG, you can get into trouble with asserts.
In JUnit we have: assertEquals(expected, actual)
In TestNG: assertEquals(actual, expected)
So, be careful!
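
A tiny illustration; getGreeting() is a made-up method under test. Both lines check the same thing, but on failure the roles of "expected" and "actual" in the message are swapped:

  // JUnit (org.junit.Assert): expected value comes first
  org.junit.Assert.assertEquals("hello", getGreeting());
  // TestNG (org.testng.Assert): actual value comes first
  org.testng.Assert.assertEquals(getGreeting(), "hello");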

Monday, January 16, 2012

use runtime data in soapUI


If you are using soapUI for web-service testing, you may want to set some data in the request at runtime (for example, for testing purposes).
The easiest way to set the date/time at runtime is:
Time_is_${=new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm").format(java.util.Calendar.getInstance().getTime())}

Also, you can get values from system settings or from a special property file:
${System.getProperty("some.property")}