To illustrate the main Storm concepts, let's go through the key parts of an example application.
The idea of the application is to scan a web-server log (to be honest, we generate this log ourselves, but it is very similar to a real-world one), push each event into a Storm topology, calculate some statistics, and update those statistics in a database. It is a contrived use case for Storm, but good enough to show all the main concepts.
Let's start with a log line, which looks something like the following:
Spout
First of all, we have to get this line into Storm, and for that we need a Spout. In the example the Spout reads a file, but this is very bad practice in real life! Your log file may live on a separate machine, or you may want to configure a different number of spouts and log files, and so on. In any case, it is better to feed data from an external source into the spout through a general, flexible mechanism such as a message queue (MQ) or Flume.
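To make the "external source" idea concrete, here is a minimal plain-Java sketch with no Storm dependency. The class QueueBackedSource and its method pollLine are hypothetical names used only for illustration, and an in-memory BlockingQueue stands in for a real broker. The point is that the reader only asks a queue for the next line and never touches a file itself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Plain-Java sketch: a spout-like reader that pulls log lines from a queue
 * instead of opening a file. All names here are illustrative assumptions.
 */
public class QueueBackedSource {
    // Stand-in for an external broker (a message queue, a Flume sink, ...).
    private final BlockingQueue<String> queue;

    public QueueBackedSource(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    /**
     * Returns the next log line, or null if nothing is waiting.
     * poll() does not block, which suits a method that is called
     * repeatedly in a loop, such as a spout's nextTuple().
     */
    public String pollLine() {
        return queue.poll();
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("first log line");
        queue.add("second log line");

        QueueBackedSource source = new QueueBackedSource(queue);
        String line;
        while ((line = source.pollLine()) != null) {
            // A real spout would emit the line as a tuple at this point.
            System.out.println("would emit: " + line);
        }
    }
}
```

With this shape, swapping the in-memory queue for a real broker client changes only how the queue is filled, not the reading logic.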
The easiest way to write a Spout is to extend backtype.storm.topology.base.BaseRichSpout.
The following important methods must be overridden:
declare the fields of the output tuple, i.e.
BaseRichBolt. Similar to the Spout, you will need to implement the
method and describe the output tuple fields, if there are any (otherwise just leave the implementation empty).
If you wish, you can also implement the following method:
Usually, it is used to obtain the OutputCollector instance, which is then available in the next method (the most important one):
Note that all fields in a Bolt must be serializable!
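The reason is that Storm serializes bolt instances with standard Java serialization when the topology is submitted. The following plain-Java sketch (no Storm dependency) shows what goes wrong and one common way around it. The classes Connection, BadBolt and GoodBolt are hypothetical stand-ins rather than Storm types, and the prepare() method here only imitates the role of a bolt's prepare step.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableFieldsDemo {
    /** Stand-in for a non-serializable resource, e.g. a database connection. */
    static class Connection {
    }

    /** Cannot be serialized: it holds a non-serializable object in a regular field. */
    static class BadBolt implements Serializable {
        private final Connection connection = new Connection();
    }

    /** Can be serialized: the field is transient and is only created in prepare(). */
    static class GoodBolt implements Serializable {
        private transient Connection connection;

        void prepare() {
            connection = new Connection();
        }
    }

    /** Tries to write the object with Java serialization and reports success. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("BadBolt serializable:  " + canSerialize(new BadBolt()));
        System.out.println("GoodBolt serializable: " + canSerialize(new GoodBolt()));
    }
}
```

In a real bolt the same pattern applies: keep non-serializable helpers out of the constructor and create them once the bolt is already running on the worker.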