Thursday, July 25, 2013

Market basket analysis with R

Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships <...>. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling, in addition to influencing sales promotions, loyalty programs, store design, and discount plans. [from Wikipedia]
In other words, you want to find the items in your sales data that are sold together; for example, people usually buy chips with beer. Several algorithms exist for this. One of them is the Apriori algorithm, which is available in R through the 'arules' package.
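
To make the idea concrete, here is a minimal sketch of an Apriori-style search written in plain Python. It is not the 'arules' API and not its implementation; the transactions, the function names and the 0.4 support threshold are all made up for illustration. Support is the share of baskets that contain an itemset, and confidence of "A => B" is support(A and B) divided by support(A).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for this example only.
transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips", "salsa"},
    {"milk", "bread"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search. Returns {frozenset: support}."""
    n = len(transactions)

    # Level 1: single items that appear in enough baskets.
    counts = Counter(item for basket in transactions for item in basket)
    current = {
        frozenset([item]): count / n
        for item, count in counts.items()
        if count / n >= min_support
    }
    result = dict(current)

    k = 2
    while current:
        # Candidates of size k are unions of frequent (k-1)-itemsets.
        previous = list(current)
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the baskets.
        current = {}
        for candidate in candidates:
            support = sum(1 for basket in transactions if candidate <= basket) / n
            if support >= min_support:
                current[candidate] = support
        result.update(current)
        k += 1
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent => consequent."""
    return itemsets[antecedent | consequent] / itemsets[antecedent]


itemsets = frequent_itemsets(transactions, min_support=0.4)
beer_to_chips = confidence(itemsets, frozenset(["beer"]), frozenset(["chips"]))
```

With these toy baskets, beer and chips appear together in two of five baskets, and two of the three baskets with beer also contain chips, which is the kind of "bought together" signal the post is after.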

Tuesday, July 23, 2013

Build and run Flume agent to work w/ Twitter Streaming API



Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic applications (source)

The data flow model is described well in the official documentation and consists of three components:
a source, which receives data from an external system and delivers it into Flume; a channel, which transports data from the source to the sink (think of the channel as a queue, which also lets the source and the sink run asynchronously); and a sink, which delivers the data to its destination. Together these three make up an agent, and agents can be chained to build complex flows with failover.
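
The source, channel and sink roles can be illustrated with a small Python analogy. This is only a conceptual sketch, not Flume code or Flume configuration: the function names, the `_STOP` sentinel and the sample events are assumptions made for the example. A bounded queue plays the channel, and the source and sink run in separate threads so neither has to wait for the other beyond the queue's capacity.

```python
import queue
import threading

# Sentinel that marks the end of the stream (an assumption of this sketch).
_STOP = object()


def source(events, channel):
    """Source: takes events from an external system and puts them on the channel."""
    for event in events:
        channel.put(event)  # blocks when the channel is full
    channel.put(_STOP)


def sink(channel, destination):
    """Sink: takes events off the channel and writes them to the destination."""
    while True:
        event = channel.get()  # blocks when the channel is empty
        if event is _STOP:
            break
        destination.append(event)


def run_agent(events, capacity=2):
    """Agent: one source, one channel and one sink wired together."""
    channel = queue.Queue(maxsize=capacity)
    destination = []
    threads = [
        threading.Thread(target=source, args=(events, channel)),
        threading.Thread(target=sink, args=(channel, destination)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return destination


# Hypothetical events standing in for incoming tweets.
delivered = run_agent(["tweet-1", "tweet-2", "tweet-3"])
```

Because the queue is first-in, first-out and there is a single source and a single sink, events arrive at the destination in the order they were produced. A real Flume channel adds durability and transactions on top of this basic queueing idea.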