<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:iweb="http://www.apple.com/iweb" version="2.0">
  <channel>
    <title>The adventures of a software architect</title>
    <link>http://www.davidgreco.it/MySite/Blog/Blog.html</link>
    <description>I have the chance to challenge myself with very complex software systems and it’s always up to me to choose the right tools, technologies and architectures. I know, it’s really a tough task, at least for me, but this is what I like most. My blog tries to describe this continuous effort to become a good software architect.&lt;br/&gt;Disclaimer:&lt;br/&gt;The views expressed in this blog are solely the personal views of the author and DO NOT represent the views of his employer or any third party.&lt;br/&gt;&lt;br/&gt;</description>
    <generator>iWeb 3.0.1</generator>
    <item>
      <title>My stairway to the cloud</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2010/7/2_My_stairway_to_the_cloud.html</link>
      <guid isPermaLink="false">8e3b0b60-88aa-49b0-80c5-6d77b6d4bc33</guid>
      <pubDate>Fri, 2 Jul 2010 13:49:18 +0200</pubDate>
      <description>In the last couple of weeks, probably influenced by the hype around that word, I decided to learn a little bit more about cloud computing.&lt;br/&gt;I was already a subscriber to the Amazon EC2 service but I never got the chance to play with it, so I decided to give it a try.&lt;br/&gt;My experience with EC2 has been very positive and I also had the chance to play a lot with the &lt;a href=&quot;http://www.eucalyptus.com/&quot;&gt;Eucalyptus&lt;/a&gt; product, which is a sort of “clone” of EC2.&lt;br/&gt;Frankly speaking, the word “Cloud” is really overexposed: too much marketing, too much business “noise” around it. It reminds me of when SOA was sold as the cure for every IT disease. But, like SOA, cloud computing is just a paradigm shift, an evolution of something we already have, just as SOA is an architectural pattern that evolved from the concept of component based architecture.&lt;br/&gt;I think that cloud computing is just an evolution of the concept of virtualization, which is not new: the concept of the virtual machine has been around since the advent of the IBM VM/370 architecture in the early ’70s.&lt;br/&gt;Modern cloud infrastructures provide very nice and powerful automation around some kind of hypervisor/virtualization kernel.&lt;br/&gt;This is the real value of the cloud: easy to use automation mechanisms for provisioning virtual resources such as machines, disks and networks.&lt;br/&gt;The paradigm shift, I think, is in the fact that the cloud moves into the hands of programmers what used to be in the hands of system administrators.&lt;br/&gt;Using a private/public cloud, you, as a programmer, can put into your development life cycle the management of the infrastructure needed to run your application. As often happens, if you want to really get the advantages of a technology you have to pay a price. In the case of the cloud, if your application has not been designed to be “elastic”, i.e. 
easy to scale and deploy, you don’t get the real benefit of using a cloud.&lt;br/&gt;An application made of many different kinds of subsystems with complex and rigid relationships among them doesn’t fit very well with a cloud infrastructure. A rigid application and a dynamic infrastructure cannot live well together, can they?&lt;br/&gt;There are also “social” consequences of cloud adoption. If you start to use a cloud, either private or public, you could also take ownership of the physical deployment, which used to be in the sphere of influence of the sys admins. I observed some distrust among the operations people when I introduced a private cloud into my organization: they saw it as a sort of loss of power, since they used to be involved even in creating ad hoc virtual machines for stress tests, for example. With a private cloud in your laboratory, the developers can create by themselves everything they need for integration and scalability tests.&lt;br/&gt;Fortunately, the system and network administrators in my company are smart and open to new things, so they realized that the cloud can free them from boring provisioning activities, giving them extra time for looking after the “iron”: the actual hardware, the storage, the network elements, etc.&lt;br/&gt;Of course, the cloud allows a better utilization of the hardware, so with the same hardware you can do more. 
Combining a private cloud with a public one, you could absorb the load peaks just by temporarily provisioning on the public cloud; in this way you are not necessarily forced to buy all the hardware needed for the foreseen peak.&lt;br/&gt;So, behind the hype, I really think that cloud computing is another nice tool for us, as software architects, for designing applications that are easy to scale and that adapt themselves automatically to load changes.&lt;br/&gt;As a side note, I wanted to learn a little bit more about the Scala language, so I decided to define a DSL (Domain Specific Language) for quickly provisioning software infrastructure on top of an EC2 based cloud, either Amazon or Eucalyptus. You can find the project &lt;a href=&quot;http://github.com/dgreco/stairway2cloud&quot;&gt;here&lt;/a&gt;. It’s still a work in progress and nothing more than an experiment, but at least I could learn a bit more about Scala and how to deal with EC2 and Eucalyptus.</description>
    </item>
    <item>
      <title>HPC = High Performance Camel ?</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2010/4/19_HPC_%3D_High_Performance_Camel.html</link>
      <guid isPermaLink="false">498fc508-fabc-4987-ab1d-2c4e908c5352</guid>
      <pubDate>Mon, 19 Apr 2010 11:27:12 +0200</pubDate>
      <description>I did another experiment with my favorite software beast. Even if I’m still convinced that using C/C++ for building low latency/high throughput applications is more effective than using Java, I’m also convinced that this should be balanced against the cost of development. So, given that Java offers a more effective way of developing complex applications, I started to wonder if Camel can also be used as a bus in scenarios where performance and scalability really count. &lt;br/&gt;As in the previous post, I made some experiments integrating Camel with a very fast &lt;a href=&quot;http://www.zeromq.org/&quot;&gt;messaging library&lt;/a&gt; and the preliminary results are not too bad: I reached a throughput of more than 50K messages per second (1KB payload). The next step was to see if Camel can be used for storing messages into a highly scalable NoSQL store like &lt;a href=&quot;http://hadoop.apache.org/hbase/&quot;&gt;HBase&lt;/a&gt;. My first idea was to write a Camel component for reading/writing data into HBase, as I did with the &lt;a href=&quot;http://camel.apache.org/hdfs.html&quot;&gt;camel-hdfs&lt;/a&gt; component, but since I’m lazy I looked around to see if something similar had already been made by someone else. This is what I found:&lt;br/&gt;	1.	 &lt;a href=&quot;http://ghelmling.github.com/meetup.beeno&quot;&gt;http://ghelmling.github.com/meetup.beeno&lt;/a&gt;&lt;br/&gt;	2.	 &lt;a href=&quot;http://www.datanucleus.org/plugins/store.hbase.html&quot;&gt;http://www.datanucleus.org/plugins/store.hbase.html&lt;/a&gt;&lt;br/&gt;I decided to investigate the second option a little bit more. In fact, the Datanucleus guys provide us with a JPA/JDO tool with a pluggable storage mechanism, where one of the supported stores is HBase. 
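&lt;br/&gt;As a quick sanity check on the throughput figure quoted above, here is a small back-of-the-envelope calculation in Python that turns it into bandwidth. It is only a sketch: it assumes that 1KB means 1024 bytes, that the rate is exactly 50K messages per second and that there is no protocol overhead. These are my assumptions, not measured values.&lt;br/&gt;

```python
# Back-of-the-envelope conversion of the quoted throughput into bandwidth.
# Assumptions (not measurements): 1KB = 1024 bytes, exactly 50K messages
# per second, payload only (no framing or protocol overhead).
MESSAGES_PER_SECOND = 50_000
PAYLOAD_BYTES = 1024

bytes_per_second = MESSAGES_PER_SECOND * PAYLOAD_BYTES
megabytes_per_second = bytes_per_second / 1_000_000
megabits_per_second = bytes_per_second * 8 / 1_000_000

print(f"{megabytes_per_second:.1f} MB/s, {megabits_per_second:.1f} Mbit/s")
```

&lt;br/&gt;Under those assumptions the payload alone works out to roughly 51 MB/s, or about 410 Mbit/s, which is a lower bound since the measured rate was more than 50K messages per second.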
&lt;br/&gt;Using Datanucleus you can just annotate a class using either the &lt;a href=&quot;http://java.sun.com/javaee/technologies/persistence.jsp&quot;&gt;JPA&lt;/a&gt; or &lt;a href=&quot;http://java.sun.com/jdo/&quot;&gt;JDO&lt;/a&gt; standard annotations and voilà, that class can be automatically persisted into HBase. Additionally, Camel provides a &lt;a href=&quot;http://camel.apache.org/jpa.html&quot;&gt;camel-jpa&lt;/a&gt; component which allows you to store and retrieve Java objects from persistent storage using JPA. Combining camel-jpa and the Datanucleus HBase plugin, I have a Camel based mechanism for streaming messages into an HBase store, and this is what I needed. &lt;br/&gt;So, I put together a simple example that shows how to stream events to a Camel endpoint using zeromq and how to store them into an HBase store.&lt;br/&gt;It seems to me that Camel is lightweight and fast enough to be used as a bus for gluing together technologies to build high performance/high availability architectures. In any case, I think it would be useful to investigate whether it’s possible to increase the overall throughput of Camel; this could make it a very nice tool even in contexts where throughput is the most fundamental requirement. &lt;br/&gt;You can find the complete maven based project &lt;a href=&quot;http://github.com/dgreco/camel-hbase-zeromq-example&quot;&gt;here&lt;/a&gt;.&lt;br/&gt;</description>
    </item>
    <item>
      <title>A quick Camel ride</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2010/3/27_A_quick_Camel_ride.html</link>
      <guid isPermaLink="false">262f5416-62dd-4d59-9675-459487ec9688</guid>
      <pubDate>Sat, 27 Mar 2010 14:00:25 +0100</pubDate>
      <description>I wanted to do a quick experiment with Camel. My idea was to find a way of quickly integrating Camel with any native transport library.&lt;br/&gt;As an example I took the &lt;a href=&quot;http://www.zeromq.org/&quot;&gt;zeromq library&lt;/a&gt; and I wrote a Camel component which supports that library for sending and receiving byte buffers.&lt;br/&gt;To develop this component I needed to find a way to make the integration of a pure Java based framework like Camel with C/C++ based code portable. Fortunately there are nice tools that allow developing portable JNI C/C++ code that can be compiled seamlessly on different platforms. The result is a maven based project that could be integrated into the Camel build process even if it needs to build native libraries.&lt;br/&gt;It works under Mac OS X and Linux, and adding support for other OSes shouldn’t be so difficult.&lt;br/&gt;The tools I used are:&lt;br/&gt;	1.	 &lt;a href=&quot;http://www.cmake.org/&quot;&gt;CMake&lt;/a&gt;: it’s a powerful build file generator.&lt;br/&gt;	2.	 &lt;a href=&quot;http://www.swig.org/&quot;&gt;SWIG&lt;/a&gt;: it generates all the JNI plumbing for integrating existing C/C++ code with Java.&lt;br/&gt;	3.	 &lt;a href=&quot;http://www.boost.org/&quot;&gt;Boost&lt;/a&gt;: it’s one of the most important C++ frameworks. It provides, among many other things, OS independent APIs for concurrent programming.&lt;br/&gt;You can find the project on &lt;a href=&quot;http://github.com/dgreco/camel-zeromq&quot;&gt;GitHub&lt;/a&gt;.&lt;br/&gt;See you at the next Camel ride.</description>
    </item>
    <item>
      <title>My Camel likes the Elephant</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2009/12/20_My_Camel_likes_the_Elephant.html</link>
      <guid isPermaLink="false">5ad3cd11-e1b4-4023-ab30-06ad3a9fb6d7</guid>
      <pubDate>Sun, 20 Dec 2009 13:00:36 +0100</pubDate>
      <description>I’m still working on the Camel HDFS component and I think it has reached a good level of maturity. I also added the consumer part, so it’s now possible to read from an HDFS file. I submitted the patch to the Camel community and it’s pending approval and integration into the Camel trunk.&lt;br/&gt;In the meantime, if you wanna play with the latest version of this component, you can check it out from my &lt;a href=&quot;http://home.davidgreco.it/svn&quot;&gt;subversion repository&lt;/a&gt;:&lt;br/&gt;&lt;br/&gt;svn co &lt;a href=&quot;http://home.davidgreco.it/svn/camel-hdfs/trunk&quot;&gt;http://home.davidgreco.it/svn/camel-hdfs/trunk&lt;/a&gt; camel-hdfs&lt;br/&gt;&lt;br/&gt;Below is the documentation of the component.&lt;br/&gt;&lt;br/&gt;HDFS Component:&lt;br/&gt;&lt;br/&gt;URI Format&lt;br/&gt;&lt;br/&gt;hdfs://hostname[:port][/path][?options]&lt;br/&gt;&lt;br/&gt;the path is treated in the following way:&lt;br/&gt;	1)	as a consumer, if it’s a file it just reads the file; otherwise, if it represents a directory, it scans all the files under the path that satisfy the configured pattern. 
All the files under that directory must be of the same type.&lt;br/&gt;	2)	as a producer, if at least one split strategy is defined, the path is considered a directory and under that directory the producer creates a different file per split named seg0, seg1, seg2, etc.&lt;br/&gt;&lt;br/&gt;Options&lt;br/&gt;&lt;br/&gt;Name                 Default Value       Description&lt;br/&gt;overwrite             true                       the HDFS file can be overwritten&lt;br/&gt;&lt;br/&gt;bufferSize           4096                     the buffer size used by HDFS &lt;br/&gt;&lt;br/&gt;replication           3                           the HDFS replication factor&lt;br/&gt;&lt;br/&gt;blockSize            64MB                    the size of the HDFS blocks&lt;br/&gt;&lt;br/&gt;fileType               NORMAL_FILE    it can be SEQUENCE_FILE,&lt;br/&gt;                                                         MAP_FILE, ARRAY_FILE, or&lt;br/&gt;                                                         BLOOMMAP_FILE, see Hadoop&lt;br/&gt;&lt;br/&gt;fileSystemType   HDFS                   it can be LOCAL for local filesystem&lt;br/&gt;&lt;br/&gt;keyType              NULL                    the type for the key in case of&lt;br/&gt;                                                         sequence or map files. See below.&lt;br/&gt;&lt;br/&gt;valueType           TEXT                    the type for the value in case of&lt;br/&gt;                                                         sequence or map files. See below.&lt;br/&gt;&lt;br/&gt;splitStrategy                                     A string describing the strategy on &lt;br/&gt;                                                         how to split the file based on different&lt;br/&gt;                                                         criteria. 
See below.&lt;br/&gt;&lt;br/&gt;openedSuffix        opened                When a file is opened for reading/&lt;br/&gt;                                                         writing the file is renamed with this&lt;br/&gt;                                                         suffix to avoid reading it during the&lt;br/&gt;                                                         writing phase.&lt;br/&gt;&lt;br/&gt;readSuffix             read                    Once the file has been read it is&lt;br/&gt;                                                         renamed with this suffix to avoid&lt;br/&gt;                                                         reading it again.&lt;br/&gt;&lt;br/&gt;initialDelay            0                         For the consumer, how long to wait&lt;br/&gt;                                                        before starting to scan the directory.&lt;br/&gt;&lt;br/&gt;delay                     0                         The interval between the directory&lt;br/&gt;                                                         scans.&lt;br/&gt;&lt;br/&gt;pattern                   *                         The pattern used for scanning the&lt;br/&gt;                                                         directory&lt;br/&gt;&lt;br/&gt;chunkSize            4096                    When reading a normal file, it is split   &lt;br/&gt;                                                         into chunks producing a message per&lt;br/&gt;                                                         chunk       &lt;br/&gt;&lt;br/&gt;The keyType and the valueType can be:&lt;br/&gt;NULL it means that the key or the value is absent&lt;br/&gt;BYTE for writing a byte, the java Byte class is mapped into a BYTE&lt;br/&gt;BYTES for writing a sequence of bytes. 
It maps the java ByteBuffer class&lt;br/&gt;INT for writing java integer     &lt;br/&gt;FLOAT for writing java float&lt;br/&gt;LONG for writing java long&lt;br/&gt;DOUBLE for writing java double&lt;br/&gt;TEXT for writing java strings&lt;br/&gt;&lt;br/&gt;BYTES is also used for everything else: for example, in Camel a file is sent around as an InputStream; in this case it is written in a sequence file or a map file as a sequence of bytes.&lt;br/&gt;&lt;br/&gt;Splitting Strategy&lt;br/&gt;In the current version of Hadoop (0.20.1) opening a file in append mode is disabled since it’s not reliable enough. So, for the moment, it’s only possible to create new files. The Camel HDFS endpoint tries to solve this problem in this way:&lt;br/&gt;	•	 If the split strategy option has been defined, the actual file name will be &amp;lt;file name&gt;0 initially&lt;br/&gt;	•	 Every time a splitting condition is met a new file is created with name &amp;lt;original file name&gt;N where N is 1, 2, 3, etc.&lt;br/&gt;The splitStrategy option is defined as a string with the following syntax:&lt;br/&gt;splitStrategy=&amp;lt;ST&gt;:&amp;lt;value&gt;,&amp;lt;ST&gt;:&amp;lt;value&gt;,*&lt;br/&gt;&lt;br/&gt;where &amp;lt;ST&gt; can be:&lt;br/&gt;BYTES a new file is created, and the old is closed when the number of written bytes is more than &amp;lt;value&gt;&lt;br/&gt;MESSAGES a new file is created, and the old is closed when the number of written messages is more than &amp;lt;value&gt;&lt;br/&gt;IDLE a new file is created, and the old is closed when no writing happened in the last &amp;lt;value&gt; milliseconds&lt;br/&gt;&lt;br/&gt;for example:&lt;br/&gt;hdfs://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5&lt;br/&gt;it means: a new file is created either when it has been idle for more than 1 second or if more than 5 bytes have been written&lt;br/&gt;&lt;br/&gt;P.S. 
(8 Jan 2010)&lt;br/&gt;I moved this project to &lt;a href=&quot;http://www.github.com/&quot;&gt;github&lt;/a&gt; &lt;a href=&quot;http://github.com/dgreco/camel-hdfs&quot;&gt;here&lt;/a&gt;. There is also a wiki page under the Apache Camel documentation, &lt;a href=&quot;http://camel.apache.org/hdfs.html&quot;&gt;here&lt;/a&gt;.</description>
    </item>
    <item>
      <title>When a Camel encounters an Elephant</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2009/11/17_When_a_Camel_encounters_an_Elephant.html</link>
      <guid isPermaLink="false">a642e27b-87c9-4972-a3a2-6d7be2c0f1af</guid>
      <pubDate>Tue, 17 Nov 2009 11:59:30 +0100</pubDate>
      <description>I don’t know why, but I’m still pretty attached to my HPC background. Even though I have spent most of my time working for different kinds of enterprises and trying to solve problems in different domains, I have always thought that enterprises should try to learn from the HPC community how to solve “hard” business problems. By “hard” I mean difficult in terms of the computational power needed to solve the problem.&lt;br/&gt;I’m observing a sort of convergence, a cross fertilization in terms of the technologies involved, between the HPC community and the enterprise business community. For example, grid computing and cloud computing are technologies that were already popular in the HPC field, but are now getting more and more interest in enterprises too.&lt;br/&gt;In HPC, of course, the emphasis is more on the solution of numerical problems where the data sets involved can be really huge. There are so many interesting problems in this area: oil exploration, weather forecasting, molecular dynamics, etc. On the other hand, enterprises need to process huge amounts of data as well, but in this case they usually need to digest huge amounts of data in structured/semi-structured form to extract something useful: analysis of web server logs, information retrieval/extraction from a massive number of documents, indexing, etc.&lt;br/&gt;In both cases, there is a trend towards scaling applications using many inexpensive nodes rather than big computers.&lt;br/&gt;Along this line, I started to take a look at a very interesting technology used for scaling the analysis of very large quantities of data.&lt;br/&gt;This technology is &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;Hadoop&lt;/a&gt;. Hadoop is an open source implementation of the &lt;a href=&quot;http://labs.google.com/papers/mapreduce.html&quot;&gt;map/reduce algorithm&lt;/a&gt; proposed by Google. 
I would say that Google didn’t invent anything; they just applied the concept of collective computations (reductions), typically used in parallel computing, to the problem of creating inverted indexes for web search. I used to implement a sort of map/reduce algorithm with &lt;a href=&quot;http://www.mpi-forum.org/&quot;&gt;MPI&lt;/a&gt; when I was working in a parallel computing laboratory.&lt;br/&gt;I don’t want to go deep into details regarding the Hadoop ecosystem; there is plenty of information on the Apache site. What I tried to do is to see if one of my favorite open source tools, &lt;a href=&quot;http://camel.apache.org/&quot;&gt;Camel&lt;/a&gt;, could be useful for making the usage of Hadoop easier. Look at the picture below:&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;A fundamental piece in the Hadoop ecosystem is HDFS (Hadoop Distributed File System). From the Apache site: “The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.”&lt;br/&gt;So, before analyzing the data, the first step is to put the data on the HDFS filesystem. HDFS provides APIs and also command line utilities for loading data into HDFS; however, I thought it would be nice to take advantage of the multi-transport/multi-protocol nature of Camel for accomplishing the task of filling the Hadoop file system. &lt;br/&gt;In fact, using Camel you could easily gather data from virtually any kind of system using a plethora of different mechanisms: files, ftp, http, mail, atom, rss, xmpp, etc. 
So, having an HDFS Camel component, you could just define routes that gather data from any kind of source and put it directly into the Hadoop file system; this is why I wrote a Camel HDFS component. I attached the zip file containing the component; you can dig around the tests to see how it works. At the end of this post I’ll put some documentation. For the next steps, I’m wondering whether it makes sense to integrate Camel with the map/reduce framework as well. In the meantime, I think that using Camel just for writing data into an HDFS filesystem could really help.&lt;br/&gt;Very soon I’ll try to submit this work to the official Camel project.&lt;br/&gt;&lt;br/&gt;HDFS Component:&lt;br/&gt;This component provides only a producer: it only allows writing data into HDFS, reading from HDFS is not implemented yet. This component can only be used with the to clause.&lt;br/&gt;&lt;br/&gt;URI Format&lt;br/&gt;&lt;br/&gt;hdfs://hostname[:port][/path][?options]&lt;br/&gt;&lt;br/&gt;Options&lt;br/&gt;&lt;br/&gt;Name                 Default Value       Description&lt;br/&gt;append               false                      the HDFS file is opened in append&lt;br/&gt;                                                        mode&lt;br/&gt;&lt;br/&gt;overwrite             true                       the HDFS file can be overwritten&lt;br/&gt;&lt;br/&gt;bufferSize           4096                     the buffer size used by HDFS &lt;br/&gt;&lt;br/&gt;replication           3                           the HDFS replication factor&lt;br/&gt;&lt;br/&gt;blockSize            64MB                    the size of the HDFS blocks&lt;br/&gt;&lt;br/&gt;fileType               NORMAL_FILE    it can be SEQUENCE_FILE or  &lt;br/&gt;                                                         MAP_FILE, see Hadoop&lt;br/&gt;&lt;br/&gt;fileSystemType   HDFS                   it can be LOCAL for local filesystem&lt;br/&gt;&lt;br/&gt;keyType              NULL                    the type for the key in case 
of&lt;br/&gt;                                                         sequence or map files. See below.&lt;br/&gt;&lt;br/&gt;valueType           TEXT                    the type for the value in case of&lt;br/&gt;                                                         sequence or map files. See below.&lt;br/&gt;&lt;br/&gt;splitStrategy                                     A string describing the strategy on &lt;br/&gt;                                                         how to split the file based on different&lt;br/&gt;                                                         criteria. See below&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;The keyType and the valueType can be:&lt;br/&gt;NULL it means that the key or the value is absent&lt;br/&gt;BYTE for writing a byte, the java Byte class is mapped into a BYTE&lt;br/&gt;BYTES for writing a sequence of bytes. It maps the java ByteBuffer class&lt;br/&gt;INT for writing java integer     &lt;br/&gt;FLOAT for writing java float&lt;br/&gt;LONG for writing java long&lt;br/&gt;DOUBLE for writing java double&lt;br/&gt;TEXT for writing java strings&lt;br/&gt;&lt;br/&gt;BYTES is also used for everything else: for example, in Camel a file is sent around as an InputStream; in this case it is written in a sequence file or a map file as a sequence of bytes.&lt;br/&gt;&lt;br/&gt;Splitting Strategy&lt;br/&gt;In the current version of Hadoop (0.20.1) opening a file in append mode is disabled since it’s not reliable enough. So, for the moment, it’s only possible to create new files. 
The Camel HDFS endpoint tries to solve this problem in this way:&lt;br/&gt;	•	 If the split strategy option has been defined, the actual file name will be &amp;lt;file name&gt;0 initially&lt;br/&gt;	•	 Every time a splitting condition is met a new file is created with name &amp;lt;original file name&gt;N where N is 1, 2, 3, etc.&lt;br/&gt;The splitStrategy option  is defined as a string with the following syntax:&lt;br/&gt;splitStrategy=&amp;lt;ST&gt;:&amp;lt;value&gt;,&amp;lt;ST&gt;:&amp;lt;value&gt;,*&lt;br/&gt;&lt;br/&gt;where &amp;lt;ST&gt; can be:&lt;br/&gt;BYTES a new file is created, and the old is closed when the number of written bytes is more than &amp;lt;value&gt;&lt;br/&gt;MESSAGES a new file is created, and the old is closed when the number of written messages is more than  &amp;lt;value&gt;&lt;br/&gt;IDLE a new file is created, and the old is closed when no writing happened in the last &amp;lt;value&gt; milliseconds&lt;br/&gt;&lt;br/&gt;for example:&lt;br/&gt;hdfs://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5&lt;br/&gt;it means: a new file is created either when it has been idle for more than 1 second or if more than 5 bytes have been written&lt;br/&gt;</description>
    </item>
    <item>
      <title>Towards the real-time web</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2009/10/24_Towards_the_real-time_web.html</link>
      <guid isPermaLink="false">af9f55a8-1b55-42b8-862f-0246c851d713</guid>
      <pubDate>Sat, 24 Oct 2009 17:18:42 +0200</pubDate>
      <description>In the last couple of months I spent some time investigating different options for pushing data to the browser in real time. &lt;br/&gt;Until very recently, a common technique for simulating a sort of real-time updating of information in a web based application was the use of polling or long-polling. I don’t want to spend time explaining those two mechanisms; there is plenty of information on the web.&lt;br/&gt;Sometimes relying on the browser periodically polling for updates is not enough. There are classes of applications where latency really counts: real-time monitoring, online gaming, etc.&lt;br/&gt;So, since the browser is becoming more and more a platform providing full computing power (there is a race among the different browsers around the speed of the javascript interpreters), it’s now possible to conceive web based applications able to receive a continuous stream of information even for latency sensitive applications.&lt;br/&gt;The very first technology I experimented with was &lt;a href=&quot;http://en.wikipedia.org/wiki/Comet_(programming)&quot;&gt;reverse ajax, also known as comet&lt;/a&gt;. In particular, I got a very good impression of a comet protocol called the &lt;a href=&quot;http://svn.cometd.org/trunk/bayeux/bayeux.html&quot;&gt;Bayeux protocol&lt;/a&gt;. Bayeux is a high level protocol developed to simplify the development of comet based applications; it’s being developed under the &lt;a href=&quot;http://www.cometd.org/&quot;&gt;Dojo foundation umbrella&lt;/a&gt;. It seems a promising technology: it’s simple and it’s already stable, and they are very close to releasing version 1.0. 
Bayeux has also attracted interest among other software providers; there are other Bayeux implementations, both open and closed source: &lt;a href=&quot;https://grizzly.dev.java.net/&quot;&gt;Grizzly&lt;/a&gt;, &lt;a href=&quot;http://www.oracle.com/appserver/weblogic/weblogic-suite.html&quot;&gt;Weblogic&lt;/a&gt;, &lt;a href=&quot;http://tomcat.apache.org/&quot;&gt;Tomcat&lt;/a&gt;.&lt;br/&gt;From the scalability standpoint, Bayeux takes advantage of recent advances in web container technology. The &lt;a href=&quot;http://www.cometd.org/&quot;&gt;cometd&lt;/a&gt; implementation of Bayeux, the original one, is based on the &lt;a href=&quot;http://docs.codehaus.org/display/JETTY/Continuations&quot;&gt;continuation&lt;/a&gt; mechanism of &lt;a href=&quot;http://www.eclipse.org/jetty/&quot;&gt;jetty&lt;/a&gt;, a very powerful and popular web container. A similar mechanism is going to be included in the upcoming Servlet 3.0 specification, so pretty soon we will have a standard mechanism for developing Java based comet applications. In the context of a web container, a &lt;a href=&quot;http://en.wikipedia.org/wiki/Continuation&quot;&gt;continuation&lt;/a&gt; makes it possible to break the one to one correspondence between a connection and a thread. In fact, since a comet application is based on the browser keeping one connection to the server always open, it’s obvious that a thread per connection is not affordable: one thousand clients would mean one thousand active threads. Using jetty and its continuation mechanism you could host even &lt;a href=&quot;http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/&quot;&gt;thousands and thousands of connections&lt;/a&gt; with just one single machine.&lt;br/&gt;After playing with Bayeux I started to look around for alternatives. This is what I discovered:&lt;br/&gt;	1.	 
If you like Flash/Flex, Adobe provides &lt;a href=&quot;http://opensource.adobe.com/wiki/display/blazeds/BlazeDS/&quot;&gt;BlazeDS&lt;/a&gt;, a server-based Java remoting and web messaging technology that enables developers to easily connect to back-end distributed data and push data in real-time. There are also other open source implementations of the protocols defined by BlazeDS; just google a little bit and you’ll find what you are looking for.&lt;br/&gt;	2.	 &lt;a href=&quot;http://xmpp.org/&quot;&gt;XMPP&lt;/a&gt;: this is a popular presence and real-time communication protocol. XMPP can also be used for pushing data to the browser; this can be achieved using Flash through freely available XMPP Flash client APIs. It’s also possible to use HTTP through the &lt;a href=&quot;http://xmpp.org/extensions/xep-0124.html&quot;&gt;BOSH&lt;/a&gt; XMPP extension. Using BOSH, a browser can be connected using just the HTTP protocol. There are Javascript APIs for developing BOSH based client applications; among others, I found the &lt;a href=&quot;http://code.stanziq.com/strophe/&quot;&gt;Strophe&lt;/a&gt; library to be pretty popular.&lt;br/&gt;	3.	 &lt;a href=&quot;http://dev.w3.org/html5/spec/Overview.htm&quot;&gt;HTML5&lt;/a&gt;: this is the new HTML standard. A lot of new features are being introduced in the new HTML standard; among others, the &lt;a href=&quot;http://www.w3.org/&quot;&gt;W3C&lt;/a&gt; is working on a protocol for allowing bi-directional communication between a browser and a remote host. This protocol is called Websockets. 
This new protocol is the result of a joint effort by the W3C and the &lt;a href=&quot;http://www.ietf.org/&quot;&gt;IETF&lt;/a&gt;.&lt;br/&gt;The IETF is defining the transport protocol (you can find the draft &lt;a href=&quot;http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-54&quot;&gt;here&lt;/a&gt;), while the W3C is busy defining the APIs to be used inside the browser for the &lt;a href=&quot;http://dev.w3.org/html5/websockets/&quot;&gt;Websockets&lt;/a&gt; protocol. There is an interesting product from a company called &lt;a href=&quot;http://www.kaazing.com/&quot;&gt;Kaazing&lt;/a&gt;. They provide a gateway with associated client APIs for developing Websockets based applications. If a browser doesn’t directly support the Websockets protocol, the Kaazing Javascript libraries fall back to emulating the Websockets client layer. The Kaazing gateway also provides specific Javascript libraries for connecting a browser with popular messaging technologies like XMPP. I found the overall Kaazing stack fast and easy to use.&lt;br/&gt;My personal opinion is that Javascript is already fast and expressive enough for developing rich internet applications. All the main web browsers are constantly improving their Javascript processing speed and, moreover, there are plenty of powerful Javascript frameworks that help people develop nice and user friendly applications. &lt;br/&gt;Javascript is also available in the browsers on board the most popular smart phones. So, the client side is going to be dominated by Javascript. Given this fact, I would opt for Javascript/HTTP based solutions for real time data pushing. Bayeux, in my opinion, is the first candidate: it could be a medium term solution until Websockets is widely adopted and available. I also found the XMPP based approach interesting. It could be an alternative; however, it forces you to have an XMPP broker in the middle, between the client and the data producer. 
Using Bayeux, the protocol listener/manager can be co-located directly with your back-end, giving you a better chance of reducing latency. &lt;br/&gt;Above, I tried to describe solutions based on well defined protocols, standardized either “de jure” or “de facto”.&lt;br/&gt;However, during my investigation of push technologies, I found other platforms/solutions providing this kind of capability.&lt;br/&gt;I found these particularly interesting:&lt;br/&gt;	•	&lt;a href=&quot;http://liftweb.net/&quot;&gt;Lift&lt;/a&gt;: it’s a &lt;a href=&quot;http://www.scala-lang.org/&quot;&gt;Scala&lt;/a&gt; based web framework that also provides a comet mechanism.&lt;br/&gt;	•	&lt;a href=&quot;http://www.erlang.org/&quot;&gt;Erlang&lt;/a&gt; based solutions: read &lt;a href=&quot;http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1/&quot;&gt;this&lt;/a&gt; really impressive article.&lt;br/&gt;Sorry for this long post, it’s always so difficult to condense so much information into something not too long and boring ;)</description>
    </item>
    <item>
      <title>Using ActiveMQ and CXF for inter-webapp communication</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2009/10/2_Using_ActiveMQ_and_CXF_for_inter-webapp_communication.html</link>
      <guid isPermaLink="false">21304603-a227-4d70-b65a-d99eb128c367</guid>
      <pubDate>Fri, 2 Oct 2009 13:42:58 +0200</pubDate>
      <description>For a project I’m working on I needed to make two separate web applications running inside Tomcat talk to each other.&lt;br/&gt;Basically, each web application represents a distinct business component (see my previous post) that exposes a well defined interface.&lt;br/&gt;I preferred to stick with one business component per web app, since this allows the business components to be deployed and managed separately, each with its own lifecycle.&lt;br/&gt;Being in different deployment units, the two business components cannot talk to each other directly even though they reside in the same container. This is, of course, due to the class loading mechanism of the web container: each web application gets its own class loader.&lt;br/&gt;So, the problem is: if I have two business components deployed in two different web archives, how can I make one component consume the services exposed by the other one?&lt;br/&gt;I decided to use &lt;a href=&quot;http://cxf.apache.org/&quot;&gt;CXF&lt;/a&gt; and &lt;a href=&quot;http://activemq.apache.org/vm-transport-reference.html&quot;&gt;ActiveMQ&lt;/a&gt; to achieve that goal. Using the code-first approach, just by annotating the component interfaces with JAX-WS annotations, I can make those interfaces remotely callable through the Spring based configuration mechanism provided by &lt;a href=&quot;http://cxf.apache.org/&quot;&gt;CXF&lt;/a&gt;.&lt;br/&gt;To make this communication as fast as possible I decided to use &lt;a href=&quot;http://activemq.apache.org/tomcat.html&quot;&gt;ActiveMQ embedded into Tomcat&lt;/a&gt; with the &lt;a href=&quot;http://activemq.apache.org/vm-transport-reference.html&quot;&gt;vm connector&lt;/a&gt;. In this way, when the client calls the service exposed by the component on the other side, the messages are passed in memory, bypassing the network stack (even the local loopback). 
Of course, I still have to pay the price of marshaling/unmarshaling to/from SOAP/XML, but I thought it was worthwhile to preserve a clean separation among the business components.&lt;br/&gt;As usual, I’m attaching (in the comment section) a simple project showing what I tried to describe in this post.&lt;br/&gt;Let me know what you think.&lt;br/&gt;</description>
    </item>
    <item>
      <title>OSGi for enterprise applications: part 3</title>
      <link>http://www.davidgreco.it/MySite/Blog/Entries/2009/6/21_OSGi_for_enterprise_applications__part_3.html</link>
      <guid isPermaLink="false">529bc96a-2771-43ca-bd5b-dd828eac7249</guid>
      <pubDate>Sun, 21 Jun 2009 17:32:59 +0200</pubDate>
      <description>My efforts to understand the implications of using OSGi are still continuing. With this entry I’d like to share with you my attempt to use OSGi as the foundation technology for developing component based enterprise systems. As I have written many times, I’m really convinced of the effectiveness of component based development for building large scale enterprise systems. A couple of years ago I had the chance to work with Peter Herzum, one of the authors of the book &lt;a href=&quot;http://www.amazon.com/Business-Component-Factory-Comprehensive-Component-Based/dp/0471327603&quot;&gt;Business Component Factory&lt;/a&gt;. This book introduced me to the concept of &lt;a href=&quot;http://www.omg.org/docs/ormsc/98-09-01.pdf&quot;&gt;Business Component&lt;/a&gt;. Being a lazy person I don’t want to write too much about these things. Very simply, a business component is a unit of functionality that manages a single business abstraction. A business component is tiered; for each architectural tier it contains a distributed component:&lt;br/&gt;	•	User Distributed Component: responsible for the user interface.&lt;br/&gt;	•	Workspace Distributed Component: supports activities belonging to a single user.&lt;br/&gt;	•	Enterprise/Service Distributed Component: contains the business logic and enforces the integrity of the shared resources managed by the resource component.&lt;br/&gt;	•	Resource Distributed Component: provides access to shared resources, such as the persistent data in a database.&lt;br/&gt;This picture summarizes the relationship among the different concepts in a business component based system.&lt;br/&gt;&lt;br/&gt;I found this model very powerful: during the analysis phase you try to discover the business components that could realize your system. 
Then, for each DC (Distributed Component) belonging to the identified BCs, you start to define the interfaces. After this phase you can start to develop each DC in isolation, perhaps developing proper mockups for the DCs it depends upon.&lt;br/&gt;I thought that the combination of OSGi + SpringDM and a robust OSGi based container like ServiceMix4 could allow me to implement the business component concept. In fact, what I found dealing with OSGi for the enterprise is the lack of a more coarse grained component model. In my opinion the bundle doesn’t provide the right level of abstraction when you want to develop a component based system. Maybe it’s a silly idea, but OSGi can be thought of as a sort of assembly language for component models. Given the extreme modularity and the low level service concept that OSGi provides, this framework can be used for implementing any kind of more sophisticated component model. Indeed, OSGi is being adopted as a foundation layer for other component models like JBI (with ServiceMix4) and JEE (Geronimo, Glassfish). So, as I said, OSGi per se provides a component model, but unfortunately it doesn’t provide the right level of abstraction for developing business component based systems. As a direct consequence, it’s pretty easy to fail in managing the complexity of so many bundles with all the relationships among them. 
An enterprise application of moderate complexity can be made of hundreds of bundles, so without strict discipline or a strict architectural blueprint it is very likely that sooner or later you’ll lose control, losing also the advantages of using OSGi.&lt;br/&gt;So, at least with the current state of OSGi based technologies, it’s really important to establish either an architectural metaphor or an architectural blueprint that is well suited to the development of your system/enterprise application, and then spend some time setting up reusable templates.&lt;br/&gt;Driven by these considerations, after deciding to embrace OSGi for developing enterprise applications, I chose an architectural metaphor: the business component concept. Then we spent a couple of weeks developing a business component example that can be used as a template. In this way the developers have an easier start when developing a business functionality and at the same time (most important) they have a clear path to follow and a clear model to adhere to.&lt;br/&gt;The template we developed provides a hierarchical Maven based project of a complete business component with all its distributed components.&lt;br/&gt;The resource component uses JPA for data persistence. The service component uses JAX-WS annotations and &lt;a href=&quot;http://cxf.apache.org/&quot;&gt;CXF&lt;/a&gt; to provide a remoting capable interface. The service component also uses &lt;a href=&quot;http://camel.apache.org/&quot;&gt;Camel&lt;/a&gt; to provide an event driven connector. In fact, in our model, a distributed component should provide a remote interface and it should also be able to receive and send events. In the end, using SpringDM, CXF and Camel we could add the needed connectors to our components (a synchronous, RPC based connector and an asynchronous, event based connector) using just a configuration based approach. 
The template we wrote contains all the needed dependencies and import/export package directives; the idea is to relieve the developer from dealing with too many details of the OSGi mechanics.&lt;br/&gt;I hope to attach a complete example of a business component soon, hoping it will be helpful for you.&lt;br/&gt;</description>
    </item>
  </channel>
</rss>
