Friday 27 December 2013

Cloud Architectural Challenges

Building applications for cloud.


How is creating software for cloud different from previous architecture? Not that much really. There is a great deal of hype which makes it seem that cloud software architecture is a huge improvement compared to "traditional" software architecture - whatever that is: mainframe architecture, thin/fat-client architecture, server farm architecture, any other architecture...

The differences between other architectural styles and cloud architecture rise not from the offerings of the cloud but rather from the unique challenges posed by operating in the cloud. The challenges are two, and they are mainly a question of uncertainty...

Challenge: Out of Sync


The cloud is - by its very nature - a distributed environment. A distributed application consists of multiple parts which communicate with each other but don't necessarily run in the same system. When an application has functionality which is used at different times or different frequencies, or can be run parallel to other parts, then it often would make sense to remove it outside the normal execution process and perhaps even set it into a different system to be connected only when required.

This kind of functionality might for example be the billing or archiving functions of a website. After making an order in a web-shop, the website will not wait until the customer's credit card is actually billed; the web-shop returns control to the user immediately and a different subsystem of the application handles the credit card billing. When it is finished, the user will get a confirmation email and user's profile in the web-shop will be updated.


Workers


The example above is very trivial. The subsystem handling the billing is a "worker", a unit which is activated only when the application requires its service. The unit might exist outside the application's system, maybe even on a different server. The important thing is that the application does not wait for it to return anything. It runs alone according to the parameters the application provides to it, and after running it closes itself down automatically without need for any interaction. Therefore, it runs out of sync with the main application.


Being Out of Sync


When programs run inside one server and one operating system, the communication is instantaneous or - in practice - real time. But communication in the cloud is not in real time, sometimes not even stable. The connection might take a long time to establish, or it might break, or simply be slow. The vendor might have an unscheduled maintenance break or the service might have been relocated physically to a different server. In general, subsystems often use IP addressing to connect to each other. The two main architectural choices are RPC (remote procedure call) interface or REST (representational state transfer) interface. The main difference between these is less in the implementation and more in their philosophy.

RPC is an interface for a tightly coupled application where the subsystems are in fact subroutines and the main program waits for the completion of the subroutine before continuing. REST on the other hand is an API which can function both synchronously and asynchronously. REST is best used when subsystems of the application are autonomous services which can - in principle - be offered to any application. REST encourages the design of the API into the form of a (public) service. REST is stateless API so no client context is stored on the server between requests.

Because of the possible or (pessimistically) likely problems in connection between the application and its subsystems in the cloud, it is generally better to go for REST style architectural design. Properly implemented it provides a robust loosely coupled system fit for cloud.


Messaging


Of course, the distributed parts (workers and others) need to communicate with each other. There are many ways to do it but generally the best is an out-of-sync way: a message queue. A message queue is an external application, "messaging middle-ware", to whose care the application gives a message and then "forgets it". Another part of the application polls the message queue at preset intervals and reads the message when it is available. The message queue guarantees that a message will never get lost but it doesn't know how quickly the other subsystem will read it or act upon it. It does not wait for a return message. It will wait, however, for a receipt from so it knows the message was handled. When using a message queue to link subsystems together, an API (REST or other) is not necessary.


Challenge: Unreliability


As mentioned above, cloud is a volatile environment, also in the sense that vendor companies may come and go, new services promoted and old ones canceled. Cloud architecture is also about preparing for the eventuality of migration to new services or platforms. It is the natural additional price to pay when seeking "affordable" cloud services as most companies always do.


Design and prepare for eventual platform or vendor change.


The platforms that cloud vendors and service providers offer include not only real or virtual servers but also "platforms" that are more like services, such as databases, messaging middle-ware, worker platforms, and of course varied special services like log collectors (IT operation oriented) or daily currency rate providers (business oriented). All purchased services, not to mention free services, have a tendency to change APIs or even disappear as time goes by.

Cloud architecture has ways to prepare for this eventuality. Most of these are coding practices that can be forced for instance if the implementation is done using a framework. Connection to external APIs can be isolated, database connection abstracted into an ORM (object relational mapper), message queue connection as well. Unfortunately this also means that the risk for complicating the implementation rises.

Another isolation layer could be a proxy server for REST or RPC calls. A proxy server can provide additional security as it could also keep the remote services' passwords and other connection details hidden from the service users.


Change is always pending


With cloud architecture, preparing for trouble and change is always paramount because in the cloud an application can have very little control over its environment. The cloud creates a new kind of approach into dealing with vendors: pay-as-you-go. If the costs of vendor service are billed accurately according to the actual usage of resources (memory, CPU cycles, bandwidth, tech support requests, ...), this will prompt the design of applications better optimized for cloud environment.