Tuesday 30 September 2014

IO::Iron Policies - No Typing Errors to Iron.io Services!

Policies are a way to limit the names of message queues, code packages, caches and items (item keys) to a predefined group of possible strings. This reduces the chance of typos and can enforce an enterprise naming policy. The policies are loaded from a JSON file which is specified either when creating an IO::Iron::Iron*::Client object, or in the config file .iron.json (or equivalent).

Policies in Config file

Add the item policies to the config file. The value of the item is the file name of the policies file.

Example config file:

    {
        "project_id":"51bdf5fb2267d84ced002c99",
        "token":"-Q9OEHZPhdZtd0KHBzzdUJIqV_E",
        "host":"cache-aws-us-east-1.iron.io",
        "policies":"iron_policies.json"
    }

Policies file specified when creating the client

    my $policies_filename = '/etc/ironmq/global_policies.json';
    my $client = IO::Iron::IronCache::Client->new('policies' => $policies_filename);

Examples of Policies File and Explanation of Configuration

The 'default' policies (shown here as a Perl data structure rather than as JSON):

    (
    'definition' => {
        'character_group' => {
        },
        'no_limitation' => 1, # There is an unlimited number of alternatives.
    },
    'queue' => { 'name' => [ '[:alnum:]{1,}' ], },
    'cache' => {
        'name' => [ '[:alnum:]{1,}' ],
        'item_key' => [ '[:alnum:]{1,}' ]
        },
    'worker' => { 'name' => [ '[:alnum:]{1,}' ], },
    );

The above file would set an open policy for IronMQ, IronCache and IronWorker alike. The file is divided into four parts: definition for defining meta options, and the queue, cache and worker parts for defining the changing strings (queue, cache and worker names, and item keys). The character group alnum covers all ASCII alphabetic characters (both lower and upper case) and digits (0-9).

N.B. The option no_limitation controls the open/closed policy. If no_limitation is set (1=set), the policy control is turned off.

An example of a policies file

    {
        "__comment1":"Use normal regexp. [:digit:] = number:0-9, [:alpha:] = alphabetic character, [:alnum:] = character or number.",
        "__comment2":"Do not use the begin/end anchors '^' and '$'. They are added automatically.",
        "__comment3":"Note that character groups are enclosed in '[::]', not '[[:]]' as normal POSIX groups.",
        "definition":{
            "character_group":{
                "[:lim_uchar:]":"ABC",
                "[:low_digit:]":"0123"
            }
        },
        "cache":{
            "name":[
                "cache_01_main",
                "cache_[:alpha:]{1}[:digit:]{2}"
            ],
            "item_key":[
                "item.01_[:digit:]{2}",
                "item.02_[:lim_uchar:]{1,2}"
            ]
        }
    }

This policies file sets policies for cache names and item keys. Both have two templates. The template "cache_01_main" contains no wildcards: a template list may also consist solely of predefined names or keys. Sometimes this is exactly the wanted behaviour, especially for cache and message queue names.

Items beginning with '__' are considered comments. Comments cannot be inserted into lists (JSON arrays).

The definition part contains the list character_group for user-defined groups. The following groups are predefined:

[:alpha:]
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
[:alnum:]
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
[:digit:]
0123456789
[:lower:]
abcdefghijklmnopqrstuvwxyz
[:upper:]
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[:word:]
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_

All lower ASCII (7-bit) characters are allowed in names and in character groups, except for the reserved characters (RFC 3986):

!$&'()*+,;=:/?#[]@

A character group definition is enclosed in the characters '[::]', not '[[:]]' as normal POSIX groups. Only the equivalents of the POSIX groups mentioned above can be used; e.g. POSIX group [[:graph:]] is not available.

When using the character groups in a name or key, only two markings are allowed: [:group:]{n} and [:group:]{n,n}, where 'n' is an integer. This limitation (not being able to use arbitrary regular expressions) is due to the double functionality of the policy: a) it acts as a filter when creating and naming new message queues, code packages, caches and cache items; b) it can be used to list all possible names, for example when querying for cache items.
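
The double functionality described above can be sketched in a few lines of core Perl. This is only an illustration of the idea, not the code IO::Iron uses internally: the helper names parse_template, template_to_regex and expand_template are hypothetical, and the group table holds just one predefined group plus the two user-defined groups from the example file.

```perl
use strict;
use warnings;

# One predefined group plus the two user-defined groups from the
# example policies file above. (Hypothetical helper table.)
my %GROUPS = (
    '[:digit:]'     => '0123456789',
    '[:lim_uchar:]' => 'ABC',
    '[:low_digit:]' => '0123',
);

# Split a template into literal characters and [:group:]{n} or
# [:group:]{n,m} tokens.
sub parse_template {
    my ($template) = @_;
    my @parts;
    while ($template =~ m/\G(?:(\[:\w+:\])\{(\d+)(?:,(\d+))?\}|(.))/gs) {
        my ($group, $min, $max, $literal) = ($1, $2, $3, $4);
        if (defined $group) {
            die "Unknown character group $group\n" unless exists $GROUPS{$group};
            push @parts, { chars => $GROUPS{$group}, min => $min,
                           max => defined $max ? $max : $min };
        }
        else {
            push @parts, { literal => $literal };
        }
    }
    return @parts;
}

# Use a): build an anchored regular expression that filters names.
sub template_to_regex {
    my ($template) = @_;
    my $re = '';
    for my $part (parse_template($template)) {
        if (exists $part->{literal}) {
            $re .= quotemeta($part->{literal});
        }
        else {
            $re .= '[' . quotemeta($part->{chars}) . ']'
                 . '{' . $part->{min} . ',' . $part->{max} . '}';
        }
    }
    return qr/\A$re\z/;
}

# Use b): list every string the template allows.
sub expand_template {
    my ($template) = @_;
    my @results = ('');
    for my $part (parse_template($template)) {
        my @choices;
        if (exists $part->{literal}) {
            @choices = ($part->{literal});
        }
        else {
            my @chars = split //, $part->{chars};
            for my $len ($part->{min} .. $part->{max}) {
                my @strings = ('');
                for my $i (1 .. $len) {
                    my @longer;
                    for my $prefix (@strings) {
                        push @longer, map { $prefix . $_ } @chars;
                    }
                    @strings = @longer;
                }
                push @choices, @strings;
            }
        }
        my @next;
        for my $prefix (@results) {
            push @next, map { $prefix . $_ } @choices;
        }
        @results = @next;
    }
    return @results;
}

my @keys = sort { $a cmp $b } expand_template('item.02_[:lim_uchar:]{1,2}');
print scalar(@keys), " possible keys, first: $keys[0]\n";

my $filter = template_to_regex('item.01_[:digit:]{2}');
print "item.01_42 is allowed\n" if 'item.01_42' =~ $filter;
print "item.01_4 is rejected\n" unless 'item.01_4' =~ $filter;
```

Because only fixed repetition counts are allowed, the expansion is always finite, which is what makes listing all possible names feasible.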

Sunday 31 August 2014

IO::Iron::Applications - Command line tools for Iron.io services

IO::Iron::Applications is an auxiliary package for IO::Iron. IO::Iron contains the library for using the Iron.io cloud services in Perl programs. IO::Iron::Applications contains command line programs to operate those services.

IO::Iron::Applications is my addition to the IO::Iron interface library package which I wrote earlier. The Iron.io web dashboard at hud.iron.io is great but a bit slow to use when you only need to quickly change some values in IronCache, send a message to IronMQ, erase or empty a cache for debugging purposes, or do other similar things. With these command line utilities the same functions can be performed quickly from a normal shell without a web browser.

Policies

The programs make use of the IO::Iron policies feature, so wildcard characters can be used in cache names, item keys, etc.

For example, if iron_cache_policies.json contains:

    {
        "definition":{
            "character_group":{
                "[:lim_uchar:]":"ABC",
                "[:low_digit:]":"01"
            }
        },
        "cache":{
            "name":[
                "cache_[:lim_uchar:]{1}0[:digit:]{1}"
            ],
            "item_key":[
                "item.02_[:lim_uchar:]{1,2}[:low_digit:]{1}"
            ]
        }
    }

then

    ironcache list items .* --cache cache_A01 --policies iron_cache_policies.json

would print out:

    Cache                         Item                expires
    cache_A01                     item.02_A0                               Key not exists.
    cache_A01                     item.02_A1                               Key not exists.
    cache_A01                     item.02_AA0                              Key not exists.
    cache_A01                     item.02_AA1                              Key not exists.
    cache_A01                     item.02_AB0                              Key not exists.
    cache_A01                     item.02_AB1                              Key not exists.
    cache_A01                     item.02_AC0                              Key not exists.
    cache_A01                     item.02_AC1                              Key not exists.
    cache_A01                     item.02_B0                               Key not exists.
    cache_A01                     item.02_B1                               Key not exists.
    cache_A01                     item.02_BA0                              Key not exists.
    cache_A01                     item.02_BA1                              Key not exists.
    cache_A01                     item.02_BB0                              Key not exists.
    cache_A01                     item.02_BB1                              Key not exists.
    cache_A01                     item.02_BC0                              Key not exists.
    cache_A01                     item.02_BC1                              Key not exists.
    cache_A01                     item.02_C0                               Key not exists.
    cache_A01                     item.02_C1                               Key not exists.
    cache_A01                     item.02_CA0                              Key not exists.
    cache_A01                     item.02_CA1                              Key not exists.
    cache_A01                     item.02_CB0                              Key not exists.
    cache_A01                     item.02_CB1                              Key not exists.
    cache_A01                     item.02_CC0                              Key not exists.
    cache_A01                     item.02_CC1                              Key not exists.

On the command line, all normal regular expressions are allowed. E.g.

    item.02_A.{1}0

would return

    Cache                         Item                expires
    cache_A01                     item.02_AA0                              Key not exists.
    cache_A01                     item.02_AB0                              Key not exists.
    cache_A01                     item.02_AC0                              Key not exists.
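
The filtering step can be pictured as matching the user's regular expression against the list of keys which the policy allows. The sketch below is an assumption about the behaviour, not the actual ironcache code: the function name filter_keys is hypothetical, and anchoring the pattern at both ends is a guess consistent with the output above.

```perl
use strict;
use warnings;

# Keep only the candidate keys that match the pattern in full.
# Anchoring with \A and \z is an assumption, not confirmed behaviour.
sub filter_keys {
    my ($pattern, @candidates) = @_;
    my $re = qr/\A(?:$pattern)\z/;
    return grep { $_ =~ $re } @candidates;
}

# A few of the keys allowed by the policy above.
my @candidates = map { 'item.02_' . $_ } qw(A0 A1 AA0 AA1 AB0 AB1 AC0 AC1 B0);
print "$_\n" for filter_keys('item.02_A.{1}0', @candidates);
```

With these candidates the loop prints item.02_AA0, item.02_AB0 and item.02_AC0, one per line.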

The following command line programs are available:

ironcache

clear: Clear a cache.
E.g. ironcache clear cache_main
delete: Delete a cache.
E.g. ironcache delete cache_main
delete: Delete item from cache.
E.g. ironcache delete item item.01_AB1
get: Get item/items from cache/caches.
E.g. ironcache get item item.02_A.{2} --cache cache_A01 --config iron_cache.json --policies iron_cache_policies_test_01.json --warn
increment: Increment an item/items in cache/caches.
E.g. ironcache increment item item.02_AC1,item.02_BC1 --cache cache_A01 --value 225
list: List caches or items in a cache/caches.
E.g. ironcache list items .* --cache cache_A01
E.g. ironcache list caches
put: Put or replace item/items to a cache/caches.
E.g. ironcache put item item.02_CC1,item.02_CC2 --cache cache_A01 --value 123
show: Show the properties of a cache/caches.
E.g. ironcache show cache cache_A01

Monday 30 June 2014

Revision Control and Project Culture

Version control, or simply repository control, is one of the most important parts of a software project. After all, it is in many cases used daily. No wonder, then, that version control is not only part of the project structure, but also part of its culture.

This blog entry is partly based on a report, Jämförelse: Subversion och Git, written for Init Ab, a consulting company headquartered in Stockholm.

Centralized and Distributed Version Control

A repository is the place where the source code of a program is kept. Access to a repository is managed by revision control software, which maintains a monopoly on read and write access to the repository.

Two recently popular programs in this area are Subversion and Git. They represent very different views on version control.

Subversion is the leading program among centralized version control software. A centrally controlled repository is the "classic" way to arrange control over source code. In this system every user first copies the needed parts of the software to his or her local disk and, when done with making changes to it, commits the changed files to the central repository. For every operation, access to the repository is required.

In decentralized (i.e. distributed) revision control there is no absolute central repository. Instead, a new user copies the whole repository from any other existing user. The history of changes is copied together with the current code. Every user maintains a complete copy of the repository, and therefore there is no need for centralized backups either. In practice, it is customary for a project to keep a "dummy user" account which is used for release testing and nightly builds, or is linked to a continuous integration system, for example Hudson.

Growing Popularity

According to recent studies by the Eclipse Community Survey [1] and ITJobsWatch [2], in the last few years Git has become as popular as Subversion in the business world as well. Among Open Source hobbyist developers Git has already been popular for some time. However, as the statistics show us, Subversion hasn't actually been losing ground to Git. Subversion is the direct descendant of the once hugely popular CVS (Concurrent Versions System), and there is still a great number of enterprises which are running CVS and will only consider changing to Subversion.

    Year    Git      Subversion
    2009     2.4%    57.5%
    2010     6.8%    58.3%
    2011    12.8%    51.3%
    2012    27.6%    46.0%
    2013    36.3%    37.8%

Results of the Eclipse Community Survey regarding SVN and Git usage.

            Permanent positions     Rank
    Year    Git     Subversion      Git     Subversion
    2012    1167    3354            263     91
    2013    2049    2836            157     107
    2014    3605    3265            90      99

ITJobsWatch: Git & Subversion.

(De)centralized Culture

I will not concentrate on the technical side of revision control but rather on the cultural aspects that these two very different solutions foster.

Version control, or simply repository control, is one of the most important parts of handling a project or participating in one. After all, we use it daily. The program which we use to access the repository is one of our most often used tools. Therefore, when it feels like it refuses to co-operate with us, it immediately becomes a major irritation. So it must be simple, reliable and fast.

But more than a tool for programmers, version control is also a link between project leadership (maybe even middle-level management, depending on company structure) and developers and architects. It provides us with (inflexible?) boundaries to how we shape our work.

Ben Collins-Sussman, one of Subversion's designers, claims that decentralized version control works badly for teams which don't consist of equally competent people. He quotes some requests [3] he got when developing Subversion:

    Can you guys please give Subversion on Google Code the ability to hide specific branches?

    Can you guys make it possible to create open source projects that start out hidden to the world, then get revealed when they're ready?

    Hi, I want to rewrite all my code from scratch, can you please wipe all the history?

Developers are humans and they have a tendency to want to work privately, in a cave, then spring "perfect" code on their community, as if no mistakes had ever been made. In a decentralized version control environment it can be too easy to "slip" into isolation, thinking that committing into your own repository has the same purpose as committing to the central repository. But this is not the case. The local copy of the repository is for the developer's hourly or daily use as a local backup; the central repository is "public" so the project manager and others can see where the developer is going. The project policy could be to commit every day before finishing work, and if the central repository is connected to a continuous integration system with unit tests, errors and bad solutions will be discovered earlier. Collins-Sussman quotes Google's culture and mantra: don't run from failure - fail often, fail quickly, and learn.

On the other hand, if the team is small and every developer is at about the same level, decentralized version control can foster meritocracy and a friendly competitive spirit. In a truly decentralized version control environment (without a "centralized dummy user") changes are copied directly from one user to another, so trusting the other's code becomes a necessity.

A decentralized environment is not the only way to foster meritocracy, however. The Apache Software Foundation is also known for its meritocratic structure in open source projects. They use Subversion exclusively. Project participants are divided into three groups: users who can make suggestions and bug reports, developers who submit their code but cannot commit, and committers who have write access to the repository. Anyone can become a user, and becoming a developer only requires checking out the freely available source code from the Subversion repository. The committers' group replenishes itself from the developers' group by selecting, by common decision, the ones whose submitted source code has the best quality. The GNOME Foundation, Apache Software Foundation, Mozilla Foundation, and The Document Foundation officially claim to be meritocracies.

Centralized version control favours a more structured organization, whereas decentralized can suit a self-forming or self-governing team, or a hobbyist group. On the other hand, the technical know-how must be somewhat higher, especially when using Git. Git is powerful but somewhat complicated to use, and more error-prone (or gives that appearance) in daily usage than its main decentralized competitors Bazaar and Mercurial, not to mention centralized Subversion.

Naturally, decentralized version control can suit a well structured organization or a company as well, but it requires stricter guidelines and processes to guide its usage, which may in part nullify its benefits.

Conclusion

The question of the team's and the organization's culture is the most important one. As mentioned above, version control is a daily tool, and its users' culture will influence the way it is used; but the opposite is also true: the version control tool will influence the users by favouring certain work flows and usage patterns over others.

References

1. Eclipse Community Survey Report 2013, Retrieved 2014-06-13.
2. ITJobsWatch, Retrieved 2014-06-13.
3. Brian W. Fitzpatrick and Ben Collins-Sussman, Team Geek, A Software Developer's Guide to Working Well with Others, 2012, First Edition, O'Reilly Media.

Saturday 31 May 2014

HtmlUnit - For Integration Testing and Webcrawling

To put it in just a few words: HtmlUnit is a web browser without a window.

Intended for integration testing, HtmlUnit allows a program to manipulate a webpage at a high level, i.e. as if it were using a normal web browser. The calling program can fill in and submit forms, click on buttons, image maps and hyperlinks, or activate objects created by JavaScript. JavaScript, cookies and AJAX are supported. So are proxies and immediate redirection.

GUI integration testing


This kind of testing is about as close to human testing as we can get with automated testing. Testing static webpages is always easy because the content only gets loaded once from the remote server, but nowadays webpages more often than not have dynamic content. Once the page is loaded, not only the outward appearance but also the content itself is changed with the help of JavaScript, CSS (Cascading Style Sheets), AJAX and Adobe Flash (although Flash, being a self-contained "applet" or video player, is outside the scope of HtmlUnit).

With HtmlUnit the test program can "crawl" through the HTML code section by section confirming that content is correct. Or it can jump straight to a certain part identified by id or name tag. It can "hover" the mouse pointer (emulated, of course) over parts of text or a button on a form, or e.g. select an item from a select (list) button which is wired with JavaScript, and then confirm that the page or form content changes as planned.

HtmlUnit does UI testing for webpages, or more precisely, it tests how the HTML elements and JavaScript work together.

Webcrawling


Because HtmlUnit is a headless (i.e. windowless) web browser, it can also be used to programmatically browse websites and extract information. On many webpages JavaScript is intimately linked to the processing of forms, so that a form cannot be submitted properly without JavaScript's help. These kinds of pages are of course examples of poor web form design (separation of concerns is incomplete; business logic is mixed with the program flow), but ours being an imperfect world, even they must be accepted. And that's where HtmlUnit shows what it's made of.

There are plenty of pages where the user only needs to log in through the front page and the sought-after information is immediately available, or available via a simple form. One example is logging in to your telephone company's website only to see how much balance or network quota you still have left for the current month. Many simple hardware devices, such as home routers, only provide a web interface, with no SOAP or REST API. Earlier it was impossible, or close to it, to get to this content programmatically. HtmlUnit to the rescue!

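Let's see an example in Java:

    import java.io.IOException;
    import java.util.List;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
    import com.gargoylesoftware.htmlunit.html.HtmlForm;
    import com.gargoylesoftware.htmlunit.html.HtmlInput;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput;
    import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

    // Spoof Chrome and route requests through a proxy (both optional).
    final WebClient webClient = new WebClient(BrowserVersion.CHROME, proxyIP, proxyPort);
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getCookieManager().setCookiesEnabled(true);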

We have imported some HtmlUnit element classes. We create a new WebClient instance by telling it which browser it should spoof and which server to use as a proxy. Both of these are optional. Sometimes an HTTP server or the client-side JavaScript changes the layout of the page depending on the requesting browser. We also enable redirection, JavaScript support and cookie support. A simpler alternative is to use the defaults:

   final WebClient webClient = new WebClient();

Let's continue. We want to find the submit button and input fields for userid and password. Once we get them, we can finish logging in by clicking the submit button and loading a new page in the bargain.

    HtmlInput submitButton = null;
    HtmlPage titlePage = null;
    HtmlPage entryPage = null;
    HtmlAnchor linkToJobAdPage = null;
    try {
        titlePage = webClient.getPage(hostname);
    } catch (IOException e) {
        e.printStackTrace();
    }
    final List<HtmlForm> forms = titlePage.getForms();
    // Iterate through the list to find the login form (loginForm).

    [...]

    submitButton = loginForm.getInputByName("login");
    final HtmlTextInput usernameTextField = loginForm.getInputByName("login_id");
    final HtmlPasswordInput passwordTextField = loginForm.getInputByName("login_password");
    usernameTextField.setValueAttribute(userId);
    passwordTextField.setValueAttribute(password);

    try {
        // Clicking the button submits the form and returns the page that loads next.
        entryPage = submitButton.click();
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Look through the links on the new page for the one we want to follow.
    final List<HtmlAnchor> links = entryPage.getAnchors();
    for (HtmlAnchor link : links) {
        logger.debug("Entry Page link: " + link.asXml());
        if (link.asXml().contains("create_new_entry.new")) {
            linkToJobAdPage = link;
        }
    }


HtmlUnit for Perl


HtmlUnit is not a Java monopoly just because it was developed in Java. It's also available for other programming languages.

Celerity is a JRuby wrapper around HtmlUnit – a headless Java browser with JavaScript support.

WWW::HtmlUnit is the Perl equivalent, an Inline::Java based wrapper of the HtmlUnit v2.14 library.

Wednesday 30 April 2014

IO::Iron gets command line tools

Now that all the functions of Iron.io's IronMQ, IronCache and IronWorker services are turned into Perl client libraries, it is time to think about the needs of not only the programmer but also the tester and the application supporter. They require easy and quick access to the services: command line tools.

Perl has several possible frameworks for creating command line utilities: e.g. CLI::Framework, App::Cmd, Badger, CLI::Application and CLI::Dispatch. Of these I picked App::Cmd, mostly because of its decentralized nature.

Command Line Tool Design for Continuous Integration


One of the principles of Continuous Integration states that "Everyone commits to the baseline every day". To lighten the programmers' load, every change should be made in as much isolation as possible. Centrally located code which refers to individual parts of the system should always be generated automatically, so that the programmer does not need to remember to keep any central index, reference table, documentation or user reference up to date. Automatic code generation not only ensures that this central "keeping place" is always up to date but also avoids typing errors. In a Continuous Integration repository it also prevents, or at least limits, merge conflicts in shared files.

Equally important is to follow the practice of "Generate User Documentation from Program Code". App::Cmd is a good example of this. Actually, App::Cmd uses Getopt::Long::Descriptive, which is its own small system (or framework) for processing command line options and parameters. The options and parameters are defined in a meta language (in a Perl data structure) and the Getopt::Long::Descriptive package uses this data structure to present the same options to the user when needed, e.g. when the user mistypes a parameter name.

    use Getopt::Long::Descriptive;

    my ($opt, $usage) = describe_options(
        'my-program %o <some-arg>',
        [ 'server|s=s', "the server to connect to", { required => 1 } ],
        [ 'port|p=i',   "the port to connect to",   { default  => 79 } ],
        [],
        [ 'verbose|v',  "print extra stuff" ],
        [ 'help',       "print usage message and exit" ],
    );

This is rendered on the terminal as:

  my-program [-psv] [long options...] <some-arg>
    -s --server     the server to connect to
    -p --port       the port to connect to
    -v --verbose    print extra stuff
    --help          print usage message and exit

Loose Coupling at Runtime on Application Level


However, App::Cmd goes even further. When several distinctly different command functions are combined into one application, they are completely separated into individual files. This is a fine example of loose coupling inside one application. The commands do not know of each other's existence and neither do the programmers need to know of it. In IronMQ's case, the executable ironmq contains individual commands like 'add', 'delete', 'show' and 'list' but their existence is not documented permanently (statically) anywhere. One programmer works on one command and completes his or her work regardless of whether the other programmers have finished with the other commands. When the user executes ironmq, the App::Cmd framework discovers at runtime which commands and parameters are available.

This speeds up application development by making the parts independent of one another. It also speeds up Continuous Integration and shortens the time to deployment.
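
The idea of discovering commands at runtime can be illustrated with core Perl alone. The sketch below does not use App::Cmd and does not reproduce its discovery mechanism; it only shows the principle that the dispatcher holds no static list of commands. The package names Demo::Command::add and Demo::Command::list and the functions discover_commands and run are hypothetical.

```perl
use strict;
use warnings;

# Each command lives in its own package (in a real project, its own file).
package Demo::Command::add;
sub abstract { return 'add a message to a queue' }
sub execute  { my ($class, @args) = @_; return "added: @args" }

package Demo::Command::list;
sub abstract { return 'list queues' }
sub execute  { my ($class, @args) = @_; return 'listing queues' }

package main;

# Find every package below Demo::Command:: which can execute().
# Nothing here names the individual commands.
sub discover_commands {
    no strict 'refs';
    my %commands;
    for my $entry (keys %{'Demo::Command::'}) {
        next unless $entry =~ /\A(\w+)::\z/;
        my $name  = $1;
        my $class = "Demo::Command::$name";
        $commands{$name} = $class if $class->can('execute');
    }
    return %commands;
}

# Dispatch to a command, or report which commands were found.
sub run {
    my ($name, @args) = @_;
    my %commands = discover_commands();
    my $class = $commands{$name}
        or return 'Unknown command. Available: ' . join(', ', sort keys %commands);
    return $class->execute(@args);
}

print run('add', 'hello'), "\n";
print run('frobnicate'), "\n";
```

Adding a third command only requires adding a third package; neither the dispatcher nor the other commands need to change.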

Sunday 30 March 2014

Dist::Zilla as a Continuous Delivery Tool

I recently converted my IO::Iron distribution to use Dist::Zilla as a release and build automation tool. Dist::Zilla is mainly targeted at people writing free software Perl packages for release on CPAN (the Perl free software archive), but if used properly it can make releasing any software easier.

Before


When I started to build the IO::Iron distribution, I already knew of Dist::Zilla but two things kept me from adopting it. Firstly, I considered it too difficult to learn for such a small project (which later grew), and, secondly, I considered it bloated and suffering from featuritis. Instead, I went with the classic solution of using Module::Starter to begin with, and continued by manually editing the Makefile.PL and every other file, including MANIFEST, README and Changes. I used my private Subversion repository. I uploaded to CPAN via the CPAN Author page.

After I had forgotten to update the Changes file a few times, I started to reconsider Dist::Zilla. The more I read about it, e.g. Dave Rolsky's excellent blog entry Walking Through a Real dist.ini, the more it seemed to make sense. About two weeks ago I decided to take the time required, a day or two, and go through the setting up of Dist::Zilla and converting IO::Iron.

After


It was worth the effort. Dist::Zilla does not replace the Makefile.PL which is used when a user installs a distribution. Makefile.PL builds, tests and installs at the user's end. Dist::Zilla, in contrast, prepares the distribution for uploading. It automates almost all the repetitive steps involved in releasing: it determines prerequisites, manages version numbers and the Changes file, checks that the changes have been committed, and, above all, builds the Makefile.PL.

Dist::Zilla streamlines the code-test-commit-release cycle and defines a workflow, thus raising release quality.

Inner and Outer Workings


Dist::Zilla is used via the command line tool dzil. It is very similar to Make in outward appearance. Dist::Zilla itself is actually a framework for defining workflow stages. All functionality is executed by plugins. Building a release is divided into stages or roles similar to what Makefile.PL uses: build, test, install, release, etc. The plugins are attached to separate stages. For example, gathering the distribution files and reading them into memory (from which they will later be written into a new build directory) is a stage, and the equivalent roles are FileGatherer and FileInjector. All required plugins which fill these roles will be executed at this stage, and a plugin can read an existing file from disk, or create a file dynamically.

Creating Distributions


When creating a CPAN distribution, such as IO::Iron, whose "distribution source code" is now located publicly at GitHub, the last action (i.e. plugin) when executing dzil release is normally "UploadToCPAN", but this can be changed by editing the dist.ini file. The CPAN distribution format is also convenient for code releases other than CPAN packages. Instead of uploading, the last action in the chain could be committing code to the repository, or making a direct installation.

Continuous Delivery


Dist::Zilla's modular structure makes it adaptable to new situations, even to different programming languages. It is not limited to Perl, not even to programming. With a different set of plugins it could just as well serve as a blog authoring (automatic spell checking, abbreviation expanding, date/version managing) and uploading tool. It becomes a competitor to e.g. Maven [http://maven.apache.org], which is best known in conjunction with Java (although it is more of a project management tool than software authoring tool).

In the field of continuous delivery and continuous integration, Dist::Zilla relieves programmers of having to remember frequently repeated actions, codifies the workflow and raises the quality of releases, especially when releasing often. The plugins are reusable pieces of code easily shared among developers. This in turn reduces the time and effort of rebuilding the build system when updating old projects or creating new ones.