Saturday, January 5, 2013

Importance of Configuration Management


Paper prepared for the "Riga Technical University 53rd International Scientific Conference dedicated to the 150th anniversary and The 1st Congress of World Engineers and Riga Polytechnical Institute / RTU Alumni"

The paper starts by researching and defining the current requirements of enterprise development for supporting rapid development, along with gaps and possible improvements. It continues with a review of popular tools used for Configuration Management to show that they are insufficient for rapid development, and finishes by proposing the architecture of a new tool as an answer to the shortcomings of the current ones.
Keywords - Continuous delivery, Build system, Deployment pipeline, Release gates

I. Introduction

Since the late Eighties, software development has changed and evolved a lot to improve the development process and its value to business. None of that would have been possible without Configuration Management, and its absence usually leads projects to budget overruns and even failure. Still, the term Configuration Management is quite often misunderstood, and the practice is forgotten during development and under high pressure. Continuous Integration and Continuous Delivery are parts of Configuration Management that still lack proper tools today.
While Configuration Management is essential in product development and some companies spend a lot of effort on it, companies in Latvia and even well-known international companies keep struggling with typical problems because of various myths and a lack of knowledge about it. There are no certifications or official studies in this area, so development teams sometimes look at Configuration Management as some kind of archiving process. Another common misunderstanding is to equate Configuration Management with Continuous Integration while its other essential parts are forgotten.
Configuration Management is a must in any type of project. It is a process that wraps product development from the first written code until the end of the project. There are differences between Waterfall and Agile Configuration Management, but the common understanding and the necessity remain an inherent part of software development. Polls worldwide indicate that Configuration Management is taking a bigger and bigger role in the delivery process in order to improve feedback time on every product change.
Continuous Integration is one of the essentials of Configuration Management and can raise development quality significantly, but only when it is used appropriately and the project team is disciplined. The next generation of Continuous Integration is Continuous Delivery with Deployment Pipelines, which is becoming more and more popular worldwide, but its key points are sometimes misunderstood and effort is spent in the wrong areas. The most important key point of these systems is feedback time, or change lifecycle, which makes them inefficient without continuous support and improvement.
There are various open source and enterprise solutions for starting a Continuous Integration process in a company, but none of them fulfills all basic requirements of Configuration Management, or at least not in an easy way. Current practices in Continuous Delivery recommend setting up and integrating two or more systems to support it, which makes the change lifecycle hard to follow, requires additional maintenance effort and as a result becomes inefficient for rapid development.
It is clear that with evolving Agile development and growing business needs, software must be developed more rapidly, so improvements in Configuration Management tools are required as well. A new solution is required that focuses on the Configuration Management process instead of the build system, artefact management and reporting. Such a solution should ease the implementation and support of Configuration Management in development processes through generic process control and management. It should focus on monitoring and measuring the lifecycle of product changes in order to discover bottlenecks in development and build processes as soon as possible. To keep hardware from limiting improvements to build processes, the solution should support scalable environments and offer a virtual build environment that splits build tasks across build machines in the background.
The aim of this article is to raise awareness of the problem that Configuration Management is misunderstood or absent in software development companies. In addition, the author proposes an architecture for new software that focuses on the Configuration Management process and its metrics instead of artefact builds and reporting.

II. Requirements for rapid development

Configuration Management is quite often misunderstood and judged to be a waste of effort. On the one hand, that could be true for small teams of one to five developers; however, it should be understood as a working methodology and followed with discipline. If there is any chance that the project will last longer than a few iterations and that the development team may grow, it is cheaper to implement Configuration Management principles at the start of the project than to force them in later and try to adapt existing processes to them.
One of the biggest mistakes teams make is thinking that having a Continuous Integration server in the project is the answer to every problem and is enough. A Continuous Integration server is just a tool that helps to manage the Continuous Integration and Delivery process, so there must be a dedicated configuration manager role and a common understanding of why and how the tool should be used. The keyword here is discipline: some effort is spent on following Configuration Management processes instead of spending more effort later because of insufficient quality, unexpected runtime problems in production or delays in product delivery.
Configuration Management practices can be enforced in projects in several ways, though that can't guarantee success in development, because enforcement always demotivates people and usually results in developers searching for workarounds to avoid the enforced rules. Development teams should instead make a commitment to use Continuous Integration tools and follow Configuration Management principles.
Since the first popular Continuous Integration tools such as CruiseControl, which focused on building artefacts and reporting, Continuous Integration and Configuration Management have existed side by side, and the two terms have become confused, although Continuous Integration is actually a part of Configuration Management and is never enough on its own. New tools have been developed since then, though for some reason all of them still focus on the same goals: produce deployable artefacts, run some analysis and publish reports. Teams that cared about Configuration Management as a whole started to write various plug-ins and customizations for these systems, which makes the whole configuration over-complicated and sometimes loses sight of what was originally planned. To fulfill other rapid development needs, some companies started to develop their own tools that fill the gaps of existing Continuous Integration tools by integrating with them. Such a solution covers the problems overall and may work in most cases, but it results in bigger maintenance effort and higher costs, requires skilled staff and raises the risk that communication between those tools goes wrong.
Configuration Management includes several key factors for successful development, so dedicated tooling could help a lot. It can be split into the following areas:
·         Version Control System - all software source code should be stored in a central repository and versioned so that any source code change can be traced back to any point in time;
·         Build architecture - many software platforms require a transformation from source code to deployable artefacts, so there is a specific architecture for how this transformation is done. This architecture is a build scenario that should be automated, and is called Build Scripts;
·         Build System - Build Scripts are only the procedures by which software is turned into a deployable artefact, so these procedures must be managed by a system called the Build System. It is used to run Build Scripts and prepare packages. Preparation usually includes downloading sources, compilation, packaging, configuration transformation and delivery to the Repository Management System;
·         Repository Management System - output artefacts made by the Build System and by developers should be stored in a repository so that they can be used later to deliver project value. Such systems may also be used for caching third-party artefacts to minimize the company's network load and improve performance;
·         Versioning management - every artefact may have different versions and types and may change a lot over time, so naming rules should be applied to store and identify the needed artefact in storage;
·         Release management - every artefact may pass through more than one state of readiness on its way to becoming a production artefact, and these states are changed by sign-offs from different stakeholders in the company [13];
·         Codeline management - in an enterprise, when the main codeline under continuous development is ready for release, good practice is to branch it into a release or production codeline so that it can be patched at any time with bug fixes only;
·         Delivery architecture - any artefact that represents software needs to be delivered in some way, so it has a specific delivery architecture. This architecture is a deployment scenario that should be automated, and is called Deployment Scripts;
·         Delivery system - delivering software follows a process or scenario, so a Delivery System is required to run it;
·         Software testing - software can be tested in different ways, but all of them can be grouped into human and automated tests. Automated tests are generally software that tests software, so they have their own source code, scenarios, artefacts and execution methods, all of which must be managed by scripts and a system;
·         Software analysis - over the years a lot of effort has been spent on software that analyzes other software and its source code for good and bad practices, programming rules and so on, as expertise that helps to avoid common mistakes, keep code clean and generally bring value to the development process. Various tools exist, but some scripting and a Build System are required to bind them all together and present the results to the user;
·         Development history - to measure development effort and changes over time from the source code perspective, a dedicated history system is required. It may contain deployment, testing and analysis history, among others;
·         Ease of use - every team member is involved in Configuration Management activities, so the tool should be generic, simple to configure, easy to understand and usable from any workstation.
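The versioning and release management areas above can be illustrated with a short sketch. It is only an illustration in Python under stated assumptions: the state names, the stakeholder roles and the naming rule `name-version-bNNN.type` are hypothetical and are not taken from any existing tool.

```python
from dataclasses import dataclass, field

# Hypothetical readiness states, ordered from least to most ready.
STATES = ["built", "tested", "accepted", "production"]

# Hypothetical mapping: the stakeholder role whose sign-off unlocks each state.
REQUIRED_SIGNOFF = {"tested": "qa", "accepted": "business", "production": "operations"}


@dataclass
class Artefact:
    name: str
    version: str          # for example "1.4.2"
    build_number: int
    kind: str = "jar"
    state: str = "built"
    signoffs: list = field(default_factory=list)

    def file_name(self) -> str:
        # Assumed naming rule: name-version-bNNN.kind
        return f"{self.name}-{self.version}-b{self.build_number:03d}.{self.kind}"

    def sign_off(self, role: str) -> None:
        """Promote the artefact to its next state if `role` is the required one."""
        idx = STATES.index(self.state)
        if idx == len(STATES) - 1:
            raise ValueError("artefact is already in its final state")
        next_state = STATES[idx + 1]
        if REQUIRED_SIGNOFF[next_state] != role:
            raise PermissionError(f"{role!r} cannot promote to {next_state!r}")
        self.signoffs.append(role)
        self.state = next_state
```

For example, `Artefact("billing", "1.4.2", 17).file_name()` gives `billing-1.4.2-b017.jar`, and calling `sign_off("qa")` on that artefact moves it from `built` to `tested`. A sign-off by the wrong role is rejected, so an artefact cannot skip a readiness state.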
Those were the general parts of Configuration Management that support development needs from the source code perspective. From the management perspective, issue management and tracking is sometimes added, as are effort reporting systems and even storage for binary documents. The author doesn't recommend binding these together, because their goals differ and they may bring large datasets with them. Their goal is financial, which doesn't bring any value to the development process as a whole [6].
There are different methods for measuring development activities, and new ones are researched every day, but the author focuses on only one of them: change lifecycle. Concretely, it is the time from the moment changed source code is submitted to the Version Control System until the moment it is approved by all teams as ready for production. This lifecycle includes packaging, testing, analysis and the effort of all development teams to approve readiness. Configuration Management may play different roles in software development companies, but measuring and minimizing the duration of this lifecycle should be the essence of successful Configuration Management, because change lifecycle is the only metric that measures all Configuration Management systems, their integration with each other and even all software development teams.
To minimize change lifecycle duration, bottlenecks should first be identified across the whole change lifecycle process from start to end. Such bottlenecks may be found in systems or in processes, so the company can decide which optimization would bring the bigger benefit.
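Change lifecycle measurement and bottleneck discovery can be sketched as follows. This is a minimal Python illustration, not the author's implementation; the stage names and the event format (a list of stage name and completion time pairs, starting with the commit) are assumptions made for the example.

```python
from datetime import datetime, timedelta


def stage_durations(events):
    """Map each stage to the time elapsed since the previous event.

    `events` is an ordered list of (stage_name, finished_at) pairs;
    the first entry is the commit to the Version Control System.
    """
    durations = {}
    for (_, previous), (stage, finished) in zip(events, events[1:]):
        durations[stage] = finished - previous
    return durations


def lifecycle_duration(events):
    """Time from commit until the last approval."""
    return events[-1][1] - events[0][1]


def bottleneck(events):
    """Name of the stage that took the longest."""
    durations = stage_durations(events)
    if not durations:
        raise ValueError("at least two events are needed")
    return max(durations, key=durations.get)


def example_events():
    # Hypothetical timeline of a single change.
    t0 = datetime(2012, 9, 24, 9, 0)
    return [
        ("commit", t0),
        ("build", t0 + timedelta(minutes=12)),
        ("automated tests", t0 + timedelta(minutes=57)),
        ("analysis", t0 + timedelta(minutes=65)),
        ("sign-off", t0 + timedelta(hours=5)),
    ]
```

In the example timeline the whole lifecycle takes five hours and the longest stage is the sign-off, which would point to a bottleneck in the process rather than in the systems.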
Sometimes even a lack of well-understood versioning management may be the cause of poor artefact manageability in a project that consists of several modules, so that all teams waste effort on identifying the correct artefacts for a production-ready deliverable.
Popular metrics of development performance for business are code coverage and lines of code. In many ways those are good metrics, and they can be adjusted for business needs positively or negatively. The problem is that these metrics are for business. There are various ways to game them, and the type of project may be the reason why a well-tested project scores lower than another with bad quality or poor performance. When thresholds for such metrics are forced on development, there are even ways to write code that raises the results without improving the software. Just as refactoring may affect the lines of code metric, unfixed tests may give wrong code coverage results.
A typical example of Configuration Management failure is long-running Build System activities such as builds or automated tests: development teams are no longer interested in the outputs because they have already started work on another change. Ignoring this problem over time can even result in development teams rejecting Configuration Management entirely and in chaos in the company, which sooner or later leads to project failure. Configuration Management can't work without discipline and a common understanding of its goals among development teams, though having these is never enough on its own.

III. Configuration management tools

Tools to ease Configuration Management tasks were invented years ago and are still in active development. Since Agile methodology became so popular, the need for such tools has risen as well. There is not much literature on best practices for Configuration Management. That could be the reason why it has become so fragmented internally, with builds and build scripts sometimes being the responsibility of a different team than release management or software configuration values and deployment activities. The first tools for Configuration Management were designed for specific problems, such as a separate tool for managing builds or for presenting build results. Release management is often implemented with asynchronous documentation updates, and software configuration values are added after the fact.
Many comparisons of Configuration Management tools have been made over time, and it would not make sense to add one more in this paper. The issue with them is that they generally focus on feature lists and on specific functionality that some tools have and others don't. If one tool gets new functionality, it is added to the comparison matrix, and developers of the other tools focus on filling this gap so as not to fall behind. Evolving this way, Configuration Management tools are losing their focus more and more and becoming more complex than they should be.
One more problem is that most tools are designed for a specific development platform such as Java, .Net or PHP.
Worth mentioning is the absence of a Codeline management feature in existing tools. Some tools offer cloning of the main codeline configuration into a branched one, but without any way to integrate later configuration changes, which means such changes must be applied by hand. Some tools offer task libraries so that codelines may share the same tasks, though that may break the configuration of a released codeline if the main codeline evolves.
Version Control Systems are specifically designed for source code storage, and it is recommended to keep them separate from other Configuration Management tools, so they are not included in the assessment requirements.
The comparison of Configuration Management tools is done by elimination, criterion by criterion, to find a tool that fulfills all Configuration Management requirements. The tools closest to the author's goal are Continuous Integration systems. The most popular ones are assessed in the following sections.

A.              CruiseControl

CruiseControl by ThoughtWorks is an open source tool first released in 1996 and adapted to the XP methodology in 2000 together with the term Continuous Integration. It has evolved a lot and still has contributors who implement new features when required. CruiseControl is designed for the Java platform; CruiseControl.Net and CruiseControl.rb were designed separately for other platforms, so it is not recommended for cross-platform Configuration Management.
It has no staged builds whatsoever, and distributed builds functionality was released only in the last several years, when it had already lost its place on the market. [7]

B.              Jenkins

Jenkins is an open source tool used widely both in small projects and in enterprise. Initially it was named Hudson and was first released around 2007 as an alternative to CruiseControl. Since 2011 Hudson has been owned by Oracle, and a community of around 100 members uses the name Jenkins instead. It is a cross-platform Continuous Integration tool with hundreds of custom plug-ins that extend its functionality for any project needs. As an open source project it is currently the most popular tool on the market and is still in active development. Because of its plug-ins it theoretically fits any requirement, and missing ones can be written from scratch. The problem with Jenkins is that adding more and more plug-ins makes the configuration very complex and the user's dashboard hard to understand. It is possible to change the dashboard with other plug-ins, which makes it even more complex in the end. As a result, plug-ins may start to conflict with each other.
Despite announcements that Jenkins supports staged builds and deployments, it does not. The plug-ins that make this possible are hard for a non-technical user to understand and are not enough for business needs. As an alternative there is a good additional tool named DeployIt by XebiaLabs. Adding another tool, however, costs extra maintenance effort and brings more risk of misconfiguration.
Even ignoring all configuration issues and complex administration, Jenkins is still not enough, because it has no codeline branching strategy by design, and having more than ten projects with sub-projects configured makes the standard dashboard slow and hard to read. [8]

C.              TeamCity

TeamCity by JetBrains was first released in October 2006 as an alternative to CruiseControl, with great features for monitoring build activities, agent management and a lot more. It is a cross-platform Continuous Integration tool used in leading IT development companies. TeamCity has always been commercial, so it became popular mostly in enterprise. It has various plug-ins for integration with other systems and tools. Over the last few years JetBrains has worked to make it as simple as possible for the development team, which means it is not so easy for business to understand.
TeamCity has no staged builds, which means software deployments to different servers are done manually or with specially configured jobs. Chained builds can be used instead, but they are hard to track, and such a configuration increases maintenance effort exponentially and makes the dashboard hard to read. As JetBrains states, this is not the focus of their Continuous Integration system. [9]

D.              Continuum

In 2005 Continuum was a promising project because of its simplicity and the great Apache community. To keep it that way, it was decided to support only the Apache build systems Maven and Ant, so its evolution stopped and it lacks a lot of functionality compared to other tools. Still, the tool could be the right choice for small Java projects on Maven. [10]

E.              Anthill PRO

Anthill PRO by Urbancode is the author's choice of Continuous Integration server and Configuration Management tool, for many reasons that are outside the scope of this paper. It still lacks some functionality by design and is commercial.
Despite its extensibility through BeanShell scripts and plug-in support, it does not cover all enterprise requirements. Its dashboard is not customizable, and it is hard for users to track down the actual problem when a process breaks.
Anthill PRO has no Codeline management functionality, so it is not possible to protect a released codeline from unwanted changes. It is possible to organize all processes as separate functionality and to clone the configuration together with source code branching, but this brings the risk of human error and increases maintenance effort exponentially.
Its artefact storage and database backend are not well designed under the hood, which together leads to performance problems and unexpected problems with choosing the right file system. The support team focuses on business value and is not interested in problems that are not visible at the moment.
Anthill PRO is the only system that has implemented Release management by design. The implementation is hard for users to understand, though, and any logic for it needs to be scripted.
The system includes its own Repository Management System, but it can be used only within the system itself and with special tooling provided by Urbancode. The tooling can be used in Eclipse and is evolving, but it is still unstable and not suitable for enterprise. A third-party Repository Management System is still required.
The system is designed for enterprise needs and can serve hundreds of agents, but for some reason connectivity to them may be unstable, and there are no measurement tools whatsoever. When system resources need to be extended, this is done after the fact, because problems are discovered only when they occur. [11]

F.              Bamboo

Bamboo by Atlassian is a powerful Continuous Integration tool that covers many requirements of Configuration Management, such as Release Management and Codeline Management. It is easy to use and configure and has reliable integration with important enterprise tools like JIRA and Fisheye, because those are Atlassian products as well. It is a commercial tool, though it is worth considering buying the whole Atlassian platform.
Still, Bamboo was designed for Atlassian's internal needs, so it is not designed as a cross-platform tool. Any non-Java project requires additional scripting, and extending the tool's functionality makes things more complicated than they should be.
Bamboo is designed to work with predefined build pipelines, which means that once a pipeline has started it is not possible to change build results without changing other builds, so there are no staged builds. It includes its own Deployment System, but it has limited functionality and can be used only as a step of a predefined pipeline.
There is no library of steps or jobs, so only the default ones are available, and everything custom is branched with every codeline, which means that fixing a bug in one configuration requires fixing it by hand in other codelines and projects. [12]

G.              TFS

Team Foundation Server, or TFS, was developed by Microsoft as a development platform for any project needs. The first release was in 2005; its focus was on team collaboration, and it had a new Version Control System and a scheduled jobs feature. The 2008 release added Continuous Integration functionality named TFS Build with distributed agents, but a lot of functionality was still missing. TFS 2010 was a total redesign of the build system, which made it more configurable, extensible and powerful for scaling the build environment.
It integrates directly with other Microsoft tools and technologies, which means it is designed for the .Net platform. TFS is commercial, like other Microsoft products, and it doesn't make sense to use it as a standalone tool. It can't be configured without Visual Studio, and customizations aren't recommended by Microsoft.
The latest release, TFS 2012, focuses on deployment pipelines and agility improvements for development teams, but it still can't be called a cross-platform tool and can be installed only on the Windows operating system.

IV. Architecture of new tool

As identified above, there are various useful tools on the market that are still in active development and evolving in different ways. Their common problem is an architectural design that is inadequate for rapid development needs. Adding the missing features to current systems may require a major redesign and wouldn't be valuable for the authors of existing tools, because clients don't request such a rework. The answer is to develop a new tool with a different focus from the existing ones and not to advertise it as yet another Continuous Integration system. It is a Configuration Management tool with a Continuous Integration feature.

A.              Non-functional requirements

·         internet browser support - the application is used through an internet browser that supports the drag and drop feature;
·         intelligent usability - interaction with elements is done by drag and drop where possible; elements related to an object are shown as dropdowns of icons depending on mouse location;
·         dynamic screen elements - where possible, all elements are loaded asynchronously when required;
·         measurement tools - any process can be measured, and thresholds can be set on the results of history analysis to warn about changes;
·         cross-platform runtime - the tool can run on Windows and Linux operating systems and build .Net, C/C++ and Java artefacts;
·         generic pipeline scenarios and management - pipeline elements like build steps or staging levels can be organized in customizable subgroups and symlinks [1];
·         multi dashboards - different types of configurable dashboards are available, such as for smartphones, standard workstations and monitoring screens.
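The measurement tools requirement above could work along these lines. The sketch below is a Python illustration under assumptions: the rule (warn when the latest duration exceeds the historical mean by a configurable factor) and the function name are hypothetical choices, and a real tool might use a different statistic.

```python
from statistics import mean


def exceeds_threshold(history, latest, factor=1.5):
    """Return True when `latest` is more than `factor` times the historical mean.

    `history` holds earlier measurements of the same process, for example
    build durations in seconds. With no history there is nothing to compare
    against, so no warning is raised.
    """
    if not history:
        return False
    return latest > factor * mean(history)
```

With earlier build durations of 100, 110 and 90 seconds, a new build of 200 seconds would trigger a warning at the default factor, while a build of 140 seconds would not.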

B.              Functional requirements

·         LDAP groups support - LDAP groups are automatically scanned for changes and can be mapped to the system's custom groups or used directly to assign permissions to users;
·         virtual build environment - hardware is connected into a virtual environment so that it is scalable and easy to manage;
·         temporary user builds - users are able to test changed code by running builds on the server before committing the changes to the Version Control System;
·         in-memory builds - Random Access Memory is used as a virtual file system to make IO operations faster than hard drive reads and writes;
·         staging support - any server object can be staged for Release Management, with options to manage rules and accessibility;
·         history database - a database with blob support is used to store process outputs such as artefacts and build logs;
·         versioning support - a set of rules is available to configure the versioning of processes and artefacts;
·         release repository interface - an Ibiblio repository interface is available for build systems like Maven;
·         post build configurator - build outputs are scanned and can be used to create the default set of required artefacts instead of defining them before the build is executed;
·         outputs comparator - any pipeline results can be compared, including execution time, build outputs, used resources etc.;
·         command line interactions - full command line support is available to run any process;
·         branch feature - any pipeline can be branched together with the source code by cloning some or all of its elements and adjusting their configuration automatically.
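The branch feature in the last item can be sketched as a cloning operation. This Python illustration is an assumption about how such a feature might behave, not a description of an existing tool: the pipeline is modelled as a plain dictionary, and "adjusting configuration automatically" is simplified to replacing the old codeline name inside string settings.

```python
import copy


def branch_pipeline(pipeline, new_codeline, elements=None):
    """Return an independent copy of `pipeline` bound to `new_codeline`.

    `elements` optionally names the elements to clone; None clones all.
    The original pipeline is left untouched.
    """
    old_codeline = pipeline["codeline"]
    branched = copy.deepcopy(pipeline)
    branched["codeline"] = new_codeline
    if elements is not None:
        wanted = set(elements)
        branched["elements"] = [
            e for e in branched["elements"] if e["name"] in wanted
        ]
    for element in branched["elements"]:
        # Naive adjustment: rewrite the codeline name in string settings only.
        element["settings"] = {
            key: value.replace(old_codeline, new_codeline)
            if isinstance(value, str) else value
            for key, value in element["settings"].items()
        }
    return branched
```

Because the result is a deep copy, later changes to the main codeline configuration do not affect the released codeline, which is the protection that the assessed tools were found to lack.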

C.              Deployment diagram

The following deployment diagram consists of a server for system management, three logical databases for different purposes and an unlimited number of virtual environments, each consisting of an unlimited number of hosts that can be scaled up or down online without interrupting system availability. Three types of adjusted dashboards can be used to interact with the system.
The diagram omits external systems such as the Version Control System or a third-party Repository Management System, which are outside the scope of this paper. Application servers are omitted for the same reason.
It should be understood that a host computer in the diagram can be any user workstation or server that has the communications agent installed, so an enterprise can use a significant amount of its resources more cost-effectively.
Logical databases can be configured on one server or on separate servers, so that appropriate hardware is used for every database, because their purposes and usage may differ significantly.
The tool supports different ways of notification. Showing results on a dashboard is often not enough, because the dashboard would then have to be monitored all the time. Depending on the configured rules, the tool can send e-mail notifications to a list of users or to users determined from the Version Control System. There are cases when e-mails are not enough and the feedback loop is still poor, because the user needs to open and read the e-mail to get the message. The tool therefore also supports sending private messages to popular messengers like Skype [3], Google Chat [4] and Windows Messenger [5].
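How notification rules might select recipients can be sketched in a few lines. This is a hypothetical Python illustration: the rule fields (`on_status`, `users`, `include_committers`) and the build record format are assumptions introduced for the example, and actual delivery by e-mail or messenger is left out.

```python
def recipients(rule, build):
    """Resolve who should be notified about a finished build.

    A rule applies only to the build statuses it lists. Recipients are the
    configured users plus, optionally, the authors of the changes that the
    build picked up from the Version Control System.
    """
    if build["status"] not in rule["on_status"]:
        return set()
    users = set(rule.get("users", []))
    if rule.get("include_committers"):
        users |= {change["author"] for change in build["changes"]}
    return users
```

A rule such as `{"on_status": ["failed"], "users": ["release.manager"], "include_committers": True}` would notify the release manager and every committer of a failed build, and nobody when the build succeeds.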

Fig. 1. Deployment diagram presents general architecture of system components and user interaction with system through internet browser.

D.              Prototype

User interaction with the system must hit two usability targets that usually conflict with each other: system outputs should be easy to understand even for non-technical people, so only a minimal set of information should be shown; and the system should be generic enough to configure any pipeline, so the process must be understandable in sufficient detail, with thorough error handling and rich logging.
This behavior is achieved by implementing expert levels for users and user groups and letting users change their own level. Every pipeline element is designed to be shown on the dashboard depending on the user's expert level, and this can be adjusted by the system administrator. Since all elements are loaded dynamically when required, the dashboard is fast and retrieves only important information.
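Filtering by expert level can be sketched as follows. The level names and the `min_level` field are assumptions made for this Python illustration; the paper does not define how many levels exist or how they are named.

```python
# Hypothetical expert levels, ordered from least to most detail.
LEVELS = {"basic": 0, "advanced": 1, "expert": 2}


def visible_elements(elements, user_level):
    """Names of the pipeline elements a user of `user_level` should see.

    Each element carries the minimum level required to display it, which a
    system administrator could adjust.
    """
    rank = LEVELS[user_level]
    return [e["name"] for e in elements if LEVELS[e["min_level"]] <= rank]
```

A basic user would then see only elements marked `basic`, such as an overall status, while an expert would also see elements such as compiler output or agent details.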
For users with the system administrator role, every pipeline, element and dashboard has a button to switch to the configuration dashboard, so it can be adjusted with minimal user interaction. All configuration changes are versioned in the Configuration database, so it is possible to roll back the configuration with a few clicks or compare it with previous changes.
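Versioned configuration with rollback and comparison can be sketched with an in-memory history. This Python illustration is an assumption about one possible design: the class name is hypothetical, settings are assumed to be flat key and value pairs, and a real tool would persist the versions in the Configuration database.

```python
class ConfigHistory:
    """Keeps every version of a flat configuration dictionary."""

    def __init__(self, initial):
        self._versions = [dict(initial)]

    @property
    def current(self):
        return dict(self._versions[-1])

    def change(self, **updates):
        """Store a new version with `updates` applied; return its number."""
        new = dict(self._versions[-1])
        new.update(updates)
        self._versions.append(new)
        return len(self._versions) - 1

    def rollback(self, version):
        """Restore an old version by recording it as the newest one,
        so the history itself is never rewritten."""
        self._versions.append(dict(self._versions[version]))
        return len(self._versions) - 1

    def diff(self, old, new):
        """Map each changed key to its (old value, new value) pair."""
        a, b = self._versions[old], self._versions[new]
        return {
            key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)
        }
```

Recording a rollback as a new version is a deliberate choice in this sketch: it keeps the history complete, so the rollback itself can be compared and undone like any other change.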
For advanced users who want to see the actual progress of executing processes and discover failures as soon as possible, the bottom of the dashboard has a live output console with an error auto-focus feature.
The following prototypes are overall designs of a possible implementation for basic users, such as business, and the same screen extended for developer needs. The last screen is an example of a pipeline element configuration screen. All screen elements are draggable where it makes sense, and when there is more than one possible target action on drop, the system offers a choice.
The dashboard for smartphones has push [2] notifications so the system can be monitored without delay even out of the office.
The monitoring dashboard, for example on a TV screen, reloads itself to always be up to date and notify people in the room immediately.

Fig. 2. Prototype of the first dashboard screen for two projects in development, one of which has some problems that should be analyzed. This dashboard is for a basic user.

Fig. 3. Prototype of the second-level dashboard screen after opening an item from the previous screen. In this situation there are two codelines, one of which has some problems.

Fig. 4. Prototype of the third-level screen where elements are bound together and executed as a pipeline. The element "Unit tests" is failing, which is usually enough for business to point to the responsible group without getting confused by technical details. The prototype also has an example of a Suggestion box saying that "Project 2" is missing a sign-off to continue and that the current user should pay attention to it immediately.

Fig. 5. Prototype of the dashboard screen adjusted for expert level, with more technical details that help a developer investigate the failure as fast as possible and proceed with a fix.

Fig. 6. Prototype of the dashboard for pipeline configuration. Every element is draggable where it makes sense and has an additional configuration menu when right-clicked.

E.              Conclusion

The provided dashboard prototypes and requirements are general guidelines for developing the missing tool. Other requirements should be defined during development by analyzing feedback from the tool's users. New requirements may still arise once the tool is in use, but at the same time the focus should be continuously controlled to avoid developing one more Continuous Integration tool that can be compared to the existing ones.

References

[1]     Wikipedia: Symbolic link, Online, http://en.wikipedia.org/wiki/Symbolic_link. [Accessed September 24, 2012].
[3]     Skype: Instant Messaging, Online, http://www.skype.com/intl/en/features/allfeatures/instant-messaging/ [Accessed September 25, 2012].
[4]     Google: Google Chat, Online, http://www.google.com/talk/ [Accessed September 25, 2012].
[5]     Microsoft: Windows Messenger, Online, http://windows.microsoft.com/en-US/messenger/home [Accessed September 25, 2012].
[6]     Jez Humble and David Farley, Continuous Delivery (Martin Fowler Signature Series), Safari Books Online, 2011, http://my.safaribooksonline.com/9780321670250/ch01#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODAzMjE2NzAyNTAlMkZjaDAxbGV2MXNlYzI= [Accessed August 24, 2012].
[7]     Wikipedia: CruiseControl, Online, http://en.wikipedia.org/wiki/CruiseControl [Accessed September 10, 2012].
[8]     Wikipedia: Jenkins, Online, http://en.wikipedia.org/wiki/Jenkins_(software) [Accessed September 10, 2012].
[9]     Jetbrains: TeamCity, Online, http://www.jetbrains.com/teamcity/features/index.html [Accessed September 11, 2012].
[10]  Apache: Continuum, Online, http://continuum.apache.org/features.html [Accessed September 11, 2012].
[11]  Urbancode: Anthill PRO, Online, http://www.urbancode.com/html/products/anthillpro/ [Accessed September 12, 2012].
[12]  Atlassian: Bamboo, Online, http://www.atlassian.com/software/bamboo/overview [Accessed September 12, 2012].
[13]  Wikipedia: Microsoft, Team Foundation Server, Online, http://en.wikipedia.org/wiki/Team_Foundation_Server [Accessed September 13, 2012].
[14]  Jez Humble and David Farley, Continuous Delivery (Martin Fowler Signature Series), Safari Books Online, 2011, http://my.safaribooksonline.com/9780321670250/ch01#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODAzMjE2NzAyNTAlMkZjaDE1bGV2MXNlYzI= [Accessed September 20, 2012].

