
Continuous Integration Simplified

  " continuous integration (CI) implements continuous processes of applying quality control — small pieces of effort, applied frequently".   - Wikipedia

Is continuous integration just another buzzword doing the rounds in the software industry, or does it merit serious consideration? This post is my attempt to answer that question based on my experience.

Short answer: it is not a fad. If implemented with the right focus and intent, continuous integration can help organizations realize immense value, both in the time required to release software and in the quality of what is released.

Reality check: I have seen numerous instances where continuous integration becomes an exercise in itself. Discussions about the process and how to implement it drag on for months, if not years, and teams still struggle to realize any value from it.

Every system or process tries to address a core problem. In my view, the core problem CI addresses is minimizing uncertainty in a software release cycle. One of the biggest challenges in a typical software release is ensuring smooth integration of the code committed by different developers. The problem becomes more acute as the team grows and spreads across geographic locations.

An example from my own experience: at one stage, I was managing a software release with developers checking in code from both Bangalore and Sunnyvale. Maintaining the integrity of the code base was a nightmare. Build breakages were routine, and their severity was amplified because a fix was often about a day away due to timezone differences.

The biggest value I see in CI is that it addresses exactly these kinds of issues and builds a sense of predictability into the release process. In the sections below, I will expand on some core principles of a good CI process (many thanks to Martin Fowler, who has elaborated on these in detail). What I would like to provide is the core set of principles that has helped me manage releases. Some of this will look very obvious, but believe me, I have repeatedly been surprised at how consistently the obvious gets missed.

CI 101:

- Have a version repository: This is the first brick to be laid for a good CI process. All developers check code into the repository and build from it. This repository is the only source of truth for all code related to the product being released.

- Keep the repository simple: A consistent issue I see with many software developers and managers is the tendency to exercise every feature a tool provides, and version control tools are no exception. My advice is to keep it simple: minimize your branching strategy and always work on the main trunk, creating a branch or tag only for past releases so you can deliver hotfixes and patches.
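
As a minimal sketch of this trunk-plus-release-branch layout, assuming git as the version control tool (the repository path, file, and branch names are all illustrative):

```shell
#!/bin/sh
# Sketch: everyday work on the main trunk, a branch cut only at
# release time. Assumes git; all names here are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "ci@example.com"
git config user.name "CI Example"

# Everyday work lands on the main trunk.
echo "v1 feature set" > app.txt
git add app.txt
git commit -qm "release 1.0 content"
git tag v1.0

# Branch only at release time, so hotfixes can target exactly
# what shipped while the trunk keeps moving forward.
git branch release-1.0 v1.0
git branch --list
```

The point of the sketch is the shape, not the tool: one trunk everyone commits to, and a branch that exists only because a release shipped from it.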

- Check in everything you need for the release: Make the repository the only source of truth for building your product. Check in everything: third-party libraries, database schema creation scripts, binaries, and so on. If your product uses something that did not come from the version repository, you are missing something big.

- Automate the build: This is important and goes hand in hand with the previous point. Automating the build means your product is built in its entirety, to its final shippable or deployable state, through an automatic build process. There should never be a need for anyone to intervene manually during any part of the build. The automation should cover compilation, building of binaries and dependent libraries or packages, installation scripts, database creation and schema generation scripts, documentation, and so on.
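
A sketch of what such a one-command build might look like. The directory layout (src/, schema/, dist/) and artifact names are illustrative, and the compile step is stubbed out with plain file copies:

```shell
#!/bin/sh
# build.sh - sketch of a single-command, no-intervention build.
# Layout and names are illustrative; the compile step is stubbed.
set -e                               # any failing step fails the build
VERSION=${1:-0.0.1}

# Stub source tree standing in for a real checkout.
mkdir -p src schema
echo "application code"          > src/app.txt
echo "CREATE TABLE t (id INT);"  > schema/create.sql

rm -rf dist && mkdir -p dist/stage

# 1. "Compile" the product and gather everything needed to ship:
#    binaries, schema scripts, installers, documentation.
cp -R src schema dist/stage/

# 2. Produce the final deployable artifact in one step.
tar -czf "dist/product-$VERSION.tar.gz" -C dist/stage .
echo "built dist/product-$VERSION.tar.gz"
```

The `set -e` at the top is the important part: any step that fails stops the build immediately, so a "green" build means every step actually ran.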

- Have a dedicated build setup: This is a setup used only to make builds. It is a read-only environment where everything is extracted from version control and built; no one uses it to check anything in. It can be used to regularly check out from the repository and create the final product. Control access to this setup.

- Have a dedicated deployment setup: This is where the freshly created build package is deployed, and it is used to test the sanity of the generated build. It should be a restricted environment, and no ad-hoc fixes should ever be applied on it to resolve an issue; its sole purpose is to test the integrity of the build. If the deployment has multiple options (for example, a fresh install and an upgrade over the previous version), it is preferable to have a separate setup to exercise each option. (Handling and sanitizing upgrades is easier said than done, and probably warrants a separate post.)
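
A deployment script for such a setup might distinguish the fresh-install path from the upgrade path along these lines. This is a sketch only; the target path and files are illustrative, and the artifact unpack is stubbed with a version marker:

```shell
#!/bin/sh
# deploy.sh [fresh|upgrade] - sketch of deploying a build artifact
# to the dedicated setup. Target path and files are illustrative.
set -e
MODE=${1:-fresh}
TARGET=./deploy-target               # stand-in for the dedicated machine

if [ "$MODE" = "fresh" ]; then
    rm -rf "$TARGET"                 # fresh install starts from nothing
fi
mkdir -p "$TARGET/app" "$TARGET/data"

# Unpack the build artifact (stubbed with a version marker here).
# An upgrade must leave the existing data directory untouched.
date +release-%Y%m%d > "$TARGET/app/VERSION"
echo "deployed in $MODE mode"
```

Running the two modes against separate targets is what lets you exercise both the fresh install and the upgrade on every build.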

- Have a core set of tests to validate the build: This is where the debate starts and things become cloudy; terms like unit tests, functional tests, and automated test cases get thrown around. In my view these are all details that help improve the CI framework, but the basic prerequisite is a set of core test cases that exercises every feature in the quickest possible time and serves as the sanity test for the build. If it is automated, great; but if you are just beginning, a set of manual test cases to check build sanity is a welcome first step.
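
The shape of such a sanity suite can be sketched as below. Every check here is a stand-in (`true`); in a real suite, each would exercise one feature as quickly as possible, with the example commands in the comments being purely hypothetical:

```shell
#!/bin/sh
# smoke.sh - sketch of a core sanity gate for a fresh deployment.
# Each check is a stand-in; real checks exercise one feature each.
fail=0
check() {
    desc=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $desc"
    else
        echo "FAIL: $desc"
        fail=1
    fi
}

check "artifact unpacks"  true       # e.g. tar -tzf product.tar.gz
check "app starts"        true       # e.g. ./app --version
check "schema loads"      true       # e.g. run schema/create.sql
echo "failed checks: $fail"
```

Whether each check is a manual step on a printed list or a command in a script matters less at first than having the list at all.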

- Build and test regularly: The more frequently, the better, but as a rule of thumb there should be a build at least once every day, and that build should be deployed and tested for sanity. This ensures a maximum latency of about a day before you identify a potential issue with the build.
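
On a Unix build machine, the daily build can be scheduled with cron; a sketch of a crontab entry, with the script and log paths purely illustrative:

```
# min hour dom mon dow  command
# Build, deploy, and smoke-test every night at 02:00.
0 2 * * *  /opt/ci/nightly.sh >> /var/log/ci/nightly.log 2>&1
```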

- Focus on automation: Last, but the most important of them all: automate every stage. Notice that we already covered automating the build process early in the cycle; that is an essential must-have without which you cannot even move forward. But the automation I am talking about now is what will really differentiate your release cycle from the rest of the crowd. Here is the list of things to automate, preferably in this order:

a) Deployment - The first thing to automate is deployment. This ensures you have a product ready to test on demand, without spending any effort on it. Believe me, it is a great feeling for a tester or developer to arrive at their desk and find the product already deployed and ready to test. (It is an even greater feeling to know early that the build has broken; an early heads-up is what everyone values most.)
b) Automate reporting of the build and deployment - It is really useful to have a report summarizing the state of the build and deployment emailed to everyone involved. Whenever there is a problem, it gets communicated immediately. This should be a summary report, not an extended log dump, and it can later be expanded to cover all significant aspects of the release.
c) Improve build speed - One of the core value propositions of CI is identifying issues with the build and release process as early as possible, and that requires a really fast build. Refine your build process so it completes in the shortest possible time, but be aware this is a double-edged sword: beyond a point, the only way to shorten the build is to not build something, and whether that is acceptable is a choice to be made case by case. A good compromise is to have two types of build: a quick incremental build for new changes, and one complete build every day.
d) Automate test cases - I am not going to get into the debate of unit tests vs. functional tests. The focus should be on automating a set of tests that validates your build with sufficient coverage of all features. This can be a mix of unit, functional, and integration tests, but the core thing to realize is that the output of these tests determines whether the product is in a shippable or deployable state.
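
Tying the stages above together, a nightly driver might look like the following sketch. Each stage command is a stand-in (`true`), the script names in the comments are hypothetical, and the mail step is left as a comment since the mail setup varies by team:

```shell
#!/bin/sh
# nightly.sh - sketch of the full automated cycle: build, deploy,
# test, then a short summary report. Stage commands are stand-ins.
status() { [ "$1" -eq 0 ] && echo "OK" || echo "BROKEN"; }

true; build=$?                 # stand-in for ./build.sh
true; deploy=$?                # stand-in for ./deploy.sh fresh
true; tests=$?                 # stand-in for ./smoke.sh

# A short summary, not an extended log dump.
report="build:  $(status $build)
deploy: $(status $deploy)
tests:  $(status $tests)"
echo "$report"
# e.g. echo "$report" | mail -s "nightly build" team@example.com
```

The exit code of each stage is all the report needs; anyone who wants the full log can go to the build machine, but everyone gets the one-screen verdict in their inbox.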

In this post, apart from a few passing mentions, I have intentionally avoided dwelling on the various tools available to make this process simpler. In reality, there are numerous tools to help with each of the points I have talked about; I will cover them in a future post.
