Posts

Showing posts from 2014

Docker - Things I'd like to see fixed

Recently I've been lucky enough to spend some time with Docker. I've really enjoyed it and would definitely consider myself a fan. However, I think a couple of issues really need fixing.

Docker registries should treat images as immutable entities
Currently Docker registries (including Docker Hub) allow you to overwrite an image with the same tag. This means you can't be sure that an image hasn't changed since you last pulled it. This is a nightmare from a build and deploy point of view. I can imagine a sequence of events whereby a small, low-risk change is pushed through the environments with plenty of testing, only to fail in production when suddenly a modified and incompatible dependent image is pulled and deployed for the first time. Sadly it seems that this issue has been closed - presumably without a fix.

Registry locations and image tags need to be separate entities
Uniquely identifying a speci...
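
This isn't from the original post, but as a rough sketch of how you might defend against a mutable tag: record the digest an image resolves to and check it again before deploying. It assumes a local Docker CLI and reads the RepoDigests field from docker inspect; the image name is a placeholder.

```python
import json
import subprocess

def repo_digest(image: str) -> str:
    """Return the registry digest (repo@sha256:...) that a locally pulled tag resolves to."""
    out = subprocess.run(
        ["docker", "inspect", image],
        capture_output=True, text=True, check=True,
    ).stdout
    digests = json.loads(out)[0].get("RepoDigests", [])
    return digests[0] if digests else ""

def assert_unchanged(image: str, recorded_digest: str) -> None:
    """Fail loudly if the tag now points at different content than when it was recorded."""
    current = repo_digest(image)
    if current != recorded_digest:
        raise RuntimeError(f"{image} has changed: expected {recorded_digest}, got {current}")

# Hypothetical usage: pin the digest when the image passes testing,
# then verify it hasn't been overwritten before deploying to production.
# pinned = repo_digest("myorg/myservice:1.2.3")
# ...
# assert_unchanged("myorg/myservice:1.2.3", pinned)
```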

Does a long notice period hold back your organisation?

My last role had a long notice period of three months. Although long notice periods are intended to protect an organisation, I think they can act as a detriment. I remember being concerned when I took on the role that a three month notice period could make job hunting tricky as and when I wanted to move on. I could easily imagine this being a deciding factor for a team that wanted someone to start as soon as possible, no matter how suitable I was. "Don't worry, you can negotiate it down when you want to leave"... I was told. This didn't really console me, as broaching this subject would reveal my intentions of leaving before the new role was agreed. In any event, I signed the contract and agreed to the notice period. When it was time for me to move on (after four great years at said company) I decided to hand in my notice before getting my new role agreed. I also elected to work my notice period in full and did not ask to leave early. I thought I ...

Change the characteristics of your complexity

I was lucky enough to attend a talk by John Turner from Paddy Power as part of the London Continuous Delivery meetup. The talk was very informative and gave the audience some great insight into rolling out PaaS and continuous delivery in the real world. In order to innovate faster, Paddy Power made a number of changes to "re-orient" the company and put the engineering team first. To help speed up engineering, where possible, decisions on architecture were removed and replaced with convention.

Pipeline
An abstract (no tools specified) pipeline was defined with the following stages (see the sketch below):
1. Code commit - trunk-based commits with feature switches.
2. Build - component and integration tests; code quality rules (enforced by Sonar).
3. Acceptance tests - run against the application/service as deployed in a VM provisioned by GigaSpaces' Cloudify product. Config for the VM's dependencies (e.g. Java installation) is defined in a blueprint. Acceptance te...
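
The talk described the pipeline abstractly, so purely as my own sketch (no tools, placeholder checks), here is one way the stages above could be modelled as data, with each stage acting as a gate for the next.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    checks: List[Callable[[], bool]]  # every check must pass before moving on

def run_pipeline(stages: List[Stage]) -> bool:
    """Run each stage in order; stop at the first failing check."""
    for stage in stages:
        if not all(check() for check in stage.checks):
            print(f"Pipeline stopped at '{stage.name}'")
            return False
        print(f"'{stage.name}' passed")
    return True

# Placeholder checks standing in for the real steps described above.
pipeline = [
    Stage("Code commit", [lambda: True]),          # trunk-based commit, feature switches
    Stage("Build", [lambda: True, lambda: True]),  # component/integration tests, quality rules
    Stage("Acceptance tests", [lambda: True]),     # run against the provisioned VM
]

if __name__ == "__main__":
    run_pipeline(pipeline)
```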

Pull Requests encourage critique, even in small teams

A member of our team recently suggested that we use pull requests in our development workflow. Many points were made for and against in our debate. After only a short time of adopting this change, the major benefit I have seen is the increased critique of the team's output.

The debate
The decision to use pull requests was certainly not a given and gave rise to an interesting debate. Many arguments were made for and against, which I've tried to summarise below.

For pull requests
- Increased knowledge sharing - non-reviewers can view pull requests just as easily.
- Simpler commit messages - each pull request is (hopefully) created "with the reviewer in mind".
- Provides a forum for commenting on work.
- Increase in code quality.

Against pull requests
- "It's too much process and will slow things down" - as a small team I think we take pride in our lightweight process of pair programming and informal ad-hoc code reviews. Adding extra checks a...

Alerts should treat your Ops team like the police... only contact them in real emergencies!

Every time an alert notifies your Ops team, there should be a real problem. If there isn't a real problem, you're wasting their time, adding confusion and making it harder for them to respond to real incidents.

The problem we have: too many alerts for non-issues and non-critical issues
I sat with a member of our Ops team recently and was horrified to see how many notifications they received from our monitoring systems. At times, it was almost impossible to make any sense of them due to the sheer volume of emails filling their inbox.

Why is this such a problem? Multiple reasons:
- Each notification comes at the cost of lessening the impact of all other notifications. Taken to the extreme, where Ops receive hundreds per day, the impact of any one alert can be almost zero.
- A false alarm is a distraction and can waste valuable minutes when debugging real issues.

How did we get here? The short answer is: by diligently adding more alerts but not dilig...
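
To make the principle concrete (this is my illustration rather than anything from our actual setup), here is a minimal sketch of routing alerts by severity so that only genuine emergencies page Ops; the severity levels and example alerts are made up.

```python
from enum import Enum

class Severity(Enum):
    INFO = 1      # visible on a dashboard, never interrupts anyone
    WARNING = 2   # reviewed during working hours
    CRITICAL = 3  # a real emergency: page the Ops team

def route_alert(name: str, severity: Severity) -> str:
    """Decide where an alert goes. Only CRITICAL interrupts a human."""
    if severity is Severity.CRITICAL:
        return f"PAGE ops: {name}"
    if severity is Severity.WARNING:
        return f"queue for review: {name}"
    return f"dashboard only: {name}"

# Hypothetical examples: most signals should never reach a pager.
print(route_alert("disk usage at 70%", Severity.INFO))
print(route_alert("error rate rising", Severity.WARNING))
print(route_alert("checkout is down", Severity.CRITICAL))
```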

Why run browser based acceptance tests as monitoring checks?

So far the services I have been involved with have had acceptance tests and monitoring checks. For various reasons the tests and checks have been designed, developed and run in separate worlds. This approach has led to issues falling through the cracks and remaining undetected in live for too long. Here I will explain our first pass at joining these two separate suites... running browser-based acceptance tests as monitoring checks.

What we've done in the past... separate tests and checks

Acceptance tests
A good suite of acceptance tests should test your service end to end, verify external code quality and run against real external dependencies. Some acceptance test suites can tick all of these boxes without relying on browser-based tests, which is great as much complexity is removed. For other test suites, a browser is essential. In past projects, the browser-based acceptance tests have run against various non-production test/integration environ...
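
As a rough sketch of the idea (not our actual implementation), here is a browser-based check wrapped so it can run as a monitoring check, reporting Nagios-style exit codes. It assumes Python with Selenium; the URL and the "Sign in" title expectation are placeholders.

```python
import sys
from selenium import webdriver

# Nagios-style exit codes, so the same script can act as a monitoring check.
OK, CRITICAL = 0, 2

def check_login_page(url: str) -> int:
    """Drive a real browser through a user-facing page and report a check result."""
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        if "Sign in" not in driver.title:  # hypothetical expectation
            print(f"CRITICAL - unexpected title: {driver.title!r}")
            return CRITICAL
        print("OK - login page rendered as expected")
        return OK
    except Exception as exc:
        print(f"CRITICAL - {exc}")
        return CRITICAL
    finally:
        driver.quit()

if __name__ == "__main__":
    sys.exit(check_login_page("https://example.com/login"))  # placeholder URL
```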

Maven's great, before you get annoyed with a feature, find out why it was implemented.

I have always liked Apache Maven, however I recently found a few issues with it that irritated me. This post was going to moan about them, but after further reading I have come to realise that these issues are not as simple as I first thought. They are in fact conscious decisions associated with a very complicated problem domain.

First off, why is Maven great?
The best thing about Maven is the standards it imposes on you. This has the obvious downside of it feeling restrictive at times. However, the benefits are huge. The standard directory layout, standard build phases and of course the standard dependency management have been around so long now that we tend to take them for granted. Unless a project uses custom or obscure plugins or has overridden many default settings, each project looks and builds in a familiar way. As mentioned on maven.apache.org, "... [it's] only necessary to learn a small set of commands to build any Maven project ...

SAML feels like a missed opportunity

"The nice thing about standards is that you have so many to choose from" -  Andrew S. Tanenbaum This quote is very appropriate for Single-Sign-On and specifically SAML.  Here I will discuss why SAML is a great protocol for point to point integrations, but can get very complicated very quickly once you take it beyond that. Single Sign On - Why is it so hard? Single Sign On (or SSO) can be described very simply, to quote wikipedia "...user logs in once and gains access to all systems without being prompted to log in again at each of them". This boils down to three different entities who trust each other directly and indirectly.  A user  enters a password (or some other authentication method) to their identity provider (IDP)  in order to gain access to a service provider (SP) .  User trusts IdP, SP trusts IDP so SP can in-turn trust user. This seems so simple, however if you are a service provider and want to integrate with many IdPs (e.g. twitt...

Notes & Learnings from Q Con London 2014 - Day 3

[Image: diagram illustrating the choices we face when creating an IT solution]
Gunter Dueck - The World after Cloud Computing & Big Data
Gunter is a funny and intelligent man with a delivery style I would compare to that of a stand-up comedian. Some of his content was on dangerous ground, but he also made some very interesting points. Gunter showed a diagram similar to the above which illustrates the choices we face when creating an IT solution, and which I'm inclined to agree with. He also showed another diagram which I'll re-create in list form:
- Creative
- Skilled
- Rote work
- Robotic work
Gunter made the point that work starts off at the top of this list and gradually works its way down until eventually it's fully automated.

What I took away... Make sure your work is as close to the top of the list as possible.

Akmal B Chaudri - Next Gen Hadoop: Gather around the campfire and I will tell you a good YARN
This talk was aimed at Hadoop novices, which was perfect for me and also the rea...

Notes & Learnings from Q Con London 2014 - Day 2

Tim Lister - Forty Years of Teams
Or... "Forty years of playing well with others". A really uplifting talk by Tim introduced us to day two, looking back over Tim's very interesting career. The alternative title was given because "one person cannot do anything substantial", and even if you could, it's more satisfying saying "we did this!"

His IT career started by chance, when he found himself living with Fred Wang at University. Fred's father, An Wang, was founder of Wang Laboratories, which meant he and Fred had access to more computing power than his entire University. They used this to analyse data from their fruit fly breeding as part of his Biology degree.

Tim was also lucky enough to be mentored by computer scientist Michael Jackson, who was a "software poet" and produced code so beautiful and lucid, it was as if it were written by aliens.

The term "Best practices" can undermine the intellectual wor...

Notes & Learnings from Q Con London 2014 - Day 1

I was lucky enough to go to Q Con London 2014. Here is an update of my experience from day 1.

Damian Conway - Life, The Universe and Everything
Damian is an amazing speaker and made his already fun, interesting, geeky subject even more fun, interesting and geeky with his great presentational skills. He showed us Perl code (and later Klingonscript) that not only implemented the game of life but also (kind of) disproved Maxwell's Demon. We were also shown an example of a Turing-complete machine, made by Paul Rendell, which was implemented using the game of life. The video is also on YouTube.

This talk reminded me of another cellular automaton I encountered many years ago whilst studying Genetic Algorithms at University. I am pleased to say that the creator (David Eck) has converted the original Java Applet version of his program Eaters, that so impressed me, into JavaScript, available here.

What I took away... Coding can and should be fun.

Daniel Schauenberg - ...
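
For anyone who hasn't met it, here is a tiny sketch of a single Game of Life generation in Python (nothing to do with Damian's Perl, just the rules): a live cell survives with two or three live neighbours, and a dead cell with exactly three neighbours comes to life.

```python
from collections import Counter

def life_step(live_cells: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Apply one generation of Conway's Game of Life to a set of live (x, y) cells."""
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }

# A "blinker": three cells in a row oscillate between horizontal and vertical.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(life_step(blinker)))  # [(1, 0), (1, 1), (1, 2)]
```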

Client Side vs Server Side Session

We recently looked at replacing our legacy session management system with a new one. During this analysis, we came close to choosing a client-side session but eventually concluded server-side was better for us. Here's why...

Client-side session
In this model, all session state is stored in the client in a cookie. The benefits of this are that you don't need to worry about persisting and replicating state across nodes, and session validation is lightning fast since you don't need to query any data store, which makes it super scalable. The session cookie must obviously be tamper-proof (to prevent people creating a session of their choice), which is achieved by signing the cookie using asymmetric cryptography: signing a cookie value uses the private key, validation uses the corresponding public key. Our idea was to try and keep the private key as private as possible by storing it in memory only. Each node (4 shown below) would create a new priva...
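
A minimal sketch of the client-side approach described above (my own illustration; the cookie format and session value are made up): a node signs the session value with its in-memory private key, and any node holding the corresponding public key can validate the cookie without touching a data store.

```python
import base64
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Each node generates its key pair in memory at start-up; only the public key is shared.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

def create_session_cookie(session_value: str) -> str:
    """Sign the session value and pack it as 'value.signature' (a made-up cookie format)."""
    value = session_value.encode()
    signature = private_key.sign(value, padding.PKCS1v15(), hashes.SHA256())
    return (base64.urlsafe_b64encode(value).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())

def validate_session_cookie(cookie: str) -> str | None:
    """Return the session value if the signature checks out, otherwise None. No data store lookup."""
    encoded_value, encoded_signature = cookie.split(".")
    value = base64.urlsafe_b64decode(encoded_value)
    signature = base64.urlsafe_b64decode(encoded_signature)
    try:
        public_key.verify(signature, value, padding.PKCS1v15(), hashes.SHA256())
        return value.decode()
    except InvalidSignature:
        return None

cookie = create_session_cookie("userId=123")  # hypothetical session value
forged = create_session_cookie("userId=456").split(".")[0] + "." + cookie.split(".")[1]
print(validate_session_cookie(cookie))  # 'userId=123'
print(validate_session_cookie(forged))  # None: value doesn't match the signature
```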

Lessons learned from a connection leak in production

We recently encountered a connection leak with one of our services in production. Here's what happened, what we did and the lessons learned...

Tests and monitoring detected the issue (silently)
Our automated tests (which run in production for this service) started failing soon after the incident occurred. We didn't immediately realise, since they just went red on the dashboard and we carried on about our business. Our monitoring detected the issue immediately and also went red, but crucially didn't email us due to an environment-specific config issue.

The alarm is raised
A tester on the team realised that the tests had been failing for some time (some time being 4 hours... eek!) and got us to start investigating. Another dev and I quickly looked into what was going on with our poorly Dropwizard app. We could see from the monitoring dashboard that the application's health check page was in error. Viewing the page showed us tha...
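
The usual root cause of a leak like this is a code path that borrows a connection and never gives it back. As a generic illustration (not the actual code involved), here is a sketch contrasting the leaky pattern with one that always releases the connection; the toy pool and the health check are made up for the example.

```python
import contextlib
import queue
import sqlite3

class ConnectionPool:
    """A toy pool, just to make the leak visible; a real app would use its framework's pool."""
    def __init__(self, size: int):
        self._available = queue.Queue()
        for _ in range(size):
            self._available.put(sqlite3.connect(":memory:"))

    def acquire(self, timeout: float = 1.0):
        return self._available.get(timeout=timeout)  # blocks, then raises queue.Empty if exhausted

    def release(self, conn) -> None:
        self._available.put(conn)

    @contextlib.contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)  # always returned, even if the query throws

    def healthy(self) -> bool:
        """Health-check style probe: are any connections left in the pool?"""
        return self._available.qsize() > 0

pool = ConnectionPool(size=2)

def safe_query():
    with pool.connection() as conn:  # released in all cases
        conn.execute("SELECT 1")

def leaky_query():
    conn = pool.acquire()
    conn.execute("SELECT 1")
    # missing release(): do this often enough and the pool runs dry

safe_query()
leaky_query()
leaky_query()
print(pool.healthy())  # False: both connections have leaked
# The next acquire() would now block and time out, the kind of failure a health check page surfaces.
```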

What I learnt from the Phoenix Project

Having just read this book - The Phoenix Project - A Novel About IT, DevOps and Helping your Business Win - Gene Kim, Kevin Behr, George Spafford - I'm keen to express exactly what I learnt so I don't forget its important messages.

The book follows Bill, a "Director of Midrange Technology Operations" who is forced into the more senior role of "VP of IT Operations". As soon as he is given this job, he's on the back foot trying to work out why stuff keeps breaking. As the book goes on, we get the impression of an IT department stumbling around from disaster to disaster in complete and utter disarray. Bill, guided by his khaki-pants guru Erik, turns the place around.

Here's what I learnt...

Find your bottlenecks - The book's message here is that any improvement made anywhere besides the bottleneck is wasted. This makes perfect sense and comes from comparing IT to a manufacturing pipeline and applying the Theory of Constraints. Not onl...

The more people contribute to a team's output, the closer they should be to the team.

Everyone contributing to an IT project ideally needs to be on the team responsible for delivery. When this isn't possible, the more they contribute, the closer they need to be to the team. The above point probably seems obvious to the point of being redundant to most people. I'm sure that many teams (like mine) are structured with a healthy enough mix of skills that they can be mostly self-sufficient. With our team of testers, developers, infrastructure people etc. I too would have thought we had this base more than covered. However, beware that contributors to a team's output come in many different guises. I feel that we recently suffered on a project due to a key contributor being very distant from the team.

The project involved adding a new user journey to ft.com. The UX team had designed a user journey, in isolation from the development team, which we were to implement. The user journey that had been created was great, however, th...