Notes & Learnings from QCon London 2014 - Day 3

Gunter Dueck - The World after Cloud Computing & Big Data

Gunter is a funny and intelligent man with a delivery style I would compare to that of a stand-up comedian.  Some of his content was on dangerous ground, but he also made some very interesting points.

Gunter showed a diagram similar to the above which illustrates the choices we face when creating an IT solution, which I'm inclined to agree with.

He also showed another diagram which I'll re-create in list form.  

  1. Creative.
  2. Skilled.
  3. Rote work.
  4. Robotic work.
Gunter made the point that work starts off at the top of this list and gradually works its way down until eventually it's fully automated.

What I took away... Make sure your work is as close to the top of the list as possible.

Akmal B Chaudri - Next Gen Hadoop: Gather around the campfire and I will tell you a good YARN

This talk was aimed at Hadoop novices, which was perfect for me and also the reason why I didn't "get" the joke in the title.  Akmal is from Hortonworks, a company (similar to Cloudera) that offers professional services to support the Hadoop framework.

Big Data
It now takes only two days for the world to store as much data as it stored from the beginning of time until 2003.  The three Vs of Big Data:

  • Variety - unstructured and not fit for a relational DB.
  • Volume - there is a lot of it, and it is also kept for longer.
  • Velocity - generated at very high speed.
All of this data, in its many different forms (Sentiment, Clickstream, Sensor/Machine, Geographic, Server Logs, Text etc.), is useful to us.

Hadoop is a framework for data-intensive processing.  It's fast for large jobs but not for small ones.  Hadoop can be integrated with many of your existing data stores and doesn't have to replace anything to add value.

Hadoop 2.0 consists of the following modules:

  • Hadoop Common
  • HDFS - The Hadoop Distributed File system.
  • YARN - A framework for job scheduling and cluster resource management.
  • MapReduce - For processing large data sets in parallel.
Pig - An extension of Hadoop that simplifies querying large data sets on HDFS.
Pig Latin - Pig's language for querying the data.  Good for people performing ETL.
Hive & HiveQL - Like SQL, so less of a learning curve.  This gives a view of the data that is like a relational DB.
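
To make the MapReduce idea a little more concrete, here is a minimal in-memory sketch of a word count in plain Python.  It is not the Hadoop API and was not shown in the talk; the function names (map_phase, shuffle, reduce_phase, word_count) are my own assumptions, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework on a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Assumption: on a real cluster, different lines could be mapped on different nodes.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is useful"]))
```

The point of the split is that the map and reduce steps have no shared state, which is what lets a framework run them in parallel on many machines.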

HDFS Name Node - Keeps track of where data is.
HDFS Data Node - Stores the data.

The exact same data can be stored on multiple nodes for redundancy.  This means losing one node will not lose your data.
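
The division of labour between the Name Node and the Data Nodes can be illustrated with a toy model.  This is only a sketch under my own assumptions: the class ToyNameNode, its method names and the round-robin placement are hypothetical and far simpler than what HDFS really does.

```python
from typing import Dict, List, Set


class ToyNameNode:
    """Tracks which data nodes hold each block; it stores no block data itself."""

    def __init__(self, data_nodes: List[str], replication: int = 3) -> None:
        if replication > len(data_nodes):
            raise ValueError("replication factor exceeds number of data nodes")
        self.data_nodes = list(data_nodes)
        self.replication = replication
        self.block_locations: Dict[str, Set[str]] = {}
        self._next = 0  # round-robin cursor, a simplification of real placement

    def store_block(self, block_id: str) -> Set[str]:
        """Assign the block to `replication` distinct data nodes."""
        count = len(self.data_nodes)
        chosen = {self.data_nodes[(self._next + i) % count] for i in range(self.replication)}
        self._next = (self._next + 1) % count
        self.block_locations[block_id] = chosen
        return chosen

    def fail_node(self, node: str) -> None:
        """Simulate losing a data node and every replica stored on it."""
        self.data_nodes.remove(node)
        for nodes in self.block_locations.values():
            nodes.discard(node)

    def lost_blocks(self) -> List[str]:
        """Blocks that no longer have any surviving replica."""
        return [block for block, nodes in self.block_locations.items() if not nodes]


if __name__ == "__main__":
    name_node = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], replication=2)
    for i in range(3):
        name_node.store_block(f"block-{i}")
    name_node.fail_node("dn2")
    print(name_node.lost_blocks())
```

With two replicas per block, removing any single node leaves at least one copy of every block; with a replication factor of one, the same failure would lose data.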

What I took away... Hadoop is not a single framework to learn; it's many related frameworks that seem to be changing rapidly.

Joakim Recht - The Mean & Lean Pipeline

Joakim talked about how Tradeshift has changed as a company and how its build pipeline changed with it.  He summarized the company's early days (when it was a startup) as:

  • Not enough time.
  • Not enough people.
  • Not enough money.
  • Too many crazy ideas.
  • Too many requirements.

Joakim said that the perfect build pipeline should have the following characteristics:

  • Stops bad code going live.
  • Fast.
  • Ensures consistency.
  • Fully automated.
  • Creates nice screens and buttons to click.
  • Allows you to talk about it at conferences.

...however, setting all of this up takes time and effort!
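
The first and fourth characteristics (stopping bad code and full automation) boil down to running stages in order and refusing to continue after a failure.  Here is a minimal sketch of that gate in Python; the run_pipeline function and the stage names are hypothetical and have nothing to do with Tradeshift's actual setup.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (overall success, names of the stages that passed).
    """
    passed: List[str] = []
    for name, step in stages:
        if not step():
            # Later stages (e.g. deploy) are never reached once something fails.
            return False, passed
        passed.append(name)
    return True, passed


if __name__ == "__main__":
    stages: List[Stage] = [
        ("compile", lambda: True),
        ("tests", lambda: False),  # a failing test run blocks the deploy
        ("deploy", lambda: True),
    ]
    print(run_pipeline(stages))
```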

The development/infrastructure/build-pipeline looked like this in 2010:

  • Java backend.
  • All on Amazon.
  • Drupal front end.
  • Subversion - no branches - everything on HEAD.
  • Hudson to run tests.
  • Deploys to prod - semi automatic.
  • Little docs but lots of tests (which define behaviour).
They later moved from Subversion to Git.

In an effort to improve UI tests, they migrated away from "fragile" Selenium to Geb and Spock (Groovy-based BDD framework).

By mid-2011 the number of teams had grown, which made keeping track of build status on branches hard.  They built a custom build screen which collected build info.  The code for this is not in an open-sourceable state!

New Office in San Francisco
Starting the new office could have gone better for Tradeshift.  I say this because the "new office" slide featured an animated GIF of someone putting a knife in a toaster and then being showered by burning hot components in the ensuing explosion.  Having distributed teams working on the same code base sounded like a big cultural change which demanded technical changes to their build pipeline.  More consistency and rules were enforced with Jenkins.  Along with this came mandatory code reviews on GitHub (some requiring a specific group of reviewers, e.g. DB changes).
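
A rule like "DB changes need a specific group of reviewers" can be expressed as a mapping from file patterns to reviewer groups.  The sketch below is purely illustrative: the patterns, the group names and the functions required_groups and may_merge are all my own assumptions, not Tradeshift's rules or any Jenkins or GitHub API.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Set

# Hypothetical rules: file pattern -> reviewer group that must approve.
REVIEW_RULES: Dict[str, str] = {
    "db/migrations/*": "db-reviewers",
    "*.sql": "db-reviewers",
}


def required_groups(changed_files: Iterable[str]) -> Set[str]:
    """Return the reviewer groups whose approval the changed files demand."""
    return {
        group
        for path in changed_files
        for pattern, group in REVIEW_RULES.items()
        if fnmatchcase(path, pattern)
    }


def may_merge(changed_files: Iterable[str], approving_groups: Set[str]) -> bool:
    """Every change needs at least one approval; matched files also need their group."""
    return bool(approving_groups) and required_groups(changed_files) <= approving_groups


if __name__ == "__main__":
    files = ["src/Invoice.java", "db/migrations/001_add_table.sql"]
    print(may_merge(files, {"developers"}))
    print(may_merge(files, {"developers", "db-reviewers"}))
```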

Other recent issues can be found in this blog post.

What I took away... The better your build pipeline, the more you have to invest in it up-front.  As teams increase in size or become more distributed, the better the build pipeline has to be.

Glen Ford - Lean Under Pressure

Glen (chief architect at Zeebox) clarified the meaning of lean and pressure:

Pressure "The use of persuasion, influence or intimidation to make someone do something."

Lean (in software) -

  • Eliminate waste.
  • Amplify learning.
  • Decide as late as possible.
  • Deliver as fast as possible.
  • Empower the team.

All of the factors above can create tensions within the team, because the team is where the compromises between them have to be made.

Tensions within a Startup (I personally think this can be extended to any work) - What you think you need vs What you can get away with.

The Story

  1. Two founders have an idea.
  2. They create a proof of concept.
  3. Investment gained.
  4. To create the idea, they hire (brings challenges).
  5. To deliver as quickly as possible, they split the big team up (divide and conquer).
  6. Inconsistencies creep in with "the idea".
  7. They then slimmed down (removed some management).
  8. Set a goal which was unattainable - Managers like doing this generally (according to Glen), but in a startup there is a notion that you work very hard.
  9. The unattainable goal inevitably results in failure.
  10. Team demoralised - some burnt out.

They go back to the drawing board and try again - this time with better results.

  1. Teams re-organised - cross functional product teams to deliver vertical slices of the idea.
  2. Fostered a culture of urgency but NOT panic (attainable goals).
  3. Increased learning and knowledge sharing with things like lightning talks.
Other things which helped:

  • You build it, you run it (no ops team).
  • Encourages delivering small improvements - Understanding operational costs and being open with them.
  • Garden architecture - However, you must accept the need to prune and shape as you go along.

Final point: Don't be disheartened.  Learn and adapt.  Enjoy the journey as it never ends.

What I took away... Tensions in a team are inevitable as it's where the many compromises have to be made. Cross functional product teams seem to work everywhere.

Volker Pacher & Sam Phillips - How Shutl delivers even faster using Neo4j, the Graph Database

A very interesting talk on the architecture of Shutl.  Their business provides a point-to-point delivery service, an alternative to the standard "Hub & Spoke" model (i.e. big distribution centres where packages are sent to and dispatched from) offered by big companies such as Royal Mail.

Generating a quote for a user involves generating many

Russell Miles - PaaS - Epic search for Truth

Russell explained that a Platform as a Service can promise (and deliver) a huge number of services for you to take advantage of.  PaaS can offer the following:

  • Database
  • Integration
  • User Management
  • Development workflow
  • Build services
  • Health Monitoring
  • Multitenancy
  • etc.
All of this looks and sounds so great that... "Anyone on the golf course will buy it!"

The risk of PaaS
Libraries will do what you want but frameworks will say "work my way".  A PaaS is a framework which inevitably reduces your flexibility.  A PaaS needs to be as changeable as the software you are running on it. 
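
The library versus framework distinction is easiest to see in code.  The sketch below is my own illustration, not something from the talk: slugify stands in for a library function and ToyFramework for a framework, and both names are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


# Library style: your code is in charge and calls the helper whenever it likes.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# Framework style: the framework is in charge and calls your code on its terms.
class ToyFramework:
    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def route(self, path: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler; you must fit this signature."""
        def register(handler: Handler) -> Handler:
            self._handlers[path] = handler
            return handler
        return register

    def handle(self, path: str, body: str) -> str:
        handler = self._handlers.get(path)
        return handler(body) if handler else "404"


app = ToyFramework()


@app.route("/posts")
def create_post(body: str) -> str:
    # The handler is free to use a library, but only inside the framework's shape.
    return slugify(body)


if __name__ == "__main__":
    print(app.handle("/posts", "Notes from QCon London"))
```

Swapping slugify for another helper is a one-line change, whereas moving create_post to a different framework means rewriting how it is registered and called.  That is the kind of coupling a PaaS can introduce.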

How to stay flexible with PaaS?
  • Ensure that the barrier to entry for your PaaS is low.
  • FaaS - Fail as a Service - This means apply Darwinian evolution.  Experiment with multiple PaaS as opposed to just using one for everything.

What I took away... PaaS offers many benefits, but beware of the hidden coupling to your software.

