Software engineers are lazy bastards (pt. 3)

I decided to make this a three part piece. The first one concerns componentization and following good practices when building software as an argument for Software Engineering as a legitimate Engineering field. The second covers proper testing that all Software Engineers should be following. In this post I'll talk about Development and Operations, i.e. DevOps.

Ok so DevOps became a thing. If I understand it properly, which I'm sure lots of people will say I do not, it's the idea that proper development practices are brought to bear on operations problems. This means things like using source code management systems (Git), having code reviews, and following a process around work that needs to be done. These are all great things, and it's way better than the way that people did this before. But here's the irony, and something I find kinda sad: computers and software were built to automate things. They were built to run factories, to make it easier to compute flight paths, to help with order fulfillment; I'm not going to list out all the places they help, I hope you get the point. So the irony is that DevOps is needed to help automate the systems that we (Software/Hardware/Computer People) built to help automate away other issues. I can't be the only one who sees this as ironically sad. To really make a point here, I'm going to start with an analogy: trains.

Remember how complicated it was to drive a steam train?

Yeah, me too. You had to fill it with coal or wood, or whatever combustible thing was available at hand. And it was really hot, always sweating; that's why I stopped being an engineer and became a lazy software engineer. Ok, not funny, but still, steam trains were really hard to operate. Look at all these dials and levers:

I'm going to guess that the big red lever is the brake (no, no, the one on the right, not the left). Oh, and the dial on the left makes it go faster (the heavily used looking one). But seriously, I have no idea. The only thing I'm pretty sure of from this photo is where you put the fuel, and I'm guessing coal.

So, why am I talking about trains as a Software Engineer who knows little to nothing about trains? Because that picture above is what us lazy software engineers have been giving the operations folks for years. In fact, operations became so complicated that the operations folks needed to adopt development methodologies to manage systems at scale, and thus was born DevOps. Which is a great thing: using good standards in operations like SCM, releases of tools, declarative systems, etc. These are all great advancements, but why haven't they always been there? And why are they needed? Because as developers we've given operators so many dials to properly run our software (think of every config you require someone to write, every command line option...).
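To make the "fewer dials" idea concrete, here's a minimal sketch of a service that ships with working defaults, so the operator only touches a setting when they actually need to. Everything in it is an assumption for illustration: the `ServiceConfig` class, the `APP_PORT`, `APP_WORKERS` and `APP_DATA_DIR` names, and the default values are made up, not taken from any real system.

```java
import java.util.Map;

/** Hypothetical service settings: every dial ships with a default that works. */
final class ServiceConfig {
    final int port;
    final int workers;
    final String dataDir;

    private ServiceConfig(int port, int workers, String dataDir) {
        this.port = port;
        this.workers = workers;
        this.dataDir = dataDir;
    }

    /** Reads optional overrides; any key the operator leaves out falls back to a default. */
    static ServiceConfig fromEnv(Map<String, String> env) {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ServiceConfig(
                Integer.parseInt(env.getOrDefault("APP_PORT", "8080")),
                Integer.parseInt(env.getOrDefault("APP_WORKERS", String.valueOf(cores))),
                env.getOrDefault("APP_DATA_DIR", "/var/lib/app"));
    }

    public static void main(String[] args) {
        // With no APP_* variables set, the service still starts with sane values.
        ServiceConfig cfg = fromEnv(System.getenv());
        System.out.println("port=" + cfg.port + " workers=" + cfg.workers + " dataDir=" + cfg.dataDir);
    }
}
```

The point isn't this particular class, it's that zero required settings is the train with one throttle and one brake.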

DevOps should not be necessary

I blame Java for the state of the world (don't get me wrong, I still love Java, though Scala is growing on me as a JVM language). Java and many of the interpreted languages out there put us in a state where the developer actively divorced themselves from the system they were building software for. This was great for being able to build things faster: test once and expect it to work the same on any platform you deploy to after that. I remember when I had to build a common client/server system in C++ with a network stack that was portable between Windows, Linux and SunOS on x86, amd64, and sparc. Having to build something that worked on all of those platforms, and think about how it would run under COM in Windows and daemontools in Linux/SunOS, was a major pain and caused a ton of bugs, some of which needed to be debugged after shipping the software to the customer. I do not want to go back to that, so these VM based languages are always going to have a place for portable code, but I do ask myself this question all the time: do I need portability? I have only shipped software on one platform for the past 11 years, Linux. But I do think one day I'll probably end up needing to support BSD, so let's just say that I have no intention of supporting a non-POSIX OS from here on in my development career (enter interest in Rust). When software engineers experience the pain of running their own code, they will change it so that it's easier to operate. Why? Because at heart, they are lazy; they want to get back to writing code, not supporting this crappy thing in production. As a comparison, look at modern train controls; even I could probably figure out how to stop this or make it go faster:

The point is that Software Engineers have gotten lazy in how their code is deployed and run. There have been great advancements over the last four years in the operations world. Everyone has heard of Docker, and some have heard of rkt, appc, LXC, LXD, etc.; all of these are giving us lazy people easy methods of packaging our software. It's easier to implement actual deployment tests upfront now than at any point in the past. You don't need to chroot, it's done for you; you don't need to do any port mapping, you can get a unique IP per container or VM; you don't need to guess about installation, you can use the installation tools to create the container image. What this means is that as a developer there is no reason anymore to hand these jobs off to anyone else to validate the functionality of your system. As a Software Engineer today you should know exactly how your software is going to be deployed, run, and executed, what your data persistence requirements are, etc. You cannot ignore this, and it's never been easier with all the tools that exist out there now. There are lots of methods; you should research what you want. I'll say this: I've been using LXC for over four years, and containers are the way to go. I couldn't be happier about the OpenContainer spec that was recently announced.

Where is this train going?

I'm watching these things: CoreOS, rkt, Atomic, Nix. I think that Nix is probably going to be the standard way of declaring system dependencies in containers. I think Nix or Atomic (with OSTree) will be the standard way of managing the BaseOS, but CoreOS is probably good enough for now. Work with your operations or DevOps teams to make this possible, as it will help immensely down the road. In other words, the operating system and application environment should be completely declarative.

Software Engineers are not actually lazy

I know I called software engineers lazy bastards, and that probably stopped half of them from reading these posts, but I actually don't think we're lazy bastards. We got caught up in the methods and complexities of the day and forgot to stay grounded. Basically if you want to be lazy, it's important to be lazy after your software is built and functioning properly. Which means to be a good Software Engineer you need to keep these three areas in mind when building software:
  • Modular and Component based systems (pt. 1)
  • Testing without relying on others (pt. 2)
  • Design your software to be easily deployed and managed (pt. 3, this one)
If you make good decisions in each of those categories, most likely you'll get to go back to being lazy and working on what you really want, and isn't that ultimately everyone's goal?


Software engineers are lazy bastards (pt. 2)

I decided to make this a three part piece. The first one is here if you're interested in reading it. It concerns componentization and following good practices when building software as an argument for Software Engineering as a legitimate Engineering field. In this post I'm going to cover proper testing that all Software Engineers should be following. The final post is on DevOps.

Let me start off by explaining why I am calling Software Engineers lazy. This stems from the general principle that people are going to do the least amount of work possible to accomplish a given task. Software Engineers are no different; Computer Scientists, on the other hand, are perfectionists always looking for the most elegant solution to a problem, perhaps even creating a new theorem or axiom. It's my job as a Software Engineer to understand and utilize these new advancements (and perhaps one day I will create some, I did get a degree in Computer Science after all). Take the CAP theorem or the Raft consensus protocol: it's definitely my responsibility to understand the theories behind these and be able to implement them if necessary, but I look at them as things like I-beams in construction that can be bought off the shelf. My job as a Software Engineer is to take disparate pieces of technology and put them together to build a larger system. But why are we inherently lazy? Corners will be cut in order to ship software, and systems will have components that aren't complete, because we have limited time and money, which constrains our ability to be perfect.

In other words, the real world intervenes, and so what we need to do is find a way to mitigate issues down the road. This is why I'm so adamant about componentization or modularization in my own code, and in any team that I'm leading. If that piece isn't perfect, go back and refactor. This post isn't about that; what it's about is how to make each of those components as high quality as possible. But before that, let's discuss the quality of airplanes.

We need some chicken guns!

Chicken guns, created in the 1950s to test the strength of different components of an airplane.
F16 canopy birdstrike test
They're used to check the strength of both engines and windshields in the event that a large bird hits the airplane. It's exactly what it sounds like: a large pressure gun that shoots chickens at a high velocity to simulate a midair strike (like that potato gun you used to accidentally break the window of the garage at your friend's house, no, not me...). This is obviously important; you don't want to discover during flight with 150+ people on board that the windshield or engines can't withstand something that's fairly common in the air. So the chicken gun is used to make sure we're all safe in case of a mid-air strike with a bird. Cool, right?

The wingbend test is used to make sure that the aircraft wings aren't going to fall off or apart in mid-flight. It's that fear you have in the back of your mind when the plane is bouncing around in crazy turbulence that one of the wings is just going to pop off and you all go spiraling down to your deaths. Aren't you happy that the wings will bend up to a degree that is so mindbogglingly extreme that you could try a 900 like Tony Hawk on them? But even before this test, they have already tested the aluminum composites being used to understand their tensile strengths, testing all the way down to each panel before it's attached to the plane.

There are tons of tests that are run before a plane can even fly, checking that all the electrical systems, the flaps, etc., are fully functional. Then they actually take the plane into the air for a battery of flight tests. These tests are critical to determine the safety of a plane. Now imagine for a moment that the tensile strength or longevity of the metal wasn't known before the flight test, and that the pressures of the initial flight test show that the internal ribbing of the plane needs to be replaced. How difficult would that be? How much more expensive is it to replace that at the end than to realize it at the beginning and just choose a different material in the first place? Code quality should be treated in the same way.

What is quality code?

This question is always answered in a lot of different ways. Years ago I was working on a project in which I needed to modify some of DJB's code. He writes a quality of code that is impressive. C that's platform independent, big-endian and little-endian compatible. It was a really cool experience to see that amount of thought put into writing software, but I wouldn't say it's easy to grok. While I have the highest respect for what he's done, in my case what I think is just as important is writing code which is easily maintainable. Components help with this, but so do tests. A friend and former coworker of mine introduced me to this idea of a testing pyramid. For a basic overview of this, here's an ok blog post. I'm going to modify that in some important ways, especially since I'm not a UI developer, I won't go into UI testing at all (though I would suggest you need this same structure on the UI side, with actual selenium tests or similar at the top of the pyramid). I'm a backend systems architect, <sarcasm>when do I need UI</sarcasm>. But in the vein of fighting the lazy software engineer, I think it's important that we software engineers have built tests around each of these areas, here is the pyramid that I implement my code by:

I'll run through each of these sections, but the important thing this represents is that most of your test coverage occurs at the bottom of the pyramid, with ever smaller numbers of tests flowing to the top. From a code complexity standpoint, though, the tests at the top of the pyramid are more difficult to write and maintain, therefore there should be fewer of them. I'll start from the bottom up; in fact, I'll spend more time describing things at the bottom too.

Unit test your code, fool!

If you're not writing unit tests around every class you write, you are a fool. Yes, you are. I know you're saying, "but that's just a simple function that adds two numbers and returns the result". But even a function as simple as:

int myAdd(int i, int j) {
  int k = i + j;
  return k;
}

still could have a bug! Hell, technically speaking you have an overflow problem that this isn't dealing with: i and j could both be max ints, and their sum would overflow the size of int. Also, it's just as easy to accidentally write 'return i;' and cause a bug in your most trivial code. I'm not suggesting you go crazy, just write a quick sanity test.
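Here's roughly what that quick sanity test could look like, as a sketch with no test framework at all (in real life you'd use TestNG or JUnit). It assumes one change to the function above: the overflow case is handled with `Math.addExact`, which throws an `ArithmeticException` instead of silently wrapping. The `MyAddSanity` class and its `check` helper are names I made up for the example.

```java
final class MyAddSanity {
    /** Overflow-aware variant of myAdd: addExact throws instead of wrapping around. */
    static int myAdd(int i, int j) {
        return Math.addExact(i, j);
    }

    /** Tiny stand-in for a test framework assertion. */
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Catches the accidental 'return i;' bug as well as plain arithmetic mistakes.
        check(myAdd(2, 3) == 5, "2 + 3 should be 5");
        check(myAdd(-4, 4) == 0, "-4 + 4 should be 0");

        // The overflow case: two large ints must not quietly produce garbage.
        boolean overflowDetected = false;
        try {
            myAdd(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            overflowDetected = true;
        }
        check(overflowDetected, "overflow should be reported, not wrapped");

        System.out.println("myAdd sanity checks passed");
    }
}
```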

There is one rule that you must follow, though, when writing unit tests: they must be clean of any external dependencies. Don't get stuck initializing a massive tree of dependencies, use Mocks for this stuff (if you're a Scala or Java dev and you don't know Mockito, then learn it, right now, it will save your life). If you have to launch a DB to make your unit tests work, those are not unit tests (they are more likely functional or integration tests). So don't do that! You just slow down your development speed because you've made it harder to run your tests, and it takes longer to set up your development environment. I follow these rules on unit tests:

  • No external dependencies (i.e. DBs, or other remote services)
  • Don't require shared state between tests (or as little as possible)
  • Always make sure your tests can be run in your IDE with no other actions needed than compiling
  • Test all expected code paths in a method (i.e. you don't have to go overboard with exception cases)
  • If it's expected to be multi-threaded, write some threaded tests (these are fun, and will teach you a lot about making your code testable)
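As a sketch of the first rule, here's a unit test whose external dependency is replaced by an in-memory stand-in. It's hand-rolled so the example stays self-contained; a Mockito mock would fill the same role with less typing. All the names (`UserStore`, `Greeter`, `InMemoryUserStore`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class GreeterExample {
    /** Hypothetical boundary to an external system (a DB in production). */
    interface UserStore {
        Optional<String> findEmail(String userId);
    }

    /** Code under test: it only knows the interface, never a real DB. */
    static final class Greeter {
        private final UserStore store;

        Greeter(UserStore store) {
            this.store = store;
        }

        String greetingFor(String userId) {
            return store.findEmail(userId)
                    .map(email -> "Hello, " + email)
                    .orElse("Hello, stranger");
        }
    }

    /** In-memory stand-in for the DB, used only by the test. */
    static final class InMemoryUserStore implements UserStore {
        private final Map<String, String> emails = new HashMap<>();

        InMemoryUserStore with(String userId, String email) {
            emails.put(userId, email);
            return this;
        }

        @Override
        public Optional<String> findEmail(String userId) {
            return Optional.ofNullable(emails.get(userId));
        }
    }

    public static void main(String[] args) {
        // No DB, no network, no shared state: runs straight from the IDE.
        Greeter greeter = new Greeter(new InMemoryUserStore().with("u1", "ada@example.com"));
        if (!greeter.greetingFor("u1").equals("Hello, ada@example.com")) {
            throw new AssertionError("known user should be greeted by email");
        }
        if (!greeter.greetingFor("nobody").equals("Hello, stranger")) {
            throw new AssertionError("unknown user should get the fallback greeting");
        }
        System.out.println("Greeter unit tests passed");
    }
}
```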

By the way, a target of 80% code coverage is generally good for unit tests; more than that and you start making refactoring code more difficult. Also, beyond this there are diminishing returns on the tests themselves. You're aiming for checks that make sure your code is going to function properly in production, with tests based on real world issues. Unit tests are your base set of tests; they test the tensile strength of the materials which you're later going to use to put together your larger project. Like the aluminum in the plane, if you discover issues this early in your development, you're saving yourself a lot of pain (and time) down the road, and you're lazy, right? You don't want to spend any more time writing code than you need to...

(Quick aside, I dissed Ruby for its speed in my first post on this, but to be fair the Rubyists really nailed tests living with your code. Rust has taken testing to another level where you can actually include tests in comments, and then have examples that are always confirmed to be up to date with the code, super cool.)

But functional tests are way better!

Ok, not really. Functional tests are the next order of tests. As an example, this would be where you test your SQL against your DB (or NOSQL against your non-ACID whatever). Functional tests should only test one thing. I was working on a large project before that had a lot of various external components to work with, one being DNS. It used the dynamic DNS protocol to update the DNS records. First, the DNS module was a distinct component in the code such that it had one responsibility: talk to DNS. The functional tests only tested this one component; they did not have any other side effects, e.g. the DB code et al. were not dependencies for the tests on this component. From our plane example, this is like the wing-bend test. The plane is put together, but specific functions are being tested to make sure it will be ok in the air. Sadly, I don't think Tony Hawk could skate on any of the code I've written.

Cool, but functional tests have side effects, don't they? For DBs and DNS or similar persistent state systems, how do you make sure your tests are starting from a known state? Virtualization! VMs are cool, use them. Use Vagrant to set up your VM, use installation (RPM, DEB, Chef, Docker, etc.) scripts to install the packages, cache the installed VM (so that those steps aren't long in the future), and then clean up all persistent data in the VM before running your tests. It sounds like a lot of pain, but the advantages are awesome: you get to figure out at the beginning of your dev work how you're going to install and bootstrap each system, and you're guaranteeing that everyone testing is starting from a known point. Pretty awesome, and it saves you a ton of time later.

My rules for functional tests:
  • One component is being tested
  • Always start from a known state by cleaning up persistent information each time (at the beginning, not the end!)
  • Install your external components now the way you plan to in the future (save yourself some time and heartache later, hell you might decide it's so ridiculously complex that you change your mind about using it)
  • Only test high level functionality; the unit tests cover code paths, these tests make sure that your assumptions about the thing you're using are correct.
What's great about these tests is that when you discover bugs in the interaction between your code and the external system (yes, there will be bugs this doesn't catch), you get to come back to these and replicate the bugs here, isolated directly to just this one component without having to worry about other interactions. Over time you harden this component to ridiculous degrees.
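The "clean up at the beginning, not the end" rule is the one people get backwards, so here's a small sketch of it. To stay runnable on a laptop it uses a plain directory as the persistent state instead of a real DB or DNS server in a VM; that substitution, and every name in `KnownStateExample`, is an assumption for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class KnownStateExample {
    /** Wipes everything under dir (keeping dir itself) so a test starts from a known state. */
    static void resetState(Path dir) {
        try {
            Files.createDirectories(dir);
            try (Stream<Path> paths = Files.walk(dir)) {
                // Reverse order visits children before their parent directories.
                paths.sorted(Comparator.reverseOrder())
                        .filter(p -> !p.equals(dir))
                        .forEach(KnownStateExample::delete);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void delete(Path p) {
        try {
            Files.delete(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical component under test: persists each record as one file. */
    static void save(Path dir, String key, String value) {
        try {
            Files.writeString(dir.resolve(key), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long recordCount(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One functional test: reset first, then exercise the single component. */
    static long saveOneRecordFromCleanState(Path dir) {
        resetState(dir);
        save(dir, "a", "1");
        return recordCount(dir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("functional-test");
        // Debris from an earlier run that crashed before it could clean up after itself.
        save(dir, "leftover", "stale");
        System.out.println("records after test: " + saveOneRecordFromCleanState(dir));
    }
}
```

If the cleanup ran at the end instead, that leftover record from the crashed run would still be sitting there and the count would be off.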

OMG, all my inter-team dependencies just got easier!

If you're following along, then you'll notice something that you just got for free here. Scrum teams are all the rage these days in large development organizations. If you follow these nice boundaries that your functional tests are helping to guarantee, separate scrum teams can actually work on separate components that will be more likely to work together in the end. It allows those inter-dependent teams to pull in each other's tests early and use them to help with their integration tests. Great for helping deliver code faster in larger organizations. This will make your manager really happy, because they get to focus on all that HR stuff instead of technical problems (which is what they love, right?). And happy managers => promotions => more money => happy and lazy software engineer.

Ok, but I still don't know if my whole system works

Yeah, this is getting repetitive. Integration tests are similar to functional, but this is where you bring things together to make sure they all work. Essentially this just makes sure that when you put your DB in place and your DNS service in place that calling your REST API endpoint actually flows through each component and performs the action you expect. And why is this easy to set up? I know it sounds tedious, but you already have your VMs from your functional tests, reuse them! And again, you get to validate your assumptions about how all of these things will interoperate and what connectivity you need between them before production. Again cool, right?

Rules are the same as for functional tests, but for this you are writing a very minimal set of tests. This is the only place where you do end-to-end tests, in the plane this is akin to the full system tests performed before the flight tests.

Do we get to fly yet?

Ok, yes, you can fly now. Smoke tests: these validate that your system, after being deployed in production, is working as expected. I used to think that it's enough to run these post deployment only, but then I realized something: there are all sorts of monitoring pieces that need to run constantly to make sure your application is actually working. Use your smoke tests for your monitoring needs! What's the difference between these and integration tests, you ask? Not much. The only rule here is that your smoke tests should not damage your production systems, so design your system and the tests so that they don't take down your production instances over time. Oh, and guess what, you can virtualize your system just like you did for your integration tests to test these locally as well. Virtualization is an excellent way of validating your stuff before ever hitting a real deployment, and you can do it on your laptop while sitting on the beach, disconnected from any network. If you're not using VMs to test your deployment steps, then you're just making it harder on yourself to verify this, since you'll need to share time on actual full test beds or production.
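One way to get the "smoke tests double as monitoring" trick is to keep the checks as a plain list of named, read-only probes that both the deploy step and the monitor can run. This is only a sketch under that assumption; `SmokeChecks` is a made-up class, and the two lambdas stand in for real probes such as a GET against a health endpoint or a lookup of a known DNS record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

final class SmokeChecks {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    /** Registers a probe. Probes must only read, never write, so they are safe in production. */
    SmokeChecks add(String name, BooleanSupplier readOnlyProbe) {
        checks.put(name, readOnlyProbe);
        return this;
    }

    /** Runs every probe and returns the names that failed; a throwing probe counts as failed. */
    List<String> failures() {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        SmokeChecks checks = new SmokeChecks()
                .add("api-answers", () -> true)     // stand-in for a read-only API call
                .add("dns-resolves", () -> false);  // stand-in for a lookup that is failing
        // Run once after a deploy, or on a timer as a monitor: same checks either way.
        System.out.println("failing checks: " + checks.failures());
    }
}
```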

But I'm a lazy bastard, remember?

So here's the key to getting to be lazy. What's harder, replacing the ribbing on the plane after it's built, or when you actually choose that material initially? It's the same here: if you have to track down crazy bugs because you never figured out if your stuff actually does the right thing at the most basic levels, debugging in production is much harder. You also didn't give yourself an easy place to replicate production issues to verify a fix before releasing. We've all been there, logging into production systems at 3am trying to figure out what's wrong with the damn system, definitely not lazy. Compound that with an enterprise system where you as a developer are not even allowed to touch the production system, so you have to ask someone with production access to check stuff for you, all over the phone (or similar remote technology, because you were happily sleeping at home when this random problem just caused the entire site to crash and of course it was on your on-call night and you'd much rather get back to that dream and sleeping in, and oh my god now I have kids and they'll be up in less than two hours because debugging a system through someone else has already wasted an hour and you've only just gotten the heapdump, because it had to be uploaded to a safe directory so that you could download it, because you're not allowed to log into production! This is definitely not lazy). Remember, it gets more difficult to debug and fix issues (and thus harder to be lazy) the higher up the pyramid you go. So the moral? Don't be lazy at the beginning and skip your testing, be lazy at the end when it all just works. And yes, even after all of this, eventually there will be a battery that ends up catching on fire for (initially) no obvious reason. At least it wasn't the engine blowing up mid-flight...

In the third part of this series I'll get into development's relationship with operations, yes, DevOps.


Software engineers are lazy bastards (pt. 1)

I decided to make this a three part piece. This one is about components and modularization, the next about testing, and then I will have a final one on development and operations.

Not too long ago, 2000 let's say, it was common for developers of software to be so lazy that they didn't even know if their software worked. They wouldn't write unit tests, they wouldn't even bother testing their code. They would throw it over the fence to quality assurance engineers and expect them to say, "Oh my god this is the greatest code ever, and it works perfectly". With the exception of the most simple programs, this has never actually been the case; hell, even Grace Murray Hopper (go Brewers!) couldn't account for literal bugs in the system. This has often caused great angst among real engineers out there, and even today people in the Software industry continue to say that Software engineering is not "Real Engineering". This is too easy of a copout and lets people off the hook for designing bad software, or using bad techniques.

It's not like we're building Bridges!

This is a classic argument. Bridges are rigid and have to be designed up front to meet the engineering requirements to span what they are being built for. They are static entities that never change, right? Well, that's not totally accurate; even great engineering feats aren't perfect and need to be fixed. Look at the new eastern span of the Bay Bridge: there are a bunch of issues that weren't properly accounted for and now need to be fixed. So, bridges aren't perfect and sometimes need to be fixed after the fact. Sounds a lot like software. To try and match the way that buildings and bridges are engineered, the Waterfall Method was created (I remember this being taught to me in college as the new cool awesomeness for development). It really seemed like the greatest thing ever.

Years later I started working at a large company; when I got there, they hadn't shipped their planned release for over a year. It was stuck in this cycle of Dev -> QA -> Dev -> Product -> Dev -> QA that was never ending. So this was first-hand experience of this awesome technique that I was taught in college, and its abject failure to deal with real world issues. What's the problem? Software has a million moving parts. It's dynamic by nature. This is why the Bridge analogy breaks down so quickly, but it's just the static bridges that don't match this. Not all bridges are made this way; check out the living bridges in India.

What's my point? The issue with Software Engineering isn't that it's not actual engineering, it's that it's Dynamic Engineering. It's constantly adapting and changing to new uses and needs. When you built that website tool it only expected one user at a time, then it blew up and you had like a whopping ten, so you decided it needed to run in parallel. Did you go back and rewrite the entire thing? No, you went back and made it parallel so that all those ten people weren't waiting on each other when they went to your site. Now of course we deal with thousands of connections at a time in an application, and so that simple synchronization you did needs to give way to optimistic locking and atomic reads/writes, with no locking at all, to drastically increase the performance of your application. Again, did you rewrite the entire thing? No, you fixed the weak spots (well, unless you wrote it in Ruby and realized that you needed it to actually run fast).
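For anyone who hasn't made that jump from simple synchronization to optimistic, lock-free updates, here's a minimal sketch using `AtomicLong.compareAndSet` from the JDK. The `HitCounter` class is a made-up example, not code from any system I've described; in practice `incrementAndGet` does this loop for you, the long form is spelled out here to show the read, compute, retry shape.

```java
import java.util.concurrent.atomic.AtomicLong;

final class HitCounter {
    private final AtomicLong hits = new AtomicLong();

    /** Optimistic update: read, compute, and retry only if another thread got there first. */
    long recordHit() {
        while (true) {
            long current = hits.get();
            long next = current + 1;
            if (hits.compareAndSet(current, next)) {
                return next;
            }
            // Someone else changed the value between our read and write: loop and try again.
        }
    }

    long total() {
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 10_000; n++) {
                    counter.recordHit();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // No synchronized block anywhere, and no lost updates: 8 threads x 10,000 hits.
        System.out.println(counter.total()); // prints 80000
    }
}
```

Notice that only the counter changed; the callers didn't have to be rewritten, which is the whole point about fixing the weak spots.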

Along came TDD

I remember when Test Driven Development came around. If you know all your inputs and outputs, you can just write the tests for it and then the code will just flow to match your test cases. The problem with this is it shoves Software Engineering back into this idea that it is static and non-changing. The one case I've found it useful is parsing and finite-state-machines: you know all the inputs and expected outputs, so it really does help you validate and write your code faster (let me know if you know of other really good fits). In most cases though, you actually don't know what your inputs are until you design the interface and then try to write the code behind it, only to discover that the interface you designed doesn't allow for nice code to be written after that. In fact, I have found TDD to actually be about as slow as Waterfall in terms of delivery. Ok, so TDD doesn't work, but it did pass on something really important: Test Oriented Development. This term is rising in popularity, and what it means is that the code you write is easily tested through the use of Mocks and Unit tests. So why is this an important development in the field? It shows that there is a discipline to coding, and a technique which allows for software to be verified to be correct within whatever bounds you've declared. Ok, now we're getting closer to becoming engineers.

Dude, where's my Bridge?

In the American Civil War (War of Northern Aggression for you Southerners) the Confederate army burned many railroad bridges to try and slow the advance of the Union armies. Trestle bridge construction was used to build bridges quickly and reliably in order to get the Union trains moving. So this is where we come to software as components in a larger system. Object-Oriented design, code reuse; I'll avoid Microservices for right now, maybe a future post. The point is that Components, like the trestle, allow for systems (bridges) to be built faster. The nice thing about this is that, like an individual trestle, any component that fails or needs to be changed can be replaced without rewriting the entire system. Components must have strong interfaces, so that it's always known which component is in use in the software (one reason why I've been attracted to OSGi in the past). So we can build bridges with individual components because each one can be verified individually before inserting it into the whole. Unit tests that cover the expected inputs and outputs of your components mean that you now have an easy method of verifying a rewrite of that component, and any future rewrites after that.
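To sketch what "verify a rewrite of that component" looks like in code: one interface, two implementations, and a single contract check that both have to pass. The `Tokenizer` example and all its names are hypothetical, picked only because the behavior is small enough to read in one sitting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ComponentContract {
    /** The strong interface: the rest of the system only ever sees this. */
    interface Tokenizer {
        List<String> tokens(String line);
    }

    /** First implementation: quick to write, leans on regex splitting. */
    static final class SplitTokenizer implements Tokenizer {
        @Override
        public List<String> tokens(String line) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                return new ArrayList<>();
            }
            return Arrays.asList(trimmed.split("\\s+"));
        }
    }

    /** The rewrite: a hand-rolled scanner that must behave identically. */
    static final class ScanningTokenizer implements Tokenizer {
        @Override
        public List<String> tokens(String line) {
            List<String> out = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (char c : line.toCharArray()) {
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) {
                        out.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append(c);
                }
            }
            if (current.length() > 0) {
                out.add(current.toString());
            }
            return out;
        }
    }

    /** The contract: expected inputs and outputs, independent of any implementation. */
    static boolean honorsContract(Tokenizer tokenizer) {
        return tokenizer.tokens("a  b c").equals(List.of("a", "b", "c"))
                && tokenizer.tokens("   ").isEmpty()
                && tokenizer.tokens("").isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("split:    " + honorsContract(new SplitTokenizer()));
        System.out.println("scanning: " + honorsContract(new ScanningTokenizer()));
    }
}
```

Swap one trestle for another, run the same checks, and the bridge doesn't care.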

So am I an engineer?

I build things, I follow strong rules about how I build these things, and I verify that the things I build are correct and fully functional. The entire point of writing this is that I find it annoying that people keep saying that we software engineers are not engineers, that at best we're designers or at worst hacks. This is also used as an excuse by lazy developers to not put proper design and controls into the software they build. That excuse doesn't hold: it's possible to build software both quickly and with regard to future changes when required. Agility is key, and the organization needs to be amenable to Agile development methodologies, with constant iteration to home in on the ideal system. I follow these rules when building software:
  • Use Components to create strong Separation of Concerns and boundaries
  • Unit and Functional tests to verify those Components
  • All data is Immutable by default for ease of threading
  • Any system operation or high level operation should be idempotent
What these allow me to do is follow fast iterative design. Create a quick top level design, use that to guide the development of each component down the stack. When guiding large architectures across teams, this is essential. Get initial code out as quickly as possible such that the real world stresses can be discovered and fixed as issues come up, and because you have test harnesses and good boundaries around your component it's easy (ish) to replace. I am a Software Engineer, I follow strong engineering practices around what I do, do you?
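The last two rules in that list are the ones that sound abstract, so here's a small sketch of both: an immutable value (all fields final, "changes" return a new object) and an idempotent operation (calling it once or five times leaves the same state). The `Account` and `Registry` classes are invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

final class IdempotentExample {
    /** Immutable value: safe to share between threads without any locking. */
    static final class Account {
        final String id;
        final boolean active;

        Account(String id, boolean active) {
            this.id = id;
            this.active = active;
        }

        /** Returns an activated copy instead of mutating this instance. */
        Account activated() {
            return active ? this : new Account(id, true);
        }
    }

    static final class Registry {
        private final ConcurrentHashMap<String, Account> accounts = new ConcurrentHashMap<>();

        /** Idempotent: creates the account if missing, activates it if needed, otherwise no-op. */
        Account ensureActive(String id) {
            return accounts.merge(id, new Account(id, true),
                    (existing, fresh) -> existing.activated());
        }

        int size() {
            return accounts.size();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.ensureActive("alice");
        registry.ensureActive("alice"); // a retry after a timeout, say
        System.out.println("accounts: " + registry.size()); // prints accounts: 1
    }
}
```

Idempotent operations are what let an operator (or a retry loop) run the same command again without first working out whether the last attempt half-succeeded.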

What I've left to the reader: proper test design and methodology (check out TestNG for Java, and Mockito), and good Agile development methodologies (pick one). Do your own research.

I decided to make this a three part piece. The next one deals with testing, if you're interested in reading it. The final one is about DevOps.