The Twelve-Factor App



  • The contributors to this document have been directly involved in the development and deployment of hundreds of apps, and indirectly witnessed the development, operation, and scaling of hundreds of thousands of apps via our work on the Heroku platform.

    This document synthesizes all of our experience and observations on a wide variety of software-as-a-service apps in the wild. It is a triangulation on ideal practices for app development, paying particular attention to the dynamics of the organic growth of an app over time, the dynamics of collaboration between developers working on the app’s codebase, and avoiding the cost of software erosion.

    So basically, the author of the Heroku platform wrote a little document describing what he thinks are the best practices for developing distributed, service-oriented apps.

    Give it a read, it's clearly written and not too long.

    IMO this is, on average, a pretty solid guide to architecting modern service-oriented apps. I figured most of this shit out through my own mistakes. Where I made different choices, I mostly wish I'd read this beforehand and followed their advice.

    Of course, I don't agree with everything.

    ###My biggest beef:

    The twelve-factor app stores config in environment variables (often shortened to env vars or env). Env vars are easy to change between deploys without changing any code; unlike config files, there is little chance of them being checked into the code repo accidentally; and unlike custom config files, or other config mechanisms such as Java System Properties, they are a language- and OS-agnostic standard.

    IMO env is completely unsuited for storing any kind of sophisticated config I've found my apps needing. Race conditions around how these values get overwritten can really ruin your day. Not to mention you need to set up your own custom parsing if you want to store anything more complicated than strings and numbers.

    I'm personally using a combination of unversioned JSON files and an SQL database. In the past, I also played with XML, code files, Redis, env, and command line args. All have their faults, and env is definitely no exception.
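
    To give a concrete flavour of what I mean by "more complicated than strings and numbers", here's a minimal sketch (names made up, using Json.NET just as an example deserializer) of pulling a typed config object out of an unversioned JSON file, which is the kind of thing env can't do without hand-rolled parsing:

    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json;

    // Hypothetical config shape: nested values and lists are painful to squeeze into env vars,
    // but trivial in a JSON file that never gets committed.
    public class AppConfig
    {
        public string DatabaseUrl { get; set; }
        public int WorkerCount { get; set; }
        public List<string> AllowedOrigins { get; set; }
    }

    public static class ConfigLoader
    {
        public static AppConfig Load(string path = "config.local.json") =>
            JsonConvert.DeserializeObject<AppConfig>(File.ReadAllText(path));
    }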

    ###My second biggest beef:

    A twelve-factor app never concerns itself with routing or storage of its output stream. It should not attempt to write to or manage logfiles. Instead, each running process writes its event stream, unbuffered, to stdout. During local development, the developer will view this stream in the foreground of their terminal to observe the app’s behavior.

    OK, but that means you can't output any information richer than plain text. You can't separately write tracing info to a file and just short messages to the console. You can't dump a bunch of variables every time you call a function and store those separately. You can't pretty-print stuff one way for the console and another way for the text file.

    I can see the appeal of having just one output stream, but IMO you lose too much flexibility in that trade-off.
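
    For the record, the kind of split I'm talking about is nothing fancy; a rough sketch (made-up class, nothing standard):

    using System;
    using System.IO;

    // Sketch: terse messages go to the console, full trace detail goes to a file,
    // instead of everything sharing a single stdout stream.
    class SplitLogger : IDisposable
    {
        private readonly StreamWriter _trace = new StreamWriter("trace.log", append: true);

        public void Info(string message)
        {
            Console.WriteLine(message);                               // short, for the console
            _trace.WriteLine($"INFO  {DateTime.UtcNow:o} {message}"); // full detail, for the file
        }

        public void Trace(string detail) =>
            _trace.WriteLine($"TRACE {DateTime.UtcNow:o} {detail}");  // rich dumps, file only

        public void Dispose() => _trace.Dispose();
    }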

    ###The missing piece

    They are careful to point out that you must be able to easily move between releases and revert to a previous release in case of problems.

    But I don't see any mention of the database. The hardest part with this setup is when an update includes a migration of the database schema. In my experience, if your update breaks, you'd better code fast to fix the problem, because there's usually no going back.

    I was hoping they'd have some smart advice on that front, but there's nothing.

    ###Stuff I agree with

    • Full setup in the dev environment, no "shared DB server" if you can help it
    • Static linking / bundling everything you can
    • No rails-like "environments", just granular configs
    • "Resources" paradigm
    • Deploy process in 3 stages. I don't use this, but I like the idea.
    • Stateless self-contained processes (e.g. no Apache)
    • Use reverse-proxy to expose services
    • Match dev environment closely to production


  • @cartman82 said:

    Stateless self-contained processes

    Wouldn't that be less efficient than a process that handles multiple requests and keeps some cache between them?



  • @ben_lubar said:

    Wouldn't that be less efficient than a process that handles multiple requests and keeps some cache between them?

    Yes. It's the trade off between performance and scalability. They are almost always opposed to each other.



  • You could also run multiple copies of my process on different machines, so it's not so much of a trade-off as it is giving up performance for nothing.



  • @cartman82 said:

    The twelve-factor app stores config in environment variables (often shortened to env vars or env).

    That's just horrible. All you get is a very unsophisticated and global KVP string store, you need to fuck around with prefixes if you want to avoid conflicts, and having two such apps running on the same machine with different configs is out of the question.

    @cartman82 said:

    A twelve-factor app never concerns itself with routing or storage of its output stream.

    Also horrible. That means zero logging granularity unless you set up a heavily complicated system to parse the log levels and split them out to different targets. I could see it if there were a standardized way for the application to output logging metadata (such as "this is a trace message, and it logs method entry") so that you could have a tool that basically shifts the logger configuration file out of the application, but I haven't really ever seen such a thing.

    @cartman82 said:

    The hardest part with this setup is when an update includes a migration of the database schema.

    Well, versioning the database just fucks you over in 100 different ways depending on how you approach it; there's pretty much no way to do it right. We've tried pretty much everything, from hand-writing migration files through Visual Studio database projects up to letting Entity Framework pretty much take over the database, but the bottom line is always "shit's gonna break eventually". You can't roll back to the point where the table had a column you've since dropped and have it fill itself back in with the correct data.

    As for the "stuff I agree with" - most of it is pretty much "duh" to everyone who's ever written any code. One thing I take issue with is the whole "each app must be its own webserver" thing - it smells of reinventing the wheel quite a bit.



  • @ben_lubar said:

    You could also run multiple copies of my process on different machines, so it's not so much of a trade-off as it is giving up performance for nothing.

    Depends on what you mean by "a process that handles multiple requests and keeps some cache between them". That kind of thing might well fit in this architecture.

    His point is mostly: don't count on things persisting anywhere besides configured databases and other shared resources.

    @Maciejasjmj said:

    and having two such apps running on the same machine with different configs is out of the question.

    Under their rules, you would have each instance isolated, running with its own user/container/machine, so it's less of a problem in that regard.



  • @Maciejasjmj said:

    As for the "stuff I agree with" - most of it is pretty much "duh" to everyone who's ever written any code.

    I've seen every one of these broken by myself or my colleagues at one point or another.



  • @cartman82 said:

    each running process writes its event stream, unbuffered, to stdout.

    LoERINgRFsOR O m mewiessthss uagnbauffe egered frou frtput foromm omm pultiple a prsyncrohroocnous prceocesessesss as 2r1.
    e fun to.
    debug.



  • You could make the case for having an environment variable which locates the config file the app is going to use. I suppose. That way, you can copy the checked-in config elsewhere and tailor it for your needs.

    For t3k4 I have a commandline argument that specifies where the config file is, and I also made a little wrapper around git pull which does a diff/patch against the checked-in config and attempts to migrate any new changes to the live config. You are then free to manually inspect the new config file and put it in place if it's good to go.
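
    The first half of that is about as simple as it sounds; something like this sketch (the argument name is made up, not t3k4's actual code):

    using System.Linq;

    static class ConfigLocator
    {
        // Take --config=<path> from the command line, otherwise fall back to the
        // checked-in default that ships next to the binary.
        public static string Resolve(string[] args)
        {
            var explicitPath = args.FirstOrDefault(a => a.StartsWith("--config="));
            return explicitPath != null
                ? explicitPath.Substring("--config=".Length)
                : "config.default.json";
        }
    }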



  • @tar said:

    For t3k4 I have a commandline argument that specifies where the config file is, and I also made a little wrapper around git pull which does a diff/patch against the checked-in config and attempts to migrate any new changes to the live config.

    Yes, a CLI argument is more transparent than magical hidden environment variables.

    Also, I like the idea with the diff; it never occurred to me that I could do that. I'm closer to the mindset of writing a little script to update the config in production when needed.


  • Java Dev

    @HardwareGeek said:

    LoERINgRFsOR O m mewiessthss uagnbauffe egered frou frtput foromm omm pultiple a prsyncrohroocnous prceocesessesss as 2r1.
    e fun to.
    debug.

    I recently moved some high-detail logging back to stdout. Log writer daemon couldn't handle the load… that thing needs a better solution (among other things, not including DB work in the main thread), but there's always piles of such things to do.


  • Discourse touched me in a no-no place

    @PleegWat said:

    I recently moved some high-detail logging back to stdout. Log writer daemon couldn't handle the load

    You don't want to keep high-detail logging switched on for long periods of time in the first place. But that aside, the logging configuration should have the option to log to stdout; what you shouldn't do is hard-code the dumping of all that to stdout, precisely because then you just get everything vomited out instead of just the interesting bits.



  • @Maciejasjmj said:

    having two such apps running on the same machine with different configs is out of the question

    Not really. Every process gets its own instance of the environment, so at worst you'd need to set up a little launch wrapper for each app that sets its env up properly from something sane (like a config file) before launch.

    Personally I can't see what's wrong with config files, as long as the apps come with some reasonable way to specify explicitly where they need to look for config if not using the default.
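
    The wrapper really can be tiny; a sketch of the idea (file name and KEY=VALUE format are assumed, not anything standard):

    using System.Diagnostics;
    using System.IO;

    // Hypothetical launcher: read app.conf and hand its values to the child process
    // through its environment, so the app itself can stay purely env-based.
    class LaunchWrapper
    {
        static void Main()
        {
            var psi = new ProcessStartInfo("myapp.exe") { UseShellExecute = false };
            foreach (var line in File.ReadAllLines("app.conf"))
            {
                var parts = line.Split(new[] { '=' }, 2);
                if (parts.Length == 2)
                    psi.EnvironmentVariables[parts[0].Trim()] = parts[1].Trim();
            }
            Process.Start(psi)?.WaitForExit();
        }
    }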



  • I'm kind of torn on this point. On the one hand, environment variables are kind of clunky, and maybe especially so on Windows. On the other hand, every little thing having a config file in its own unique snowflake format (ini, yaml, toml, JSON, xml, plist, ...) has disadvantages too.


  • BINNED

    @calmh said:

    ini, yaml, toml, JSON, xml, plist, ...

    Are all well-defined, AFAIK (some of them I've never messed with, so I don't want to claim it 100%).

    What's special about them? There's hundreds of parsers for those and they are so easy to load and work with it's not even a concern. The only thing "special" about any config in one of those formats is that every application stores its own set of values relevant to that application and that application alone. Which is kinda the point, wouldn't you say?



  • @flabdablet said:

    that sets its env up properly from something sane (like a config file)

    Then what's the point of having env-based config when you need to parse a config file anyway? This just doesn't make sense to me. I see zero advantages of that approach, especially when you're running in containers and each app needs to worry about its own environment anyway.



  • @Onyx said:

    What's special about them?

    That the user needs to know a bunch of them. But yeah, like I said, pros and cons.



  • @Maciejasjmj said:

    Then what's the point of having env-based config when you need to parse a config file anyway?

    Not much. As I said, I can't see what's wrong with config files.

    I guess with env-based config, by taking the config file parser out of the app and handballing it to a launcher, you're separating the config-file-parsing concern from the whatever-it-is-the-app-is-supposed-to-do concern. That might be useful if you've got a suite of cooperating apps that need to share config in order to interwork properly; you could have file-based suite-level config for the sake of DRY, and your launchers would parse that and hand your apps the pieces they need via env. But that's a bit of a stretch.


  • Java Dev

    @dkf said:

    the logging configuration should have the option to log to stdout

    In a common case of NIH¹ I don't actually have any logging configuration, and am stuck with sensible defaults. Those used to include "To the logfile, except when running from a terminal, then to stdout". I recently changed that to "To the logfile, and when running on a terminal also to stdout" which allows me to identify many key things done from the command line after the fact.

    The problematic high-detail trace logs come out at hundreds of thousands of lines per second, which a >tmp.log can handle but the daemon cannot, and that slows everything down because all the logging calls are blocking.

    ¹ Why use a library if you can write your own in a couple hundred lines.



  • Maybe that's better than what I've got, where I have no idea what to trace and what not to trace, so I inevitably end up with tens of thousands of messages per second making the trace level unusable, and then go around fixing it.

    (pro tip - surrounding your single-property validation procedure with a disposable entry/exit logger is a rather terrible idea when you're importing several thousand entries...)



  • @Maciejasjmj said:

    Maybe that's better than what I've got, where I have no idea what to trace and what not to trace, so I inevitably end up with tens of thousands of messages per second making the trace level unusable, and then go around fixing it.

    Very useful pattern I discovered for this:

    if (dbg.trace) dbg.trace(thisFunctionToString(this))
    

    During debugging, this is on, so you get trace messages. In production, the switch is off, so the call is short-circuited, and there is no performance penalty.

    My hair stands up when I see a pattern like this:

    dbg.trace(thisFunctionToString(), complicatedMetadataGeneratingCall());
    


  • @cartman82 said:

    During debugging, this is on, so you get trace messages. In production, the switch is off, so the call is short-circuited, and there is no performance penalty.

    Well, obviously. NLog, which I use, simply uses a lambda, so it's like 4 characters more.
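
    For reference, the lambda version looks roughly like this (class names made up, and the exact overload is from memory, so don't hold me to it):

    using NLog;

    class Importer
    {
        private static readonly Logger _log = LogManager.GetCurrentClassLogger();

        public void Process(object item)
        {
            // The lambda only runs if Trace is actually enabled for this logger,
            // so the expensive message building is skipped otherwise.
            _log.Trace(() => $"Processing {item}, full state: {DumpState()}");
        }

        private string DumpState() => "...";  // stand-in for an expensive call
    }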

    My logging is still a bit WTFy - it seems like there's no way to fully log entry/exit of a function with arguments and a result and avoid the boilerplate of having a using block like so:

    int someFunc(string arg1, DateTime arg2)
    {
        int result;
        // the context logs entry (with the arguments) here, and exit (with the captured result) when disposed
        using (var log = LoggingContextFactory.GetLoggingContext((x, y) => someFunc(x, y), arg1, arg2, () => result))
        {
            // your code here; assign result before leaving the block
            result = 0;
        }
        return result;
    }
    
    

    without a magic AOP framework (which we can't use).

    Also, even if your call costs zero when not tracing, it still costs a DB/file hit when tracing - so if you go overboard, you can't trace at all, because every click around the application spews out a million log messages.


  • ♿ (Parody)

    @calmh said:

    That the user needs to know a bunch of them. But yeah, like I said, pros and cons.

    I have yet to write a Configuration Files Done Right soapbox, but I've written it up enough times and done talks on it, etc.

    The key idea is that configuration needs to be changeable by someone else without application and coding expertise (think: operations peeps); if they need to look at your code to figure out what it's doing, then you're Doing It Wrong.

    There are 3 simple rules:

    1. Environment-specific values only; no soft coding stuff
    2. Stay as close to the application executable as possible
    3. Use simple key value pairs

    Clever developers always fuck up #1 by "future-proofing" it. Turns out, it's just software, and you can just change it again if you need to make Border Color, or whatever other bullshit your crystal ball tells you the business will want, different in production.

    Environment variables are perhaps the largest violation of #2 I can think of; it's sort of like having a "wanna-be DHCP server" that somehow tells applications what their configuration is. The best is application.exe.settings: guess which file you have to edit to change the settings?

    Clever developers fuck up #3, because key value pairs never seem like they'd be enough. They are. If you ABSOLUTELY NEED A LIST, then either allow duplicate key values or use a CSV or something. If you need something more complex, then use a goddamn database.

    And yes, I realize how much of a fuckup .NET's .config files are; just use appSettings, and externalize it. Also, "code-based" configuration files are silly, because you need to know the language and its conventions for when things go in quotes and when they don't:

    my_connection_port = 80;
    my_connection_host = "dev.hdars.local";
    my_connection_enabled = false;
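
    By contrast, the boring key=value version needs nothing clever; a rough sketch of what #2 and #3 look like in practice (names made up):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Reflection;

    static class AppSettings
    {
        // Loads a flat key=value file sitting right next to the executable,
        // e.g. application.exe.settings. No schema, no parser library, no quotes.
        public static Dictionary<string, string> Load()
        {
            var path = Assembly.GetEntryAssembly().Location + ".settings";
            var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (var raw in File.ReadAllLines(path))
            {
                var line = raw.Trim();
                if (line.Length == 0 || line.StartsWith("#")) continue;  // skip blanks and comments
                var parts = line.Split(new[] { '=' }, 2);
                if (parts.Length == 2) settings[parts[0].Trim()] = parts[1].Trim();
            }
            return settings;
        }
    }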

  • ♿ (Parody)

    @Maciejasjmj said:

    My logging is still a bit WTFy - it seems like there's no way to fully log entry/exit to the function with arguments and a result and avoid the boilerplate of having an using block like so:

    Is that actually helpful? Thus far the only thing I've found that works is to have descriptive messages before critical points and good exceptions; it should be obvious from a stack trace how to fix the error?

    Here's what we do...it's .NET / C# specific...

    interface ILogger { void Log(MessageLevel level, string message); event Action<MessageLevel, string> MessageLogged; } // delegate type for the event picked for illustration
    enum MessageLevel { Debug, Info, Warn, Error }
    

    We then have extension methods for LogDebug, LogWarn, etc. We could put conditional compilation attributes on the LogDebug extension method, which would cause it not to even be invoked unless it's a debug build.
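
    In code, the extension methods are roughly this shape (a sketch, not our exact code):

    using System.Diagnostics;

    static class LoggerExtensions
    {
        // With the Conditional attribute, calls to LogDebug (and their arguments)
        // are dropped by the compiler entirely unless DEBUG is defined.
        [Conditional("DEBUG")]
        public static void LogDebug(this ILogger logger, string message) =>
            logger.Log(MessageLevel.Debug, message);

        public static void LogWarn(this ILogger logger, string message) =>
            logger.Log(MessageLevel.Warn, message);

        public static void LogError(this ILogger logger, string message) =>
            logger.Log(MessageLevel.Error, message);
    }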

    You can pass an ILogger to a static function as an argument, or implement it in classes so that your function code only has to do this.LogX($""). Wiring it up to a logging framework is fairly easy, and you could even configure that differently per environment; we also have ILogger scopes, which make it easy to sort through lots of simultaneous things.


  • Discourse touched me in a no-no place

    @apapadimoulis said:

    Is that actually helpful?

    At anything other than the finest trace level, probably not. You end up with a truly stupid amount of detail, which hides the problem that you're looking for. (BTDT…)

    When I'm logging things with Java, I give each class its own logger (so I can control them individually; being able to switch a single class's logging to debug while keeping the rest at info or warn is a nice feature) and I only put in logging where there's something relevant to say. An exception handler that isn't going to rethrow is usually one of the places where there's something to say (I only skip it where the catch is very close to the source and I know exactly what's going on, such as with some types of I/O).

    In normal production operation, no more than one log message should be produced per user action on average. Any more and you'll probably get snowed under. It's an average because operations that cause permanent changes will probably trigger more log messages than a read. (The exception is if you're keeping an audit log; that's a case where you necessarily log more.)


  • Discourse touched me in a no-no place

    @apapadimoulis said:

    There are 3 simple rules:

    Good rules. I've found that it is good to supply a sample settings file with the application that describes how you'd configure all the default values (always pick sensible defaults if you can). If the configuration format supports comments, make sure it is excruciatingly well documented because the next person to touch it won't have read the installation and configuration documentation.

    A small minority will fuck things up even then. There is literally nothing you can do about the determinedly stupid. Hopefully they'll run into their problems on a public forum, with other users of your software calling them idiots for blundering like that: hopefully, because it means you don't have to be mean to them yourself, and so can keep a squeaky-clean reputation. 😃


  • I survived the hour long Uno hand

    also: document everything, even things you think are blindingly obvious. Someone not familiar with your codebase might not think they're obvious.

    Case in point: I once had a configuration issue with a web server because I didn't realize the following:

    • "host" defaults to "localhost"
    • "address" defaults to "host" only if "host" is set explicitly
    • Therefore, setting "host" to "localhost" will result in the server not listening to any outside connection, while not setting "host" will result in a server which reports its host as "localhost" but which DOES allow outside connections

  • Discourse touched me in a no-no place

    @Yamikuronue said:

    also: document everything, even things you think are blindingly obvious.

    Documentation needs QA just as much as code does.


  • BINNED

    @apapadimoulis said:

    Clever developers fuck up #3, because key value pairs never seem like they'd be enough. They are. If you ABSOLUTELY NEED A LIST, then either allow duplicate key values or use a CSV or something. If you need something more complex, then use a goddamn database.

    I find that an INI-ish format with CSV where needed works well enough. If that's not enough granularity, JSON is well defined and there's a good parser lib available for pretty much any language worth using.


  • Discourse touched me in a no-no place

    @apapadimoulis said:

    They are. If you ABSOLUTELY NEED A LIST, then either allow duplicate key values or use a CSV or something.

    I've seen people do this:

    some.property.1 = abc
    some.property.2 = def
    some.property.3 = ghi
    

    It grates on my sense of neatness horribly, but it works in practice and it's easy to parse.
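
    To be fair to it, reassembling the list really is a one-liner-ish job; sketch:

    using System.Collections.Generic;
    using System.Linq;

    static class NumberedKeys
    {
        // Collect some.property.1, some.property.2, ... back into an ordered list.
        public static List<string> Collect(IDictionary<string, string> props, string prefix)
        {
            return props
                .Where(kv => kv.Key.StartsWith(prefix + "."))
                .OrderBy(kv => int.Parse(kv.Key.Substring(prefix.Length + 1)))
                .Select(kv => kv.Value)
                .ToList();
        }
    }

    Calling Collect(props, "some.property") on the example above would give abc, def, ghi in order.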



  • @dkf said:

    I've seen people do this:

    some.property.1 = abc
    some.property.2 = def
    some.property.3 = ghi

    That's error prone and difficult to maintain.

    IMO, the moment you find yourself writing some custom parsing code for your config files, stop and fall back to a well known rich format, like XML or JSON.


  • Discourse touched me in a no-no place

    @cartman82 said:

    That's error prone and difficult to maintain.

    Did I say I liked it?



  • @apapadimoulis said:

    Is that actually helpful?

    In production, it hurts way too much, but I find it nice to have detailed logs with UATs. Stacktraces give you a "where" (kinda), but not really the "why" that arguments provide.

    The boilerplate is really fucking annoying, though. So while I'm going with it, I'm really not sure about that, and you probably shouldn't take my advice...

    (if you do want entry/exit tracking with arguments, though, implementing IDisposable is actually the best way to do it that I know of.)

    @apapadimoulis said:

    Here's what we do...it's .NET / C# specific...

    Hmm... I don't like your solution, though. All the logging frameworks I know of provide both an ILogger interface (although NLog's is ridiculously obtuse - it's like 80 methods) and a matching implementation, so why roll your own? If all you want is to log messages at a specific level at a specific point, you don't need extensions to the logging framework at all - just plop in a private static ILogger _log = LogManager.GetCurrentClassLogger() (or inject it if you're feeling fancy), and that's it.

    @apapadimoulis said:

    We could put conditional compilation attributes on the LogDebug extension method, which would cause it not to even be invoked unless it's a debug build.

    You can just pass a Func<string> closure to the logging method and evaluate it conditionally. You get a negligible hit from the method invocation and the log level check, but you can granularize your logging better.
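
    Something like this, sketched against the ILogger above (assuming it grows a bool IsEnabled(MessageLevel) member - that part is my addition, it's not in the interface as posted):

    using System;

    static class DeferredLoggerExtensions
    {
        // The message is only built if the level is actually enabled; the call still happens
        // in release builds, but it costs one method call and one level check.
        public static void LogDebug(this ILogger logger, Func<string> messageFactory)
        {
            if (logger.IsEnabled(MessageLevel.Debug))
                logger.Log(MessageLevel.Debug, messageFactory());
        }
    }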

    (oh, and being able to pass an exception to the logger for serialization is pretty much a must).


  • ♿ (Parody)

    I haven't given Logging Done Right as much thought, but I'm starting to think that rule #1 will be "consider why you want logs" and maybe #2 is "every application can be different, and that's ok".

    @Maciejasjmj said:

    I find it nice to have detailed logs with UATs

    Hmm, yeah so for us, a product company, that doesn't make sense. I guess our products actually have a "logging feature", I suppose it's a bit like Windows Event Log or something? This is something in-house apps would not / should not have.

    Also, we basically look to customers to discover/provide reproduction steps (which is not any different from regular in-house work, I suppose), but for an in-house app you might do that in a testing environment instead.

    @Maciejasjmj said:

    why roll your own

    I suppose it goes back to the "specific needs of logging" thing. I guess here, we can switch frameworks (or actually use one, I should say), and our SDK won't have to take a dependency on the logging system used.

    We also have some weird requirements, with needing logs streamed over agent connections, and then dropped in the database.

    @Maciejasjmj said:

    being able to pass an exception to the logger for serialization is pretty much a must

    Yes, for sure. I see that C# 6 now finally supports exception filtering, but I was really surprised they encouraged using it to log exceptions. That seems like a recipe for wtfery.
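
    The kind of thing I mean is roughly this (a sketch, not their actual sample): the filter runs before the stack unwinds, logs, and then returns false so it never actually handles anything.

    using System;

    static class FilterLogging
    {
        static bool LogAndPass(Exception ex)
        {
            Console.Error.WriteLine($"caught: {ex}");
            return false;  // never handle here, just observe
        }

        static void Demo()
        {
            try
            {
                throw new InvalidOperationException("boom");
            }
            catch (Exception ex) when (LogAndPass(ex))
            {
                // unreachable: the filter always returns false, so the exception keeps propagating
            }
        }
    }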



  • @apapadimoulis said:

    I see that C# 6 now finally supports exception filtering

    .NET 4.5 also has a free, compile-time [CallerMemberName] attribute that's nice to leverage. Sadly, we're stuck on VS2010 and .NET 4.
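
    For anyone who can use it, it's as simple as this sketch (the helper name is made up):

    using System;
    using System.Runtime.CompilerServices;

    static class TraceHelper
    {
        // The compiler fills in the caller's member name at the call site,
        // so callers just write TraceHelper.Enter() and get the method name for free.
        public static void Enter(string detail = "", [CallerMemberName] string caller = "")
        {
            Console.WriteLine($"ENTER {caller}({detail})");
        }
    }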

    @apapadimoulis said:

    "consider why you want logs"

    Yeah, that's kind of a big question, and not a very easy one to answer. Do you want a debugging aid at the cost of performance and/or maintainability? Does "I don't have to ask the testing drones for repro (we kinda do our testing directly with the business), all I need to do is browse the trace" outweigh "half of my codebase is logger management"?

    @apapadimoulis said:

    I guess our products actually have a "logging feature", I suppose it's a bit like Windows Event Log or something?

    Obviously, in production, you don't go beyond what would go into the Event Log (which is "things the application does in very broad strokes" and "errors and problems", plus maybe request traces if you want to catch bad guys). Debugging is a bit different - a detailed stack trace plus the values causing the error help immensely with post-mortems when there's no good repro.

