
I have a dumb question as a non-SWE who is curious about software engineering.

I've heard "feature flags" are popular these days, and I understand that that's where you commit code for a new way of doing things but hide it behind a flag so you don't have to turn it on right away.

Now, if I want to test in prod, couldn't I just make the flag for my new feature turn on if I log in on a special developer test account? And if everything goes well, I change the condition to apply to everyone?
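That pattern can be sketched in a few lines. Everything here (the account set, the function names, the flag name) is illustrative, not from any particular library:

```python
# Minimal sketch of a flag gated by a developer test account.
# DEV_ACCOUNTS, is_enabled, and checkout are hypothetical names.

DEV_ACCOUNTS = {"dev-alice", "dev-bob"}  # special developer test accounts

def is_enabled(flag: str, user_id: str, rollout_to_everyone: bool = False) -> bool:
    """Return True if `flag` should be on for this user."""
    if rollout_to_everyone:          # flipped once testing in prod goes well
        return True
    return user_id in DEV_ACCOUNTS   # otherwise only the test accounts see it

def checkout(user_id: str) -> str:
    # the feature-flag check guarding the new code path
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

"Changing the condition to apply to everyone" is then just flipping `rollout_to_everyone` (or deleting the check entirely).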




Yes.

As long as your code checks that flag everywhere the new feature is used. Otherwise your new feature could "leak" into the system for everyone else.

Plus, as systems grow in complexity, there's always a danger that features step on each other. We'd like to think that everything we write is nicely isolated from the rest of the system, but it never works that way - and we're just a group of squishy humans who make mistakes. There will be times when having Features A and C switched on, with B switched off, produces weird interactions that don't happen when A, B and C are all switched on together.


Feature flags sound great, but a company I’ve been consulting for has been using them to their own detriment. Seems like many bugs are due to a (production!) user not having the right combinations of flags enabled.

There ends up being code to deal with what happens when various combinations of flags are on/off, and that code doesn’t get tested much.

And teams spend a lot of time just removing flags.

This isn’t a safety-critical app - I really think they’d do better dropping the flags, and just deploying what they want when it’s ready.


I'm going to go further and say that Feature Flags are a nightmare and should be avoided. Because instead of just being used to stage roll-out, they get used to configure different environments for different customers.

You not only waste time with "Remove feature flag X" stories once all customers end up with the feature, you also slow down the response to some categories of bugs, because you have to stop and work out which combination of feature flags reproduces a bug.

And if you end up with a feature that isn't popular with anyone except one customer, not only are you now stuck supporting "Legacy feature Y", you're actually stuck supporting "Optional legacy feature Y", which is worse.

Maybe I'm ranting about "misuse of feature flags", but I don't like to pontificate about how things ought to be; I'd rather talk about how, in my experience, they actually are.


Yes, it really depends on the type of environment / app you are running too. If your app is stateful, uses lots of data, etc., then feature flags can cause a lot of issues with inadvertent upgrades that have to be rolled back manually in things like user data.

Or you can end up with near-infinite permutations of feature flags if you don't flip them on for everyone quickly enough, and it becomes hard to test the (a && !b && c && !d) behavior vs. the (a && b && !c && d) behavior, and... you end up with too many combinations to cover well with testing.
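The combinatorial growth is easy to see directly: with n boolean flags there are 2^n distinct on/off permutations. A tiny illustrative sketch:

```python
# Enumerate every on/off permutation of a set of boolean feature flags.
from itertools import product

flags = ["a", "b", "c", "d"]
combos = list(product([False, True], repeat=len(flags)))
print(len(combos))  # 16 permutations for just four flags; ten flags gives 1024
```

Each extra long-lived flag doubles the number of configurations you would need to exercise to be thorough.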


They can be very very very nice if you have a lengthy (or perhaps just unpredictable) build/deploy process. And/or if you have lots of teams working independently on the same monolith.

Suppose you have daily production builds. You are rolling out Feature XYZ. You would like to enable it in prod, but you would like to monitor it closely and may need to turn it off again. Feature flags allow that.

Ultimately what's being achieved is a decoupling of configuration and deployment.

    Maybe I'm ranting about "misuse of feature flags", but
    I don't like to pontificate about how things ought to be;
    I'd rather talk about how, in my experience, they actually are.
Similarly, I might just be making excuses for bad build/deploy processes. =)

At my last job we relied heavily on feature flags via LaunchDarkly. I will admit: it was somewhat of a band-aid for the fact that our build process was way too slow and flaky, and that we had too many teams working on an overstuffed monolith.


I also use feature flags when I'm 100% sure stakeholders or PMs will somehow find fault with a certain feature after it's deployed, even though they're the ones who specified it, approved it and tested it in a staging environment.

Not exactly the thing that we should be using Feature Flags for, but it saved my ass several times.

On the other hand: this removes some of the accountability that non-technical folks have over software. This can be detrimental in the long term.


I have also found that for UIs the best thing to do is have a staged rollout approach.

Internals / Friendly users / Less friendly users / VIPs. The blast radius and intensity of any explosion are smaller in the earlier groups.

The groups themselves need not be fixed. If you have a stakeholder or group that demanded the new feature, they can be in an early wave. They'll often be the ones to find defects in it, so the sooner the better.
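A staged rollout like that can be as simple as an ordered list of waves; a user sees the feature once the rollout pointer has reached their wave. A sketch (wave names and function are hypothetical):

```python
# Hypothetical staged-rollout sketch: each wave widens the audience.
ROLLOUT_WAVES = [
    "internal",         # employees first
    "friendly_users",   # opted-in beta testers
    "general",          # everyone else
    "vip",              # most risk-averse accounts last
]

def wave_enabled(user_wave: str, current_wave: str) -> bool:
    """A user sees the feature once the rollout has reached their wave."""
    return ROLLOUT_WAVES.index(user_wave) <= ROLLOUT_WAVES.index(current_wave)
```

Moving a demanding stakeholder into an earlier wave is then just changing which wave their account is assigned to.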


> they get used to configure different environments for different customers

That is not a feature flag, that is a customer configuration option. They are different things and should not be treated in the same way.

Sure, it is possible for a feature flag to behave like a configuration option but they have different lifecycles and different audiences and so should not be confused. Of course it is easy to say that but harder in practice to maintain those differences.


> Seems like many bugs are due to a (production!) user not having the right combinations of flags enabled.

In my experience, feature flags work best if you aim to remove them as quickly as possible. They can be useful to allow continual deployment, and even for limited beta programs, but if you're using them to enable mature features for the whole customer base, they're no longer feature flags.


We've been using feature flags extensively lately. A step that helps for this issue is having all merged code deploy automatically to our QA environment first. We have automated tests which run there regularly, as well as it being the environment most people use for testing, which increases the likelihood that issues like this will become evident quickly.

Definitely doesn't do anything like completely obviate the issue though.


You've described the ideal use case - a single feature flag, short lived, to let select users test one isolated piece of functionality until it's made generally available. Feature flags used in this way are wonderful.

But there are numerous ways to use feature flags incorrectly - typically once you have multiple long-lived flags that interact with each other, you've lost the thread. You no longer have one single application, you have n_flags ^ 2 applications that all behave in subtly different ways depending on the interaction of the flags.

There's no way around it - you have to test all branches of your code somehow. "Just let the users find the bugs" doesn't work in this case since each user can only test their unique combination of flags. I've regularly seen default and QA tester flag configurations work great, only to have a particular combination fail for customers.

The only solution is setting up a full integration test for every combination of flags. If that sounds tedious (and it is), the solution is to avoid feature flags, not to avoid testing them!
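Exhaustively exercising every combination can be mechanical: generate all permutations and assert an invariant under each one. A toy sketch (the flagged function and flag names are made up for illustration):

```python
# Sketch: run one invariant check under every flag combination.
from itertools import product

def render_price(flags: dict) -> str:
    # Toy feature-flagged code under test (illustrative only).
    price = 100
    if flags["discount"]:
        price -= 10
    label = f"${price}"
    if flags["show_currency_code"]:
        label += " USD"
    return label

def test_all_combinations():
    for discount, code in product([False, True], repeat=2):
        flags = {"discount": discount, "show_currency_code": code}
        out = render_price(flags)
        # invariant that must hold under every combination
        assert out.startswith("$90" if discount else "$100")

test_all_combinations()
```

With two flags this is four cases; the loop makes clear why the cost explodes as flags accumulate.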


> The only solution is setting up a full integration test for every combination of flags.

I've long been wondering whether there are tools that help with that. Something like measuring a test suite's code coverage, but for feature-toggle permutations. Either you test those permutations explicitly or you rule them out explicitly.
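The core of such a tool is small: record each flag combination a test actually exercised, then diff against the universe of possible combinations. A hypothetical sketch, not any existing tool:

```python
# Hypothetical "permutation coverage" tracker: record the flag
# combinations tests exercised and report the ones never covered.
from itertools import product

class PermutationCoverage:
    def __init__(self, flag_names):
        self.flag_names = list(flag_names)
        self.seen = set()

    def record(self, flags: dict) -> None:
        """Call from a test fixture with the flags the test ran under."""
        self.seen.add(tuple(flags[n] for n in self.flag_names))

    def uncovered(self):
        """All boolean combinations that no test has exercised."""
        universe = set(product([False, True], repeat=len(self.flag_names)))
        return universe - self.seen

cov = PermutationCoverage(["a", "b"])
cov.record({"a": True, "b": False})
cov.record({"a": True, "b": True})
print(len(cov.uncovered()))  # 2 combinations never tested
```

"Ruling out" a permutation explicitly would just mean removing it from the universe before diffing.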


Long lived feature flags are totally fine, they're more like operational flags than anything. The Fowler article is pretty good at classifying them. Depending on the type of flag (longevity/dynamism) the design will vary. https://martinfowler.com/articles/feature-toggles.html


An essential property of a feature flag is that it is short-lived, existing only for the duration of the roll-out of the feature. In the language of your linked article, feature flags are 1-to-1 with "release toggles" and not really any other kind of toggle.


2^nflags actually. Which is a much bigger number.


Yes, thank you for the correction! Though the point still stands - keep your nflags <= 2 and you can reasonably test it.


The solution is to remove your feature flags after you are done with them.


The problem is when you use feature flags for customer-bespoke reasons or to enable paid features. Then they’re always there and have to be tested in combinations which sucks.


Yeah, those things are called "user settings". If you need them, you need them, but pretending they are feature flags and trying to port the flags development methods into your settings will lead to nothing but tears.


Echoing sibling comments, feature flags are about managing the deployment of new product capabilities, and should always be short-lived. They're not an appropriate choice for any kind of long-lived capability, like anything that's per-customer, or paid vs. non-paid, or etc. Using feature flags for those kinds of things is a classic design mistake.


They're so closely related and so often confused, but feature flags and tenant features are two completely different things.


Yes, that's the general idea - and it works pretty well.

It can also be a huge PITA. The fallacy is that a "feature" is an isolated chunk of code. You just wrap that in a thing that says "if feature is on, do the code!". But in reality, a single feature often touches numerous different code points, potentially across multiple codebases and services/APIs. So you have to intertwine that feature flag all over the place. Then write tests that test for each scenario (do the right thing when the feature is off, do the right thing then the feature is on). Then you have to remember to go back and clean up all that code when the feature is on for everyone and stabilized.

It's a good tool, but it's not an easy tool like a lot of folks think it is.


In web development there is often a single place you can put a feature flag though.

For example maybe the feature flag just shows/hides a new button on the UI. The rest of the code like the new backend endpoint and the new database column are "live" (not behind any flags) and just invisible to a regular user since they will never hit that code without the button.
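That single-choke-point approach can be sketched in a few lines; the button name and flag name here are hypothetical:

```python
# Sketch: the flag only gates whether the button is rendered.
# The backend endpoint stays live but unreachable without the button.
def render_toolbar(flags: dict) -> str:
    buttons = ["Save", "Share"]
    if flags.get("new_export_button"):   # the one place the flag is checked
        buttons.append("Export")
    return " | ".join(buttons)
```

Because only the entry point is gated, removing the flag later is a one-line cleanup rather than a hunt across codebases.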

As far as "remembering" to clean up the feature flag, teams I've been on have added a ticket for cleaning up the feature flag(s) as part of the project, so this work doesn't get lost in the shuffle. (And also to make visible to Product and other teams that there is some work there to clean up)


This is pretty common at larger scales, and is also often done on a per-tenant or per-account basis.

For example, the Microsoft Azure public cloud has a hierarchy of tenant -> subscription -> resource group -> resource.

It's possible to have feature flags at all four levels, but the most common one I see is rolling deployments where they pick customer subscriptions at random, and deploy to those in batches.

This means you can have a scenario where your tenant (company) is only partially enabled for a feature, with some departments having subscriptions with the feature on, but others don't have it yet.

This can be both good and bad. The blast radius of a bad update is minimised, but the users affected don't care how many other users aren't affected! Similarly, inconsistencies like the one above are frustrating. Even simple things like demonstrating a feature for someone else can result in accidental gaslighting where you swear up and down that they just need to "click here" and they can't find the button...


The training aspect of feature flags is a huge pain point.

Not to mention it looks really awkward when an account manager has forgotten to enable some great new feature for you.


Yes. That’s how we typically do it in our shop. Though we do test it during development. Then when we think it’s ready, we have the product owner (or whoever ordered it) “play around with it” on a test setup. Before we let select users “test it in production”.

I’m not a fan of this article in general; a lot of what it talks about is an anti-pattern in my book. Take the bit about microservices as an example. They are excellent even in small teams of only 2-5 developers. The author isn’t wrong as such, it’s just that the author seems to misunderstand why Conway’s law points toward service architectures: even when you have 2-5 developers, the teams that actually ”own” the various things you build in your organisation might add up to hundreds of people. In that case you’re still going to avoid a lot of complexity by using a service architecture, even if your developers sort of work on the same things.


You’re describing QC. The reason that’s not sufficient is that your test user might not meet the conditions that trigger a bug. Trivial example: a bug that only shows up for users of RTL languages. A test suite lets you cover edge cases like that. Another shortfall of QC is that it doesn’t provide future assurance: a test suite makes sure the feature keeps working when later changes interact with it.


Yes. Also, feature flags don't have to be on/off, they can be set to a % of requests or users, enabling a progressive rollout period.
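A common way to implement a percentage rollout is to hash the user id into a stable bucket, so each user's experience doesn't flip between requests. A sketch (function name is illustrative):

```python
# Sketch of a percentage rollout: hash (flag, user) into a stable
# 0-99 bucket and enable the flag for users below the rollout percent.
import hashlib

def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100   # deterministic per (flag, user)
    return bucket < percent
```

Ramping from 1% to 100% is then just raising `percent`; including the flag name in the hash keeps one user from always landing in the first wave of every rollout.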


Yes, this is a relatively common practice. There’s of course still the chance you make a mistake setting up the feature flag and bring down production/expose the feature to users who shouldn’t have access.


The risk is context dependent. It could be a great idea or it could be the end of the company.

Classic story: https://dougseven.com/2014/04/17/knightmare-a-devops-caution...


Feature flags are just code, like the rest of the software. You can program any behavior into them, including auto-enabling a feature under the appropriate circumstances (e.g., the user is logged in to a developer account). Of course, that particular check doesn't work for features available to users who aren't logged in.


Yes, feature flags can often be applied globally or per customer. However, feature flags add complexity (littering your business logic with flag checks), so many small non-feature changes wouldn’t use them.


I guess you can implement them however works for you and your team. I have personally implemented them in various way: depending on the date, on the client IP address, on an env var, on a logged user id, etc etc.
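Those mechanisms compose naturally into one predicate. A sketch combining several of them; all names, dates, and IP ranges are made up for illustration:

```python
# Sketch: one flag check combining several activation mechanisms.
import os
from datetime import date

def flag_on(user_id: str, client_ip: str, today: date) -> bool:
    return (
        os.environ.get("FEATURE_X") == "1"     # env-var override / kill switch
        or today >= date(2025, 1, 15)          # scheduled launch date
        or client_ip.startswith("10.")         # internal network only
        or user_id in {"qa-1", "qa-2"}         # named test accounts
    )
```

Passing `today` in as an argument (rather than calling `date.today()` inside) keeps the date-based condition testable.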


A note of caution re: flags from an Oracle dev: https://news.ycombinator.com/item?id=18442941




