Whenever someone mentions state I think of Rich Hickey's talk "Simple Made Easy" in which he seems almost terrified of state and strives always for pure functions, because as a human he has a hard limit as to how much state he can keep in his head at once when reasoning about the program. He's right. I've worked on enough bad code, including sometimes my own, to know that programmers are bad at managing state.
Pure functions are useless without inputs and outputs. The missing piece of the puzzle here is that we only want transitions between valid states. One way to define valid states is by enforcing invariants which must always hold. Armed with these, everything that happens in between can be checked against them, much like constraints in a database schema.
Contrast this with non-pure functions, which incrementally mutate from a valid state, through a sequence of invalid states, and end up at a valid state again. If something goes wrong along the way, we end up with an invalid state. Think about program state like it's a journaling file system.
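Here's a minimal sketch of that idea in Kotlin (the `Order` type and its rules are made up for illustration): the invariants are enforced at construction, and a transition produces a new valid value instead of mutating through invalid intermediate states.

```kotlin
// Hypothetical order type: the init block enforces the invariants,
// much like CHECK constraints in a database schema.
data class Order(val items: List<String>, val paid: Boolean, val shipped: Boolean) {
    init {
        require(items.isNotEmpty()) { "an order must contain at least one item" }
        require(!(shipped && !paid)) { "an order cannot ship before it is paid" }
    }
}

// A pure transition: either you get a new valid Order, or construction fails.
// Callers can never observe a half-updated, invalid Order.
fun ship(order: Order): Order = order.copy(shipped = true)

fun main() {
    val order = Order(items = listOf("book"), paid = true, shipped = false)
    val shipped = ship(order)              // new valid state
    println(shipped)
    // ship(order.copy(paid = false))      // would throw: invariant violated
}
```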
That's just developers being lazy! As a user, I want my computer and my software to manage all state for me. This means remembering all data I entered (so I can continue work from where I left off), all files I saved (so I can search for them easily), all actions I've taken (so I can undo them), all sites I browsed (so I can find them again), all info I uploaded (so I can check which website knows what about me), etc.
I also found this to be defeatist (and the basis of the post):
> As a programmer, it’s impossible to predict all the states that your program can end up in.
It's true that we can't predict all the things other programmers will change the program to do in the future, but we can take precautions such as rejecting invalid states, or, if we're being liberal in the input we accept, transforming it into something usable.
Hiding in Jetpack Compose (Google's modern UI framework built for Android) is an incredible MVCC-based state management system that solves several problems at once:
1. Concurrent modifications: Variables wrapped in the State interface are automatically snapshotted, and readers see the most recent snapshot.
2. Observability: State reads are recorded, so a write to that state since the last snapshot invalidates the Composable functions that contain the read, triggering a refresh of the UI.
3. Consistency: Because you're operating on snapshots, you don't need to wrap all mutable state in an immutable sum type; all changes are consistent with each other within a snapshot. So you can fearlessly keep state as a disconnected set of mutable variables and write naïve UI code without the boilerplate that comes with encoding state machines (or state charts).
I understand that React and SwiftUI are similar to an extent, but this feels better because it's actually completely disconnected from Compose-the-UI-framework. I would love the snapshot system to be ported to other environments.
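Roughly, outside of any UI it looks like the sketch below, assuming the androidx.compose.runtime artifact is on the classpath (the variables are made up): writes inside a snapshot are isolated and then published atomically.

```kotlin
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.snapshots.Snapshot

// Two unrelated mutable variables; no wrapping sum type or state chart.
// Needs only the Compose runtime, not a UI.
val firstName = mutableStateOf("Ada")
val lastName = mutableStateOf("Lovelace")

fun main() {
    val snapshot = Snapshot.takeMutableSnapshot()
    try {
        snapshot.enter {
            // Writes inside the snapshot are isolated and mutually consistent.
            firstName.value = "Grace"
            lastName.value = "Hopper"
        }
        // Readers outside still see the old, consistent pair of values here.
        println("${firstName.value} ${lastName.value}")   // Ada Lovelace
        snapshot.apply().check()   // publish both writes atomically, or fail together
    } finally {
        snapshot.dispose()
    }
    println("${firstName.value} ${lastName.value}")       // Grace Hopper
}
```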
I use xstate for all my UI state. I'm convinced that state machines + actors is the best way of modeling and building application state management. Going into any project and having this clear system for modeling any form of application state is extremely liberating. The problem with xstate and state machines is that the patterns are not well known yet. For example, one thing xstate encourages is creating different states for when a request is being made, which is not scalable in my experience. Instead I think it's better to spawn a completely different actor that handles the request and then sends a message back on success or failure.
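As a rough illustration of the spawn-an-actor-per-request idea (not xstate itself; this sketch uses Kotlin coroutines and channels, and all the names are hypothetical):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Messages the parent state machine reacts to (hypothetical names).
sealed interface Msg
data class RequestSucceeded(val body: String) : Msg
data class RequestFailed(val error: Throwable) : Msg

// The "request actor": a short-lived coroutine that owns the request's lifecycle
// and reports back with a single message, instead of the parent growing
// loading/success/failure sub-states.
fun CoroutineScope.spawnRequest(url: String, inbox: Channel<Msg>) = launch {
    runCatching { fakeFetch(url) }          // assumption: stand-in for a real HTTP client
        .onSuccess { inbox.send(RequestSucceeded(it)) }
        .onFailure { inbox.send(RequestFailed(it)) }
}

suspend fun fakeFetch(url: String): String {
    delay(100)                              // pretend network latency
    return "payload from $url"
}

fun main() = runBlocking {
    val inbox = Channel<Msg>()
    spawnRequest("https://example.invalid/api", inbox)
    when (val msg = inbox.receive()) {      // the parent stays in its current state
        is RequestSucceeded -> println("got: ${msg.body}")
        is RequestFailed    -> println("failed: ${msg.error.message}")
    }
}
```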
Programmers are bad at managing state when they forget to use the tools made for managing state. I've had to work on code bases where the state was "implicit" and encoded in a bunch of different fields which "grew" over time, and it's a full-on nightmare. Even a rudimentary state machine which makes both application state and transitions between states explicit feels like a superpower by comparison.
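Even something as rudimentary as this sketch (the states and allowed transitions are hypothetical) makes the state space and its transitions explicit and checkable:

```kotlin
// Explicit states instead of a pile of booleans and nullable fields.
enum class TicketState { OPEN, IN_PROGRESS, RESOLVED, CLOSED }

// Explicit transitions: anything not listed here is impossible by construction.
val transitions: Map<TicketState, Set<TicketState>> = mapOf(
    TicketState.OPEN to setOf(TicketState.IN_PROGRESS, TicketState.CLOSED),
    TicketState.IN_PROGRESS to setOf(TicketState.RESOLVED, TicketState.OPEN),
    TicketState.RESOLVED to setOf(TicketState.CLOSED, TicketState.IN_PROGRESS),
    TicketState.CLOSED to emptySet(),
)

fun transition(from: TicketState, to: TicketState): TicketState {
    require(to in transitions.getValue(from)) { "Can't transition from $from to $to" }
    return to
}

fun main() {
    var state = TicketState.OPEN
    state = transition(state, TicketState.IN_PROGRESS)
    state = transition(state, TicketState.RESOLVED)
    println(state)                                 // RESOLVED
    // transition(state, TicketState.OPEN)         // would throw: not an allowed transition
}
```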
This is a very common pattern in Ruby to manage state. It's especially useful to guard entering impossible states with respect to business logic and figuring out what went wrong. Something along the lines of:
Can't transition Command from 'ordered' to 'to-deliver': paid() == false
The sum type is key, sometimes called a "discriminated union". In Rust this is an `enum`. In some languages it's simulated as a tuple with a tag as the first element. You get a discrete number of states, each attaching only the information relevant to that single state, so you can never have an invalid combination of fields.
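A small sketch of what that buys you, using Kotlin's sealed-class flavor of a sum type (the connection states are made up):

```kotlin
// Each state carries only the data that makes sense for it, so an invalid
// combination of fields (e.g. a session id while disconnected) cannot even
// be constructed.
sealed interface Connection
object Disconnected : Connection
data class Connecting(val attempt: Int) : Connection
data class Connected(val sessionId: String) : Connection
data class Failed(val reason: String) : Connection

// Exhaustive: the compiler rejects a missing case.
fun describe(c: Connection): String = when (c) {
    Disconnected  -> "not connected"
    is Connecting -> "connecting, attempt ${c.attempt}"
    is Connected  -> "connected as ${c.sessionId}"
    is Failed     -> "failed: ${c.reason}"
}
```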
We actually have a great tool to manage state, but nobody seems to have used it to its full potential: regular expressions.
Imagine your program is a big state machine, and inputs and other events are making it transition from one state to another, and when in a state the apropos actions are performed. On this view, your program just is a regular expression parser--the tokenized stream of input events is what it is parsing.
And what is a great, high-level way of describing a parsing state machine? Yup--a regular expression. When compiled, a regular expression is translated into a state machine. However, you can't really specify that an arbitrary procedure is called when it enters a state--it can only do limited actions, e.g. extract a substring.
If we could let the compiled state machine call arbitrary procedures when it enters a state, well, we'd have a new program-control-flow statement. A super-duper if/then/else.
Instead of writing state charts--absolutely the lowest level you can program a state machine at--we could specify the state machine in a very high-level, easier to understand and maintain way.
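As a toy illustration of the wish (not a feature of any existing regex engine), here's a hand-rolled DFA for a*b with an arbitrary procedure attached to each state entry:

```kotlin
// A toy DFA for the regex a*b, with a procedure fired on each state entry.
enum class St { START, SAW_B, DEAD }

// Transition table: (state, input) -> next state; anything missing goes to DEAD.
val next: Map<Pair<St, Char>, St> = mapOf(
    (St.START to 'a') to St.START,
    (St.START to 'b') to St.SAW_B,
)

// The "super-duper if/then/else": arbitrary actions keyed by state.
val onEnter: Map<St, (Char) -> Unit> = mapOf(
    St.START to { c: Char -> println("still matching a*, read '$c'") },
    St.SAW_B to { c: Char -> println("accepted on '$c', doing the real work here") },
    St.DEAD  to { c: Char -> println("rejected at '$c'") },
)

fun match(input: String) {
    var state = St.START
    for (c in input) {
        state = next[state to c] ?: St.DEAD
        onEnter.getValue(state)(c)
        if (state == St.DEAD) return
    }
}

fun main() = match("aab")
```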
This would be, speaking as someone who has dealt with his share of state machines, quite confusing. The uncomfortable thing about regular expression implementation as automata is either (a) non-determinism (i.e. "being in many states at once" a la Glushkov or Thompson NFAs, not "non-determinism meaning something different might happen on any given execution") or (b) state explosion (in a DFA, to represent non-determinism).
This has a huge impact on trying to hook actions, callbacks, etc. (your "arbitrary procedure") to NFA states, as you're frequently making many overlapping entries into these states, many of which go nowhere. Trying to figure out which entries correspond to which other entries isn't easy.
There are very stylized automata that are used in parsing that do interesting things beyond the finite automata space (like pushing things onto a stack), but they don't correspond to regex per se - instead they are generated from a grammar.
Re: nondeterministic implementations: Detecting and resolving nondeterminism is something every method of implementing state machines has to consider. Even when implementing a state machine by hand-coding switch statements, you have to worry about whether or not you have inadvertently specified a nondeterminism.
> interesting things beyond the finite automata space
State machines certainly can be glorified into LR parsers, or even into Turing machines by adding various bells and whistles. So sure, this idea shouldn't be limited to just regular expressions--we've got good methods of specifying even very complicated state handling.
I hope you saw the excellent comment on this thread where somebody talked about how Ken Thompson implemented Paxos using yacc for state management. I've been around the block with state management too, but using yacc for state management is something that never occurred to me, and frankly, it opens up a whole new world of possibilities.
BTW, it's also an answer to your point about the pitfalls of nondeterminism: yacc is a great example of how to specify a state machine while detecting, reporting, and resolving nondeterminism.
Perhaps this comment was meant as a joke, but this is exactly what lex does for regexps and yacc does for LALR(1) grammars. For the right job, they are both great.
I watched Ken Thompson write a Paxos implementation in yacc once.
It's not as exciting as it sounds. He wanted to play around with learning Paxos, which is a big state machine, and he used yacc to do it. I shared an office with him for a couple years in the early days of Go, and he was working on it while I was working on other things. I helped him track down at least one bug in the Go port of yacc that way. I think the grammar he was writing was completely regular, but yacc is nicer to use than lex.
Yeah, but what if we didn't do that? I.e. when we are reading from memory, we treat it as part of the tokenized input stream, and whatever we are writing to memory is just the transformed output of the state machine.
This way, the contents of the memory wouldn't be state at all; they would just be, as it were, the scratchpad where sentences in the language accepted by the state machine, or output by the state machine, are found.
That's what regular expressions do for us: they are a compact way of specifying a language. Just a few characters--which is to say, just a few states--can specify a language which contains arbitrarily long strings.
But--even though the input and output streams can be very large, they are no longer part of the state, and thus do not contribute to the exponential blowup of states.
Except the tape gets overwritten/looped over. What GP was referring to is event-sourcing the state in an append-only log and running finite state automata on this sequence.
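A minimal sketch of that reading, with hypothetical event types: the append-only log is the input stream, and "state" is only ever derived by folding over it.

```kotlin
// Append-only log of events; nothing is overwritten in place.
sealed interface Event
data class Deposited(val amount: Long) : Event
data class Withdrew(val amount: Long) : Event

data class Account(val balance: Long = 0)

// A transition function: current state x event -> next state.
fun step(state: Account, e: Event): Account = when (e) {
    is Deposited -> state.copy(balance = state.balance + e.amount)
    is Withdrew  -> state.copy(balance = state.balance - e.amount)
}

fun main() {
    val log = listOf(Deposited(100), Withdrew(30), Deposited(5))
    val current = log.fold(Account()) { s, e -> step(s, e) }
    println(current)   // Account(balance=75)
}
```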
Here is a Java lib that applies regexes to streams of Objects, which could be used to achieve this purpose.
I like it! I don't know if I agree, but, the category of "data used in control flow" is a good one.
I think... maybe this definition needs a narrower category of control flow? Like, technically, if you're converting 24-hour time to 12-hour AM/PM time, you have control flow (if > 12 hours). IMO that seems like it shouldn't be state. But, "we've delivered this message, so don't try to deliver it again" definitely involves control flow and definitely feels like state. Something like "this list is sorted" is in-between; it's re-computable (easily, depending on list size), if it's not done you want to do it, but after that it doesn't affect control flow.
Maybe the difference is "control flow that modifies other data"?
State is the data required to continue the operation of a program. That's always the case (also in non-FP contexts), but state is often smeared all over a program as only loosely and implicitly connected pieces whereas FP is mainly concerned with how to manage state (and thus also with how to compose such pieces of state into a greater whole so it's easier to manage).
Edit: Note that there is also "data you don't need for continuing the operation of the program", which we could categorize as "input", "output" and "wasted memory/storage/cpu cycles". The last of the three can obviously still be crucial for other reasons than program execution (e.g. debugging), which makes the line between output and waste blurry.
I like it! I don't know if I agree, but, the category of "data required to continue the operation of the program" is a pretty excellent category, thank you for pointing it out!
Maybe... As you say, state is the information required for continued execution. Which would make "data" the information that's carried through the program. Like - parts of an HTTP request are "state" because they're what you use to route the packet through the internet. The other parts are "data", because all the intermediate programs don't use it during their operation; it's just passed through.
When state gets out of hand, I know of three options:
1. Put a constraint solver over it so that the programmer describes rulesets instead of state "paths". Regex rules like the Kleene star's backtracking are a simple example of this.
2. Put a compiler over it so that boilerplate "if" statements are generated from a smaller description. E.g. FSM compilers are one way of doing this. Years ago I read of an implementation of TCP/IP (which I can no longer find) built from a custom parser that read the actual text of the RFC spec and generated output source code.
3. Enumerate everything in an enormous decision table so that the spec is tighter.
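As a toy version of option 3 (the conditions and outcomes are made up), the decision table can literally be data that the code looks up, so the spec and the implementation stay in lockstep:

```kotlin
// Every combination of the two conditions is enumerated; nothing is implicit.
data class Situation(val isAuthenticated: Boolean, val isAdmin: Boolean)

val decisionTable: Map<Situation, String> = mapOf(
    Situation(isAuthenticated = false, isAdmin = false) to "redirect to login",
    Situation(isAuthenticated = false, isAdmin = true)  to "redirect to login",
    Situation(isAuthenticated = true,  isAdmin = false) to "show dashboard",
    Situation(isAuthenticated = true,  isAdmin = true)  to "show admin panel",
)

fun decide(s: Situation): String =
    decisionTable[s] ?: error("unspecified situation: $s")   // gaps become loud failures

fun main() = println(decide(Situation(isAuthenticated = true, isAdmin = false)))
```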
> Years ago I read of an implementation of TCP/IP (which I can no longer find) built from a custom parser that read the actual text of the RFC spec and generated output source code.
OMFG! I thought of doing this a hundred times and never got around to it. Are you 100% sure you can't find it? I think such a piece of technology is extremely important!
Actually, this explains quite a lot of things that I knew, but couldn't express before. Thanks for the useful chunk to add to my map[1] of the world. It'll be interesting to see what other things fall out as I check against all the other chunks.
If I could reset my browser's state with the single exception of cookies, it would be amazing. I just don't want to have to authenticate all over again, everywhere.
That's why FP is such an important pattern. I'm not advocating strictly following FP but people need to learn it to understand the fundamental problem with IO and state.
I think this kind of misses the point. Yes, programmers are bad at managing state. But programmers aren't managing state, the programs they write are. Programmers are bad at programming.
This isn't hard to understand at a fundamental level. Ask a programmer to write one simple algorithm during an interview, and there can be dozens of different bugs found. Modern software is made up of millions of these algorithms. So the potential for bugs is massive. There are simply too many things to think about, and you can't see all the invisible gotchas. We like to think of ourselves as "computer geniuses", these incredibly smart and talented humans who have an unlimited capacity. But that's just false. We're fallible in general, and software isn't an exception to that.
This is made worse by the fact that there are no "building codes" for software. In physical engineering, there are all kinds of requirements to build something. You must use 10d nails for this kind of wall, spaced at 16 inches, with 2x4 studs, etc, to build a given kind of wall for a given kind of building. You're not allowed to skip them because "you don't think this needs to scale". On the other hand, adding things unnecessarily just drives up cost and makes things take longer. But we software engineers get to literally do whatever we want (and often do), with the excuse that "it's not gonna kill anybody!" But people rely on the software we write for every task in their lives today. We tell ourselves that we don't need to care how we do our jobs, which results in things like avoidable defects, and an unnecessary increase in time and money, and difficulty in just getting things done with software products.
There is a simple solution to "managing state": versioned immutability. Basically, you look at something in a given state, and if it's working, you say "ok, take a snapshot of that state and give it version 1.0". Later, when the state naturally devolves into chaos, you say "ok, restore state version 1.0". This is essentially a hack to deal with the fact that we suck at making programs that can deal with entropy. But it's a hack that tends to work.
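A toy sketch of that snapshot-and-restore shape (the Config type and version labels are made up):

```kotlin
// Immutable value to be versioned; "mutation" means publishing a new version.
data class Config(val featureFlags: Set<String>, val maxConnections: Int)

class VersionStore<T> {
    private val versions = mutableMapOf<String, T>()
    fun snapshot(version: String, value: T) { versions[version] = value }
    fun restore(version: String): T = versions.getValue(version)
}

fun main() {
    val store = VersionStore<Config>()
    var live = Config(featureFlags = setOf("search"), maxConnections = 100)
    store.snapshot("1.0", live)              // "this state works, call it 1.0"

    live = live.copy(maxConnections = -1)    // later: the state devolves into chaos
    println(live)

    live = store.restore("1.0")              // don't fix it in place, just bring 1.0 back
    println(live)
}
```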
In the Operations space, we've long since learned that immutability is the only simple way to make systems reliable. You can build incredibly complex configuration and state management systems to constantly try to "fix" state by poking and prodding the state back to where you'd like it to be. Or you can just... restore a backup. Kill the current thing and replace it with the old working copy. That's Immutable Infrastructure, and it's the single most powerful idea we've had for managing systems in at least 30 years.
More software developers need to understand this principle and start applying it to the code they write. For example, Cloud-based services that provide no means for immutable management tend to be difficult to manage, and require complex configuration management tools like Terraform to constantly "fix" their state.
The flaw isn't that the state can change - it's that the state can change into a "bad" state. We can't predict what that state will be, because again, we're bad at programming. But we can tell when the state was "good", and go back to that when things stop working.
To facilitate that, a system needs to have a concept of the conditions under which it operates, and the actions taken under a given state. An example is a web app. When the web app state is "good", it can do things like process user registrations, display content, perform transactions. But when the app state is "bad", it can no longer perform the actions properly. The difference between the states is an operational state; the state which determines if the app can perform its function. Outside of that is the state of the individual actions, which will of course change during the course of the action (to perform a user registration, there must be state changes like "add a user to the database"). So software systems need to have a distinction between the kinds of state that affect operations or not, and version those, and allow easily restoring those versions.
Those aren't really like building codes. Building codes are effectively minimum specifications for safety, with the requirement that they can be inspected and approved. The codes eventually became more uniform, but their original intent was to prevent shoddy workmanship from creating hazardous conditions.
SOC 2 and ISO 27001 have to do with security, which is close to safety, but not the same. GDPR and HIPAA are concerned with privacy, which is more related to security than safety. The rest are about compatibility.
If somebody breaks into a building, that's a security issue. If the building falls down, that's a safety issue. Software can be secure, private, and compatible, and still crash all the time. Safety isn't the same as reliability, but safety tends to lead to reliability, because if it wasn't reliable, it wouldn't be all that safe.
So I think we could use standards that focus more on safety [and reliability]. It won't make the products make more money - in fact, it'll cost more to make them. But the result will be better for people and society overall, the way building codes have been.
Software has significant effects on people that aren't immediately apparent to the designers.
I'm sure that the contractors who designed accounting software for the UK Post Office didn't expect to ruin the lives of hundreds of people. https://www.bbc.com/news/business-56718036
The longer that we continue to be flippant about the impact of software on the lives of people, the longer people will suffer due to our laziness and unprofessionalism.
"Simple Made Easy" - Rich Hickey (2011) https://www.youtube.com/watch?v=SxdOUGdseq4