Show HN: Mandala – Automatically save, query and version Python computations (github.com/amakelov)
100 points by amakelov 6 months ago | 30 comments
`mandala` is a framework I wrote to automate tracking ML experiments for my research. It differs from other experiment tracking tools by making persistence, querying and versioning logic a generic part of the programming language itself, as opposed to an external logging tool you must learn and adapt to. The goal is to be able to write expressive computational code without thinking about persistence (as in an interactive session), and still have the full benefits of a versioned, queryable storage afterwards.

Surprisingly, it turns out that this vision can pretty much be achieved with two generic tools:

1. a memoization+versioning decorator, `@op`, which tracks inputs, outputs, code and runtime dependencies (other functions called, or global variables accessed) every time a function is called. Essentially, this makes function calls replace logging: if you want something saved, you write a function that returns it. Using (a lot of) hashing, `@op` ensures that the same version of the function is never executed twice on the same inputs.

Importantly, the decorator encourages/enforces composition. Before a call, `@op` functions wrap their inputs in special objects, `Ref`s, and return `Ref`s in turn. Furthermore, data structures can be made transparent to `@op`s, so that an `@op` can be called on a list of outputs of other `@op`s, or on an element of the output of another `@op`. This creates an expressive "web" of `@op` calls over time.

2. a data structure, `ComputationFrame`, which can automatically organize any such web of `@op` calls into a high-level view by grouping calls with a similar role into "operations" and their inputs/outputs into "variables". It can detect "imperative" patterns - like feedback loops, branching/merging, and grouping multiple results in a single object - and surface them in the graph.

`ComputationFrame`s are a "synthesis" of computation graphs and relational databases, and can be automatically "exported" as dataframes, where columns are variables and operations in the graph, and rows contain values and calls for (possibly partial) executions of the graph. The upshot is that you can query the relationships between any variables in a project in one line, even in the presence of very heterogeneous patterns in the graph.
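In rough outline (the Colab quickstart below has complete, runnable examples, and the exact names like `storage.cf(...)` and `.df()` may shift while the API is in alpha), using these two pieces looks something like this:

    from mandala.imports import Storage, op  # import path per the current docs

    @op
    def increment(x: int) -> int:
        return x + 1

    @op
    def add(x: int, y: int) -> int:
        return x + y

    storage = Storage()        # in-memory; pass db_path="file.db" for a persistent one

    with storage:              # calls inside the block are saved & memoized
        a = increment(1)
        b = increment(2)
        c = add(a, b)          # @ops compose on each other's Ref outputs

    cf = storage.cf(add)       # ComputationFrame around the `add` calls
    print(cf.df())             # export the web of calls as a dataframe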

I'm very excited about this project - which is still in an alpha version being actively developed - and especially about the `ComputationFrame` data structure. I'd love to hear the feedback of the HN community.

Colab quickstart: https://colab.research.google.com/github/amakelov/mandala/bl...

Blog post introducing `ComputationFrame`s (can be opened in Colab too): https://amakelov.github.io/mandala/blog/01_cf/

Docs: https://amakelov.github.io/mandala/




Does it survive restarts? You mention that they are exported as dataframes - can they be reimported? Does this mean we can run mandala on many machines and merge dataframes together to get collective memoization?

Do you support persisting into external stores?

You mention IncPy in the README - have you discussed this project with Philip Guo? https://pg.ucsd.edu/

What is the memory and CPU overhead?

How does the framework handle dependencies on external libraries or system-level changes that might affect reproducibility?

How do you roll back state when it has memoized a broken computation? How does one decide which memoizations to invalidate vs keep?


In order,

1. Yes, you can choose to create a persistent storage by passing `db_path` to `Storage()`. The current implementation is just an SQLite file. To run on many machines, you don't really need to re-import from a dataframe (dumping to a dataframe is meant as an exit point from `mandala`, so you can do downstream analyses in a format more familiar than `ComputationFrame`). Instead, `ComputationFrame`s can be merged via the union (`|`) operator - see https://amakelov.github.io/mandala/blog/01_cf/#tidy-tools-me... for an example, and the small sketch at the end of this comment. Storages don't support merging yet, but it's certainly possible!

2. Already answered in 1.

3. Nope, but I'd be happy to (though I feel like `mandala` took memoization in a substantially different direction). Are you in a position to make an introduction?

4. This project is currently not optimized for performance, though I've used it in projects spanning millions of memoized calls. The typical use case is to decorate functions that take a long time to compute, so the overhead of memoization amortizes. A very quick benchmark on my laptop shows ~6ms per call for in-memory storage, ~9ms for a persistent storage, with a simple arithmetic function that otherwise takes ~0 time.

5. Great question - currently, the dependency tracer is restricted to user-chosen functions, to avoid tracking the function calls an imported library makes. You could use a bit of magic (import-time automatic decoration) to track all functions in a file or a directory, though this is not implemented right now. The reasoning is that, for a typical multi-month ML project, you usually have a single conda environment, so you want to ignore library changes. Similarly, system-level state (e.g. environment variables) is not tracked either. I think a very useful feature would be to at least record the versions of each imported library, so that storages can be ported between environments with some guarantees (or warnings).

6. If an `@op` call was memoized, the underlying Python function call succeeded, so in this sense it can't be "broken"; it's possible, however, that there was a bug. In this case, you can delete the affected calls and all values that depend on them (if you keep these values, you're left with "zombie" values that don't have a proper computational history). The `ComputationFrame` supports declarative deletion - you build a `ComputationFrame` that captures the calls you want to delete and call `.delete_calls()` - though there's still no example of this in the tutorial :) Alternatively, you can change the affected function and mark this as a new version; then you should be able to delete all calls using the previous version (though this isn't supported at the moment).

- How the cache is invalidated is detailed here: https://github.com/amakelov/mandala?tab=readme-ov-file#how-i...
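To make 1. concrete, here's roughly what persistent storage and `ComputationFrame` merging look like (modulo exact method names, which may shift while the API is in alpha):

    from mandala.imports import Storage, op

    @op
    def double(x: int) -> int:
        return 2 * x

    @op
    def square(x: int) -> int:
        return x * x

    storage = Storage(db_path="experiments.db")  # persistent SQLite file; survives restarts

    with storage:
        doubled = [double(i) for i in range(3)]
        squared = [square(i) for i in range(3)]

    # ComputationFrames over different ops can be combined with the union operator
    cf = storage.cf(double) | storage.cf(square)
    print(cf.df())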


We have a framework at <job> to do memoization as well as distributed compute - in fact the memoization was mostly a happy side effect of the need to transfer serialized function arguments to other machines.

Your addition of code/runtime dependencies intrigues me. I will probably take a look at your code to try to understand this better.

I somehow doubt there's enough overlap for us to open source our work and try to merge with yours, but it's really cool to see other people working on similar concepts. I predict we'll see a lot more frameworks like these that lean on mathematical principles like functional purity in the future.


This blog post gives an overview of the core dependency tracking logic: https://amakelov.github.io/blog/deps/


thank you!


Is there a way to use this to capture just hashes of the code (transitive) that implements a particular function?

I can't buy into a whole framework in my current context -- but I would really like a way to roll my own content hashing for ad hoc caching within an existing system -- where the hash automatically incorporates the specific implementation logic involved in producing the data I want to cache (so that the hash automatically changes when the implementation does).

e.g. -- given a Python function foo -- I want a hash of the code that implements foo (transitive within my project is fine)


Great question! The versioning system does something essentially equivalent to what you describe. It currently works as follows:

- When a call to an `@op` is executed, it keeps a stack of @track-decorated functions that are called (you can add some magic - not implemented currently - via import-time decoration to automatically track all functions in your project; I've opted against this by default to make the system more explicit).

- The "version" of the call is a hash of the collection of hashes of the code of the `@op` itself and the code of the dependencies that were accessed

- Note that this is much more reliable than static analysis, and much more performant/foolproof than using `sys.settrace`; see this blog post for discussion: https://amakelov.github.io/blog/deps/

- When a new call is made, it is first checked against the saved calls. The inputs are hashed, and if the system finds a previous call on these hashes where all the dependencies had the same (or compatible) code with the current codebase, this call is re-used. *Assuming deterministic dependencies*, there can be at most 1 such call, so everything is well-defined. I don't think this is an unrealistic assumption, though you should keep it in mind - it is pretty fundamental to how this system works. Otherwise, disambiguating versions based on the state of the code alone is impossible.

- When a dependency changes, you're alerted which `@op`s' versions are affected, and given a choice to either ignore this change or not (in the latter case, only calls that actually used this dependency will be recomputed).

The versioning system is mostly a separate module (not currently importable independently of the rest, but making it so should be pretty doable). I'd love to hear more about your use case. It may not be too difficult to export just the versioning as its own thing - though from what you describe, it should also have some component of memoization? As in, you seem to be interested in using this information to invalidate/keep things in the cache?


Forgot to mention: yes, the dependency tracking is transitive, i.e. if your @op calls a @track-decorated function, which in turn calls another @track-decorated function, then both dependencies will show up, etc.
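In code, the transitive case looks roughly like this (assuming the import path shown in the docs):

    from mandala.imports import Storage, op, track

    @track
    def scale(x: float, factor: float) -> float:   # indirect dependency
        return x * factor

    @track
    def preprocess(x: float) -> float:             # direct dependency: calls scale()
        return scale(x, 0.5)

    @op
    def featurize(x: float) -> float:
        # both preprocess and scale get recorded as dependencies of this call,
        # so editing either one affects the version of featurize
        return preprocess(x)

    storage = Storage()
    with storage:
        y = featurize(4.0)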


You can directly hash the code using inspect: https://docs.python.org/3/library/inspect.html#inspect.getso...

And do something similar for the arguments (maybe pickling them to get a bytes object you can hash without relying on specific types). Using just the built-in hash function could come with footguns for mutable objects.
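Something like this as a starting point (note it's not transitive - getsource only sees the function's own body):

    import hashlib, inspect, pickle

    def call_key(fn, *args, **kwargs) -> str:
        # hash fn's own source together with its (pickled) arguments;
        # not transitive: functions called by fn are not part of the hash
        source = inspect.getsource(fn).encode()
        payload = pickle.dumps((args, sorted(kwargs.items())))
        return hashlib.sha256(source + payload).hexdigest()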


Cool! Looks pretty professional.

I explored a similar idea once (also implemented in Python, via decorators) to help speed up some neuroscience research that involved a lot of hyperparameter sweeps. It's named after a Borges story about a man cursed to remember everything: https://github.com/taliesinb/funes

Maybe one day we'll have a global version of this, where all non-private computations are cached on a global distributed store somehow via content-based hashing.


Thanks! Indeed, the ultimate (but very ambitious from the point of view of coordination and infrastructure) vision would be to build the "planetary computer" where everyone can contribute and every computation is transparently reproducible. While researching `mandala`, I ran into some ideas along those lines from the folks at Protocol Labs, e.g. https://www.youtube.com/watch?v=PBo1lEZ_Iik, though I'm not aware of the current status.


Oh also totally missed the Borges mention the first time - I'm a big fan of his stories!


This is really nice. I'll take a much closer look when I get time later. I'm very interested in how you think about incremental computation - I find cache invalidation to be incredibly annoying and time-consuming when running experiments.

We wrote our own version of this (I think many or all quant firms do; I know such a thing existed at $prev_job) but we use type annotation inspection to make things efficient (I had ~1-2 days to write it, so had to keep the design simple as well). It's a contract: if you write the type annotation, we can store it. Socially, this incentivizes all the complex code to become typed. We generally work with timeseries data, which makes some things easier and some things harder, and the output is largely storable as parquet (we handle objects, but inefficiently).

One interesting subproblem that is relevant to us is the idea of "computable now", which adds a kind of third variant beyond the usual None/Some (i.e. is something intrinsically missing, or is it just not knowable yet?). For example, if you call total_airline_flights(20240901), that should (a) return something like an NA, and (b) not cache that value, since we will presumably be able to compute it on 20240902. But if total_airline_flights(20240101) returns NA, we might want to cache that, so we don't pay the cost again.
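A toy illustration of the idea (not our actual implementation):

    import datetime as dt
    import functools

    NOT_YET = object()   # "not knowable yet", as opposed to a genuine (cacheable) NA

    def cache_unless_not_yet(fn):
        # cache real values and NAs, but never cache NOT_YET results,
        # so a later call gets a fresh chance to compute them
        memo = {}
        @functools.wraps(fn)
        def wrapper(*args):
            if args in memo:
                return memo[args]
            result = fn(*args)
            if result is not NOT_YET:
                memo[args] = result
            return result
        return wrapper

    @cache_unless_not_yet
    def total_airline_flights(yyyymmdd: int):
        date = dt.datetime.strptime(str(yyyymmdd), "%Y%m%d").date()
        if date >= dt.date.today():
            return NOT_YET   # not knowable yet: don't cache
        return None          # stand-in for the real computation; a true NA would be cached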

We sidestep this problem in our own implementation (time constraints) but I think the version at $prevjob had a better solution to this to avoid wasting compute.

(side note: hey Alex! I took 125 under you at Harvard; very neat that you're working on this now)


Thanks Rachit! Great running into you after all these years!

Being aware of types is certainly a must in a more performance-critical implementation; this project is not at this stage though, opting for simplicity and genericity instead. I've found this best for maintenance until the core stabilizes; plus, it's not a major pain point in my ML projects yet.

Regarding incremental computation: the main idea is simple - if you call a function on some inputs, and this function was called on the same (modulo hashing) inputs in the past and used some set of dependencies that currently have the same code as before (or compatible code, if manually marked by the user), the past call's outputs are reused (key question here: will there be at most one such past call? yes, if functions invoke their dependencies deterministically).
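As a toy sketch of that lookup (a simplification that keeps only one version per input and ignores the bookkeeping for "compatible" code markers):

    import hashlib, pickle

    def content_hash(obj) -> str:
        return hashlib.sha256(pickle.dumps(obj)).hexdigest()

    def call_or_reuse(fn, args, dep_sources, call_log):
        # reuse a past call iff the input hash matches AND every dependency recorded
        # for that call has the same source hash as the current codebase;
        # dep_sources: dependency name -> current source code
        # call_log: input hash -> (recorded dependency hashes, outputs)
        key = content_hash(args)
        if key in call_log:
            dep_hashes, outputs = call_log[key]
            if all(content_hash(dep_sources[name]) == h for name, h in dep_hashes.items()):
                return outputs   # same inputs, same code: reuse
        outputs = fn(*args)
        call_log[key] = ({n: content_hash(s) for n, s in dep_sources.items()}, outputs)
        return outputs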

You can probably add some tools to automatically delete old versions and everything that depends on them, but this is definitely a decision the user must make (e.g., you might want to be able to time-travel back to look at old results). I'm happy to answer more nuanced questions about the incremental computation implementation in `mandala` if you have any!


This is great! Really good work. It reminds me of working in a Smalltalk environment, coding inside the debugger while an exception is being thrown and restarting the computation.

I believe that this path can be supported as it is right now, and the next step would be to store a computation on some server. If an uncaught exception is raised, store all the computation along with the state, transfer it to your local machine, and restore the state of the machine as it was when the exception was thrown. This way, you can debug the state of the program with all the live variables as it was being run.


Thanks! Indeed, despite the fact that the main goal is to track ML experiments, the approach taken in `mandala` has a lot in common with e.g. time-travel debugging (https://en.wikipedia.org/wiki/Time_travel_debugging).

In reality, there are many similarities between experiment tracking, debugging, and high-level computation graphs - they're all different ways of getting a handle on what a program did when it was run.


How well does Mandala extend beyond the Numpy ecosystem?

I’m experimenting with Python CAD programming using the CadQuery and Build123d libraries. I’d like to speed up iteration time and intelligent caching would help.

These libraries are pretty opinionated, which makes it a bit challenging to imagine how to squeeze cache decorators in there. They have a couple of different APIs, all of which ultimately use the Open CASCADE (OCCT) kernel via https://github.com/CadQuery/OCP

CadQuery has a fluent programming design that relies on long method chains [0]. It also has an experimental Free Function API [1].

Build123d iterates on CadQuery [2] with the goal of integrating better with the Python programming language. It has two supported APIs. The Builder API uses Python’s context manager (‘with’ blocks) heavily. The secondary Algebraic API is more functional, using arithmetic operators to define geometric operations [3].

The simplest way to integrate Mandala would probably be to use Build123d’s Algebraic API, wrapping subassemblies in functions decorated with @op.
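E.g. something like this, going from the Build123d Algebra docs (I haven't verified that OCCT solids hash/serialize cleanly under Mandala's storage, and the exact constructor signatures are from memory):

    from mandala.imports import Storage, op
    from build123d import Box, Cylinder

    @op
    def plate_with_hole(length: float, width: float, thickness: float, hole_r: float):
        # a subassembly built with the Algebra API, cached/versioned as a unit
        return Box(length, width, thickness) - Cylinder(hole_r, thickness)

    storage = Storage(db_path="cad_cache.db")
    with storage:
        plate = plate_with_hole(60, 40, 3, 4)   # recomputed only if inputs or code change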

However, it would be even better to proactively cache function/argument pairs provided by the underlying APIs. For example, if I change 50% of the edges passed to a Fillet() call, it would be nice to have it complete in half the time. I guess this would require me to fork the underlying library and integrate Mandala at that level.

[0] https://cadquery.readthedocs.io/en/latest/intro.html

[1] https://cadquery.readthedocs.io/en/latest/free-func.html

[2] https://build123d.readthedocs.io/en/latest/introduction.html...

[3] https://build123d.readthedocs.io/en/latest/key_concepts_alge...


Very cool! Looking forward to trying it out - the graphs reminded me of a toy project I'd done a while back to better understand deterministic and reproducible execution in Python, as seen in marimo.io notebooks: https://github.com/vrtnis/python-notebook-simulator


Ah, yes, the notorious state problem in notebooks. In your project, do you find the dependencies statically or dynamically?


Statically - basically just parsing the code into an AST and then walking through the tree to collect information about variable usage and definitions.
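A stripped-down illustration of that kind of AST walk (not the actual project code):

    import ast

    class NameUsage(ast.NodeVisitor):
        # collect which names a cell defines and which it reads
        # (simplified: ignores imports, function args, attributes, ...)
        def __init__(self):
            self.defined, self.used = set(), set()

        def visit_Name(self, node):
            if isinstance(node.ctx, ast.Store):
                self.defined.add(node.id)
            else:
                self.used.add(node.id)
            self.generic_visit(node)

    cell = "y = model(x)\nz = y + 1"
    usage = NameUsage()
    usage.visit(ast.parse(cell))
    print(usage.defined, usage.used)   # {'y', 'z'} {'model', 'x', 'y'}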


Congratulations, good job! The chaos of notebooks needs some tracking indeed.

7 years ago I made a project with 100 calculation dependencies (in Python & SQL scripts), and the only thing that kept me from losing track was Makefile + Graphviz.

I wanted to make something similar in Rust -- a static visualizer of dependencies between structs. Things turned out way harder than expected.


Thanks! Yes I think a caching solution like this is great for notebooks, because it makes it very cheap to re-run the whole thing (as long as you reasonably organized long-running computations into `@op`s), overcoming the notorious state problem.

Graphviz is indeed a lifesaver; and you can similarly think of `mandala` as a "build system for Python objects" (with all the cool and uncool things that come with that; serializing arbitrary Python objects with strong guarantees is hard https://amakelov.github.io/mandala/tutorials/gotchas/).

I've no experience with Rust, but I'd be curious to hear about the difficulties that came up. I'd expect to see some overlap with Python!


What I can remember immediately:

1. Imports are more complex than in Python, because a module can be just a block in code, not necessarily a separate file/folder. E.g. `pub mod my_module { <code with functions and constants> }` is a module inside another module, so you don't need a folder and `__init__` inside to have inner modules.

Also, `use something` may mean an external crate.

`use super::something` means importing from the upper level of the module tree, but that's not necessarily a folder.

2. I can parse what types my functions require in their signatures, or what types structs require in their members (but I must have resolved where those names really point), and there's also type inference -- i.e. you don't need to explicitly write the type of every var, since it's deduced from what is assigned. For example, `let my_var = some_func(...)` will make `my_var` have the return type of some_func, so now I must also keep track of all functions and what they return.

And then, there are generics:

    let my_var: Vec<MyType> = vec![...];
Vec is generic, and in this case it has MyType inside. Well, this may be enough to just register `MyType` on my deps list. But then I may do some more calls:

    let my_other_var = my_var.pop().unwrap().my_method();
Here, `pop()` returns `Option<MyType>`, unwrap returns `MyType`, and then in the end, my_method may return whatever, and I essentially need something like a compiler to figure out what it returns.

This starts to look like a little compiler or language server.


Used something similar to this in the past: https://github.com/bmabey/provenance. Curious to see similarities/differences. Also reminds me of Unison at a conceptual level: https://github.com/unisonweb/unison


Thanks for sharing! This is a great project. It is quite close to the memoization part of `mandala` and I'll add it to the related work in the README. I think the similarities are:

- using `joblib` to hash arbitrary objects (a good fit for ML, which involves a lot of numpy arrays that joblib is optimized for)

- how composition of decorated functions is emphasized - I think that's very important

- wrapping outputs of memoized functions in special objects: this encourages composition, and also makes it possible to run pipelines "lazily" by retracing memoized calls without actually loading large objects in memory

- versioning: in a past version of `mandala`, I used the solution you provided (which is now subsumed by the versioning system, but it's still quite helpful)

The differences:

- w.r.t. memoization, in `mandala` you can represent data structures in a way transparent to the system. E.g., you can have a memoized function return a list of things, and each thing will have an independent storage address and be usable by downstream memoized functions. Most importantly, this is tracked by the provenance system (though I'm not sure - maybe this is also possible in `provenance`?)

- one big finding (for me) while doing this project is that memoization on its own is not sufficient to manage a complex project; you need some declarative way to understand what has been computed. This is what all the `ComputationFrame` stuff is about.

- finally, the versioning system: as you mention in the `provenance` docs, it's almost impossible to figure out what a Python function depends on, but `mandala` bites this bullet in a restricted sense; you can read about it here: https://amakelov.github.io/blog/deps/

Re:Unison - yes definitely; it is mentioned in the related work on github! A major difference is that Unison hashes the AST of functions; `mandala` is not that smart (currently) and hashes the source code.


This is a very innovative take on infra for ML observability. Shreya Shankar and collaborators at Berkeley came up with https://github.com/loglabs/mltrace, which treads the same ground - perhaps you've looked at it already?


Thanks! And thanks for sharing the pointer - I think I've seen `mltrace` at some point in the past. The tool has some similarities, but seems different from `mandala` on a philosophical level - `mltrace` seems more opinionated and domain-specific to ML. `mandala`'s goal is to make persistence logic a more generic part of Python semantics, and there's much more emphasis on composing `@op`s freely in whatever complex ways you want. Python functions are great and everyone knows what they do - let's give them more superpowers!

If this is something you're interested in, I can probably give a more detailed comparison if I find the time.


Very cool concept, thanks for sharing this.

Is mandala designed only for notebook-style interactive environments, or also for running Python scripts more traditionally? In the latter case, could this be integrated into a GitOps-like environment? (push to git, new run in CI/CD, export log metrics as an artifact with an easy way to compare to previous runs)


Great question - personally, I mostly use it from notebooks, and I think it's a great fit for that. Bundling experiment tracking with incremental computation makes a lot of sense in a notebook (or any other interactive) environment, because it solves the problem of state: if all your computations are end-to-end memoized, re-running the entire notebook is cheap (I routinely do this "retracing" to just get to some results I want to look at).

That being said, nothing prevents you from running this in a script too, and there are benefits of doing this as well. If your script mostly composes `@op` calls (possibly with some light control flow logic as needed), you get resumability "for free" on the level of individual `@op` calls after a crash. However, the workflow you're describing may run into some features that aren't implemented (yet) if your runs write to different storages. `mandala` makes it easy to get a "holistic" picture of a single `Storage` and compare things there. Comparing across storages will be more awkward. But it shouldn't be too hard to write a function that merges storages (and it's a very natural thing to do, as they're basically big tables).
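Schematically, a sweep script like this resumes where it left off when re-run after a crash, because already-saved `@op` calls are simply retraced (the training function here is a stand-in):

    from mandala.imports import Storage, op

    @op
    def train(seed: int, lr: float) -> float:
        # stand-in for a long-running training job
        return 1.0 / (1.0 + lr * seed)

    storage = Storage(db_path="sweep.db")   # persistent, so it survives the crash

    with storage:
        for seed in range(10):
            for lr in (1e-3, 1e-2):
                acc = train(seed, lr)       # already-memoized calls are just retraced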


Thanks for the in-depth explanation! I’ll keep an eye on it :)

Fine-grained crash recovery does sound like a great application as well.



