Thanks for sharing! This is a great project. It is quite close to the memoization part of `mandala` and I'll add it to the related work in the README. I think the similarities are:
- using `joblib` to hash arbitrary objects (a good fit for ML, where data often includes numpy arrays, which joblib is optimized for)
- how composition of decorated functions is emphasized - I think that's very important
- wrapping outputs of memoized functions in special objects: this encourages composition, and also makes it possible to run pipelines "lazily" by retracing memoized calls without actually loading large objects in memory (sketched below)
- versioning: in a past version of `mandala`, I used the solution you provided (which is now subsumed by the versioning system, but it's still quite helpful)
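To make the first three points concrete, here's a minimal sketch of the shared pattern (my own toy illustration - `Ref`, `memoized`, and the in-memory `_cache` are made up for this example, not either library's API):

```python
import functools

import joblib
import numpy as np

_cache = {}  # toy in-memory stand-in for a persistent store

class Ref:
    """Wrapped output of a memoized call: an address, not the value itself."""
    def __init__(self, key):
        self.key = key

def memoized(f):
    @functools.wraps(f)
    def wrapper(*args):
        # Unwrap Refs so decorated calls compose; joblib.hash handles
        # arbitrary objects (and is fast on numpy arrays).
        raw = tuple(_cache[a.key] if isinstance(a, Ref) else a for a in args)
        key = joblib.hash((f.__name__, raw))
        if key not in _cache:  # cache miss: actually run the function
            _cache[key] = f(*raw)
        return Ref(key)  # hand back a reference, not the value
    return wrapper

@memoized
def square(x):
    return x ** 2

@memoized
def total(x):
    return float(np.sum(x))

s = total(square(np.arange(10)))  # decorated functions compose via Refs
print(_cache[s.key])              # 285.0; rerunning just hits the cache
```

Returning a `Ref` instead of the value is what enables the lazy retracing: you can replay a pipeline purely at the level of keys and only load the values you actually need.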
The differences:
- w.r.t. memoization, in `mandala` you can represent data structures in a way that's transparent to the system. E.g., you can have a memoized function return a list of things, and each thing will have an independent storage address and be usable by downstream memoized functions. Most importantly, this is tracked by the provenance system (though maybe this is also possible in `provenance`? I'm not sure)
- one big finding (for me) while doing this project is that memoization on its own is not sufficient to manage a complex project; you need some declarative way to understand what has been computed. This is what all the `ComputationFrame` stuff is about (see the sketch after this list).
- finally, the versioning system: as you mention in the `provenance` docs, it's almost impossible to figure out what a Python function depends on, but `mandala` bites this bullet in a restricted sense; you can read about it here: https://amakelov.github.io/blog/deps/
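For a feel of what the `ComputationFrame` workflow looks like, here's a minimal sketch (I'm going from the current `@op`/`Storage`/`ComputationFrame` names in the README, so treat the exact calls as approximate):

```python
from mandala.imports import Storage, op  # import path as in the README

@op
def train(alpha: float) -> float:
    return 1.0 - alpha  # toy "accuracy"

storage = Storage()  # in-memory here; can be backed by a database on disk
with storage:  # calls inside the block are memoized
    for alpha in (0.1, 0.2, 0.3):
        acc = train(alpha)

# Declaratively ask "what has been computed?" instead of re-running code:
cf = storage.cf(train).expand_all()  # ComputationFrame over all `train` calls
print(cf.df())  # a dataframe with one row per call: inputs and outputs
```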
Re: Unison - yes, definitely; it is mentioned in the related work on GitHub! A major difference is that Unison hashes the AST of functions; `mandala` is not that smart (currently) and hashes the source code.
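To make that difference concrete, a tiny illustration (my own sketch, not mandala's or Unison's internals):

```python
import ast
import hashlib
import inspect
import textwrap

def source_hash(f):
    # Roughly what `mandala` does: hash the raw source text. Any edit -
    # even whitespace or a comment - yields a new version.
    return hashlib.sha256(inspect.getsource(f).encode()).hexdigest()

def ast_hash(f):
    # Closer in spirit to Unison: hash a parsed representation, so pure
    # formatting/comment changes don't create a new version.
    tree = ast.parse(textwrap.dedent(inspect.getsource(f)))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

def f(x):
    return x + 1  # editing this comment changes source_hash but not ast_hash

print(source_hash(f))
print(ast_hash(f))
```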