Collection of stand-alone Python machine learning recipes (2021) (github.com/rougier)
84 points by adbarba on Aug 23, 2023 | 8 comments



A book along these lines is "Data Science from Scratch, 2nd ed.", by Joel Grus. I wish that book had used NumPy for array operations, although that would be less "from scratch".


Going to use this comment thread to float an opinion of mine to see if it's worth elaborating on further:

Python added an infix operator for matrix multiplication back in 2014, and the prevalence of "Python+NumPy for data analysis and number crunching" has only exploded since. I think it's time that NumPy becomes part of the standard library which ships with the CPython interpreter. There are many reasons why this would become a logistical nightmare, but personally I think it'd be worthwhile if it means that the presence of a Python interpreter equates to the presence of a fantastic numerical computing library, and it means anything built on top of NumPy could be truly "pure Python". It means I can ship a 2KiB zipapp and as long as our Python versions match then there's no more fiddling with dependency management nonsense when distributing scripts.
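For reference, the infix operator in question is `@` (PEP 465), which Python dispatches to a `__matmul__` method on the left operand. None of the built-in types implement it, so without NumPy you have to supply your own. A minimal sketch, assuming a hypothetical pure-Python `Matrix` class (this is not NumPy's API, and it will be far slower than NumPy):

```python
class Matrix:
    """Hypothetical minimal matrix type, for illustration only."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # `a @ b` calls a.__matmul__(b).
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions do not match")
        cols = list(zip(*other.rows))  # columns of the right operand
        return Matrix(
            [[sum(x * y for x, y in zip(row, col)) for col in cols]
             for row in self.rows]
        )

    def __repr__(self):
        return f"Matrix({self.rows})"


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[0, 1], [1, 0]])  # permutation matrix: swaps the columns of `a`
product = a @ b
```

With NumPy in the standard library, the same `a @ b` expression would work on arrays out of the box, which is the convenience the comment is arguing for.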

I acknowledge that, much like Golang, Python's ecosystem mostly assumes you'll be capable of pulling 3rd party dependencies, hence the inclusion of pip and venv. But unlike Golang, which downloads the dependency tree as source and compiles it on each install, Python's tooling for 3rd party dependencies is messy and requires the user to learn lots of tools, especially if they need to build their dependencies from source.

I also acknowledge that 3rd party dependency management is much better today than it was 10 years ago. Conda/poetry/pipenv/etc. have made things better, and in many cases people are deferring to tools not directly associated with the Python ecosystem to simplify their dependency management (e.g. Docker). So many will see the inclusion of NumPy in the stdlib as a bunch of work that is already solved by other tools.

As a final note, this would also force NumPy to have versions that are tied to the Python interpreter, which would be another change for the maintainers of NumPy.

I'll conclude with the statement that just because Python users have become accustomed to the complexity associated with their dependency tooling doesn't mean it always has to be that way. It is my unconfirmed belief that a large amount of teaching content and 3rd party libs could switch from "please download this binary blob from PyPI servers for this to work" to "If you have a Python interpreter you're good to go".
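To make the zipapp scenario concrete: the standard library's `zipapp` module can already package a stdlib-only script into a single runnable `.pyz` file. A minimal sketch, assuming a throwaway `__main__.py` whose contents are made up for the example:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "src"
    src.mkdir()
    # Hypothetical entry point; a real script would hold the actual program.
    (src / "__main__.py").write_text("print('hello from a zipapp')\n")

    # Bundle the directory into one archive the interpreter can run directly.
    target = pathlib.Path(tmp) / "hello.pyz"
    zipapp.create_archive(src, target)
    archive_size = target.stat().st_size

    # Run it with the current interpreter, as a recipient would.
    result = subprocess.run(
        [sys.executable, str(target)],
        capture_output=True, text=True, check=True,
    )
    output = result.stdout.strip()
```

This works today only while the script sticks to the standard library. The moment it imports NumPy, the recipient needs a matching binary wheel installed, which is the friction described above.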


Certain groups are already rewriting NumPy for cloud targets; this might be the third year of that already. It ties into the pandas re-architecture and some other things. People with nation-level skill have already been recruited and are working on it now.


My biggest gripe with working alongside data scientists (using Python) is the way everything is named with terrible acronyms. Virtually everything in this repo is named with a three-letter acronym.


Maybe I'm not knowledgeable enough about the subject to find this useful, but the list has no information about what any of the recipes does, when you should use it, or what strengths and tradeoffs it presents…


You're right to ask. Some of the methods, like Eigenfaces, are antiquated compared to today's stand-alone methods. Sadly, the context of the listed papers and implementations is lacking. Being outdated doesn't make a method worthless, though, given the right outlook.


A little reading goes a long way in ML for sure.

One nice thing about having all these recipes is that if you have a classification problem, or a stochastic scheduling problem, you can run 'em all and cross-validate to get a sense of what's the best thing for your data! :D
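The "run 'em all and cross-validate" idea can be sketched in plain Python. This is a minimal illustration under stated assumptions: the two classifiers are hypothetical stand-ins for the repo's recipes, the data is synthetic, and the `fit(X, y) -> predict` interface is made up for the example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy over k folds.

    `fit(X_train, y_train)` must return a function mapping a sample to a label.
    """
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    # Baseline: always predict the most common training label.
    label = max(set(y), key=y.count)
    return lambda x: label


def fit_nearest_neighbor(X, y):
    # 1-NN: predict the label of the closest training sample.
    def predict(x):
        j = min(range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return y[j]
    return predict


# Synthetic, well-separated two-class data (made up for the example).
rng = random.Random(1)
X = [(rng.gauss(c, 0.5), rng.gauss(c, 0.5)) for c in (0, 5) for _ in range(20)]
y = [label for label in (0, 1) for _ in range(20)]

results = {
    "majority": cross_val_accuracy(fit_majority, X, y),
    "1-nn": cross_val_accuracy(fit_nearest_neighbor, X, y),
}
best = max(results, key=results.get)
```

On real data the ranking is rarely this clear-cut, and the winner on held-out folds is only a hint, so it is still worth reading up on what each method assumes.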


Papers or books are cited for each recipe -- they would cover that.



