And we have funding to compile all of SciPy. If anyone here is interested in helping out, please get in touch with me. Once LFortran can compile SciPy, it will greatly help with maintenance: builds on Windows and WASM, Python wrappers, and even modernization later on.
> if SciPy accepts this modern Fortran codebase, then I think the tide for Fortran will change.
I believe that you and the LFortran project are truly turning the tide on Fortran. I wish to see the production release of LFortran as soon as possible.
If PRIMA can also contribute to “turning the tide on Fortran”, I will feel honored. All the pain, depression, and frustration that I have experienced in the past three years will then be partially compensated.
Let us, as a community, work together to get PRIMA included in SciPy and see whether the tide on Fortran improves.
LFortran sounds like a genius project. Interactivity is a critical piece that has been missing from the Fortran ecosystem. I hope things get better with LFortran!
Wow, just a few weeks ago I wanted an optimization function in Python and was already using SciPy, so I reached for scipy.optimize and ran into quite a few of the bugs described first hand, which had me kind of rattled. I just assumed whatever optimizer SciPy uses would be reliable even if it's not completely optimal or cutting edge.
I love the dedication these folks have to their old colleague too. They took a great deal of care in relicensing, so that SciPy could use their derived code even though the original author had passed. Nothing but admiration for them!
Anyways, I was lucky enough that I could restrict my search space based on some assumptions and just brute-force the problem in very reasonable time, but this is quite timely for me and great to see!
> I just assumed whatever optimizer SciPy uses would be reliable
That's usually a bad assumption for wrapped math libraries outside of the foundational ones like numpy. They tend to have a pretty low ratio of "mileage" to complexity in the first place, it's easy for the project to become a zombie without users realizing, and the layers of challenges to fixing issues leave many of them unresolved. If you need reliability, you'll have to deliberately choose software that's mature and actively maintained.
I'm gradually moving to (modern) Fortran, and it's just lovely to see all this activity. Fortran really reaches a sweet spot if all you want to do is write straightforward numerical algorithms that run reasonably fast.
Reasonably fast is a bit of an understatement, IMO. I don't know another language that is as machine-friendly and at the same time allows reasonably high productivity and can be understood by (non-computer) scientists.
> Reasonably fast is a bit of an understatement, IMO.
I mean that a naive loop that traverses your data does not take several minutes like in Python. But there's a guy in my lab who takes my stupid loops, avxs the fuck out of them, and they become 8 times faster. Now, that's what you would call unreasonably fast (short of rewriting the whole thing for the GPU).
Fair enough. Btw. IMO rewriting for GPU (if you do have the hardware) can be quite a bit simpler than doing vector optimisations for CPU, depending on the codebase. Back in my research days I actually created a framework for doing just that with Fortran: https://github.com/muellermichel/Hybrid-Fortran.
How fast? I gather, for example, that Intel's Fortran compiler is a frontend to LLVM. So the speed of that Fortran can't be much different from anything else compiled by LLVM, such as Rust or C. Correct?
The main question is: how fast, relative to how much learning and optimization went into the 'naive' version of the application?
Just an example of how to declare a couple of input float pointers in a fully optimized way, avoiding aliasing and declaring them read-only (if I remember correctly; it's been a few years):
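Presumably it looked something like the following; this is only a rough sketch, since the original snippet isn't shown here, and __restrict is a widely supported compiler extension in C++ rather than standard (C99 spells it restrict). The function name is illustrative:

#include <cstddef>

// Input pointer marked read-only with const and non-aliasing with __restrict,
// so the compiler knows x and y never overlap and can vectorize freely.
void scale_add(std::size_t n, float a,
               const float* __restrict x,   // read-only input
               float* __restrict y)         // output, promised not to alias x
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}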
Of course, what happens then is that people will do a typedef and hide it away... and then every application invents its own conventions and becomes harder to interoperate with on a common basis. In Fortran it's just baked in.
Another example: multidimensional arrays. In C/C++ it either doesn't exist, is not flat memory space (and thus for grid applications very slow), or is an external library (again restricting your interop with other scientific applications). In Fortran:
use, intrinsic :: iso_fortran_env, only: real32
integer, parameter :: n = 10, m = 5
real(real32), dimension(n, m) :: foo, bar, baz
Again, reasonably simple to understand, and it's already reasonably close to what you want - there are still ways to make it better like memory alignment, but I'd claim you're already at 80% of optimal as long as you understand memory layout and its impact on caching when accessing it (which is usually a very low hanging fruit, you just have to know which order to loop it over).
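For what it's worth, here is a small sketch of that loop-ordering point, written in C++ for concreteness (C/C++ arrays are row-major, so the last index should vary fastest; Fortran is column-major, so there the first index should vary fastest). The function names are illustrative only:

#include <cstddef>
#include <vector>

// Same data and same arithmetic; only the traversal order differs.
double sum_good_order(const std::vector<double>& a, std::size_t n, std::size_t m) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)   // inner loop walks contiguous memory
            s += a[i * m + j];
    return s;
}

double sum_bad_order(const std::vector<double>& a, std::size_t n, std::size_t m) {
    double s = 0.0;
    for (std::size_t j = 0; j < m; ++j)       // strided access, cache-unfriendly here
        for (std::size_t i = 0; i < n; ++i)
            s += a[i * m + j];
    return s;
}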
> Another example: multidimensional arrays. In C/C++ it [...] is not flat memory space (and thus for grid applications very slow)
Can you clarify what you mean by this? A naively defined multidimensional array,
float foo[32][32] = ...
is stored in sizeof(float) * 32 * 32 = 4096 consecutive bytes of memory. There's no in-built support for array copies or broadcasting, I'll give you that, but I'm still trying to understand what you mean by "not flat memory space".
int a, b;
int arr[a][b]; // ERROR: all array dimensions except for the first must be constant
//or even:
auto arr = new int[a][b]; // ERROR: array size in new-expression must be constant
All you can do if you want a dynamic multidimensional array is:
int a, b;
auto arr = new int*[a];
for (int i = 0; i < a; i++) {
arr[i] = new int[b];
}
But now it's a jagged array, it's no longer flat in memory.
[disclaimer: I don’t know what I’m talking about in this comment]
Could you say like,
auto arr = new int[a*(b+1)];
int* arr2 = (int*)arr;
for(int i=0;i<a;i++){
arr2[i]=arr+(i*(b+1))+1;
}
and then be able to say like arr2[j][k] ?
(Assuming that an int and a pointer have the same size).
It still has to do two dereferences rather than doing a little more arithmetic before doing a single dereference, but (other than the interspersed pointers) it’s all contiguous?
But maybe that doesn’t count as being flat in memory.
You can do some trick like that, though arr2 would have to have type int** for that to work, and it wouldn't really work for an int array (though there's no real reason to store the pointers into arr inside arr itself - they could easily go into a separate place).
However, this would still mean that you need to do 2 pointer reads to get to an element (one to get the value of arr2[i], then another to get the value of arr2[i][j]). In C or Fortran or C++ with compile-time known dimensions, say an N by M array, multiArr[i][j] is a single pointer read, as it essentially translates to *(multiArr + i*M + j).
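For what it's worth, a common C++ workaround is to keep the dynamic 2D array in one flat allocation and do the index arithmetic yourself, so element (i, j) is reached with a single dereference as described above. A rough sketch (the Grid name and layout are purely illustrative):

#include <cstddef>
#include <vector>

struct Grid {
    std::size_t n, m;
    std::vector<int> data;                     // n*m elements, contiguous in memory
    Grid(std::size_t n_, std::size_t m_) : n(n_), m(m_), data(n_ * m_) {}
    int& operator()(std::size_t i, std::size_t j) { return data[i * m + j]; }
    const int& operator()(std::size_t i, std::size_t j) const { return data[i * m + j]; }
};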
Ok, I was wrong about it not being flat, but in my experience, due to the lack of language support it is usually not used to represent grids in a simulation, as dealing with it tends to involve loops which may or may not be optimised away. In Fortran, multidimensional arrays are supported so well that they are used everywhere. Even debuggers let you display arbitrary slices of multidimensional arrays in local scope.
Indeed, but they're still in the Standard, so if an implementation does them, it does them in a portable way - and gcc and Clang both support them, so there's no practical issue with access to the feature.
GHC with `-fllvm` is not going to make Haskell any easier to compile just because it's targeting LLVM. Fortran is (relatively!) easy to make fast because the language semantics allow it. Lack of pointer aliasing is one of the canonical examples; C's pointer aliasing makes systems programming easier, but high-performance code harder.
Pointers in Fortran can alias with each other and with valid pointer targets. What makes Fortran potentially easier to optimize are its rules against dummy argument aliasing in most (but not all) cases.
Thanks for the precise Fortran terminology; that's what I meant to say in my head but you're correct. From the linked website for everyone else:
"Fortran famously passes actual arguments by reference, and forbids callers from associating multiple arguments on a call to conflicting storage when doing so would cause the called subprogram to write to a bit of that storage by means of one dummy argument and read or write that same bit by means of another."
You have to caveat “faster than C.” It is basically impossible to beat C with assembly or intrinsics sprinkled in as appropriate.
Most people, even good C programmers, don’t write that kind of C, though.
The point of Fortran is that you can write Fortran code that is almost as fast as that nightmare C, but you can do it and still get your degree on time.
Negative. Fortran is in a better position to guarantee certain facts about sequential access in arrays, and it provides solid information about pointer aliasing that is generally not available in C or C++ unless the author is extremely aware of compiler quirks and pragmas. Fortran has a "just works" attitude to high-speed array processing, whereas other languages are focused on edge cases that are good for a PhD thesis on general computing optimization but rarely work in the easy cases without extensive pragmas.
See also CUDA. Sure, you can write a C-to-CUDA converting auto-vectorizer, but it's likely to have all sorts of bugs and usually never works right except in rare hand-tuned cases. You may as well just write CUDA from scratch if it is to be performant. Same for array processing: want to do array processing? Use a compiler designed for array processing, like Fortran or ISPC.
I use Fortran in my day-to-day work, and most of my work has been modernizing and refactoring an oldish code. It's amazing how nice it can be to work in once you've removed and reorganized the legacy cruft.
I kinda tried this a few years back, but found the documentation/lack of modern tutorials appalling ... but there's been a nice explosion of activity over the last few years (e.g. fpm, LFortran), which makes me think this may have changed considerably.
Also, any tips on nice ways/resources that helped you get (back) into it?
[50] M. J. D. Powell. A direct search optimization method that models the objective and constraint functions by linear interpolation. In S. Gomez and J. P. Hennart, editors, Advances in Optimization and Numerical Analysis, pages 51–67, Dordrecht, NL, 1994.
[53] M. J. D. Powell. UOBYQA: unconstrained optimization by quadratic approximation. Math. Program., 92:555–582, 2002.
[56] M. J. D. Powell. The NEWUOA software for unconstrained optimization without derivatives. In G. Di Pillo and M. Roma, editors, Large-Scale Nonlinear Optimization, volume 83 of Nonconvex Optimization and Its Applications, pages 255–297, Boston, MA, USA, 2006.
[58] M. J. D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report DAMTP 2009/NA06, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK, 2009.
Good coffee is needed to understand Powell’s papers on these algorithms. If you don’t have any, section 3 of the following paper is a good place to start:
I remember using a program to find a function maximum using parabolic approximation on my programmable calculator in the 80s. Everything new is well forgotten old.
It sounds like this was a difficult task. The motivation to fulfill Prof. Powell's request and help the community of derivative-free optimization users must have been strong. Congratulations on your achievement!
From the GitHub README:
> In the past years, while working on PRIMA, I have spotted a dozen of bugs in reputable Fortran compilers and two bugs in MATLAB. Each of them represents days of bitter debugging, which finally led to the conclusion that it was not a problem in my code but a flaw in the Fortran compilers or in MATLAB. From a very unusual angle, this reflects how intensive the coding has been.
> The bitterness behind this "fun" fact is exactly why I work on PRIMA: I hope that all the frustrations that I have experienced will not happen to any user of Powell's methods anymore. I hope I am the last one in the world to decode a maze of 244 GOTOs in 7939 lines of Fortran 77 code — I have been doing this for three years and I do not want anyone else to do it again.
No, it is not merely difficult. It is way more than that. It is frustrating and depressing, judging from my own experience attempting to modernize a FORTRAN 77 codebase (I eventually gave up). Navigating through a quagmire of GOTO statements felt like a desperate endeavor. It's hard to fathom how someone could work on such a project for three years, especially as it appears to have been a solo effort.
Upon examining the original FORTRAN code and accompanying papers, I realized that the complexity of the task was far from trivial. Working alone for three years, especially on such a complex codebase, requires a significant amount of dedication and perseverance.
Going to the GitHub issue page and reading the exchange is fascinating. Most of the discussion is about licensing, and through the thread you see it getting resolved!
Also the author's dedication to Prof. Powell seems amazing. I know nothing about Prof. Powell but I am sure he would have much appreciated it.
I have some funding available if someone here would like to undertake the project of getting PRIMA into SciPy, although the person must come to work at my university in Hong Kong, as the funding cannot be used online/abroad. I can open a research assistant or postdoc position for the project. My contact can be found at https://github.com/libprima/prima . Thanks.
I'm surprised not to see a single mention of NLopt [0] in the discussion. I was under the impression most algorithms implemented in NLopt are in C and not using any Fortran77. But I'm sure the author knows something I don't (I've never really looked at NLopt's sources, but I've been using it for years).
Anyhow, the more alternatives the merrier I suppose!
“There do exist "translations" of Powell's Fortran 77 code into other languages. For example, NLopt contains a C version of COBYLA, NEWUOA, and BOBYQA, but the C code in NLopt is translated from the Fortran 77 code straightforwardly, if not automatically by f2c, and hence inherits the style, structure, and probably bugs of the original Fortran 77 implementation.”
I'm very glad to see these algorithms get the maintenance they deserve. I pulled them apart and used lots of their pieces and theory for my PhD; MJD Powell's original code was subtle and difficult to understand. Hats off to those doing to the hard work of keeping them alive. These algorithms are the absolute best in their domain, and should be more widely known.
Michael James David Powell FRS was “a British numerical analyst who was among the pioneers of computational mathematics” [1]. He was an inventor of or early contributor to the quasi-Newton, trust-region, augmented Lagrangian, and SQP methods [1,2]. Each of them is a pillar of modern numerical optimization. He also made significant contributions to approximation theory and methods [1,2,3].
Among numerous honors, Powell was one of the first two recipients of the Dantzig Prize [4] from the Mathematical Programming Society and the Society for Industrial and Applied Mathematics (SIAM), which is considered the highest award in optimization.
See also his Google Scholar profile [5] and the memorial page at [6].
Cool, thanks so much for this info. I only knew the name from scipy, where one of the optimization method options carries his name. Good to learn more … and wow what a contribution.
This is truly impressive! While I'm not well-versed in these specific techniques, I do recall learning about Powell during a mathematics course in the past. It's reassuring to know that capable individuals are maintaining his work.
Some time ago, I attempted to update an antiquated FORTRAN program for a particular project, but regrettably, I had to abandon the effort. Navigating the labyrinth of countless GOTO statements was both challenging and disheartening. I became genuinely concerned about the potential impact on my mental well-being.
That being said, I have a profound admiration for the individual who successfully modernized such a complex FORTRAN project. Their achievement is undoubtedly commendable!
I work in this field. Derivatives (and in fact second derivatives, i.e. a Hessian matrix), when they are reliable and when you have them, make your search faster.
Unfortunately, for many problems they are either unavailable or expensive to compute. (Imagine a case where your “function” is a full CFD simulation or a 2-hour job — every time the optimizer iterates, it has to run an entire simulation. You can’t see the function and you can’t really calculate the derivatives — except crudely through finite differences.)
Derivative-free optimizers choose to forgo derivative information and instead try to efficiently search the space to optimize the objective. Their convergence is often much slower than that of derivative-based optimizers, but they are also much simpler to implement and tend not to get stuck in valleys. All they really are is a way to cleverly evaluate f(x) to find the optimum, without needing f’(x) or f’’(x).
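To make “cleverly evaluate f(x)” concrete, here is a toy derivative-free method sketched in C++: a naive compass/pattern search, which is not one of Powell's algorithms and is purely illustrative. It only ever calls f, never a derivative:

#include <array>
#include <cstdio>
#include <initializer_list>

// Poll +/- step along each coordinate; move if the objective improves,
// otherwise shrink the step. No gradient information is ever used.
template <typename F>
std::array<double, 2> compass_search(F f, std::array<double, 2> x,
                                     double step = 1.0, double tol = 1e-8) {
    double fx = f(x);
    while (step > tol) {
        bool improved = false;
        for (int d = 0; d < 2 && !improved; ++d) {
            for (double s : {step, -step}) {
                std::array<double, 2> y = x;
                y[d] += s;
                double fy = f(y);
                if (fy < fx) { x = y; fx = fy; improved = true; break; }
            }
        }
        if (!improved) step *= 0.5;
    }
    return x;
}

int main() {
    // Toy objective with minimizer (1, -2); in practice f might be a whole simulation.
    auto f = [](std::array<double, 2> v) {
        return (v[0] - 1.0) * (v[0] - 1.0) + (v[1] + 2.0) * (v[1] + 2.0);
    };
    std::array<double, 2> xmin = compass_search(f, {0.0, 0.0});
    std::printf("minimizer ~ (%g, %g)\n", xmin[0], xmin[1]);
}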
Caveat: derivative-free optimizers are terrible for constrained optimization, however (especially if your constraints are anything but simple bounds). You can use barrier methods, but they’re still not great.
Also, there’s an element of luck involved. It’s hard to benchmark derivative-free methods because the methods themselves don’t matter too much (the Nelder-Mead simplex method outperforms Powell’s method in some cases but not others; Powell’s method is well known but isn’t categorically better than any other algorithm): when you don’t use derivatives, higher-order effects like the initial point and the shape of the terrain dominate over the choice of algorithm. I wouldn’t be too hung up on picking the right algorithm — instead I would focus my efforts on heuristics to get to a good initial point cheaply.
According to the papers mentioned by others, the algorithms under discussion do not seem to support this point. The author of the project mentions that the FORTRAN 77 implementation is "nontrivial to understand", which motivates this work.
> Caveat: derivative-free optimizers are terrible for constrained optimization, however (especially if your constraints are anything but simple bounds). You can use barrier methods, but they’re still not great.
LINCOA and COBYLA in the package can handle problems with nontrivial constraints according to the documentation in the git repository, though I do not know how well they work.
Many common functions have discontinuities that are unfriendly to derivatives: logic functions, rounding, conditional clauses, sawtooth functions, etc.
Other derivative-unfriendly functions have flat, constant areas with zero derivative, like logic functions and rounding (again), etc.
Some functions have derivatives but have so many minima (or maxima) that non-optimal or non-acceptable minima are likely to trap learning. Sine and cosine waves, for instance.
Some functions even have repeating discontinuities going through +/-infinity. Tangent and secant.
Or, in some cases, a model of the system (or part of the system) to be optimized may exist and can be used to calculate the system response, but it contains relationships that are not easily accessible, so accurate derivatives are difficult or time-costly to estimate, since there is no symbolic expression that would allow efficient calculation of the derivatives.
Yes sometimes it’s hard to measure a derivative. Eg when doing hyperparameter tuning in ML, you can read out a metric at a given choice of parameters, but it’s generally not easy to get a gradient.
Shameless plug: I happen to have recently written a package for the opposite limit! It finds roots when you can only measure the derivative.
(It was written in another time, don't expect it to be nice. My 20-year-old Octave/Matlab wrappers on that page have almost certainly bit-rotted, don't expect them to work.)
I assume backprop is still most efficient for the neural net itself, rather than one of these algorithms? I am doing a course and I am aware that there are other algorithms than gradient descent but haven’t seen the details yet on that.
I know things like cross entropy loss are designed to be easy to differentiate and probably more efficient to do that than something that approximates it.
Yeah, backprop gives efficient gradient calcs to train at a given net architecture. To find the best architecture, though, people try various other methods, e.g. random search or Gaussian processes, which don’t evaluate derivatives w.r.t. the architecture parameters.
All of the above. Some functions might not have derivatives in some places (where the function is discontinuous), or the derivatives can be very difficult or impossible to calculate even if they exist.
I recently wrote some straightforward typescript code, and I want to find the inputs that make one of the outputs exactly zero. My options are to a) rewrite it in an autograd tool b) manually calculate the derivatives and write a new function for them, or c) just use a root-finder that doesn't need derivatives. I have the cpu cycles to spare, so it will save a huge amount of time to just do c.
And as a tangential gripe, I wish js had operator overloading. In python, I could just call my existing function with a symbolic argument and get a symbolic output to differentiate, because my `x+y*z` code would work on both kinds of arguments.
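To illustrate why operator overloading makes that trick nearly free, here is a minimal forward-mode "dual number" sketch in C++ (the type and the seeding are purely illustrative): with overloaded + and *, the very same x + y*z expression runs unchanged on plain doubles or on derivative-carrying values.

#include <cstdio>

// A value together with its derivative with respect to one chosen input.
struct Dual {
    double v;   // value
    double d;   // derivative
};
Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; }

int main() {
    Dual x{2.0, 1.0};                // seed: dx/dx = 1
    Dual y{3.0, 0.0}, z{5.0, 0.0};   // treated as constants
    Dual f = x + y * z;              // same expression a plain-double version would use
    std::printf("f = %g, df/dx = %g\n", f.v, f.d);   // prints 17 and 1
}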
I'm trying to come up with a simple example. I think that for a bunch of flying hard spheres in a box, any measurement that requires sufficiently time-spaced samples of their positions will have trouble being differentiated.
Sometimes you really don't know how to calculate the derivatives.
But perhaps more often, even if in principle you could do the additional work, formulate how to calculate the derivatives, and add the required extra code, you just don't want to invest the additional resources when you can just call a derivative-free optimization library. The optimization will run slower, but perhaps you have some extra seconds to spare.
If you are using a derivative based optimization, smooth derivatives of your function need to exist.
Trying to use a derivative based optimization where a smooth derivative doesn't exist will likely result in getting stuck at a cliff where there's no change to either the function value or to the gradient.
I work professionally in computational optimization, so here are some of my observations coming from the world of mechanics.
In my experience, it's because the client is trying to reuse an existing simulation code for something that requires optimization. This could be something like imaging (parameter estimation), calibration, or control. For example, say someone has something like a PDE solver that can simulate sound waves going through a medium. If you can match the output of this simulation to some collected data, the parameters of the simulation will tell you what material is where.
Anyway, the issue is that these existing codes are rarely instrumented or derived in a way that makes it easy to get derivatives. Management doesn't want to pay for a rewrite, so they demand that existing equity (the simulator) be used. Then, a derivative-free optimizer (DFO) gets slapped on as a first attempt. To be clear, this doesn't work particularly well, but it tends to be the place where I most often see these algorithms used.
As to why it doesn't work very well, derivative based optimizers can scale to the order of hundreds of millions of variables without too many issues (modulo the kind of problem, but it works very well for a lot of problems of value). DFOs tend to have issues past a few dozen variables.
Now, for anyone not wanting to end up in this conundrum: when you write your original simulation code, parameterize on the floating point type. Basically, don't write the code with double; use some kind of generic, template parameter, or whatever on something we'll call MyFloat. The purpose here is to allow the code to be more easily compatible with automatic differentiation (AD) tools, which will give a real, machine-precision-exact derivative. Don't do finite differences. Don't think reinstrumenting later is easy. It's not. It's a pain. It's expensive.

And, to be clear, AD can be done poorly and it can be slow. However, I've never seen a code instrumented with AD and a good gradient-based optimizer do more poorly than a derivative-free optimization code. They've always done dramatically better. Beyond just giving the derivative, I'll also mention that even if the client wants to recode the simulator to give an analytic derivative for speed, it is vastly easier to do so if the results can be checked against a simulator instrumented with AD. There are also other tricks of value in a properly instrumented code, such as interval arithmetic for some kinds of sensitivity analysis.
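A minimal sketch of what that parameterization can look like, assuming C++ templates; the MyFloat name comes from the comment above, and the function itself is purely illustrative:

// The kernel never hard-codes double, so it can later be instantiated with an
// automatic-differentiation value type (a dual-number-style type) without a rewrite.
template <typename MyFloat>
MyFloat spring_energy(MyFloat k, MyFloat x) {
    // Any math inside must also stay generic (overloads/ADL rather than double-only calls).
    return MyFloat(0.5) * k * x * x;
}

int main() {
    double e = spring_energy(2.0, 3.0);   // plain double today
    // spring_energy<SomeADType>(k, x) later, for machine-precision-exact derivatives
    return e > 0.0 ? 0 : 1;
}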
Anyway, there are other reasons to use derivative-free optimization. Some functions just flat out don't have derivatives. However, in the world of mechanics, most do, and DFOs get abused due to poor prior planning. In either case, please, for the love of math, just parameterize on your floating point type. It would make my job much easier.
Agreed. If (approximate) 1st-order information can be obtained in any way (even by *carefully* deployed finite difference), then gradient-based methods should be used. It is wrong to use DFO in such cases.
> DFOs tend to have issues past a few dozen variables.
It seems that the PRIMA git repo https://github.com/libprima/prima shows results on problems with hundreds of variables. I am not sure how to interpret those results.
Imagine if you were trying to optimize a real physical system you didn't have a model of. These methods will still work; you could implement them "by hand" if necessary.
“Finally, a fortran package not written in Fortran77 but one trying to be approachable and maintainable rather than the product of unhappy PhD students spending many months to eke out a little more in an incomprehensible, Oracle-comparable mess.”
“Renowned for their robustness and efficiency, these solvers are used in a wide spectrum of applications, for instance, aeronautical engineering [25], astronomy [39], computer vision [33], robotics [40], and statistics [5].” These algorithms really do seem to be used in science and engineering.
[5] D. M. Bates, M. Mächler, B. Bolker, and S. Walker. Fitting linear mixed-effects models using lme4. J. Stat. Softw., 67:1–48, 2015.
[25] F. Gallard et al. GEMS: a Python library for automation of multidisciplinary design optimization process generation. In 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Kissimmee, FL, USA, 2018. AIAA.
[33] H. Izadinia, Q. Shan, and S. M. Seitz. IM2CAD. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5134–5143, San Juan, PR, USA, 2017. IEEE.
[39] G. A. Mamon, A. Biviano, and G. Boué. MAMPOSSt: modelling anisotropy and mass profiles of observed spherical systems I. Gaussian 3D velocities. Mon. Not. R. Astron. Soc., 429:3079–3098, 2013.
[40] K. Mombaur, A. Truong, and J. P. Laumond. From human to humanoid locomotion—an inverse optimal control approach. Auton. Robot., 28:369–383, 2010.
I think there are a few derivative-free global optimizers, curious what sets this one apart? see e.g. https://www.microprediction.com/blog/optimize . If the current scipy-native one is unmaintained and this one is superior it certainly makes sense to update.
The proposal isn't a "new" algorithm, but rather a rewrite (plus bug fixes) of the existing SciPy implementation of COBYLA (which I don't think is tested in your linked comparison).
These algorithms work best on smooth functions with low-to-mid hundreds of variables. They are very sample-efficient, often obtaining superlinear convergence using just a linear number of evaluations of the objective function.
The extensive tests are vital in this project. The tests not only verify the faithfulness of the implementation but also check that the solvers behave properly even if they are invoked with improper inputs or encounter failures in function evaluation.
My own focus is to compile all of SciPy with LFortran; we can already fully compile the Minpack package. Here is our recent status update: https://lfortran.org/blog/2023/05/lfortran-breakthrough-now-...