
One trick that works surprisingly often when you don't understand the root cause: scale the number of batches rather than the batch size (think merge sort or map/reduce).

If 10% of your batches flake out, you rerun just those and expect about 1% to need a third run, and so on. The slowest stragglers take roughly log(number of batches) rounds to all succeed, giving you an O((batch run time) * log(number of batches)) expected wait until you can merge/reduce. Since the log factor grows slowly, making the batches smaller (and more numerous) shortens that wait, though it can also mean worse merge time, so it doesn't always pay off; YMMV, etc.
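Here's a minimal sketch of the retry loop in Python. Everything in it is illustrative: run_batch stands in for your real (flaky) batch job, the merge step is just a sum, and the 10% failure rate is an assumption, treated as independent per batch per attempt.

    import random

    FLAKE_RATE = 0.10  # assumed independent failure probability per attempt

    def run_batch(batch):
        # Stand-in for the real flaky job; returns None on failure.
        if random.random() < FLAKE_RATE:
            return None
        return sum(batch)  # placeholder for the actual per-batch work

    def run_all(batches):
        # Run every batch, rerunning only the failures each round,
        # until all batches have succeeded.
        results = {}
        pending = dict(enumerate(batches))
        rounds = 0
        while pending:
            rounds += 1
            still_pending = {}
            for i, batch in pending.items():
                out = run_batch(batch)
                if out is None:
                    still_pending[i] = batch  # flaked; rerun next round
                else:
                    results[i] = out
            pending = still_pending
        return [results[i] for i in range(len(batches))], rounds

    data = list(range(1000))
    batches = [data[i:i + 10] for i in range(0, len(data), 10)]
    partials, rounds = run_all(batches)
    total = sum(partials)  # the merge/reduce step
    print(total, rounds)   # rounds is ~log10(number of batches) in expectation

With a 10% flake rate the number of rounds grows like log base 10 of the batch count, so even thousands of batches typically finish in three or four rounds, which is where the log factor in the bound above comes from.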



