Show HN: AutoAI – A framework to find the best performing AI/ML model (github.com/blobcity)
41 points by sanketsarang on Nov 12, 2021 | 12 comments



Hi HN, we have seen a lot of AutoML frameworks out there. As a Data Scientist myself, I have refrained from using these because at the end of the day, you have to submit complete source code to your clients, not just a functioning model. That is why we created AutoAI. Given data and a target (the value to predict), it automatically discovers and fully trains the best performing AI solution. Most importantly, it also produces high-quality Jupyter Notebook code for that solution. AutoAI does white-box AutoML, a much-needed feature for Data Scientists. Do give it a try, and let me know what you think.
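The intended workflow is a single call with your data and target, roughly along these lines (a simplified sketch; treat the exact function names here as illustrative rather than the precise API):

    # Illustrative sketch of the AutoAI workflow; function names are placeholders,
    # not necessarily the exact library API.
    import pandas as pd
    import blobcity as bc

    df = pd.read_csv("train.csv")

    # Search across GOFAI models and neural networks, and fully train the best one
    model = bc.train(df=df, target="price")

    # Export a documented Jupyter Notebook containing the end-to-end pipeline code
    model.spill("best_model.ipynb")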


Looks like a nice project. I just bookmarked it to try sometime.

At a previous job, my boss wanted me to spend time on AutoML. I based my work on Google’s AdaNet [1] that did architecture search inside a single TensorFlow session. Unfortunately that project seems to have been abandoned.

[1] https://github.com/tensorflow/adanet


Thanks, Mark. Please do share your experience; we're very keen to hear how you find the project. We are still in beta, so there is plenty of scope for improvement, and we are open to any suggestions you might have.


I noticed a variety of AutoML solutions in the market addressing similar pain points. How is this different from other solutions/platforms? Isn't this a saturated space?


Thanks for your question. Yes, we did research the space a lot before making AutoAI. Here is what we found:

PyCaret: Semi-automatic. You do the first run, then you figure out the next set of runs yourself. Ensemble models require manual configuration.

TPOT: Does a great job, and generates 4-5 lines of Python code too (see the sketch after this list). But it does not support neural networks / DNNs, so it works only for problems where GOFAI works.

H2O.ai: They have an open-source flavor, but the best way to use it is the enterprise version on the H2O cloud. The interface is confusing, and the final output is black-box.

Now there are many in the enterprise category, such as DataRobot, AWS SageMaker, Azure ML, etc. Most are unaffordable to Data Scientists unless your employer is sponsoring the platform.

AutoAI: This is 100% automated. It uses GOFAI, neural networks and DNNs, all in one box. It is 100% white-box. It is the only AutoML framework that generates high-quality Jupyter Notebook code (thousands of lines). You can check some example codes here: https://cloud.blobcity.com
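For comparison, TPOT's code-export step looks like this (a minimal sketch using the public TPOT API; the dataset and file names are placeholders):

    # Minimal TPOT run: evolve an sklearn pipeline, then export it as a short script.
    from tpot import TPOTClassifier
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    tpot = TPOTClassifier(generations=5, population_size=20, random_state=42, verbosity=2)
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))

    # The exported file is a compact sklearn pipeline script, not a full notebook.
    tpot.export("tpot_pipeline.py")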


Your list excludes most of the well-known open-source AutoML tools, such as auto-sklearn, AutoGluon, LightAutoML, MLJarSupervised, etc. These tools have been extensively benchmarked by the OpenML AutoML Benchmark (https://github.com/openml/automlbenchmark) and have published papers, so they are well-known to the AutoML community.

Regarding H2O.ai: Frankly, you don't seem to understand H2O.ai's AutoML offerings.

I'm the creator of H2O AutoML, which is open source, and there's no "enterprise version" of H2O AutoML. The interface is simple -- all you need to specify is the training data and target. We have included DNNs in our set of models since the first release of the tool in 2017. Read more here: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html We also offer full explainability for our models: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/explain.html
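For concreteness, a basic run looks like this (the file and column names below are placeholders):

    # Basic H2O AutoML run: specify only the training frame and the target column.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    train = h2o.import_file("train.csv")

    aml = H2OAutoML(max_models=20, seed=1)
    aml.train(y="response", training_frame=train)

    # Leaderboard of all trained models; aml.leader is the best one.
    print(aml.leaderboard.head())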

H2O.ai develops another AutoML tool called Driverless AI, which is proprietary. You might be conflating the two. Neither of these tools need to be used on the H2O AI Cloud. Both tools pre-date our cloud by many years and can be used on a user's own laptop/server very easily.

Your Features & Roadmap list in the README indicates that your tool does not yet offer DNNs, so either you should update your post here or update your README if it's incorrect: https://github.com/blobcity/autoai/blob/main/README.md#featu...

Lastly, I thought I would mention that there's already an AutoML tool called "AutoAI" by IBM. Generally, it's not a good idea to have name collisions in a small space like the AutoML community. https://www.ibm.com/support/producthub/icpdata/docs/content/...


Thank you for the feedback, Ledell. And congratulations on the latest $100 million fundraise. Really great to see the space growing.


AzureML and its AutoML are not unaffordable. It's literally a free service. You only pay for any compute you consume, and for that you only pay the bare VM price. But you don't have to: you can also use your local compute for training.


That is not true, actually. You must purchase an Azure VM to train a model; deploying trained models on your own infrastructure is permitted. Reasonably speaking, we would have to call this an enterprise solution, as there is no way to get a trained model without paying Azure fees. Most models will require a GPU, and it is not as if we can run AzureML on the free GPU offered by Google Colab.


?

https://docs.microsoft.com/en-us/azure/machine-learning/conc...

Not sure where you get your information from. It says it right here: you can use a "local computer" for training your model, and that includes AutoML. There is no cost to using Azure ML itself.
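The flow the docs describe looks roughly like the following (a sketch from memory of the v1 Python SDK, so treat the parameter names as approximate; it does still assume a configured Azure ML workspace):

    # Rough sketch of a local AutoML run with the Azure ML Python SDK (v1);
    # parameter names are from memory and may not match the current SDK exactly.
    import pandas as pd
    from azureml.core import Workspace, Experiment
    from azureml.train.automl import AutoMLConfig

    # A (free-tier) Azure ML workspace is still needed, even when training locally.
    ws = Workspace.from_config()

    df = pd.read_csv("train.csv")

    config = AutoMLConfig(
        task="classification",
        training_data=df,            # assumption: an in-memory DataFrame is accepted for local runs
        label_column_name="target",
    )

    # With no compute_target specified, the run should use the local machine's resources.
    run = Experiment(ws, "local-automl").submit(config)
    run.wait_for_completion(show_output=True)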


So what we understand from our study is the following:

1. You must have an Azure account with an API key to use their AutoML. An AutoML environment won't get created without a valid key. Both local and cloud runs mandate this.

2. Getting an Azure account requires a credit card on file and comes with a limited-time free trial. That is a big no-no for software claimed to be free to use.

3. The free-for-life services do not call out AutoML anywhere. They do not claim the AutoML environment (a required step) is free in any form. Check this: https://azure.microsoft.com/en-gb/free/

4. When they say "local", what are they referring to? Running locally in an Azure Notebook, or running locally on my laptop? We have tried and never managed to get this to run locally on a laptop, so it is not clear whether this is even possible or whether the term "local" is misleading.

Have you managed to run Azure ML locally on a laptop, without requiring a connected Azure account?

Yes, if we run the AutoML from our laptop, it uses the API key to create a cloud instance. The data gets uploaded to the cloud, training runs on the cloud, and the results are sent back to the local code. We would not call this a local run.

The question is, have you actually managed to use your computer's local resources for training? If so, please do share how this was possible; we would like to know how it was achieved.


This has been my introduction to jupyter and pandas. Thanks!



