
putting the lie to the end of coders


Software developers spend around 35% of their time testing software, so automating this task not only saves time but also increases productivity. Large language models can suggest code, and much has been made of their potential usefulness for unit testing.

GitHub's Copilot, built on OpenAI's Codex model, a descendant of GPT-3, does not explicitly generate unit tests, but it can suggest code snippets for testing.

So, while Copilot can be helpful in generating some initial test cases, it is not a replacement for a comprehensive testing strategy.
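
To make that concrete, here is the kind of snippet an assistant like Copilot might suggest when asked for tests. The function and cases below are purely illustrative, a hypothetical pytest sketch rather than actual Copilot output:

    # Illustrative only: the sort of test an LLM assistant might suggest for a
    # simple function. parse_price and these cases are hypothetical examples.
    import pytest

    def parse_price(text: str) -> float:
        """Convert a price string like '$1,299.99' to a float."""
        return float(text.replace("$", "").replace(",", ""))

    def test_parse_price_plain_number():
        assert parse_price("42") == 42.0

    def test_parse_price_with_symbol_and_commas():
        assert parse_price("$1,299.99") == 1299.99

    def test_parse_price_rejects_garbage():
        # Assistants tend to propose happy-path cases; deciding what should
        # happen for inputs like this one still needs a human.
        with pytest.raises(ValueError):
            parse_price("not a price")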

Microsoft Research, the University of Pennsylvania, and the University of California, San Diego have proposed TiCoder (Test-driven Interactive Coder), which leverages user feedback to generate code based on natural language inputs consistent with user intent. It uses natural language processing and machine learning algorithms to assist developers in generating unit tests.

When a developer writes code, TiCoder asks the coder a series of questions to refine its understanding of the coder’s intent. It then provides suggestions and autocomplete options based on the code's context, syntax, and language. It generates test cases based on the code being written, suggesting assertions and testing various scenarios.
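
A rough sketch of that question-and-answer loop is below. The names and data structures are hypothetical (the real system ranks candidates produced by a large language model); this toy version only shows how user answers prune the candidates and double as tests:

    # Toy sketch of a TiCoder-style loop: propose an input/output pair drawn
    # from a candidate implementation, ask the user whether it matches their
    # intent, and keep only candidates consistent with the answer. All names
    # here are hypothetical, not taken from the TiCoder paper's code.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Candidate:
        source: str                     # candidate implementation, as text
        func: Callable[[int], int]      # executable form used for evaluation

    def refine_by_questions(candidates: List[Candidate],
                            probe_inputs: List[int],
                            ask_user: Callable[[str], bool]) -> List[Candidate]:
        """Prune candidate implementations by asking intent-clarifying questions."""
        for x in probe_inputs:
            if len(candidates) <= 1:
                break
            expected = candidates[0].func(x)   # behaviour of the current front-runner
            question = f"Should the function return {expected!r} for input {x!r}?"
            if ask_user(question):
                # Confirmed behaviour: keep agreeing candidates; the pair
                # (x, expected) also becomes a unit test for free.
                candidates = [c for c in candidates if c.func(x) == expected]
            else:
                candidates = [c for c in candidates if c.func(x) != expected]
        return candidates

    # Example: disambiguating two plausible readings of "increase x by itself".
    survivors = refine_by_questions(
        [Candidate("return x * 2", lambda x: x * 2),
         Candidate("return x ** 2", lambda x: x ** 2)],
        probe_inputs=[3],
        ask_user=lambda q: True,   # stand-in for a real prompt to the developer
    )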

Both Copilot and TiCoder, as well as other LLM-based tools, may speed up the writing of unit tests, but they are fundamentally AI assistants to human coders who check their work, rather than productive AI-based coders in their own right. So is there a better way?

Geoff Hinton of Google points out that we learn to play basketball by throwing the ball so that it goes through the hoop. We don't learn the skill by reading about basketball – we learn by trial and error. That is the core idea behind reinforcement learning, an area of AI that has demonstrated impressive performance in tasks like game-playing. Reinforcement learning systems can be far more accurate and cost-effective than large language models because they learn by doing.
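
A minimal illustration of that learn-by-doing loop, assuming nothing beyond a reward signal (a toy epsilon-greedy bandit, unrelated to any product mentioned here):

    # Toy reinforcement learning: try actions, observe rewards, and shift future
    # choices toward what has worked. No labelled examples are involved.
    import random

    def train_bandit(reward_fn, n_actions=3, episodes=2000, epsilon=0.1):
        values = [0.0] * n_actions     # running estimate of each action's reward
        counts = [0] * n_actions
        for _ in range(episodes):
            if random.random() < epsilon:
                action = random.randrange(n_actions)                     # explore
            else:
                action = max(range(n_actions), key=lambda a: values[a])  # exploit
            reward = reward_fn(action)                 # "did the ball go in?"
            counts[action] += 1
            values[action] += (reward - values[action]) / counts[action]
        return values

    # Action 2 succeeds most often; the learner discovers this by doing.
    print(train_bandit(lambda a: 1.0 if random.random() < (0.2, 0.5, 0.8)[a] else 0.0))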

Diffblue Cover, for example, writes executable unit tests without human intervention, making it possible to automate complex, error-prone tasks at scale.

The product uses reinforcement learning to search the space of possible test methods, write the test code for each method automatically, and select the best of the tests it has written. The reward function is based on several criteria, including test coverage and aesthetics, such as a coding style that looks as if a human had written it. The tool creates a test for each method in about one second on average, and delivers the best test for a unit of code within one or two minutes at most.
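
The generate-score-select idea can be sketched in a few lines. The toy below only illustrates the approach described above, with candidate tests scored by a reward that mixes coverage and a preference for simple, readable values; it is not Diffblue Cover's actual algorithm:

    # Toy version of search-with-a-reward for test generation: each candidate
    # "test" is a set of inputs for the method under test; the reward favours
    # branch coverage first and small, readable values second.
    import random

    def classify(n):
        """Method under test, with three branches a good test should cover."""
        if n < 0:
            return "negative"
        if n == 0:
            return "zero"
        return "positive"

    def reward(inputs):
        coverage = len({classify(n) for n in inputs}) / 3       # fraction of branches hit
        aesthetics = 1.0 / (1 + sum(abs(n) for n in inputs))    # prefer simple values
        return 0.9 * coverage + 0.1 * aesthetics

    def best_candidate(n_candidates=500):
        best, best_score = None, float("-inf")
        for _ in range(n_candidates):
            inputs = [random.randint(-10, 10) for _ in range(3)]  # one candidate test
            score = reward(inputs)
            if score > best_score:
                best, best_score = inputs, score
        return best

    # The winner usually covers all three branches using small, readable values.
    print(best_candidate())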

Diffblue Cover is more similar to AlphaGo, DeepMind's automatic system for playing the game of Go, than to Copilot or TiCoder. AlphaGo identifies areas of a huge search space containing potentially winning moves, then uses reinforcement learning over those areas to select its next move. Diffblue Cover does the same with unit test methods: it comes up with potential tests, evaluates them to find the best one, and repeats the process until it has built a full test suite.

If the goal is to automate the writing of 10,000 unit tests for a program no single person understands, reinforcement learning is the only real solution. Large deep-learning models just can’t compete – not least because there’s no way for humans to effectively supervise them and correct their code at that scale, and making models larger and more complicated doesn’t fix that.

While large language models like ChatGPT have wowed the world with their fluency and depth of knowledge, for precise tasks like unit testing, reinforcement learning is a more accurate and cost-effective solution.

Mathew Lodge (mathew.lodge@diffblue.com) is CEO of Diffblue, an Oxford, UK-based AI startup.


I spoke about this new technology with Ilya Sutskever, a co-founder of OpenAI: about battling hallucinations and the army of human disciplinarians OpenAI uses to make GPT-4 behave.


Even as the US leads in research, China is edging ahead in the age of AI implementation


In the world of spreadsheets and data analysis, a new player has emerged to shake up the game. Akkio, the no-code AI company, has launched Chat Data Prep, a machine learning tool that lets users transform data using ordinary conversational language.

Gone are the days of struggling with complicated formulas and formatting commands in Excel. With Akkio's Chat Data Prep, users can simply type in conversational language to make changes to their spreadsheet data. Leveraging AI and large language models, the platform interprets the user's requests and makes the necessary changes to the data.

According to Jonathon Reilly, co-founder of Akkio, this new method of interacting with data results in a 10-fold reduction in the time it takes to prepare data for analysis. With Chat Data Prep, users can reformat dates, perform time-based math operations, and even fix messy data fields with a simple conversational command.
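
For comparison, the work a single conversational request replaces might look like this when written by hand (a hypothetical pandas equivalent, not Akkio's generated output):

    # What a request such as "reformat the dates, compute days since signup and
    # clean up the revenue column" might translate to by hand. Column names and
    # data are made up for illustration.
    import pandas as pd

    df = pd.DataFrame({
        "signup_date": ["01/15/2023", "02/03/2023", "03/07/2023"],
        "revenue": ["$1,200", "980", "$2,450.50"],
    })

    # Reformat US-style dates into ISO YYYY-MM-DD.
    dates = pd.to_datetime(df["signup_date"], format="%m/%d/%Y")
    df["signup_date"] = dates.dt.strftime("%Y-%m-%d")

    # Time-based math: days elapsed since each signup.
    df["days_since_signup"] = (pd.Timestamp.today() - dates).dt.days

    # Fix a messy currency field: strip symbols and commas, convert to numbers.
    df["revenue"] = df["revenue"].str.replace(r"[$,]", "", regex=True).astype(float)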

Large language models like ChatGPT have paved the way for this breakthrough, providing natural language processing capabilities that generate human-like text.

These models will revolutionize how we interact with and analyze data, making it easier and more efficient than ever before. Chat Data Prep, for example, lets users manipulate large data sets with simple commands instead of wrestling with complex formulas or macros. This can be especially beneficial for people who are not experienced with data analysis but still need to work with large amounts of information.

Since its public release in November, ChatGPT has led to a rush of business applications. A Google Sheets add-on called GPT for Sheets lets users manipulate spreadsheet data with conversational language, and Microsoft says it will integrate ChatGPT's technology across its products.

But Akkio's Chat Data Prep is the first solution to utilize this technology for the preparation and transformation of large volumes of data from any database product. People who make their living as Excel jockeys are impressed.

“If AI can comprehend and set it up, that’s fantastic,” said Edward Godesky, a freelance Excel expert with more than a decade of financial and business analytics experience. “It’s another tool you can use for doing Excel work.”

Data scientists and analysts spend a significant amount of their time on data preparation, with some estimates putting it as high as 80%. Chat Data Prep aims to alleviate this by letting users clean and transform datasets in minutes, freeing more time for insight and modeling.

By using natural language prompts, Chat Data Prep can help to make data analysis more intuitive and user-friendly, allowing anyone to gain insights from their data.

Additionally, Akkio’s product can improve the accuracy and efficiency of data analysis by quickly and easily identifying patterns and trends in large data sets. This can be particularly useful for industries such as finance and healthcare, where large amounts of data need to be analyzed.

By making data analysis more accessible, efficient and accurate, Chat Data Prep has the potential to change almost everything, from how we work with data to how we make important decisions.

Akkio's platform is already being used for a wide range of applications, from lead scoring and revenue forecasting to ad spend optimization and custom-built products. With the addition of Chat Data Prep, the platform now offers machine learning for data preparation as well.


According to professional fundraisers, Democrats relied more heavily on AI to find donors, bringing in more money from individual, small-dollar donors than their Republican rivals.


Agree that learning is hard and AI is no magic bullet. But has there ever been anything as radical as this approach?


Cyral is the most interesting solution I've seen to the cloud security problem.

