
Synthetic data can never contain more information than the statistical model from which it is derived: it is simply the evaluation of a non-deterministic function on the model parameters. And the model parameters are simply a function of the training data.
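Put in information-theoretic terms, this is essentially a data-processing-inequality argument (a rough formalisation, with D, θ and S standing for the training data, the fitted parameters and the synthetic samples):

    % Read the pipeline as a Markov chain:
    %   training data D -> fitted parameters \theta -> synthetic samples S
    D \longrightarrow \theta \longrightarrow S
    % By the data processing inequality, the samples can carry no more
    % information about the training data than the parameters already do:
    I(D; S) \le I(D; \theta)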

I don't see how you can "bootstrap a smarter model" based on synthetic data from a previous-gen model this way. You may as well just train your new model on the original training data.




It's already been proven possible: https://arxiv.org/abs/2203.14465 (STaR: Bootstrapping Reasoning With Reasoning)


>Synthetic data can never contain more information than the statistical model from which it is derived: it is simply the evaluation of a non-deterministic function on the model parameters. And the model parameters are simply a function of the training data.

The information in the data isn't just in the outputs themselves but in their rate of occurrence/distribution. If your base model has only learnt enough to produce the occasional flash of brilliance, say 1 out of every 40 responses, and you can filter for those responses and generate as many as you like, then you can very much 'bootstrap a better model' by training on the filtered results (a sketch of that loop is below). You are only getting a pure function of the model's parameters if you train on its unfiltered, unaltered output.
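A minimal sketch of that filter-then-train loop, in the spirit of rejection sampling / self-training. The names here (model.generate, verifier, finetune) are placeholders supplied by the caller, not any particular library's API:

    # Sample many responses per prompt, keep only the ones a verifier accepts,
    # then fine-tune on the kept pairs. The verifier is where the extra
    # information comes from: it adds a quality signal that the raw output
    # distribution on its own doesn't carry.

    def build_filtered_dataset(model, prompts, verifier, samples_per_prompt=40):
        kept = []
        for prompt in prompts:
            for _ in range(samples_per_prompt):
                response = model.generate(prompt)
                if verifier(prompt, response):
                    kept.append((prompt, response))
        return kept

    def bootstrap(model, prompts, verifier, finetune, rounds=3):
        # Each round trains on the model's own filtered "1-in-40" successes,
        # shifting the next model's output distribution toward them.
        for _ in range(rounds):
            data = build_filtered_dataset(model, prompts, verifier)
            model = finetune(model, data)
        return model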



