
And another thing that irks me: none of these video generators get motion right...

Especially anything involving fluid/smoke dynamics, or fast dynamic movements of humans and animals, suffers from the same weird motion artifacts. I can't describe it other than that the fluidity of the movement is completely off.

And since all the genAI video tools I've used suffer from the same problem, I wonder if it's somehow inherent to the approach and unsolvable with current model architectures.




I think one of the biggest problems is that the models are trained on 2D sequences and don't have any understanding of what they're actually seeing. They see some structure of pixels shift in a frame and learn that certain 2D structures should shift over time. They don't actually understand that the images are a 2D capture of an event that occurred in four dimensions, and that the thing that's been imaged is under the influence of unimaged forces.

I saw a Santa dancing video today and the suspension of disbelief was almost instantly dispelled when the cuffs of his jacket moved erratically. The GenAI was trying to get them to sway with arm movements but because it didn't understand why they would sway it just generated a statistical approximation of swaying.

GenAI also definitely doesn't understand 3D structures, as is easily demonstrated by completely incorrect morphological features. Even my dogs understand gravity: if I drop an object they're tracking (food), they know it should hit the ground. They also understand 3D space: if they stand on their back legs, they can see over things or get a better perspective.

I've yet to see any GenAI that demonstrates even my dogs' level of understanding the physical world. This leaves their output in the uncanny valley.


They don't even get basic details right. The ship in the 8th video changes with every camera change and birds appear out of nowhere.


As far as I can tell it's a problem with CGI in general. Whether you're using precise physics models or learned embeddings from watching videos, reproducing certain physical events is computationally very hard, whereas recording them just requires a camera (and of course setting up the physical world to produce what you're filming, or getting very lucky).

The behind-the-scenes material from House of the Dragon has a very good discussion of this from the art directors. After a decade and a half of specializing in it, they have yet to find any convincing way to create fire other than to actually create fire and film it. This isn't a limitation of AI and it has nothing to do with intelligence; a human can't convincingly animate fire, either.

It seems to me that discussions like this from the optimist side always miss this distinction, and it's part of why I think Ben Affleck was absolutely correct that AI can't replace filmmaking. Regardless of the underlying approach, computationally reproducing what the world gives you for free is simply very hard, maybe impossible. The best rendering systems out there come nowhere close to true photorealism over arbitrary scenarios and probably never will.


What's the point of poking holes in new technology and nitpicking like this? Are you blind to the immense breakthroughs made today, and yet you focus on what irks you about some tiny detail that might go away after a couple of versions?


At this phase of the game a lot of people are pretty accustomed to the pace of technological innovation in this space, and I think it's reasonable for people to have a sense of what will/won't go away in a few versions. Some of Sora's issues may just require more training; some are intrinsic to their approach and will not be solvable with their current method.

To that end, it is actually extremely important to nit-pick this stuff. For those of us using these tools, we need to be able to talk shop about which ones are keeping up, which ones work like shit in practice, which ones work but only in certain situations, and which situations those are.


Neural networks use smooth manifolds as their underlying inductive bias, so in theory it should be possible to incorporate smooth kinematic and Hamiltonian constraints, but I am certain no one at OpenAI actually understands enough of the theory to figure out how to do that.
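
To be concrete about the flavor of the idea (a toy sketch only, nothing to do with video generation): Greydanus et al.'s Hamiltonian Neural Networks learn a scalar H(q, p) and take the dynamics from its symplectic gradient, so the learned flow conserves the learned energy by construction. All names and shapes below are illustrative, not anything OpenAI actually does:

    # Toy Hamiltonian network, in the spirit of Greydanus et al. (2019).
    # q, p are observed positions/momenta (plain data tensors, no grad history).
    import torch
    import torch.nn as nn

    class ToyHNN(nn.Module):
        def __init__(self, dim):
            super().__init__()
            # Scalar energy function H(q, p).
            self.energy = nn.Sequential(
                nn.Linear(2 * dim, 64), nn.Tanh(),
                nn.Linear(64, 1),
            )

        def time_derivatives(self, q, p):
            x = torch.cat([q, p], dim=-1).requires_grad_(True)
            H = self.energy(x).sum()
            dH = torch.autograd.grad(H, x, create_graph=True)[0]
            dHdq, dHdp = dH.chunk(2, dim=-1)
            # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq.
            return dHdp, -dHdq

    model = ToyHNN(dim=3)
    q, p = torch.randn(8, 3), torch.randn(8, 3)
    dq_dt, dp_dt = model.time_derivatives(q, p)
    # Training regresses these against observed derivatives; integrating the
    # learned field then keeps H roughly constant, i.e. no phantom mass/energy.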


> I am certain no one at OpenAI actually understands enough of the theory to figure out how to do that

We would love to learn more about the origin of your certainty.


I don't work there, so I'm certain there is no one with enough knowledge to make it work with Hamiltonian constraints: the idea is very obvious but they haven't done it because they don't have the wherewithal to do so. In other words, no one at OpenAI understands enough basic physics to incorporate conservation principles into the generative network so that objects with random masses don't appear and disappear on the "video" manifold as it evolves in time.


> the idea is very obvious but they haven't done it because they don't have the wherewithal to do so

Fascinating! I wish I had the knowledge and wherewithal to do that and become rich instead of wasting my time on HN.


No one is perfect but you should try to do better and waste less time on HN now that you're aware and can act on that knowledge.


Nah, I'm good. HN can be a very amusing place at times. Thanks, though.


How does your conclusion follow from your statement?

Neural networks are largely black box piles of linear algebra which are massaged to minimize a loss function.

How would you incorporate smooth kinematic motion in such an environment?

The fact that you discount the knowledge of literally every single employee at OpenAI is a big signal that you have no idea what you’re talking about.

I don’t even really like OpenAI and I can see that.


I've seen the quality of OpenAI engineers on Twitter and it's easy enough to extrapolate. Moreover, neural networks are not black boxes; you're just parroting whatever you've heard on social media. The underlying theory is very simple.


Do not make assumptions about people you do not know in an attempt to discredit them. You seem to be a big fan of that.

I have been working with NLP and neural networks since 2017.

They aren’t just black boxes, they are _largely_ black boxes.

When training an NN, you don't have great control over which parts of the model do what, or how.

Now instead of trying to discredit me, would you mind answering my question? Especially since, as you say, the theory is so simple.

How would you incorporate smooth kinematic motion in such an environment?


Why would I give away the idea for free? How much do you want to pay for the implementation?


cop out... according to you, the idea is so obvious it wouldn't be worth anything.


lol. Ok dude you have a good one.


You too but if you do want to learn the basics then here's one good reference: https://www.amazon.com/Hamiltonian-Dynamics-Gaetano-Vilasi/d.... If you already know the basics then this is a good followup: https://www.amazon.com/Integrable-Hamiltonian-Systems-Geomet.... The books are much cheaper than paying someone like me to do the implementation.


Seriously... The ability to identify what physics/math theories the AI should apply and being able to make the AI actually apply those are very different things. And you don't seem to understand that distinction.


Unless you have $500k to pay for the actual implementation of a Hamiltonian video generator then I don't think you're in a position to tell me what I know and don't know.


lolz, I doubt very much anyone would want to pay you $500k to perform magic. Basically, I think you are coming across as someone who is trying to sound clever rather than being clever.


My price is very cheap in terms of what it would enable and allow OpenAI to charge their customers. Hamiltonian video generation with conservation principles, where phantom masses don't appear and disappear out of nowhere, is a billion-dollar industry, so my asking price is basically giving away the entire industry for free.


Sure, but I imagine the reason you haven't started your own company to do it is you need 10s of millions in compute, so the price would be 500k + 10s of millions... Or you can't actually do it and are just talking shit on the internet.


I guess we'll never know.


Yeah I mean I would never pay you for anything.

You’ve convinced me that you’re small and know very little about the subject matter.

You don’t need to reply to this. I’m done with this convo.


Ok, have a good one dude.


There are physicists at OpenAI. You can verify with a quick search. So someone there clearly knows these things.


I'd be embarrassed if I were a physicist and my name was associated with software that had phantom masses appearing and disappearing into the void.


Why don't you write a paper or start a company to show them the right way to do it?


I don't think there is any real value in making videos other than useless entertainment. The truly inspired use of computation and AI is to cure cancer; that would be the right way to show the world that this technology is worthwhile and useful. The techniques involved would be the same, because one would need to include real physical constraints like conservation of mass and energy, instead of figuring out the best way to flash lights on the screen with no regard for any foundational physical principles.

Do you know anyone or any companies working on that?



