OnScreen: AI generated long form sci fi TV show (bengarney.com)
54 points by bengarney 11 months ago | 21 comments



Well that was unexpectedly awesome. It's useful to read even for people doing enterprise AI work, just to understand the workflow that went into improving the output quality.

I've been dreaming about this possibility since about the release of GPT-2; amazing to see someone made it. The current status quo is very dissatisfying: sci-fi is only really made by a handful of huge US networks that insist on filling stories with useless but pervasive, offensive, and ham-fisted attempts at social engineering. Beyond being bad in its own right, it often breaks the script: e.g., you can guess who is going to end up being a good or a bad character just from their race and gender, making it hard for script writers to genuinely surprise you.

That said, I don't think having AI write the scripts from scratch is the right way to go here. The dialogue for the first episode still smells of RLHF, with characters being far too complimentary to each other and having bizarre verbal tics. And is it needed? The world is full of people with smart stories to tell, but we're in an era when reading is in decline. So the most interesting part of this is all the tooling that comes after that point: the rug smoothing, the AI-generated voice acting, and especially the game-engine-based renderer that can generate video from simple instructions. The blog posts sort of glide over that part, I guess due to the author's background in game engine development, but it actually seems the most useful part.

The key here is going to be connecting people with different skills in an open-source or more YouTube-like system that allows people to remix each other's show kits (bibles, 3D objects, scene lots, etc.), so someone who develops a great world can accept fan episodes written with that show kit and then share in the monetization of them. Something like that would make storytelling way more decentralized and allow it to somehow get "back to reality".


You've definitely hit on something here.

> That said, I don't think having AI write the scripts from scratch is the right way to go here. The dialogue for the first episode still smells of RLHF, with characters being far too complimentary to each other and having bizarre verbal tics. And is it needed? The world is full of people with smart stories to tell, but we're in an era when reading is in decline.

I'm not sure that it's right to say that the scripts are written "from scratch" -- the "Bible" for the series is hand-written. From Part 2 of the blog post:

> Episode generation is autonomous, but the show bible is human-made. The prompts and code that control the LLM are human-made, too. Each episode’s output is closely reviewed by humans. Because models often change, and each new episode tends to reveal bugs/weaknesses in the system, prompts get tweaked by humans, too. This is less and less necessary as more episodes are produced.

If the hierarchy goes Bible (series) -> Synopsis (episode summary) -> Script (scene details), then the author is hand-writing #1, and you're suggesting humans hand-write #3.
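
(For concreteness, here's roughly how I picture that hierarchy as code -- a hypothetical sketch using the OpenAI Python client, not the author's actual pipeline; the file names and prompts are made up.)

    # Hypothetical sketch: Bible (human) -> Synopsis (LLM) -> Script (LLM)
    from openai import OpenAI

    client = OpenAI()

    def generate(prompt: str) -> str:
        # One LLM call; the model name is illustrative.
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    bible = open("show_bible.txt").read()   # 1. hand-written by a human
    synopsis = generate(                    # 2. episode summary from the bible
        f"Using this show bible, write a one-page episode synopsis:\n{bible}")
    script = generate(                      # 3. scene-level script from both
        f"Bible:\n{bible}\n\nSynopsis:\n{synopsis}\n\nWrite the full script.")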

> So the most interesting part of this is all the tooling that comes after that point: the rug smoothing, the AI-generated voice acting, and especially the game-engine-based renderer that can generate video from simple instructions. The blog posts sort of glide over that part, I guess due to the author's background in game engine development, but it actually seems the most useful part.

The visualizer / generator certainly is the most novel and useful part of this. I had the same struggles / hangups with the overly-complimentary dialogue in E1 as you did, and it smells strongly of GPT-4. That said, I agree with the author -- this feels like the first "self-hosting" version of this entire pipeline. Steve Newcomb wrote an article on the idea of taking the lessons learned from CI/CD pipelines and applying them to movie development:

https://stevenewcomb.substack.com/p/a-whole-new-way-to-creat...

Now that the OnScreen system is "self-hosting" (maybe not the right analogy) and produces the entire movie when you click "build", it's possible to hand-tune things as needed to realize a vision -- at whatever level of detail and abstraction the author wants -- whether that's at the "Bible" level or somewhere more granular.


Great feedback. Thank you, Mike and HanClinto.

I am planning on doing some more articles/director commentary as it goes along.

I have a number of episodes in the queue and each one is better than the last. My plan is to release an entire season of 12 or so.

The "I'm a GPT that wants everyone to be friends and how" is increasingly better in those episodes.

Even incremental improvements in stuff like background music make a big, big difference.

I really want to do a v2 that is more of a "copilot" than an "AI first" experience. But I need partners to help with funding; I've taken it about as far as I can on a solo basis. The next step is a team of 4-5 people levelling it up. Every piece could be 10x better, and it would be a different beast entirely if that happened. I think there are some super exciting directions this could go.

The vision of a distributed creator system is very interesting, as is letting people do more hands-on writing/rewriting.

If any VCs are reading, I'd love to talk. :)

(PS - Hi Han!)


You should link not to the first episode but a playlist that you update in reverse order, so the best episodes come first. It wasn't clear to me that the quality would improve with each episode, and honestly getting through the first was a bit of a struggle.

How much funding do you think you need for an MVP that's more Copilot-like? I might be interested in taking part in a seed round. Having AI do everything is a fun challenge, but I think the sort of people who would actually pay for a product would want to have some creative control and let the AI handle the parts they don't want to or can't do.

The Minecraft-esque graphics probably aren't an issue, but scaling up to provide all the needed assets probably is. There are AIs that can generate 3D models, but a consistent art style is required for it to work visually, and you provided that here. Finding a way to quickly and cheaply scale the "kitbashing" seems key to any kind of productization.


Good call on the playlist. I'll do that soon. I agree, ep1 is rough.

I shot you an e-mail on a v2. (An MVP would be less; I realized I sent you the pitch for a full v2.)

There are a LOT of art packs out there for a ton of different looks and genres. Building sets is quick and easy even with kitbashing. I think you could synthesize 3D content in a lot of ways (vid2vid, Gaussian diffusion generative models, prop placement by LLM, clever use of Stable Diffusion/Firefly for mattes, etc.) or have a small stable of Fiverr types to make art for people on demand in a specific style...
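
(To make the prop-placement-by-LLM idea concrete, here's a hedged sketch: the model returns set dressing as JSON, which the engine then consumes. The schema and spawn_prop() are made-up stand-ins, not On Screen's actual API, and generate() is the helper from the sketch upthread.)

    # Hypothetical: ask the model for structured set dressing, then hand
    # the parsed JSON to the engine. All names here are illustrative.
    import json

    PROMPT = (
        "Dress this set: 'derelict cargo bay, dim lighting'. Reply with "
        'JSON only: [{"asset": str, "x": float, "y": float, "rot": float}]'
    )

    placements = json.loads(generate(PROMPT))   # generate() as sketched above
    for p in placements:
        spawn_prop(p["asset"], (p["x"], p["y"]), p["rot"])  # engine-side stub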


> I am planning on doing some more articles/director commentary as it goes along.

Speaking for myself, I expect that the behind-the-scenes commentary would be the most interesting part of the project!

> The "I'm a GPT that wants everyone to be friends and how" is increasingly better in those episodes.

How long does the pipeline take to run? (Apologies if this was part of the blog series and I missed it.) Depending on how close the whole process is to a self-running CI pipeline, I think it might be interesting to run benchmarks against various versions of the pipeline and evaluate its performance at each stage. I feel like I could better evaluate the improvement of the "let's make everyone be friends!" writing if I were comparing Episode 1 (compiled w/ v0.3) against Episode 1 (compiled w/ v0.8), instead of Episode 1 vs. Episode 12.

Crazy idea: if one could somehow quantify the quality of consistency, dialogue, camera work, etc., then you might be able to watch the numbers go up in an actual graph sort of way (I'm imagining a multi-agent system where various agents are responsible for monitoring various aspects of script and production quality -- almost like an actor/critic setup).

But at the very least, being able to run an A/B comparison between v0.3 and v0.6 could be very interesting for people interested in the internals.
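
(A hedged sketch of what that per-version scoring might look like, with an LLM as the critic -- compile_episode() and the rubric are hypothetical, and generate() is the same stand-in as in my earlier sketch.)

    # Hypothetical: score the same episode spec compiled with two pipeline
    # versions, using an LLM judge. Rubric, names, and scale are made up.
    RUBRIC = "Rate 1-10 how natural this dialogue sounds. Reply with a number."

    def judge(script_text: str) -> float:
        return float(generate(f"{RUBRIC}\n\nScript:\n{script_text}"))

    for version in ["v0.3", "v0.6"]:
        script = compile_episode("ep1.yaml", pipeline=version)  # stub
        scores = [judge(script) for _ in range(5)]  # average over 5 runs
        print(version, sum(scores) / len(scores))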

> I've taken it about as far as I can on a solo basis. The next step is a team of 4-5 people levelling it up. Every piece could be 10x better, and it would be a different beast entirely if that happened. I think there are some super exciting directions this could go.

I think that's the really cool thing about what you've built here -- it's a complete pipeline, and every piece is present -- even if the pieces aren't in their final form, the fact that you've pieced together an entire pipeline is extremely compelling.

> (PS - Hi Han!)

Hi!! It was a very cool surprise to see your name pop up on my HN feed this morning. :D


I agree, BTS is definitely very interesting.

But last night I had 8 kids (ages 5-15) watch all of Ep1 _AND_ choose to watch Ep2 afterwards. They actually sat and watched, too, instead of having it on in the background... AND they were bummed they couldn't watch the super secret pilot episode (which has MAJOR audio issues; I couldn't bring myself to inflict it on them).

So I think something is there.

I agree, there are some great opportunities to track things somewhat more quantitatively. It takes ~15 minutes and about $10 to generate a script, depending on how fast OpenAI is feeling. So in a real-scale v2 it would be very reasonable to explore this.

Man, I sure hope I get to build this further!


> So I think something is there.

Yes, I think so! That's super encouraging about holding the attention of a room of kids!

> It takes ~15 minutes and about $10 to generate a script, depending on how fast OpenAI is feeling. So in a real-scale v2 it would be very reasonable to explore this.

Yeah -- still a bit large to truly put into a CI pipeline that runs against every commit, tho. :-/

Do you mind sharing your context window size? I always want to use local LLMs for rapid iteration -- I think a 32k window isn't too difficult (Mixtral supports this out of the box, I think?), but I've heard of people pushing 100k tokens locally. Even so, that's peanuts compared to what hosted LLMs are doing, and if quality of writing is your bottleneck, then you wouldn't want to stray too far from GPT-4 / Claude.
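
(If it helps size that up: a quick, hedged way to check whether a bible plus synopsis fits in a 32k window, using tiktoken -- the file names and encoding choice are assumptions.)

    # Rough token count for context-window planning.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding, assumed
    text = open("show_bible.txt").read() + open("synopsis.txt").read()
    n = len(enc.encode(text))
    print(f"{n} tokens; fits in 32k window: {n < 32_000}")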

> Man, I sure hope I get to build this further!

Yeah!! It really feels like you've latched onto a nugget of something here, and I'm excited to see what's next!


For those interested - you can see actual episodes here: https://www.youtube.com/@OnScreenShow

It's interesting to consider "AI as its own genre" rather than "AI replacing mainstream content" - like how cheap animation enabled the anime genre or cheap filmmaking enabled the indie genre.


I was disappointed by the new Black Mirror season, so I asked GPT to write a new episode synopsis, and it was actually more interesting to me than any of the last season's episodes!

Title: "Retrospect"

In the near future, a tech company called "MemorEase" creates a device named "Retrospect", a neuro-implant that allows individuals to vividly relive past memories. The device grows immensely popular, as people enjoy the nostalgic journeys back in time.

The protagonist, Jill, is a middle-aged woman who's struggling with the recent loss of her husband, Max. She decides to get the implant to relive her precious memories with him.

However, as she revisits her past, she starts noticing anomalies - small discrepancies in her memories. Certain scenes play out differently, some events she doesn't remember at all, and in others, Max behaves in ways she doesn't recall.

Jill contacts MemorEase, and they reassure her that Retrospect can't alter memories, it merely reveals them in their truest form. Jill grows paranoid and starts investigating. She finds a forum of other Retrospect users who have experienced similar anomalies.

Jill and her forum friends uncover that Retrospect is actually accessing the collective memory of its users, amalgamating all the memories into a unified version of the past. They find that MemorEase is subtly influencing this collective memory to rewrite history, shaping public opinion and manipulating power dynamics for unknown reasons.

They decide to expose MemorEase but face the dilemma of convincing a society that trusts the "reality" presented by Retrospect more than their own recollections. The episode ends on a suspenseful note, with Jill and her group preparing to disrupt a major MemorEase event, planning to wake the public up to the manipulation they've been subjected to.

https://mleverything.substack.com/p/we-should-just-let-gpt-w...


Heh, that's actually pretty compelling - it sounds like a darker twist on the plot of a Stargate episode I recently rewatched: https://en.m.wikipedia.org/wiki/Revisions_(Stargate_SG-1)


Pretty nice! Curious what prompt you used to produce this?


This is the future. I think that in 3-4 years we'll all be able to generate our own TV shows, share them with friends, and collaborate on getting new seasons made, maybe even sell them. Personally, my entire digital shelf of shows would be bleak dystopian sci-fi.


Sell them? To whom?


To the same people that buy or subscribe to podcasts, music, and series today? I can find your AI-generated series interesting and subscribe to it.


Yeah - I think you don't necessarily go to a Netflix or an MGM (only the very best do), but you could see success the way a lot of smaller podcast content creators do.

10,000 screaming fans can take you a long, long way.


Quite close to this topic: someone did a series of (not so) short fantasy stories with video, all AI-generated:

https://youtube.com/@WardenCinematics


I'd label Warden an "AI maximalist" approach. It also has a lot of merit and is very interesting, but it's much harder to have fine control or tweakability, and much, much harder to do fully hands-off. "Hands off"-ability is an important metric because it's useful not to have to babysit every second of footage.


I once tried to make ChatGPT output a transcript for a TV show called "The Cardassians", about the exploits of Gul Dukat and his wealthy family. I thought it would be funnier than it ended up being.


Writing comedy is really hard! Great concept for a show, tho. TBH it's a good example of something that could be interesting and viable with a tool like On Screen but would never make it as a studio or even indie production.

The scope and target complexity of the series I'm making with On Screen are _dramatically_ cut down from what I started with, and it's still a bit of a stretch for the models at times. I started at DS9/Babylon 5 and ended up at Flash Gordon...


It's good that AI slop is labeled for now, as if it's a novelty worth pursuing. But when this meaningless, demonic horror show takes over most entertainment, people will be less amused.

I, for one, do not look at AI-generated images, listen to AI-generated sounds or music, or watch AI videos; and other than the various bots and shills online in forums like this, I do not interact with AI chatbots. Imagine filling your brain with generated garbage.



