NuraML - Teaser Demo
Why can't I give a computer program the task of making videos such as:
- "Show me how to fix my car"
- "Make a TV show with my friends and me as the characters"
- "Create a podcast with Bill Gates and Napoleon Bonaparte"
The current state of the art in text-to-video is that, given a prompt, a program can generate a brief (10-30 second) visual representation of the content the prompt describes (see Runway, for example).
The next evolutionary step would be a system that works out how to combine a sequence of these shorter clips with generated audio to produce a longer video, mimicking human-made videos such as podcasts and YouTube videos, and eventually movies and TV shows. That is what this project aims to be.
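The idea of chaining short clips into a longer video can be sketched as a simple planning step. The following is a minimal illustration, not NuraML's actual implementation: `ClipSlot` and `plan_clips` are hypothetical names, and the 30-second default cap is an assumption taken from the upper end of the 10-30 second range mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClipSlot:
    """One short segment of the full video (hypothetical structure)."""
    index: int
    start: float  # seconds from the start of the full video
    end: float    # seconds from the start of the full video


def plan_clips(total_seconds: float, max_clip_seconds: float = 30.0) -> List[ClipSlot]:
    """Split a target duration into consecutive slots no longer than the cap.

    Each slot would be filled by one call to a short-form text-to-video
    model; the generated audio would then be laid over the full timeline.
    """
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")

    slots: List[ClipSlot] = []
    start = 0.0
    while start < total_seconds:
        # The final slot is shortened so the plan ends exactly on time.
        end = min(start + max_clip_seconds, total_seconds)
        slots.append(ClipSlot(index=len(slots), start=start, end=end))
        start = end
    return slots
```

For example, `plan_clips(75)` yields three slots covering 0-30, 30-60, and 60-75 seconds. A real system would more likely cut on scene or dialogue boundaries than on fixed time intervals; fixed intervals are used here only to keep the sketch short.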
Below is a simple showcase of a couple of different videos generated by a preliminary text-to-movie system, satisfying the following constraints:
- Uncapped video duration
- Human & nonhuman characters
- Visual animations
- Visual character consistency (from a single reference image)
- Music & SFX
- Auditory character consistency (from many reference speech samples)
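One way to picture the inputs these constraints imply is a small request structure: a prompt plus a list of actors, each with one reference image and several speech samples. This is a sketch under stated assumptions, not the project's real interface: `Actor`, `VideoRequest`, `validate`, and the `MIN_SPEECH_SAMPLES` threshold are all hypothetical, and the file paths in the usage note are placeholders.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: "many" speech samples is modelled as "at least two".
MIN_SPEECH_SAMPLES = 2


@dataclass
class Actor:
    """A character to appear in the video (hypothetical structure)."""
    name: str
    image_ref: str  # the single reference image for visual consistency
    speech_samples: List[str] = field(default_factory=list)  # reference audio files


@dataclass
class VideoRequest:
    """A prompt plus the actors who should appear (hypothetical structure)."""
    prompt: str
    actors: List[Actor]


def validate(request: VideoRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: List[str] = []
    if not request.prompt.strip():
        problems.append("prompt is empty")
    for actor in request.actors:
        if not actor.image_ref:
            problems.append(f"{actor.name}: missing reference image")
        if len(actor.speech_samples) < MIN_SPEECH_SAMPLES:
            problems.append(f"{actor.name}: too few speech samples")
    return problems
```

A request for the educational example in the showcase might then look like `VideoRequest("10 interesting facts about the ocean", [Actor("Lex Fridman", "lex.png", ["lex_01.wav", "lex_02.wav"])])`, for which `validate` returns an empty list.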
Showcase
Educational video
Input
Prompt
10 interesting facts about the ocean
Actor(s)
- Lex Fridman
Output
Podcast
Input
Prompt
Brief Lex Fridman podcast with old present-day Steve Jobs about AI.
Actor(s)
- Lex Fridman
- Steve Jobs
Output
Commercial
Input
Prompt
Funny commercial selling helpful but deadly robots
Actor(s)
- Tom Cruise