NuraML - Teaser Demo

Why can't I give a computer program the task of making videos such as:

The current state of the art in text-to-video is that, given a prompt, a program can generate a brief (10-30 second) visual representation of the content the prompt describes (see Runway, for example).

The next evolutionary step would be a system that can work out how to combine a succession of these shorter clips with generated audio into a longer video, mimicking human-made content such as podcasts and YouTube videos, and eventually movies and TV shows. That is what this project aims to be.
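To make the idea of chaining short clips concrete, here is a minimal sketch of one way a longer video could be planned as a sequence of clips that each stay within the length a text-to-video model can produce. It is not how this project is implemented: `Clip`, `plan_clips`, and `MAX_CLIP_SECONDS` are hypothetical names, the 30-second cap is simply the upper end of the range mentioned above, and audio generation is left out entirely.

```python
from dataclasses import dataclass

# Assumption: upper end of the 10-30 second range quoted above.
MAX_CLIP_SECONDS = 30.0


@dataclass
class Clip:
    index: int        # position of the clip in the final video
    start: float      # offset into the final video, in seconds
    duration: float   # length of this clip, in seconds
    description: str  # prompt a (hypothetical) clip generator would receive


def plan_clips(scene_descriptions, seconds_per_scene, max_clip=MAX_CLIP_SECONDS):
    """Lay scenes end to end, splitting any scene longer than max_clip."""
    if max_clip <= 0:
        raise ValueError("max_clip must be positive")
    clips = []
    cursor = 0.0
    for description, total in zip(scene_descriptions, seconds_per_scene):
        remaining = total
        while remaining > 1e-9:
            duration = min(max_clip, remaining)
            clips.append(Clip(len(clips), cursor, duration, description))
            cursor += duration
            remaining -= duration
    return clips


if __name__ == "__main__":
    for clip in plan_clips(["intro", "fact 1"], [12.0, 70.0]):
        print(clip)
```

In this sketch a 70-second scene becomes three clips (30, 30, and 10 seconds), so the total duration is bounded only by the number of scenes, not by the clip generator.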

Below is a short showcase of videos generated by a preliminary text-to-movie system that satisfies the following constraints:

  • Uncapped video duration
  • Human & nonhuman characters
  • Visual animations
  • Visual character consistency (from a single reference image)
  • Music & SFX
  • Auditory character consistency (from many reference speech samples)

Showcase

Educational video

Input

Prompt

10 interesting facts about the ocean

Actor(s)

  • Lex Fridman

Output

Podcast

Input

Prompt

Brief Lex Fridman podcast with old present-day Steve Jobs about AI.

Actor(s)

  • Lex Fridman
  • Steve Jobs

Output

Commercial

Input

Prompt

Funny commercial selling helpful but deadly robots

Actor(s)

  • Tom Cruise

Output