AI video generators have yet to realize their full potential. The most widely used tools, such as Synthesia, InVideo, and Hour One, are closer to a combination of voice-over technology and avatar generators, so they do not fit the strict definition of a text-to-video generator.
Apart from this 'mainstream' approach, which restricts users' freedom in order to improve the accuracy of the output, there are generators such as BasedLabs AI that let users write prompts freely. Their results, however, lack the naturalness and continuity that matter most in video.
This state of affairs, however, is about to change. On February 15, OpenAI introduced 'Sora', an AI video-generation model. The model, which is expected to be released to the public in the latter half of 2024, has caused a sensation. As the report 〈Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models〉, released by Microsoft researchers on the 28th, puts it, 'these advances show the potential of Sora as a world simulator to provide nuanced insights into the physical and contextual dynamics of the depicted scenes.'
In other words, the up-to-one-minute videos produced by Sora contain visual elements that follow the laws of the physical world, which sets them apart from the still images produced by image-generating AI models. This points toward the next generation of image- and video-generating AI models.
Add to this 'Lumiere', Google's space-time diffusion model, which lets users generate videos in the style of a reference image (styled generation) and insert new elements into an existing video (inpainting), and it becomes clear that upcoming video-generating AI models will dramatically raise the quality achievable in a given amount of time.
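To make the inpainting idea more concrete, here is a minimal, illustrative sketch of how diffusion-based inpainting is typically done in general: at each denoising step, the region outside the mask is re-imposed from a noised copy of the original clip, so the model only generates content inside the mask. This is not Lumiere's actual code or API; the functions `noise_schedule` and `denoise_step` are hypothetical placeholders standing in for a real trained model.

```python
# Illustrative sketch of diffusion-based video inpainting (the general
# technique, not Lumiere's implementation). All function names are
# placeholders for what a real diffusion model would provide.
import numpy as np

def noise_schedule(t, T):
    """Toy stand-in for the noise level at step t of T."""
    return t / T

def denoise_step(x, t, T, rng):
    """Placeholder denoiser: a real model would predict and remove noise here."""
    return x - noise_schedule(t, T) * rng.normal(0, 0.01, x.shape)

def inpaint_video(video, mask, T=50, seed=0):
    """Regenerate only the masked region of `video` (frames, H, W, C).

    `mask` is 1 where new content should be generated and 0 where the
    original video must be kept. After every reverse-diffusion step the
    known region is overwritten with a noised copy of the original, so
    the model only has freedom inside the mask.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=video.shape)              # start from pure noise
    for t in reversed(range(1, T + 1)):
        x = denoise_step(x, t, T, rng)            # model refines the sample
        noised_original = video + noise_schedule(t, T) * rng.normal(size=video.shape)
        x = mask * x + (1 - mask) * noised_original   # keep known pixels
    return x

# Tiny usage example on a dummy 8-frame clip with a square hole.
clip = np.zeros((8, 16, 16, 3))
hole = np.zeros_like(clip)
hole[:, 4:12, 4:12, :] = 1.0
result = inpaint_video(clip, hole)
print(result.shape)  # (8, 16, 16, 3)
```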
As astonishing as these advances in video-generating AI are, we cannot ignore that they could be put to deeply troubling use. As experts have warned, the technology could make the spread of misinformation far more effective.
Ultimately, creators' task is not merely to push their creativity as far as it can go; it lies somewhere above that. What 2024 will demand most is reflection on the creator's stance while exploring the landscape of generative AI technology, which Sora is about to reorganize.
References
Diagram Share: The Evolution of Commercial Text-to-Video https://towardsdatascience.com/diagram-share-the-evolution-of-commercial-text-to-video-8726dc01b270
OpenAI, ‘Sora’ https://openai.com/sora
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models https://arxiv.org/abs/2402.17177
How OpenAI’s text-to-video tool Sora could change science – and society https://www.nature.com/articles/d41586-024-00661-0
Google, ‘Lumiere’ https://lumiere-video.github.io/