EMO Project: generation of videos from a person's photo


We are beginning to see, with amazement, some of the things that Artificial Intelligence can do when applied to our everyday devices. The results are truly surprising, and they are only the tip of the iceberg. A good example is the EMO Project, which generates expressive videos from photos and portraits.

In other words: it is a technology that gives life to static photos, giving them sound and movement. In this post we are going to explain what this idea consists of with some interesting examples.

What is the EMO Project?

EMO is short for Emote Portrait Alive, a project developed by Linrui Tian, Qi Wang, Bang Zhang and Liefeng Bo, four researchers at the Institute of Intelligent Computing, which is part of the Chinese technology and business conglomerate Alibaba.

In the words of its creators, it is an expressive audio-driven portrait-video generation system. That is a fairly accurate summary of what the EMO Project is capable of: taking the image of a person and giving it expression, voice and movement. It seems like magic.

These are not simple animation tricks that any app could do, but rather meticulous and high precision work which is reflected in a wide range of facial expressions, as well as head and lip movements. Added to this is the audio, which also determines the form these movements take.

On the other hand, the generated videos are not limited to a fixed duration: their length depends on the length of the input audio that drives them.

How It Works

The operation of this incredible tool is explained in detail on the project website. The method is structured in two distinct phases:

    1. Initial encoding phase, in which all aspects of the starting (or reference) image are analyzed in order to determine what movement and animation can be applied to it.
    2. Processing phase, in which a pre-trained audio encoder processes the audio embedding, while a facial region mask guides the generation of the face.

Some details of this process are worth highlighting: it is designed to eliminate noise while preserving the identity of the character. In addition, temporal modules are used to adjust the duration of the video and the speed of movement.
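As a rough illustration only, the two phases described above can be sketched in Python. Everything here is hypothetical (the function names, shapes, and the simple arithmetic standing in for neural networks); the real EMO is a diffusion-based model, and this is a conceptual sketch, not its actual implementation:

```python
import numpy as np

def encode_reference(image: np.ndarray) -> np.ndarray:
    """Phase 1 (stand-in): extract identity/appearance features
    from the reference portrait. A real system would use a neural
    encoder; here we just flatten and truncate the pixels."""
    return image.reshape(-1)[:64]

def encode_audio(audio: np.ndarray, n_frames: int) -> np.ndarray:
    """Stand-in for the pre-trained audio encoder: produce one
    scalar 'embedding' per output video frame by averaging chunks."""
    hop = max(1, len(audio) // n_frames)
    return np.array([audio[i * hop:(i + 1) * hop].mean()
                     for i in range(n_frames)])

def generate_video(image: np.ndarray, audio: np.ndarray,
                   fps: int = 25, seconds: int = 2) -> np.ndarray:
    # The number of frames follows the audio, not a fixed cap.
    n_frames = fps * seconds
    identity = encode_reference(image)          # phase 1
    audio_emb = encode_audio(audio, n_frames)   # phase 2: audio conditioning
    # Phase 2 (stand-in): each frame is the identity features
    # modulated by that frame's audio embedding.
    frames = [identity * (1 + a) for a in audio_emb]
    return np.stack(frames)

video = generate_video(np.random.rand(64, 64, 3), np.random.rand(16000))
print(video.shape)  # (50, 64): 50 frames, each driven by the audio
```

The point of the sketch is the data flow: the reference image is encoded once to fix the identity, while the audio is encoded per frame and modulates every generated frame, which is why the output length tracks the audio.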

The results we are going to present below (making each portrait talk or even sing) can only be described as spectacular. The AI achieves levels of realism capable of completely deceiving us, which is rather disturbing, really.

EMO Project: some examples

Let's look at some examples of what this technology can achieve. As you will see, we can use the image of a real person or one generated by AI. We can make it move and gesture, speak in whatever language we want (saying whatever we want it to say) and even sing. True marvels.

Here are some photos that the EMO Project makes talk. The actress Audrey Hepburn comes to life to speak about people's right to cry and express their feelings:

You can also make characters who are not flesh and blood speak. Here we have Leonardo da Vinci's Mona Lisa, into whom the EMO Project has breathed life to recite Rosalind's monologue from the play "As You Like It" by William Shakespeare:

Curiously, we can also take images of real actors and make them say anything. In this case, we see Joaquin Phoenix in his famous role as the Joker, but delivering a text that belongs to a different film, The Dark Knight.

Now let's move on to the world of music. In the following example, an AI-generated character named SORA performs the song «Don't Start Now» by Dua Lipa. The result is astonishingly human:

Finally, we present a very young Leonardo DiCaprio singing the song «Godzilla» by rapper Eminem:

Ethical and legal considerations

The use (or rather, the misuse) of Artificial Intelligence is currently at the center of public debate. It is a disruptive technology whose limits and possibilities we cannot yet fully glimpse and which, in the wrong hands, could have negative consequences in many different areas.

To cover their backs, the EMO Project page makes it very clear that all its tests and creations are intended solely for academic research and effect demonstration. There is no need to look for ulterior motives. Even so, a technology that achieves such a degree of precision and realism poses a real danger in the hands of anyone wishing to use it for fraud, identity theft and other crimes.

