Yesterday Meta released a pretty nice new image generation engine. As with most recent generative AI demos (in my case, connecting from Canada, generative demos from Google, such as Bard, or from Meta cannot be accessed), you will have to connect from one of the validated countries to get access to it. Currently, Imagine with Meta is only accessible from the US.
It is included in the new demo page announced by Meta: "To close out the year, we're testing more than 20 new ways generative AI can improve your experiences across Facebook, Instagram, Messenger, and WhatsApp — spanning search, social discovery, ads, business messaging and more."
In this post we only present the image generation module, but there are many other interesting tools being promoted (see the list here).
Illustration 1: a futuristic car driven by an extraterrestrial entity
As we can see in illustration 2, below, the engine (apparently trained on Instagram pictures) does not easily interpret some complex requests (in this example, the futuristic depiction is not processed). The usual difficulty these models have depicting fingers, which has now been completely eliminated in Midjourney, is still present in this version of the Facebook model.
Illustration 2: some futuristic dressed people in a london pub from the 18th century. In the background we see the streets of london with people
But, even if the semantic content of the sentences is not fully interpreted, the creativity and the aesthetics are impressive, as we can see in illustration 3.
Illustration 3: a modern city in the african jungle where everybody have the head of an animal
Imagine uses Emu, Meta's image foundation model. It is said that Meta used 1.1 billion publicly visible Facebook and Instagram images to train the model. Previously, Meta's version of this technology, using the same data, was only available in messaging and social networking apps such as Instagram.
Ars Technica explains that "if you're on Facebook or Instagram, it's quite possible a picture of you (or that you took) helped train Emu." In a way, the old saying "if you're not paying for it, you are the product" has taken on a whole new meaning. That said, as of 2016 Instagram users were already uploading over 95 million photos a day, so the dataset Meta used to train its AI model is a small subset of its overall photo library. Let's see whether a technique appears in the literature that makes it possible to reverse engineer Emu and its training corpus!
Meta published a research paper on Emu, available here, in which they give a lot of detail on how they constructed and trained their system. The abstract of the paper, with some information on training data and performance, is below. Emu is based on a latent diffusion model with an emphasis on fine-tuning: the approach involves a knowledge-learning (pre-training) stage followed by a quality-tuning stage (see section 3 of the Meta paper); a toy sketch of this two-stage recipe follows the abstract below.
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
Abstract:
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusively generate highly visually appealing images, while maintaining generality across visual concepts. Our key insight is that supervised fine-tuning with a set of surprisingly small but extremely visually appealing images can significantly improve the generation quality. We pre-train a latent diffusion model on 1.1 billion image-text pairs and fine-tune it with only a few thousand carefully selected high-quality images. The resulting model, Emu, achieves a win rate of 82.9% compared with its pre-trained only counterpart. Compared to the state-of-the-art SDXLv1.0, Emu is preferred 68.4% and 71.3% of the time on visual appeal on the standard PartiPrompts and our Open User Input benchmark based on the real-world usage of text-to-image models. In addition, we show that quality-tuning is a generic approach that is also effective for other architectures, including pixel diffusion and masked generative transformer models.
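To make the structure of this approach concrete, here is a minimal, purely illustrative sketch in Python/PyTorch. Everything in it is a hypothetical stand-in (the toy denoiser, the random-latent dataset stubs, the learning rates; text conditioning and the latent autoencoder are omitted). It is not Meta's code and only mirrors the shape of the recipe the paper describes: web-scale pre-training of a latent diffusion model, then a short, low-learning-rate quality-tuning pass on a small curated set.

```python
# Purely illustrative sketch of the two-stage recipe described in the Emu
# paper: web-scale pre-training, then "quality-tuning" on a tiny curated set.
# All names, sizes and hyperparameters are hypothetical stand-ins.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TinyDenoiser(nn.Module):
    """Toy stand-in for the latent-diffusion U-Net."""
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, 64), nn.SiLU(), nn.Linear(64, latent_dim)
        )

    def forward(self, z_noisy: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on the timestep by simple concatenation.
        return self.net(torch.cat([z_noisy, t.unsqueeze(-1)], dim=-1))

def diffusion_step(model, z, opt):
    """One denoising step: corrupt the latents, predict the added noise."""
    t = torch.rand(z.size(0))  # random timesteps in [0, 1]
    noise = torch.randn_like(z)
    z_noisy = (1 - t).unsqueeze(-1) * z + t.unsqueeze(-1) * noise
    loss = nn.functional.mse_loss(model(z_noisy, t), noise)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def train(model, loader, lr, epochs):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (z,) in loader:
            diffusion_step(model, z, opt)

model = TinyDenoiser()

# Stage 1 -- "knowledge learning": a huge, weakly curated corpus
# (1.1 billion image-text pairs in the paper; random latents stand in here).
web_scale = DataLoader(TensorDataset(torch.randn(4096, 16)), batch_size=64)
train(model, web_scale, lr=1e-4, epochs=1)

# Stage 2 -- "quality-tuning": a few thousand exceptionally aesthetic images,
# with a smaller learning rate and few steps so the model keeps its generality.
curated = DataLoader(TensorDataset(torch.randn(2048, 16)), batch_size=16)
train(model, curated, lr=1e-5, epochs=2)
```

The point of the sketch is the asymmetry between the two training calls: the paper's central claim is that a few thousand carefully selected images, fine-tuned gently, are enough to steer a model pre-trained on 1.1 billion pairs toward consistently aesthetic outputs without losing generality.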
You can experiment with the tool (from a validated geographic location) at imagine.meta.com.
In recent years, all my energy was devoted to the research groups I had the pleasure of building and leading in several organizations. Writing to make complex subjects accessible to everybody (the second facet of my profession) was put on pause or reserved for more restricted audiences. I missed that and wanted to get back to writing! So I've decided to start 2024 by setting aside some time to share my scientific and technological interests with you through a new publication: this blog.
Here we’ll be talking about new Artificial Intelligence technologies, and in particular generative models. Generation is a subject I’m particularly fond of: my academic career has been largely concerned with it, whether it’s my PhD thesis on text generation, or the GITAN project at the École Polytechnique, which was dedicated to the transition from text to image.
But we’re not just talking about generation: implementing AI technologies in applications that can be used by a wide range of populations, whether employees or customers, is a challenge in itself, and I’ve had the good fortune to work on these aspects in my industrial research groups (my experience is presented here ). The need for new skills, new ways of doing things, new difficulties are key (and novel) difficulties to deploy Artificial Intelligence in organizations. We’re talking about change management, talent management and cultural change. On these subjects, too, I’d like to share my experience here.
The new generative technologies undoubtedly represent a paradigm shift that will profoundly transform our societies: for the first time, algorithms are capable of processing historically human media that resisted automation, such as language or images, by simulating certain aspects of reasoning or creativity. Until now, computer science and its palette of algorithmic solutions was largely finite, making it difficult to offer applications in naturally human domains: this is no longer the case. Where AI and machine learning algorithms used to classify or detect, they are now capable of transforming (a document into a document, an image into text or text into image, a description, an idea…): this ability to transform is fundamentally disruptive and new. It seems to me that this is where AI's industrial revolution lies, and this is what I'd like to discuss with you.

Next!