Alibaba's AI painting model enters internal testing, giving the big tech companies a jolt

Author: Du Wei, Zenan

**Who was the star of this year's World Artificial Intelligence Conference (WAIC)? Large AI models, without question.**

Over the three-day conference, companies and institutions unveiled more than 30 large models.

Language models were, of course, central to this feast of large models, but there were also image-generation models, which tend to deliver the most striking visuals. And in AI painting, another Chinese player has now entered the field.

Three months after releasing its large language model Tongyi Qianwen, Alibaba has launched a large model for AI painting, built on Composer, a compositional generative model developed in-house.

At WAIC on July 7, Alibaba Cloud's Tongyi family of large models unveiled its newest member, "Tongyi Wanxiang".

*Tongyi Wanxiang made its debut at Alibaba Cloud's WAIC themed forum "MaaS: A New Paradigm for Model-Centric AI Development".*

Its text-to-image output looks like this, and generation is very fast.

Tongyi Wanxiang can also re-render an original image in a different, specified style.

There is also a "nesting doll" mode, which generates a set of similar pictures from an original picture.

Alibaba said Tongyi Wanxiang can generate images from both text and images, which can help people create images and greatly lower the barrier to image design. In the future, it could also be applied in scenarios such as art design, games and cultural and creative products.

The model is currently open for invitation-only testing.

Before ChatGPT took off, AI painting was the hottest topic in AI. Diffusion models pushed generative AI a big step forward, and for a while a wave of models appeared that turn text into images in all kinds of styles. Then came ways to generate images from images and to convert images into a specified style, which dazzled people and made them marvel at the magic of generative AI.

That Alibaba chose WAIC's big stage to launch a tool that generates images from both text and images suggests it is very confident in the results.

Once Synced got access, we naturally tried it out right away.

Hands-on with Tongyi Wanxiang: versatile modes, polished results in one shot

Does this new member of the Tongyi family change anything in AI painting? Let the results speak.

Tongyi Wanxiang currently offers three functions: text-to-image generation, similar-image generation and image style transfer.

First, standard text-to-image generation. You can choose from styles such as watercolor, oil painting, Chinese painting, flat illustration, anime, sketch and 3D cartoon. After you enter a text description and choose a style, the AI automatically generates a creative picture. For convenience, the output aspect ratio has three options: 1:1, 16:9 and 9:16.

We begin with something simple: a phrase from "Tianjingsha · Autumn Thoughts" by Ma Zhiyuan, one of the Four Great Masters of Yuan qu, describing "small bridges, flowing water, houses", with "Chinese painting" as the style.

Tongyi Wanxiang produced paintings full of classical charm and rich in detail, and even added elements that were not in the description, such as distant mountains and ducks swimming in the water.

Next we switched to two other styles, "sketch" and "oil painting". Tongyi Wanxiang moves freely between styles, and the sketches and oil paintings are equally impressive. It is no exaggeration to say these pictures are good enough to use as-is.

Another prompt, "a cat in a spacesuit, space, travel, starry sky", was rendered in the "anime" and "3D cartoon" styles. The results speak for themselves; the cats in the 3D cartoon set are especially cute.

Top: anime; bottom: 3D cartoon

This made us want to compare Tongyi Wanxiang with the well-known Stable Diffusion. We translated the same description into English, "cat in a spacesuit, space, travel, starry sky", appended "style of 3D carton", and got the following pictures.

Somewhat unexpectedly, Tongyi Wanxiang won this round. Stable Diffusion's cats were either too abstract or too realistic, and none showed a 3D cartoon style.

Since simple prompts pose no challenge for Tongyi Wanxiang, let's make things harder.

This time we used a longer prompt, "a Japanese girl with straight brown hair, fair skin, wearing a dress, lace and bow, carrying a small bag, smiling", in the "anime" style. Anime fans, do these pictures match the Japanese girl you had in mind?

Another, more fantastical prompt was "surrealism, outstanding texture, 4k resolution, cyberpunk, battleship, majestic, smoke, metal giants, laser weapons, octane renderer", in the "oil painting" style. The pictures below convey the tension of an apocalyptic battle.

We fed the same description into Stable Diffusion. Stable Diffusion produces richer detail, but its pictures look grayish and lack strong color impact. Its style is also more realistic, which departs somewhat from the requested surrealism.

At least in text-to-image generation, Tongyi Wanxiang appears fully up to the task. One can't help marveling at how generative AI keeps improving at drawing.

Next is similar-image generation. Users only need to provide a reference image to get AI paintings with similar content and style. Note that uploaded images must be smaller than 10 MB, in common formats such as JPG, JPEG, PNG and BMP.
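Limits like these are easy to check on the client before uploading. The sketch below is a hypothetical pre-upload check, not Alibaba Cloud's actual validation logic. It assumes "10 MB" means 10 × 1024 × 1024 bytes and checks the format by file extension only; a real service may also inspect the file contents.

```python
import os
from typing import List

# Assumptions: "10 MB" read as 10 MiB; format judged by extension only.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()  # ".PNG" -> ".png"
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    # The limit is "smaller than 10 MB", so a file exactly at the limit is rejected
    if size_bytes >= MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


print(check_upload("musk.PNG", 2_000_000))     # []
print(check_upload("scan.tiff", 12_000_000))   # two problems reported
```

Returning a list of problems rather than raising on the first one lets a UI show every issue with a file at once.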

We started with a picture of Musk, a regular in the world of AI drawing, to see what his "look-alike" looks like through Tongyi Wanxiang's eyes. Compared with the real Musk, the generated picture looks older, but the smile is just as cheerful.

A landscape image also came out very well: the stream gurgles, the water is dotted with even more fallen leaves, and the result is no worse than the original.

We also found that images produced by Tongyi Wanxiang's text-to-image mode can be fed straight into similar-image generation. Here we took one of the 3D cartoon "cats in space suits" from above as the original. In the results, the cats are even cuter and the backgrounds are richer.

Last is style transfer. You upload the original image and an example image of the target style, and the original is quickly re-rendered in that style. As with similar-image generation, neither image may exceed 10 MB, and the same formats are supported.

We first paired a realistic original with an impressionist-style reference. The realistic original was completely transformed into an impressionist painting.

Next, a 3D cartoon original with a sketch-style reference. The results show the model switches between the two styles with ease.

Then we tried a Chinese-painting original with a watercolor-style reference. The results are equally good.

After trying it out, we found that in both text-to-image and image-to-image generation, Tongyi Wanxiang pleasantly surprised us with its semantic relevance, picture coherence and richness of detail. Style transfer stands out in particular: switching between styles is smooth, and the generated pictures show almost no seams or smearing, as if they had been made in the target style from the start.

Alibaba said that the current capabilities of Tongyi Wanxiang, the newest member of Alibaba Cloud's Tongyi family of large models, are only a first taste and are still evolving. The relevant capabilities will be gradually opened to industry customers.

In-house Composer model: 5 billion parameters, accepted at a top conference

Many companies' large models have already adopted a "multimodal" persona and added AI drawing capabilities. How much technical substance does Alibaba's Tongyi Wanxiang have by comparison? It appears to be no simple imitation, but to have distinctive capabilities of its own.

Tongyi Wanxiang is reportedly based on Composer, a compositional generative model developed in-house by Alibaba, which has 5 billion parameters and was trained on billions of text-image pairs. While the industry is still working out how to make AI painting models more controllable, Composer offers a new approach.

Using a diffusion-based "compositional generation" framework, Composer can decompose and recombine image design elements such as color palette, layout and style, making image generation both highly controllable and highly flexible.

With this framework, a single model can support many kinds of image generation tasks, as the tests above show. Zhou Jingren, CTO of Alibaba Cloud, took part in the Composer research, and the paper was accepted at ICML 2023, a top international AI conference.

* Paper address:
* GitHub address:
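Composer's framework is built on a diffusion model, and a common way to steer a diffusion sampler toward a set of conditions is classifier-free guidance. The Composer paper describes guidance of this form applied to two different condition sets, so one set can be emphasized over the other. The sketch below shows only that combination step, on plain Python lists with made-up numbers standing in for a denoising network's noise predictions. It illustrates the general technique and is not Alibaba's implementation.

```python
from typing import List, Sequence


def guided_noise(eps_c1: Sequence[float], eps_c2: Sequence[float], w: float) -> List[float]:
    """Return w * eps(x_t, c2) + (1 - w) * eps(x_t, c1), element by element.

    eps_c1 and eps_c2 are the denoiser's noise predictions for the same noisy
    sample x_t under condition sets c1 and c2. With c1 = "no conditions" this
    is ordinary classifier-free guidance; w > 1 pushes the result further
    toward c2, and w = 1 simply returns the c2 prediction.
    """
    if len(eps_c1) != len(eps_c2):
        raise ValueError("noise predictions must have the same length")
    return [w * b + (1.0 - w) * a for a, b in zip(eps_c1, eps_c2)]


# Hypothetical toy values in place of real network outputs
eps_uncond = [0.0, 1.0]  # prediction with no conditions
eps_cond = [1.0, 1.0]    # prediction with, say, a layout + color condition set
print(guided_noise(eps_uncond, eps_cond, w=3.0))  # [3.0, 1.0]
```

Guidance is applied at every denoising step; choosing which elements go into c1 and c2 is what gives a compositional model its fine-grained control.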

Decomposition-recombination works as follows. An image is first decomposed into design elements such as color palette, sketch, layout, style, semantics and material. An AI model then recombines these elements into a new image. Because the elements can be freely modified and edited between the two steps, controllability is greatly improved.

*Decomposition-recombination image generation process.*

Composer also opens up a much larger creative space by fully exploiting decomposition-recombination. Suppose there are 100 pictures, each decomposed into 8 elements. If each element can be taken from any of the 100 pictures, there are 100^8 (that is, 10^16) possible combinations. This exponential growth is known as combinatorial explosion, and it gives the model an enormous generation space. It also gives human designers great freedom to customize the images they generate.

*Image recombination process.*
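The 100^8 figure follows from each of the 8 element slots being filled independently from any of the 100 pictures. The sketch below computes that count and, for a toy case, actually enumerates the combinations with `itertools.product`. The element names and the dict-of-strings representation are hypothetical stand-ins; Composer's real elements are learned representations, not strings.

```python
from itertools import product
from typing import Dict, Iterator, List


def count_recombinations(num_images: int, num_elements: int) -> int:
    # Each element slot can come from any image, and the choices are
    # independent, so the per-slot counts multiply.
    return num_images ** num_elements


def enumerate_recombinations(images: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield every way of assembling one new element set from the given images.

    Each image is a dict mapping a (hypothetical) element name such as
    "layout" or "color" to that image's value; all dicts share the same keys.
    """
    keys = list(images[0])
    # choice[j] is the index of the image that supplies element keys[j]
    for choice in product(range(len(images)), repeat=len(keys)):
        yield {key: images[i][key] for key, i in zip(keys, choice)}


print(count_recombinations(100, 8) == 10**16)  # True

toy = [
    {"layout": "A-layout", "color": "A-color", "style": "A-style"},
    {"layout": "B-layout", "color": "B-color", "style": "B-style"},
]
print(len(list(enumerate_recombinations(toy))))  # 8
```

Even the toy case shows the growth: two images with three elements already give 2^3 = 8 combinations, and each extra element multiplies the count by the number of images.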

The similar-image generation and style transfer functions we tried are both built on the Composer framework. An image understanding model decomposes the image into different elements, and a diffusion model recombines those elements into a new image. With both pieces working together, image generation follows naturally.

For similar-image generation, the model keeps the image's semantic content unchanged and alters only local details. This preserves the consistency of the original image's main subject while improving the diversity and quality of the generated images.

For style transfer, the model retains the basic shape and structure of the original image while transferring the style, color, brushstrokes and other distinctive traits of the target style image.
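The two tasks differ mainly in which decomposed elements are kept and from which image. The sketch below expresses that as condition-selection functions over hypothetical element names. In a real system the resulting condition set would be passed to the diffusion model, which is omitted here; this illustrates the idea described above rather than Tongyi Wanxiang's actual pipeline.

```python
from typing import Dict

# Hypothetical element names, grouped by what each task keeps or swaps
STRUCTURE_KEYS = ("layout", "sketch", "depth")  # basic shape and structure
STYLE_KEYS = ("style", "color_palette")         # style, color, brushwork


def similar_image_conditions(source: Dict[str, str]) -> Dict[str, str]:
    """Condition only on global semantics, leaving local details free to vary."""
    return {"semantics": source["semantics"]}


def style_transfer_conditions(source: Dict[str, str], style_ref: Dict[str, str]) -> Dict[str, str]:
    """Take structure from the original image and style/color from the reference."""
    conditions = {k: source[k] for k in STRUCTURE_KEYS if k in source}
    conditions.update({k: style_ref[k] for k in STYLE_KEYS if k in style_ref})
    return conditions


original = {"semantics": "cat in a spacesuit", "layout": "cat centered",
            "sketch": "cat outline", "style": "3D cartoon", "color_palette": "blues"}
reference = {"semantics": "countryside", "layout": "horizon",
             "style": "pencil sketch", "color_palette": "greys"}
print(similar_image_conditions(original))
print(style_transfer_conditions(original, reference))
```

Elements missing from an image (here, "depth") are simply skipped, so the same recipe works whichever elements the decomposition step happens to produce.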

Building a unified foundation for generative AI around large models

Tongyi Wanxiang's surprisingly good results, it seems, stem from Alibaba's own core technology.

Alibaba was in fact one of the first major Chinese companies to explore generative AI, starting large-model R&D in 2018. In 2019, StructBERT, a pre-trained language model from Alibaba DAMO Academy, outperformed models from Google, Microsoft and Facebook to take the top spot on GLUE, the authoritative NLP benchmark leaderboard at the time.

In 2021, Alibaba released M6, China's first multimodal large model with tens of billions of parameters, and PLUG, a large language model dubbed the "Chinese GPT-3". After multiple iterations, M6 reached the ten-trillion-parameter scale and has been applied to business needs at Alipay and Taobao.

At last year's WAIC, Alibaba released the Tongyi large model series, which built the industry's first unified foundation model, with unified modality representation, task representation and model architecture. The core models were also open-sourced to developers worldwide.

Deploying generative AI has long faced several challenges: high compute costs, complex development pipelines and limited generality. Tongyi built the industry's first unified AI foundation and a layered AI system in which large and small models work together, with the goal of meeting these challenges and moving AI from perception to cognition.

Alibaba has arguably made leading contributions to China's large models in very large models, language and multimodal capabilities, low-carbon training, platform services and real-world applications.

Before Tongyi Wanxiang, Alibaba had already released Tongyi Qianwen for natural language processing and Tongyi Tingwu for audio and video productivity. With all three main directions of AI now covered, Alibaba Cloud holds distinctive advantages in meeting the huge potential demand for large models and generative AI.

Beyond accumulated large-model expertise, strong cloud infrastructure is crucial. In computing power, Alibaba Cloud is the largest cloud provider in Asia and the third largest in the world, giving its large models solid compute support. For example, Alibaba Cloud has China's largest reserve of intelligent computing capacity, and its intelligent computing clusters can scale to as many as 100,000 GPUs.

Alibaba was also the first in China to propose the concept of "Model as a Service" (MaaS), and it led the way in building ModelScope, China's largest AI model community, committed to open source and openness and to making AI accessible to all. At Alibaba Cloud's themed forum "MaaS: A New Paradigm for Model-Centric AI Development", Zhou Jingren shared his vision for MaaS and how it can further empower products and partners.

*Zhou Jingren, CTO of Alibaba Cloud.*

The AI 2.0 race has entered a new stage. After this "battle of a hundred models", a shakeout is inevitable, and Alibaba Cloud is ready.
