Hands-on: Alibaba's AI drawing model enters internal testing, and it gives some of the big players a jolt
Authors: Du Wei, Zenan
The three-day conference saw companies and institutions unveil more than 30 large models, one after another.
Language models were of course indispensable in this feast of large models, but large image-generation models often deliver the real visual impact. And sure enough, another domestic player has now entered the AI drawing arena.
Three months after the release of its large language model Tongyi Qianwen, Alibaba's AI painting model has arrived, built on Composer, a compositional generative model developed in-house.
At WAIC on July 7, Alibaba Cloud's Tongyi large-model family unveiled its newest member, "Tongyi Wanxiang".
Its text-to-image results look like this, and generation is fast.
The model is currently open for invite-only beta testing.
Before ChatGPT took off, the hottest topic in AI was AI drawing. Diffusion models pushed generative AI a big step forward, and for a while there was a flood of models that turn text prompts into images in all kinds of styles. Image-to-image generation and style conversion followed, leaving people dazzled by the magic of generative AI.
Launching this text-to-image tool on the big stage of WAIC shows how confident Alibaba is in its results.
Having obtained trial access, Machine Heart naturally had to try it out first.
Hands-on with Tongyi Wanxiang: versatile features, impressive results every time
Has this new member of the Tongyi family changed the game in AI drawing? Let the results speak for themselves.
Tongyi Wanxiang currently offers three functions: text-to-image generation, similar-image generation, and image style transfer.
Let's start with something simple. We take a phrase from "Tian Jing Sha: Autumn Thoughts" by Ma Zhiyuan, one of the Four Great Masters of Yuan drama, describing "a small bridge, flowing water, and cottages", and choose "Chinese painting" as the style.
Tongyi Wanxiang gave us paintings full of classical charm and rich detail, even adding elements not in the prompt, such as distant mountains and ducks swimming in the water.
Here we also wanted to pit Tongyi Wanxiang against the famous Stable Diffusion. We translated the same prompt into English as "cat in a spacesuit, space, travel, starry sky", added "style of 3D cartoon", and got the images below.
Surprisingly, Tongyi Wanxiang won this round: the cats from Stable Diffusion were either too abstract or too realistic, and failed to capture the 3D cartoon style.
Next came a longer prompt: "a Japanese girl with straight brown hair, fair skin, wearing a dress with lace and a bow, carrying a small bag, smiling", in the "anime" style. For the anime fans out there: do these results match the Japanese girl you had in mind?
Next up is Tongyi Wanxiang's similar-image generation feature. Provide a reference image and the model returns AI paintings with similar content and style. Note that the uploaded image must be under 10 MB and in a common format such as JPG, JPEG, PNG, or BMP.
Let's first feed in a photo of Musk, a frequent visitor to the world of AI art, and see what his AI double looks like in Tongyi Wanxiang's eyes. Compared with the real Musk, the generated version looks older, but the grin is just as wide.
For the style transfer function, we chose a realistic original image and an impressionist-style reference. The result: the realistic original completely changed its look and became an impressionist painting.
Alibaba says that Tongyi Wanxiang's current capabilities, as the newest member of Alibaba Cloud's Tongyi model family, are just a first taste; they are still evolving and will gradually be opened up to industry customers.
The self-developed Composer model: 5 billion parameters, aiming for the top
Many companies' large models have already been styling themselves as "multimodal", AI drawing capabilities included. Against that backdrop, how much technical substance does Alibaba's Tongyi Wanxiang really have? It turns out to be no mere imitation; it has distinctive capabilities of its own.
Tongyi Wanxiang is reportedly built on Composer, a compositional generative model developed in-house by Alibaba, with 5 billion parameters trained on billions of text-image pairs. At a time when the industry is wrestling with how to make AI painting models more controllable, Composer offers an innovative answer.
Through a "combined generation" framework based on a diffusion model, Composer can disassemble and combine image design elements such as color matching, layout, and style, achieving a highly controllable and extremely free image generation effect.
The result, as you and I can see, is that only one model can support multi-class image generation tasks. Zhou Jingren, Chief Technology Officer of Alibaba Cloud, participated in the research of Composer, and the relevant results have been included in ICML 2023, the top international AI conference.
The so-called disassembly-combination, first decomposes the image into different design elements, such as color matching, sketches, layout, style, semantics, materials, etc. These design elements are then recombined into new images using AI models. Here, the process of dismantling and assembling allows free modification and editing of the elements used, so that the controllability is greatly enhanced.
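To make the decompose-and-recombine idea concrete, here is a minimal, runnable Python sketch. The element names, the decompose/recombine helpers, and the placeholder strings are all hypothetical illustrations of the concept; Composer's real interfaces are not public, so this is not Alibaba's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ImageElements:
    """Design elements an image-understanding model might extract (hypothetical)."""
    palette: str = "warm earth tones"
    sketch: str = "edge map of a bridge over a stream"
    layout: str = "subject left of center, horizon in the upper third"
    style: str = "traditional Chinese ink painting"
    semantics: str = "small bridge, flowing water, cottages"
    material: str = "rice-paper texture"


def decompose(image_path: str) -> ImageElements:
    # A real system would run detectors/encoders over the pixels; here we
    # return fixed placeholder descriptions purely for illustration.
    return ImageElements()


def recombine(base: ImageElements, overrides: Dict[str, str]) -> str:
    # A real diffusion model would condition on embeddings of each element;
    # here we simply merge them into a single conditioning string.
    merged = {**vars(base), **overrides}
    return "; ".join(f"{k}={v}" for k, v in merged.items())


elements = decompose("reference.jpg")
# Keep semantics, layout, and the rest, and swap only the style element: the
# essence of controllable, compositional generation.
print(recombine(elements, {"style": "impressionist oil painting"}))
```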
Beyond that, Composer can open up an even larger creative space by exploiting the potential of decompose-and-recombine. Suppose there are 100 images, each split into 8 elements; choosing each element from any of the 100 images yields 100 to the 8th power, or 10^16, possible combinations. This exponential growth in options, known as combinatorial explosion, gives the AI model an enormous generation space, while also granting human designers great freedom and customization when producing bespoke images.
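A quick back-of-the-envelope check of that figure, assuming each of the 8 element slots can independently be filled from any of the 100 reference images:

```python
# 100 reference images, each decomposed into 8 element types; every element
# slot can be drawn from any of the 100 images.
images = 100
element_types = 8
combinations = images ** element_types
print(f"{combinations:,}")  # 10,000,000,000,000,000, i.e. 1e16 combinations
```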
It is the Composer framework that enables the two functions we tried above, similar-image generation and style transfer: an image-understanding model decomposes the input image into its elements, while the diffusion model recombines those elements into a new image. With this two-pronged approach, image generation follows naturally.
For similar-image generation, the semantic content of the image is kept unchanged and only local details are varied. This preserves the consistency of the original image's main subject while improving the diversity and quality of the generated images.
For style transfer, the basic shape and structure of the original image are retained, while the style, color, brushwork, and other distinctive characteristics of the target style image are transferred onto it.
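As a toy illustration, and using the same hypothetical element names as the sketch above, the two functions differ mainly in which decomposed elements are kept from the original and which are replaced or resampled. This only restates the description above, not the actual Tongyi Wanxiang pipeline.

```python
from typing import Dict

ELEMENTS = ["palette", "sketch", "layout", "style", "semantics", "material"]


def similar_image_plan(reference: Dict[str, str]) -> Dict[str, str]:
    # Similar-image generation: keep the semantic content of the reference and
    # resample everything else, so the main subject stays consistent while
    # local details vary.
    return {k: reference[k] if k == "semantics" else f"resample({reference[k]})"
            for k in ELEMENTS}


def style_transfer_plan(content: Dict[str, str], style_ref: Dict[str, str]) -> Dict[str, str]:
    # Style transfer: keep the basic shape and structure of the original, and
    # take style, color, and brushwork from the style reference image.
    kept_from_content = {"sketch", "layout", "semantics"}
    return {k: content[k] if k in kept_from_content else style_ref[k]
            for k in ELEMENTS}


content = {k: f"content_{k}" for k in ELEMENTS}
style_ref = {k: f"styleref_{k}" for k in ELEMENTS}
print(similar_image_plan(content))
print(style_transfer_plan(content, style_ref))
```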
Putting large models at the core to build a unified base for generative AI
Tongyi Wanxiang's surprisingly good results, it seems, come from Alibaba's own core technology.
In fact, Alibaba was one of the earliest major Chinese companies to explore generative AI, beginning R&D on large-model technology back in 2018. In 2019, StructBERT, a language pre-training model proposed by DAMO Academy, surpassed work from Google, Microsoft, and Facebook and topped GLUE, the authoritative NLP benchmark, at the time.
In 2021, Alibaba released M6, China's first multimodal large model with tens of billions of parameters, along with PLUG, a large language model dubbed the "Chinese GPT-3". After multiple iterations, M6 reached a parameter scale of ten trillion and has been integrated with the business needs of Alipay and Taobao.
At last year's WAIC, Alibaba released the Tongyi model series, building a "foundation model" for the industry for the first time and unifying modal representations, task representations, and model architecture. The relevant core models have also been open-sourced to developers worldwide.
Putting generative AI into practice has long faced several challenges: high compute costs, complex build processes, and limited generality. Tongyi created the industry's first unified AI base and built a layered AI system in which large and small models work in coordination, aiming to meet these challenges and move AI from perception toward cognition.
It is fair to say that Alibaba has made cutting-edge, leading contributions to China's large-model development across ultra-large models, language and multimodal capabilities, low-carbon training, platform services, and real-world deployment.
Before Tongyi Wanxiang, Alibaba had already released Tongyi Qianwen for natural language processing and Tongyi Tingwu for audio and video productivity, so all three main directions of AI are now covered. Facing the huge potential demand for large models and generative AI, Alibaba Cloud holds distinctive advantages.
Beyond its accumulated large-model technology, strong cloud infrastructure is crucial. On the compute side, Alibaba Cloud is the largest cloud provider in Asia and the third largest in the world, giving its large models solid compute support; it claims the largest reserve of intelligent computing power in China, and its intelligent computing clusters can scale to as many as 100,000 GPUs.
Alibaba was also the first in China to propose the "Model as a Service" concept and took the lead in building ModelScope, the country's largest AI model community, with a commitment to open source, openness, and making AI widely accessible. At Alibaba Cloud's themed forum, "MaaS: A New Paradigm for Model-Centric AI Development", Zhou Jingren shared his vision for MaaS and how it can further empower products and partners.
The AI 2.0 race has entered a new stage. After the "battle of a hundred models", a shakeout is inevitable, and Alibaba Cloud is ready.