Can ChatGPT also design chips if you just speak English to it?!


Compiled by | Tu Min · Produced by | CSDN (ID: CSDNnews)

Is ChatGPT really that powerful? Researchers at New York University's Tandon School of Engineering answered with experiments in their paper "Chip-Chat: Challenges and Opportunities in Conversational Hardware Design": yes, it really is! Simply by chatting with ChatGPT in plain English, they produced a microprocessor chip. More notably, with ChatGPT's help the chip was not only designed but, after basic testing, was also ready to be manufactured.

![](https://img.gateio.im/social/moments-bab2147faf-e546369972-dd1a6f-62a40f)

"This is an unprecedented achievement that could speed up chip development and allow individuals without specialized technical skills to design chips," NYU commented. So is the era of chipmaking for everyone really arriving? Let's take a look at how the researchers did it.

## **In applying large AI models, hardware lags behind software**

In the paper, the researchers point out that modern hardware design starts from a specification written in natural language, such as a requirements document in English. Hardware engineers then use a hardware description language (HDL) such as Verilog to express those requirements as code that defines the chip's internal design, and that code is finally synthesized into circuit components.

Now that the AIGC era has arrived, tools such as OpenAI's ChatGPT and Google's Bard claim to be able to generate code, and many developers have used them to build one website after another. So far, however, their use has been concentrated in software.
The researchers therefore set out to investigate whether these AIGC tools could take over the "translation" work of hardware engineers, that is, converting requirements documents into code. To this end, they used eight representative benchmarks to examine the capabilities and limitations of state-of-the-art LLMs when it comes to writing HDL itself.

![](https://img.gateio.im/social/moments-bab2147faf-13b13c1041-dd1a6f-62a40f)

## **Test Principles and Rules**

In the experiment, the researchers treated ChatGPT as a pattern recognizer that, much like a human, can translate freely between different kinds of language (spoken and written). Used this way, ChatGPT could let hardware engineers skip writing HDL themselves.

The overall verification process is shown in the figure below:

![](https://img.gateio.im/social/moments-bab2147faf-084d63f414-dd1a6f-62a40f)

Specifically, the hardware engineer first gives the model an initial prompt asking it to create a Verilog model, then supplies the details of its inputs and outputs. Finally, the engineer visually inspects the generated design to decide whether it meets the basic specification. If a design does not meet the specification, it is regenerated up to five more times with the same prompt; if it still fails to meet the specification, the benchmark fails.

Once the design and its test cases have been written, they are compiled with Icarus Verilog (iverilog, an open-source Verilog compiler and simulator). If compilation succeeds, the simulation is run. If no errors are reported, the design passes with No Feedback Needed (NFN). If either step reports errors, the errors are fed back to the model with the request "Please provide a fix." This is called Tool Feedback (TF).
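The automated part of this loop (regenerate until the design passes review, then compile, simulate, and either pass with NFN or send Tool Feedback) can be sketched in Python. This is a minimal illustration under stated assumptions, not the researchers' actual harness: `generate` and `meets_spec` are hypothetical callables standing in for the LLM and for the engineer's visual review, and the `FAIL` marker is an assumed convention for how a self-checking testbench reports a failing case. The `iverilog -o` and `vvp` invocations are the standard Icarus Verilog compile-and-run commands.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

EXTRA_GENERATIONS = 5   # "regenerated up to five more times with the same prompt"
FAIL_MARKER = "FAIL"    # assumption: the testbench prints this on a failing case


def first_compliant_design(generate: Callable[[], str],
                           meets_spec: Callable[[str], bool]) -> Optional[str]:
    """Return the first design that passes review, or None if the benchmark fails.

    `generate` stands in for prompting the LLM and `meets_spec` for the
    engineer's visual check; both are hypothetical placeholders.
    """
    for _ in range(1 + EXTRA_GENERATIONS):  # initial attempt + 5 regenerations
        design = generate()
        if meets_spec(design):
            return design
    return None


def compile_and_simulate(design: str, testbench: str) -> Tuple[bool, str]:
    """Compile with Icarus Verilog, run the result with vvp, return (passed, log)."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "design.v").write_text(design)
        (work / "tb.v").write_text(testbench)
        sim_file = work / "sim.vvp"
        build = subprocess.run(
            ["iverilog", "-o", str(sim_file), str(work / "design.v"), str(work / "tb.v")],
            capture_output=True, text=True)
        if build.returncode != 0:  # compilation errors
            return False, build.stdout + build.stderr
        run = subprocess.run(["vvp", str(sim_file)], capture_output=True, text=True)
        log = run.stdout + run.stderr
        return run.returncode == 0 and FAIL_MARKER not in log, log


def tool_feedback(passed: bool, log: str) -> Optional[str]:
    """None means No Feedback Needed (NFN); otherwise build a Tool Feedback (TF) message."""
    if passed:
        return None
    return f"{log.strip()}\n\nPlease provide a fix."
```

Running `compile_and_simulate` requires `iverilog` and `vvp` on the PATH. Escalation to human feedback is deliberately left out of the sketch, since it depends on the engineer's judgment.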
If the same error or type of error occurs three times, the user gives Simple Human Feedback (SHF), usually by stating what kind of Verilog problem caused the error (e.g., a syntax error in a statement). If the error persists, Moderate Human Feedback (MHF) is given, with somewhat more direct information that helps the tool identify the specific error. If the bug still persists, Advanced Human Feedback (AHF) is given, which pinpoints exactly where the bug is and how to fix it.

Once the design compiles and simulates with no failing test cases, it is considered successful. However, if even advanced feedback does not fix the bug, or if the user has to write any Verilog code to resolve it, the test is considered failed. The test is also considered failed if the conversation exceeds 25 messages, matching OpenAI's rate limit of 25 ChatGPT-4 messages every 3 hours.

## **Bard and HuggingChat failed in the first round of testing**

As a concrete example, take the benchmark for an 8-bit shift register. The researchers first ask the model to try to create a Verilog model for the benchmark in question (the "test name" in the prompt template), then provide the specification, defining the input and output ports and any other required details, and finally ask: "How would I write a design to meet these specifications?"

![](https://img.gateio.im/social/moments-bab2147faf-b93b509c86-dd1a6f-62a40f)

The researchers also asked the model directly to write a testbench for the design:

Can you write a Verilog testbench for this design? The testbench should be self-checking and made to work with iverilog for simulation and validation.
If test cases should fail, the testbench should provide enough information that the error can be found and resolved.

![](https://img.gateio.im/social/moments-bab2147faf-82c3f89799-dd1a6f-62a40f)

The researchers then collected the outputs of all four models, ChatGPT-4, ChatGPT-3.5, Bard, and HuggingChat:

![](https://img.gateio.im/social/moments-bab2147faf-7fd1eb62cb-dd1a6f-62a40f)

Both ChatGPT models produced designs that met the specification, so the design process could begin. Bard and HuggingChat, however, failed to meet even the initial specification criteria.

![](https://img.gateio.im/social/moments-bab2147faf-75d9fbc7e1-dd1a6f-62a40f)

Following the test procedure described above, the researchers had Bard and HuggingChat regenerate their answers five more times from the initial prompt, but both models still failed: Bard never met the given design specification, and HuggingChat's Verilog output became incorrect right after the module definition.

Given Bard's and HuggingChat's poor performance on the initial benchmark prompts, the researchers ran the full tests only on ChatGPT-4 and ChatGPT-3.5.

## **ChatGPT-4 vs. ChatGPT-3.5**

The figure below shows the benchmark results for ChatGPT-4 and ChatGPT-3.5. ChatGPT-3.5 clearly performed worse than ChatGPT-4, and its conversations did not produce compliant results. ChatGPT-4, by contrast, performed much better, passing most of the benchmarks, most of which required only tool feedback.
Testbench design, however, still required human feedback.

![](https://img.gateio.im/social/moments-bab2147faf-9c410181dd-dd1a6f-62a40f)

## **ChatGPT-4 and hardware engineers develop a chip together**

To further explore the potential of LLMs, the researchers also paired a hardware design engineer with ChatGPT-4 to design an 8-bit accumulator-based microprocessor. The initial prompt to ChatGPT-4 was:

Let's do a whole new microprocessor design together... I think we need to limit ourselves to an accumulator 8-bit architecture, no multi-byte instructions. That being the case, how do you think we should start?

![](https://img.gateio.im/social/moments-bab2147faf-825eec2729-dd1a6f-62a40f)

Given the limited chip area, the researchers aimed for a von Neumann-style design with 32 bytes of memory shared by data and instructions.

In the end, ChatGPT-4 and the hardware engineer designed a novel 8-bit accumulator-based microprocessor architecture. The processor was taped out on the SkyWater 130 nm process, which means these "Chip-Chats" achieved what the researchers believe is the world's first fully AI-written HDL to be taped out.

![](https://img.gateio.im/social/moments-bab2147faf-000b4299a9-dd1a6f-62a40f)

Accumulator-based datapath of the GPT-4 design (drawn by a human)

In the paper, the researchers conclude that ChatGPT-4 produced relatively high-quality code, as shown by the short validation turnaround. Given ChatGPT-4's rate limit of 25 messages per 3 hours, the total ChatGPT-4 time budget for this design was 22.8 hours (including restarts). Generating each message actually took about 30 seconds on average, so without rate limiting the entire design could have been completed in under 100 minutes, depending on the human engineer.
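The time figures above are consistent with each other under one assumption: that the 22.8-hour budget was counted in full rate-limit windows, i.e. messages ÷ 25 × 3 hours. The article does not state the message count, so the quick back-of-the-envelope check below inverts that assumed formula rather than reporting a figure from the paper.

```python
# Back-of-the-envelope check of the reported time figures.
# Assumption: budget_hours = messages / 25 * 3, i.e. the rate limit is the bottleneck.
RATE_LIMIT_MESSAGES = 25     # ChatGPT-4 messages allowed per window
RATE_LIMIT_WINDOW_HOURS = 3  # window length in hours
BUDGET_HOURS = 22.8          # reported total ChatGPT-4 time budget
SECONDS_PER_MESSAGE = 30     # reported average generation time per message

# Invert the assumed formula to estimate how many messages the budget implies.
max_messages = BUDGET_HOURS / RATE_LIMIT_WINDOW_HOURS * RATE_LIMIT_MESSAGES
# Time the same messages would take with no rate limit (generation only).
generation_minutes = max_messages * SECONDS_PER_MESSAGE / 60

print(f"~{max_messages:.0f} messages, ~{generation_minutes:.0f} min of generation")
# ~190 messages, ~95 min of generation
```

Under that reading, the budget corresponds to roughly 190 messages, and 190 × 30 s is about 95 minutes, in line with the "under 100 minutes" estimate.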
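To make the architecture described in this section more concrete, here is a toy Python model of an 8-bit accumulator machine with 32 bytes of shared code/data memory and single-byte instructions. It is an illustrative sketch, not the researchers' design: the 3-bit-opcode/5-bit-address encoding and the seven mnemonics are hypothetical (the real processor has 24 instructions), chosen only because a 5-bit address field is exactly enough to reach 32 bytes. The small `assemble` helper is likewise hypothetical and is not the Python assembler ChatGPT-4 wrote for the actual design.

```python
# Toy model of an 8-bit accumulator machine with 32 bytes of unified
# (von Neumann) memory and single-byte instructions.
# The instruction set below is HYPOTHETICAL and is not the Chip-Chat ISA.
from typing import Dict, List, Optional, Tuple

MEM_SIZE = 32  # 5-bit addresses reach every byte of code + data
OPCODES = {"LDA": 0, "STA": 1, "ADD": 2, "SUB": 3, "JMP": 4, "JZ": 5, "HLT": 7}


def assemble(lines: List[str], data: Optional[Dict[int, int]] = None) -> List[int]:
    """Encode each 'MNEMONIC [addr]' line as opcode (3 bits) << 5 | addr (5 bits)."""
    if len(lines) > MEM_SIZE:
        raise ValueError("program does not fit in 32 bytes")
    image = [0] * MEM_SIZE
    for pc, line in enumerate(lines):
        mnemonic, *rest = line.split()
        operand = int(rest[0]) if rest else 0
        if not 0 <= operand < MEM_SIZE:
            raise ValueError(f"address out of range in {line!r}")
        image[pc] = (OPCODES[mnemonic] << 5) | operand
    for addr, value in (data or {}).items():
        if addr < len(lines):
            raise ValueError(f"data at {addr} would overwrite code")
        image[addr] = value & 0xFF
    return image


def run(memory: List[int], max_steps: int = 1000) -> Tuple[int, List[int]]:
    """Fetch-decode-execute until HLT; returns (accumulator, final memory)."""
    mem = list(memory)
    acc, pc = 0, 0
    for _ in range(max_steps):
        opcode, addr = mem[pc] >> 5, mem[pc] & 0x1F  # decode the single byte
        pc = (pc + 1) % MEM_SIZE                     # PC wraps within 32 bytes
        if opcode == OPCODES["LDA"]:
            acc = mem[addr]
        elif opcode == OPCODES["STA"]:
            mem[addr] = acc
        elif opcode == OPCODES["ADD"]:
            acc = (acc + mem[addr]) & 0xFF           # 8-bit wraparound
        elif opcode == OPCODES["SUB"]:
            acc = (acc - mem[addr]) & 0xFF
        elif opcode == OPCODES["JMP"]:
            pc = addr
        elif opcode == OPCODES["JZ"]:
            if acc == 0:
                pc = addr
        elif opcode == OPCODES["HLT"]:
            return acc, mem
        else:
            raise ValueError(f"undefined opcode {opcode}")
    raise RuntimeError("no HLT within max_steps")


# Usage: add the bytes at addresses 29 and 30, store the sum at 31.
acc, mem = run(assemble(["LDA 29", "ADD 30", "STA 31", "HLT"], data={29: 5, 30: 7}))
print(acc, mem[31])  # 12 12
```

Because each instruction is a single byte, the accumulator serves as the implicit second operand and the 5-bit field can only name a memory address, so constants have to live in data memory, as with addresses 29 and 30 in the usage line.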
Although ChatGPT-4 produced a Python assembler with relative ease, it struggled to write programs for the new design and did not write any significant test programs. Overall, the researchers exercised all 24 instructions with a comprehensive set of human-written assembly programs, evaluated both in simulation and in FPGA emulation.

## **ChatGPT can shorten the chip development cycle**

"This research has produced what we believe to be the first fully AI-generated HDL for the fabrication of a physical chip," said Dr. Hammond Pearce, a research assistant professor at NYU Tandon and a member of the research team. "Some AI models, such as OpenAI's ChatGPT and Google's Bard, can generate software code in different programming languages, but their use in hardware design has not been widely studied. This study shows that AI can benefit hardware manufacturing too, especially when it is used conversationally, where you can go back and forth to refine the design."

That said, the researchers still need to further test and address the security considerations of using AI for chip design.

Overall, although ChatGPT is not an automation tool built specifically for hardware, it can serve as an auxiliary EDA tool and greatly lower the knowledge barrier for designers.

The researchers also say that, if deployed in real-world settings, LLMs in chip design could reduce human error when translating requirements into HDL, improve productivity, shorten design time and time-to-market, and allow for more creative designs. For these benefits alone, ChatGPT deserves further trials and exploration by hardware engineers.

For more details on the test process, see the paper.