A 15B model has surpassed GPT-3.5 at a single skill, and the open-source SQLCoder is already on the job

The Coder family has a new member, and it is open source!

How many large-model tools for writing code do you know?

Twitter user @lvwerra put together the image below to catalogue most members of the code-model family.

Just two weeks after he posted this picture, three new members joined the family: DeciCoder, OctoCoder, and the newest arrival, SQLCoder.

Among them, the newest member, SQLCoder, not only performs exceptionally well but is also open source!

SQLCoder

SQLCoder is a state-of-the-art large language model that converts natural-language questions into SQL queries. On its developers' open-source SQL evaluation framework, SQLCoder significantly outperforms all major open-source models and even beats OpenAI's GPT-3.5.

SQLCoder is a 15B-parameter LLM obtained by fine-tuning StarCoder on hand-crafted SQL queries of increasing difficulty. When fine-tuned on a single database schema, its performance matches or exceeds that of GPT-4.

  • Project address:
  • Demo address:
  • Model weights:

Over the past three months, SQLCoder has been deployed at medical, financial, and other enterprises. These businesses often have sensitive data that they do not want to leave their own servers, so a self-hosted model is the only way they can use an LLM.
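For such deployments, inference can run entirely on the business's own hardware. Below is a minimal sketch of prompting a self-hosted SQLCoder with Hugging Face transformers; the model ID and the prompt layout are assumptions here, so check the project's model card for the exact format.

```python
# Sketch of querying a self-hosted SQLCoder via Hugging Face transformers.
# The model ID and prompt sections are assumptions, not the confirmed format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "defog/sqlcoder"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = """### Task
Generate a SQL query to answer the following question:
who are the last 10 users from Toronto

### Database Schema
CREATE TABLE users (id INT, name TEXT, city TEXT, created_at TIMESTAMP);

### SQL
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens (the SQL query).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```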

Method

Dataset creation

The authors created a hand-curated dataset of prompt-completion pairs focused on text-to-SQL tasks. The dataset was built from 10 different schemas, with questions of varying difficulty. In addition, they created an evaluation dataset of 175 questions drawn from 7 new schemas.

They made sure that both the training and evaluation datasets contained complex schemas with 4-20 tables, since schemas with only one or two tables tend to allow only simple, straightforward queries because of the limited relationships.
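The post does not show the record format, but a prompt-completion pair in such a dataset might look roughly like this (the field names and content are illustrative, not the authors' actual data):

```python
# Hypothetical shape of one training record. Field names are illustrative:
# each pair couples a question plus its schema (the prompt) with the gold
# SQL query (the completion).
example_pair = {
    "prompt": (
        "Question: who are the last 10 users from Toronto\n"
        "Schema: CREATE TABLE users (id INT, name TEXT, city TEXT, "
        "created_at TIMESTAMP);"
    ),
    "completion": (
        "SELECT id, name FROM users WHERE city = 'Toronto' "
        "ORDER BY created_at DESC LIMIT 10;"
    ),
    "difficulty": "easy",  # easy | medium | hard | extra-hard
}
```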

Question categories

After creating the dataset, the authors classified each question into one of four categories: easy, medium, hard, and extra-hard. The categorization was done by adapting the criteria the Spider dataset uses to measure SQL difficulty. Finally, they split the dataset into two parts: easy plus medium, and hard plus extra-hard.
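Spider-style difficulty scoring generally counts the SQL components a query uses (joins, aggregation, nesting, set operations). The following is a rough sketch of such a heuristic, not the authors' actual criteria:

```python
import re

def sql_difficulty(query: str) -> str:
    """Rough Spider-style difficulty heuristic that counts SQL components.
    Illustrative only; the authors adapted Spider's criteria, which weigh
    components in more detail than this sketch."""
    q = query.upper()
    score = 0
    score += q.count(" JOIN ")    # multi-table joins
    score += q.count("GROUP BY")  # aggregation
    score += q.count("(SELECT")   # nested subqueries
    score += len(re.findall(r"\b(UNION|INTERSECT|EXCEPT)\b", q))  # set ops
    if score == 0:
        return "easy"
    if score == 1:
        return "medium"
    if score <= 3:
        return "hard"
    return "extra-hard"

print(sql_difficulty("SELECT * FROM users"))  # -> easy
```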

Fine-tuning

The authors fine-tuned the model in two stages.

First, the base StarCoder model was fine-tuned only on the easy and medium questions.

Second, the resulting model (dubbed defog-easy) was fine-tuned on the hard and extra-hard questions to obtain SQLCoder.
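A minimal sketch of this two-stage recipe with Hugging Face transformers is shown below; the tiny stand-in model, toy data, and hyperparameters are placeholders, not the authors' actual StarCoder setup.

```python
# Two-stage fine-tuning sketch. A tiny GPT-2 stands in for the StarCoder
# base, and single toy strings stand in for the difficulty-split data.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "sshleifer/tiny-gpt2"  # stand-in for the StarCoder base
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=64)

# Toy stand-ins for the difficulty-split prompt-completion data.
easy_medium = Dataset.from_dict(
    {"text": ["Q: count users A: SELECT COUNT(*) FROM users;"]}
).map(tokenize, remove_columns=["text"])
hard_extra = Dataset.from_dict(
    {"text": ["Q: top city A: SELECT city FROM users GROUP BY city "
              "ORDER BY COUNT(*) DESC LIMIT 1;"]}
).map(tokenize, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Stage 1: fine-tune the base model on easy + medium questions
# (the authors call the result "defog-easy").
Trainer(model=model,
        args=TrainingArguments("defog-easy", num_train_epochs=1,
                               report_to=[]),
        train_dataset=easy_medium, data_collator=collator).train()

# Stage 2: continue training the same model on hard + extra-hard
# questions to obtain SQLCoder.
Trainer(model=model,
        args=TrainingArguments("sqlcoder", num_train_epochs=1,
                               report_to=[]),
        train_dataset=hard_extra, data_collator=collator).train()
```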

Evaluation

The authors evaluated the model on the custom dataset they had created. Assessing the correctness of SQL queries is very difficult; they considered using GPT-4 as a judge, but ran into numerous problems. Along the way they also realized that two different SQL queries can both be correct.

For the question "who are the last 10 users from Toronto", both of the following query forms would be correct (the original post's example is reconstructed here for illustration; the table and column names are assumptions):
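```sql
-- Two equivalent answers to "who are the last 10 users from Toronto".
-- Illustrative reconstruction; the schema names are assumed.

-- Form 1: filter, then order and limit.
SELECT id, name
FROM users
WHERE city = 'Toronto'
ORDER BY created_at DESC
LIMIT 10;

-- Form 2: same result via a subquery, with the columns in another order.
SELECT name, id
FROM (SELECT * FROM users WHERE city = 'Toronto') AS toronto_users
ORDER BY created_at DESC
LIMIT 10;
```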

Given this, the authors built a custom framework to evaluate query correctness. In addition to the model weights, they open sourced the evaluation framework and the evaluation dataset.

The dataset was released to enrich the available benchmarks and to help researchers and engineers better understand the performance of text-to-SQL models, especially their robustness to innocuous variations in the returned results, such as column renaming, extra columns, and reordering.
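A framework along these lines typically executes both the generated and the reference query against the same database and compares the returned results while tolerating such innocuous differences. A simplified sketch of that idea (not the authors' actual implementation):

```python
import sqlite3
import pandas as pd

def results_match(generated_sql: str, gold_sql: str, conn) -> bool:
    """Execute both queries and compare results while ignoring column
    names, column order, and extra appended columns. A simplified sketch,
    not the authors' open-source framework."""
    gen = pd.read_sql_query(generated_sql, conn)
    gold = pd.read_sql_query(gold_sql, conn)
    # Represent each column as its tuple of values (row order preserved).
    gen_cols = [tuple(gen[c]) for c in gen.columns]
    # Every gold column must appear among the generated columns; extra
    # (appended) columns in the generated result are tolerated.
    return all(tuple(gold[c]) in gen_cols for c in gold.columns)

# Tiny demo on an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT, name TEXT, city TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "Ada", "Toronto"), (2, "Bo", "Toronto")])
print(results_match("SELECT name, id FROM users",  # columns reordered
                    "SELECT id, name FROM users", conn))  # -> True
```

A production framework would additionally need to handle row ordering, NULLs, and floating-point tolerance, which this sketch glosses over.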

More details about the evaluation can be found in the original blog post:

Performance

In this evaluation framework, Defog SQLCoder outperforms every major model except GPT-4. In particular, it beats gpt-3.5-turbo and text-davinci-003, both of which are more than 10 times its size.

These results are for generic SQL databases and do not reflect SQLCoder's performance on a single database schema. When fine-tuned on a single schema, SQLCoder performs on par with or better than OpenAI's GPT-4, with lower latency (on an A100 80GB).

*Each generated question falls into one of five categories; the chart shows the percentage of questions each model answered correctly, broken down by category.*

SQLCoder Hardware Requirements

SQLCoder has been tested on an A100 40GB GPU with 16-bit weights. You can also load 8-bit and 4-bit quantized versions of the model on consumer GPUs with 20GB or more of memory, such as the RTX 4090 and RTX 3090, or on Apple's M2 Pro, M2 Max, or M2 Ultra chips with 20GB or more of unified memory.
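For example, with Hugging Face transformers and bitsandbytes on a CUDA GPU, an 8-bit load along the following lines keeps the 15B model within roughly 20GB of memory (the model ID is an assumption; a 4-bit load is analogous):

```python
# Sketch of loading SQLCoder in 8-bit on a consumer CUDA GPU using
# transformers + bitsandbytes. The model ID is an assumption.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

model_id = "defog/sqlcoder"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Use load_in_4bit=True instead for the 4-bit quantized variant.
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # place weights on the available GPU(s)
)
```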

Next steps

In the coming weeks, the authors plan the following updates to SQLCoder:

  • Train the model with more hand-curated data and a broader range of questions;
  • Further tune the model using reward modeling and RLHF;
  • Pretrain from scratch a model specialized for data analysis (SQL + Python).