TranSQL: A Transformer-based Model for Classifying SQL Queries

Abstract

Domain-Specific Languages (DSLs) are becoming popular in various fields because they enable domain experts to focus on domain-specific concepts rather than software-specific ones. Domain experts often reuse previously-written scripts when writing new ones; to make this process straightforward, they need techniques that let them easily find existing relevant scripts. One fundamental component of such a technique is a model for identifying similar DSL scripts. Nevertheless, the inherent nature of DSLs and the scarcity of training data make building such a model challenging. Hence, in this work, we propose TRANSQL, a transformer-based model for classifying DSL scripts based on their similarity in a few-shot setting. We build TRANSQL on BERT and GPT-3, two performant language models. Our experiments focus on SQL, one of the most commonly used DSLs. The results reveal that the BERT-based TRANSQL does not perform well for DSLs, since BERT requires extensive data for the fine-tuning phase; the GPT-based TRANSQL, however, yields markedly better and more promising results.
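The core task the abstract describes, classifying a query by its similarity to a small set of previously labeled ones, can be illustrated with a minimal sketch. This is a hypothetical nearest-neighbor baseline using token-set Jaccard similarity as a cheap stand-in for the transformer embeddings TranSQL derives from BERT or GPT-3; the function names and example labels are illustrative, not from the paper.

```python
# Hypothetical baseline: label a SQL query by its most similar labeled
# neighbor. Jaccard similarity over word tokens stands in for the
# transformer-based similarity used by TranSQL (an assumption for
# illustration only).
import re

def tokenize(sql: str) -> set:
    """Lowercase a SQL query and split it into a set of word tokens."""
    return set(re.findall(r"[a-z_]\w*", sql.lower()))

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def classify(query: str, labeled: list) -> str:
    """Return the label of the most similar previously-written query."""
    q = tokenize(query)
    return max(labeled, key=lambda item: jaccard(q, tokenize(item[0])))[1]

# Few-shot context: a handful of labeled example queries.
labeled_queries = [
    ("SELECT name FROM users WHERE age > 30", "selection"),
    ("INSERT INTO users (name, age) VALUES ('Ada', 36)", "insertion"),
]
print(classify("SELECT email FROM users WHERE age < 18", labeled_queries))
# -> selection
```

A transformer-based model replaces the token-overlap score with learned embeddings, which is what lets it recognize semantically similar queries that share few surface tokens.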

Category

Academic chapter

Language

English

Author(s)

  • Shirin Tahmasebi
  • Amir Hossein Payberah
  • Ahmet Soylu
  • Titi Roman
  • Mihhail Matskin

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • Royal Institute of Technology
  • OsloMet - Oslo Metropolitan University

Year

2022

Publisher

IEEE (Institute of Electrical and Electronics Engineers)

Book

Proceedings of the 21st IEEE International Conference on Machine Learning and Applications (ICMLA 2022)

ISBN

9781665462839

Page(s)

788–793

View this publication at Norwegian Research Information Repository