A lot of people without technical backgrounds are learning SQL for data analysis or retrieving data from databases. But I see a lot of tutorials using some weird style when formatting the SQL code. There are even important brands promoting weird formatting in SQL for “clarity” but to my eyes, it is the opposite.

Even Google in its Data Analysis certificate at Coursera has styled for SQL code that I never saw in production or books before. The style is distracting for someone that has used SQL and I find it more difficult to scan when inspecting the code.

To illustrate, I will create a small query in Microsoft SQL Server Management Studio with the AdventureWorksLT2019 database. The table is the SalesLt.Customer (Customer table).

I see tutorials and examples showing something like this:

SELECT 
  FirstName,
  LastName,
  CompanyName,
  EmailAddress,
  Phone
FROM 
  SalesLT.Customer;

You might like this style. However, this is not a best practice as all the documentation you will find avoids the use of breaking lines. Therefore, the previous example should be rewritten like this:

SELECT FirstName,
       LastName,
       CompanyName,
       EmailAddress,
       Phone
  FROM SalesLT.Customer;

In my humble opinion, this is easier to read and takes fewer lines on the screen. It is necessary to remember that this is just a minor example, but production queries are way longer and more complicated. Thus, if the code is self-explanatory based on your coding style, it is better and easier to maintain.

I had the luck of working with SQL Server, MySQL, and Oracle almost 15 years ago while I was in charge of a very large database for telephone company. Never have I created or seen previous maintainers working without clear guidelines because SQL is a programming language that must be structured in a way that shows what data is produced and where. For instance, if you have a subquery, the code should be organized that tells you what is a subquery and what data is produced by just watching it. The following example displays the SalesOrderID and CustomerID on sales over 1000 dollars:

SELECT
  soh.SalesOrderID, 
  c.CustomerID 
FROM 
   SalesLT.SalesOrderHeader soh, 
   SalesLT.Customer c
WHERE 
  soh.CustomerID = c.CustomerID
  AND 
  SalesOrderID 
  IN 
  (SELECT 
      SalesOrderID 
   FROM 
      SalesLT.SalesOrderDetail
   GROUP BY 
     SalesOrderID
HAVING 
    SUM(LineTotal) > 1000)

I don’t think that this is a good way to write SQL queries because I don’t even know where to start. However, the same query can be written the following way which indicates visually that there is a subquery.

SELECT soh.SalesOrderID, 
       c.CustomerID 
  FROM SalesLT.SalesOrderHeader soh,
       SalesLT.Customer c
 WHERE soh.CustomerID = c.CustomerID
   AND SalesOrderID IN (SELECT SalesOrderID 
                          FROM SalesLT.SalesOrderDetail
                      GROUP BY SalesOrderID
                        HAVING SUM(LineTotal) > 1000)

As with any programming language, SQL code needs to be maintained; and how we organize it matters at the moment of reading it. SQL is not new and we should not reinvent the wheel.

If you are starting with databases and SQL, I highly recommend the SQL Style Guide by Simon Holywell which is the one that I found the most accurate in terms of programming in SQL. Of course, the environment influences the best practices and how you name things. For instance, SQL Server tables usually have field names in camel case while other database systems use snake cases. But, in general, the SQL Style Guide is a great start.

I hope you find this post useful and start following a style guide so your code is easier to scan.

«
»