Database Deep Dive: SQL

For years, I avoided SQL, convinced it was too complex. That changed the day I watched a teammate build a backend with pure SQL magic—no abstractions, just power. This is the story of how I went from dread to deep appreciation, and why learning SQL became my career’s turning point.

An illustration of SQL tables choking JSON documents

My SQL Journey: From Avoidance to Appreciation

For years, I avoided it like the plague.

Building an e-commerce platform that eventually failed taught me countless lessons about scale, architecture, and databases. But there was one thing I dreaded most—so much that I willingly dove into the weird JSON syntax of MongoDB and Elasticsearch just to keep my distance.

That thing was SQL.

It wasn't until five years into my professional journey that I witnessed a colleague building entire backend systems using nothing but PostgreSQL and SQL statements. No ORM layers. No complex abstractions. Just clean, efficient database queries that outperformed our overengineered solutions.

"Teach me," I finally said, swallowing my pride.

That moment changed everything.

I discovered that writing efficient, scalable applications mostly comes down to crafting smart SQL queries that allow databases to retrieve information quickly without burdening compute resources. Being a SQL beginner is simple enough, but developing proficiency in SQL and database internals can work wonders for your career.

This article is my attempt to relearn SQL—and hopefully help you appreciate its power too.

SQL: Not Dead, Just Misunderstood

An illustration of a stickman standing against all new proclaimed SQL replacements

SQL (Structured Query Language) has been proclaimed dead more times than I can count. Every few years, a new technology promises to replace it—from NoSQL databases to natural language interfaces powered by AI.

Yet SQL stubbornly refuses to die. In fact, it's been around since the 1970s, and it's likely to outlive us all. Even the systems that once positioned themselves as "SQL killers" now boast SQL compatibility layers.

There's a reason for this persistence: SQL works. It's declarative, meaning you describe what data you want rather than how to get it. This separation of concerns allows database systems to optimize the retrieval process behind the scenes.

A Brief History of SQL

A timeline showing the evolution of SQL as described below

SQL was born at IBM in the 1970s as SEQUEL (Structured English Query Language) as part of the System R project. The name later changed to SQL due to trademark issues. What's fascinating is that SQL isn't static—it continues to evolve through standardization:

  • SQL-92: The minimum standard that systems must support to claim SQL compatibility
  • SQL:1999: Added regular expressions and triggers
  • SQL:2003: Introduced XML support, window functions, and sequences
  • SQL:2008: Added truncation and advanced sorting capabilities
  • SQL:2011: Brought temporal databases and pipelined DML
  • SQL:2016: Added JSON support and polymorphic tables
  • SQL:2023: Introduced property graph queries and multi-dimensional arrays

The SQL standard is maintained by organizations like ANSI and ISO, though database vendors implement it to varying degrees, often adding their own proprietary extensions.

Understanding SQL's Core Components

SQL encompasses several language categories:

  1. Data Manipulation Language (DML): The verbs of SQL—SELECT, INSERT, UPDATE, and DELETE—that allow you to interact with data.
  2. Data Definition Language (DDL): Commands like CREATE, ALTER, and DROP that define and modify database structures.
  3. Data Control Language (DCL): Statements like GRANT and REVOKE that manage access permissions.

Despite its relational algebra foundations (based on sets with no duplicates or ordering), SQL actually operates on bags—collections that allow duplicates without defined order. This distinction becomes important when writing complex queries.

Modern SQL: Beyond the Basics

Let's dive into the powerful features that make modern SQL a force to be reckoned with.

Aggregation: Deriving Insights from Raw Data

Aggregation functions transform collections of tuples into single values, forming the foundation of data analysis in SQL:

-- Count students with email addresses in the CS department
SELECT COUNT(*) FROM student WHERE login LIKE '%@cs';

-- Find average GPA and student count for CS students
SELECT AVG(gpa), COUNT(sid) 
FROM student 
WHERE login LIKE '%@cs';

When working with aggregations, GROUP BY lets you perform calculations across subsets of data:

-- Average GPA per course
SELECT AVG(s.gpa), e.cid
FROM enrolled e JOIN student s ON e.sid = s.sid
GROUP BY e.cid;

The HAVING clause filters grouped results—think of it as WHERE for aggregated data:

-- Find courses with average GPA above 3.9
SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled e JOIN student s ON e.sid = s.sid
GROUP BY e.cid
HAVING avg_gpa > 3.9;

Window Functions: The SQL Superpower

Window functions might be SQL's most underappreciated feature. Unlike aggregations that collapse rows into a single result, window functions perform calculations across related rows while preserving individual records.

Think of them as "sliding calculations" that provide context for each row:

-- Assign row numbers to students in each course
SELECT cid, sid, ROW_NUMBER() OVER (PARTITION BY cid)
FROM enrolled 
ORDER BY cid;

-- Find the student with the second-highest grade per course
SELECT * FROM (
  SELECT *, RANK() OVER (PARTITION BY cid ORDER BY grade ASC) AS rank
  FROM enrolled
) AS ranking
WHERE ranking.rank = 2;

Window functions shine when you need:

  • Running totals or moving averages
  • Rankings and percentiles
  • Comparisons to previous/next records
  • Partitioned calculations across groups

Nested Queries: Queries within Queries

Nested queries (subqueries) allow SQL statements to contain other SQL statements, enabling complex logic within a single query:

-- Get names of students enrolled in course 15-445
SELECT name FROM student
WHERE sid IN (
  SELECT sid FROM enrolled
  WHERE cid = '15-445'
);

-- Find courses with no enrolled students
SELECT * FROM course
WHERE NOT EXISTS (
  SELECT * FROM enrolled
  WHERE course.cid = enrolled.cid
);

The power comes from how scoping works—inner queries can access attributes from outer queries, creating sophisticated data retrieval patterns.

Lateral Joins: The For-Loop of SQL

Lateral joins might be SQL's answer to the for-loop in procedural languages. The LATERAL keyword allows a subquery to reference columns from preceding items in the FROM clause:

-- For each course, calculate enrollment count and average GPA
SELECT * FROM course AS c,
LATERAL (SELECT COUNT(*) AS cnt FROM enrolled
         WHERE enrolled.cid = c.cid) AS t1,
LATERAL (SELECT AVG(gpa) AS avg FROM student AS s
         JOIN enrolled AS e ON s.sid = e.sid
         WHERE e.cid = c.cid) AS t2;

This creates a dependency chain where each query builds on the previous one's results—powerful for row-by-row processing.

Common Table Expressions: Temporary Tables for Clarity

Common Table Expressions (CTEs) provide a way to write auxiliary statements for use in larger queries, similar to temporary tables but with cleaner syntax:

-- Find student with highest ID enrolled in any course
WITH MaxSid AS (
  SELECT MAX(sid) AS sid FROM enrolled
)
SELECT student.sid, name
FROM student JOIN MaxSid
ON student.sid = MaxSid.sid;

CTEs make complex queries more readable by breaking them into named, logical components. With the RECURSIVE keyword, they can even implement iterative algorithms:

-- Generate numbers 1 through 10
WITH RECURSIVE CountingSeries(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM CountingSeries WHERE n < 10
)
SELECT n FROM CountingSeries;

The SQL Multiverse: Vendor Differences

Multiple Spider-Men labeled with different languages point at each other, confusing a nearby cartoon database.

While SQL is standardized, implementations vary significantly across database systems. These differences become particularly apparent with date/time handling and string operations:

-- Getting current date (varies by system)
-- PostgreSQL
SELECT NOW();
-- Oracle
SELECT CURRENT_TIMESTAMP FROM DUAL;
-- SQL Server
SELECT GETDATE();

String operations also diverge:

  • String concatenation: || (standard), + (SQL Server), or CONCAT() function (MySQL)
  • Case sensitivity: varies by system (PostgreSQL is case-sensitive by default, MySQL isn't)

These inconsistencies make SQL both frustrating and fascinating—it's not just one language but a family of dialects.

Writing Efficient SQL: The Art and Science

The declarative nature of SQL means there are often many ways to express the same query, but not all approaches perform equally. Here are principles for efficient SQL:

  1. Use the right tool for the job: Window functions and CTEs often outperform nested subqueries
  2. Let the database do the work: Avoid row-by-row processing in application code
  3. Understand query plans: Use EXPLAIN to see how databases interpret your queries
  4. Minimize data transfer: Select only the columns you need
  5. Index strategically: Create indexes that match your query patterns

Here's how the same question can be answered multiple ways, with varying efficiency:

-- Find students taking at least one course (three approaches)

-- Using EXISTS (often efficient)
SELECT * FROM student s
WHERE EXISTS (SELECT 1 FROM enrolled e WHERE e.sid = s.sid);

-- Using IN (sometimes less efficient for large datasets)
SELECT * FROM student
WHERE sid IN (SELECT sid FROM enrolled);

-- Using JOIN (typically efficient)
SELECT DISTINCT s.* FROM student s
JOIN enrolled e ON s.sid = e.sid;

Beyond CRUD: Advanced SQL Applications

Modern SQL goes far beyond basic CRUD operations. Today's databases leverage SQL for:

  • Analytical queries: OLAP operations with aggregations across millions of records
  • Graph traversal: Property graph queries in SQL:2023 for relationship-based data
  • Full-text search: Text indexing and retrieval capabilities
  • JSON manipulation: Native JSON operations in many modern systems
  • Machine learning: In-database ML functions in systems like PostgreSQL

Conclusion: SQL's Enduring Power

A confident stick figure stands atop an “SQL” pillar, connected to others at desks, symbolizing SQL’s enduring importance.

My journey from SQL avoidance to appreciation taught me something crucial: mastering fundamentals often yields better results than chasing the latest trends.

SQL has remained relevant for over 40 years because it solves a fundamental problem well—describing what data you want in a way that allows database systems to retrieve it efficiently. While new paradigms come and go, SQL continues to evolve, absorbing the best ideas and adapting to new challenges.

Whether you're building a small web application or architecting enterprise systems, investing time in SQL proficiency will pay dividends throughout your career. It's not just about knowing syntax—it's about understanding how databases think and work with them rather than against them.

So the next time you're tempted to reach for a fancy ORM or new NoSQL database just to avoid writing SQL, remember: SQL isn't just alive; it's thriving. And it might just be the superpower your applications need.

References

  1. Pavlo, A. "Lecture #02: Modern SQL", 15-445/645 Database Systems, Carnegie Mellon University, Fall 2024.
  2. Date, C.J. "SQL and Relational Theory: How to Write Accurate SQL Code"
  3. SQL Standard - ISO/IEC 9075:2023, "Information technology — Database languages — SQL"
  4. PostgreSQL Documentation: https://www.postgresql.org/docs/
  5. Microsoft SQL Server Documentation: https://docs.microsoft.com/en-us/sql/
  6. MySQL Documentation: https://dev.mysql.com/doc/
  7. DuckDB Documentation: https://duckdb.org/docs/
  8. SQLite Documentation: https://www.sqlite.org/docs.html

Subscribe to Vade Bytes

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe