sqlglot, in C++. 45 dialects, 126× faster on benchmark SQL, 252× on the kind your ORM generates when nobody's looking. The performance gap scales with query complexity; see Benchmarks.
Supports stored procedures (PL/pgSQL, T-SQL, MySQL, PL/SQL): where sqlglot falls back to passthrough, libsqlglot parses them into the AST.
Anywhere SQL parsing meets the hot path: proxies, sidecars, migration, linters, Git pre-commit hooks, CI/CD pipelines. Also a replacement for every regex that's pretending to be a parser.
Because life is too short to parse SQL in Python.
Inspired by the original sqlglot, which did the decade-long work of mapping 31+ SQL dialects into an elegant, universal AST. libsqlglot does the comparatively trivial work of compiling it: the algorithm was already O(n), the runtime wasn't.
- Functionality
- Quickstart
- Differences from original sqlglot
- Building
- Architecture
- Benchmarks
- Examples
- Supported SQL dialects
- Contributing
- Licence
Transpiles SQL dialects via sqlglot AST. Full stored procedure support. Fail-fast errors with exact line and column. Python bindings available at 95-98% of C++ speed (import libsqlglot as sqlglot and go).
Handles the full SQL surface: SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE, MERGE, plus stored procedures (CALL, RETURN, DECLARE, IF/ELSEIF/ELSE, WHILE, FOR loops). Also handles CTEs, window functions, subqueries, and various JOIN types.
Compatible with 45 dialects (see Supported SQL dialects for the full list).
#include <libsqlglot/transpiler.h>
using namespace libsqlglot;
// Transpile between dialects (MySQL → PostgreSQL)
std::string output = Transpiler::transpile(
"SELECT `id`, `name` FROM `users` LIMIT 10",
Dialect::MySQL,
Dialect::PostgreSQL
);
// Returns: SELECT "users"."id", "users"."name" FROM "users" LIMIT 10
// Parse SQL into AST
Arena arena;
auto stmt = Transpiler::parse(arena, "SELECT name FROM users WHERE age > 18");
// Returns: AST with SelectStmt node
// Optimise AST (column qualification, predicate pushdown, constant folding)
Transpiler::optimize(arena, stmt);
// Modifies AST: name → users.name, age → users.age
// Generate SQL from AST
std::string sql = Transpiler::generate(stmt, Dialect::PostgreSQL);
// Returns: SELECT users.name FROM users WHERE users.age > 18

See Supported SQL dialects for all available Dialect:: values.
Available on PyPI: pip install libsqlglot
System requirements: Linux with glibc 2.35+ (Ubuntu 22.04+, Debian 12+, RHEL 9+)
import libsqlglot as sqlglot
# Transpile
sql = "SELECT `id`, `name` FROM `users` LIMIT 10"
result = sqlglot.transpile(sql, read="mysql", write="postgres")
result = sqlglot.transpile(sql, "mysql", "postgres") # Both styles work
# Parse
stmt = sqlglot.parse_one(sql) # single statement
stmts = sqlglot.parse("SELECT 1; SELECT 2") # multiple statements
# Generate
sql = stmt.sql() # default ANSI
sql = stmt.sql(dialect="postgres", pretty=True) # pretty-print
sql = stmt.sql(dialect="sqlserver") # TRUE → 1
# AST traverse
columns = stmt.find_all(sqlglot.ExprType.COLUMN)
tables = stmt.find_all(sqlglot.ExprType.TABLE_REF)
stmt.walk(lambda n: print(n.type))
# Optimise
optimized = sqlglot.optimize(stmt)
# Builder
stmt = (sqlglot.select(["id", "name"])
.from_("users")
.where("active = TRUE")
.order_by("name")
.limit(10))
# Diff
diff = sqlglot.diff("SELECT id FROM users", "SELECT user_id FROM users")

libsqlglot can be used in Unix pipelines via the Python CLI:
# Basic transpilation
echo "SELECT \`id\` FROM \`users\`" | python3 -m libsqlglot -r mysql -w postgres
# Output: SELECT "id" FROM "users"
# Pipeline with grep
cat queries.sql | python3 -m libsqlglot -r mysql -w bigquery | grep "SELECT"
# Process multiple files
cat *.sql | python3 -m libsqlglot -r sqlserver -w postgres > output.sql
# Combine with other tools
find . -name "*.sql" -exec cat {} \; | python3 -m libsqlglot -r mysql -w postgres | wc -l

The CLI reads SQL from stdin and writes transpiled SQL to stdout, making it composable with standard Unix tools.
See Supported SQL dialects for all available dialect names.
Python API: parse(), parse_one(), generate(), transpile(), optimize(), diff(), .sql(), .find_all(), .walk(), select() builder.
Performance: 95-98% of C++ speed. Overhead is ~125ns per call (10ns function call + 100ns string marshal + 15ns misc). On typical 2.5μs parse, this is 5% overhead. On complex 25μs parse, 0.5% overhead.
Memory: Thread-local arenas, no runtime dependencies.
| libsqlglot | Python sqlglot | |
|---|---|---|
| Performance | 32-242× faster (126× avg) | Baseline |
| Stored procedures | Support for PL/pgSQL, T-SQL, MySQL, PL/SQL | Limited (exp.Command passthrough) |
| Error handling | Fail-fast with precise errors (line, column, context) | Error recovery (IDE-friendly, slower) |
| Memory | Arena allocation (O(1) cleanup) | Garbage collection |
| Optimiser | Column qualification, predicate pushdown, constant folding, subquery elimination | Same + additional passes + full execution engine |
| Codebase | Header-only C++26 library | 50,000+ lines Python |
| Keywords | C++26 reflection: auto-generated from enum (300+ keywords, zero maintenance) | Manually maintained dictionaries |
| Binary | C++ library 14KB, Python wheel 1.5MB | N/A |
| Dialects | 45 SQL dialects (14 unique to libsqlglot, including ANSI) | 32 dialects (including PRQL, which libsqlglot doesn't support) |
| SQL coverage | Same as Python sqlglot | Full SQL support |
| Dependencies | None (no runtime deps) | None |
Requires C++26 (GCC 14+ with -freflection) and CMake 3.21+.
C++26 features used:
- Keyword reflection (`std::meta`): auto-generates 300+ keyword mappings from the `TokenType` enum at compile time. Zero maintenance, impossible to desync.
- Dialect reflection: build-time code generation parses the `Dialect` enum and generates compile-time mappings (CMake → Python script → generated header). When GCC fixes the reflection bug, this will switch to pure C++26 reflection.
- ExprType bindings: Python bindings are auto-generated from the C++ `ExprType` enum (124 values). Run `python3 scripts/generate_expr_type_bindings.py > src/python/expr_type_bindings_generated.h` after modifying `expression.h`.
- Advanced constexpr: perfect hash tables, compile-time string processing.
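The compile-time keyword table can't be shown with today's release compilers (C++26 reflection is still GCC-trunk-only), but the shape of the generated data and the constexpr lookup over it can. This is an illustrative sketch with a hand-written four-entry table standing in for the 300+ entries the reflection step emits; `KeywordEntry` and the enum values here are hypothetical, not libsqlglot's actual names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical subset of the TokenType enum; the real one has 300+ entries.
enum class TokenType : std::uint16_t { Select, From, Where, Limit };

struct KeywordEntry { std::string_view text; TokenType type; };

// With C++26 reflection this table is generated from the enum itself; here it
// is written out by hand purely to show the shape of the generated data.
inline constexpr std::array<KeywordEntry, 4> kKeywords{{
    {"SELECT", TokenType::Select},
    {"FROM",   TokenType::From},
    {"WHERE",  TokenType::Where},
    {"LIMIT",  TokenType::Limit},
}};

// Constexpr lookup: fully evaluated at compile time for constant arguments.
constexpr bool is_keyword(std::string_view s) {
    for (const auto& e : kKeywords)
        if (e.text == s) return true;
    return false;
}

static_assert(is_keyword("SELECT"));
static_assert(!is_keyword("USERS"));
```

The point of generating the table from the enum is that adding a token type can never silently desync the keyword map: there is only one source of truth.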
The easiest way to build with GCC trunk + reflection support:
# Build the project (first build takes 30-45 min to compile GCC trunk)
docker compose -f docker/docker-compose.yml run --rm build
# Run tests
docker compose -f docker/docker-compose.yml run --rm test
# Build Python wheel
docker compose -f docker/docker-compose.yml run --rm wheel
# Development shell
docker compose -f docker/docker-compose.yml run --rm dev

See docker/README.md for full documentation.
Requires GCC trunk built from source with -freflection support:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
sudo cmake --install build

# Python bindings (editable install)
pip install -e .

# Build with tests and benchmarks
cmake -B build -DCMAKE_BUILD_TYPE=Release \
    -DLIBSQLGLOT_BUILD_TESTS=ON \
    -DLIBSQLGLOT_BUILD_BENCHMARKS=ON
cmake --build build -j$(nproc)
ctest --test-dir build

Compiled sizes (stripped, -O3): C++ library 14KB, Python extension 2.0MB.
Code quality: Compiles with -Wall -Wextra -Wpedantic -Werror. No runtime dependencies. No RTTI. Passes 27,040 assertions across 361 test cases. Fuzz-tested with libFuzzer + AddressSanitizer.
Compiler optimisations (Release builds): Link-Time Optimisation (LTO), aggressive inlining, constant folding, constant merging, symbol visibility optimisation. All enabled by default with -DCMAKE_BUILD_TYPE=Release.
Architecture-specific builds: For maximum performance on your specific CPU, build with native architecture optimisations:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=native"
cmake --build build

This enables CPU-specific instructions (AVX2, AVX-512, etc.) for your exact processor, typically 5-15% faster than generic builds.
Benchmarking: Comprehensive benchmark suite available. Build with -DLIBSQLGLOT_BUILD_BENCHMARKS=ON to measure performance on your workload.
Header-only design: you only pay for what you use. 19 header files, no .cpp. See include/libsqlglot/ for the full layout. Core files: parser.h (4203 lines), generator.h (2149), expression.h (1385, 115 expression types). Entry point is transpiler.h (86 lines).
Arena allocation: all AST nodes allocated in contiguous chunks, freed together in O(1) time. String interning deduplicates identifiers. Tokenisation is zero-copy via string_view. Everything uses RAII, no manual delete calls.
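The arena idea above can be sketched in a few lines of standard C++. This is illustrative only, not libsqlglot's actual `Arena` class: nodes are bump-allocated from large chunks, and releasing the arena (here, letting it go out of scope) frees everything at once. The sketch is deliberately restricted to trivially destructible node types:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Minimal bump-pointer arena sketch: allocate from 64KB chunks, never free
// individual nodes, release everything together when the arena is destroyed.
class Arena {
public:
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible_v<T>,
                      "sketch only handles trivially destructible nodes");
        void* p = allocate(sizeof(T), alignof(T));
        return ::new (p) T(std::forward<Args>(args)...);
    }

private:
    void* allocate(std::size_t size, std::size_t align) {
        // Round the bump pointer up to the required alignment.
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || offset + size > kChunkSize) {
            chunks_.push_back(std::make_unique<std::byte[]>(kChunkSize));
            offset = 0;
        }
        used_ = offset + size;
        return chunks_.back().get() + offset;
    }

    static constexpr std::size_t kChunkSize = 64 * 1024;
    std::vector<std::unique_ptr<std::byte[]>> chunks_;
    std::size_t used_ = 0;
};
```

Because every node lives in one of a handful of contiguous chunks, cleanup is O(1) in the number of nodes and traversal gets the spatial locality the Architecture section describes.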
Grammar definitions from multiple sources (ANTLR and normalised JSON specs) are unified into a canonical IR, then compiled into a cache-efficient LUT used by the runtime parser.
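The "cache-efficient LUT" pattern is easiest to see at the character level: classify every input byte with a single indexed load instead of a chain of comparisons. The real table is generated from the grammar IR at build time; this hand-filled sketch (names hypothetical) shows the technique:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One byte of class information per possible input byte.
enum class CharClass : std::uint8_t { Other, Space, Digit, IdentStart, Quote };

// Built once at compile time; the tokenizer then does kCharLut[byte].
constexpr std::array<CharClass, 256> make_char_lut() {
    std::array<CharClass, 256> lut{};
    for (int c = '0'; c <= '9'; ++c) lut[c] = CharClass::Digit;
    for (int c = 'A'; c <= 'Z'; ++c) lut[c] = CharClass::IdentStart;
    for (int c = 'a'; c <= 'z'; ++c) lut[c] = CharClass::IdentStart;
    lut['_']  = CharClass::IdentStart;
    lut[' ']  = lut['\t'] = lut['\n'] = lut['\r'] = CharClass::Space;
    lut['\''] = lut['"']  = lut['`']  = CharClass::Quote;
    return lut;
}

inline constexpr auto kCharLut = make_char_lut();

constexpr CharClass classify(unsigned char c) { return kCharLut[c]; }

static_assert(classify('a') == CharClass::IdentStart);
static_assert(classify('7') == CharClass::Digit);
```

A 256-entry table of one-byte classes fits in four cache lines, so the hot tokenizer loop rarely misses.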
Statements: SELECT, INSERT, UPDATE, DELETE, MERGE, CREATE TABLE, ALTER TABLE, DROP TABLE, DROP INDEX, TRUNCATE, CREATE VIEW, DROP VIEW, CREATE SCHEMA, DROP SCHEMA
Stored procedures: CALL, RETURN, DECLARE, IF/ELSEIF/ELSE/END IF, WHILE/DO/END WHILE, FOR/IN/LOOP/END LOOP
Clauses: WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET, QUALIFY
JOINs: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL, ASOF
Advanced: CTEs (WITH RECURSIVE), window functions (PARTITION BY, ORDER BY, ROWS/RANGE, named windows), subqueries, PIVOT/UNPIVOT
Operators: Arithmetic, comparison, logical, LIKE, ILIKE, IS NULL, BETWEEN, IN, EXISTS
Expressions: CASE WHEN, arrays, array indexing, set operations (UNION, INTERSECT, EXCEPT)
Transactions: BEGIN, COMMIT, ROLLBACK, SAVEPOINT
Utilities: SET, DESCRIBE, SHOW, EXPLAIN
361 test cases, 27,040 assertions, all passing.
cd build
ctest --output-on-failure
# Run specific test suites
./tests/libsqlglot_tests "[parser]"
./tests/libsqlglot_tests "[security]"
./tests/libsqlglot_tests "[stored_procedures]"
./tests/libsqlglot_tests "[performance]"
# Run benchmarks
./benchmarks/bench_tokenizer
./benchmarks/bench_parser
./benchmarks/bench_transpiler

27,040 assertions covering SQL injection, buffer overflow, stack overflow (recursion depth limited to 256, adjustable via Parser::kMaxRecursionDepth in parser.h), memory corruption (the arena prevents use-after-free and double-free), integer overflow, and encoding attacks (UTF-8 identifiers rejected, UTF-8 string literals accepted). All pass.
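The recursion-depth limit mentioned above is a standard recursive-descent defence: every recursive parse step bumps a counter through an RAII guard and throws once the limit is hit, turning a potential stack overflow into an ordinary parse error. A sketch of the pattern (names hypothetical, not libsqlglot's internals), demonstrated on a toy nested-parentheses grammar:

```cpp
#include <cassert>
#include <stdexcept>

// RAII guard: increments on entry, decrements on exit (including unwinding).
struct DepthGuard {
    DepthGuard(int& depth, int max) : depth_(depth) {
        if (++depth_ > max)
            throw std::runtime_error("expression too deeply nested");
    }
    ~DepthGuard() { --depth_; }
    int& depth_;
};

// Toy recursive-descent rule: parse a run of balanced parentheses.
bool parse_parens(const char*& p, int& depth, int max_depth) {
    DepthGuard guard(depth, max_depth);
    if (*p != '(') return true;                       // base case
    ++p;
    if (!parse_parens(p, depth, max_depth)) return false;
    if (*p != ')') return false;                      // unbalanced input
    ++p;
    return true;
}
```

Malicious input like ten thousand nested parentheses then fails fast with a catchable exception instead of crashing the process.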
There is a libFuzzer target in fuzzing/fuzz_parser.cpp:
cd fuzzing
clang++ -fsanitize=fuzzer,address -std=c++26 -freflection -I../include fuzz_parser.cpp -o fuzz_parser
./fuzz_parser -max_len=10000 -timeout=10

Test compatibility with Python sqlglot's test fixtures:
cd compat
python sqlglot_compat.py --sqlglot-path /path/to/sqlglot --limit 1000

libsqlglot provides precise, actionable error messages with exact locations:
Arena arena;
Parser parser(arena, "SELECT FROM users"); // Missing column list
// Throws: Line 1, column 8: Expected column list after SELECT (found: 'FROM')

All parse errors include line number, column number, the actual token that caused the error, and a human-readable explanation of what was expected. Fail-fast, no cascading secondary errors.
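Deriving "line 1, column 8" from a byte offset is a common technique worth seeing once (this is a generic sketch, not libsqlglot's code): scan the source up to the offending token's offset, counting newlines, with both counters 1-based as in the error message above:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Map a byte offset into {line, column}, both 1-based.
std::pair<std::size_t, std::size_t> line_col(std::string_view src,
                                             std::size_t offset) {
    std::size_t line = 1, col = 1;
    for (std::size_t i = 0; i < offset && i < src.size(); ++i) {
        if (src[i] == '\n') { ++line; col = 1; }
        else                { ++col; }
    }
    return {line, col};
}
```

For `"SELECT FROM users"`, the offending `FROM` token starts at byte offset 7, which this maps to line 1, column 8, matching the error message shown above.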
// MySQL -> PostgreSQL
std::string mysql_sql = "SELECT `user_id`, `name` FROM `users` LIMIT 10";
std::string pg_sql = Transpiler::transpile(
mysql_sql, Dialect::MySQL, Dialect::PostgreSQL
);
// Output: SELECT "users"."user_id", "users"."name" FROM "users" LIMIT 10

Arena arena;
auto stmt = arena.create<SelectStmt>();
// SELECT *
stmt->columns.push_back(arena.create<Star>());
// FROM users
stmt->from = arena.create<TableRef>("users");
// WHERE age > 18
auto age_col = arena.create<Column>("age");
auto threshold = arena.create<Literal>("18");
stmt->where = arena.create<BinaryOp>(ExprType::GT, age_col, threshold);
std::string sql = Generator::generate(stmt);
// "SELECT * FROM users WHERE age > 18"

Arena arena;
auto stmt = Transpiler::parse(arena,
"SELECT name FROM users WHERE age > 18");
// Apply optimisation passes
Transpiler::optimize(arena, stmt);
// Optimisations include:
// - Column qualification (name -> users.name)
// - Predicate pushdown
// - Constant folding
// - Subquery elimination
std::string sql = Transpiler::generate(stmt);
// "SELECT users.name FROM users WHERE users.age > 18"

Arena arena;
std::string plpgsql = R"(
FOR i IN 1..100 LOOP
IF i > 50 THEN
RETURN i;
END IF;
END LOOP;
)";
auto stmt = Transpiler::parse(arena, plpgsql);
std::string output = Transpiler::generate(stmt, Dialect::PostgreSQL);

// Real-world analytics query: CTEs, window functions, multiple JOINs
std::string mysql_query = R"(
WITH regional_sales AS (
SELECT
region,
product_id,
SUM(amount) as total_sales,
COUNT(order_id) as order_count,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) as sales_rank
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY region, product_id
HAVING SUM(amount) > 10000
),
top_regions AS (
SELECT region
FROM regional_sales
WHERE sales_rank <= 5
GROUP BY region
HAVING COUNT(*) > 3
)
SELECT
r.region,
r.product_id,
p.product_name,
r.total_sales,
r.order_count,
RANK() OVER (ORDER BY r.total_sales DESC) as overall_rank,
LAG(r.total_sales) OVER (ORDER BY r.total_sales DESC) as prev_sales
FROM regional_sales r
INNER JOIN products p ON r.product_id = p.id
INNER JOIN top_regions t ON r.region = t.region
WHERE r.sales_rank <= 10
ORDER BY r.total_sales DESC
LIMIT 100
)";
// MySQL → BigQuery
std::string bigquery = Transpiler::transpile(
mysql_query, Dialect::MySQL, Dialect::BigQuery
);
// Full semantic preservation: CTEs, window functions, JOINs all intact
// Round-trip: MySQL → BigQuery → PostgreSQL → Snowflake → MySQL
std::string step1 = Transpiler::transpile(mysql_query, Dialect::MySQL, Dialect::BigQuery);
std::string step2 = Transpiler::transpile(step1, Dialect::BigQuery, Dialect::PostgreSQL);
std::string step3 = Transpiler::transpile(step2, Dialect::PostgreSQL, Dialect::Snowflake);
std::string back_to_mysql = Transpiler::transpile(step3, Dialect::Snowflake, Dialect::MySQL);
// Query semantics preserved across 4 dialect conversions
// Simple transformations: PostgreSQL → SQL Server
std::string pg_query = "SELECT * FROM users WHERE active = TRUE LIMIT 10";
Arena arena;
Parser parser(arena, pg_query);
auto stmt = parser.parse_select();
std::string sql_server = Generator::generate(stmt, Dialect::SQLServer);
// Result: SELECT TOP 10 * FROM users WHERE active = 1
// LIMIT → TOP, TRUE → 1
// Multi-dialect pipeline: Parse once, generate for multiple targets
Arena shared_arena;
auto ast = Transpiler::parse(shared_arena, "SELECT name FROM users WHERE age > 18");
std::string postgres_sql = Generator::generate(ast, Dialect::PostgreSQL);
std::string mysql_sql = Generator::generate(ast, Dialect::MySQL);
std::string duckdb_sql = Generator::generate(ast, Dialect::DuckDB);
std::string snowflake_sql = Generator::generate(ast, Dialect::Snowflake);
// Single parse, multiple outputs: efficient for multi-target scenarios

Benchmarks run on x86-64 Linux with -O3 optimisation. libsqlglot is compared against pure Python sqlglot 30.0.1. Python bindings add ~125ns overhead (95-98% of C++ performance).
What we measure: Full parse + generate round-trip (SQL → AST → SQL). No optimisation applied in either implementation. Both produce identical output, proving identical work done. Apples-to-apples comparison of parser and generator performance.
Measurement: std::chrono::high_resolution_clock with 1000 iterations per query, averaged.
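The measurement loop described above is straightforward to reproduce. A sketch of the harness (illustrative; `work` stands in for the parse+generate round-trip on one benchmark query):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Run `work` `iterations` times and return the mean cost in microseconds,
// timed with high_resolution_clock as described above.
template <typename F>
double mean_microseconds(F&& work, std::size_t iterations = 1000) {
    using clock = std::chrono::high_resolution_clock;
    auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) work();
    auto elapsed = clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count()
           / static_cast<double>(iterations);
}
```

Timing the whole batch and dividing, rather than timing each iteration, keeps the clock-read overhead out of the per-query numbers.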
The 16 standard queries are sqlglot's own benchmark set. The 8 stress tests are ours and are excluded from the headline average; they're here to show the scaling doesn't stop: 126× average on the standard set, up to 252× on these. What happens past 252× is left as an exercise for the reader.
| Query | sqlglot (μs) | libsqlglot (μs) | Speedup |
|---|---|---|---|
| many_ctes | 1,097.84 | 4.53 | 242.4× |
| many_joins | 1,038.03 | 5.81 | 178.7× |
| nested_functions | 892.91 | 6.27 | 142.4× |
| nested_subqueries | 495.97 | 2.79 | 177.8× |
| many_unions | 2,585.04 | 17.30 | 149.4× |
| tpch | 1,110.16 | 7.74 | 143.4× |
| complex_where | 575.69 | 4.27 | 134.8× |
| many_windows | 933.16 | 13.00 | 71.8× |
| deep_arithmetic | 388.10 | 2.86 | 135.7× |
| many_columns | 1,643.41 | 14.81 | 110.9× |
| values | 15,951.28 | 162.90 | 97.9× |
| large_case | 4,480.21 | 35.30 | 126.9× |
| large_in | 11,756.63 | 116.38 | 101.0× |
| many_numbers | 6,463.57 | 67.71 | 95.5× |
| short | 118.61 | 1.54 | 77.0× |
| large_strings | 146.92 | 4.54 | 32.4× |
Average: 126.1× faster (range: 32.4× to 242.4×). A million queries: 29 seconds vs 52 minutes.
libsqlglot achieves this through:
- Memory: arena allocation (O(1) cleanup), string interning (pointer equality), zero-copy tokenisation (`string_view`)
- Algorithms: perfect hash keyword lookup (O(1)), branchless uppercase conversion (no branch misprediction)
- Compiler: LTO (whole-program optimisation), aggressive inlining, constant folding, C++23 `constexpr`
- Cache: contiguous memory layout, spatial locality, no per-node `malloc`/`new` fragmentation
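Branchless uppercase conversion, as mentioned above, is a one-liner once you notice that bit 0x20 is the only difference between ASCII `'a'..'z'` and `'A'..'Z'`. A sketch (the technique is standard; this is not libsqlglot's exact code):

```cpp
#include <cassert>

// Clear bit 0x20 iff the byte is a lowercase ASCII letter. The condition is
// computed arithmetically, so there is no data-dependent branch to mispredict.
constexpr char to_upper_branchless(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    unsigned char is_lower =
        static_cast<unsigned char>((u >= 'a') & (u <= 'z'));
    return static_cast<char>(u - (is_lower << 5));  // 'a' - 32 == 'A'
}

static_assert(to_upper_branchless('a') == 'A');
static_assert(to_upper_branchless('Z') == 'Z');
static_assert(to_upper_branchless('1') == '1');
```

Keyword matching uppercases every identifier byte it sees, so on mixed-case SQL a predictable arithmetic sequence beats a 50/50-taken branch.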
Designed to break parsers: 15-level nested CTEs, 35-level CASE expressions, 100+ WHERE conditions, 20-way joins. 1,186 lines of SQL, 51KB total.
| Query | sqlglot (μs) | libsqlglot (μs) | Speedup |
|---|---|---|---|
| Correlated subqueries | 14,400.86 | 60.60 | 237.6× |
| Complex subqueries | 18,414.37 | 73.18 | 251.6× |
| Complex string functions | 15,609.81 | 71.42 | 218.6× |
| Deep CTE nesting | 17,214.85 | 82.36 | 209.0× |
| Union chains | 24,075.57 | 115.77 | 208.0× |
| Complex WHERE clause | 14,499.12 | 81.99 | 176.8× |
| Deep CASE nesting | 14,651.62 | 103.10 | 142.1× |
| Multi-table joins | 13,862.49 | 141.16 | 98.2× |
Average: 192.7× faster (range: 98.2× to 251.6×). A million queries: 86 seconds vs 4.6 hours.
# Python sqlglot: 3,917 μs
parsed = sqlglot.parse_one("SELECT * FROM users WHERE age > 18")
output = parsed.sql() # "SELECT * FROM users WHERE age > 18"

// libsqlglot: 45 μs (87× faster)
Arena arena;
auto ast = Transpiler::parse(arena, "SELECT * FROM users WHERE age > 18");
auto output = Transpiler::generate(ast); // "SELECT * FROM users WHERE age > 18"

See benchmarks/bench_complete_comparison.py to reproduce.
45 dialects with full parse and generation support. Use Dialect::Name in C++ or "name" strings in Python (e.g., read="mysql", write="postgres").
Note: Dialect names are case-sensitive (e.g., Dialect::PostgreSQL, not Dialect::postgresql).
Each dialect includes proper identifier quoting, keyword handling, function name translation, and syntax transformations (e.g. LIMIT vs TOP vs FETCH FIRST).
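The row-limit example (LIMIT vs TOP vs FETCH FIRST) is easy to see in miniature. This sketch is illustrative only: the real generator rewrites the AST rather than concatenating strings, and the three-value `Dialect` enum here is a stand-in for the library's 45-value one:

```cpp
#include <cassert>
#include <string>

// Hypothetical three-dialect subset, for illustration.
enum class Dialect { PostgreSQL, SQLServer, DB2 };

// Render a row limit in each dialect's native syntax.
std::string limit_clause(Dialect d, int n) {
    switch (d) {
        case Dialect::SQLServer:  // appears after SELECT, not at the end
            return "TOP " + std::to_string(n);
        case Dialect::DB2:
            return "FETCH FIRST " + std::to_string(n) + " ROWS ONLY";
        default:
            return "LIMIT " + std::to_string(n);
    }
}
```

The same shape of dispatch covers identifier quoting (backticks vs double quotes vs brackets) and function-name translation.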
If somehow a dialect is missing from the map, get_features() returns ANSI SQL defaults.
| Database | Dialect | Supported features |
|---|---|---|
| ANSI SQL | ANSI | Standard SQL compliance: ANSI joins, standard aggregations, CTEs |
| BigQuery | BigQuery | STRUCT types, ARRAY literals [1,2,3], SAFE_CAST, INT64/STRING types, nested field access |
| Calcite | Calcite | TABLESAMPLE BERNOULLI, Apache Calcite optimiser hints |
| ClickHouse | ClickHouse | Column-oriented syntax, MergeTree engines, SAMPLE BY, ARRAY JOIN |
| DB2 | DB2 | FETCH FIRST n ROWS ONLY, DB2 stored procedures, OLAP functions |
| Drill | Drill | Schema-free JSON queries, nested data access, FLATTEN |
| DuckDB | DuckDB | QUALIFY clause, ASOF joins, PIVOT/UNPIVOT, macro functions, LIST type |
| Hive | Hive | PARTITIONED BY, CLUSTERED BY, SerDe formats, Hive UDFs |
| MySQL | MySQL | Backtick identifiers, MySQL-specific functions, storage engines |
| Oracle | Oracle | CONNECT BY hierarchical queries, PRIOR, START WITH, DUAL table, PL/SQL blocks |
| Phoenix | Phoenix | HBase integration: SALT_BUCKETS, ARRAY_APPEND, UPSERT VALUES |
| Pinot | Pinot | Real-time OLAP: segment pruning, star-tree indexes, broker queries |
| PostgreSQL | PostgreSQL | RETURNING, ON CONFLICT, LATERAL joins, window functions, array types, JSONB |
| Presto | Presto | APPROX_DISTINCT, UNNEST, ROW types, lambda functions |
| Snowflake | Snowflake | FLATTEN for JSON, VARIANT type, TIME_TRAVEL, CLUSTER BY, RESULT_SCAN |
| Solr | Solr | score() relevance function, faceted search, Lucene query syntax |
| Spark | Spark | NULL-SAFE equality <=>, Hive metastore, broadcast hints, cache table |
| SQL Server | SQLServer | T-SQL syntax: TOP, IDENTITY, OUTPUT clause, EXEC, GO batches, #temp tables |
| SQLite | SQLite | Minimal SQL: no RIGHT JOIN, PRAGMA commands, autoincrement |
| Teradata | Teradata | MULTISET tables, BTEQ syntax, FastLoad/MultiLoad hints |
These dialects inherit features from a compatible base dialect and add specific extensions.
| Database | Dialect | Inherits from | Supported features |
|---|---|---|---|
| Athena | Athena | Presto | AWS Athena Presto syntax, S3 partitioning |
| CockroachDB | CockroachDB | PostgreSQL | UPSERT statement, distributed transactions, RETURNING clause |
| Databricks | Databricks | Spark | OPTIMIZE tables, ZORDER BY clustering, Delta Lake operations |
| Doris | Doris | MySQL | DUPLICATE KEY model, BUCKETS distribution, DISTRIBUTED BY HASH |
| Dremio | Dremio | Presto | CREATE REFLECTION for materialisation, data lakehouse queries |
| Druid | Druid | MySQL | TIME_FLOOR for time bucketing, approximate aggregations, roll-up |
| Dune | Dune | PostgreSQL | Analytics: bytearray_to_uint256, ETH address functions |
| Exasol | Exasol | PostgreSQL | DISTRIBUTE BY for parallel execution, Lua scripting UDFs |
| Fabric | Fabric | SQL Server | Three-part lakehouse.schema.table naming, OneLake integration |
| Greenplum | Greenplum | PostgreSQL | DISTRIBUTED BY/RANDOMLY, column/append-optimised tables, GPORCA optimiser |
| Impala | Impala | Hive | COMPUTE STATS, Kudu integration, CACHED IN pools |
| MariaDB | MariaDB | MySQL | MySQL-compatible with RETURNING, window functions, JSON functions |
| Materialize | Materialize | PostgreSQL | TAIL for streaming results, materialised views, temporal filters |
| Netezza | Netezza | PostgreSQL | DISTRIBUTE ON distribution keys, zone maps, statistics |
| Redshift | Redshift | PostgreSQL | DISTKEY distribution, SORTKEY ordering, SUPER type (JSON) |
| RisingWave | RisingWave | PostgreSQL | EMIT CHANGES for streaming, temporal joins, watermarks |
| SingleStore | SingleStore | MySQL | VECTOR type for embeddings, DOT_PRODUCT, columnstore/rowstore |
| Spark2 | Spark2 | Spark | Legacy Spark 2.x: CACHE TABLE, broadcast joins, RDD compatibility |
| StarRocks | StarRocks | MySQL | Vectorised execution, primary key model, materialised views, bitmap indexes |
| Tableau | Tableau | PostgreSQL | ZN() null-to-zero, Tableau calculation functions, RAWSQL passthrough |
| TiDB | TiDB | MySQL | AUTO_RANDOM for distributed primary keys, MySQL compatibility, TiKV storage |
| TimescaleDB | TimescaleDB | PostgreSQL | time_bucket() for time-series, hypertables, continuous aggregates |
| Trino | Trino | Presto | Presto-compatible: UNNEST, lambda expressions, ROW types |
| Vertica | Vertica | PostgreSQL | CREATE PROJECTION for physical design, SEGMENTED BY HASH, columnar storage |
| YugabyteDB | YugabyteDB | PostgreSQL | SPLIT INTO n TABLETS, distributed SQL, PostgreSQL compatibility |
libsqlglot is currently a solo project. Bug reports, test cases, and dialect edge cases are welcome via GitHub issues. If you have a dialect you wish to see added, please open an issue or PR.
If a query parses incorrectly, or a dialect transformation produces wrong output, please open an issue with the input SQL, the source and target dialects, the expected output, and any other pertinent details.
Pull requests are considered but there is no guarantee of merge. The codebase is intentionally small and opinionated.
No.
Apache 2.0