readstat-rs

Read, inspect, and convert SAS binary (.sas7bdat) files β€” from Rust code, the command line, or the browser. Converts to CSV, Parquet, Feather, and NDJSON using Apache Arrow.

The original use case was a command-line tool for converting SAS files, but the project has since expanded into a workspace of crates that can be used as a Rust library, a CLI, or compiled to WebAssembly for browser and JavaScript runtimes.

πŸ”‘ Dependencies

The command-line tool is developed in Rust and is made possible by the following excellent projects:

The ReadStat library is used to parse and read sas7bdat files, and the arrow crate is used to convert the read sas7bdat data into the Arrow memory format. Once in the Arrow memory format, the data can be written to other file formats.

πŸ’‘ Note: The ReadStat C library supports SAS, SPSS, and Stata file formats. The readstat-sys crate exposes the full ReadStat API β€” all 125 functions across all formats. However, the higher-level crates (readstat, readstat-cli, readstat-wasm, readstat-tests) currently only implement support for SAS .sas7bdat files.

πŸš€ CLI Quickstart

Convert the first 50,000 rows of example.sas7bdat (by performing the read in parallel) to the file example.parquet, overwriting the file if it already exists.

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.parquet --format parquet --rows 50000 --overwrite --parallel

πŸ“¦ CLI Install

Download a Release

[Mostly] static binaries for Linux, macOS, and Windows may be found at the Releases page.

Setup

Move the readstat binary to a known directory and add that directory to the user’s PATH.

Linux & macOS

Ensure the path to readstat is added to the appropriate shell configuration file.

Windows

For Windows users, path configuration may be found within the Environment Variables menu. Executing the following from the command line opens the Environment Variables menu for the current user.

rundll32.exe sysdm.cpl,EditEnvironmentVariables

Alternatively, update the user-level PATH in PowerShell (replace C:\path\to\readstat with the actual directory):

$currentPath = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", "$currentPath;C:\path\to\readstat", "User")

After running the above, restart your terminal for the change to take effect.

Run

Run the binary.

readstat --help

βš™οΈ CLI Usage

The binary is invoked using subcommands:

  • metadata β†’ writes file and variable metadata to standard out or JSON
  • preview β†’ writes the first N rows of parsed data as csv to standard out
  • data β†’ writes parsed data in csv, feather, ndjson, or parquet format to a file

Column metadata β€” labels, SAS format strings, and storage widths β€” is preserved in Parquet and Feather output as Arrow field metadata. See docs/TECHNICAL.md for details.

For the full CLI reference β€” including column selection, parallelism, memory considerations, SQL queries, reader modes, and debug options β€” see docs/USAGE.md.

For library, API server, and WebAssembly usage, see Examples below.

πŸ› οΈ Build from Source

Clone the repository (with submodules), install platform-specific developer tools, and run cargo build. Platform-specific instructions for Linux, macOS, and Windows are in docs/BUILDING.md.

πŸ’» Platform Support

| Platform | Status | C library | Notes |
| --- | --- | --- | --- |
| Linux (glibc) | ✅ Builds and runs | System iconv, system zlib | — |
| Linux (musl) | ✅ Builds and runs | System iconv, system zlib | — |
| macOS | ✅ Builds and runs | System libiconv, system zlib | — |
| Windows (MSVC) | ✅ Builds and runs | Vendored iconv, vendored zlib | Requires libclang for bindgen. MSVC supported since ReadStat 1.1.5 (no msys2 needed). |

πŸ“š Documentation

| Document | Description |
| --- | --- |
| docs/ARCHITECTURE.md | Crate layout, key types, and architectural patterns |
| docs/USAGE.md | Full CLI reference and examples |
| docs/BUILDING.md | Clone, build, and linking details per platform |
| docs/TECHNICAL.md | Floating-point precision and date/time handling |
| docs/TESTING.md | Running tests, dataset table, valgrind |
| docs/BENCHMARKING.md | Criterion benchmarks, hyperfine, and profiling |
| docs/CI-CD.md | GitHub Actions triggers and artifacts |
| docs/MEMORY_SAFETY.md | Automated memory-safety CI checks (Valgrind, ASan, Miri, unsafe audit) |
| docs/RELEASING.md | Step-by-step guide for publishing crates to crates.io |

🧩 Workspace Crates

| Crate | Path | Description |
| --- | --- | --- |
| readstat | crates/readstat/ | Pure library for parsing SAS files into Arrow RecordBatch format. Output writers are feature-gated. |
| readstat-cli | crates/readstat-cli/ | Binary crate producing the readstat CLI tool (arg parsing, progress bars, orchestration). |
| readstat-sys | crates/readstat-sys/ | Raw FFI bindings to the full ReadStat C library (SAS, SPSS, Stata) via bindgen. |
| readstat-iconv-sys | crates/readstat-iconv-sys/ | Windows-only FFI bindings to libiconv for character encoding conversion. |
| readstat-tests | crates/readstat-tests/ | Integration test suite (29 modules, 14 datasets). |
| readstat-wasm | crates/readstat-wasm/ | WebAssembly build for browser/JS usage (excluded from workspace, built with Emscripten). |

For full architectural details, see docs/ARCHITECTURE.md.

πŸ’‘ Examples

The examples/ directory contains runnable demos showing different ways to use readstat-rs.

| Example | Description |
| --- | --- |
| cli-demo | Convert a .sas7bdat file to CSV, NDJSON, Parquet, and Feather using the readstat CLI |
| api-demo | API servers in Rust (Axum) and Python (FastAPI + PyO3) — upload, inspect, and convert SAS files over HTTP |
| bun-demo | Parse a .sas7bdat file from JavaScript using the WebAssembly build with Bun |
| web-demo | Browser-based viewer and converter — upload, preview, and export entirely client-side via WASM |
| sql-explorer | Browser-based SQL explorer — upload a .sas7bdat file and query it interactively with SQL via AlaSQL |

To use readstat as a library in your own Rust project, add the readstat crate as a dependency.
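A minimal dependency declaration might look like the following sketch (the version matches the crate version noted later in this document; pin whatever requirement suits your project):

```toml
[dependencies]
readstat = "0.20"
```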

πŸ”— Resources

The following have been incredibly helpful while developing!

Building from Source

Clone

Ensure submodules are also cloned.

git clone --recurse-submodules https://github.com/curtisalexander/readstat-rs.git

The ReadStat repository is included as a git submodule within this repository. To build and link, the readstat-sys crate is built first; the readstat library and readstat-cli binary crates then depend on it.

Linux

Install developer tools

sudo apt install build-essential clang

Build

cargo build

iconv: Linked dynamically against the system-provided library. On most distributions it is available by default. No explicit link directives are emitted in the build script β€” the system linker resolves it automatically.

zlib: Linked via the libz-sys crate, which will use the system-provided zlib if available or compile from source as a fallback.

macOS

Install developer tools

xcode-select --install

Build

cargo build

iconv: Linked dynamically against the system-provided library that ships with macOS (via cargo:rustc-link-lib=iconv in the readstat-sys build script). No additional packages need to be installed.

zlib: Linked via the libz-sys crate, which will use the system-provided zlib that ships with macOS.

Windows

Building on Windows requires that LLVM and the Visual Studio C++ Build Tools be downloaded and installed.

In addition, the path to libclang needs to be set in the environment variable LIBCLANG_PATH. If LIBCLANG_PATH is not set, the readstat-sys build script will check the default path C:\Program Files\LLVM\lib and fail with instructions if it does not exist.


Build

cargo build

iconv: Compiled from source using the vendored libiconv-win-build submodule (located at crates/readstat-iconv-sys/vendor/libiconv-win-build/) via the readstat-iconv-sys crate. readstat-iconv-sys is a Windows-only dependency (gated behind [target.'cfg(windows)'.dependencies] in readstat-sys/Cargo.toml).

zlib: Compiled from source via the libz-sys crate (statically linked).

Linking Summary

| Platform | iconv | zlib |
| --- | --- | --- |
| Linux (glibc/musl) | Dynamic (system) | libz-sys (prefers system, falls back to source) |
| macOS (x86/ARM) | Dynamic (system) | libz-sys (uses system) |
| Windows (MSVC) | Static (vendored submodule) | libz-sys (compiled from source, static) |

Usage

After either building or installing, the binary is invoked using subcommands. Currently, the following subcommands have been implemented:

  • metadata β†’ writes the following to standard out or json
    • row count
    • variable count
    • table name
    • table label
    • file encoding
    • format version
    • bitness
    • creation time
    • modified time
    • compression
    • byte order
    • variable names
    • variable type classes
    • variable types
    • variable labels
    • variable format classes
    • variable formats
    • arrow data types
  • preview β†’ writes the first 10 rows (or optionally the number of rows provided by the user) of parsed data in csv format to standard out
  • data β†’ writes parsed data in csv, feather, ndjson, or parquet format to a file

Metadata

To write metadata to standard out, invoke the following.

readstat metadata /some/dir/to/example.sas7bdat

To write metadata to json, invoke the following. This is useful for reading the metadata programmatically.

readstat metadata /some/dir/to/example.sas7bdat --as-json

The JSON output contains file-level metadata and a vars object keyed by variable index. This makes it straightforward to search for a particular column by piping the output to jq or Python.

Search for a column with jq

# Find the variable entry whose var_name matches "Make"
readstat metadata /some/dir/to/example.sas7bdat --as-json \
  | jq '.vars | to_entries[] | select(.value.var_name == "Make") | .value'

Search for a column with Python

# Find the variable entry whose var_name matches "Make"
readstat metadata /some/dir/to/example.sas7bdat --as-json \
  | python -c "
import json, sys
md = json.load(sys.stdin)
match = [v for v in md['vars'].values() if v['var_name'] == 'Make']
if match:
    print(json.dumps(match[0], indent=2))
"

Preview Data

To write parsed data (as a csv) to standard out, invoke the following (default is to write the first 10 rows).

readstat preview /some/dir/to/example.sas7bdat

To write the first 100 rows of parsed data (as a csv) to standard out, invoke the following.

readstat preview /some/dir/to/example.sas7bdat --rows 100

Data

πŸ“ The data subcommand includes a parameter for --format, which is the file format that is to be written. Currently, the following formats have been implemented:

  • csv
  • feather
  • ndjson
  • parquet

csv

To write parsed data (as csv) to a file, invoke the following (default is to write all parsed data to the specified file).

The default --format is csv. Thus, the parameter is elided from the below examples.

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.csv

To write the first 100 rows of parsed data (as csv) to a file, invoke the following.

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.csv --rows 100

feather

To write parsed data (as feather) to a file, invoke the following (default is to write all parsed data to the specified file).

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.feather --format feather

To write the first 100 rows of parsed data (as feather) to a file, invoke the following.

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.feather --format feather --rows 100

ndjson

To write parsed data (as ndjson) to a file, invoke the following (default is to write all parsed data to the specified file).

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.ndjson --format ndjson

To write the first 100 rows of parsed data (as ndjson) to a file, invoke the following.

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.ndjson --format ndjson --rows 100

parquet

To write parsed data (as parquet) to a file, invoke the following (default is to write all parsed data to the specified file).

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.parquet --format parquet

To write the first 100 rows of parsed data (as parquet) to a file, invoke the following.

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.parquet --format parquet --rows 100

To write parsed data (as parquet) to a file with specific compression settings, invoke the following:

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.parquet --format parquet --compression zstd --compression-level 3

Column Selection

Select specific columns to include when converting or previewing data.

Step 1: View available columns

readstat metadata /some/dir/to/example.sas7bdat

Or as JSON for programmatic use with jq:

readstat metadata /some/dir/to/example.sas7bdat --as-json \
  | jq '.vars | to_entries[] | .value.var_name'

Or with Python:

readstat metadata /some/dir/to/example.sas7bdat --as-json \
  | python -c "
import json, sys
md = json.load(sys.stdin)
for v in md['vars'].values():
    print(v['var_name'])
"

Step 2: Select columns on the command line

readstat data /some/dir/to/example.sas7bdat --output out.parquet --format parquet --columns Brand,Model,EngineSize

Step 2 (alt): Select columns from a file

Create columns.txt:

# Columns to extract from the dataset
Brand
Model
EngineSize

Then pass it to the CLI:

readstat data /some/dir/to/example.sas7bdat --output out.parquet --format parquet --columns-file columns.txt

Preview with column selection

readstat preview /some/dir/to/example.sas7bdat --columns Brand,Model,EngineSize

Parallelism

The data subcommand includes parameters for both parallel reading and parallel writing:

Parallel Reading (--parallel)

If invoked, the reading of a sas7bdat occurs in parallel. If the total number of rows to process exceeds stream-rows (default 10,000 when unset), each chunk of rows is read in parallel. Note that the --parallel option uses all processors on the user’s machine; allowing the user to throttle this number may be considered in the future.

❗ Utilizing the --parallel parameter will increase memory usage β€” all chunks are read in parallel and collected in memory before being sent to the writer. In addition, because all processors are utilized, CPU usage may be maxed out during reading. Row ordering from the original sas7bdat is preserved.
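The chunked, order-preserving read described above can be sketched in Python. This is an illustration only, not the actual Rust implementation: the chunk size, worker pool, and `read_chunk` helper are made-up stand-ins for the reader's stream-rows chunking.

```python
# Sketch: read fixed-size row chunks concurrently while preserving
# the original row order (stand-in for the Rust --parallel reader).
from concurrent.futures import ThreadPoolExecutor

STREAM_ROWS = 4  # stands in for --stream-rows (the CLI default is 10,000)

def read_chunk(rows, start):
    """Simulate parsing one chunk of rows starting at `start`."""
    return start, rows[start:start + STREAM_ROWS]

def read_parallel(rows):
    starts = range(0, len(rows), STREAM_ROWS)
    with ThreadPoolExecutor() as pool:
        # map() yields results in submission order, so row order is
        # preserved even though chunks are processed concurrently.
        results = pool.map(lambda s: read_chunk(rows, s), starts)
    return [row for _, chunk in results for row in chunk]

rows = list(range(10))
assert read_parallel(rows) == rows
```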

Parallel Writing (--parallel-write)

When combined with --parallel, the --parallel-write flag enables parallel writing for Parquet format files. This can significantly improve write performance for large datasets by:

  • Writing record batches to temporary files in parallel using all available processors
  • Merging the temporary files into the final output
  • Using spooled temporary files that keep data in memory until a threshold is reached

Note: Parallel writing currently only supports the Parquet format. Other formats (CSV, Feather, NDJSON) will use optimized sequential writes with BufWriter.

Example usage:

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.parquet --format parquet --parallel --parallel-write
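The temp-file-then-merge strategy above can be sketched as follows. Plain text files stand in for the real temporary Parquet files, and `write_temp`/`parallel_write` are hypothetical names, not the project's API.

```python
# Sketch: write each batch to its own temp file in parallel, then
# merge the temp files into the final output in batch order.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_temp(indexed_batch):
    """Write one batch to its own temporary file; return (index, path)."""
    index, batch = indexed_batch
    fd, path = tempfile.mkstemp(suffix=f"_{index}.part")
    with os.fdopen(fd, "w") as f:
        f.write(batch)
    return index, path

def parallel_write(batches, output_path):
    # Step 1: write batches to temporary files in parallel.
    with ThreadPoolExecutor() as pool:
        parts = sorted(pool.map(write_temp, enumerate(batches)))
    # Step 2: merge in original batch order, regardless of which
    # temp file finished being written first.
    with open(output_path, "w") as out:
        for _, path in parts:
            with open(path) as part:
                out.write(part.read())
            os.remove(path)

out = os.path.join(tempfile.gettempdir(), "merged.out")
parallel_write(["batch-1\n", "batch-2\n", "batch-3\n"], out)
with open(out) as f:
    merged = f.read()
os.remove(out)
assert merged == "batch-1\nbatch-2\nbatch-3\n"
```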

Memory Buffer Size (--parallel-write-buffer-mb)

Controls the memory buffer size (in MB) before spilling to disk during parallel writes. Defaults to 100 MB. Valid range: 1-10240 MB.

Smaller buffers will cause data to spill to disk sooner, while larger buffers keep more data in memory. Choose based on your available memory and dataset size:

  • Small datasets (< 100 MB): Use default or larger buffer to keep everything in memory
  • Large datasets (> 1 GB): Consider smaller buffer (10-50 MB) to manage memory usage
  • Memory-constrained systems: Use smaller buffer (1-10 MB)

Example with custom buffer size:

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.parquet --format parquet --parallel --parallel-write --parallel-write-buffer-mb 200

❗ Parallel writing may write batches out of order. This is acceptable for Parquet files as the row order is preserved when merged.
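Python's `tempfile.SpooledTemporaryFile` behaves like the spooled temporary files described above and can illustrate the spill behavior. The 64 KB threshold here is illustrative (the CLI default is 100 MB), and `_rolled` is a CPython implementation detail used only to observe the spill.

```python
# Sketch: data stays in memory until the buffer threshold is crossed,
# then transparently spills to disk.
import tempfile

buf = tempfile.SpooledTemporaryFile(max_size=64 * 1024)  # 64 KB threshold
buf.write(b"x" * 1024)           # small write: data stays in memory
in_memory = not buf._rolled      # CPython internal, shown for illustration
buf.write(b"x" * (128 * 1024))   # crosses the threshold: spills to disk
spilled = buf._rolled
buf.close()
assert in_memory and spilled
```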

Memory Considerations

Default: Sequential Writes

In the default sequential write mode, a bounded channel (capacity 10) connects the reader thread to the writer. This means at most 10 chunks (each containing up to stream-rows rows) are held in memory at any time, providing natural backpressure when the writer is slower than the reader. For most workloads this keeps memory usage reasonable, but for very wide datasets (hundreds of columns, string-heavy) each chunk can be large β€” consider lowering --stream-rows if memory is a concern.

Sequential Write (default)
==========================

 Reader Thread                 Bounded Channel (cap 10)            Main Thread
+---------------------+       +------------------------+       +---------------------+
|                     |       |                        |       |                     |
| +-----------+       | send  | +--+--+--+--+--+--+   | recv  | +-------+           |
| | chunk  1  |-------|------>| |  |  |  |  |  |  |   |------>| | write |---> file   |
| +-----------+       |       | +--+--+--+--+--+--+   |       | +-------+           |
| +-----------+       | send  |    channel is full!    |       |                     |
| | chunk  2  |-------|------>| +--+--+--+--+--+--+--+|       | +-------+           |
| +-----------+       |       | |  |  |  |  |  |  |  ||       | | write |---> file   |
| +-----------+       |       | +--+--+--+--+--+--+--+|       | +-------+           |
| | chunk  3  |-------|-XXXXX |                        |       |                     |
| +-----------+       | BLOCK | writer drains a slot   |       | +-------+           |
|   ... waits ...     |       |    +--+--+--+--+--+--+ |       | | write |---> file   |
| | chunk  3  |-------|------>|    |  |  |  |  |  |  | |       | +-------+           |
| +-----------+       | ok!   |    +--+--+--+--+--+--+ |       |                     |
|                     |       |                        |       |                     |
+---------------------+       +------------------------+       +---------------------+

 Memory at any moment: <= 10 chunks in the channel + 1 being written
 Backpressure: reader blocks when channel is full
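The backpressure in the diagram above can be sketched with a bounded queue in Python; the chunk contents and counts are illustrative.

```python
# Sketch: a reader thread blocks on put() once the queue holds 10
# chunks, resuming only as the writer drains slots (backpressure).
import queue
import threading

channel = queue.Queue(maxsize=10)  # mirrors the cap-10 bounded channel
SENTINEL = None

def reader(chunks):
    for chunk in chunks:
        channel.put(chunk)  # blocks while the channel is full
    channel.put(SENTINEL)   # signal end-of-stream

written = []

def writer():
    while True:
        chunk = channel.get()
        if chunk is SENTINEL:
            break
        written.append(chunk)  # stands in for writing to the output file

chunks = list(range(25))
t_read = threading.Thread(target=reader, args=(chunks,))
t_write = threading.Thread(target=writer)
t_read.start(); t_write.start()
t_read.join(); t_write.join()
# FIFO queue: order preserved, never more than 10 chunks buffered.
assert written == chunks
```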

Parallel Writes (--parallel-write)

πŸ“ --parallel-write: Uses bounded-batch processing β€” batches are pulled from the channel in groups (up to 10 at a time), written in parallel to temporary Parquet files, then the next group is pulled. This preserves the channel’s backpressure so that memory usage stays bounded rather than loading the entire dataset at once. All temporary files are merged into the final output at the end.

Parallel Write (--parallel --parallel-write)
============================================

 Reader Thread              Bounded Channel (cap 10)              Main Thread
+------------------+       +------------------------+       +-------------------------+
|                  |       |                        |       |                         |
| +----------+     | send  |                        | recv  |  Pull <= 10 batches     |
| | chunk  1 |-----|------>|  +-+-+-+-+-+-+-+-+-+-+ |------>|  +----+----+----+----+  |
| +----------+     |       |  | | | | | | | | | | | |       |  | b1 | b2 | .. | bN |  |
| +----------+     | send  |  +-+-+-+-+-+-+-+-+-+-+ |       |  +----+----+----+----+  |
| | chunk  2 |-----|------>|                        |       |    |    |         |      |
| +----------+     |       +------------------------+       |    v    v         v      |
| +----------+     |                                        |  Write in parallel      |
| | chunk  3 |-----|----> ...                               |  to temp .parquet files |
| +----------+     |                                        |    |    |         |      |
|     ...          |                                        |    v    v         v      |
|                  |                                        |  tmp_0 tmp_1 ... tmp_N   |
|                  |       +------------------------+       |                         |
| +----------+     | send  |                        | recv  |  Pull next <= 10        |
| | chunk 11 |-----|------>|  +-+-+-+-+-+-+-+-+-+-+ |------>|  +----+----+----+----+  |
| +----------+     |       |  | | | | | | | | | | | |       |  |b11 |b12 | .. | bM |  |
| +----------+     | send  |  +-+-+-+-+-+-+-+-+-+-+ |       |  +----+----+----+----+  |
| | chunk 12 |-----|------>|                        |       |    |    |         |      |
| +----------+     |       +------------------------+       |    v    v         v      |
|     ...          |                                        |  tmp_N+1  ...  tmp_M     |
+------------------+                                        |                         |
                                                            |  ... repeat until done  |
                                                            +-------------------------+
                                                                       |
                              +----------------------------------------+
                              |
                              v
                    +-------------------+       +--------------------+
                    |   Merge all temp  |       |                    |
                    |   .parquet files  |------>|  final output.pqt  |
                    |   in order        |       |                    |
                    +-------------------+       +--------------------+

 Memory at any moment: <= 10 chunks in channel + 10 being written
 Backpressure: preserved -- reader blocks while a batch group is being written

SQL Queries (--sql)

⚠️ --sql (feature-gated): SQL queries require the full dataset to be materialized in memory via DataFusion’s MemTable before query execution. For large files this may result in significant memory usage. Queries that filter rows (e.g. SELECT ... WHERE ...) will reduce the output size but the input must still be fully loaded.

SQL Query Mode (--sql "SELECT ...")
===================================

 Reader Thread              Bounded Channel              Main Thread
+------------------+       +---------------+       +---------------------------+
|                  |       |               |       |                           |
| +----------+     | send  |               | recv  |  Collect ALL batches      |
| | chunk  1 |-----|------>|               |------>|  into memory (required    |
| +----------+     |       |               |       |  by DataFusion MemTable)  |
| +----------+     | send  |               |       |                           |
| | chunk  2 |-----|------>|               |------>|  +-----+-----+-----+     |
| +----------+     |       |               |       |  |  b1 |  b2 | ... |     |
|     ...          |       |               |       |  +-----+-----+-----+     |
| +----------+     | send  |               |       |         |                 |
| | chunk  N |-----|------>|               |------>|         v                 |
| +----------+     |       |               |       |  +-------------+         |
+------------------+       +---------------+       |  |  DataFusion |         |
                                                   |  |  SQL Engine |         |
                                                   |  +-------------+         |
                                                   |         |                 |
                                                   |         v                 |
                                                   |  Write filtered results  |
                                                   |  to output file          |
                                                   +---------------------------+

 Memory at peak: ALL chunks in memory (no backpressure)
 This is inherent to SQL execution over in-memory tables.
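As an analogy only (DataFusion's MemTable is a Rust construct), the memory profile resembles loading every row into an in-memory SQLite table before the query runs. The table name, columns, and data below are made up.

```python
# Analogy: full materialization into an in-memory table, then SQL.
# A WHERE clause shrinks the output, but every input row is loaded first.
import sqlite3

rows = [("Acura", 3.5), ("BMW", 2.5), ("Volvo", 2.9)]  # hypothetical data

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (make TEXT, engine_size REAL)")
con.executemany("INSERT INTO cars VALUES (?, ?)", rows)  # full materialization
result = con.execute(
    "SELECT make FROM cars WHERE engine_size > 2.8 ORDER BY make"
).fetchall()
con.close()
assert result == [("Acura",), ("Volvo",)]
```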

Reading Metadata from Output Files

When converting to Parquet or Feather, readstat-rs preserves column metadata (labels, SAS format strings, and storage widths) as Arrow field metadata. Schema-level metadata includes the table label when present.

The following metadata keys may appear on each field:

| Key | Description | Condition |
| --- | --- | --- |
| label | User-assigned variable label | Non-empty |
| sas_format | SAS format string (e.g. DATE9, BEST12, $30) | Non-empty |
| storage_width | Number of bytes used to store the variable | Always |
| display_width | Display width hint from the file | Non-zero |

Schema-level metadata:

| Key | Description | Condition |
| --- | --- | --- |
| table_label | User-assigned file label | Non-empty |

Reading metadata with Python (pyarrow)

import pyarrow.parquet as pq

schema = pq.read_schema("example.parquet")

# Table-level metadata
print(schema.metadata.get(b"table_label", b"").decode())

# Per-column metadata
for field in schema:
    meta = field.metadata or {}
    print(f"{field.name}:")
    print(f"  label:         {meta.get(b'label', b'').decode()}")
    print(f"  sas_format:    {meta.get(b'sas_format', b'').decode()}")
    print(f"  storage_width: {meta.get(b'storage_width', b'').decode()}")
    print(f"  display_width: {meta.get(b'display_width', b'').decode()}")

Reading metadata with R (arrow)

library(arrow)

schema <- read_parquet("example.parquet", as_data_frame = FALSE)$schema

# Per-column metadata
for (field in schema) {
  cat(field$name, "\n")
  cat("  label:        ", field$metadata$label, "\n")
  cat("  sas_format:   ", field$metadata$sas_format, "\n")
  cat("  storage_width:", field$metadata$storage_width, "\n")
  cat("  display_width:", field$metadata$display_width, "\n")
}

Reader

The preview and data subcommands include a parameter for --reader. The possible values for --reader include the following.

  • mem β†’ Parse and read the entire sas7bdat into memory before writing to either standard out or a file
  • stream (default) β†’ Parse and read at most stream-rows into memory before writing to disk
    • stream-rows may be set via the command line parameter --stream-rows or if elided will default to 10,000 rows

Why is this useful?

  • mem is useful for testing purposes
  • stream is useful for keeping memory usage low for large datasets (and hence is the default)
  • In general, users should not need to deviate from the default β€” stream β€” unless they have a specific need
  • In addition, because these options are exposed as command-line parameters, hyperfine may be used to benchmark across an assortment of file sizes
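The difference between the two modes can be sketched with plain Python iterables. This illustrates the memory behavior only; `read_mem` and `read_stream` are made-up names, not the actual reader.

```python
# Sketch: `mem` collects everything at once; `stream` holds at most
# stream_rows rows in memory at a time.
def read_mem(rows):
    return list(rows)  # entire dataset in memory at once

def read_stream(rows, stream_rows=10_000):
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == stream_rows:
            yield batch  # hand off at most stream_rows rows
            batch = []
    if batch:
        yield batch      # final partial batch

batches = list(read_stream(range(25), stream_rows=10))
assert [len(b) for b in batches] == [10, 10, 5]
assert read_mem(range(5)) == [0, 1, 2, 3, 4]
```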

Debug

Debug information is printed to standard out by setting the environment variable RUST_LOG=debug before the call to readstat.

⚠️ This is quite verbose! If using the preview or data subcommand, debug information is written for every single value!

# Linux and macOS
RUST_LOG=debug readstat ...
# Windows PowerShell
$env:RUST_LOG="debug"; readstat ...

Help

For full details run with --help.

readstat --help
readstat metadata --help
readstat preview --help
readstat data --help

Architecture

Rust CLI tool and library that reads SAS binary files (.sas7bdat) and converts them to other formats (CSV, Feather, NDJSON, Parquet). Uses FFI bindings to the ReadStat C library for parsing, and Apache Arrow for in-memory representation and output.

Scope: The readstat-sys crate exposes the full ReadStat C API, which supports SAS (.sas7bdat, .xpt), SPSS (.sav, .zsav, .por), and Stata (.dta). However, the readstat, readstat-cli, and readstat-wasm crates currently only implement parsing and conversion for SAS .sas7bdat files.

Workspace Layout

readstat-rs/
β”œβ”€β”€ Cargo.toml              # Workspace root (edition 2024, resolver 2)
β”œβ”€β”€ crates/
β”‚   β”œβ”€β”€ readstat/            # Library crate (parse SAS β†’ Arrow, optional format writers)
β”‚   β”œβ”€β”€ readstat-cli/        # Binary crate (CLI arg parsing, orchestration)
β”‚   β”œβ”€β”€ readstat-sys/        # FFI bindings to ReadStat C library (bindgen)
β”‚   β”œβ”€β”€ readstat-iconv-sys/   # FFI bindings to iconv (Windows only)
β”‚   β”œβ”€β”€ readstat-tests/      # Integration test suite
β”‚   └── readstat-wasm/       # WebAssembly build (excluded from workspace)
β”œβ”€β”€ examples/
β”‚   β”œβ”€β”€ cli-demo/            # CLI conversion demo
β”‚   β”œβ”€β”€ api-demo/            # REST API servers (Rust + Python)
β”‚   β”œβ”€β”€ bun-demo/            # WASM usage from Bun/JS
β”‚   β”œβ”€β”€ web-demo/            # Browser-based viewer and converter
β”‚   └── sql-explorer/        # Browser-based SQL explorer (AlaSQL + WASM)
└── docs/

Crate Details

readstat (v0.20.0) β€” Library Crate

Path: crates/readstat/

Pure library for parsing SAS binary files into Arrow RecordBatch format. Output format writers (CSV, Feather, NDJSON, Parquet) are feature-gated.

Features: csv, feather, ndjson, parquet (all enabled by default), sql.

Key source modules in crates/readstat/src/:

| Module | Purpose |
| --- | --- |
| lib.rs | Public API exports |
| cb.rs | C callback functions for ReadStat (handle_metadata, handle_variable, handle_value) |
| rs_data.rs | Data reading, Arrow RecordBatch conversion |
| rs_metadata.rs | Metadata extraction, Arrow schema building |
| rs_parser.rs | ReadStatParser wrapper around C parser |
| rs_path.rs | Input path validation |
| rs_write_config.rs | Output configuration (path, format, compression) |
| rs_var.rs | Variable types and value handling |
| rs_write.rs | Output writers (CSV, Feather, NDJSON, Parquet) |
| progress.rs | ProgressCallback trait for parsing progress reporting |
| rs_query.rs | SQL query execution via DataFusion (feature-gated) |
| formats.rs | SAS format detection (118 date/time/datetime formats, regex-based) |
| err.rs | Error enum (41 variants mapping to C library errors) |
| common.rs | Utility functions |
| rs_buffer_io.rs | Buffer I/O operations |

Key public types:

  • ReadStatData β€” coordinates FFI parsing, accumulates values directly into typed Arrow builders, produces Arrow RecordBatch
  • ReadStatMetadata β€” file-level metadata (row/var counts, encoding, compression, schema)
  • ColumnBuilder β€” enum wrapping 12 typed Arrow builders (StringBuilder, Float64Builder, Date32Builder, etc.); values are appended during FFI callbacks with zero intermediate allocation
  • ReadStatWriter β€” writes output in requested format
  • ReadStatPath β€” validated input file path
  • WriteConfig β€” output configuration (path, format, compression)
  • OutFormat β€” output format enum (Csv, Feather, Ndjson, Parquet)
  • ProgressCallback β€” trait for receiving progress updates during parsing

Major dependencies: Arrow v57 ecosystem, Parquet (5 compression codecs, optional), Rayon, chrono, memmap2.

readstat-cli (v0.20.0) β€” CLI Binary

Path: crates/readstat-cli/

Binary crate producing the readstat CLI tool. Uses clap with three subcommands:

  • metadata β€” print file metadata (row/var counts, labels, encoding, etc.)
  • preview β€” preview first N rows
  • data β€” convert to output format (csv, feather, ndjson, parquet)

Owns CLI arg parsing, progress bars, colored output, and reader-writer thread orchestration.

Additional dependencies: clap v4, colored, indicatif, crossbeam, env_logger, path_abs.

readstat-sys (v0.3.0) β€” FFI Bindings

Path: crates/readstat-sys/

build.rs compiles ~49 C source files from the vendor/ReadStat/ git submodule via the cc crate, then generates Rust bindings with bindgen. Exposes the full ReadStat API, including support for SAS, SPSS, and Stata formats. Platform-specific linking for iconv and zlib:

| Platform | iconv | zlib | Notes |
| --- | --- | --- | --- |
| Windows (windows-msvc) | Static (compiled from vendored readstat-iconv-sys submodule) | Static (compiled via libz-sys crate) | readstat-iconv-sys is a cfg(windows) dependency; needs LIBCLANG_PATH |
| macOS (apple-darwin) | Dynamic (system libiconv) | libz-sys (uses system zlib) | iconv linked via cargo:rustc-link-lib=iconv |
| Linux (gnu/musl) | Dynamic (system library) | libz-sys (prefers system, falls back to source) | No explicit iconv link directives; system linker resolves automatically |

Header include paths are propagated between crates using Cargo’s links key:

  • readstat-iconv-sys sets cargo:include=... which becomes DEP_ICONV_INCLUDE in readstat-sys
  • libz-sys sets cargo:include=... which becomes DEP_Z_INCLUDE in readstat-sys
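The consuming side of this mechanism can be sketched as a small helper that a dependent build script might use. The DEP_ICONV_INCLUDE and DEP_Z_INCLUDE names come from the bullets above; the helper itself and its injected lookup are illustrative (a real build.rs would simply call std::env::var):

```rust
/// Collect `-I` include flags from Cargo-propagated `DEP_*_INCLUDE`
/// variables. Cargo sets these because the upstream crates declare a
/// `links` key (`links = "iconv"` -> DEP_ICONV_INCLUDE, `links = "z"`
/// -> DEP_Z_INCLUDE) and emit `cargo:include=...` from their build
/// scripts. The lookup is injected here so the logic can be exercised
/// without an actual Cargo build; a real build.rs would pass
/// `|key| std::env::var(key).ok()`.
fn include_flags(lookup: impl Fn(&str) -> Option<String>) -> Vec<String> {
    ["DEP_ICONV_INCLUDE", "DEP_Z_INCLUDE"]
        .into_iter()
        .filter_map(|key| lookup(key))
        .map(|dir| format!("-I{dir}"))
        .collect()
}
```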

readstat-iconv-sys (v0.3.0) β€” iconv FFI (Windows)

Path: crates/readstat-iconv-sys/

Windows-only (#[cfg(windows)]). Compiles libiconv from the vendor/libiconv-win-build/ git submodule using the cc crate, producing a static library. On non-Windows platforms the build script is a no-op. The links = "iconv" key in Cargo.toml allows readstat-sys to discover the include path via the DEP_ICONV_INCLUDE environment variable.

readstat-wasm (v0.1.0) β€” WebAssembly Build

Path: crates/readstat-wasm/

WebAssembly build of the readstat library for parsing SAS .sas7bdat files in JavaScript. Compiles the ReadStat C library and the Rust readstat library to WebAssembly via the wasm32-unknown-emscripten target. Excluded from the Cargo workspace (built separately with Emscripten).

Exports: read_metadata, read_metadata_fast, read_data (CSV), read_data_ndjson, read_data_parquet, read_data_feather, free_string, free_binary. Not published to crates.io (publish = false).

readstat-tests β€” Integration Tests

Path: crates/readstat-tests/

29 test modules covering: all SAS data types, 118 date/time/datetime formats, missing values, malformed UTF-8, large pages, CLI subcommands, parallel read/write, Parquet output, CSV output, Arrow migration, row offsets, scientific notation, column selection, skip row count, memory-mapped file reading, byte-slice reading, and SQL queries. Every sas7bdat file in the test data directory has both metadata and data reading tests.

Test data lives in tests/data/*.sas7bdat (14 datasets). SAS scripts to regenerate test data are in util/.

| Dataset | Metadata Test | Data Test |
|---|---|---|
| all_dates.sas7bdat | βœ… | βœ… |
| all_datetimes.sas7bdat | βœ… | βœ… |
| all_times.sas7bdat | βœ… | βœ… |
| all_types.sas7bdat | βœ… | βœ… |
| cars.sas7bdat | βœ… | βœ… |
| hasmissing.sas7bdat | βœ… | βœ… |
| intel.sas7bdat | βœ… | βœ… |
| malformed_utf8.sas7bdat | βœ… | βœ… |
| messydata.sas7bdat | βœ… | βœ… |
| rand_ds_largepage_err.sas7bdat | βœ… | βœ… |
| rand_ds_largepage_ok.sas7bdat | βœ… | βœ… |
| scientific_notation.sas7bdat | βœ… | βœ… |
| somedata.sas7bdat | βœ… | βœ… |
| somemiss.sas7bdat | βœ… | βœ… |

Build Prerequisites

  • Rust (edition 2024)
  • libclang (for bindgen)
  • Git submodules must be initialized (git submodule update --init --recursive)
  • On Windows: MSVC toolchain

Key Architectural Patterns

  • FFI callback pattern: ReadStat C library calls Rust callbacks (cb.rs) during parsing; data accumulates in ReadStatData via raw pointer casts
  • Streaming: default reader streams rows in chunks (10k) to manage memory
  • Parallel processing: Rayon for parallel reading, Crossbeam channels for reader-writer coordination
  • Column filtering: optional --columns / --columns-file flags restrict parsing to selected variables; unselected values are skipped in the handle_value callback while row-boundary detection uses the original (unfiltered) variable count
  • Arrow pipeline: SAS data β†’ typed Arrow builders (direct append in FFI callbacks) β†’ Arrow RecordBatch β†’ output format
  • Multiple I/O strategies: file path (default), memory-mapped files (memmap2), and in-memory byte slices β€” all feed into the same FFI parsing pipeline
  • Metadata preservation: SAS variable labels, format strings, and storage widths are persisted as Arrow field metadata, surviving round-trips through Parquet and Feather. See TECHNICAL.md for details.
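The column-filtering pattern above can be sketched as follows. The function name, tuple-based input, and f64-only values are simplifications for illustration; the real handle_value callback in cb.rs appends into typed Arrow builders rather than plain vectors:

```rust
use std::collections::HashSet;

/// Simplified sketch of `handle_value`-style column filtering:
/// unselected columns are skipped, while row boundaries are still
/// detected against the original (unfiltered) variable count.
fn filter_rows(
    values: &[(usize, f64)],   // (column index, value) in parse order
    total_vars: usize,         // unfiltered variable count per row
    selected: &HashSet<usize>, // columns requested via --columns
) -> Vec<Vec<f64>> {
    let mut rows = Vec::new();
    let mut current = Vec::new();
    for &(col, value) in values {
        if selected.contains(&col) {
            current.push(value); // in the real code: append to a builder
        }
        // Row boundary: the last variable of the row, selected or not.
        if col == total_vars - 1 {
            rows.push(std::mem::take(&mut current));
        }
    }
    rows
}
```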

Technical Details

Floating Point Values

⚠️ Decimal values are rounded to contain only 14 decimal digits!

For example, the number 1.1234567890123456 created within SAS would be returned as 1.12345678901235 within Rust.

Why does this happen? It is not an implementation error: rounding to 14 decimal digits is deliberate in the Rust code.

As a specific example, when testing with the cars.sas7bdat dataset (which was created originally on Windows), the numeric value 4.6 as observed within SAS was being returned as 4.600000000000001 (15 digits) within Rust. Values created on Windows with an x64 processor are only accurate to 15 digits.

For comparison, the ReadStat binary truncates to 14 decimal places when writing to csv.

Finally, SAS stores all numeric values in floating-point representation, which creates a challenge for any parser handling numerics!

Implementation: pure-arithmetic rounding

Rounding is performed using pure f64 arithmetic in cb.rs, avoiding any string formatting or heap allocation:

const ROUND_SCALE: f64 = 1e14;

fn round_decimal_f64(v: f64) -> f64 {
    if !v.is_finite() {
        return v;
    }
    let int_part = v.trunc();
    let frac_part = v.fract();
    let rounded_frac = (frac_part * ROUND_SCALE).round() / ROUND_SCALE;
    int_part + rounded_frac
}

The value is split into integer and fractional parts before scaling. This is necessary because large SAS datetime values (~1.9e9) multiplied directly by 1e14 would exceed f64’s exact integer range (2^53), causing precision loss. Since fract() is always in (-1, 1), fract() * 1e14 < 1e14 < 2^53, keeping the scaled value within the exact-integer range.

Why this is equivalent to the previous string roundtrip (format!("{:.14}") + lexical::parse): both approaches produce the nearest representable f64 to the value rounded to 14 decimal places. The tie-breaking rule (half-away-from-zero for .round() vs half-to-even for format!) is never exercised because every f64 is a dyadic rational (m / 2^k), and a true decimal midpoint would require an odd factor of 5 in the denominator β€” which is impossible for any f64 value.
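The range argument above can be checked numerically. The function is reproduced from cb.rs as shown earlier; the MAX_EXACT_INT constant (2^53) is named here only for illustration:

```rust
// Sanity checks for the rounding scheme described above.
const ROUND_SCALE: f64 = 1e14;
const MAX_EXACT_INT: f64 = 9_007_199_254_740_992.0; // 2^53

fn round_decimal_f64(v: f64) -> f64 {
    if !v.is_finite() {
        return v;
    }
    let int_part = v.trunc();
    let frac_part = v.fract();
    int_part + (frac_part * ROUND_SCALE).round() / ROUND_SCALE
}

// A large SAS datetime scaled directly by 1e14 would leave f64's
// exact-integer range, but its fractional part never can, since
// |fract()| < 1 implies |fract() * 1e14| < 1e14 < 2^53.
fn fract_scales_exactly(v: f64) -> bool {
    (v.fract() * ROUND_SCALE).abs() < MAX_EXACT_INT
}
```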

Date, Time, and Datetimes

All 118 SAS date, time, and datetime formats are recognized and parsed appropriately. For the full list of supported formats, see sas_date_time_formats.md.

⚠️ If the format does not match a recognized SAS date, time, or datetime format, or if the value does not have a format applied, then the value will be parsed and read as a numeric value!

Details

SAS stores dates, times, and datetimes internally as numeric values. To distinguish among dates, times, datetimes, or plain numeric values, a SAS format is read from the variable metadata. If the format matches a recognized SAS date, time, or datetime format, then the numeric value is converted and read into memory as a corresponding Arrow date, time, or datetime type.

If values are read into memory as Arrow date, time, or datetime types, then when they are written β€” from an Arrow RecordBatch to csv, feather, ndjson, or parquet β€” they are treated as dates, times, or datetimes and not as numeric values.
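Conceptually the dispatch looks like the following sketch. The format names and Arrow-type tags here are a small illustrative subset, not the crate's actual lookup tables (which cover all 118 formats and map to concrete Arrow builders such as Date32Builder):

```rust
/// Illustrative Arrow-type tags; the crate uses concrete Arrow
/// builders rather than an enum like this.
#[derive(Debug, PartialEq)]
enum ArrowKind {
    Date32,
    Time32,
    Timestamp,
    Float64,
}

/// Classify a SAS variable by its format string; a few representative
/// formats stand in for the full recognized set.
fn classify(sas_format: &str) -> ArrowKind {
    // Strip trailing width/decimal digits, e.g. "DATE9" -> "DATE".
    let name: String = sas_format
        .chars()
        .take_while(|c| c.is_ascii_alphabetic())
        .collect();
    match name.as_str() {
        "DATE" | "MMDDYY" | "YYMMDD" => ArrowKind::Date32,
        "TIME" | "HHMM" => ArrowKind::Time32,
        "DATETIME" => ArrowKind::Timestamp,
        // Unrecognized or absent format: read as plain numeric.
        _ => ArrowKind::Float64,
    }
}
```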

Column Metadata in Arrow and Parquet

When converting to Parquet or Feather, readstat-rs persists column-level and table-level metadata into the Arrow schema. This metadata survives round-trips through Parquet and Feather files, allowing downstream consumers to recover SAS-specific information.

Metadata keys

Field (column) metadata

| Key | Type | Description | Source formats |
|---|---|---|---|
| label | string | User-assigned variable label | SAS, SPSS, Stata |
| sas_format | string | SAS format string (e.g. DATE9, BEST12, $30) | SAS |
| storage_width | integer (as string) | Number of bytes used to store the variable value | All |
| display_width | integer (as string) | Display width hint from the file | XPORT, SPSS |

Schema (table) metadata

| Key | Type | Description |
|---|---|---|
| table_label | string | User-assigned file label |

Storage width semantics

  • SAS numeric variables: always 8 bytes (IEEE 754 double-precision)
  • SAS string variables: equal to the declared character length (e.g. $30 β†’ 30 bytes)
  • The storage_width field is always present in metadata

Display width semantics

  • sas7bdat files: typically 0 (not stored in the format)
  • XPORT files: populated from the format width
  • SPSS files: populated from the variable’s print/write format
  • The display_width field is only present in metadata when non-zero
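Assembling the field metadata per these rules can be sketched as follows. The key names match the table above; the function itself and its parameters are illustrative, not the crate's actual API:

```rust
use std::collections::BTreeMap;

/// Sketch of building per-field metadata following the semantics
/// above: storage_width is always present, display_width only when
/// non-zero, and label/sas_format only when the file provides them.
fn field_metadata(
    label: Option<&str>,
    sas_format: Option<&str>,
    storage_width: u32,
    display_width: u32,
) -> BTreeMap<String, String> {
    let mut md = BTreeMap::new();
    if let Some(l) = label {
        md.insert("label".to_string(), l.to_string());
    }
    if let Some(f) = sas_format {
        md.insert("sas_format".to_string(), f.to_string());
    }
    // Always present.
    md.insert("storage_width".to_string(), storage_width.to_string());
    // Only present when non-zero (sas7bdat files typically store 0).
    if display_width != 0 {
        md.insert("display_width".to_string(), display_width.to_string());
    }
    md
}
```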

SAS format strings and Arrow types

The SAS format string (e.g. DATE9, DATETIME22.3, TIME8) determines how a numeric variable is mapped to an Arrow type. The original format string is preserved in the sas_format metadata key, allowing downstream tools to reconstruct the original SAS formatting even after conversion.

For the full list of recognized SAS date, time, and datetime formats, see sas_date_time_formats.md.

Reading metadata from output files

See the Reading Metadata from Output Files section in the Usage guide for Python and R examples.

Testing

To run unit and integration tests across the workspace:

cargo test --workspace

To run only integration tests:

cargo test -p readstat-tests

Datasets

Formally tested (via integration tests) against the following datasets. See the README.md for data sources.

  • ahs2019n.sas7bdat β†’ US Census data (download via download_ahs.sh or download_ahs.ps1)
  • all_dates.sas7bdat β†’ SAS dataset containing all possible date formats
  • all_datetimes.sas7bdat β†’ SAS dataset containing all possible datetime formats
  • all_times.sas7bdat β†’ SAS dataset containing all possible time formats
  • all_types.sas7bdat β†’ SAS dataset containing all SAS types
  • cars.sas7bdat β†’ SAS cars dataset
  • hasmissing.sas7bdat β†’ SAS dataset containing missing values
  • intel.sas7bdat
  • malformed_utf8.sas7bdat β†’ SAS dataset with truncated multi-byte UTF-8 characters (issue #78)
  • messydata.sas7bdat
  • rand_ds_largepage_err.sas7bdat β†’ Created using create_rand_ds.sas with BUFSIZE set to 2M
  • rand_ds_largepage_ok.sas7bdat β†’ Created using create_rand_ds.sas with BUFSIZE set to 1M
  • scientific_notation.sas7bdat β†’ Used to test float parsing
  • somedata.sas7bdat β†’ Used to test Parquet label preservation
  • somemiss.sas7bdat

Valgrind

To check for memory leaks, valgrind may be used. For example, to verify that the test parse_file_metadata_test is free of leaks, run the following from within the readstat directory.

valgrind ./target/debug/deps/parse_file_metadata_test-<hash>

Memory Safety

This project contains unsafe Rust code (FFI callbacks, pointer casts, memory-mapped I/O) and links against the vendored ReadStat C library. Four automated CI checks guard against memory errors.

CI Jobs

All four jobs run on every workflow dispatch and tag push, in parallel with the build jobs. Any memory error fails the job with a nonzero exit code.

Miri (Rust undefined behavior)

  • Platform: Ubuntu (Linux)
  • Scope: Unit tests in the readstat crate only (cargo miri test -p readstat)
  • What it catches: Undefined behavior in pure-Rust unsafe code β€” invalid pointer arithmetic, uninitialized reads, provenance violations, use-after-free in Rust allocations
  • Limitation: Cannot execute FFI calls into C code, so integration tests (readstat-tests) are excluded

Configuration:

  • Uses Rust nightly with the miri component
  • MIRIFLAGS="-Zmiri-disable-isolation" allows tests that use tempfile to create directories

AddressSanitizer β€” Linux

  • Platform: Ubuntu (Linux)
  • Scope: Full workspace β€” lib tests, integration tests, binary tests (cargo test --workspace --lib --tests --bins)
  • What it catches: Heap/stack buffer overflows, use-after-free, double-free, memory leaks (LeakSanitizer is enabled by default on Linux), across both Rust and C code

Configuration:

  • RUSTFLAGS="-Zsanitizer=address -Clinker=clang" β€” instruments Rust code and links the ASan runtime via clang
  • READSTAT_SANITIZE_ADDRESS=1 β€” triggers readstat-sys/build.rs to compile the ReadStat C library with -fsanitize=address -fno-omit-frame-pointer
  • Doctests are excluded (--lib --tests --bins) because rustdoc does not properly inherit sanitizer linker flags

AddressSanitizer β€” macOS

  • Platform: macOS (arm64)
  • Scope: Full workspace β€” lib tests, integration tests, binary tests
  • What it catches: Buffer overflows, use-after-free, double-free in Rust code and at the FFI boundary

Configuration:

  • RUSTFLAGS="-Zsanitizer=address" β€” instruments Rust code only
  • The ReadStat C library is not instrumented on macOS because Apple Clang and Rust’s LLVM have incompatible ASan runtimes β€” see ASan Runtime Mismatch below
  • LeakSanitizer is not supported on macOS
  • Doctests excluded for the same reason as Linux

AddressSanitizer β€” Windows

  • Platform: Windows (x86_64, MSVC toolchain)
  • Scope: Full workspace β€” lib tests, integration tests, binary tests
  • What it catches: Buffer overflows, use-after-free, double-free in Rust code and at the FFI boundary

Configuration:

  • RUSTFLAGS="-Zsanitizer=address" β€” instruments Rust code only
  • Rust on Windows MSVC uses Microsoft’s ASan runtime (from Visual Studio), not LLVM’s compiler-rt. The compiler passes /INFERASANLIBS to the MSVC linker, which auto-discovers the runtime import library at link time. See PR #118521.
  • Important: the MSVC ASan runtime DLL (clang_rt.asan_dynamic-x86_64.dll) is NOT on PATH by default. The linker finds the import library at build time via /INFERASANLIBS, but the DLL loader needs the DLL on PATH at test runtime. The CI job uses vswhere.exe to locate the DLL directory (e.g., C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\<ver>\bin\Hostx64\x64\) and prepends it to PATH.
  • LLVM is installed only for libclang (required by bindgen), pinned to the same version as the regular Windows build job. It is not used for the ASan runtime.
  • The ReadStat C library is not instrumented on Windows currently. Unlike macOS, there is no runtime mismatch β€” both Rust and cl.exe use the same MSVC ASan runtime. Full C instrumentation is a future improvement (see Future Work).
  • LeakSanitizer is not supported on Windows
  • Doctests excluded for the same reason as Linux

How READSTAT_SANITIZE_ADDRESS Works

The readstat-sys/build.rs build script checks for the READSTAT_SANITIZE_ADDRESS environment variable. When set, it adds sanitizer flags to the C compiler flags for the ReadStat library only. This is intentionally scoped β€” a global CFLAGS would instrument third-party sys crates (e.g., zstd-sys) causing linker failures.

The flags are platform-specific:

  • Linux/macOS: -fsanitize=address -fno-omit-frame-pointer (GCC/Clang syntax)
  • Windows MSVC: /fsanitize=address (MSVC syntax)

Currently only the Linux CI job sets READSTAT_SANITIZE_ADDRESS=1 because it is the only platform where C instrumentation has been validated.
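The platform split can be sketched as a small helper. The flags come directly from the bullets above; the function name and keying off a target triple string are illustrative (the actual logic lives in readstat-sys/build.rs and only applies when READSTAT_SANITIZE_ADDRESS is set):

```rust
/// Illustrative selection of ASan C compiler flags per platform.
fn asan_cflags(target: &str) -> Vec<&'static str> {
    if target.contains("windows-msvc") {
        // MSVC flag syntax
        vec!["/fsanitize=address"]
    } else {
        // GCC/Clang flag syntax
        vec!["-fsanitize=address", "-fno-omit-frame-pointer"]
    }
}
```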

ASan Runtime Mismatch (macOS)

macOS has an ASan runtime mismatch that prevents instrumenting the C code alongside Rust. Apple Clang is a fork of LLVM with its own ASan runtime versioning. When both Rust and the C library are instrumented, the linker sees two incompatible ASan runtimes and fails with ___asan_version_mismatch_check_apple_clang_* vs ___asan_version_mismatch_check_v8. A potential workaround is to install upstream LLVM via Homebrew (brew install llvm) and set CC=/opt/homebrew/opt/llvm/bin/clang so both the C code and Rust use the same LLVM ASan runtime. However, this is fragile β€” the Homebrew LLVM version must stay close to the LLVM version used by Rust nightly, which changes frequently.

Windows does NOT have this problem. Rust on x86_64-pc-windows-msvc uses Microsoft’s ASan runtime (PR #118521), and so does cl.exe /fsanitize=address. Both link the same clang_rt.asan_dynamic-x86_64.dll from Visual Studio. Full C + Rust ASan instrumentation is theoretically possible on Windows β€” see Future Work.

Bottom line: Linux has full C + Rust ASan coverage. macOS provides Rust-only coverage due to the Apple Clang runtime mismatch. Windows provides Rust-only coverage currently, but full coverage is a future improvement since there is no runtime mismatch.

Future Work: Windows C Instrumentation

Since Rust and MSVC share the same ASan runtime on Windows, enabling READSTAT_SANITIZE_ADDRESS=1 in the Windows CI job should allow full C + Rust instrumentation β€” matching Linux’s coverage. This requires:

  1. Setting READSTAT_SANITIZE_ADDRESS=1 so readstat-sys/build.rs adds /fsanitize=address when compiling the ReadStat C library
  2. Verifying there are no linker conflicts (if conflicts arise, the unstable -Zexternal-clangrt flag can tell Rust to skip linking its own runtime copy)
  3. Ensuring the MSVC ASan runtime DLL is on PATH at test time (the CI job already does this via vswhere.exe)

Running Locally

Miri

rustup +nightly component add miri
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test -p readstat

ASan on Linux

RUSTFLAGS="-Zsanitizer=address -Clinker=clang" \
READSTAT_SANITIZE_ADDRESS=1 \
cargo +nightly test --workspace --lib --tests --bins --target x86_64-unknown-linux-gnu

ASan on macOS

RUSTFLAGS="-Zsanitizer=address" \
cargo +nightly test --workspace --lib --tests --bins --target aarch64-apple-darwin

ASan on Windows

$env:RUSTFLAGS = "-Zsanitizer=address"
# The MSVC ASAN runtime DLL must be on PATH. Find it via vswhere:
$vsPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -latest -property installationPath
$msvcVer = (Get-ChildItem "$vsPath\VC\Tools\MSVC" | Sort-Object Name -Descending | Select-Object -First 1).Name
$env:PATH = "$vsPath\VC\Tools\MSVC\$msvcVer\bin\Hostx64\x64;$env:PATH"
cargo +nightly test --workspace --lib --tests --bins --target x86_64-pc-windows-msvc

Valgrind (Linux)

For manual checks with full C library coverage, valgrind can also be used against debug test binaries:

cargo test -p readstat-tests --no-run
valgrind ./target/debug/deps/parse_file_metadata_test-<hash>

Coverage Summary

| Tool | Platform | Rust code | C code (ReadStat) | Leak detection |
|---|---|---|---|---|
| Miri | Linux | Unit tests only | No (FFI excluded) | No |
| ASan | Linux | Full workspace | Yes (instrumented) | Yes |
| ASan | macOS | Full workspace | No (runtime mismatch) | No |
| ASan | Windows | Full workspace | Not yet (no mismatch β€” see future work) | No |
| Valgrind | Linux (manual) | Full | Full | Yes |

Performance Benchmarking with Criterion

Overview

This document provides a comprehensive guide to performance benchmarking in readstat-rs using Criterion.rs.

Quick Start

# Run all benchmarks
cd crates/readstat
cargo bench

# View HTML reports
open target/criterion/report/index.html

What Gets Benchmarked

1. Reading Performance

  • Metadata Reading (~300-950 Β΅s) - File header parsing
  • Single Chunk Reading - Full dataset read performance
  • Chunked Reading - Streaming with different chunk sizes (1K, 5K, 10K rows)
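Chunked reading splits the total row count into fixed-size ranges; the chunk arithmetic can be sketched as follows (the function name is illustrative, but the default chunk size of 10,000 rows matches the streaming reader described elsewhere in this book):

```rust
/// Split `total_rows` into (offset, len) chunks of at most
/// `chunk_size` rows, as the streaming reader does conceptually.
fn chunk_bounds(total_rows: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < total_rows {
        // The final chunk may be shorter than chunk_size.
        let len = chunk_size.min(total_rows - offset);
        chunks.push((offset, len));
        offset += len;
    }
    chunks
}
```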

2. Data Conversion

  • Arrow Conversion - SAS types β†’ Arrow RecordBatch overhead

3. Writing Performance

  • CSV Writing - Text format output
  • Parquet Compression - Uncompressed, Snappy, Zstd comparison
  • Format Comparison - CSV vs Parquet vs Feather vs NDJSON

4. Parallel Write Optimization

  • Buffer Sizes - SpooledTempFile memory thresholds (1MB, 10MB, 100MB, 500MB)

5. End-to-End Pipeline

  • Complete Conversion - Read + Write combined (most important)

Sample Results

From initial benchmark run (example output):

metadata_reading/all_types.sas7bdat
                        time:   [299.41 Β΅s 301.84 Β΅s 304.29 Β΅s]

metadata_reading/cars.sas7bdat
                        time:   [935.21 Β΅s 943.52 Β΅s 952.41 Β΅s]

read_single_chunk/cars.sas7bdat
                        time:   [~2-3 ms]
                        thrpt:  [~150-200K rows/sec]

write_parquet_compression/snappy
                        time:   [~4-6 ms]
                        thrpt:  [~70-100K rows/sec]

end_to_end_conversion/parquet
                        time:   [~6-9 ms]
                        thrpt:  [~50-70K rows/sec]

Interpreting Results

Understanding the Output

Time Measurement:

time: [299.41 Β΅s 301.84 Β΅s 304.29 Β΅s]
       ^         ^         ^
       |         |         +-- Upper bound (95% confidence)
       |         +------------ Median
       +---------------------- Lower bound (95% confidence)

Throughput:

thrpt: [150K elem/s 175K elem/s 200K elem/s]
        ^           ^           ^
        |           |           +-- Upper bound
        |           +-------------- Median
        +-------------------------- Lower bound

Change Detection:

change: [-2.3456% -1.2345% +0.1234%] (p = 0.12 > 0.05)
         ^         ^         ^        ^
         |         |         |        +-- Statistical significance
         |         |         +----------- Upper bound of change
         |         +--------------------- Median change
         +------------------------------- Lower bound of change

What to Look For

πŸ”΄ Red Flags (Investigate)

  • High variance (>10%) - Results unreliable
  • Significant regression (>5% slower, p < 0.05)
  • Outliers (>5% of samples)

🟑 Opportunities

  • Chunked reading - Test if different chunk size improves throughput
  • Buffer sizes - If small buffer performs as well as large, save memory
  • Compression - If uncompressed only slightly faster, use compression

🟒 Validation

  • Low variance (<5%) - Reliable results
  • Improvements (>10% faster, p < 0.05)
  • Expected patterns (e.g., compression should be slower but smaller)

Performance Optimization Workflow

Step 1: Establish Baseline

# Save current performance as baseline
cargo bench -- --save-baseline main

# Results saved to target/criterion/{benchmark}/main/

Step 2: Make Changes

Edit code with optimization hypothesis:

  • Increase buffer size
  • Change algorithm
  • Add caching
  • Parallel processing

Step 3: Measure Impact

# Compare against baseline
cargo bench -- --baseline main

# Look for "change: [X% Y% Z%]" in output

Step 4: Analyze & Iterate

If improved (>10% faster, p < 0.05):

  • βœ… Keep the change
  • βœ… Update the baseline: cargo bench -- --save-baseline main

If no change (<5%):

  • ⚠️ Optimization didn’t help - profile to find the real bottleneck

If regressed (slower):

  • ❌ Revert the change
  • ❌ Investigate why performance decreased

Common Optimization Scenarios

Scenario 1: Slow Reading

Symptoms: read_single_chunk time is high

Investigate:

  1. ReadStat C library overhead (FFI calls)
  2. Memory allocation patterns
  3. Callback overhead

Try:

  • Larger buffers in C library
  • Memory-mapped files (see evaluation doc)
  • Pre-allocate column vectors

Scenario 2: Slow Writing

Symptoms: write_formats time is high

Investigate:

  1. BufWriter buffer size
  2. Format-specific overhead
  3. Compression CPU usage

Try:

  • Increase BufWriter capacity (currently 8KB)
  • Use faster compression (Snappy vs Zstd)
  • Parallel writing (already implemented)

Scenario 3: Memory Issues

Symptoms: System swapping, OOM errors

Investigate:

  1. Chunk size too large
  2. Too many parallel streams
  3. Memory leaks

Try:

  • Reduce stream_rows (default 10,000)
  • Reduce parallel write buffer (default 100MB)
  • Use bounded channels (already implemented)

Scenario 4: High Variance

Symptoms: Large confidence intervals, many outliers

Investigate:

  1. System background activity
  2. CPU frequency scaling
  3. Thermal throttling

Try:

  • Close background apps
  • Disable frequency scaling
  • Run on consistent power mode

Advanced Profiling

CPU Profiling with Flamegraphs

# Install flamegraph
cargo install flamegraph

# Profile a specific benchmark
cargo flamegraph --bench readstat_benchmarks -- --bench read_single_chunk

# Open flamegraph.svg to see hotspots

What to look for:

  • Wide bars = lots of time spent
  • Deep stacks = call overhead
  • Unexpected functions = bugs/inefficiency

Memory Profiling

# Using valgrind massif (Linux): build the bench binary first,
# then profile it directly rather than profiling cargo itself
cargo bench --no-run
valgrind --tool=massif \
  ./target/release/deps/readstat_benchmarks-<hash> --bench read_single_chunk
ms_print massif.out.* > memory_profile.txt

# Using heaptrack (Linux)
heaptrack ./target/release/deps/readstat_benchmarks-<hash> --bench read_single_chunk
heaptrack_gui heaptrack.*.gz

System Call Tracing

# Linux: strace
strace -c cargo bench read_single_chunk 2>&1 | tail -20

# macOS: dtruss
sudo dtruss -c cargo bench read_single_chunk

Comparing Implementations

Before/After Memory-Mapped Files

# Baseline without mmap
git checkout main
cargo bench -- --save-baseline without-mmap

# With mmap implementation
git checkout feature/mmap
cargo bench -- --baseline without-mmap

# Look for improvements in read_single_chunk

Parallel vs Sequential

# Test with different parallelism settings
cargo bench end_to_end -- --parallel
cargo bench end_to_end -- --sequential

CI/CD Integration

Performance Regression Detection

Add to .github/workflows/benchmarks.yml:

name: Performance Benchmarks

on:
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Run benchmarks
        run: |
          cd crates/readstat
          cargo bench --no-run  # Just compile for CI

      - name: Compare with baseline (on main branch)
        if: github.event_name == 'pull_request'
        run: |
          cd crates/readstat
          git fetch origin main:main
          git checkout main
          cargo bench -- --save-baseline main
          git checkout -
          cargo bench -- --baseline main

Best Practices

Do’s βœ…

  • Run benchmarks on consistent hardware
  • Close background applications
  • Use --save-baseline for comparisons
  • Profile after benchmarking to find bottlenecks
  • Document performance changes in PRs
  • Test on representative data sizes

Don’ts ❌

  • Don’t benchmark on a laptop (thermal throttling)
  • Don’t optimize without profiling first
  • Don’t trust results with high variance
  • Don’t compare across different systems
  • Don’t commit benchmark artifacts
  • Don’t skip statistical significance checks

Performance Goals

Current Performance (Baseline)

  • Metadata reading: ~300-950 Β΅s
  • Read throughput: ~150-200K rows/sec
  • Write throughput: ~70-100K rows/sec
  • End-to-end: ~50-70K rows/sec

Target Performance (Goals)

  • Metadata reading: <500 Β΅s (↓30%)
  • Read throughput: >250K rows/sec (↑25%)
  • Write throughput: >100K rows/sec (↑30%)
  • End-to-end: >100K rows/sec (↑40%)

Stretch Goals

  • Memory-mapped reads: 2x faster for large files
  • Parallel writes: 3-4x speedup with 4+ cores
  • Compression: <10% overhead for Snappy

Data Files for Benchmarking

Current Test Data

  • all_types.sas7bdat - 3 rows, 10 vars (tiny)
  • cars.sas7bdat - 1081 rows, 13 vars (small)

For comprehensive benchmarking, consider adding:

Small (good for quick iteration):

  • < 1 MB file size
  • < 1,000 rows
  • 5-10 variables

Medium (typical use case):

  • 10-100 MB file size
  • 10,000-100,000 rows
  • 10-50 variables

Large (stress test):

  • 1 GB file size

  • 1,000,000 rows

  • 50+ variables


Next Steps

  1. Run full benchmark suite: cargo bench
  2. Review HTML reports: Open target/criterion/report/index.html
  3. Identify bottlenecks: Look for slowest operations
  4. Profile with flamegraph: Focus on hotspots
  5. Implement optimizations: Test one at a time
  6. Validate improvements: Compare against baseline
  7. Document findings: Update this file with results

Questions?

  • See detailed README: crates/readstat/benches/README.md
  • Check Criterion docs: https://bheisler.github.io/criterion.rs/book/
  • Review performance evaluation: Memory-mapped files analysis (separate doc)

Benchmarking with hyperfine

Benchmarking performed with hyperfine.

This example compares the performance of the Rust binary with that of the C binary built from the ReadStat repository. In general, the hope is that performance is fairly close to that of the C binary.

To run, execute the following from within the readstat directory.

# Windows
hyperfine --warmup 5 "ReadStat_App.exe -f crates\readstat-tests\tests\data\cars.sas7bdat tests\data\cars_c.csv" ".\target\release\readstat.exe data crates\readstat-tests\tests\data\cars.sas7bdat --output crates\readstat-tests\tests\data\cars_rust.csv"

πŸ“ First experiments on Windows are challenging to interpret due to file caching. Need further research into utilizing the --prepare option provided by hyperfine on Windows.

# Linux and macOS
hyperfine --prepare "sync; echo 3 | sudo tee /proc/sys/vm/drop_caches" "readstat -f crates/readstat-tests/tests/data/cars.sas7bdat crates/readstat-tests/tests/data/cars_c.csv" "./target/release/readstat data tests/data/cars.sas7bdat --output crates/readstat-tests/tests/data/cars_rust.csv"

Additional benchmarking may be performed in the future now that channels and threads have been implemented.

Profiling with Flamegraphs

Profiling performed with cargo flamegraph.

To run, execute the following from within the readstat directory.

cargo flamegraph --bin readstat -- data tests/data/_ahs2019n.sas7bdat --output tests/data/_ahs2019n.csv

Flamegraph is written to readstat/flamegraph.svg.

πŸ“ Have yet to utilize flamegraphs in order to improve performance.

GitHub Actions

The CI/CD workflow can be triggered in multiple ways:

1. Tag Push (Release)

Push a tag to trigger a full release build with GitHub Release artifacts:

# add and commit local changes
git add .
git commit -m "commit msg"

# push local changes to remote
git push

# add local tag
git tag -a v0.1.0 -m "v0.1.0"

# push local tag to remote
git push origin --tags

To delete and recreate tags:

# delete local tag
git tag --delete v0.1.0

# delete remote tag
git push origin --delete v0.1.0

2. Manual Trigger (GitHub UI)

Trigger a build manually from the GitHub Actions web interface (build-only, no releases):

  1. Go to the Actions tab
  2. Select the readstat-rs workflow
  3. Click Run workflow
  4. Optionally specify:
    • Version string: Label for artifacts (default: dev)

πŸ“ Manual triggers only build artifacts and do not create GitHub releases. To create a release, use a tag push.

3. API Trigger (External Tools)

Trigger builds programmatically using the GitHub API. This is useful for automation tools like Claude Code.

Using gh CLI

# Trigger a build
gh api repos/curtisalexander/readstat-rs/dispatches \
  -f event_type=build

# Trigger a build with custom version label
gh api repos/curtisalexander/readstat-rs/dispatches \
  -f event_type=build \
  -F client_payload='{"version":"test-build-123"}'

Using curl

curl -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github.v3+json" \
  https://api.github.com/repos/curtisalexander/readstat-rs/dispatches \
  -d '{"event_type": "build", "client_payload": {"version": "dev"}}'

4. Claude Code Integration

To have Claude Code trigger a CI build, use this prompt:

Trigger a CI build for readstat-rs by running: gh api repos/curtisalexander/readstat-rs/dispatches -f event_type=build

Event Types

Repository dispatch event types for API triggers:

| Event Type | Description |
|---|---|
| build | Build all targets and upload artifacts |
| test | Same as build (alias for clarity) |
| release | Same as build (reserved for future use) |

πŸ“ API triggers only build artifacts and do not create GitHub releases. To create a release, use a tag push.

Artifacts

All builds (regardless of trigger method) upload artifacts that can be downloaded from the workflow run page. Artifacts are retained for the default GitHub Actions retention period.

Releasing to crates.io

Step-by-step guide for publishing readstat-rs crates to crates.io.

Quick Reference

# Run all pre-publish checks
./scripts/release-check.sh        # Linux/macOS
.\scripts\release-check.ps1       # Windows

# Switch vendor dirs from submodules to copied files
./scripts/vendor.sh prepare       # Linux/macOS
.\scripts\vendor.ps1 prepare      # Windows

# Publish (in dependency order)
cargo publish -p readstat-iconv-sys
cargo publish -p readstat-sys
cargo publish -p readstat
cargo publish -p readstat-cli

# Restore submodules after publishing
./scripts/vendor.sh restore       # Linux/macOS
.\scripts\vendor.ps1 restore      # Windows

Pre-Release Checklist

1. Version Bumps

Update version numbers in these files (keep them in sync):

| File | Fields |
|---|---|
| crates/readstat/Cargo.toml | version, readstat-sys dependency version |
| crates/readstat-cli/Cargo.toml | version, readstat dependency version |
| crates/readstat-sys/Cargo.toml | version, readstat-iconv-sys dependency version |
| crates/readstat-iconv-sys/Cargo.toml | version |

Version conventions:

  • readstat and readstat-cli share the same version (e.g. 0.20.0)
  • readstat-sys and readstat-iconv-sys share the same version (e.g. 0.3.0)
  • Bump sys crate versions only when the vendored C library or bindings change

2. Update CHANGELOG.md

Add an entry for the new version:

## [0.20.0] - 2026-XX-XX

### Added
- ...

### Changed
- ...

### Fixed
- ...

3. Run Automated Checks

./scripts/release-check.sh

This runs:

  • cargo fmt --all -- --check — formatting
  • cargo clippy --workspace — linting
  • readstat-wasm fmt and clippy (excluded from workspace, checked separately)
  • cargo test --workspace — all tests
  • cargo doc --workspace --no-deps — documentation build
  • cargo deny check — license and security audit (if installed)
  • Version consistency checks
  • CHANGELOG entry check
  • cargo package dry-run for each publishable crate

Fix any failures before proceeding.

4. Manual Checks

  • README.md is up to date
  • Documentation reflects any API changes
  • Architecture docs (docs/ARCHITECTURE.md) are current
  • mdbook builds cleanly: ./scripts/build-book.sh
  • readstat-wasm builds and exports are up to date (excluded from workspace; not published to crates.io)

Vendor Preparation

The readstat-sys and readstat-iconv-sys crates vendor C source code from git submodules. cargo publish cannot include git submodule contents, so the files must be copied as regular files before publishing.

Switch to publish mode

./scripts/vendor.sh prepare       # Linux/macOS
.\scripts\vendor.ps1 prepare      # Windows

This:

  1. Records submodule commit hashes in vendor-lock.txt
  2. Copies only the files needed for building (matching Cargo.toml include patterns)
  3. Deinitializes the git submodules
  4. Places the copied files in the vendor directories
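The file-selection step amounts to a glob filter over the submodule tree. A rough sketch of the idea (the patterns shown are illustrative, not the crates' actual include lists):

```python
from fnmatch import fnmatch

def select_vendor_files(paths: list[str], patterns: list[str]) -> list[str]:
    """Keep only the paths matched by at least one include pattern.

    Note: fnmatch's `*` also matches `/`, which loosely approximates
    Cargo's include globs for the purposes of this sketch.
    """
    return [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]
```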

Verify package contents

cargo package --list -p readstat-sys --allow-dirty
cargo package --list -p readstat-iconv-sys --allow-dirty

Publishing

Crates must be published in dependency order. Wait for each crate to appear on the crates.io index before publishing the next one.

# 1. No crate dependencies
cargo publish -p readstat-iconv-sys

# 2. Depends on readstat-iconv-sys (Windows only)
cargo publish -p readstat-sys

# 3. Depends on readstat-sys
cargo publish -p readstat

# 4. Depends on readstat
cargo publish -p readstat-cli

Note: There may be a delay (30 seconds to a few minutes) between publishing a crate and it appearing in the index. If cargo publish fails with a dependency resolution error, wait and retry.
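The wait-and-retry can be scripted by polling the crates.io sparse index, whose path scheme is 1/&lt;name&gt;, 2/&lt;name&gt;, and 3/&lt;first char&gt;/&lt;name&gt; for short crate names, and &lt;first two&gt;/&lt;next two&gt;/&lt;name&gt; otherwise. A stdlib-only sketch (the 10-second poll interval is arbitrary):

```python
import time
import urllib.error
import urllib.request

def index_path(name: str) -> str:
    """Path of a crate in the crates.io sparse index."""
    n = name.lower()
    if len(n) <= 2:
        return f"{len(n)}/{n}"
    if len(n) == 3:
        return f"3/{n[0]}/{n}"
    return f"{n[:2]}/{n[2:4]}/{n}"

def wait_for_crate(name: str, timeout: float = 300.0) -> bool:
    """Poll until `name` appears in the index, or give up after `timeout` seconds."""
    url = f"https://index.crates.io/{index_path(name)}"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True
        except urllib.error.URLError:
            pass  # 404 until the crate is indexed; transient errors also retried
        time.sleep(10)
    return False
```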


Post-Publish

1. Restore submodules

./scripts/vendor.sh restore       # Linux/macOS
.\scripts\vendor.ps1 restore      # Windows

2. Create a git tag

git tag v0.20.0
git push origin v0.20.0

3. Create a GitHub release

Use the GitHub CLI or web UI to create a release from the tag. The CI pipeline (main.yml) will automatically build platform binaries and attach them.

4. Clean up

  • Remove vendor-lock.txt (or commit it for reference)
  • Verify the published crates on crates.io
  • Verify the docs on docs.rs

Troubleshooting

cargo publish fails with “no matching package found”

The dependency crate hasn’t appeared in the index yet. Wait 30-60 seconds and retry.

cargo package includes too many files

Check the include field in the crate’s Cargo.toml. Run cargo package --list to see exactly what will be included.

Vendor files missing after vendor.sh restore

Run git submodule update --init --recursive to re-initialize.

Build fails after switching vendor modes

Clean the build cache: cargo clean then rebuild.

readstat

Pure Rust library for parsing SAS binary files (.sas7bdat) into Apache Arrow RecordBatch format. Uses FFI bindings to the ReadStat C library for parsing.

Note: The ReadStat C library supports SAS, SPSS, and Stata file formats. The readstat-sys crate exposes the full ReadStat API — all 125 functions across all formats. However, this crate currently only implements parsing and conversion for SAS .sas7bdat files. SPSS and Stata formats are not supported.

Features

Output format writers are feature-gated (the format writers are enabled by default; sql is opt-in):

  • csv — CSV output via arrow-csv
  • parquet — Parquet output (Snappy, Zstd, Brotli, Gzip, Lz4 compression)
  • feather — Arrow IPC / Feather format
  • ndjson — Newline-delimited JSON
  • sql — DataFusion SQL query support (optional, not enabled by default)
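For example, a downstream Cargo.toml might opt into only one writer (the version number is illustrative):

```toml
[dependencies]
readstat = { version = "0.20", default-features = false, features = ["parquet"] }
```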

Key Types

  • ReadStatData — Coordinates FFI parsing, accumulates values directly into typed Arrow builders
  • ReadStatMetadata — File-level metadata (row/var counts, encoding, compression, schema)
  • ReadStatWriter — Writes Arrow batches to the requested output format
  • ReadStatPath — Validated input file path
  • WriteConfig — Output configuration (path, format, compression)

For the full architecture overview, see docs/ARCHITECTURE.md.

readstat-cli

Binary crate producing the readstat CLI tool for converting SAS binary files (.sas7bdat) to other formats.

Note: The ReadStat C library supports SAS, SPSS, and Stata file formats. The readstat-sys crate exposes the full ReadStat API — all 125 functions across all formats. However, this CLI currently only supports SAS .sas7bdat files. SPSS and Stata formats are not supported.

Subcommands

  • metadata — Print file metadata (row/var counts, labels, encoding, format version, etc.)
  • preview — Preview first N rows as CSV to stdout
  • data — Convert to output format (csv, feather, ndjson, parquet)

Key Features

  • Column selection (--columns, --columns-file)
  • Streaming reads with configurable chunk size (--stream-rows)
  • Parallel reading (--parallel) and parallel Parquet writing (--parallel-write)
  • SQL queries via DataFusion (--sql, feature-gated)
  • Parquet compression settings (--compression, --compression-level)

For the full CLI reference, see docs/USAGE.md.

readstat-sys

Raw FFI bindings to the ReadStat C library, generated with bindgen.

The build.rs script compiles ~49 C source files from the vendored vendor/ReadStat/ git submodule via the cc crate and generates Rust bindings with bindgen. Platform-specific linking for iconv and zlib is handled automatically (see docs/BUILDING.md for details).

These bindings expose the full ReadStat API — all 125 functions and all 8 enum types — including support for SAS (.sas7bdat, .xpt), SPSS (.sav, .zsav, .por), and Stata (.dta) file formats. If you need to work with SPSS or Stata files from Rust, this crate provides the complete FFI surface to do so.

This is a sys crate — it exposes raw C types and functions. The higher-level readstat library crate provides a safe API but currently only implements support for SAS .sas7bdat files.

API Coverage

All 125 public C functions and all 8 enum types from readstat.h are bound. All 49 library source files are compiled.

Functions by Category

| Category | Count | Formats |
|---|---|---|
| Metadata accessors | 15 | All |
| Value accessors | 14 | All |
| Variable accessors | 14 | All |
| Parser lifecycle | 3 | All |
| Parser callbacks | 7 | All |
| Parser I/O handlers | 6 | All |
| Parser config | 4 | All |
| File parsers (readers) | 10 | SAS (sas7bdat, sas7bcat, xport), SPSS (sav, por), Stata (dta), text schema (sas_commands, spss_commands, stata_dictionary, txt) |
| Schema parsing | 1 | All |
| Writer lifecycle | 3 | All |
| Writer label sets | 5 | All |
| Writer variable definition | 11 | All |
| Writer notes/strings | 3 | All |
| Writer metadata setters | 8 | All |
| Writer begin | 6 | SAS (sas7bdat, sas7bcat, xport), SPSS (sav, por), Stata (dta) |
| Writer validation | 2 | All |
| Writer row insertion | 12 | All |
| Error handling | 1 | All |
| Total | 125 | |

Compiled Source Files

| Directory | Files | Description |
|---|---|---|
| src/ (core) | 11 | Hash table, parser, value/variable handling, writer, I/O, error |
| src/sas/ | 11 | SAS7BDAT, SAS7BCAT, XPORT read/write, IEEE float, RLE compression |
| src/spss/ | 16 | SAV, POR, ZSAV read/write, compression, SPSS parsing |
| src/stata/ | 4 | DTA read/write, timestamp parsing |
| src/txt/ | 7 | SAS commands, SPSS commands, Stata dictionary, plain text, schema |
| Total | 49 | |

Enum Types

| C Enum | Rust Type Alias | Description |
|---|---|---|
| readstat_type_e | readstat_type_e | Data types (string, int8/16/32, float, double, string_ref) |
| readstat_type_class_e | readstat_type_class_e | Type classes (string, numeric) |
| readstat_measure_e | readstat_measure_e | Measurement levels (nominal, ordinal, scale) |
| readstat_alignment_e | readstat_alignment_e | Column alignment (left, center, right) |
| readstat_compress_e | readstat_compress_e | Compression types (none, rows, binary) |
| readstat_endian_e | readstat_endian_e | Byte order (big, little) |
| readstat_error_e | readstat_error_e | Error codes (41 variants) |
| readstat_io_flags_e | readstat_io_flags_e | I/O flags |

Verifying Bindings

To confirm that the Rust bindings stay in sync with the vendored C header and source files, run the verification script:

# Bash (Linux, macOS, Windows Git Bash)
bash crates/readstat-sys/verify_bindings.sh

# Rebuild first, then verify
bash crates/readstat-sys/verify_bindings.sh --rebuild

# PowerShell (Windows)
.\crates\readstat-sys\verify_bindings.ps1

# Rebuild first, then verify
.\crates\readstat-sys\verify_bindings.ps1 -Rebuild

The script checks three things:

  1. Every function declared in readstat.h has a pub fn binding in the generated bindings.rs
  2. Every typedef enum in the header has a corresponding Rust type alias
  3. Every .c library source file in the vendor directory is listed in build.rs

Run this after updating the ReadStat submodule to catch any new or removed API surface.
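Check 1 amounts to diffing two sets of names. A simplified sketch of the idea (the real scripts are shell/PowerShell; the regexes here only approximate theirs):

```python
import re

# C declarations like: readstat_error_t readstat_parse_sas7bdat(...);
C_DECL = re.compile(r"^[\w\s\*]+?\breadstat_(\w+)\s*\(", re.MULTILINE)
# Rust bindings like: pub fn readstat_parse_sas7bdat(...)
RUST_FN = re.compile(r"pub fn readstat_(\w+)\s*\(")

def missing_bindings(header_text: str, bindings_text: str) -> set[str]:
    """Functions declared in the C header but absent from bindings.rs."""
    declared = {m.group(1) for m in C_DECL.finditer(header_text)}
    bound = set(RUST_FN.findall(bindings_text))
    return declared - bound
```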

readstat-iconv-sys

Windows-only FFI bindings to libiconv for character encoding conversion.

The build.rs script compiles libiconv from the vendored vendor/libiconv-win-build/ git submodule using the cc crate. On non-Windows platforms the build script is a no-op.

The links = "iconv" key in Cargo.toml allows readstat-sys to discover the include path via the DEP_ICONV_INCLUDE environment variable.

readstat-tests

Integration test suite for the readstat library and readstat-cli binary.

Contains 29 test modules covering all SAS data types, 118 date/time/datetime formats, missing values, large pages, CLI subcommands, parallel read/write, Parquet output, CSV output, Arrow migration, row offsets, scientific notation, column selection, skip row count, memory-mapped file reading, byte-slice reading, and SQL queries.

Test data lives in tests/data/*.sas7bdat (14 datasets). SAS scripts to regenerate test data are in util/.

Run with:

cargo test -p readstat-tests

readstat-wasm

WebAssembly build of the readstat library for parsing SAS .sas7bdat files in JavaScript. Reads metadata and converts row data to CSV, NDJSON, Parquet, or Feather (Arrow IPC) entirely in memory — no server or native dependencies required at runtime.

Package contents

The pkg/ directory contains everything needed to use the library from JavaScript:

| File | Description |
|---|---|
| readstat_wasm.wasm | Pre-built WASM binary (Emscripten target) |
| readstat_wasm.js | JS wrapper handling module loading, memory management, and type conversion |

JS API

All functions accept a Uint8Array of raw .sas7bdat file bytes.

import { init, read_metadata, read_metadata_fast, read_data, read_data_ndjson, read_data_parquet, read_data_feather } from "readstat-wasm";

// Must be called once before using any other function
await init();

const bytes = new Uint8Array(/* .sas7bdat file contents */);

// Metadata (returns JSON string)
const metadataJson = read_metadata(bytes);
const metadataJsonFast = read_metadata_fast(bytes); // skips full row count

// Data as text (returns string)
const csv = read_data(bytes);       // CSV with header row
const ndjson = read_data_ndjson(bytes); // newline-delimited JSON

// Data as binary (returns Uint8Array)
const parquet = read_data_parquet(bytes);  // Parquet bytes
const feather = read_data_feather(bytes);  // Feather (Arrow IPC) bytes

Functions

| Function | Returns | Description |
|---|---|---|
| init() | Promise&lt;void&gt; | Load and initialize the WASM module |
| read_metadata(bytes) | string | File and variable metadata as JSON |
| read_metadata_fast(bytes) | string | Same as above but skips full row count for speed |
| read_data(bytes) | string | All row data as CSV (with header) |
| read_data_ndjson(bytes) | string | All row data as newline-delimited JSON |
| read_data_parquet(bytes) | Uint8Array | All row data as Parquet bytes |
| read_data_feather(bytes) | Uint8Array | All row data as Feather (Arrow IPC) bytes |

How it works

The crate compiles the ReadStat C library and the Rust readstat parsing library to WebAssembly using the wasm32-unknown-emscripten target. Emscripten is required because the underlying C code needs a C standard library (libc, iconv).

The data functions perform a two-pass parse over the byte buffer: first to extract metadata (schema, row count), then to read row values into an Arrow RecordBatch, which is serialized to CSV, NDJSON, Parquet, or Feather in memory.

C ABI exports

The WASM module exposes these C-compatible functions (used internally by the JS wrapper):

| Export | Signature | Purpose |
|---|---|---|
| read_metadata | (ptr, len) -> *char | Parse metadata as JSON |
| read_metadata_fast | (ptr, len) -> *char | Same, skipping full row count |
| read_data | (ptr, len) -> *char | Parse data, return as CSV |
| read_data_ndjson | (ptr, len) -> *char | Parse data, return as NDJSON |
| read_data_parquet | (ptr, len, out_len) -> *u8 | Parse data, return as Parquet bytes |
| read_data_feather | (ptr, len, out_len) -> *u8 | Parse data, return as Feather bytes |
| free_string | (ptr) | Free a string returned by the above |
| free_binary | (ptr, len) | Free a binary buffer returned by parquet/feather |

Building from source

Requires Rust, Emscripten SDK, and libclang.

# Activate Emscripten
source /path/to/emsdk/emsdk_env.sh

# Add the target (first time only)
rustup target add wasm32-unknown-emscripten

# Initialize submodules (first time only, from repo root)
git submodule update --init --recursive

# Build
cargo build --target wasm32-unknown-emscripten --release

# Copy binary to pkg/
cp target/wasm32-unknown-emscripten/release/readstat_wasm.wasm pkg/

See the bun-demo for a working example.

readstat CLI Demo

Demonstrates converting a SAS .sas7bdat file to CSV, NDJSON, Parquet, and Feather using the readstat command-line tool.

Quick start

Linux / macOS

# Build the CLI (from repo root)
cargo build -p readstat-cli

# Run the conversion script
cd examples/cli-demo
bash convert.sh

# Verify the output files
uv run verify_output.py

You can also pass a specific path to the readstat binary:

bash convert.sh /path/to/readstat

Windows (PowerShell)

# Build the CLI (from repo root)
cargo build -p readstat-cli

# Run the conversion script
cd examples/cli-demo
./convert.ps1

# Verify the output files
uv run verify_output.py

You can also pass a specific path to the readstat binary:

./convert.ps1 -ReadStat C:\path\to\readstat.exe

What it does

The convert.sh (Bash) and convert.ps1 (PowerShell) scripts:

  1. Display metadata for the cars.sas7bdat dataset (table name, encoding, row count, variable info)
  2. Preview the first 5 rows of data
  3. Convert the dataset to four output formats:
    • cars.csv — comma-separated values
    • cars.ndjson — newline-delimited JSON
    • cars.parquet — Apache Parquet (columnar binary)
    • cars.feather — Arrow IPC / Feather (columnar binary)

The verify_output.py script validates all output files:

  • Checks row and column counts match the expected 1,081 rows x 13 columns
  • Verifies column names are correct
  • Confirms cross-format consistency (all four formats contain identical data)
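A stripped-down version of the shape check looks like this (the real verify_output.py also compares values across all four formats and uses Arrow-capable readers for Parquet and Feather; the sample data in the test is illustrative):

```python
import csv
import io
import json

def csv_shape(text: str) -> tuple[int, int]:
    """(data rows, columns) of a CSV string that has a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows) - 1, len(rows[0])

def ndjson_shape(text: str) -> tuple[int, int]:
    """(rows, columns) of a newline-delimited JSON string."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return len(records), len(records[0])
```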

The cars dataset

| Property | Value |
|---|---|
| Rows | 1,081 |
| Columns | 13 |
| Source | crates/readstat-tests/tests/data/cars.sas7bdat |
| Encoding | WINDOWS-1252 |

Columns: Brand, Model, Minivan, Wagon, Pickup, Automatic, EngineSize, Cylinders, CityMPG, HwyMPG, SUV, AWD, Hybrid

Expected output

Using readstat: /path/to/readstat
Input file:     /path/to/cars.sas7bdat

=== Metadata ===
...

=== Preview (first 5 rows) ===
...

Converting to CSV...
  -> cars.csv
Converting to NDJSON...
  -> cars.ndjson
Converting to Parquet...
  -> cars.parquet
Converting to Feather...
  -> cars.feather

Done! All output files written to /path/to/examples/cli-demo
Run 'uv run verify_output.py' to validate the output files.

API Server Demo

Two identical API servers demonstrating how to integrate readstat into backend applications:

  • Rust server (Axum) — direct library integration
  • Python server (FastAPI) — cross-language integration via PyO3/maturin bindings

Both servers expose the same endpoints and return identical results for the same input.

Prerequisites

Rust server:

  • Rust toolchain
  • libclang (for readstat-sys bindgen)
  • Git submodules initialized: git submodule update --init --recursive

Python server:

  • Everything above, plus:
  • uv (Python package manager)
  • Python 3.9+

Quick Start

Rust Server (port 3000)

cd examples/api-demo/rust-server
cargo run

You should see:

Rust API server listening on http://localhost:3000

Python Server (port 3001)

cd examples/api-demo/python-server

# Build the PyO3 bindings into the project venv
uv sync
uv run maturin develop -m readstat_py/Cargo.toml

# Start the server
uv run uvicorn server:app --port 3001

You should see:

INFO:     Started server process [...]
INFO:     Uvicorn running on http://127.0.0.1:3001 (Press CTRL+C to quit)

Walking Through the Endpoints

The examples below use port 3000 (Rust server). Replace with 3001 for the Python server — the responses are identical.

Set a convenience variable for the test file:

FILE=test-data/cars.sas7bdat

1. Health Check

curl http://localhost:3000/health

Expected output:

{"status":"ok"}

2. File Metadata

Upload a SAS file and get back its metadata as JSON:

curl -F "file=@$FILE" http://localhost:3000/metadata

Expected output (formatted):

{
  "row_count": 1081,
  "var_count": 13,
  "table_name": "CARS",
  "file_label": "Written by SAS",
  "file_encoding": "WINDOWS-1252",
  "version": 9,
  "is64bit": 0,
  "creation_time": "2008-09-30 12:55:01",
  "modified_time": "2008-09-30 12:55:01",
  "compression": "None",
  "endianness": "Little",
  "vars": {
    "0": {
      "var_name": "Brand",
      "var_type": "String",
      "var_type_class": "String",
      "var_label": "",
      "var_format": "",
      "var_format_class": null,
      "storage_width": 13,
      "display_width": 0
    },
    "1": {
      "var_name": "Model",
      "var_type": "String",
      "var_type_class": "String",
      ...
    },
    ...
  }
}

The vars map is keyed by column index and includes type info, labels, and SAS format metadata for all 13 variables.
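Because the map keys are strings ("0", "1", …, "12"), sort them numerically rather than lexically when reconstructing column order. A small sketch over metadata JSON of this shape:

```python
import json

def ordered_vars(metadata_json: str) -> list[dict]:
    """Return the vars map as a list sorted by numeric column index."""
    meta = json.loads(metadata_json)
    return [meta["vars"][k] for k in sorted(meta["vars"], key=int)]
```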

3. Preview Rows

Get the first N rows as CSV (default 10, here we ask for 5):

curl -F "file=@$FILE" "http://localhost:3000/preview?rows=5"

Expected output:

Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,48.0,47.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,47.0,48.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,0.0,1.3,4.0,46.0,51.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,0.0,1.3,4.0,45.0,51.0,0.0,0.0,1.0

4. Convert to CSV

Export the full dataset (all 1,081 rows) as CSV:

curl -F "file=@$FILE" "http://localhost:3000/data?format=csv" -o output.csv

The response has Content-Type: text/csv and Content-Disposition: attachment; filename="data.csv".

5. Convert to NDJSON

Export as newline-delimited JSON (one JSON object per row):

curl -F "file=@$FILE" "http://localhost:3000/data?format=ndjson"

Expected output (first few lines):

{"Brand":"TOYOTA","Model":"Prius","Minivan":0.0,"Wagon":0.0,"Pickup":0.0,"Automatic":1.0,"EngineSize":1.5,"Cylinders":4.0,"CityMPG":60.0,"HwyMPG":51.0,"SUV":0.0,"AWD":0.0,"Hybrid":1.0}
{"Brand":"HONDA","Model":"Civic Hybrid","Minivan":0.0,"Wagon":0.0,"Pickup":0.0,"Automatic":1.0,"EngineSize":1.3,"Cylinders":4.0,"CityMPG":48.0,"HwyMPG":47.0,"SUV":0.0,"AWD":0.0,"Hybrid":1.0}
{"Brand":"HONDA","Model":"Civic Hybrid","Minivan":0.0,"Wagon":0.0,"Pickup":0.0,"Automatic":1.0,"EngineSize":1.3,"Cylinders":4.0,"CityMPG":47.0,"HwyMPG":48.0,"SUV":0.0,"AWD":0.0,"Hybrid":1.0}
...

The response has Content-Type: application/x-ndjson.

6. Convert to Parquet

Export as Apache Parquet (binary, Snappy-compressed):

curl -F "file=@$FILE" "http://localhost:3000/data?format=parquet" -o output.parquet

This produces a ~15 KB Parquet file. You can inspect it with tools like parquet-tools, DuckDB, or pandas:

import pandas as pd
print(pd.read_parquet("output.parquet").head())

7. Convert to Feather

Export as Arrow IPC (Feather v2) format:

curl -F "file=@$FILE" "http://localhost:3000/data?format=feather" -o output.feather

This produces a ~130 KB Feather file. Read it back with any Arrow-compatible tool:

import pandas as pd
print(pd.read_feather("output.feather").head())

Automated Test Scripts

Both scripts work against either server — just change the URL.

Shell script (curl)

cd examples/api-demo
bash client/test_api.sh http://localhost:3000 test-data/cars.sas7bdat
bash client/test_api.sh http://localhost:3001 test-data/cars.sas7bdat

Python script (httpx)

Uses PEP 723 inline script metadata, so uv run handles dependencies automatically — no virtual environment setup needed:

cd examples/api-demo/client
uv run test_api.py http://localhost:3000 ../test-data/cars.sas7bdat
uv run test_api.py http://localhost:3001 ../test-data/cars.sas7bdat

Expected output:

=== Testing http://localhost:3000 with ../test-data/cars.sas7bdat ===

--- GET /health ---
{'status': 'ok'}

--- POST /metadata ---
  row_count: 1081
  var_count: 13
  table_name: CARS
  encoding: WINDOWS-1252
  variables: 13

--- POST /preview (5 rows) ---
  Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
  TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
  ...

--- POST /data?format=csv ---
  Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
  TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
  HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,48.0,47.0,0.0,0.0,1.0

--- POST /data?format=ndjson ---
  {"Brand":"TOYOTA","Model":"Prius","Minivan":0.0,...}
  ...

--- POST /data?format=parquet ---
  15403 bytes

--- POST /data?format=feather ---
  129650 bytes

=== All tests passed ===

API Reference

| Method | Path | Request | Response | Content-Type |
|---|---|---|---|---|
| GET | /health | — | {"status": "ok"} | application/json |
| POST | /metadata | multipart file | JSON metadata | application/json |
| POST | /preview?rows=N | multipart file | CSV text (first N rows, default 10) | text/csv |
| POST | /data?format=csv | multipart file | Full dataset as CSV | text/csv |
| POST | /data?format=ndjson | multipart file | Full dataset as NDJSON | application/x-ndjson |
| POST | /data?format=parquet | multipart file | Full dataset as Parquet | application/octet-stream |
| POST | /data?format=feather | multipart file | Full dataset as Feather | application/octet-stream |

The multipart field name must be file. Binary formats include a Content-Disposition header with a suggested filename.
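Outside of curl, the upload is ordinary multipart/form-data with a field named file. A stdlib-only sketch of building such a request body (the URL it would be posted to is whichever server you started, e.g. http://localhost:3000/metadata):

```python
import uuid

def multipart_body(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"
```

Pass the returned body and Content-Type to urllib.request.Request (or any HTTP client) as a POST request.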

How It Works

Rust Server

HTTP upload → Axum multipart extraction → Vec<u8>
  → spawn_blocking {
      ReadStatMetadata::read_metadata_from_bytes()
      ReadStatData::read_data_from_bytes() → Arrow RecordBatch
      write_batch_to_{csv,ndjson,parquet,feather}_bytes()
    }
  → HTTP response

All ReadStat C library FFI calls run inside spawn_blocking to avoid blocking the tokio async runtime.

Python Server

HTTP upload → FastAPI UploadFile → bytes
  → readstat_py.read_to_{csv,ndjson,parquet,feather}(bytes)
    → [PyO3 boundary]
      → ReadStatMetadata::read_metadata_from_bytes()
      → ReadStatData::read_data_from_bytes() → Arrow RecordBatch
      → write_batch_to_*_bytes()
    → [back to Python]
  → HTTP response

The PyO3 binding layer is intentionally thin — 5 functions that take &[u8] and return Vec<u8> (or String for metadata). No complex types cross the FFI boundary.

readstat-wasm Bun Demo

Demonstrates reading SAS .sas7bdat file metadata and data from JavaScript using the readstat-wasm package compiled to WebAssembly via Emscripten. The demo parses a .sas7bdat file entirely in-memory via WASM and converts it to CSV.

Quick start

If you already have Rust, Emscripten SDK, libclang, and Bun installed:

macOS / Linux:

# Activate Emscripten (first time per terminal session)
source /path/to/emsdk/emsdk_env.sh

# Add the wasm target (first time only)
rustup target add wasm32-unknown-emscripten

# Initialize submodules (first time only)
git submodule update --init --recursive

# Build the wasm package
cd crates/readstat-wasm
cargo build --target wasm32-unknown-emscripten --release
cp target/wasm32-unknown-emscripten/release/readstat_wasm.wasm pkg/

# Run the demo
cd ../../examples/bun-demo
bun install
bun run index.ts

Windows (Git Bash):

# Activate Emscripten (first time per terminal session)
/c/path/to/emsdk/emsdk.bat activate latest
export EMSDK=C:/path/to/emsdk

# Add the wasm target (first time only)
rustup target add wasm32-unknown-emscripten

# Initialize submodules (first time only)
git submodule update --init --recursive

# Build the wasm package
cd crates/readstat-wasm
cargo build --target wasm32-unknown-emscripten --release
cp target/wasm32-unknown-emscripten/release/readstat_wasm.wasm pkg/

# Run the demo
cd ../../examples/bun-demo
bun install
bun run index.ts

Windows (PowerShell):

# Activate Emscripten (first time per terminal session)
C:\path\to\emsdk\emsdk.bat activate latest
$env:EMSDK = "C:\path\to\emsdk"

# Add the wasm target (first time only)
rustup target add wasm32-unknown-emscripten

# Initialize submodules (first time only)
git submodule update --init --recursive

# Build the wasm package
cd crates\readstat-wasm
cargo build --target wasm32-unknown-emscripten --release
copy target\wasm32-unknown-emscripten\release\readstat_wasm.wasm pkg\

# Run the demo
cd ..\..\examples\bun-demo
bun install
bun run index.ts

1. Install dependencies

Rust + wasm target

# Install Rust (if not already installed)
# macOS / Linux
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Windows β€” download and run rustup-init.exe from https://rustup.rs

# Add the Emscripten wasm target (all platforms)
rustup target add wasm32-unknown-emscripten

Emscripten SDK

# Clone the SDK
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk

# Install and activate the latest toolchain
./emsdk install latest
./emsdk activate latest

Activate in your shell (run every new terminal session, or add to your profile):

| Platform | Command |
|---|---|
| macOS / Linux | source ./emsdk_env.sh |
| Windows (cmd) | emsdk_env.bat |
| Windows (PowerShell) | emsdk_env.bat (then set $env:EMSDK = "C:\path\to\emsdk" if needed) |
| Windows (Git Bash) | source ./emsdk_env.sh (then export EMSDK=C:/path/to/emsdk if needed) |

Note: On Windows, emsdk_env.sh / emsdk_env.bat may update PATH without exporting the EMSDK variable. If the build fails with “EMSDK must be set”, set it manually as shown above. The build script will also attempt to auto-detect the emsdk root from PATH.

libclang (required by bindgen)

| Platform | Command |
|---|---|
| macOS | brew install llvm |
| Ubuntu / Debian | sudo apt-get install libclang-dev |
| Fedora | sudo dnf install clang-devel |
| Windows | Install LLVM from https://releases.llvm.org/download.html and set LIBCLANG_PATH to the lib directory (e.g., C:\Program Files\LLVM\lib) |

Bun

# macOS / Linux
curl -fsSL https://bun.sh/install | bash

# Windows (PowerShell)
powershell -c "irm bun.sh/install.ps1 | iex"

2. Initialize git submodules

From the repository root:

git submodule update --init --recursive

3. Build the WASM package

# Make sure Emscripten is activated in your shell (see table above)

# From the readstat-wasm crate directory
cd crates/readstat-wasm

# Build with Emscripten target (release mode)
cargo build --target wasm32-unknown-emscripten --release

# Copy the .wasm binary into the pkg/ directory
# macOS / Linux
cp target/wasm32-unknown-emscripten/release/readstat_wasm.wasm pkg/
# Windows (PowerShell)
# copy target\wasm32-unknown-emscripten\release\readstat_wasm.wasm pkg\

4. Run the demo

cd examples/bun-demo
bun install
bun run index.ts

Expected output

=== SAS7BDAT Metadata ===
Table name:    CARS
File encoding: WINDOWS-1252
Row count:     1081
Variable count:13
Compression:   None
Endianness:    Little
Created:       2008-09-30 12:55:01
Modified:      2008-09-30 12:55:01

=== Variables ===
  [0] Brand (String, )
  [1] Model (String, )
  [2] Minivan (Double, )
  [3] Wagon (Double, )
  [4] Pickup (Double, )
  [5] Automatic (Double, )
  [6] EngineSize (Double, )
  [7] Cylinders (Double, )
  [8] CityMPG (Double, )
  [9] HwyMPG (Double, )
  [10] SUV (Double, )
  [11] AWD (Double, )
  [12] Hybrid (Double, )

=== CSV Data (preview) ===
Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,48.0,47.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,47.0,48.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,0.0,1.3,4.0,46.0,51.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,0.0,1.3,4.0,45.0,51.0,0.0,0.0,1.0
... (1081 total data rows)

Wrote 1081 rows to cars.csv

How it works

The readstat-wasm crate compiles the ReadStat C library and the Rust readstat parsing library to WebAssembly using the wasm32-unknown-emscripten target. Emscripten is required because the underlying ReadStat C code needs a C standard library (libc, iconv) — which Emscripten provides for wasm. (Note: zlib is only needed for SPSS zsav support, which is not included in the current wasm build.)

The crate exports eight C-compatible functions:

| Export | Signature | Purpose |
|---|---|---|
| read_metadata | (ptr, len) -> *char | Parse metadata as JSON from a byte buffer |
| read_metadata_fast | (ptr, len) -> *char | Same, but skips full row count |
| read_data | (ptr, len) -> *char | Parse data and return as CSV string |
| read_data_ndjson | (ptr, len) -> *char | Parse data and return as NDJSON string |
| read_data_parquet | (ptr, len, out_len) -> *u8 | Parse data and return as Parquet bytes |
| read_data_feather | (ptr, len, out_len) -> *u8 | Parse data and return as Feather bytes |
| free_string | (ptr) | Free a string returned by the string functions |
| free_binary | (ptr, len) | Free binary data returned by parquet/feather |

The data functions perform a two-pass parse over the same byte buffer: first to extract metadata (schema, row count), then to read row values into an Arrow RecordBatch, which is serialized to CSV, NDJSON, Parquet, or Feather in memory.

The JS wrapper in pkg/readstat_wasm.js handles:

  • Loading the .wasm module
  • Providing minimal WASI and Emscripten import stubs
  • Memory management (malloc/free for input bytes, free_string for output)
  • Converting between JS types and wasm pointers

Troubleshooting

EMSDK must be set for Emscripten builds

Set the EMSDK environment variable to point to your emsdk installation directory. On macOS/Linux: export EMSDK=/path/to/emsdk. On Windows (PowerShell): $env:EMSDK = "C:\path\to\emsdk". On Windows (Git Bash): export EMSDK=C:/path/to/emsdk. The build script also attempts to auto-detect the emsdk root from your PATH, so simply having Emscripten activated may be sufficient.

error: linking with emcc failed / undefined symbol: main

Make sure you’re building from crates/readstat-wasm/ (not the repo root). The .cargo/config.toml in that directory provides the necessary linker flags.

The command line is too long (Windows)

This was a known issue when building all ReadStat C source files for the Emscripten target. It has been fixed — the build script now compiles only the SAS format sources for Emscripten builds, keeping the archiver command within Windows’ command-line length limit.

Web Demo: SAS7BDAT Viewer & Converter

Browser-based demo that reads SAS .sas7bdat files entirely client-side using WebAssembly. Upload a file to view metadata, preview data in a sortable table, and export to CSV, NDJSON, Parquet, or Feather.

No build tools, no npm install, no framework — just static files served over HTTP.

Quick start

  1. Copy the WASM binary into this directory (if not already present):

    cp crates/readstat-wasm/pkg/readstat_wasm.wasm examples/web-demo/
    

    If you need to rebuild it first, see the bun-demo README for build instructions.

  2. Serve the directory with any static HTTP server. You must point the server at the directory, not at index.html directly:

    # From the repo root:
    python -m http.server 8000 -d examples/web-demo
    npx serve examples/web-demo
    bunx serve examples/web-demo
    
    # Or from the web-demo directory:
    cd examples/web-demo
    python -m http.server 8000
    npx serve
    bunx serve
    

    Note: Do not pass index.html as the argument (e.g., bunx serve index.html). That tells serve to look for a directory named index.html, which will cause the WASM and JS files to 404.

  3. Open http://localhost:3000 (for serve) or http://localhost:8000 (for Python) in your browser.

  4. Upload a .sas7bdat file (e.g., crates/readstat-tests/tests/data/cars.sas7bdat).

Features

  • Metadata panel: table name, encoding, row/variable count, compression, timestamps
  • Variable table: name, type, label, and format for each column
  • Data preview: first 100 rows in a sortable table (uses Tabulator from CDN, with plain HTML table fallback)
  • Export: download as CSV, NDJSON, Parquet, or Feather

WASM binary

The readstat_wasm.wasm file is built from the readstat-wasm crate (crates/readstat-wasm/). It compiles the ReadStat C library and the Rust readstat parsing library to WebAssembly via the wasm32-unknown-emscripten target. The binary is ~9.7 MB.

A pre-built copy is checked in at crates/readstat-wasm/pkg/readstat_wasm.wasm.

Browser compatibility

  • Requires a modern browser with WebAssembly support (Chrome 57+, Firefox 52+, Safari 11+, Edge 16+)
  • Must be served over HTTP(S): file:// URLs will not work due to WASM fetch() requirements
  • Tabulator.js is loaded from CDN; if offline, the data preview falls back to a plain HTML table

File structure

examples/web-demo/
├── index.html          # App (HTML + inline CSS + inline JS)
├── readstat_wasm.js    # Browser-compatible WASM wrapper
├── readstat_wasm.wasm  # WASM binary (copied from pkg/)
└── README.md           # This file

SAS7BDAT SQL Explorer

An interactive browser-based tool for uploading .sas7bdat files and querying them with SQL, entirely client-side using WebAssembly.

How It Works

  1. Upload a .sas7bdat file (drag-and-drop or file picker)
  2. The file is parsed in-browser via the readstat-wasm WebAssembly module
  3. Data is loaded into AlaSQL, a client-side SQL engine
  4. Write SQL queries in a syntax-highlighted editor (powered by CodeMirror 6)
  5. View results in an interactive, sortable table (powered by Tabulator)
  6. Export query results as CSV

No data leaves your browser: all processing happens locally.
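The hand-off in step 3 can be sketched as follows: converting the CSV text produced by the wasm module into the array-of-objects shape that AlaSQL accepts (for example via `alasql('SELECT * FROM ?', [rows])`). `csvToRows` is a hypothetical helper, and this naive split-based parsing is only a sketch; a real implementation needs a quote-aware CSV parser.

```javascript
// Turn a CSV string into an array of row objects suitable for a
// client-side SQL engine. Naively splits on commas and newlines.
function csvToRows(csv) {
  const lines = csv.trim().split("\n");
  const headers = lines[0].split(",");
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row = {};
    headers.forEach((h, i) => {
      const v = cells[i];
      // Coerce numeric-looking cells so SQL aggregates see numbers.
      row[h] = v !== "" && !Number.isNaN(Number(v)) ? Number(v) : v;
    });
    return row;
  });
}

const rows = csvToRows("model,mpg\ncivic,36\naccord,32");
console.log(rows.length); // 2
```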

Quick Start

Serve the directory with any static HTTP server. The entire directory must be served (not just index.html) so the browser can load the .js and .wasm files alongside it.

From the repository root:

# Python
python -m http.server 8000 -d examples/sql-explorer

# Bun
bunx serve examples/sql-explorer

Or cd into the directory and serve from there:

cd examples/sql-explorer

# Python
python -m http.server 8000

# Bun
bunx serve .

Then open http://localhost:8000 in your browser.

Note: The page must be served over HTTP(S); opening index.html directly as a file:// URL won't work because browsers block WASM loading from the local filesystem.

WASM Files

The readstat_wasm.js and readstat_wasm.wasm files are copies from examples/web-demo/. If you rebuild the WASM module, copy the updated files here as well.

To rebuild from source (requires Emscripten):

cd crates/readstat-wasm
./build.sh
cp pkg/readstat_wasm.js pkg/readstat_wasm.wasm ../../examples/sql-explorer/

CDN Dependencies

All loaded automatically from CDNs; no npm install required:

Library        Version  CDN       Purpose
AlaSQL         4.x      jsdelivr  Client-side SQL engine
CodeMirror 6   6.x      esm.sh    SQL editor with syntax highlighting
Tabulator      6.x      unpkg     Interactive sortable/filterable result tables

Example Queries

Once a file is loaded, the data is available as a table named data. Some queries to try:

-- Preview all rows
SELECT * FROM data LIMIT 100

-- Count rows
SELECT COUNT(*) AS total_rows FROM data

-- Filter rows
SELECT * FROM data WHERE column_name = 'value'

-- Aggregate
SELECT column_name, COUNT(*) AS n FROM data GROUP BY column_name ORDER BY n DESC

-- Select specific columns
SELECT col1, col2, col3 FROM data LIMIT 50

Column names with spaces or special characters should be wrapped in square brackets: [Column Name].

For the full list of supported SQL syntax, see the AlaSQL SQL Reference.

API Documentation (Rustdocs)

Auto-generated API documentation for each crate is available below:

Note: These docs are generated by cargo doc and deployed alongside this book by CI.