readstat-rs

Read, inspect, and convert SAS binary (.sas7bdat) files from Rust code, the command line, or the browser. Converts to CSV, Parquet, Feather, and NDJSON using Apache Arrow.

The original use case was a command-line tool for converting SAS files, but the project has since expanded into a workspace of crates that can be used as a Rust library, a CLI, or compiled to WebAssembly for browser and JavaScript runtimes.

🔑 Dependencies

The command-line tool is developed in Rust and is only possible due to the following excellent projects:

The ReadStat library is used to parse and read sas7bdat files, and the arrow crate is used to convert the read sas7bdat data into the Arrow memory format. Once in the Arrow memory format, the data can be written to other file formats.

💡 Note: The ReadStat C library supports SAS, SPSS, and Stata file formats. The readstat-sys crate exposes the full ReadStat API: all 125 functions across all formats. However, the higher-level crates (readstat, readstat-cli, readstat-wasm, readstat-tests) currently only implement support for SAS .sas7bdat files.

🚀 CLI Quickstart

Convert the first 50,000 rows of example.sas7bdat (by performing the read in parallel) to the file example.parquet, overwriting the file if it already exists.

readstat data /some/dir/to/example.sas7bdat --output /some/dir/to/example.parquet --format parquet --rows 50000 --overwrite --parallel

📦 CLI Install

Download a Release

[Mostly] static binaries for Linux, macOS, and Windows may be found at the Releases page.

Setup

Move the readstat binary to a known directory and add the binary to the user's PATH.

Linux & macOS

Ensure the path to readstat is added to the appropriate shell configuration file.
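As a minimal sketch of that step, the snippet below appends an export line to a shell configuration file and applies it to the current session. The install directory `$HOME/.local/bin` and the choice of `~/.bashrc` are assumptions for illustration (zsh users would typically point `SHELL_RC` at `~/.zshrc`); the variable names are hypothetical.

```shell
# Hypothetical install directory; substitute wherever the readstat binary was placed.
READSTAT_DIR="$HOME/.local/bin"

# Shell configuration file to update (assumed: ~/.bashrc; use ~/.zshrc for zsh).
SHELL_RC="${SHELL_RC:-$HOME/.bashrc}"

# The line to persist. \$PATH stays literal so it is expanded at shell startup.
LINE="export PATH=\"$READSTAT_DIR:\$PATH\""

# Append the line only if it is not already present, so re-running is harmless.
grep -qxF "$LINE" "$SHELL_RC" 2>/dev/null || printf '%s\n' "$LINE" >> "$SHELL_RC"

# Apply the change to the current session as well.
export PATH="$READSTAT_DIR:$PATH"
```

New terminal sessions pick up the change automatically; the final `export` covers the session that is already open.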

Windows

For Windows users, path configuration may be found within the Environment Variables menu. Executing the following from the command line opens the Environment Variables menu for the current user.

rundll32.exe sysdm.cpl,EditEnvironmentVariables

Alternatively, update the user-level PATH in PowerShell (replace C:\path\to\readstat with the actual directory):

$currentPath = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", "$currentPath;C:\path\to\readstat", "User")

After running the above, restart your terminal for the change to take effect.

Run

Run the binary.

readstat --help

⚙️ CLI Usage

The binary is invoked using subcommands:

  • metadata → writes file and variable metadata to standard out or JSON
  • preview → writes the first N rows of parsed data as csv to standard out
  • data → writes parsed data in csv, feather, ndjson, or parquet format to a file
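A rough sketch of invoking each subcommand follows. The input and output paths are hypothetical, and it is an assumption that `metadata` and `preview` take the input path as their first argument the same way `data` does in the Quickstart; only flags shown in the Quickstart are used. The `command -v` guard simply skips the calls when readstat is not on the PATH.

```shell
# Hypothetical input and output paths; replace with real ones.
INPUT="/some/dir/to/example.sas7bdat"
OUTPUT="/some/dir/to/example.csv"

if command -v readstat >/dev/null 2>&1; then
  # File and variable metadata, written to standard out
  readstat metadata "$INPUT"

  # First rows of parsed data, written as CSV to standard out
  readstat preview "$INPUT"

  # Parsed data written to a file, overwriting any existing output
  readstat data "$INPUT" --output "$OUTPUT" --format csv --overwrite
else
  echo "readstat not found on PATH" >&2
fi
```

Running `readstat --help` lists the options each subcommand actually accepts.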

Column metadata (labels, SAS format strings, and storage widths) is preserved in Parquet and Feather output as Arrow field metadata. See docs/TECHNICAL.md for details.

For the full CLI reference, including column selection, parallelism, memory considerations, SQL queries, reader modes, and debug options, see docs/USAGE.md.

For library, API server, and WebAssembly usage, see Examples below.

🛠️ Build from Source

Clone the repository (with submodules), install platform-specific developer tools, and run cargo build. Platform-specific instructions for Linux, macOS, and Windows are in docs/BUILDING.md.
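The clone and build steps might look like the sketch below. The repository URL is an assumption (it is not stated in this section), and the platform-specific developer tools described in docs/BUILDING.md still need to be installed first. The commands are wrapped in a hypothetical helper function, `build_readstat`, so nothing runs until it is called.

```shell
# Assumed repository URL; not given in the text above, adjust if needed.
REPO_URL="https://github.com/curtisalexander/readstat-rs.git"

# Hypothetical helper. The body runs in a subshell, so the caller's
# working directory is left unchanged.
build_readstat() (
  # --recurse-submodules fetches the submodules along with the repository
  git clone --recurse-submodules "$REPO_URL" readstat-rs &&
    cd readstat-rs &&
    # Build the workspace with Cargo's default (debug) profile
    cargo build
)
```

Call `build_readstat` from the directory where the checkout should live.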

💻 Platform Support

| Platform | Status | C library | Notes |
| --- | --- | --- | --- |
| Linux (glibc) | ✅ Builds and runs | System iconv, system zlib | None |
| Linux (musl) | ✅ Builds and runs | System iconv, system zlib | None |
| macOS | ✅ Builds and runs | System libiconv, system zlib | None |
| Windows (MSVC) | ✅ Builds and runs | Vendored iconv, vendored zlib | Requires libclang for bindgen. MSVC supported since ReadStat 1.1.5 (no msys2 needed). |

📚 Documentation

| Document | Description |
| --- | --- |
| docs/ARCHITECTURE.md | Crate layout, key types, and architectural patterns |
| docs/USAGE.md | Full CLI reference and examples |
| docs/BUILDING.md | Clone, build, and linking details per platform |
| docs/TECHNICAL.md | Floating-point precision and date/time handling |
| docs/TESTING.md | Running tests, dataset table, valgrind |
| docs/BENCHMARKING.md | Criterion benchmarks, hyperfine, and profiling |
| docs/CI-CD.md | GitHub Actions triggers and artifacts |
| docs/MEMORY_SAFETY.md | Automated memory-safety CI checks (Valgrind, ASan, Miri, unsafe audit) |
| docs/RELEASING.md | Step-by-step guide for publishing crates to crates.io |

🧩 Workspace Crates

| Crate | Path | Description |
| --- | --- | --- |
| readstat | crates/readstat/ | Pure library for parsing SAS files into Arrow RecordBatch format. Output writers are feature-gated. |
| readstat-cli | crates/readstat-cli/ | Binary crate producing the readstat CLI tool (arg parsing, progress bars, orchestration). |
| readstat-sys | crates/readstat-sys/ | Raw FFI bindings to the full ReadStat C library (SAS, SPSS, Stata) via bindgen. |
| readstat-iconv-sys | crates/readstat-iconv-sys/ | Windows-only FFI bindings to libiconv for character encoding conversion. |
| readstat-tests | crates/readstat-tests/ | Integration test suite (29 modules, 14 datasets). |
| readstat-wasm | crates/readstat-wasm/ | WebAssembly build for browser/JS usage (excluded from workspace, built with Emscripten). |

For full architectural details, see docs/ARCHITECTURE.md.

💡 Examples

The examples/ directory contains runnable demos showing different ways to use readstat-rs.

| Example | Description |
| --- | --- |
| cli-demo | Convert a .sas7bdat file to CSV, NDJSON, Parquet, and Feather using the readstat CLI |
| api-demo | API servers in Rust (Axum) and Python (FastAPI + PyO3): upload, inspect, and convert SAS files over HTTP |
| bun-demo | Parse a .sas7bdat file from JavaScript using the WebAssembly build with Bun |
| web-demo | Browser-based viewer and converter: upload, preview, and export entirely client-side via WASM |
| sql-explorer | Browser-based SQL explorer: upload a .sas7bdat file and query it interactively with SQL via AlaSQL |

To use readstat as a library in your own Rust project, add the readstat crate as a dependency.

🔗 Resources

The following have been incredibly helpful while developing!