readstat
Rust library for parsing SAS binary files (.sas7bdat) into Apache Arrow RecordBatch format. Parsing is delegated to the ReadStat C library through FFI bindings.
Note: The ReadStat C library supports SAS, SPSS, and Stata file formats. The readstat-sys crate exposes the full ReadStat API (all 125 functions across all formats). However, this crate currently implements parsing and conversion only for SAS .sas7bdat files; SPSS and Stata formats are not supported.
Features
Output format writers are feature-gated and enabled by default. SQL query support is a separate optional feature that is not enabled by default:
- csv: CSV output via arrow-csv
- parquet: Parquet output (Snappy, Zstd, Brotli, Gzip, Lz4 compression)
- feather: Arrow IPC / Feather format
- ndjson: Newline-delimited JSON
- sql: DataFusion SQL query support (optional, not enabled by default)
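As a rough illustration of what feature gating means in practice, the sketch below checks at compile time which writer features were enabled. It is not this crate's code: `OutputFormat`, `enabled_formats`, and `check_format` are hypothetical names, and only the feature names (csv, parquet, feather, ndjson) come from the list above. The sql feature is left out because it adds query support rather than an output format.

```rust
/// Output formats named in the feature list (hypothetical enum, not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Csv,
    Parquet,
    Feather,
    Ndjson,
}

/// Returns the formats whose Cargo feature was enabled when this code was compiled.
/// `cfg!` evaluates to a compile-time boolean, so disabled formats cost nothing at runtime.
pub fn enabled_formats() -> Vec<OutputFormat> {
    let mut formats = Vec::new();
    if cfg!(feature = "csv") {
        formats.push(OutputFormat::Csv);
    }
    if cfg!(feature = "parquet") {
        formats.push(OutputFormat::Parquet);
    }
    if cfg!(feature = "feather") {
        formats.push(OutputFormat::Feather);
    }
    if cfg!(feature = "ndjson") {
        formats.push(OutputFormat::Ndjson);
    }
    formats
}

/// Rejects a requested format whose writer was not compiled in.
pub fn check_format(format: OutputFormat) -> Result<(), String> {
    if enabled_formats().contains(&format) {
        Ok(())
    } else {
        Err(format!("support for {format:?} was not compiled in"))
    }
}
```

A build that disables default features and selects only some writers would see `check_format` fail for the rest, which is one way a caller could report a clear error instead of a missing symbol.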
Key Types
- ReadStatData: Coordinates FFI parsing, accumulates values directly into typed Arrow builders
- ReadStatMetadata: File-level metadata (row/var counts, encoding, compression, schema)
- ReadStatWriter: Writes Arrow batches to the requested output format
- ReadStatPath: Validated input file path
- WriteConfig: Output configuration (path, format, compression)
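The idea of accumulating values directly into typed builders can be sketched without any dependencies. The example below is an assumption-laden stand-in, not the crate's implementation: `Value`, `ColumnBuilder`, and `BatchBuilder` are hypothetical names, and plain `Vec<Option<T>>` columns take the place of real Arrow array builders. It shows a per-value callback appending each cell to the column for its variable, with no intermediate row structure.

```rust
/// A cell value as a parser callback might deliver it (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    Text(String),
    Missing,
}

/// One typed, append-only column; a stand-in for an Arrow array builder.
#[derive(Debug, PartialEq)]
pub enum ColumnBuilder {
    Float64(Vec<Option<f64>>),
    Utf8(Vec<Option<String>>),
}

impl ColumnBuilder {
    /// Appends a value, rejecting one whose type does not match the column.
    fn append(&mut self, value: Value) -> Result<(), String> {
        match (self, value) {
            (ColumnBuilder::Float64(col), Value::Number(n)) => col.push(Some(n)),
            (ColumnBuilder::Float64(col), Value::Missing) => col.push(None),
            (ColumnBuilder::Utf8(col), Value::Text(s)) => col.push(Some(s)),
            (ColumnBuilder::Utf8(col), Value::Missing) => col.push(None),
            (builder, value) => {
                return Err(format!("type mismatch: {value:?} for {builder:?}"));
            }
        }
        Ok(())
    }
}

/// Holds one builder per variable and fills them as values arrive.
pub struct BatchBuilder {
    columns: Vec<ColumnBuilder>,
}

impl BatchBuilder {
    pub fn new(columns: Vec<ColumnBuilder>) -> Self {
        Self { columns }
    }

    /// Called once per cell, mirroring a per-value parser callback.
    pub fn on_value(&mut self, var_index: usize, value: Value) -> Result<(), String> {
        let column = self
            .columns
            .get_mut(var_index)
            .ok_or_else(|| format!("no column at index {var_index}"))?;
        column.append(value)
    }

    /// Hands back the finished columns, analogous to emitting a batch.
    pub fn finish(self) -> Vec<ColumnBuilder> {
        self.columns
    }
}
```

Writing each value into its column as it arrives avoids building a row-oriented copy of the data first, which is the main appeal of this design for columnar output.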
For the full architecture overview, see docs/ARCHITECTURE.md.