readstat Cheatsheet v0.22.0

Convert SAS .sas7bdat files to CSV, Feather, NDJSON, or Parquet

Getting Started

1. Install
   cargo install readstat-cli

2. Inspect
   readstat metadata file.sas7bdat

3. Preview
   readstat preview file.sas7bdat

4. Convert
   readstat data file.sas7bdat -o out.parquet -f parquet

Metadata

readstat metadata FILE    Display metadata
--as-json                 Output as JSON
--skip-row-count          Skip row count (faster)
--no-progress             Hide progress bar
Includes: row/var counts, table name & label, encoding, format version, bitness, compression, byte order, variable names/types/labels/formats, Arrow types.

Preview

readstat preview FILE      First 10 rows as CSV
--rows 100                 First N rows
--columns A,B,C            Select columns
--columns-file cols.txt    Columns from file
--no-progress              Hide progress bar
Output: Always CSV to stdout. Pipe it through head or through column -t -s, (to align the columns), or redirect it to a file.

Data Conversion

readstat data FILE -o OUT    Convert (default: CSV)
-f csv                       CSV (default)
-f feather                   Feather / Arrow IPC
-f ndjson                    Newline-delimited JSON
-f parquet                   Apache Parquet
--rows 1000                  Limit rows
--overwrite                  Overwrite existing output
--columns A,B,C              Select columns
--columns-file cols.txt      Columns from file

Parquet Compression

--compression snappy          Fast, moderate ratio
--compression zstd            Best balance
--compression gzip            Wide compatibility
--compression brotli          High ratio
--compression lz4-raw         Fastest decompression
--compression uncompressed    No compression
--compression-level N         Level (codec-specific)
Example: readstat data f.sas7bdat -o f.parquet -f parquet --compression zstd --compression-level 3

Parallelism

--parallel                      Read chunks in parallel
--parallel-write                Write in parallel (Parquet)
--parallel-write-buffer-mb N    Buffer before spill (default 100)
Note: --parallel increases memory use because all chunks are held in memory. --parallel-write currently supports Parquet only. Row order is preserved.

Reader Modes

--reader stream        Default; chunked reads
--reader mem           Read all into memory
--stream-rows 10000    Chunk size (default 10k)
stream keeps memory low for large files. mem is useful for benchmarking. Lower --stream-rows for wide/string-heavy datasets.

Column Selection

Discover columns
readstat metadata FILE --as-json | jq '.vars | to_entries[] | .value.var_name'
Select inline
--columns Brand,Model,EngineSize
Select from file
--columns-file columns.txt    # comments, one column name per line
Works with: both preview and data subcommands.
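
A minimal sketch of the file-based selection above, assuming only what the cheatsheet states about the file format (one name per line, # comments). The column names, the input file cars.sas7bdat, and the temporary path are illustrative; the conversion step is skipped when readstat or the input file is not available.

```shell
# Write a column list: one name per line, with # comment lines.
# The column names are illustrative, not taken from a real dataset.
cols_file="${TMPDIR:-/tmp}/columns.txt"
cat > "$cols_file" <<'EOF'
# identifiers
Brand
Model
# measurements
EngineSize
EOF

# Sanity check: count the entries that are neither comments nor blank.
n_cols=$(grep -vc -e '^#' -e '^$' "$cols_file")
echo "columns selected: $n_cols"

# Convert only if readstat is on PATH and the (hypothetical) input exists.
if command -v readstat >/dev/null 2>&1 && [ -f cars.sas7bdat ]; then
  readstat data cars.sas7bdat -o cars.parquet -f parquet --columns-file "$cols_file"
fi
```

Keeping the list in a file makes the selection reviewable and reusable across preview and data runs.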

Metadata Preservation

label            Variable label
sas_format       SAS format string
storage_width    Storage bytes
display_width    Display width hint
table_label      File label (schema-level)
Parquet & Feather preserve SAS metadata as Arrow field metadata. Read with pyarrow.parquet.read_schema() or R's arrow::read_parquet().

Common Workflows

Quick data exploration
readstat metadata data.sas7bdat                   Schema overview
readstat metadata data.sas7bdat --as-json | jq    Programmatic metadata
readstat preview data.sas7bdat --rows 20          Eyeball sample rows
readstat preview data.sas7bdat --columns Name,Age Subset columns
Production conversion
readstat data big.sas7bdat -o big.parquet -f parquet --compression zstd
... --parallel --parallel-write    Max throughput
... --columns-file keep.txt        Only needed columns
... --rows 100000                  Partial extract
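
The production conversion above can be applied to a whole directory with a small loop. This is a sketch, not part of readstat: the convert_dir function and the DRY_RUN variable are hypothetical helpers introduced here, and only the flags listed in this cheatsheet are passed to readstat.

```shell
# convert_dir DIR: convert every .sas7bdat file in DIR to zstd-compressed
# Parquet, writing each output next to its input.
# Set DRY_RUN=1 to print the commands instead of running them.
convert_dir() {
  cd_dir="$1"
  for cd_src in "$cd_dir"/*.sas7bdat; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$cd_src" ] || continue
    cd_out="${cd_src%.sas7bdat}.parquet"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo readstat data "$cd_src" -o "$cd_out" -f parquet --compression zstd
    else
      # Stop at the first failed conversion.
      readstat data "$cd_src" -o "$cd_out" -f parquet --compression zstd || return 1
    fi
  done
}
```

For example, DRY_RUN=1 convert_dir ./sas lists the commands that would run; dropping DRY_RUN executes them. Add --overwrite to the readstat line if existing outputs should be replaced.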

More Examples

web-demo        Browser-based viewer & converter (WASM)
sql-explorer    Upload & query with SQL in-browser (WASM)
api-demo        REST API servers (Rust/Axum + Python/FastAPI)
bun-demo        Read .sas7bdat from JavaScript via WASM
cli-demo        Shell scripts for batch conversion

Debug & Help

RUST_LOG=debug readstat ...    Verbose debug output
readstat --version             Show version
readstat --help                Top-level help
readstat metadata --help       Subcommand help
Warning: RUST_LOG=debug prints info for every single value — extremely verbose with preview or data!
Input: SAS .sas7bdat  |  Output: CSV, Feather, NDJSON, Parquet
readstat-cli v0.22.0  |  Rust + ReadStat C library  |  Arrow v58  |  github.com/curtisalexander/readstat-rs