API Server Demo

Two functionally identical API servers demonstrating how to integrate readstat into backend applications:

  • Rust server (Axum) — direct library integration
  • Python server (FastAPI) — cross-language integration via PyO3/maturin bindings

Both servers expose the same endpoints and return identical results for the same input.

Prerequisites

Rust server:

  • Rust toolchain
  • libclang (for readstat-sys bindgen)
  • Git submodules initialized: git submodule update --init --recursive

Python server (in addition to the above):

  • uv (Python package manager)
  • Python 3.9+

Quick Start

Rust Server (port 3000)

cd examples/api-demo/rust-server
cargo run

You should see:

Rust API server listening on http://localhost:3000

Python Server (port 3001)

cd examples/api-demo/python-server

# Build the PyO3 bindings into the project venv
uv sync
uv run maturin develop -m readstat_py/Cargo.toml

# Start the server
uv run uvicorn server:app --port 3001

You should see:

INFO:     Started server process [...]
INFO:     Uvicorn running on http://127.0.0.1:3001 (Press CTRL+C to quit)

Walking Through the Endpoints

The examples below use port 3000 (Rust server). Replace with 3001 for the Python server — the responses are identical.

Set a convenience variable for the test file:

FILE=test-data/cars.sas7bdat

1. Health Check

curl http://localhost:3000/health

Expected output:

{"status":"ok"}

2. File Metadata

Upload a SAS file and get back its metadata as JSON:

curl -F "file=@$FILE" http://localhost:3000/metadata

Expected output (formatted):

{
  "row_count": 1081,
  "var_count": 13,
  "table_name": "CARS",
  "file_label": "Written by SAS",
  "file_encoding": "WINDOWS-1252",
  "version": 9,
  "is64bit": 0,
  "creation_time": "2008-09-30 12:55:01",
  "modified_time": "2008-09-30 12:55:01",
  "compression": "None",
  "endianness": "Little",
  "vars": {
    "0": {
      "var_name": "Brand",
      "var_type": "String",
      "var_type_class": "String",
      "var_label": "",
      "var_format": "",
      "var_format_class": null,
      "storage_width": 13,
      "display_width": 0
    },
    "1": {
      "var_name": "Model",
      "var_type": "String",
      "var_type_class": "String",
      ...
    },
    ...
  }
}

The vars map is keyed by column index and includes type info, labels, and SAS format metadata for all 13 variables.
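A client can walk this structure directly. A minimal sketch, using a trimmed copy of the JSON shown above (only a few of the fields the server returns):

```python
import json

# Trimmed /metadata payload, copied from the response above.
metadata_json = """
{
  "row_count": 1081,
  "var_count": 13,
  "table_name": "CARS",
  "vars": {
    "0": {"var_name": "Brand", "var_type": "String"},
    "1": {"var_name": "Model", "var_type": "String"}
  }
}
"""

meta = json.loads(metadata_json)
print(f"{meta['table_name']}: {meta['row_count']} rows x {meta['var_count']} cols")

# vars is keyed by column index as a *string*; sort numerically to
# recover the original column order.
for idx in sorted(meta["vars"], key=int):
    var = meta["vars"][idx]
    print(f"  [{idx}] {var['var_name']} ({var['var_type']})")
```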

3. Preview Rows

Get the first N rows as CSV (default 10, here we ask for 5):

curl -F "file=@$FILE" "http://localhost:3000/preview?rows=5"

Expected output:

Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,48.0,47.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,47.0,48.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,0.0,1.3,4.0,46.0,51.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,0.0,1.3,4.0,45.0,51.0,0.0,0.0,1.0
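The preview body is plain CSV, so it parses with Python's standard csv module. A small sketch using two of the rows shown above:

```python
import csv
import io

# Two preview rows copied from the /preview response above.
preview = """Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,48.0,47.0,0.0,0.0,1.0
"""

reader = csv.DictReader(io.StringIO(preview))
rows = list(reader)
print(f"{len(rows)} rows, {len(reader.fieldnames)} columns")
for row in rows:
    # CSV values arrive as strings; convert as needed.
    print(row["Brand"], row["Model"], "city mpg:", float(row["CityMPG"]))
```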

4. Convert to CSV

Export the full dataset (all 1,081 rows) as CSV:

curl -F "file=@$FILE" "http://localhost:3000/data?format=csv" -o output.csv

The response has Content-Type: text/csv and Content-Disposition: attachment; filename="data.csv".

5. Convert to NDJSON

Export as newline-delimited JSON (one JSON object per row):

curl -F "file=@$FILE" "http://localhost:3000/data?format=ndjson"

Expected output (first few lines):

{"Brand":"TOYOTA","Model":"Prius","Minivan":0.0,"Wagon":0.0,"Pickup":0.0,"Automatic":1.0,"EngineSize":1.5,"Cylinders":4.0,"CityMPG":60.0,"HwyMPG":51.0,"SUV":0.0,"AWD":0.0,"Hybrid":1.0}
{"Brand":"HONDA","Model":"Civic Hybrid","Minivan":0.0,"Wagon":0.0,"Pickup":0.0,"Automatic":1.0,"EngineSize":1.3,"Cylinders":4.0,"CityMPG":48.0,"HwyMPG":47.0,"SUV":0.0,"AWD":0.0,"Hybrid":1.0}
{"Brand":"HONDA","Model":"Civic Hybrid","Minivan":0.0,"Wagon":0.0,"Pickup":0.0,"Automatic":1.0,"EngineSize":1.3,"Cylinders":4.0,"CityMPG":47.0,"HwyMPG":48.0,"SUV":0.0,"AWD":0.0,"Hybrid":1.0}
...

The response has Content-Type: application/x-ndjson.
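NDJSON streams well: every line is a complete JSON document, so a client can process rows one at a time without buffering the whole response. A sketch using the first two rows above, trimmed to a few fields:

```python
import json

# Two NDJSON lines copied (and trimmed) from the /data?format=ndjson
# response above.
ndjson = (
    '{"Brand":"TOYOTA","Model":"Prius","CityMPG":60.0,"HwyMPG":51.0}\n'
    '{"Brand":"HONDA","Model":"Civic Hybrid","CityMPG":48.0,"HwyMPG":47.0}\n'
)

# Parse line by line; skip any trailing blank line.
rows = [json.loads(line) for line in ndjson.splitlines() if line]
avg_city = sum(r["CityMPG"] for r in rows) / len(rows)
print(f"parsed {len(rows)} rows, mean CityMPG = {avg_city}")
```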

6. Convert to Parquet

Export as Apache Parquet (binary, Snappy-compressed):

curl -F "file=@$FILE" "http://localhost:3000/data?format=parquet" -o output.parquet

This produces a ~15 KB Parquet file. You can inspect it with tools like parquet-tools, DuckDB, or pandas:

import pandas as pd
print(pd.read_parquet("output.parquet").head())

7. Convert to Feather

Export as Arrow IPC (Feather v2) format:

curl -F "file=@$FILE" "http://localhost:3000/data?format=feather" -o output.feather

This produces a ~130 KB Feather file. Read it back with any Arrow-compatible tool:

import pandas as pd
print(pd.read_feather("output.feather").head())

Automated Test Scripts

Both scripts work against either server — just change the URL.

Shell script (curl)

cd examples/api-demo
bash client/test_api.sh http://localhost:3000 test-data/cars.sas7bdat
bash client/test_api.sh http://localhost:3001 test-data/cars.sas7bdat

Python script (httpx)

Uses PEP 723 inline script metadata, so uv run handles dependencies automatically — no virtual environment setup needed:

cd examples/api-demo/client
uv run test_api.py http://localhost:3000 ../test-data/cars.sas7bdat
uv run test_api.py http://localhost:3001 ../test-data/cars.sas7bdat

Expected output:

=== Testing http://localhost:3000 with ../test-data/cars.sas7bdat ===

--- GET /health ---
{'status': 'ok'}

--- POST /metadata ---
  row_count: 1081
  var_count: 13
  table_name: CARS
  encoding: WINDOWS-1252
  variables: 13

--- POST /preview (5 rows) ---
  Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
  TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
  ...

--- POST /data?format=csv ---
  Brand,Model,Minivan,Wagon,Pickup,Automatic,EngineSize,Cylinders,CityMPG,HwyMPG,SUV,AWD,Hybrid
  TOYOTA,Prius,0.0,0.0,0.0,1.0,1.5,4.0,60.0,51.0,0.0,0.0,1.0
  HONDA,Civic Hybrid,0.0,0.0,0.0,1.0,1.3,4.0,48.0,47.0,0.0,0.0,1.0

--- POST /data?format=ndjson ---
  {"Brand":"TOYOTA","Model":"Prius","Minivan":0.0,...}
  ...

--- POST /data?format=parquet ---
  15403 bytes

--- POST /data?format=feather ---
  129650 bytes

=== All tests passed ===

API Reference

Method  Path                  Request         Response                             Content-Type
GET     /health               —               {"status": "ok"}                     application/json
POST    /metadata             multipart file  JSON metadata                        application/json
POST    /preview?rows=N       multipart file  CSV text (first N rows, default 10)  text/csv
POST    /data?format=csv      multipart file  Full dataset as CSV                  text/csv
POST    /data?format=ndjson   multipart file  Full dataset as NDJSON               application/x-ndjson
POST    /data?format=parquet  multipart file  Full dataset as Parquet              application/octet-stream
POST    /data?format=feather  multipart file  Full dataset as Feather              application/octet-stream

The multipart field name must be file. Binary formats include a Content-Disposition header with a suggested filename.
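For clients that cannot shell out to curl, the upload shape is a single multipart/form-data part whose field name is file. A sketch with a hypothetical stdlib-only helper (not part of the demo code) that builds such a body by hand:

```python
import uuid

def build_multipart(field_name: str, filename: str, payload: bytes):
    """Build a multipart/form-data body with one file part.

    Hypothetical helper illustrating the upload shape the servers
    expect: the field name must be "file".
    """
    boundary = uuid.uuid4().hex
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    footer = f"\r\n--{boundary}--\r\n".encode()
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, header + payload + footer

ct, data = build_multipart("file", "cars.sas7bdat", b"\x00\x01\x02")
print(ct)
```

With a real file payload, the returned content type and body can be sent via urllib.request.Request against a running server; in practice curl -F (or httpx, as the test script uses) handles this for you.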

How It Works

Rust Server

HTTP upload → Axum multipart extraction → Vec<u8>
  → spawn_blocking {
      ReadStatMetadata::read_metadata_from_bytes()
      ReadStatData::read_data_from_bytes() → Arrow RecordBatch
      write_batch_to_{csv,ndjson,parquet,feather}_bytes()
    }
  → HTTP response

All ReadStat C library FFI calls run inside spawn_blocking to avoid blocking the tokio async runtime.

Python Server

HTTP upload → FastAPI UploadFile → bytes
  → readstat_py.read_to_{csv,ndjson,parquet,feather}(bytes)
    → [PyO3 boundary]
      → ReadStatMetadata::read_metadata_from_bytes()
      → ReadStatData::read_data_from_bytes() → Arrow RecordBatch
      → write_batch_to_*_bytes()
    → [back to Python]
  → HTTP response

The PyO3 binding layer is intentionally thin — 5 functions that take &[u8] and return Vec<u8> (or String for metadata). No complex types cross the FFI boundary.