Technical Details

Floating Point Values

⚠️ Decimal values are rounded to 14 decimal places!

For example, the number 1.1234567890123456 created within SAS would be returned as 1.12345678901235 within Rust.

Why does this happen? Is this an implementation error? No: rounding to 14 decimal places is implemented deliberately in the Rust code.

As a specific example, when testing with the cars.sas7bdat dataset (originally created on Windows), the numeric value 4.6 as observed within SAS was returned as 4.600000000000001 (15 decimal digits) within Rust. Values created on Windows with an x64 processor are only accurate to 15 significant digits.

For comparison, the ReadStat binary truncates to 14 decimal places when writing to CSV.

Finally, SAS stores all numeric values as floating-point numbers, which creates a challenge for every parsed numeric value!

Implementation: pure-arithmetic rounding

Rounding is performed using pure f64 arithmetic in cb.rs, avoiding any string formatting or heap allocation:

const ROUND_SCALE: f64 = 1e14;

fn round_decimal_f64(v: f64) -> f64 {
    // NaN and infinities pass through unchanged.
    if !v.is_finite() { return v; }
    // Split first so that only the fractional part is scaled.
    let int_part = v.trunc();
    let frac_part = v.fract();
    // Round the fraction to 14 decimal places, then recombine.
    let rounded_frac = (frac_part * ROUND_SCALE).round() / ROUND_SCALE;
    int_part + rounded_frac
}

The value is split into integer and fractional parts before scaling. This is necessary because large SAS datetime values (~1.9e9) multiplied directly by 1e14 would exceed the range in which f64 represents every integer exactly (up to 2^53, about 9.0e15), causing precision loss. Since fract() always lies in (-1, 1), |fract() * 1e14| < 1e14 < 2^53, so the scaled value stays within the exact-integer range.
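To make the range argument concrete, the following sketch restates the rounding function next to a hypothetical helper, `naive_scaling_overflows`, which is not part of the crate and only checks whether scaling the whole value would leave the exact-integer range. The sample values and the `demo` driver are made up for illustration.

```rust
const ROUND_SCALE: f64 = 1e14;
// 2^53: above this magnitude, f64 can no longer represent every integer.
const EXACT_INT_LIMIT: f64 = 9_007_199_254_740_992.0;

/// Split-then-scale rounding, restated from the section above.
fn round_decimal_f64(v: f64) -> f64 {
    if !v.is_finite() {
        return v;
    }
    let rounded_frac = (v.fract() * ROUND_SCALE).round() / ROUND_SCALE;
    v.trunc() + rounded_frac
}

/// Hypothetical helper: would scaling `v` as a whole leave the exact-integer range?
fn naive_scaling_overflows(v: f64) -> bool {
    (v * ROUND_SCALE).abs() > EXACT_INT_LIMIT
}

/// Illustrative driver; the sample values are assumptions for this sketch.
fn demo() {
    for v in [4.600000000000001_f64, 1.1234567890123456, 1_900_000_000.5] {
        println!(
            "{:?} -> {:?} (naive scaling leaves exact range: {})",
            v,
            round_decimal_f64(v),
            naive_scaling_overflows(v)
        );
    }
}
```

Calling `demo()` prints each input, its rounded value, and whether the unsplit multiplication would have exceeded 2^53. Only the datetime-sized value does so here, which is the case the split is meant to guard against.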

How this compares to the previous string roundtrip (format!("{:.14}") + lexical::parse): both approaches aim for the nearest representable f64 to the value rounded to 14 decimal places, and they differ only in tie-breaking (half-away-from-zero for .round() vs half-to-even for format!). Ties are rare. Every f64 is a dyadic rational (m / 2^k with m odd), whose decimal expansion has exactly k fractional digits, so a value sits exactly halfway between two 14-place decimals only when k = 15, that is, when it is an odd multiple of 2^-15 (for example 2^-15 = 0.000030517578125). For every other input the tie-breaking rule is never exercised.
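One way to probe the comparison empirically is to run both strategies side by side. The sketch below is an assumption-laden stand-in, not the crate's code: `round_string` uses the standard library's `str::parse` in place of `lexical::parse`, and the names `round_arith`, `agree`, and `report` are hypothetical. The value 2^-15 is included because it has exactly 15 fractional decimal digits, which makes it a natural probe for tie-breaking behaviour.

```rust
const ROUND_SCALE: f64 = 1e14;

/// Pure-arithmetic rounding, restated from the section above.
fn round_arith(v: f64) -> f64 {
    if !v.is_finite() {
        return v;
    }
    v.trunc() + (v.fract() * ROUND_SCALE).round() / ROUND_SCALE
}

/// String roundtrip using only std (`str::parse` stands in for `lexical::parse`).
fn round_string(v: f64) -> f64 {
    if !v.is_finite() {
        return v;
    }
    // Fall back to the input if parsing fails (not expected for finite values).
    format!("{:.14}", v).parse::<f64>().unwrap_or(v)
}

/// True when both strategies return bit-identical results for `v`.
fn agree(v: f64) -> bool {
    round_arith(v).to_bits() == round_string(v).to_bits()
}

/// Illustrative driver; prints both results without assuming they match.
fn report() {
    let tie_probe = 1.0_f64 / 32_768.0; // 2^-15
    for v in [4.600000000000001_f64, 0.1 + 0.2, tie_probe] {
        println!(
            "{:?}: arithmetic = {:?}, string = {:?}, agree = {}",
            v,
            round_arith(v),
            round_string(v),
            agree(v)
        );
    }
}
```

Running `report()` over a larger sample of real column values would be a reasonable regression check before swapping one strategy for the other.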

Date, Time, and Datetimes

All 118 SAS date, time, and datetime formats are recognized and parsed appropriately. For the full list of supported formats, see sas_date_time_formats.md.

⚠️ If the format does not match a recognized SAS date, time, or datetime format, or if the value does not have a format applied, then the value will be parsed and read as a numeric value!

Details

SAS stores dates, times, and datetimes internally as numeric values. To distinguish among dates, times, datetimes, and plain numeric values, the SAS format is read from the variable metadata. If the format matches a recognized SAS date, time, or datetime format, the numeric value is converted and read into memory as the corresponding Arrow date, time, or datetime type.

Values read into memory as Arrow date, time, or datetime types keep those types when an Arrow RecordBatch is written to csv, feather, ndjson, or parquet: they are written as dates, times, or datetimes and not as numeric values.
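The format-driven dispatch described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Kind` enum, the `classify` function, and the handful of format names it knows are hypothetical stand-ins for the full list in sas_date_time_formats.md. The conversion helpers rely on general background rather than anything stated in this document: SAS counts dates in days and datetimes in seconds from 1960-01-01, while Arrow's date and timestamp types count from the Unix epoch, 1970-01-01, which is 3,653 days later.

```rust
/// Days between the SAS epoch (1960-01-01) and the Unix epoch (1970-01-01).
const SAS_TO_UNIX_DAYS: i64 = 3_653;
const SECONDS_PER_DAY: i64 = 86_400;

#[derive(Debug, PartialEq)]
enum Kind {
    Date,
    Time,
    DateTime,
    Numeric,
}

/// Hypothetical classifier covering a deliberately tiny subset of formats.
fn classify(format: &str) -> Kind {
    // Drop the trailing width/precision, e.g. "DATETIME22.3" -> "DATETIME".
    let name = format.trim_end_matches(|c: char| c.is_ascii_digit() || c == '.');
    match name.to_ascii_uppercase().as_str() {
        "DATE" | "MMDDYY" | "YYMMDD" => Kind::Date,
        "TIME" => Kind::Time,
        "DATETIME" => Kind::DateTime,
        // Unrecognized or missing formats fall back to a plain numeric value.
        _ => Kind::Numeric,
    }
}

/// SAS date (days since 1960-01-01) -> days since the Unix epoch.
fn sas_date_to_unix_days(sas_days: f64) -> i64 {
    sas_days as i64 - SAS_TO_UNIX_DAYS
}

/// SAS datetime (seconds since 1960-01-01) -> whole seconds since the Unix epoch.
fn sas_datetime_to_unix_seconds(sas_seconds: f64) -> i64 {
    sas_seconds as i64 - SAS_TO_UNIX_DAYS * SECONDS_PER_DAY
}
```

For instance, `classify("BEST12")` falls through to `Kind::Numeric`, mirroring the warning that values with unrecognized or missing formats are read as numbers.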

Column Metadata in Arrow and Parquet

When converting to Parquet or Feather, readstat-rs persists column-level and table-level metadata into the Arrow schema. This metadata survives round-trips through Parquet and Feather files, allowing downstream consumers to recover SAS-specific information.

Metadata keys

Field (column) metadata

| Key | Type | Description | Source formats |
|---|---|---|---|
| label | string | User-assigned variable label | SAS, SPSS, Stata |
| sas_format | string | SAS format string (e.g. DATE9, BEST12, $30) | SAS |
| storage_width | integer (as string) | Number of bytes used to store the variable value | All |
| display_width | integer (as string) | Display width hint from the file | XPORT, SPSS |

Schema (table) metadata

| Key | Type | Description |
|---|---|---|
| table_label | string | User-assigned file label |

Storage width semantics

  • SAS numeric variables: always 8 bytes (IEEE 754 double-precision)
  • SAS string variables: equal to the declared character length (e.g. $30 → 30 bytes)
  • The storage_width field is always present in metadata

Display width semantics

  • sas7bdat files: typically 0 (not stored in the format)
  • XPORT files: populated from the format width
  • SPSS files: populated from the variable’s print/write format
  • The display_width field is only present in metadata when non-zero
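The presence rules above can be illustrated with a small sketch that builds the string-to-string map attached to a field. It uses only the standard library. The `VarInfo` struct and the `field_metadata` function are hypothetical names introduced for this example, and omitting `label` and `sas_format` when they are absent is an assumption rather than documented behaviour.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-variable information read from a file.
struct VarInfo {
    label: Option<String>,
    sas_format: Option<String>,
    storage_width: usize,
    display_width: usize,
}

/// Build the key/value metadata for one field, following the rules above.
fn field_metadata(var: &VarInfo) -> HashMap<String, String> {
    let mut md = HashMap::new();
    // Assumption: optional keys are simply left out when there is no value.
    if let Some(label) = &var.label {
        md.insert("label".to_string(), label.clone());
    }
    if let Some(fmt) = &var.sas_format {
        md.insert("sas_format".to_string(), fmt.clone());
    }
    // storage_width is always present; integers are stored as strings.
    md.insert("storage_width".to_string(), var.storage_width.to_string());
    // display_width is only written when non-zero.
    if var.display_width != 0 {
        md.insert("display_width".to_string(), var.display_width.to_string());
    }
    md
}
```

A numeric sas7bdat column would therefore typically carry `storage_width = "8"` and no `display_width` key at all, so consumers should treat the latter as optional.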

SAS format strings and Arrow types

The SAS format string (e.g. DATE9, DATETIME22.3, TIME8) determines how a numeric variable is mapped to an Arrow type. The original format string is preserved in the sas_format metadata key, allowing downstream tools to reconstruct the original SAS formatting even after conversion.

For the full list of recognized SAS date, time, and datetime formats, see sas_date_time_formats.md.

Reading metadata from output files

See the Reading Metadata from Output Files section in the Usage guide for Python and R examples.