ReadStatData

Struct ReadStatData 

pub struct ReadStatData {
    pub var_count: i32,
    pub vars: Arc<BTreeMap<i32, ReadStatVarMetadata>>,
    pub(crate) builders: Vec<ColumnBuilder>,
    pub schema: Arc<Schema>,
    pub batch: Option<RecordBatch>,
    pub chunk_rows_to_process: usize,
    pub chunk_row_start: usize,
    pub chunk_row_end: usize,
    pub chunk_rows_processed: usize,
    pub total_rows_to_process: usize,
    pub total_rows_processed: Option<Arc<AtomicUsize>>,
    pub progress: Option<Arc<dyn ProgressCallback>>,
    pub no_progress: bool,
    pub errors: Vec<String>,
    pub column_filter: Option<Arc<BTreeMap<i32, i32>>>,
    pub total_var_count: i32,
}

Holds parsed row data from a .sas7bdat file and converts it to Arrow format.

Each instance processes one streaming chunk of rows. Values are appended directly into typed Arrow ColumnBuilders during the handle_value callback, then finished into an Arrow RecordBatch via cols_to_batch.

Fields§

§var_count: i32

Number of variables (columns) in the dataset.

§vars: Arc<BTreeMap<i32, ReadStatVarMetadata>>

Per-variable metadata, keyed by variable index. Wrapped in Arc so parallel chunks share the same metadata without deep cloning.

§builders: Vec<ColumnBuilder>

Typed Arrow builders, one per variable, pre-sized with capacity hints.

§schema: Arc<Schema>

Arrow schema for the dataset. Wrapped in Arc for cheap sharing across parallel chunks.

§batch: Option<RecordBatch>

The Arrow RecordBatch produced after parsing, if available.

§chunk_rows_to_process: usize

Number of rows to process in this chunk.

§chunk_row_start: usize

Starting row offset for this chunk.

§chunk_row_end: usize

Ending row offset (exclusive) for this chunk.
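
The half-open convention for chunk_row_start and chunk_row_end can be sketched as follows. This is a minimal illustration, not the crate's own code: chunk_bounds and its chunk_size parameter are hypothetical names introduced here.

```rust
/// Splits `total_rows` into half-open `[start, end)` ranges of at most
/// `chunk_size` rows each. `chunk_bounds` is a hypothetical helper and
/// is not part of the crate's API.
fn chunk_bounds(total_rows: usize, chunk_size: usize) -> Vec<(usize, usize)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    (0..total_rows)
        .step_by(chunk_size)
        // The end offset is exclusive and clamped to the dataset length,
        // so the last chunk may be shorter than `chunk_size`.
        .map(|start| (start, (start + chunk_size).min(total_rows)))
        .collect()
}
```

With an exclusive end offset, the number of rows in a chunk is simply end minus start, and adjacent chunks never overlap.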

§chunk_rows_processed: usize

Number of rows actually processed so far in this chunk.

§total_rows_to_process: usize

Total rows to process across all chunks.

§total_rows_processed: Option<Arc<AtomicUsize>>

Shared atomic counter of total rows processed across all chunks.
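
A shared counter of this kind might be used roughly as below. The sketch uses only the standard library; count_rows_across_chunks is a hypothetical function, and the worker threads stand in for chunk readers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Simulates several chunk workers reporting into one shared counter.
/// Hypothetical helper for illustration only.
fn count_rows_across_chunks(chunk_sizes: &[usize]) -> usize {
    let total = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = chunk_sizes
        .iter()
        .map(|&rows| {
            // Cloning the Arc only bumps a reference count.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                // Each worker adds the rows it finished. Relaxed ordering
                // is enough for a counter that guards no other data.
                total.fetch_add(rows, Ordering::Relaxed);
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    total.load(Ordering::Relaxed)
}
```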

§progress: Option<Arc<dyn ProgressCallback>>

Optional progress callback for visual feedback during parsing.

§no_progress: bool

Whether progress display is disabled.

§errors: Vec<String>

Errors collected during value parsing callbacks.

§column_filter: Option<Arc<BTreeMap<i32, i32>>>

Optional mapping: original var index -> filtered column index. Wrapped in Arc so parallel chunks share the same filter without deep cloning.
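
One way such a mapping could be built is sketched below. build_column_filter is a hypothetical helper, and the sketch assumes that filtered columns keep the relative order they have in the original dataset.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Builds an `original var index -> filtered column index` map from the
/// selected original indices. Hypothetical helper; assumes the filtered
/// columns keep their original relative order.
fn build_column_filter(selected: &[i32]) -> Arc<BTreeMap<i32, i32>> {
    let mut sorted: Vec<i32> = selected.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    let map: BTreeMap<i32, i32> = sorted
        .into_iter()
        .enumerate()
        .map(|(new_idx, orig_idx)| (orig_idx, new_idx as i32))
        .collect();
    // Wrapping in Arc lets every chunk share one map.
    Arc::new(map)
}
```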

§total_var_count: i32

Total variable count in the unfiltered dataset. Used for row-boundary detection in handle_value when filtering is active. Defaults to var_count when no filter is set.
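
The reason the unfiltered count matters can be shown with a toy value callback. This sketch rests on an assumption: that a row is considered complete when the last variable of the unfiltered dataset (index total_var_count - 1) arrives. collect_rows is a hypothetical function and does not mirror the crate's handle_value.

```rust
use std::collections::BTreeMap;

/// Consumes `(original var index, value)` pairs in file order and returns
/// the kept values for each completed row. Hypothetical sketch; assumes a
/// row ends at original index `total_var_count - 1`.
fn collect_rows(
    values: &[(i32, f64)],
    filter: &BTreeMap<i32, i32>,
    total_var_count: i32,
) -> Vec<Vec<f64>> {
    let mut rows = Vec::new();
    let mut current = vec![f64::NAN; filter.len()];
    for &(var_index, value) in values {
        // Keep the value only if its variable passes the filter.
        if let Some(&col) = filter.get(&var_index) {
            current[col as usize] = value;
        }
        // The boundary check uses the unfiltered count, because the last
        // original variable may itself be filtered out.
        if var_index == total_var_count - 1 {
            let fresh = vec![f64::NAN; filter.len()];
            rows.push(std::mem::replace(&mut current, fresh));
        }
    }
    rows
}
```

If the check compared against the filtered count instead, a filter that drops the final variable would never signal the end of a row.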

Implementations§

impl ReadStatData

pub fn new() -> Self

Creates a new ReadStatData with default (empty) values.

pub fn allocate_builders(self) -> Self

Allocates typed Arrow builders with capacity for chunk_rows_to_process.

Each builder’s type is determined by the variable metadata. String builders are additionally pre-sized with storage_width * chunk_rows bytes.
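
The sizing rule above can be sketched as a small function. VarKind and capacity_hint are hypothetical stand-ins for the crate's variable metadata and allocation logic; the two returned numbers correspond to the item capacity and data capacity that Arrow's StringBuilder::with_capacity accepts.

```rust
/// Hypothetical stand-in for the per-variable type information.
enum VarKind {
    Numeric,
    Text { storage_width: usize },
}

/// Returns `(value slots, string bytes)` to reserve for one column.
/// Hypothetical helper illustrating the sizing rule only.
fn capacity_hint(kind: &VarKind, chunk_rows: usize) -> (usize, usize) {
    match kind {
        // Fixed-width values need one slot per row and no byte buffer.
        VarKind::Numeric => (chunk_rows, 0),
        // Strings reserve the worst case: every row fills its full width.
        VarKind::Text { storage_width } => {
            (chunk_rows, storage_width.saturating_mul(chunk_rows))
        }
    }
}
```

Reserving the worst case up front trades some memory for the guarantee that appends during parsing never trigger a reallocation.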

pub(crate) fn cols_to_batch(&mut self) -> Result<(), ReadStatError>

Finishes all builders and assembles the Arrow RecordBatch.

Each builder produces its final array via finish(), which is an O(1) operation (no data copying). The heavy work was already done during handle_value when values were appended directly into the builders.
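
The idea of a constant-time finish can be illustrated with standard-library types. ToyBuilder below is a hypothetical toy, not Arrow's builder: it shows a finish step that moves the accumulated buffer out instead of copying its elements.

```rust
/// Toy column builder used for illustration only; not Arrow's builder.
struct ToyBuilder {
    values: Vec<f64>,
}

impl ToyBuilder {
    fn with_capacity(rows: usize) -> Self {
        Self { values: Vec::with_capacity(rows) }
    }

    /// The per-value work happens here, once per appended value.
    fn append(&mut self, value: f64) {
        self.values.push(value);
    }

    /// Moves the buffer out and leaves an empty builder behind.
    /// No element is copied, so the cost does not depend on the length.
    fn finish(&mut self) -> Vec<f64> {
        std::mem::take(&mut self.values)
    }
}
```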

pub fn read_data(&mut self, rsp: &ReadStatPath) -> Result<(), ReadStatError>

Parses row data from the file and converts it to an Arrow RecordBatch.

§Errors

Returns ReadStatError if FFI parsing or Arrow conversion fails.

pub fn read_data_from_bytes(&mut self, bytes: &[u8]) -> Result<(), ReadStatError>

Parses row data from an in-memory byte slice and converts it to an Arrow RecordBatch.

Equivalent to read_data but reads from a &[u8] buffer instead of a file path.

§Errors

Returns ReadStatError if FFI parsing or Arrow conversion fails.

pub fn read_data_from_mmap(&mut self, path: &Path) -> Result<(), ReadStatError>

Parses row data from a memory-mapped .sas7bdat file and converts it to an Arrow RecordBatch.

Opens the file at path and memory-maps it, avoiding explicit read syscalls. This is especially beneficial for large files and for repeated chunk reads against the same file, because the OS manages page caching automatically.

§Safety

Memory mapping is safe as long as the file is not modified or truncated by another process while the map is active.

§Errors

Returns ReadStatError if the file cannot be opened, mapped, or parsed.

pub(crate) fn parse_data(&mut self, rsp: &ReadStatPath) -> Result<(), ReadStatError>

Parses row data from the file via FFI callbacks (without Arrow conversion).

fn parse_data_from_bytes(&mut self, bytes: &[u8]) -> Result<(), ReadStatError>

pub fn init(self, md: ReadStatMetadata, row_start: u32, row_end: u32) -> Self

Initializes this instance with metadata and chunk boundaries, allocating builders.

Wraps vars and schema in Arc internally. For the parallel read path, prefer init_shared, which accepts pre-wrapped Arcs and so avoids repeated deep clones.

pub fn init_shared(self, var_count: i32, vars: Arc<BTreeMap<i32, ReadStatVarMetadata>>, schema: Arc<Schema>, row_start: u32, row_end: u32) -> Self

Initializes this instance with pre-shared metadata and chunk boundaries.

Accepts Arc-wrapped vars and schema for cheap cloning in parallel loops. Each call only increments reference counts (atomic +1) instead of deep-cloning the entire metadata tree.
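
The cost difference can be seen in a small standard-library sketch. share_metadata is a hypothetical helper, and a BTreeMap of strings stands in for the real per-variable metadata.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hands each chunk its own handle to the same metadata map.
/// Hypothetical helper; `String` stands in for the real metadata type.
fn share_metadata(
    vars: &Arc<BTreeMap<i32, String>>,
    chunks: usize,
) -> Vec<Arc<BTreeMap<i32, String>>> {
    // Arc::clone increments a reference count; the map is never copied.
    (0..chunks).map(|_| Arc::clone(vars)).collect()
}
```

Every returned handle points at the same allocation, so the cost per chunk is independent of how many variables the dataset has.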

fn set_chunk_counts(self, row_start: u32, row_end: u32) -> Self

fn set_metadata(self, md: ReadStatMetadata) -> Self

pub fn set_no_progress(self, no_progress: bool) -> Self

Disables or enables the progress bar display.

pub fn set_total_rows_to_process(self, total_rows_to_process: usize) -> Self

Sets the total number of rows to process across all chunks.

pub fn set_total_rows_processed(self, total_rows_processed: Arc<AtomicUsize>) -> Self

Sets the shared atomic counter for tracking rows processed across chunks.

pub fn set_column_filter(self, filter: Option<Arc<BTreeMap<i32, i32>>>, total_var_count: i32) -> Self

Sets the column filter and original (unfiltered) variable count.

Accepts an Arc-wrapped filter for cheap sharing across parallel chunks. Must be called before init so that total_var_count is preserved when set_metadata runs.
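
Why the call order matters can be sketched with a toy consuming-self builder. ToyData is hypothetical, and the fallback rule in its init method is an assumption made for illustration (an unset count is modeled as 0); the crate's set_metadata may behave differently in detail.

```rust
/// Toy stand-in for the real struct; illustration only.
#[derive(Default)]
struct ToyData {
    var_count: i32,
    total_var_count: i32,
}

impl ToyData {
    /// Records the unfiltered variable count before initialization.
    fn set_column_filter(mut self, total_var_count: i32) -> Self {
        self.total_var_count = total_var_count;
        self
    }

    /// Assumption: init falls back to `var_count` only when
    /// `total_var_count` is still unset (modeled here as 0).
    fn init(mut self, var_count: i32) -> Self {
        self.var_count = var_count;
        if self.total_var_count == 0 {
            self.total_var_count = var_count;
        }
        self
    }
}
```

Chaining set_column_filter before init keeps the unfiltered count; calling init alone leaves total_var_count equal to var_count.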

pub fn set_progress(self, progress: Arc<dyn ProgressCallback>) -> Self

Attaches a progress callback for feedback during parsing.

The callback receives progress increments and parsing status updates. See ProgressCallback for the required interface.

Trait Implementations§

impl Default for ReadStatData

fn default() -> Self

Returns the “default value” for a type.
