pub struct ReadStatData {
pub var_count: i32,
pub vars: Arc<BTreeMap<i32, ReadStatVarMetadata>>,
pub(crate) builders: Vec<ColumnBuilder>,
pub schema: Arc<Schema>,
pub batch: Option<RecordBatch>,
pub chunk_rows_to_process: usize,
pub chunk_row_start: usize,
pub chunk_row_end: usize,
pub chunk_rows_processed: usize,
pub total_rows_to_process: usize,
pub total_rows_processed: Option<Arc<AtomicUsize>>,
pub progress: Option<Arc<dyn ProgressCallback>>,
pub no_progress: bool,
pub errors: Vec<String>,
pub column_filter: Option<Arc<BTreeMap<i32, i32>>>,
pub total_var_count: i32,
}
Holds parsed row data from a .sas7bdat file and converts it to Arrow format.
Each instance processes one streaming chunk of rows. Values are appended
directly into typed Arrow ColumnBuilders during the handle_value
callback, then finished into an Arrow [RecordBatch] via cols_to_batch.
Fields

var_count: i32
    Number of variables (columns) in the dataset.

vars: Arc<BTreeMap<i32, ReadStatVarMetadata>>
    Per-variable metadata, keyed by variable index. Wrapped in Arc so parallel chunks share the same metadata without deep cloning.

builders: Vec<ColumnBuilder>
    Typed Arrow builders, one per variable, pre-sized with capacity hints.

schema: Arc<Schema>
    Arrow schema for the dataset. Wrapped in Arc for cheap sharing across parallel chunks.

batch: Option<RecordBatch>
    The Arrow RecordBatch produced after parsing, if available.

chunk_rows_to_process: usize
    Number of rows to process in this chunk.

chunk_row_start: usize
    Starting row offset for this chunk.

chunk_row_end: usize
    Ending row offset (exclusive) for this chunk.

chunk_rows_processed: usize
    Number of rows actually processed so far in this chunk.

total_rows_to_process: usize
    Total rows to process across all chunks.

total_rows_processed: Option<Arc<AtomicUsize>>
    Shared atomic counter of total rows processed across all chunks.

progress: Option<Arc<dyn ProgressCallback>>
    Optional progress callback for visual feedback during parsing.

no_progress: bool
    Whether progress display is disabled.

errors: Vec<String>
    Errors collected during value parsing callbacks.

column_filter: Option<Arc<BTreeMap<i32, i32>>>
    Optional mapping from original variable index to filtered column index. Wrapped in Arc so parallel chunks share the same filter without deep cloning.

total_var_count: i32
    Total variable count in the unfiltered dataset. Used for row-boundary detection in handle_value when filtering is active. Defaults to var_count when no filter is set.
Implementations

impl ReadStatData

pub fn allocate_builders(self) -> Self
Allocates typed Arrow builders with capacity for chunk_rows_to_process.
Each builder’s type is determined by the variable metadata. String builders
are additionally pre-sized with storage_width * chunk_rows bytes.
pub(crate) fn cols_to_batch(&mut self) -> Result<(), ReadStatError>
Finishes all builders and assembles the Arrow [RecordBatch].
Each builder produces its final array via finish(), which is an O(1)
operation (no data copying). The heavy work was already done during
handle_value when values were appended directly into the builders.
pub fn read_data(&mut self, rsp: &ReadStatPath) -> Result<(), ReadStatError>
Parses row data from the file and converts it to an Arrow [RecordBatch].
§Errors
Returns ReadStatError if FFI parsing or Arrow conversion fails.
pub fn read_data_from_bytes(&mut self, bytes: &[u8]) -> Result<(), ReadStatError>
Parses row data from an in-memory byte slice and converts it to an Arrow [RecordBatch].
Equivalent to read_data but reads from a &[u8]
buffer instead of a file path.
§Errors
Returns ReadStatError if FFI parsing or Arrow conversion fails.
pub fn read_data_from_mmap(&mut self, path: &Path) -> Result<(), ReadStatError>
Parses row data from a memory-mapped .sas7bdat file and converts it to an Arrow [RecordBatch].
Opens the file at path and memory-maps it, avoiding explicit read syscalls.
Especially beneficial for large files and repeated chunk reads against the
same file, as the OS manages page caching automatically.
§Safety
Memory mapping is safe as long as the file is not modified or truncated by another process while the map is active.
§Errors
Returns ReadStatError if the file cannot be opened, mapped, or parsed.
pub(crate) fn parse_data(&mut self, rsp: &ReadStatPath) -> Result<(), ReadStatError>
Parses row data from the file via FFI callbacks (without Arrow conversion).
fn parse_data_from_bytes(&mut self, bytes: &[u8]) -> Result<(), ReadStatError>
pub fn init(self, md: ReadStatMetadata, row_start: u32, row_end: u32) -> Self

Initializes this instance with metadata and chunk boundaries, allocating builders. Wraps vars and schema in Arc internally. For the parallel read path, prefer init_shared, which accepts pre-wrapped Arcs to avoid repeated deep clones.

init_shared

Initializes this instance with pre-shared metadata and chunk boundaries. Accepts Arc-wrapped vars and schema for cheap cloning in parallel loops. Each call only increments reference counts (atomic +1) instead of deep-cloning the entire metadata tree.
fn set_chunk_counts(self, row_start: u32, row_end: u32) -> Self
fn set_metadata(self, md: ReadStatMetadata) -> Self
pub fn set_no_progress(self, no_progress: bool) -> Self
Disables or enables the progress bar display.
pub fn set_total_rows_to_process(self, total_rows_to_process: usize) -> Self
Sets the total number of rows to process across all chunks.
pub fn set_total_rows_processed(self, total_rows_processed: Arc<AtomicUsize>) -> Self
Sets the shared atomic counter for tracking rows processed across chunks.
pub fn set_column_filter(self, filter: Option<Arc<BTreeMap<i32, i32>>>, total_var_count: i32) -> Self
Sets the column filter and original (unfiltered) variable count.
Accepts an Arc-wrapped filter for cheap sharing across parallel chunks.
Must be called before init so that
total_var_count is preserved when set_metadata runs.
pub fn set_progress(self, progress: Arc<dyn ProgressCallback>) -> Self
Attaches a progress callback for feedback during parsing.
The callback receives progress increments and parsing status updates.
See ProgressCallback for the required interface.
Trait Implementations

Auto Trait Implementations
impl Freeze for ReadStatData
impl !RefUnwindSafe for ReadStatData
impl Send for ReadStatData
impl Sync for ReadStatData
impl Unpin for ReadStatData
impl !UnwindSafe for ReadStatData
Blanket Implementations

impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true, otherwise into a Right variant.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true, otherwise into a Right variant.