Data Handling Module¶

The data_handling module provides atomic data caching and batch processing capabilities.

Atomic Data Cache¶

High-performance atomic data cache system.

This module provides a pre-populated cache of atomic data for common elements to eliminate expensive database queries to the Mendeleev library during runtime.

xraylabtool.data_handling.atomic_cache.get_atomic_data_fast(element)[source]¶

Fast atomic data lookup with preloaded cache and fallback to Mendeleev.

This function first checks the preloaded cache, then the runtime cache, and only falls back to expensive Mendeleev queries as a last resort.

Parameters: element (str) – Element symbol (e.g., ‘H’, ‘C’, ‘Si’)
Return type: MappingProxyType[str, float]
Returns: Read-only mapping with ‘atomic_number’ and ‘atomic_weight’ keys
Raises: ValueError – If element symbol is not recognized
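
The lookup order described above (preloaded cache, then runtime cache, then the slow fallback) can be sketched roughly as follows. This is a minimal illustration and not the module's actual code: `_PRELOADED`, `_RUNTIME_CACHE`, `_slow_lookup`, and `lookup_atomic_data` are hypothetical names, and the small hard-coded table stands in for the Mendeleev query.

```python
from types import MappingProxyType

# Hypothetical preloaded table; the real module ships its own data.
_PRELOADED = {
    "H": MappingProxyType({"atomic_number": 1.0, "atomic_weight": 1.008}),
    "Si": MappingProxyType({"atomic_number": 14.0, "atomic_weight": 28.085}),
}
_RUNTIME_CACHE = {}


def _slow_lookup(element):
    """Stand-in for the expensive Mendeleev query (assumption)."""
    table = {"C": (6, 12.011), "O": (8, 15.999)}
    if element not in table:
        raise ValueError(f"Unknown element symbol: {element!r}")
    number, weight = table[element]
    return {"atomic_number": float(number), "atomic_weight": weight}


def lookup_atomic_data(element):
    """Check the preloaded cache, then the runtime cache, then fall back."""
    if element in _PRELOADED:
        return _PRELOADED[element]
    if element not in _RUNTIME_CACHE:
        # Only reached once per element; later calls hit the runtime cache.
        _RUNTIME_CACHE[element] = MappingProxyType(_slow_lookup(element))
    return _RUNTIME_CACHE[element]
```

Wrapping each entry in `MappingProxyType` gives callers a read-only view, so a cached entry cannot be modified by accident.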

xraylabtool.data_handling.atomic_cache.get_bulk_atomic_data_fast(elements_tuple)[source]¶

High-performance bulk atomic data loader with caching.

This function loads atomic data for multiple elements efficiently, using the preloaded cache to avoid expensive database queries.

Parameters: elements_tuple (tuple[str, ...]) – Tuple of element symbols
Return type: dict[str, MappingProxyType[str, float]]
Returns: Dictionary mapping element symbols to their atomic data (as immutable views)
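
One plausible reason for taking a tuple rather than a list is that tuples are hashable, so the whole call can be memoized. The sketch below illustrates that idea with `functools.lru_cache`. It is an assumption about the design, not the library's confirmed implementation, and `_DATA` and `bulk_lookup` are hypothetical names.

```python
from functools import lru_cache
from types import MappingProxyType

# Hypothetical per-element table standing in for the module's real cache.
_DATA = {
    "Si": {"atomic_number": 14.0, "atomic_weight": 28.085},
    "O": {"atomic_number": 8.0, "atomic_weight": 15.999},
}


@lru_cache(maxsize=128)
def bulk_lookup(elements_tuple):
    """Return {symbol: read-only mapping} for every symbol in the tuple.

    The tuple argument is hashable, so lru_cache can use it as a key;
    passing a list instead raises TypeError.
    """
    return {symbol: MappingProxyType(_DATA[symbol]) for symbol in elements_tuple}
```

Repeated calls with the same tuple return the same cached dictionary, so callers in this sketch should treat the outer dictionary as shared and avoid mutating it.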

xraylabtool.data_handling.atomic_cache.warm_up_cache(elements)[source]¶

Pre-warm the cache with specific elements.

Parameters: elements (list[str]) – List of element symbols to preload
Return type: None

xraylabtool.data_handling.atomic_cache.warm_cache_for_compounds(formulas, include_similar=True, include_family=True, timing_info=False)[source]¶

Intelligently warm cache for compounds and their related elements.

This function parses the compound formulas, extracts their constituent elements, and preloads both atomic data and scattering factor interpolators for those elements. Optionally, it also warms the cache for similar compounds and compound family members.

Parameters:

formulas (list[str]) – List of chemical formulas to warm cache for
include_similar (bool) – Whether to include similar compounds
include_family (bool) – Whether to include compound family members
timing_info (bool) – Whether to return timing information

Return type:

dict[str, Any]

Returns:

Dictionary with warming results and statistics

Examples

>>> result = warm_cache_for_compounds(["SiO2", "Al2O3"])
>>> result["elements_warmed"]
['Si', 'O', 'Al']
>>> result["success_rate"] > 0.9
True
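
The element-extraction step can be illustrated with a small regular-expression sketch. `elements_in_formulas` is a hypothetical helper, not part of the documented API, and it assumes simple formulas made of element symbols and counts. It does not check that each symbol is a real element.

```python
import re

# One uppercase letter optionally followed by one lowercase letter.
_ELEMENT_TOKEN = re.compile(r"[A-Z][a-z]?")


def elements_in_formulas(formulas):
    """Collect unique element symbols in order of first appearance."""
    seen = []
    for formula in formulas:
        for symbol in _ELEMENT_TOKEN.findall(formula):
            if symbol not in seen:
                seen.append(symbol)
    return seen


print(elements_in_formulas(["SiO2", "Al2O3"]))  # ['Si', 'O', 'Al']
```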

xraylabtool.data_handling.atomic_cache.get_cache_stats()[source]¶

Get cache statistics for monitoring.

Return type: dict[str, int]
Returns: Dictionary with cache statistics

xraylabtool.data_handling.atomic_cache.is_element_preloaded(element)[source]¶

Check if an element is in the preloaded cache.

Parameters: element (str) – Element symbol
Return type: bool
Returns: True if element is preloaded, False otherwise

class xraylabtool.data_handling.atomic_cache.FastAtomicDataProvider[source]¶

Bases: object

High-performance atomic data provider implementing AtomicDataProvider protocol.

This implementation uses preloaded atomic data and interpolated scattering factors for maximum performance in X-ray calculations.

__init__()[source]¶

Initialize the atomic data provider.

Return type: None

get_scattering_factors(element, energies)[source]¶

Get atomic scattering factors for element at given energies.

This method loads scattering factor data and interpolates it to the requested energies, returning complex scattering factors (f1 + if2).

Parameters:

element (str) – Chemical element symbol (e.g., ‘Si’, ‘O’)
energies (ndarray[tuple[Any, ...], dtype[double]]) – X-ray energies in keV

Returns:

Complex scattering factors (f1 + if2)

Return type:

ndarray[tuple[Any, ...], dtype[cdouble]]
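
To show how tabulated f1 and f2 values can be combined into complex scattering factors at arbitrary energies, here is a plain-Python sketch using linear interpolation. It is an illustration under assumptions: the interpolation scheme the library actually uses is not stated here, the real method works on NumPy arrays, and `TABLE` holds made-up numbers rather than real scattering data.

```python
from bisect import bisect_right

# (energy_keV, f1, f2) rows sorted by energy. Illustrative values only.
TABLE = [(4.0, 10.0, 2.0), (8.0, 12.0, 1.0)]


def interpolate_scattering(table, energies_kev):
    """Linearly interpolate f1 and f2, returning complex f1 + i*f2 values."""
    grid = [row[0] for row in table]
    result = []
    for energy in energies_kev:
        if not grid[0] <= energy <= grid[-1]:
            raise ValueError(f"Energy {energy} keV is outside the table range")
        # Index of the upper bracketing row, clamped to the last row.
        hi = min(bisect_right(grid, energy), len(grid) - 1)
        lo = hi - 1
        e0, f1_lo, f2_lo = table[lo]
        e1, f1_hi, f2_hi = table[hi]
        t = (energy - e0) / (e1 - e0)
        f1 = f1_lo + t * (f1_hi - f1_lo)
        f2 = f2_lo + t * (f2_hi - f2_lo)
        result.append(complex(f1, f2))
    return result
```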

is_element_cached(element)[source]¶

Check if element data is cached for fast access.

Parameters: element (str) – Element symbol to check
Returns: True if element is cached for fast access
Return type: bool

preload_elements(elements)[source]¶

Preload scattering factor data for elements.

Parameters: elements (list[str]) – List of element symbols to preload
Return type: None

get_atomic_properties(element)[source]¶

Get basic atomic properties for an element.

Parameters: element (str) – Element symbol
Returns: Immutable mapping with atomic properties
Return type: MappingProxyType[str, float]

xraylabtool.data_handling.atomic_cache.get_atomic_data_provider()[source]¶

Get the global atomic data provider instance.

Returns: Shared atomic data provider instance
Return type: FastAtomicDataProvider

Performance Features¶

The atomic data cache provides several performance optimizations:

Preloaded Common Elements: 92 elements are preloaded at startup
LRU Caching: Least Recently Used cache for computed scattering factors
Vectorized Operations: NumPy-based calculations for energy arrays
Memory Management: Efficient data structures and automatic cleanup

Usage Example¶

import numpy as np

from xraylabtool.data_handling.atomic_cache import get_atomic_data_provider

# Get complex scattering factors (f1 + if2) for silicon at 8 keV
provider = get_atomic_data_provider()
f = provider.get_scattering_factors("Si", np.array([8.0]))  # energies in keV

print(f"f1 (real): {f.real[0]}")
print(f"f2 (imaginary): {f.imag[0]}")

Cache Statistics¶

from xraylabtool.data_handling.atomic_cache import get_cache_stats

# get_cache_stats() returns a dict[str, int]; print every counter it reports
stats = get_cache_stats()
for name, value in stats.items():
    print(f"{name}: {value}")

Batch Processing¶

High-performance batch processing module for X-ray calculations.

This module provides optimized batch processing capabilities with memory management, parallel execution, and progress tracking for large-scale X-ray property calculations.

class xraylabtool.data_handling.batch_processing.BatchConfig(max_workers=None, chunk_size=100, memory_limit_gb=4.0, enable_progress=True, cache_results=False)[source]¶

Bases: object

Configuration for batch processing operations.

Parameters:

max_workers (int | None) – Maximum number of parallel workers (default: auto-detect)
chunk_size (int) – Number of calculations per chunk (default: 100)
memory_limit_gb (float) – Memory limit in GB before forcing garbage collection
enable_progress (bool) – Whether to show progress bars
cache_results (bool) – Whether to cache intermediate results

max_workers: int | None = None¶

chunk_size: int = 100¶

memory_limit_gb: float = 4.0¶

enable_progress: bool = True¶

cache_results: bool = False¶

__post_init__()[source]¶

Initialize the configuration after object creation.

Return type: None

__init__(max_workers=None, chunk_size=100, memory_limit_gb=4.0, enable_progress=True, cache_results=False)¶

Parameters:

max_workers (int | None)
chunk_size (int)
memory_limit_gb (float)
enable_progress (bool)
cache_results (bool)

Return type:

None
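
The fields and defaults listed above map naturally onto a dataclass. The sketch below mirrors them under a hypothetical name, `BatchConfigSketch`. What the real `__post_init__` does is not described in this reference, so resolving `max_workers=None` to the CPU count is an assumption made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchConfigSketch:
    """Hypothetical mirror of the documented BatchConfig fields."""

    max_workers: Optional[int] = None
    chunk_size: int = 100
    memory_limit_gb: float = 4.0
    enable_progress: bool = True
    cache_results: bool = False

    def __post_init__(self):
        # Assumption: "auto-detect" means falling back to the CPU count.
        if self.max_workers is None:
            self.max_workers = os.cpu_count() or 1


config = BatchConfigSketch(chunk_size=50, enable_progress=False)
```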

class xraylabtool.data_handling.batch_processing.MemoryMonitor(limit_gb=4.0)[source]¶

Bases: object

Memory usage monitor for batch operations.

Parameters: limit_gb (float)

__init__(limit_gb=4.0)[source]¶

Initialize the memory monitor.

Parameters: limit_gb (float)

check_memory()[source]¶

Check if memory usage is below limit.

Return type: bool
Returns: True if within limits, False if exceeded

get_memory_usage_mb()[source]¶

Get current memory usage in MB.

Return type: float
Returns: Memory usage in megabytes

force_gc()[source]¶

Force garbage collection and clear caches to free memory.

Return type: None
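
A monitor with this interface can be approximated using only the standard library. The sketch below, under the hypothetical name `MemoryMonitorSketch`, uses `tracemalloc`, which counts only memory allocated by Python objects. The real class may measure total process memory instead (for example through a third-party package), so treat this as an assumption-laden illustration.

```python
import gc
import tracemalloc


class MemoryMonitorSketch:
    """Hypothetical stand-in with the same three methods as MemoryMonitor."""

    def __init__(self, limit_gb=4.0):
        self.limit_mb = limit_gb * 1024
        # tracemalloc must be running before it can report usage.
        if not tracemalloc.is_tracing():
            tracemalloc.start()

    def get_memory_usage_mb(self):
        current_bytes, _peak_bytes = tracemalloc.get_traced_memory()
        return current_bytes / (1024 * 1024)

    def check_memory(self):
        """Return True while usage is at or below the limit."""
        return self.get_memory_usage_mb() <= self.limit_mb

    def force_gc(self):
        # The documented method also clears caches; this sketch only collects.
        gc.collect()
```

A batch loop would typically call `check_memory()` between chunks and `force_gc()` when it returns False.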

xraylabtool.data_handling.batch_processing.chunk_iterator(data, chunk_size)[source]¶

Yield successive chunks of data.

Parameters:

data (list[tuple[Any, ...]]) – List of data tuples to chunk
chunk_size (int) – Size of each chunk

Yields:

Lists of data tuples of specified chunk size

Return type:

Iterator[list[tuple[Any, ...]]]
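
The chunking behaviour described here is commonly written as a short generator. The following is a minimal sketch under the hypothetical name `chunked`. The positive-size check is an assumption, since the reference does not say how an invalid `chunk_size` is handled.

```python
def chunked(data, chunk_size):
    """Yield consecutive slices of data, each at most chunk_size long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")  # assumed behaviour
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


tasks = [("Si", 2.33), ("Al", 2.70), ("Cu", 8.96)]
for chunk in chunked(tasks, 2):
    print(chunk)  # the final chunk may be shorter than chunk_size
```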

xraylabtool.data_handling.batch_processing.process_single_calculation(formula, energies, density)[source]¶

Process a single X-ray calculation.

Parameters:

formula (str) – Chemical formula
energies (ndarray) – Energy array
density (float) – Material density

Return type:

tuple[str, XRayResult | None]

Returns:

Tuple of (formula, result), where result is an XRayResult or None

xraylabtool.data_handling.batch_processing.process_batch_chunk(chunk, config)[source]¶

Process a chunk of calculations in parallel.

Parameters:

chunk (list[tuple[str, ndarray, float]]) – List of (formula, energies, density) tuples
config (BatchConfig) – Batch processing configuration

Return type:

list[tuple[str, XRayResult | None]]

Returns:

List of (formula, result) tuples
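
The pattern of running a chunk in parallel while turning individual failures into None results can be sketched with `concurrent.futures`. This reference does not say whether the library uses threads or processes, so the thread pool here is an assumption. `fake_calculation`, `safe_calculation`, and `process_chunk` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_calculation(formula, energies, density):
    """Hypothetical stand-in for the real X-ray calculation."""
    if density <= 0:
        raise ValueError("density must be positive")
    return {"formula": formula, "n_energies": len(energies), "density": density}


def safe_calculation(item):
    """Run one calculation; report a failure as (formula, None)."""
    formula, energies, density = item
    try:
        return formula, fake_calculation(formula, energies, density)
    except Exception:
        return formula, None


def process_chunk(chunk, max_workers=4):
    """Process (formula, energies, density) tuples in parallel, keeping order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_calculation, chunk))
```

Because `pool.map` returns results in input order, each output tuple lines up with the input tuple at the same position, and one bad input does not abort the rest of the chunk.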

xraylabtool.data_handling.batch_processing.calculate_batch_properties(formulas, energies, densities, config=None)[source]¶

Calculate X-ray properties for multiple materials with optimized batch processing.

This function processes large batches of calculations efficiently using chunking, parallel processing, and memory management.

Parameters:

formulas (list[str]) – List of chemical formulas
energies (float | list[float] | ndarray) – Energy values (shared across all materials)
densities (list[float]) – List of material densities
config (BatchConfig | None) – Batch processing configuration (optional)

Return type:

dict[str, XRayResult | None]

Returns:

Dictionary mapping each formula to its XRayResult (or None)

Raises:

ValueError – If input validation fails

Examples

>>> import numpy as np
>>> from xraylabtool.data_handling.batch_processing import calculate_batch_properties
>>> formulas = ["SiO2", "Al2O3", "Si"]  # Unique formulas: results are keyed by formula
>>> energies = np.linspace(5, 15, 101)  # 101 energy points
>>> densities = [2.2, 3.95, 2.33]  # One density per formula
>>> results = calculate_batch_properties(formulas, energies, densities)
>>> print(f"Processed {len(results)} materials")
Processed 3 materials

xraylabtool.data_handling.batch_processing.save_batch_results(results, output_file, format='csv', fields=None)[source]¶

Save batch calculation results to file.

Parameters:

results (dict[str, XRayResult | None]) – Dictionary of calculation results
output_file (str | Path) – Output file path
format (str) – Output format (‘csv’, ‘json’, ‘parquet’)
fields (list[str] | None) – List of fields to include (default: all)

Raises:

ValueError – If format is not supported
IOError – If file cannot be written

Return type:

None

xraylabtool.data_handling.batch_processing.load_batch_input(input_file, formula_column='formula', density_column='density', energy_column=None)[source]¶

Load batch input data from file.

Parameters:

input_file (str | Path) – Input file path
formula_column (str) – Name of formula column
density_column (str) – Name of density column
energy_column (str | None) – Name of energy column (optional, for per-material energies)

Return type:

tuple[list[str], list[float], list[ndarray] | None]

Returns:

Tuple of (formulas, densities, energies) where energies is either None or a list of numpy arrays

Raises:

FileNotFoundError – If input file doesn’t exist
ValueError – If required columns are missing
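
The input validation described above (missing file, missing columns) can be sketched for the CSV case with the standard library. The file types the real function accepts are not listed here, so CSV is an assumption. `load_batch_csv` is a hypothetical name, and the optional energy column is left out to keep the example short.

```python
import csv
from pathlib import Path


def load_batch_csv(input_file, formula_column="formula", density_column="density"):
    """Read formulas and densities from a CSV file with a header row."""
    path = Path(input_file)
    if not path.exists():
        raise FileNotFoundError(f"Input file not found: {path}")
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        missing = {formula_column, density_column} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing required columns: {sorted(missing)}")
        rows = list(reader)
    formulas = [row[formula_column] for row in rows]
    densities = [float(row[density_column]) for row in rows]
    return formulas, densities
```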

Batch Processing Features¶

Memory Management: Automatic chunking for large datasets
Progress Tracking: Built-in progress bars with tqdm
Error Handling: Reliable error recovery and reporting
Parallel Processing: Multi-core support for independent calculations

Usage Example¶

from xraylabtool.data_handling.batch_processing import (
    BatchConfig,
    calculate_batch_properties,
)

formulas = ["Si", "Al", "Cu"]
densities = [2.33, 2.70, 8.96]  # g/cm³, one per formula

energies = [5.0, 8.0, 10.0, 12.0]  # keV, shared across all materials

config = BatchConfig(enable_progress=True)
results = calculate_batch_properties(formulas, energies, densities, config=config)

Performance Benchmarks¶

Typical performance characteristics:

Operation	Cold Cache	Warm Cache
Single element lookup	~0.5 ms	~0.05 ms
Complex formula (SiO₂)	~1.2 ms	~0.1 ms
Batch 1000 materials	~50 ms	~8 ms
Energy array (100 points)	~15 ms	~1.5 ms

Memory Usage¶

The atomic cache uses approximately:

Startup: ~10 MB for preloaded elements
Per element: ~50 KB for full energy range
Peak usage: Scales with number of unique elements used

Cache Management¶

from xraylabtool.data_handling.atomic_cache import clear_cache, warm_up_cache

# Clear all cached data
# (clear_cache is not listed in the API reference above; check that your version exports it)
clear_cache()

# Preload specific elements for better performance
warm_up_cache(["Si", "O", "Al", "Fe"])

Data Sources¶

Atomic scattering factor data is sourced from:

CXRO Database: Center for X-ray Optics, Lawrence Berkeley National Laboratory
NIST Database: National Institute of Standards and Technology
Henke Tables: Widely used X-ray optical constants

The data files are in Henke format (.nff files) and cover the energy range from ~10 eV to ~30 keV, with high-precision interpolation between tabulated values.