Data Handling Module

The data_handling module provides atomic data caching and batch processing capabilities.

Atomic Data Cache

High-performance atomic data cache system.

This module provides a pre-populated cache of atomic data for common elements to eliminate expensive database queries to the Mendeleev library during runtime.

xraylabtool.data_handling.atomic_cache.get_atomic_data_fast(element)[source]

Fast atomic data lookup with preloaded cache and fallback to Mendeleev.

This function first checks the preloaded cache, then the runtime cache, and only falls back to expensive Mendeleev queries as a last resort.

Parameters:

element (str) – Element symbol (e.g., ‘H’, ‘C’, ‘Si’)

Return type:

MappingProxyType[str, float]

Returns:

Read-only mapping with ‘atomic_number’ and ‘atomic_weight’ keys

Raises:

ValueError – If element symbol is not recognized

xraylabtool.data_handling.atomic_cache.get_bulk_atomic_data_fast(elements_tuple)[source]

High-performance bulk atomic data loader with caching.

This function loads atomic data for multiple elements efficiently, using the preloaded cache to avoid expensive database queries.

Parameters:

elements_tuple (tuple[str, ...]) – Tuple of element symbols

Return type:

dict[str, MappingProxyType[str, float]]

Returns:

Dictionary mapping element symbols to their atomic data (as immutable views)

xraylabtool.data_handling.atomic_cache.warm_up_cache(elements)[source]

Pre-warm the cache with specific elements.

Parameters:

elements (list[str]) – List of element symbols to preload

Return type:

None

xraylabtool.data_handling.atomic_cache.warm_cache_for_compounds(formulas, include_similar=True, include_family=True, timing_info=False)[source]

Intelligently warm cache for compounds and their related elements.

This function performs intelligent cache warming by analyzing compound formulas, extracting their constituent elements, and pre-loading both atomic data and scattering factor interpolators. It can also include similar compounds and compound families for comprehensive warming.

Parameters:
  • formulas (list[str]) – List of chemical formulas to warm cache for

  • include_similar (bool) – Whether to include similar compounds

  • include_family (bool) – Whether to include compound family members

  • timing_info (bool) – Whether to return timing information

Return type:

dict[str, Any]

Returns:

Dictionary with warming results and statistics

Examples

>>> result = warm_cache_for_compounds(["SiO2", "Al2O3"])
>>> result["elements_warmed"]
['Si', 'O', 'Al']
>>> result["success_rate"] > 0.9
True
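
The element extraction step described above can be approximated with a regular expression. This is a simplified sketch, not the library's parser: `extract_elements` is a hypothetical helper that does not validate symbols or handle parentheses.

```python
import re

# One uppercase letter optionally followed by one lowercase letter.
_ELEMENT = re.compile(r"[A-Z][a-z]?")


def extract_elements(formulas):
    """Collect unique element symbols in order of first appearance."""
    seen = []
    for formula in formulas:
        for symbol in _ELEMENT.findall(formula):
            if symbol not in seen:
                seen.append(symbol)
    return seen
```

For the formulas in the example above, `extract_elements(["SiO2", "Al2O3"])` yields the symbols Si, O, and Al in that order.
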

xraylabtool.data_handling.atomic_cache.get_cache_stats()[source]

Get cache statistics for monitoring.

Return type:

dict[str, int]

Returns:

Dictionary with cache statistics

xraylabtool.data_handling.atomic_cache.is_element_preloaded(element)[source]

Check if an element is in the preloaded cache.

Parameters:

element (str) – Element symbol

Return type:

bool

Returns:

True if element is preloaded, False otherwise

class xraylabtool.data_handling.atomic_cache.FastAtomicDataProvider[source]

Bases: object

High-performance atomic data provider implementing AtomicDataProvider protocol.

This implementation uses preloaded atomic data and interpolated scattering factors for maximum performance in X-ray calculations.

__init__()[source]

Initialize the atomic data provider.

Return type:

None

get_scattering_factors(element, energies)[source]

Get atomic scattering factors for element at given energies.

This method loads scattering factor data and interpolates it to the requested energies, returning complex scattering factors (f1 + if2).

Parameters:
  • element – Element symbol

  • energies – Energies at which to evaluate the scattering factors

Returns:

Complex scattering factors (f1 + if2)

Return type:

ndarray[tuple[Any, ...], dtype[cdouble]]
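
The interpolation step can be illustrated with plain linear interpolation over a tabulated grid. This is a sketch under assumptions: the library may use a different scheme (for example splines or log-space interpolation), the helper names are hypothetical, and the table values are placeholders rather than real scattering data.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Linear interpolation on a sorted grid; clamps outside the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def scattering_factors(energies, table_e, table_f1, table_f2):
    """Combine interpolated f1 and f2 into complex values f1 + i*f2."""
    return [
        complex(interp(e, table_e, table_f1), interp(e, table_e, table_f2))
        for e in energies
    ]


# Placeholder table, not real data.
table_e = [1000.0, 2000.0, 3000.0]
table_f1 = [10.0, 12.0, 14.0]
table_f2 = [1.0, 0.5, 0.25]
result = scattering_factors([1500.0], table_e, table_f1, table_f2)
```

With NumPy arrays, `np.interp` performs the same linear interpolation for a whole energy array in one call.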

is_element_cached(element)[source]

Check if element data is cached for fast access.

Parameters:

element (str) – Element symbol to check

Returns:

True if element is cached for fast access

Return type:

bool

preload_elements(elements)[source]

Preload scattering factor data for elements.

Parameters:

elements (list[str]) – List of element symbols to preload

Return type:

None

get_atomic_properties(element)[source]

Get basic atomic properties for an element.

Parameters:

element (str) – Element symbol

Returns:

Immutable mapping with atomic properties

Return type:

MappingProxyType[str, float]

xraylabtool.data_handling.atomic_cache.get_atomic_data_provider()[source]

Get the global atomic data provider instance.

Returns:

Shared atomic data provider instance

Return type:

FastAtomicDataProvider

Performance Features

The atomic data cache provides several performance optimizations:

  1. Preloaded Common Elements: 92 elements are preloaded at startup

  2. LRU Caching: Least Recently Used cache for computed scattering factors

  3. Vectorized Operations: NumPy-based calculations for energy arrays

  4. Memory Management: Efficient data structures and automatic cleanup
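
The LRU caching idea in item 2 can be sketched with the standard library's `functools.lru_cache`. Whether the library uses this decorator is an assumption; `expensive_factor` is a hypothetical placeholder for a costly scattering factor computation.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def expensive_factor(element, energy_ev):
    """Placeholder for a costly computation (hypothetical)."""
    return (len(element) * energy_ev, 0.0)


expensive_factor("Si", 8000.0)  # first call: computed, counted as a miss
expensive_factor("Si", 8000.0)  # second call: served from the cache, a hit
info = expensive_factor.cache_info()
```

`cache_info()` reports hits, misses, the maximum size, and the current size, which is one way to monitor how effective such a cache is.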

Usage Example

import numpy as np

from xraylabtool.data_handling.atomic_cache import get_atomic_data_provider

# Get scattering factors for silicon at 8 keV
provider = get_atomic_data_provider()
f = provider.get_scattering_factors("Si", np.array([8000.0]))

print(f"f1 (real): {f.real}")
print(f"f2 (imaginary): {f.imag}")

Cache Statistics

from xraylabtool.data_handling.atomic_cache import get_cache_stats

stats = get_cache_stats()
for name, value in stats.items():
    print(f"{name}: {value}")

Batch Processing

High-performance batch processing module for X-ray calculations.

This module provides optimized batch processing capabilities with memory management, parallel execution, and progress tracking for large-scale X-ray property calculations.

class xraylabtool.data_handling.batch_processing.BatchConfig(max_workers=None, chunk_size=100, memory_limit_gb=4.0, enable_progress=True, cache_results=False)[source]

Bases: object

Configuration for batch processing operations.

Parameters:
  • max_workers (int | None) – Maximum number of parallel workers (default: auto-detect)

  • chunk_size (int) – Number of calculations per chunk (default: 100)

  • memory_limit_gb (float) – Memory limit in GB before forcing garbage collection

  • enable_progress (bool) – Whether to show progress bars

  • cache_results (bool) – Whether to cache intermediate results

max_workers: int | None = None
chunk_size: int = 100
memory_limit_gb: float = 4.0
enable_progress: bool = True
cache_results: bool = False

__post_init__()[source]

Initialize the configuration after object creation.

Return type:

None

__init__(max_workers=None, chunk_size=100, memory_limit_gb=4.0, enable_progress=True, cache_results=False)
Parameters:
  • max_workers (int | None)

  • chunk_size (int)

  • memory_limit_gb (float)

  • enable_progress (bool)

  • cache_results (bool)

Return type:

None
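
A configuration object of this shape is typically a dataclass whose `__post_init__` fills in derived defaults. The sketch below is an assumption about that pattern, not the library's source: `SketchBatchConfig` is a hypothetical class, and both the `os.cpu_count()` auto-detection and the `chunk_size` check are illustrative choices.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBatchConfig:
    max_workers: Optional[int] = None
    chunk_size: int = 100
    memory_limit_gb: float = 4.0
    enable_progress: bool = True
    cache_results: bool = False

    def __post_init__(self):
        # Assumed behavior: resolve "auto-detect" to the CPU count.
        if self.max_workers is None:
            self.max_workers = os.cpu_count() or 1
        # Assumed validation, added for illustration only.
        if self.chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")
```

Resolving the default in `__post_init__` keeps `None` available as an explicit "choose for me" value in the constructor signature.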

class xraylabtool.data_handling.batch_processing.MemoryMonitor(limit_gb=4.0)[source]

Bases: object

Memory usage monitor for batch operations.

Parameters:

limit_gb (float)

__init__(limit_gb=4.0)[source]

Initialize the memory monitor.

Parameters:

limit_gb (float)

check_memory()[source]

Check if memory usage is below limit.

Return type:

bool

Returns:

True if within limits, False if exceeded

get_memory_usage_mb()[source]

Get current memory usage in MB.

Return type:

float

Returns:

Memory usage in megabytes

force_gc()[source]

Force garbage collection and clear caches to free memory.

Return type:

None
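
The three methods above fit together as in the following sketch. It relies only on the standard library and therefore measures Python-level allocations with `tracemalloc`; a production monitor would more likely read the process's resident memory, so treat the measurement source, like the class name `SketchMemoryMonitor`, as an assumption.

```python
import gc
import tracemalloc


class SketchMemoryMonitor:
    def __init__(self, limit_gb=4.0):
        self.limit_mb = limit_gb * 1024
        if not tracemalloc.is_tracing():
            tracemalloc.start()

    def get_memory_usage_mb(self):
        """Currently traced Python allocations, in megabytes."""
        current, _peak = tracemalloc.get_traced_memory()
        return current / (1024 * 1024)

    def check_memory(self):
        """True while usage is below the configured limit."""
        return self.get_memory_usage_mb() < self.limit_mb

    def force_gc(self):
        """Run a full garbage collection pass."""
        gc.collect()
```

A batch loop would call `check_memory()` between chunks and `force_gc()` whenever it returns False.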

xraylabtool.data_handling.batch_processing.chunk_iterator(data, chunk_size)[source]

Yield successive chunks of data.

Parameters:
  • data (list[tuple[Any, ...]]) – List of data tuples to chunk

  • chunk_size (int) – Size of each chunk

Yields:

Lists of data tuples of specified chunk size

Return type:

Iterator[list[tuple[Any, …]]]
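
Chunking of this kind is usually a short generator over slices. The following sketch shows the idea; `chunks` is a hypothetical name, and the check on `chunk_size` is an added assumption.

```python
from typing import Any, Iterator


def chunks(data: list, chunk_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

The final chunk is shorter than `chunk_size` whenever the input length is not an exact multiple of it.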

xraylabtool.data_handling.batch_processing.process_single_calculation(formula, energies, density)[source]

Process a single X-ray calculation.

Parameters:
  • formula (str) – Chemical formula

  • energies (ndarray) – Energy array

  • density (float) – Material density

Return type:

tuple[str, XRayResult | None]

Returns:

Tuple of (formula, XRayResult)

xraylabtool.data_handling.batch_processing.process_batch_chunk(chunk, config)[source]

Process a chunk of calculations in parallel.

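Parameters:
  • chunk – Chunk of calculation inputs, such as one yielded by chunk_iterator

  • config – Batch processing configuration

Return type: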

list[tuple[str, XRayResult | None]]

Returns:

List of (formula, result) tuples

xraylabtool.data_handling.batch_processing.calculate_batch_properties(formulas, energies, densities, config=None)[source]

Calculate X-ray properties for multiple materials with optimized batch processing.

This function processes large batches of calculations efficiently using chunking, parallel processing, and memory management.

Parameters:
  • formulas (list[str]) – List of chemical formulas

  • energies (float | list[float] | ndarray) – Energy values (shared across all materials)

  • densities (list[float]) – List of material densities

  • config (BatchConfig | None) – Batch processing configuration (optional)

Return type:

dict[str, XRayResult | None]

Returns:

Dictionary mapping formulas to XRayResult objects

Raises:

ValueError – If input validation fails

Examples

>>> import numpy as np
>>> from xraylabtool.data_handling.batch_processing import calculate_batch_properties
>>> formulas = ["SiO2", "Al2O3", "Fe2O3"]  # Results are keyed by formula, so keep formulas unique
>>> energies = np.linspace(5, 15, 101)  # 101 energy points
>>> densities = [2.2, 3.95, 5.24]  # One density per formula
>>> results = calculate_batch_properties(formulas, energies, densities)
>>> print(f"Processed {len(results)} materials")
Processed 3 materials
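
The overall flow (validate inputs, split into chunks, run each chunk in parallel, collect results) can be sketched as below. This is an illustration under assumptions, not the library's code: `compute`, `safe_compute`, and `batch` are hypothetical, threads stand in for whatever executor the library uses, and a failed calculation is recorded as None.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(formula, energies, density):
    """Hypothetical stand-in for a single X-ray calculation."""
    if density <= 0:
        raise ValueError("density must be positive")
    return {"n_energies": len(energies), "density": density}


def safe_compute(job):
    """Run one job and map any failure to a None result."""
    formula, energies, density = job
    try:
        return formula, compute(formula, energies, density)
    except Exception:
        return formula, None


def batch(formulas, energies, densities, chunk_size=100, max_workers=2):
    if len(formulas) != len(densities):
        raise ValueError("formulas and densities must have the same length")
    jobs = [(f, energies, d) for f, d in zip(formulas, densities)]
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(jobs), chunk_size):
            chunk = jobs[start:start + chunk_size]
            results.update(pool.map(safe_compute, chunk))
    return results
```

Because this sketch keys results by formula, repeated formulas would overwrite one another; distinct keys are needed to keep every entry.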

xraylabtool.data_handling.batch_processing.save_batch_results(results, output_file, format='csv', fields=None)[source]

Save batch calculation results to file.

Parameters:
  • results (dict[str, XRayResult | None]) – Dictionary of calculation results

  • output_file (str | Path) – Output file path

  • format (str) – Output format (‘csv’, ‘json’, ‘parquet’)

  • fields (list[str] | None) – List of fields to include (default: all)

Raises:
Return type:

None
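
Serializing a results dictionary to CSV or JSON can be sketched with the standard library alone. The sketch makes several assumptions: results are plain dicts rather than XRayResult objects, None entries are skipped, text is returned instead of written to a file, and Parquet is left out because it needs a third-party package. `dump_results` is a hypothetical helper.

```python
import csv
import io
import json


def dump_results(results, fmt="csv", fields=None):
    """Serialize {formula: dict-or-None} to CSV or JSON text."""
    rows = [{"formula": f, **r} for f, r in results.items() if r is not None]
    if fields is not None:
        rows = [{k: row[k] for k in ["formula", *fields]} for row in rows]
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        names = list(rows[0]) if rows else ["formula"]
        writer = csv.DictWriter(buffer, fieldnames=names)
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"Unsupported format: {fmt!r}")
```

Passing `fields` restricts the output columns, which mirrors the role of the `fields` parameter documented above.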

xraylabtool.data_handling.batch_processing.load_batch_input(input_file, formula_column='formula', density_column='density', energy_column=None)[source]

Load batch input data from file.

Parameters:
  • input_file (str | Path) – Input file path

  • formula_column (str) – Name of formula column

  • density_column (str) – Name of density column

  • energy_column (str | None) – Name of energy column (optional, for per-material energies)

Return type:

tuple[list[str], list[float], list[ndarray] | None]

Returns:

Tuple of (formulas, densities, energies) where energies is either None or a list of numpy arrays

Raises:

Batch Processing Features

  • Memory Management: Automatic chunking for large datasets

  • Progress Tracking: Built-in progress bars with tqdm

  • Error Handling: Reliable error recovery and reporting

  • Parallel Processing: Multi-core support for independent calculations

Usage Example

from xraylabtool.data_handling.batch_processing import (
    BatchConfig,
    calculate_batch_properties,
)

formulas = ["Si", "Al", "Cu"]
densities = [2.33, 2.70, 8.96]

energies = [5000, 8000, 10000, 12000]

config = BatchConfig(enable_progress=True)
results = calculate_batch_properties(formulas, energies, densities, config=config)

Performance Benchmarks

Typical performance characteristics:

Operation                    Cold Cache    Warm Cache
Single element lookup        ~0.5 ms       ~0.05 ms
Complex formula (SiO₂)       ~1.2 ms       ~0.1 ms
Batch 1000 materials         ~50 ms        ~8 ms
Energy array (100 points)    ~15 ms        ~1.5 ms

Memory Usage

The atomic cache uses approximately:

  • Startup: ~10 MB for preloaded elements

  • Per element: ~50 KB for full energy range

  • Peak usage: Scales with number of unique elements used

Cache Management

from xraylabtool.data_handling.atomic_cache import clear_cache, warm_up_cache

# Clear all cached data
clear_cache()

# Preload specific elements for better performance
warm_up_cache(["Si", "O", "Al", "Fe"])

Data Sources

Atomic scattering factor data is sourced from:

  1. CXRO Database: Center for X-ray Optics, Lawrence Berkeley National Laboratory

  2. NIST Database: National Institute of Standards and Technology

  3. Henke Tables: Widely used X-ray optical constants

The data files are in Henke format (.nff files) and cover the energy range from ~10 eV to ~100 keV with high precision interpolation between tabulated values.
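
Reading such a file can be sketched as follows, assuming the common three-column layout of energy in eV, f1, and f2 with a single header row. `parse_nff` is a hypothetical helper, not part of the library, and the sample values are placeholders rather than real scattering data.

```python
def parse_nff(text):
    """Parse three-column text into (energies, f1, f2) lists."""
    energies, f1, f2 = [], [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        try:
            e, a, b = (float(p) for p in parts)
        except ValueError:
            continue  # skip a non-numeric header row
        energies.append(e)
        f1.append(a)
        f2.append(b)
    return energies, f1, f2


# Placeholder content, not real data.
sample = "E(eV)\tf1\tf2\n10.0\t-9999.0\t4.0\n20.0\t2.5\t3.0\n"
energies, f1, f2 = parse_nff(sample)
```

The resulting lists are the tabulated grid on which interpolation to arbitrary energies would operate.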