Data Handling Module
The data_handling module provides atomic data caching and batch processing capabilities.
Atomic Data Cache
High-performance atomic data cache system.
This module provides a pre-populated cache of atomic data for common elements, so that typical lookups avoid expensive queries to the Mendeleev library at runtime.
- xraylabtool.data_handling.atomic_cache.get_atomic_data_fast(element)
Fast atomic data lookup with preloaded cache and fallback to Mendeleev.
This function first checks the preloaded cache, then the runtime cache, and only falls back to expensive Mendeleev queries as a last resort.
- Parameters:
element (str) – Element symbol (e.g., ‘H’, ‘C’, ‘Si’)
- Return type:
- Returns:
Dictionary with ‘atomic_number’ and ‘atomic_weight’ keys
- Raises:
ValueError – If element symbol is not recognized
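The lookup order described above (preloaded cache, then runtime cache, then a slow fallback) can be sketched as follows. This is a minimal illustration, not the module's implementation: `PRELOADED`, `RUNTIME_CACHE`, `lookup`, and the `slow_query` callable are hypothetical names, and the fallback is passed in as a plain function instead of calling Mendeleev.

```python
from typing import Callable, Dict

AtomicData = Dict[str, float]

# Hypothetical preloaded entries (standard atomic weights, rounded).
PRELOADED: Dict[str, AtomicData] = {
    "H": {"atomic_number": 1, "atomic_weight": 1.008},
    "C": {"atomic_number": 6, "atomic_weight": 12.011},
    "Si": {"atomic_number": 14, "atomic_weight": 28.085},
}

# Filled lazily with results of the slow fallback.
RUNTIME_CACHE: Dict[str, AtomicData] = {}


def lookup(element: str, slow_query: Callable[[str], AtomicData]) -> AtomicData:
    """Return atomic data, consulting the cheapest source first."""
    if element in PRELOADED:          # tier 1: preloaded cache
        return PRELOADED[element]
    if element in RUNTIME_CACHE:      # tier 2: runtime cache
        return RUNTIME_CACHE[element]
    try:
        data = slow_query(element)    # tier 3: expensive fallback
    except KeyError as exc:
        raise ValueError(f"Unknown element symbol: {element!r}") from exc
    RUNTIME_CACHE[element] = data     # remember the result for next time
    return data
```

With this layout, the fallback runs at most once per element, and unknown symbols surface as `ValueError`, matching the documented behaviour.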
- xraylabtool.data_handling.atomic_cache.get_bulk_atomic_data_fast(elements_tuple)
High-performance bulk atomic data loader with caching.
This function loads atomic data for multiple elements efficiently, using the preloaded cache to avoid expensive database queries.
- xraylabtool.data_handling.atomic_cache.warm_up_cache(elements)
Pre-warm the cache with specific elements.
- xraylabtool.data_handling.atomic_cache.warm_cache_for_compounds(formulas, include_similar=True, include_family=True, timing_info=False)
Intelligently warm cache for compounds and their related elements.
This function performs intelligent cache warming by analyzing compound formulas, extracting their constituent elements, and pre-loading both atomic data and scattering factor interpolators. It can also include similar compounds and compound families for comprehensive warming.
- Parameters:
- Return type:
- Returns:
Dictionary with warming results and statistics
Examples
>>> result = warm_cache_for_compounds(["SiO2", "Al2O3"])
>>> result["elements_warmed"]
['Si', 'O', 'Al']
>>> result["success_rate"] > 0.9
True
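The first step of cache warming, extracting the constituent elements from a list of formulas, might look like the sketch below. It assumes simple formulas made of element symbols and integer counts (no parentheses, hydrates, or charges); `extract_elements` is a hypothetical helper, not part of the documented API.

```python
import re
from typing import Iterable, List

# An element symbol is one uppercase letter optionally followed by one lowercase letter.
_ELEMENT = re.compile(r"[A-Z][a-z]?")


def extract_elements(formulas: Iterable[str]) -> List[str]:
    """Return the unique element symbols in first-seen order."""
    seen: List[str] = []
    for formula in formulas:
        for symbol in _ELEMENT.findall(formula):
            if symbol not in seen:
                seen.append(symbol)
    return seen


print(extract_elements(["SiO2", "Al2O3"]))  # ['Si', 'O', 'Al']
```

The resulting list is what a warming routine would then iterate over to load atomic data and build interpolators.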
- xraylabtool.data_handling.atomic_cache.get_cache_stats()
Get cache statistics for monitoring.
- xraylabtool.data_handling.atomic_cache.is_element_preloaded(element)
Check if an element is in the preloaded cache.
- class xraylabtool.data_handling.atomic_cache.FastAtomicDataProvider
Bases: object
High-performance atomic data provider implementing AtomicDataProvider protocol.
This implementation uses preloaded atomic data and interpolated scattering factors for maximum performance in X-ray calculations.
- get_scattering_factors(element, energies)
Get atomic scattering factors for element at given energies.
This method loads scattering factor data and interpolates it to the requested energies, returning complex scattering factors (f1 + if2).
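To illustrate the interpolation step, here is a minimal standard-library sketch that linearly interpolates tabulated f1 and f2 values and combines them into complex numbers. The interpolation scheme actually used by the library is not stated here, so linear interpolation is an assumption, and the table values below are made-up placeholders, not real scattering data.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation on a strictly increasing grid with at least two points."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the tabulated range")
    # Index of the right-hand neighbour, clamped so x == xs[-1] stays in range.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def scattering_factors(
    energies: Sequence[float],
    table_e: Sequence[float],
    table_f1: Sequence[float],
    table_f2: Sequence[float],
) -> List[complex]:
    """Return f1 + i*f2 at each requested energy."""
    return [
        complex(interpolate(e, table_e, table_f1), interpolate(e, table_e, table_f2))
        for e in energies
    ]


# Placeholder table (energy in eV, f1, f2); NOT real tabulated values.
TABLE_E = [1000.0, 2000.0, 4000.0]
TABLE_F1 = [10.0, 12.0, 14.0]
TABLE_F2 = [4.0, 2.0, 1.0]

print(scattering_factors([1500.0, 3000.0], TABLE_E, TABLE_F1, TABLE_F2))
```

Requests outside the tabulated range raise `ValueError` in this sketch; extrapolation is deliberately not attempted.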
- xraylabtool.data_handling.atomic_cache.get_atomic_data_provider()
Get the global atomic data provider instance.
- Returns:
Shared atomic data provider instance
- Return type:
Performance Features
The atomic data cache provides several performance optimizations:
Preloaded Common Elements: 92 elements are preloaded at startup
LRU Caching: Least Recently Used cache for computed scattering factors
Vectorized Operations: NumPy-based calculations for energy arrays
Memory Management: Efficient data structures and automatic cleanup
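As an illustration of the LRU caching idea, the sketch below wraps a placeholder computation with `functools.lru_cache` from the standard library. Whether the module uses `lru_cache` internally is an assumption; `expensive_factor` and its arithmetic are hypothetical stand-ins for a real scattering-factor computation.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def expensive_factor(element: str, energy_ev: float) -> float:
    """Placeholder for a costly computation keyed by (element, energy)."""
    return len(element) * energy_ev


expensive_factor("Si", 8000.0)   # computed and stored (a miss)
expensive_factor("Si", 8000.0)   # served from the cache (a hit)
print(expensive_factor.cache_info())
```

Once more than `maxsize` distinct argument pairs have been seen, the least recently used entries are evicted first.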
Usage Example
from xraylabtool.data_handling.atomic_cache import get_atomic_scattering_factors
# Get scattering factors for silicon at 8 keV
f1, f2 = get_atomic_scattering_factors("Si", 8000)
print(f"f1 (real): {f1}")
print(f"f2 (imaginary): {f2}")
Cache Statistics
from xraylabtool.data_handling.atomic_cache import get_cache_info
stats = get_cache_info()
print(f"Cache hits: {stats['hits']}")
print(f"Cache misses: {stats['misses']}")
print(f"Cache size: {stats['current_size']}")
Batch Processing
High-performance batch processing module for X-ray calculations.
This module provides optimized batch processing capabilities with memory management, parallel execution, and progress tracking for large-scale X-ray property calculations.
- class xraylabtool.data_handling.batch_processing.BatchConfig(max_workers=None, chunk_size=100, memory_limit_gb=4.0, enable_progress=True, cache_results=False)
Bases: object
Configuration for batch processing operations.
- Parameters:
max_workers (int | None) – Maximum number of parallel workers (default: auto-detect)
chunk_size (int) – Number of calculations per chunk (default: 100)
memory_limit_gb (float) – Memory limit in GB before forcing garbage collection
enable_progress (bool) – Whether to show progress bars
cache_results (bool) – Whether to cache intermediate results
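A configuration object with these fields and defaults could be modelled as a dataclass, as in the sketch below. The class name `BatchConfigSketch` and the `resolved_workers` helper are hypothetical, and resolving "auto-detect" through `os.cpu_count()` is an assumption about what auto-detection means.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchConfigSketch:
    """Stand-in mirroring the documented fields and defaults."""

    max_workers: Optional[int] = None   # None means auto-detect
    chunk_size: int = 100
    memory_limit_gb: float = 4.0
    enable_progress: bool = True
    cache_results: bool = False

    def resolved_workers(self) -> int:
        """Explicit worker count if given, else the CPU count (at least 1)."""
        if self.max_workers is not None:
            return self.max_workers
        return os.cpu_count() or 1


config = BatchConfigSketch(chunk_size=50)
print(config.resolved_workers())
```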
- class xraylabtool.data_handling.batch_processing.MemoryMonitor(limit_gb=4.0)
Bases: object
Memory usage monitor for batch operations.
- Parameters:
limit_gb (float)
- check_memory()
Check if memory usage is below limit.
- Return type:
- Returns:
True if within limits, False if exceeded
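A monitor with this contract can be sketched with the standard library alone. How the real class measures memory is not stated here; the sketch below uses `tracemalloc`, which only counts allocations made by Python itself, and accepts an injectable reader so another measurement source can be swapped in. `MemoryMonitorSketch` and `usage_bytes` are hypothetical names.

```python
import tracemalloc
from typing import Callable, Optional


def _traced_bytes() -> int:
    """Bytes currently allocated by Python, as seen by tracemalloc."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    current, _peak = tracemalloc.get_traced_memory()
    return current


class MemoryMonitorSketch:
    def __init__(
        self,
        limit_gb: float = 4.0,
        usage_bytes: Optional[Callable[[], int]] = None,
    ) -> None:
        self.limit_bytes = limit_gb * 1024**3
        # Fall back to tracemalloc when no reader is supplied.
        self._usage_bytes = usage_bytes or _traced_bytes

    def check_memory(self) -> bool:
        """True if usage is within the limit, False if exceeded."""
        return self._usage_bytes() <= self.limit_bytes


monitor = MemoryMonitorSketch(limit_gb=4.0)
print(monitor.check_memory())
```

A batch loop would typically call `check_memory()` between chunks and trigger `gc.collect()` when it returns False.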
- xraylabtool.data_handling.batch_processing.chunk_iterator(data, chunk_size)
Yield successive chunks of data.
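Chunking of this kind is usually a short generator over slices. The sketch below assumes the input supports `len()` and slicing; the name `chunk_iterator_sketch` marks it as an illustration, not the library function.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunk_iterator_sketch(data: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


print(list(chunk_iterator_sketch([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

The final chunk is shorter whenever the input length is not a multiple of `chunk_size`.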
- xraylabtool.data_handling.batch_processing.process_single_calculation(formula, energies, density)
Process a single X-ray calculation.
- xraylabtool.data_handling.batch_processing.process_batch_chunk(chunk, config)
Process a chunk of calculations in parallel.
- xraylabtool.data_handling.batch_processing.calculate_batch_properties(formulas, energies, densities, config=None)
Calculate X-ray properties for multiple materials with optimized batch processing.
This function processes large batches of calculations efficiently using chunking, parallel processing, and memory management.
- Parameters:
- Return type:
- Returns:
Dictionary mapping formulas to XRayResult objects
- Raises:
ValueError – If input validation fails
Examples
>>> import numpy as np
>>> from xraylabtool.data_handling.batch_processing import calculate_batch_properties
>>> formulas = ["SiO2", "SiO2", "Al2O3"]  # Same formula with different densities
>>> energies = np.linspace(5, 15, 101)  # 101 energy points
>>> densities = [2.2, 2.5, 3.95]  # Different densities for SiO2
>>> results = calculate_batch_properties(formulas, energies, densities)
>>> print(f"Processed {len(results)} materials")
Processed 3 materials
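The combination of input validation, chunking, and parallel execution can be sketched with `concurrent.futures` from the standard library. This is an illustration under assumptions: `run_batch` and the `calc` callable are hypothetical, a thread pool is used for simplicity (the library may use processes instead), and results are returned as an ordered list, not a dictionary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence


def run_batch(
    formulas: Sequence[str],
    densities: Sequence[float],
    calc: Callable[[str, float], Any],
    chunk_size: int = 100,
    max_workers: Optional[int] = None,
) -> List[Any]:
    """Apply calc to each (formula, density) pair, one chunk at a time."""
    if len(formulas) != len(densities):
        raise ValueError("formulas and densities must have the same length")
    jobs = list(zip(formulas, densities))
    results: List[Any] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(jobs), chunk_size):
            chunk = jobs[start:start + chunk_size]
            # pool.map preserves input order within the chunk.
            results.extend(pool.map(lambda job: calc(*job), chunk))
    return results


# Placeholder calculation: echo the inputs back.
print(run_batch(["Si", "Al", "Cu"], [2.33, 2.70, 8.96], lambda f, d: (f, d), chunk_size=2))
```

Processing chunk by chunk bounds the number of in-flight results, which is where a memory check between chunks would naturally sit.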
- xraylabtool.data_handling.batch_processing.save_batch_results(results, output_file, format='csv', fields=None)
Save batch calculation results to file.
- Parameters:
- Raises:
ValueError – If format is not supported
IOError – If file cannot be written
- Return type:
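The format check and field selection described for saving results can be sketched as below. The helper `rows_to_text` is hypothetical, it serialises to a string instead of writing a file so the logic is easy to inspect, and the choice of CSV and JSON as the supported formats is an assumption; only `'csv'` appears in the documented signature.

```python
import csv
import io
import json
from typing import Any, Dict, List, Optional, Sequence


def rows_to_text(
    rows: Sequence[Dict[str, Any]],
    fmt: str = "csv",
    fields: Optional[List[str]] = None,
) -> str:
    """Serialise result rows, keeping only the requested fields."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"Unsupported format: {fmt!r}")
    if fields is None:
        fields = list(rows[0].keys()) if rows else []
    selected = [{key: row[key] for key in fields} for row in rows]
    if fmt == "json":
        return json.dumps(selected)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()


rows = [{"formula": "Si", "density": 2.33, "note": "example"}]
print(rows_to_text(rows, fields=["formula", "density"]))
```

Writing the returned text with `open(output_file, "w")` would surface I/O problems as `OSError`, of which `IOError` is an alias in Python 3.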
- xraylabtool.data_handling.batch_processing.load_batch_input(input_file, formula_column='formula', density_column='density', energy_column=None)
Load batch input data from file.
- Parameters:
- Return type:
- Returns:
Tuple of (formulas, densities, energies) where energies is either None or a list of numpy arrays
- Raises:
FileNotFoundError – If input file doesn’t exist
ValueError – If required columns are missing
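The column validation described above can be sketched with `csv.DictReader`. The helper `parse_batch_csv` is hypothetical and works on CSV text instead of a file path; it covers only the formula and density columns, leaves out the optional energy column, and assumes the input is CSV.

```python
import csv
import io
from typing import List, Tuple


def parse_batch_csv(
    text: str,
    formula_column: str = "formula",
    density_column: str = "density",
) -> Tuple[List[str], List[float]]:
    """Parse CSV text into parallel lists of formulas and densities."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in (formula_column, density_column) if c not in header]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    formulas: List[str] = []
    densities: List[float] = []
    for row in reader:
        formulas.append(row[formula_column].strip())
        densities.append(float(row[density_column]))
    return formulas, densities


print(parse_batch_csv("formula,density\nSi,2.33\nAl,2.70\n"))
```

Reading the text from disk with `open(input_file)` raises `FileNotFoundError` for a missing path without any extra code.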
Batch Processing Features
Memory Management: Automatic chunking for large datasets
Progress Tracking: Built-in progress bars with tqdm
Error Handling: Reliable error recovery and reporting
Parallel Processing: Multi-core support for independent calculations
Usage Example
from xraylabtool.data_handling.batch_processing import process_batch
materials = [
{"formula": "Si", "density": 2.33},
{"formula": "Al", "density": 2.70},
{"formula": "Cu", "density": 8.96}
]
energies = [5000, 8000, 10000, 12000]
results = process_batch(materials, energies, show_progress=True)
Performance Benchmarks
Typical performance characteristics:
| Operation | Cold Cache | Warm Cache |
|---|---|---|
| Single element lookup | ~0.5 ms | ~0.05 ms |
| Complex formula (SiO₂) | ~1.2 ms | ~0.1 ms |
| Batch 1000 materials | ~50 ms | ~8 ms |
| Energy array (100 points) | ~15 ms | ~1.5 ms |
Memory Usage
The atomic cache uses approximately:
Startup: ~10 MB for preloaded elements
Per element: ~50 KB for full energy range
Peak usage: Scales with number of unique elements used
Cache Management
from xraylabtool.data_handling.atomic_cache import clear_cache, preload_elements
# Clear all cached data
clear_cache()
# Preload specific elements for better performance
preload_elements(["Si", "O", "Al", "Fe"])
Data Sources
Atomic scattering factor data is sourced from:
CXRO Database: Center for X-ray Optics, Lawrence Berkeley National Laboratory
NIST Database: National Institute of Standards and Technology
Henke Tables: Widely used X-ray optical constants
The data files are in Henke format (.nff files) and cover the energy range from ~10 eV to ~100 keV. Values between tabulated energies are obtained by interpolation.
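Reading such a file can be sketched as below, assuming a whitespace-separated layout with one header line followed by three numeric columns (energy in eV, f1, f2). The `parse_nff_text` helper is hypothetical, and the sample rows are placeholders, not real tabulated values.

```python
from typing import List, Tuple


def parse_nff_text(text: str) -> List[Tuple[float, float, float]]:
    """Return (energy_eV, f1, f2) rows, skipping non-numeric lines."""
    rows: List[Tuple[float, float, float]] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # blank or malformed line
        try:
            energy_ev, f1, f2 = (float(p) for p in parts)
        except ValueError:
            continue  # header line such as "E(eV) f1 f2"
        rows.append((energy_ev, f1, f2))
    return rows


# Placeholder content in the assumed layout; NOT real scattering data.
SAMPLE = "E(eV)\tf1\tf2\n10.0\t1.5\t0.5\n20.0\t2.5\t0.25\n"
print(parse_nff_text(SAMPLE))
```

The parsed rows provide the grid on which interpolation between tabulated energies operates.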