Performance and Optimization¶

Key Features: Atomic data cache (10-50x speedup), vectorized calculations, batch processing

Typical Performance: - Single calculation: < 0.1 ms - Batch 1000 materials: < 10 ms - Energy array (100 points): < 1 ms

Performance Benchmarks¶

Single Material Performance: - Simple element (Si): 0.5 ms → 0.05 ms (warm cache, 10x speedup) - Complex formula: 2.1 ms → 0.15 ms (warm cache, 14x speedup)

Batch Processing Scaling: - 1,000 materials: 1.5s sequential → 0.05s batch (30x speedup) - 100,000 materials: 150s sequential → 2.5s batch (60x speedup)

Memory Usage: - Atomic data cache: 10-50 MB - Batch 1000 materials: 2-5 MB - Energy array (1000 points): 8-15 MB

Optimization Strategies¶

Caching:

from xraylabtool.data_handling.atomic_cache import preload_elements
import xraylabtool as xrt

# Preload common elements
preload_elements(["Si", "O", "Al", "Fe", "C", "N"])

# Configure caching
xrt.configure_cache(disk_cache=True, max_memory_mb=100)

Batch Processing:

# Efficient batch processing
results = xrt.calculate_xray_properties(materials, energies)

# For large datasets, use chunks
results = xrt.calculate_xray_properties(
    materials, energies, chunk_size=1000
)

Energy Arrays:

import numpy as np

# Use logarithmic spacing
energies = np.logspace(3, 5, 100)  # 1-100 keV

# Adaptive spacing near edges
edge_region = np.linspace(7900, 8100, 200)
far_region = np.logspace(3, 5, 50)
energies = np.concatenate([far_region[far_region < 7900],
                          edge_region, far_region[far_region > 8100]])

Performance Monitoring¶

import xraylabtool as xrt
import time

# Built-in profiling
xrt.enable_profiling()
results = xrt.calculate_xray_properties(materials, energies)
stats = xrt.get_performance_stats()
print(f"Time: {stats['total_time']:.3f}s, Cache: {stats['cache_hit_rate']:.1%}")

# Custom benchmarking
start = time.time()
result = xrt.calculate_xray_properties(materials, energies)
print(f"Calculation time: {time.time() - start:.3f}s")

Platform Optimizations¶

# Check NumPy configuration
python -c "import numpy; numpy.show_config()"
conda install numpy  # Intel MKL optimized

import os
# Control threading
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'

Best Practices¶

Do: - Use batch processing for multiple materials - Preload common elements at startup - Use NumPy arrays for energy ranges - Profile code to identify bottlenecks

Don’t: - Process materials individually in loops - Use Python lists for large energy arrays - Clear caches unnecessarily - Use excessive energy points

Tuning Examples¶

Energy Scan Optimization:

# Bad: too many points
energies_bad = np.linspace(1000, 30000, 10000)

# Good: logarithmic spacing
energies_good = np.logspace(3, 4.5, 100)

# Best: adaptive spacing
low_e = np.logspace(3, 3.85, 30)
si_edge = np.linspace(1830, 1860, 50)
high_e = np.logspace(3.9, 4.5, 30)
energies_adaptive = np.concatenate([low_e, si_edge, high_e])

Large Dataset Processing:

def process_huge_dataset(filename, output_filename):
    import csv
    with open(filename, 'r') as infile, open(output_filename, 'w') as outfile:
        reader, writer = csv.DictReader(infile), csv.writer(outfile)
        batch, batch_size = [], 1000

        for row in reader:
            batch.append({'formula': row['formula'], 'density': float(row['density'])})
            if len(batch) >= batch_size:
                results = xrt.calculate_xray_properties(batch, [8000])
                for result in results:
                    writer.writerow([result.formula, result.density_g_cm3, ...])
                batch = []

Troubleshooting¶

Slow Calculations: - Check cache hit rate (should be >90%) - Verify optimized NumPy/BLAS installation - Use chunked processing for large datasets

High Memory Usage: - Process data in chunks - Clear caches: xrt.clear_cache() - Use generators for large datasets

Cache Misses: - Preload frequently used elements - Use consistent energy grids - Warm up cache before timing

Enhanced Performance Mode¶

New Optimizations: 20-40x speedup for single calculations, 2-3x faster data loading

# Enable optimizations
import os
os.environ['XRAYLABTOOL_ENABLE_OPTIMIZATIONS'] = '1'

Performance Improvements: - Single calculation: 2.1ms → 0.05ms (42x speedup) - Data loading: 18-21ms → 6-7ms (2.9x speedup) - Arrays: 1.4x speedup for 50-500 point arrays

Future Plans: GPU acceleration, JIT compilation, distributed processing