Performance and Optimization
============================

**Key Features:** Atomic data cache (10-50x speedup), vectorized calculations, batch processing

**Typical Performance:**

- Single calculation: < 0.1 ms
- Batch 1000 materials: < 10 ms
- Energy array (100 points): < 1 ms

Performance Benchmarks
----------------------

**Single Material Performance:**

- Simple element (Si): 0.5 ms → 0.05 ms (warm cache, 10x speedup)
- Complex formula: 2.1 ms → 0.15 ms (warm cache, 14x speedup)

**Batch Processing Scaling:**

- 1,000 materials: 1.5 s sequential → 0.05 s batch (30x speedup)
- 100,000 materials: 150 s sequential → 2.5 s batch (60x speedup)

**Memory Usage:**

- Atomic data cache: 10-50 MB
- Batch 1000 materials: 2-5 MB
- Energy array (1000 points): 8-15 MB

Optimization Strategies
-----------------------

**Caching:**

.. code-block:: python

   from xraylabtool.data_handling.atomic_cache import preload_elements
   import xraylabtool as xrt

   # Preload common elements
   preload_elements(["Si", "O", "Al", "Fe", "C", "N"])

   # Configure caching
   xrt.configure_cache(disk_cache=True, max_memory_mb=100)

**Batch Processing:**

.. code-block:: python

   # Efficient batch processing: one call for all materials and energies
   results = xrt.calculate_xray_properties(materials, energies)

   # For large datasets, process in chunks to bound memory use
   results = xrt.calculate_xray_properties(
       materials, energies, chunk_size=1000
   )

**Energy Arrays:**

.. code-block:: python

   import numpy as np

   # Use logarithmic spacing (energies in eV)
   energies = np.logspace(3, 5, 100)  # 1-100 keV

   # Adaptive spacing: dense linear grid near an edge, coarse log grid elsewhere
   edge_region = np.linspace(7900, 8100, 200)
   far_region = np.logspace(3, 5, 50)
   energies = np.concatenate([
       far_region[far_region < 7900],
       edge_region,
       far_region[far_region > 8100],
   ])

Performance Monitoring
----------------------

.. code-block:: python

   import time

   import xraylabtool as xrt

   # Built-in profiling
   xrt.enable_profiling()
   results = xrt.calculate_xray_properties(materials, energies)
   stats = xrt.get_performance_stats()
   print(f"Time: {stats['total_time']:.3f}s, Cache: {stats['cache_hit_rate']:.1%}")

   # Custom benchmarking; perf_counter is a monotonic, high-resolution clock
   start = time.perf_counter()
   result = xrt.calculate_xray_properties(materials, energies)
   print(f"Calculation time: {time.perf_counter() - start:.3f}s")

Platform Optimizations
----------------------

.. code-block:: bash

   # Check which BLAS/LAPACK libraries NumPy is linked against
   python -c "import numpy; numpy.show_config()"

   # The default Anaconda channel typically ships an MKL-linked build
   conda install numpy

.. code-block:: python

   import os

   # Control threading; set these before importing NumPy so they take effect
   os.environ['OMP_NUM_THREADS'] = '4'
   os.environ['MKL_NUM_THREADS'] = '4'

Best Practices
--------------

**Do:**

- Use batch processing for multiple materials
- Preload common elements at startup
- Use NumPy arrays for energy ranges
- Profile code to identify bottlenecks

**Don't:**

- Process materials individually in loops
- Use Python lists for large energy arrays
- Clear caches unnecessarily
- Use more energy points than the features of interest require

Tuning Examples
---------------

**Energy Scan Optimization:**

.. code-block:: python

   import numpy as np

   # Bad: too many points
   energies_bad = np.linspace(1000, 30000, 10000)

   # Good: logarithmic spacing
   energies_good = np.logspace(3, 4.5, 100)

   # Best: adaptive spacing, dense near the Si K-edge (about 1839 eV)
   low_e = np.logspace(3, 3.85, 30)
   si_edge = np.linspace(1830, 1860, 50)
   high_e = np.logspace(3.9, 4.5, 30)
   # si_edge falls inside the low_e range, so sort to keep the grid monotonic
   energies_adaptive = np.sort(np.concatenate([low_e, si_edge, high_e]))

**Large Dataset Processing:**

.. code-block:: python

   import csv

   import xraylabtool as xrt


   def process_huge_dataset(filename, output_filename):
       """Stream a CSV of materials through the calculator in batches."""
       batch_size = 1000

       def flush(batch, writer):
           results = xrt.calculate_xray_properties(batch, [8000])
           for result in results:
               writer.writerow([result.formula, result.density_g_cm3, ...])

       with open(filename, 'r', newline='') as infile, \
               open(output_filename, 'w', newline='') as outfile:
           reader, writer = csv.DictReader(infile), csv.writer(outfile)
           batch = []
           for row in reader:
               batch.append({'formula': row['formula'],
                             'density': float(row['density'])})
               if len(batch) >= batch_size:
                   flush(batch, writer)
                   batch = []
           if batch:  # do not drop the final, partially filled batch
               flush(batch, writer)

Troubleshooting
---------------

**Slow Calculations:**

- Check cache hit rate (should be >90%)
- Verify optimized NumPy/BLAS installation
- Use chunked processing for large datasets

**High Memory Usage:**

- Process data in chunks
- Clear caches: ``xrt.clear_cache()``
- Use generators for large datasets

**Cache Misses:**

- Preload frequently used elements
- Use consistent energy grids
- Warm up cache before timing

Enhanced Performance Mode
-------------------------

**New Optimizations:** 20-40x speedup for single calculations, 2-3x faster data loading

.. code-block:: python

   # Enable optimizations
   import os

   os.environ['XRAYLABTOOL_ENABLE_OPTIMIZATIONS'] = '1'

**Performance Improvements:**

- Single calculation: 2.1 ms → 0.05 ms (42x speedup)
- Data loading: 18-21 ms → 6-7 ms (2.9x speedup)
- Arrays: 1.4x speedup for 50-500 point arrays

**Future Plans:** GPU acceleration, JIT compilation, distributed processing
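
The adaptive-spacing idea from the energy grid examples can be wrapped in a small reusable helper. The sketch below uses only NumPy; the function name ``adaptive_energy_grid``, its parameters, and its defaults are illustrative assumptions and are not part of xraylabtool. It overlays a dense linear window around each edge on a coarse logarithmic background, and relies on ``np.unique`` to return a sorted grid without duplicates.

```python
import numpy as np


def adaptive_energy_grid(e_min, e_max, edges, half_width=100.0,
                         n_coarse=50, n_fine=200):
    """Return a sorted energy grid (eV) that is dense near each edge.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    # Coarse logarithmic background over the full range.
    pieces = [np.logspace(np.log10(e_min), np.log10(e_max), n_coarse)]
    for edge in edges:
        # Dense linear window around the edge, clipped to the requested range.
        lo = max(e_min, edge - half_width)
        hi = min(e_max, edge + half_width)
        pieces.append(np.linspace(lo, hi, n_fine))
    # np.unique sorts the values and drops exact duplicates.
    return np.unique(np.concatenate(pieces))


# Example: 1-100 keV with dense windows at 1839 eV (Si K-edge) and 8000 eV.
energies = adaptive_energy_grid(1e3, 1e5, edges=[1839.0, 8000.0])
```

Because the result is always strictly increasing, it can be reused across calculations as a consistent grid regardless of how the windows overlap.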
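
The troubleshooting advice to process data in chunks and to use generators can be combined in one standard-library helper. This is a minimal sketch: ``iter_chunks`` is a hypothetical name, and the material records are stand-ins. In practice each chunk would be handed to the batch calculation, so only one chunk is held in memory at a time.

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in records produced lazily; nothing is materialized up front.
materials = ({"formula": "SiO2", "density": 2.2} for _ in range(2500))
chunk_sizes = [len(chunk) for chunk in iter_chunks(materials, 1000)]
print(chunk_sizes)  # [1000, 1000, 500]
```

Unlike a manual accumulate-and-reset loop, this pattern yields the final partial chunk automatically.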
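
Warming up the cache before timing matters because the first call pays the loading cost that later calls skip. The sketch below illustrates the effect with a stand-in loader cached by ``functools.lru_cache``; ``load_element`` and ``time_call`` are hypothetical names, and the 10 ms sleep only simulates a slow data load. It does not use or measure xraylabtool itself.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_element(symbol):
    """Stand-in for a slow atomic-data load (simulated with a short sleep)."""
    time.sleep(0.01)
    return len(symbol)


def time_call(func, *args, warmup=1, repeats=5):
    """Return the best wall-clock time of func(*args) after warm-up calls."""
    for _ in range(warmup):
        func(*args)  # populate caches so they do not distort the timing
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Cold call: includes the simulated load.
cold_start = time.perf_counter()
load_element("Si")
cold = time.perf_counter() - cold_start

# Warm calls: served from the cache.
warm = time_call(load_element, "Si")
print(f"cold: {cold * 1e3:.2f} ms, warm: {warm * 1e3:.4f} ms")
```

Reporting the minimum over several repeats reduces the influence of scheduling noise on short measurements.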