Performance and Optimization
Key Features: Atomic data cache (10-50x speedup), vectorized calculations, batch processing
Typical Performance:
- Single calculation: < 0.1 ms
- Batch 1000 materials: < 10 ms
- Energy array (100 points): < 1 ms
Performance Benchmarks
Single Material Performance:
- Simple element (Si): 0.5 ms → 0.05 ms (warm cache, 10x speedup)
- Complex formula: 2.1 ms → 0.15 ms (warm cache, 14x speedup)
Batch Processing Scaling:
- 1,000 materials: 1.5 s sequential → 0.05 s batch (30x speedup)
- 100,000 materials: 150 s sequential → 2.5 s batch (60x speedup)
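The speedup factors in the batch scaling figures are the ratio of sequential to batch wall time. The short check below reproduces that arithmetic from the timings listed above and also derives the per-material cost in batch mode; the `speedup` helper is illustrative only and not part of the library.

```python
# Timings (seconds) copied from the batch scaling figures above.
timings = {
    1_000: {"sequential": 1.5, "batch": 0.05},
    100_000: {"sequential": 150.0, "batch": 2.5},
}


def speedup(sequential_s, batch_s):
    """Return how many times faster the batch run is."""
    return sequential_s / batch_s


for n_materials, t in timings.items():
    factor = speedup(t["sequential"], t["batch"])
    per_material_us = t["batch"] / n_materials * 1e6
    print(f"{n_materials:>7,} materials: {factor:.0f}x, "
          f"{per_material_us:.0f} us per material in batch mode")
```

The per-material cost in batch mode drops from 50 µs at 1,000 materials to 25 µs at 100,000, which is why the speedup grows with dataset size.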
Memory Usage:
- Atomic data cache: 10-50 MB
- Batch 1000 materials: 2-5 MB
- Energy array (1000 points): 8-15 MB
Optimization Strategies
Caching:
from xraylabtool.data_handling.atomic_cache import preload_elements
import xraylabtool as xrt
# Preload common elements
preload_elements(["Si", "O", "Al", "Fe", "C", "N"])
# Configure caching
xrt.configure_cache(disk_cache=True, max_memory_mb=100)
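To see why preloading helps, the following is a minimal sketch of a warm versus cold lookup using only the standard library. It does not show how xraylabtool implements its cache: `load_atomic_data` and `preload` are hypothetical stand-ins, and the 10 ms sleep merely simulates reading atomic data from disk.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_atomic_data(symbol):
    """Hypothetical stand-in for an expensive atomic-data lookup."""
    time.sleep(0.01)  # simulate reading scattering-factor tables from disk
    return {"symbol": symbol}


def preload(symbols):
    """Warm the cache so later lookups skip the slow path."""
    for symbol in symbols:
        load_atomic_data(symbol)


preload(["Si", "O"])

start = time.perf_counter()
load_atomic_data("Si")  # warm: served from the in-memory cache
warm_s = time.perf_counter() - start

start = time.perf_counter()
load_atomic_data("Fe")  # cold: pays the simulated load cost
cold_s = time.perf_counter() - start

print(f"warm: {warm_s * 1e3:.3f} ms, cold: {cold_s * 1e3:.3f} ms")
print(load_atomic_data.cache_info())
```

Elements that were preloaded return almost immediately, while the first request for an element outside the preloaded set pays the full load cost once.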
Batch Processing:
# Efficient batch processing
results = xrt.calculate_xray_properties(materials, energies)
# For large datasets, use chunks
results = xrt.calculate_xray_properties(
    materials, energies, chunk_size=1000
)
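The general idea behind chunked processing can be sketched as follows. This is an illustration under stated assumptions, not the library's internal behaviour: `calculate_stub` is a hypothetical placeholder for the real batch call, and `calculate_in_chunks` is a helper name introduced here.

```python
def calculate_stub(materials, energies):
    """Hypothetical stand-in for a batch calculation; one result per material."""
    return [(m, len(energies)) for m in materials]


def calculate_in_chunks(materials, energies, chunk_size=1000):
    """Process `materials` in fixed-size slices and concatenate the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(materials), chunk_size):
        chunk = materials[start:start + chunk_size]
        results.extend(calculate_stub(chunk, energies))
    return results


materials = [f"material_{i}" for i in range(2500)]
results = calculate_in_chunks(materials, [8000.0], chunk_size=1000)
print(len(results))  # processed as chunks of 1000, 1000 and 500 materials
```

Only one chunk's intermediate data is alive at a time, so peak memory is bounded by the chunk size and not by the full dataset.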
Energy Arrays:
import numpy as np
# Use logarithmic spacing
energies = np.logspace(3, 5, 100) # 1-100 keV
# Adaptive spacing near edges
edge_region = np.linspace(7900, 8100, 200)
far_region = np.logspace(3, 5, 50)
energies = np.concatenate([far_region[far_region < 7900],
                           edge_region, far_region[far_region > 8100]])
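The adaptive-spacing pattern above can be wrapped in a small reusable helper. The sketch below assumes energies in eV; `adaptive_grid` and its parameters are names introduced here for illustration and are not part of xraylabtool.

```python
import numpy as np


def adaptive_grid(e_min, e_max, edge, half_width, n_coarse=50, n_fine=200):
    """Log-spaced grid over [e_min, e_max] with a dense linear window at `edge`."""
    coarse = np.logspace(np.log10(e_min), np.log10(e_max), n_coarse)
    lo, hi = edge - half_width, edge + half_width
    fine = np.linspace(lo, hi, n_fine)
    # Drop coarse points inside the window, then merge into one sorted grid.
    outside = coarse[(coarse < lo) | (coarse > hi)]
    return np.unique(np.concatenate([outside, fine]))


energies = adaptive_grid(1e3, 1e5, edge=8000.0, half_width=100.0)
print(energies.size, energies.min(), energies.max())
```

`np.unique` returns a sorted array without duplicates, so the result stays strictly increasing wherever the dense window falls inside the coarse range.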
Performance Monitoring
import xraylabtool as xrt
import time
# Built-in profiling
xrt.enable_profiling()
results = xrt.calculate_xray_properties(materials, energies)
stats = xrt.get_performance_stats()
print(f"Time: {stats['total_time']:.3f}s, Cache: {stats['cache_hit_rate']:.1%}")
# Custom benchmarking (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
result = xrt.calculate_xray_properties(materials, energies)
print(f"Calculation time: {time.perf_counter() - start:.3f}s")
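A single timed call mixes cold-cache cost with steady-state cost. One way to separate them is a small harness that runs untimed warm-up calls first and then reports the best of several timed runs. The sketch below uses only the standard library; `benchmark` and `workload` are hypothetical names, and `workload` is a placeholder for the calculation being measured.

```python
import time


def benchmark(func, *args, warmup=1, repeats=5, **kwargs):
    """Return the best wall time in seconds over `repeats` timed calls."""
    for _ in range(warmup):
        func(*args, **kwargs)  # untimed: lets caches fill first
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


def workload(n):
    """Placeholder for the calculation being measured."""
    return sum(i * i for i in range(n))


best = benchmark(workload, 100_000)
print(f"best of 5: {best * 1e3:.3f} ms")
```

Taking the minimum and not the mean reduces the influence of unrelated system activity on the reported figure.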
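Platform Optimizations
# Shell: check which BLAS/LAPACK libraries NumPy is linked against
python -c "import numpy; numpy.show_config()"
conda install numpy # Intel MKL optimized
# Python: control threading. Set these before importing NumPy,
# because the BLAS/OpenMP libraries read them when they are loaded.
import os
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'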
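Thread-count variables only take effect if they are set before the numerical libraries are loaded, so it can help to centralize them in one function called at the very top of a script. The following is a sketch under that assumption; `limit_blas_threads` is a hypothetical helper, and `OPENBLAS_NUM_THREADS` is included for NumPy builds linked against OpenBLAS.

```python
import os


def limit_blas_threads(n_threads):
    """Set common BLAS/OpenMP thread-count variables for this process."""
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n_threads)


# Call this before the first `import numpy`; the libraries read these
# variables when they are loaded.
limit_blas_threads(min(4, os.cpu_count() or 1))
print({k: v for k, v in os.environ.items() if k.endswith("_NUM_THREADS")})
```

Capping at `os.cpu_count()` avoids oversubscribing machines with fewer than four cores.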
Best Practices
Do:
- Use batch processing for multiple materials
- Preload common elements at startup
- Use NumPy arrays for energy ranges
- Profile code to identify bottlenecks
Don’t:
- Process materials individually in loops
- Use Python lists for large energy arrays
- Clear caches unnecessarily
- Use excessive energy points
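Tuning Examples
Energy Scan Optimization:
import numpy as np
# Bad: 10,000 linearly spaced points, far more than a smooth spectrum needs
energies_bad = np.linspace(1000, 30000, 10000)
# Good: logarithmic spacing
energies_good = np.logspace(3, 4.5, 100)
# Best: adaptive spacing, dense only around the Si K-edge (~1839 eV)
low_e = np.logspace(3, 3.85, 30)
si_edge = np.linspace(1830, 1860, 50)
high_e = np.logspace(3.9, 4.5, 30)
# Sort after concatenating: si_edge lies inside the range covered by low_e
energies_adaptive = np.sort(np.concatenate([low_e, si_edge, high_e]))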
Large Dataset Processing:
import csv
import xraylabtool as xrt
def process_huge_dataset(filename, output_filename):
    # newline='' lets the csv module control line endings
    with open(filename, 'r', newline='') as infile, \
            open(output_filename, 'w', newline='') as outfile:
        reader, writer = csv.DictReader(infile), csv.writer(outfile)
        def flush(batch):
            # Calculate one batch and write its results
            results = xrt.calculate_xray_properties(batch, [8000])
            for result in results:
                writer.writerow([result.formula, result.density_g_cm3, ...])
        batch, batch_size = [], 1000
        for row in reader:
            batch.append({'formula': row['formula'], 'density': float(row['density'])})
            if len(batch) >= batch_size:
                flush(batch)
                batch = []
        if batch:  # process the final partial batch
            flush(batch)
Troubleshooting
Slow Calculations:
- Check cache hit rate (should be >90%)
- Verify optimized NumPy/BLAS installation
- Use chunked processing for large datasets
High Memory Usage:
- Process data in chunks
- Clear caches: xrt.clear_cache()
- Use generators for large datasets
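The generator approach can be sketched with the standard library alone. In this illustration, `read_materials` and `batched` are hypothetical helper names, and an in-memory CSV stands in for a large file on disk; rows are parsed only as each batch is requested.

```python
import csv
import io
from itertools import islice


def read_materials(handle):
    """Lazily yield one material dict per CSV row."""
    for row in csv.DictReader(handle):
        yield {"formula": row["formula"], "density": float(row["density"])}


def batched(iterable, size):
    """Yield lists of up to `size` items without materializing the input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


# In-memory CSV standing in for a large file on disk.
data = io.StringIO("formula,density\nSi,2.33\nSiO2,2.2\nAl2O3,3.95\n")
batch_sizes = [len(batch) for batch in batched(read_materials(data), 2)]
print(batch_sizes)
```

Because both functions are generators, at most one batch of rows is held in memory at any time, and the final partial batch is yielded automatically.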
Cache Misses:
- Preload frequently used elements
- Use consistent energy grids
- Warm up cache before timing
Enhanced Performance Mode
New Optimizations: 20-40x speedup for single calculations, 2-3x faster data loading
# Enable optimizations
import os
os.environ['XRAYLABTOOL_ENABLE_OPTIMIZATIONS'] = '1'
Performance Improvements:
- Single calculation: 2.1 ms → 0.05 ms (42x speedup)
- Data loading: 18-21 ms → 6-7 ms (2.9x speedup)
- Arrays: 1.4x speedup for 50-500 point arrays
Future Plans: GPU acceleration, JIT compilation, distributed processing