Machine Learning Experiment Design Prompt
Overview
This prompt guides you through designing robust machine learning experiments that produce reliable, reproducible results.
Experiment Framework
1. Problem Definition
- Business Objective: What problem are you solving?
- Success Metrics: How will you measure success?
- Constraints: Time, budget, computational resources
- Stakeholder Requirements: What outcomes matter most?
2. Data Strategy
- Data Sources: Where will data come from?
- Data Quality: Completeness, accuracy, bias assessment
- Data Splitting: Train/validation/test sets
- Data Preprocessing: Cleaning, feature engineering, normalization
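As a concrete reference for the splitting step above, here is a minimal Python sketch of a stratified 70/15/15 split; the DataFrame `df` and the `target` label column are placeholder names, not assumptions about your data.

```python
# A minimal sketch of a stratified 70/15/15 split; `df` and the "target"
# column are placeholder names for your dataset and label.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_dataset(df: pd.DataFrame, label_col: str = "target", seed: int = 42):
    """Return stratified train (70%), validation (15%), and test (15%) frames."""
    train_df, holdout_df = train_test_split(
        df, test_size=0.30, stratify=df[label_col], random_state=seed
    )
    val_df, test_df = train_test_split(
        holdout_df, test_size=0.50, stratify=holdout_df[label_col], random_state=seed
    )
    return train_df, val_df, test_df
```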
3. Model Selection
- Algorithm Choices: Why these specific algorithms?
- Baseline Models: Simple models for comparison
- Complexity Trade-offs: Accuracy vs. interpretability vs. speed
- Ensemble Methods: When and how to combine models
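Before investing in complexity, measure every candidate against a trivial baseline. The sketch below uses synthetic data as a stand-in for your dataset; the majority-class dummy model and the gradient-boosting candidate are illustrative choices, not recommendations.

```python
# A minimal sketch: synthetic data stands in for your dataset, a majority-class
# dummy model is the baseline, and gradient boosting is an illustrative candidate.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
candidate = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("baseline F1 :", f1_score(y_val, baseline.predict(X_val)))
print("candidate F1:", f1_score(y_val, candidate.predict(X_val)))
```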
Experimental Design
Controlled Experiments
- Control Group: The current system or production model
- Treatment Group: The new model or proposed changes
- Randomization: Assign units to groups at random so the comparison is fair (see the sketch after this list)
- Blinding: Keep evaluators unaware of group assignment to remove bias from evaluation
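One common way to implement randomization in practice is deterministic, hash-based assignment, so the same unit always lands in the same group. A minimal sketch, assuming string user IDs; the salt `"exp-001"` and the 50/50 split are placeholders.

```python
# A minimal sketch of deterministic, hash-based randomization: a user always
# lands in the same group. The salt "exp-001" and 50/50 split are placeholders.
import hashlib

def assign_group(user_id: str, salt: str = "exp-001",
                 treatment_share: float = 0.5) -> str:
    """Map a user ID to 'treatment' or 'control' via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000
    return "treatment" if bucket < treatment_share * 10_000 else "control"

print(assign_group("user-12345"))  # stable across calls and machines
```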
A/B Testing Framework
- Sample Size Calculation: Statistical power analysis
- Duration: How long to run the experiment
- Success Criteria: Pre-defined thresholds
- Early Stopping: When to conclude the experiment
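When the experiment ends, the two arms can be compared against the pre-defined success criteria with a standard two-proportion z-test. A minimal sketch using `statsmodels`; the conversion counts below are purely illustrative.

```python
# A minimal sketch of comparing conversion rates at the end of an A/B test
# with a two-proportion z-test (the counts below are purely illustrative).
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 480]      # control, treatment
samples = [10_000, 10_000]    # users per arm

z_stat, p_value = proportions_ztest(count=conversions, nobs=samples)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
```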
Validation Strategy
Cross-Validation Techniques
- K-Fold CV: Standard validation method
- Stratified CV: Maintain class distribution
- Time Series CV: For temporal data
- Nested CV: Hyperparameter tuning + model selection
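Nested CV is the least familiar technique in this list, so here is a minimal sketch: the inner loop tunes a hyperparameter, the outer loop estimates generalization. The logistic-regression model, the `C` grid, and the synthetic data are illustrative assumptions.

```python
# A minimal sketch of nested CV: the inner loop tunes C for a logistic
# regression (an illustrative model), the outer loop estimates generalization.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
)
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(f"nested CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```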
Performance Metrics
- Classification: Accuracy, Precision, Recall, F1-Score, AUC-ROC
- Regression: MSE, RMSE, MAE, R²
- Ranking: NDCG, MAP, MRR
- Custom Metrics: Business-specific KPIs
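A minimal sketch of computing the classification metrics listed above with scikit-learn; the tiny arrays are illustrative stand-ins for your model's labels, hard predictions, and scores (binary labels assumed for these metric defaults).

```python
# A minimal sketch with tiny illustrative arrays; replace them with your
# model's outputs (binary labels assumed for these metric defaults).
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                   # ground-truth labels
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])   # predicted scores
y_pred = (y_prob >= 0.5).astype(int)                          # hard predictions at 0.5

report = {
    "accuracy":  accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall":    recall_score(y_true, y_pred),
    "f1":        f1_score(y_true, y_pred),
    "auc_roc":   roc_auc_score(y_true, y_prob),
}
print(report)
```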
Statistical Significance
Hypothesis Testing
- Null Hypothesis: No difference between models
- Alternative Hypothesis: New model is better
- P-Value: Probability of observing a result at least this extreme if the null hypothesis is true
- Confidence Intervals: Range of plausible values for the true effect size
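One common way to apply this to model comparison is a paired test on per-fold scores, since both models are evaluated on the same folds. A minimal sketch; the score values are purely illustrative.

```python
# A minimal sketch: a paired t-test on per-fold scores from two models
# evaluated on the same folds (the score values are purely illustrative).
import numpy as np
from scipy import stats

scores_current = np.array([0.81, 0.79, 0.83, 0.80, 0.82])  # illustrative
scores_new     = np.array([0.84, 0.82, 0.85, 0.83, 0.84])  # illustrative

t_stat, p_value = stats.ttest_rel(scores_new, scores_current)

diff = scores_new - scores_current
t_crit = stats.t.ppf(0.975, df=len(diff) - 1)   # two-sided 95% critical value
half_width = t_crit * stats.sem(diff)           # CI half-width
print(f"p = {p_value:.4f}, "
      f"95% CI = ({diff.mean() - half_width:.4f}, {diff.mean() + half_width:.4f})")
```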
Sample Size Considerations
- Power = 0.8 (80% chance of detecting a real effect)
- Alpha = 0.05 (5% false positive rate)
- Effect Size = minimum detectable difference
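These three quantities together determine the required sample size. A minimal sketch using `statsmodels` power analysis, assuming a two-sample t-test on a standardized effect size; Cohen's d = 0.2 is an illustrative choice.

```python
# A minimal sketch mapping the numbers above to a required sample size,
# assuming a two-sample t-test on a standardized effect size (Cohen's d).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.2,  # minimum detectable difference, standardized (illustrative)
    alpha=0.05,       # 5% false positive rate
    power=0.8,        # 80% chance of detecting a real effect
)
print(f"required sample size per group: {n_per_group:.0f}")
```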
Reproducibility Checklist
- Version control for code and data
- Random seed setting
- Environment documentation
- Data lineage tracking
- Model artifact storage
- Experiment logging
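A minimal sketch covering two items from the checklist above, seed setting and experiment logging; the JSONL log path is a placeholder, and deep-learning frameworks (PyTorch, TensorFlow) need their own seed calls in addition to these.

```python
# A minimal sketch of seed setting and lightweight experiment logging; the
# JSONL path is a placeholder, and deep-learning frameworks need their own seeds.
import json
import random
import time

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Fix the Python and NumPy random number generators."""
    random.seed(seed)
    np.random.seed(seed)

def log_experiment(name: str, params: dict, metrics: dict,
                   path: str = "experiments.jsonl") -> None:
    """Append one experiment record as a JSON line."""
    record = {"name": name, "timestamp": time.time(),
              "params": params, "metrics": metrics}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

set_seed(42)
log_experiment("baseline-vs-gbm", {"model": "gbm"}, {"f1": 0.81})  # illustrative
```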
Common Pitfalls
- Data Leakage: Information from the test set (or the future) leaking into training
- Overfitting: Models that don’t generalize
- Selection Bias: Non-representative samples
- Confirmation Bias: Seeking results that confirm hypotheses
- Multiple Testing: Inflated false positive rates
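Preprocessing leakage in particular is easy to prevent by fitting all transforms inside a pipeline, so each cross-validation fold sees only its own training statistics. A minimal sketch with synthetic stand-in data and an illustrative scaler-plus-model pipeline.

```python
# A minimal sketch: fitting the scaler inside a Pipeline means each CV fold
# is standardized using only its own training portion, preventing leakage.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data

pipeline = Pipeline([
    ("scale", StandardScaler()),          # fit on the training fold only
    ("model", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipeline, X, y, cv=5).mean())
```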
Documentation Template
- Experiment Name: [Descriptive name]
- Date: [YYYY-MM-DD]
- Hypothesis: [Clear, testable statement]
- Methodology: [Detailed experimental setup]
- Results: [Quantitative outcomes]
- Conclusions: [Business implications]
- Next Steps: [Follow-up actions]
Remember, well-designed experiments provide confidence in your results and guide data-driven decision making.