| Task | Metric | Formula / Definition | Measures | Best Used When | Limitation |
|---|---|---|---|---|---|
| Classification | Accuracy | Accuracy = (TP + TN) / (TP + TN + FP + FN) | Overall correctness | Classes are balanced | Misleading on imbalanced datasets |
| Classification | Precision | Precision = TP / (TP + FP) | How correct positive predictions are | False positives are costly | May miss many real positives |
| Classification | Recall | Recall = TP / (TP + FN) | How many actual positives are caught | Missing positives is dangerous | Can increase false alarms |
| Classification | F1 Score | F1 = 2 × (Precision × Recall) / (Precision + Recall) | Balance between precision & recall | Both FP and FN matter | Harder to interpret intuitively |
| Classification | ROC-AUC (Receiver Operating Characteristic – Area Under Curve) | Area under the ROC curve | Class separation capability | Comparing probabilistic classifiers | Can look good on imbalanced data |
| Classification | Log Loss / Cross-Entropy | Penalizes confident wrong predictions | Probability quality | Neural networks, probabilistic outputs | Less interpretable |
| Classification | MCC (Matthews Correlation Coefficient) | Correlation between predictions & truth | Balanced evaluation | Imbalanced datasets | More mathematically complex |
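The classification formulas above can be checked with a short sketch; the confusion-matrix counts below are made up purely for illustration:

```python
import math

# Hypothetical confusion-matrix counts for a binary classifier.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # overall correctness
precision = TP / (TP + FP)                   # of predicted positives, how many are right
recall    = TP / (TP + FN)                   # of actual positives, how many are caught
f1        = 2 * precision * recall / (precision + recall)

# MCC: correlation between predictions and ground truth, robust to imbalance.
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)
```

Note how precision and recall pull in opposite directions: with these counts, precision is higher than recall because the classifier misses 10 real positives (FN) but only raises 5 false alarms (FP), and F1 sits between the two.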
| Task | Metric | Formula / Definition | Measures | Best Used When | Limitation |
|---|---|---|---|---|---|
| Regression | MAE (Mean Absolute Error) | MAE = (1/n) Σ \|yᵢ − ŷᵢ\| | Average absolute error | You want an interpretable error | Treats all errors equally |
| Regression | MSE (Mean Squared Error) | MSE = (1/n) Σ (yᵢ − ŷᵢ)² | Squared prediction error | Large errors must be punished | Sensitive to outliers |
| Regression | RMSE (Root Mean Squared Error) | RMSE = √[(1/n) Σ (yᵢ − ŷᵢ)²] | Root of squared error | You need the same unit as the target | Still sensitive to outliers |
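A minimal sketch of the three regression metrics, using hypothetical targets and predictions:

```python
import math

# Hypothetical true values and model predictions for illustration.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

n = len(y_true)
mae  = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n    # same unit as target
mse  = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n  # squaring punishes large errors
rmse = math.sqrt(mse)                                         # back in the target's unit
```

Because MSE squares each residual, the single 1.5-unit miss here dominates it, while MAE weights that miss the same as the smaller ones; RMSE simply converts MSE back into the target's unit.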
| Task | Metric | Formula / Definition | Measures | Best Used When | Limitation |
|---|---|---|---|---|---|
| Ranking / Retrieval | Precision@K | Fraction of the top K results that are relevant | Retrieval accuracy in top results | Search, RAG, recommenders | Ignores missed relevant items |
| Ranking / Retrieval | Recall@K | Fraction of all relevant items retrieved in the top K | Retrieval coverage | RAG retrieval | Can retrieve irrelevant items |
| Ranking / Retrieval | MRR (Mean Reciprocal Rank) | Mean reciprocal rank of the first correct result | How early the first correct answer appears | QA systems, search | Ignores later results |
| Ranking / Retrieval | NDCG (Normalized Discounted Cumulative Gain) | Ranking quality with graded relevance | Overall ranking usefulness | Search, recommendation | More complex to compute |
| Ranking / Retrieval | MAP (Mean Average Precision) | Mean of average precision across queries | Retrieval quality across the dataset | Information retrieval | Computationally heavier |
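Precision@K, Recall@K, and MRR can be sketched in a few lines; the document IDs and relevance judgments below are made up for illustration:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# One query: the system's ranking vs. the ground-truth relevant set.
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}

p_at_3 = precision_at_k(ranked, relevant, 3)  # only d1 is relevant in the top 3
r_at_3 = recall_at_k(ranked, relevant, 3)     # 1 of 3 relevant items found
rr = reciprocal_rank(ranked, relevant)        # first relevant hit at rank 2

# MRR averages the reciprocal rank over several queries.
per_query_rr = [rr, reciprocal_rank(["d9", "d4"], {"d9"})]
mrr = sum(per_query_rr) / len(per_query_rr)
```

This also shows the limitations from the table: Precision@3 never sees the missed item d9, and MRR stops caring after the first hit at rank 2.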