🔑 1: Chunk by meaning, not by size
🔑 2: Keep chunks self-contained
“If this chunk is retrieved alone, does it still make sense?”
🔑 3: Avoid mixing unrelated topics
🔑 4: Include cause + symptom + fix (if applicable)
🔑 5: Maintain moderate size
- 50–200 words per chunk (rule of thumb)
🔑 6: Overlap slightly
🔑 7: Preserve important keywords
👉 These are retrieval anchors
🔑 8: Structure beats raw text
Structured chunks > plain sentences
🔑 9: One chunk = one intent
🔑 10: Think like a query
“What question would retrieve this?” => no answer means weak chunk
No comments:
Post a Comment