Container version: 1.5.6, tag: stable
Improvements
Self-Hosted
- Reduced pod startup times during warmup
- Introduced MAX_WARMUP_TOKENS=50000 to cap warmup tokens
  - Number of tokens used is the minimum of MAX_WARMUP_TOKENS and MAX_INPUT_TOKENS
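
The warmup cap described above can be sketched in a few lines of Python. This is only an illustration, not the container's actual code: the function name effective_warmup_tokens is hypothetical, treating 50000 as the default for MAX_WARMUP_TOKENS is an assumption based on the value shown in this note, and falling back to the warmup cap alone when MAX_INPUT_TOKENS is unset is also an assumption.

```python
import os


def effective_warmup_tokens(env=None):
    """Return the number of tokens a warmup pass would use.

    Hypothetical helper: mirrors the rule in the release note
    (minimum of MAX_WARMUP_TOKENS and MAX_INPUT_TOKENS).
    """
    env = os.environ if env is None else env

    # Assumption: 50000 is the default when the variable is not set.
    max_warmup = int(env.get("MAX_WARMUP_TOKENS", "50000"))

    # Assumption: if MAX_INPUT_TOKENS is unset, only the warmup cap applies.
    max_input = env.get("MAX_INPUT_TOKENS")
    if max_input is None:
        return max_warmup

    return min(max_warmup, int(max_input))


if __name__ == "__main__":
    print(effective_warmup_tokens())
```

Under these assumptions, a deployment with MAX_INPUT_TOKENS=8192 would warm up with 8192 tokens, while one with MAX_INPUT_TOKENS=200000 would be capped at 50000.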