tokenizer
Description
The tokenizer subcommand converts between text and token sequences: it encodes text into tokens and decodes a list of token IDs back into text. It can also display the vocabulary size and detailed tokenizer information, and export the vocabulary to a file. Both text and multimodal models are supported.
Usage
fastdeploy tokenizer --model MODEL [--encode TEXT] [--decode TOKENS] [--vocab-size] [--info] [--vocab-export FILE]
Parameters
| Parameter | Description | Default |
|---|---|---|
| --model MODEL, -m MODEL | Model path or name | None |
| --encode TEXT, -e TEXT | Encode text into a list of tokens | None |
| --decode TOKENS, -d TOKENS | Decode a list of token IDs, written as a quoted string such as "[1, 2, 3]", back into text | None |
| --vocab-size, -vs | Display the vocabulary size | None |
| --info, -i | Display detailed tokenizer information (special tokens, IDs, max length, etc.) | None |
| --vocab-export FILE, -ve FILE | Export the vocabulary to a file | None |
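Because --decode takes the whole token list as a single argument, a script that holds the IDs as separate values has to join them first. The helper below is a minimal sketch of that step. `to_token_list` is a hypothetical name, not part of FastDeploy, and the sketch assumes the CLI accepts the bracketed, comma-separated form used in this page's examples (such as "[1, 2, 3]").

```shell
# Hypothetical helper (not part of FastDeploy): join token IDs given as
# separate arguments into one bracketed, comma-separated string.
to_token_list() {
  local out id
  out=""
  for id in "$@"; do
    # Reject anything that is not a non-negative integer.
    case "$id" in
      ''|*[!0-9]*) echo "not a token ID: $id" >&2; return 1 ;;
    esac
    # Prepend ", " only when out already holds an ID.
    out="${out:+$out, }$id"
  done
  printf '[%s]\n' "$out"
}

# Possible use (requires fastdeploy to be installed):
#   fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle \
#     --decode "$(to_token_list 5300 96382)"
```

Validating the IDs before building the string keeps a malformed value from reaching the CLI, where the resulting error may be harder to trace back to the script.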
Examples
# 1. Encode text into tokens
# Convert input text into a token sequence recognizable by the model
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --encode "Hello, world!"
# 2. Decode tokens into text
# Convert a token sequence back into readable text
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --decode "[1, 2, 3]"
# 3. View vocabulary size
# Output the total number of tokens in the model’s vocabulary
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --vocab-size
# 4. View tokenizer details
# Includes special tokens, ID mappings, max length, etc.
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --info
# 5. Export vocabulary to a file
# Save the tokenizer’s vocabulary to a local file
fastdeploy tokenizer --model baidu/ERNIE-4.5-0.3B-Paddle --vocab-export ./vocab.txt
# 6. Support for multimodal models
# Decode tokens for a multimodal model
fastdeploy tokenizer --model baidu/EB-VL-Lite-d --decode "[5300, 96382]"
# 7. Combine multiple functions
# Encode, decode, show tokenizer details and vocabulary size, and export the vocabulary in a single command
fastdeploy tokenizer \
  -m baidu/ERNIE-4.5-0.3B-PT \
  -e "你好哇" \
  -d "[5300, 96382]" \
  -i \
  -vs \
  -ve vocab.json
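
When the export runs inside a script, it can help to confirm that a file was actually written before later steps read it. The following is a minimal sketch under stated assumptions: `export_vocab` is a hypothetical wrapper name, not part of FastDeploy; the command is assumed to signal failure through a non-zero exit status; and nothing is assumed about the exported file's contents beyond it being non-empty.

```shell
# Hypothetical wrapper (not part of FastDeploy): export a vocabulary and
# verify that a non-empty file was produced.
export_vocab() {
  local model out
  model="$1"
  out="$2"
  if [ -z "$model" ] || [ -z "$out" ]; then
    echo "usage: export_vocab MODEL FILE" >&2
    return 2
  fi
  # Assumption: a failed export yields a non-zero exit status.
  fastdeploy tokenizer --model "$model" --vocab-export "$out" || return 1
  # -s is true only if the file exists and its size is greater than zero.
  if [ ! -s "$out" ]; then
    echo "export produced no data: $out" >&2
    return 1
  fi
  echo "vocabulary written to $out"
}

# Possible use:
#   export_vocab baidu/ERNIE-4.5-0.3B-Paddle ./vocab.txt
```

The check is deliberately format-agnostic, since the examples on this page export to both a .txt and a .json path.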