Add vLLM support for gradio demo, inference script and OpenAI API demo #35
Conversation
Why do we need openai_server_demo/README_vllm.md?
Some contents in
scripts/inference/inference_hf.py
Outdated
model_vocab_size = base_model.get_input_embeddings().weight.size(0)
tokenzier_vocab_size = len(tokenizer)
print(f"Vocab of the base model: {model_vocab_size}")
print(f"Vocab of the tokenizer: {tokenzier_vocab_size}")
if model_vocab_size!=tokenzier_vocab_size:
    print("Resize model embeddings to fit tokenizer")
    base_model.resize_token_embeddings(tokenzier_vocab_size)
tokenzier->tokenizer
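The reviewer's suggestion is a pure rename of the misspelled variable. As a self-contained illustration of the same vocab-check-and-resize logic, here is a minimal sketch that uses a plain list of rows as a stand-in for the embedding matrix (the real script calls transformers' resize_token_embeddings on the model; the sizes below are toy values, not from the PR):

```python
import random

def resize_token_embeddings(weight_rows, new_vocab_size, dim=8):
    """Toy stand-in for transformers' resize_token_embeddings:
    keep the existing rows and append freshly initialized rows."""
    resized = list(weight_rows[:new_vocab_size])
    while len(resized) < new_vocab_size:
        resized.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return resized

# Same check as in inference_hf.py, with "tokenzier" spelled correctly.
base_weights = [[0.0] * 8 for _ in range(32000)]  # pretend base-model vocab
model_vocab_size = len(base_weights)
tokenizer_vocab_size = 32001  # e.g. the tokenizer gained an extra token
print(f"Vocab of the base model: {model_vocab_size}")
print(f"Vocab of the tokenizer: {tokenizer_vocab_size}")
if model_vocab_size != tokenizer_vocab_size:
    print("Resize model embeddings to fit tokenizer")
    base_weights = resize_token_embeddings(base_weights, tokenizer_vocab_size)
```

The point of the check is that the embedding table and the tokenizer must agree on vocab size before generation, otherwise out-of-range token ids index past the embedding matrix.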
Description
This PR adds support for the Gradio demo (gradio_demo.py), the inference script (inference_hf.py), and the OpenAI API demo (3 new files added in the directory openai_server_demo).

Related Issue
None.
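Since the OpenAI API demo exposes a vLLM-backed, OpenAI-compatible HTTP endpoint, any standard OpenAI-style client can talk to it. A minimal stdlib-only client sketch (the base URL, port, and model name are placeholders, not taken from the PR):

```python
import json
import urllib.request

def build_completion_request(base_url, model, prompt, max_tokens=128):
    """Build a POST request for an OpenAI-compatible /v1/completions
    endpoint, such as the one a vLLM API server exposes."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder server address and model name for illustration only.
req = build_completion_request("http://localhost:8000", "some-model", "Hello")
# Sending it would be: urllib.request.urlopen(req)  (requires a running server)
```

The request is only constructed here, not sent, so the sketch runs without a live server; in practice you would pass it to urllib.request.urlopen and parse the JSON response.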