%pip install sagemaker --upgrade --quiet
import os
import sagemaker
from sagemaker.djl_inference.model import DJLModel
role = sagemaker.get_execution_role() # execution role for the endpoint
session = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs
Step 2: Start building SageMaker endpoint¶
In this step, we will build a SageMaker endpoint from scratch.
Getting the container image URI (optional)¶
Check out the available images: Large Model Inference available DLC
# Choose a specific version of LMI image directly:
# image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.28.0-lmi10.0.0-cu124"
Create SageMaker model¶
Here we are using the LMI support in the SageMaker Python SDK (DJLModel) to create the model.
Check out more configuration options.
model_id = "mistralai/Mistral-7B-v0.1" # model will be download form Huggingface hub
hf_token = os.getenv("HF_TOKEN", "hf_XXXXXXXXXXX") # use your HF_TOKEN to access this model
env = {
"TENSOR_PARALLEL_DEGREE": "1", # use 1 GPU, set to "max" to use all GPUs on the instance
"HF_TOKEN": hf_token,
"OPTION_ROLLING_BATCH": "auto", # optional, enabled by default
"OPTION_TRUST_REMOTE_CODE": "true",
}
model = DJLModel(
    model_id=model_id,
    env=env,
    role=role,
)
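If you resolved or pinned an image URI in the optional step above, it can be passed through explicitly; without it, DJLModel selects a default LMI image. A minimal sketch (image_uri is forwarded to the underlying Model class):
# Sketch: pin the container explicitly. Assumes image_uri was set in the
# optional cell above; omit it to let DJLModel choose a default LMI image.
model = DJLModel(
    model_id=model_id,
    env=env,
    role=role,
    image_uri=image_uri,
)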
Create SageMaker endpoint¶
You need to specify the instance type to use and the endpoint name.
instance_type = "ml.g5.2xlarge"
endpoint_name = sagemaker.utils.name_from_base("lmi-model")
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
)
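If the notebook kernel restarts after deployment, there is no need to redeploy; a predictor can be reattached to the running endpoint by name. A minimal sketch, where the JSON serializer/deserializer choices are assumptions that match the payloads used in this notebook:
# Sketch: reattach to an already-running endpoint instead of redeploying.
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=session,
    serializer=JSONSerializer(),      # send dict payloads as JSON
    deserializer=JSONDeserializer(),  # parse JSON responses into Python objects
)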
Step 3: Run inference¶
predictor.predict(
    {
        "inputs": "tell me a story of the little red riding hood",
        "parameters": {"max_new_tokens": 128, "do_sample": True},
    }
)
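For reference, the same endpoint can also be invoked without the SageMaker Python SDK, for example from an application, via the low-level runtime client. A minimal sketch using boto3:
# Sketch: invoke the endpoint through the low-level SageMaker runtime API.
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name=session.boto_region_name)
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(
        {
            "inputs": "tell me a story of the little red riding hood",
            "parameters": {"max_new_tokens": 128, "do_sample": True},
        }
    ),
)
print(response["Body"].read().decode("utf-8"))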
Benchmark¶
This can be done outside of this notebook, in a bash terminal. awscurl is a benchmarking tool for SageMaker endpoints; it can be downloaded with:
curl -O https://publish.djl.ai/awscurl/awscurl && chmod +x awscurl
See Benchmarking your Endpoint for more details.
%%sh
curl -O https://publish.djl.ai/awscurl/awscurl && chmod +x awscurl
endpoint_url=f"https://runtime.sagemaker.{session._region_name}.amazonaws.com/endpoints/{endpoint_name}/invocations"
endpoint_url
!TOKENIZER={model_id} ./awscurl -c 4 -N 10 -n sagemaker {endpoint_url} \
-H "Content-type: application/json" \
-d '{{"inputs":"The new movie that got Oscar this year","parameters":{{"max_new_tokens":256, "do_sample":true, "temperature":0.8, "top_k":5}}}}' \
-t
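For a quick sanity check without awscurl, you can also time a few requests from Python. A rough sketch against the live endpoint (not a substitute for a proper load test):
# Sketch: rough client-side latency measurement using the predictor above.
import time

latencies = []
for _ in range(5):
    start = time.perf_counter()
    predictor.predict(
        {
            "inputs": "The new movie that got Oscar this year",
            "parameters": {"max_new_tokens": 64},
        }
    )
    latencies.append(time.perf_counter() - start)

print(f"average latency: {sum(latencies) / len(latencies):.2f}s over {len(latencies)} requests")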
Clean up¶
session.delete_endpoint(endpoint_name)
session.delete_endpoint_config(endpoint_name)
model.delete_model()