Triton Deployment at Coffea-Casa facility

This environment provides access to a Triton inference server deployment with 2 GPUs. Credentials for the Triton model repository are mounted in the session and are available through environment variables.

Available Deployment

  • Triton server accessible at triton-grpc.cmsaf-prod.flatiron.hollandhpc.org

  • 2 GPUs available for inference

  • Model repository backed by S3-compatible storage

  • Credentials mounted in the session via AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, TRITON_BUCKET_HOST, and TRITON_BUCKET_NAME

Uploading Models

Use the following example to upload a model directory to the mounted Triton model repository bucket:

import boto3
import os
from pathlib import Path

s3 = boto3.client(
    "s3",
    endpoint_url=f"http://{os.environ['TRITON_BUCKET_HOST']}",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

def upload_directory(local_path: str, bucket: str, s3_prefix: str = ""):
    local = Path(local_path)

    for file_path in local.rglob("*"):
        if file_path.is_file():
            s3_key = f"{s3_prefix}/{file_path.relative_to(local)}".lstrip("/")
            print(f"Uploading {file_path} -> s3://{bucket}/{s3_key}")
            s3.upload_file(str(file_path), bucket, s3_key)

upload_directory(
    local_path="reconstruction_bdt_xgb",
    bucket=os.environ["TRITON_BUCKET_NAME"],
    s3_prefix="reconstruction_bdt_xgb",
)

Accessing Triton from Coffea-Casa

Connect to Triton from coffea-casa using the GRPC endpoint:

  • triton-grpc.cmsaf-prod.flatiron.hollandhpc.org

If a port is required, use the standard Triton gRPC port, for example:

  • triton-grpc.cmsaf-prod.flatiron.hollandhpc.org:8001