1. Architecture
Inference runs entirely on the GPU machine the customer has paired. Prompts and responses never leave the customer boundary unless an administrator explicitly enables history on the OwnLLM side (opt-in, encrypted at rest).
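As an illustration only, not a description of OwnLLM's actual code, the sketch below shows what an opt-in, encrypted-at-rest history store could look like. The `store_history` helper, the per-tenant `opted_in` flag, and the use of the Python `cryptography` package are all assumptions made for the example.

```python
# Illustrative sketch only -- not OwnLLM's implementation.
# Assumes the `cryptography` package and a hypothetical per-tenant opt-in flag.
import json
from cryptography.fernet import Fernet

def store_history(record: dict, opted_in: bool, key: bytes, path: str) -> None:
    """Persist a prompt/response record only if the admin has opted in,
    encrypting it at rest with a symmetric key."""
    if not opted_in:
        return  # default behavior: nothing is retained
    token = Fernet(key).encrypt(json.dumps(record).encode("utf-8"))
    with open(path, "ab") as f:
        f.write(token + b"\n")  # one encrypted record per line
```

In this sketch, a key generated once with `Fernet.generate_key()` and kept in a secrets store would satisfy the encrypted-at-rest property; records are unreadable without it.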
No inbound port is opened on the customer side. Communication between the site and the agent travels over an outbound Cloudflare Tunnel that the app initiates, with TLS terminated on Cloudflare infrastructure.
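As a sketch of the customer-side setup in cloudflared's standard `config.yml` format: the tunnel name, credentials path, hostname, and local port below are placeholders, not OwnLLM's actual values.

```yaml
# Illustrative cloudflared config.yml -- names, paths, and ports are placeholders.
# The agent dials out to Cloudflare; nothing listens for inbound connections.
tunnel: ownllm-agent                     # placeholder tunnel name/UUID
credentials-file: /etc/cloudflared/ownllm-agent.json
ingress:
  - hostname: agent.example.com          # placeholder public hostname
    service: http://localhost:8080       # placeholder local agent port
  - service: http_status:404             # catch-all rule required by cloudflared
```

Because cloudflared establishes and maintains the connection outbound to Cloudflare's edge, the customer firewall only needs to permit outbound traffic; no inbound rule or port forwarding is required.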