Foundry Model Deployment Inference Job

The Foundry Model Deployment Inference job type synchronously calls a model deployment (chat or embedding) hosted in an Azure AI Foundry Project. The deployment is identified by its name, which Polysync stores in the External Id. A deployment is classified as embedding when its model name contains "embedding"; otherwise it is classified as chat.

This job type is supported on the Azure AI Foundry Project platform.

Required job fields

  • External Id: the deployment name.
  • Job Type: Foundry Model Deployment Inference (set automatically on import).

Job discovery

Control-plane (requires AAD auth):

GET /deployments?api-version=2023-05-01

Each deployment becomes a Job. Classification (chat vs embedding) is done from the deployment's model name.
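
The classification rule can be sketched in a few lines of Python. This is only an illustration, not Polysync's actual implementation: the function name classify_deployment is hypothetical, and the case-insensitive match is an assumption, since the rule above only says the model name "contains embedding".

```python
def classify_deployment(model_name: str) -> str:
    """Return "embedding" if the model name contains "embedding", else "chat"."""
    # Assumption: the match ignores case; the documented rule does not say.
    return "embedding" if "embedding" in model_name.lower() else "chat"


if __name__ == "__main__":
    for name in ("text-embedding-3-small", "gpt-4o"):
        print(name, "->", classify_deployment(name))
```

Under this rule, any deployment whose model name lacks the substring is treated as chat, so it is worth checking the classification of deployments with unusual model names after import.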

Parameter handling

Chat deployments

Parameter          Direction  Role
system_prompt      Input      Optional; falls back to job attribute OpenAI System Prompt.
user_prompt        Input      Required; becomes the user message.
response           Output     choices[0].message.content.
prompt_tokens      Output     usage.prompt_tokens.
completion_tokens  Output     usage.completion_tokens.
total_tokens       Output     usage.total_tokens.
finish_reason      Output     choices[0].finish_reason.

Request body:

{
  "messages": [
    { "role": "system", "content": "<system_prompt>" },
    { "role": "user",   "content": "<user_prompt>"   }
  ],
  "max_tokens":  <optional>,
  "temperature": <optional>
}
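
As a rough sketch of how the chat parameters map onto the request and response, the Python below builds the body shown above and reads the output parameters from a parsed response. The function names and the default_system_prompt argument (standing in for the OpenAI System Prompt job attribute) are hypothetical, and omitting the system message when no prompt is available is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional


def build_chat_body(
    user_prompt: str,
    system_prompt: Optional[str] = None,
    default_system_prompt: Optional[str] = None,  # stands in for the job attribute
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the chat request body; user_prompt is required."""
    if not user_prompt:
        raise ValueError("user_prompt is required")
    # The Input parameter wins; otherwise fall back to the job-level default.
    system = system_prompt if system_prompt is not None else default_system_prompt
    messages = []
    if system:  # assumption: no system message is sent when none is configured
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_prompt})
    body: Dict[str, Any] = {"messages": messages}
    # Optional fields are only included when a value is set.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if temperature is not None:
        body["temperature"] = temperature
    return body


def extract_chat_outputs(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map a parsed chat response onto the output parameters in the table."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "response": choice["message"]["content"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
        "finish_reason": choice.get("finish_reason"),
    }
```

For example, build_chat_body("Hi", default_system_prompt="Be brief", max_tokens=50) yields a two-message body with max_tokens set and no temperature key.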

Embedding deployments

Parameter      Direction  Role
input          Input      Text to embed.
embedding      Output     JSON array from data[0].embedding.
prompt_tokens  Output     usage.prompt_tokens.
total_tokens   Output     usage.total_tokens.

Request body: { "input": "<input>" }.
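
The embedding mapping can be sketched the same way. The function names are hypothetical, and serializing the vector to a JSON string for the embedding output is an assumption about how "JSON array" is stored.

```python
import json
from typing import Any, Dict


def build_embedding_body(text: str) -> Dict[str, Any]:
    """Build the embedding request body from the input parameter."""
    return {"input": text}


def extract_embedding_outputs(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map a parsed embeddings response onto the output parameters."""
    usage = response.get("usage", {})
    return {
        # Assumption: the vector is stored as a JSON-encoded array string.
        "embedding": json.dumps(response["data"][0]["embedding"]),
        "prompt_tokens": usage.get("prompt_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

Only the first element of data is read, which matches a single input string per call.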

Execution flow

Synchronous — no real polling:

  1. POST to the data-plane endpoint for the deployment type (scope: https://cognitiveservices.azure.com/.default):
       Embedding: POST /openai/deployments/{deployment}/embeddings?api-version=2024-10-21
       Chat: POST /openai/deployments/{deployment}/chat/completions?api-version=2024-10-21
  2. Response body is parsed and output parameters are written immediately.
  3. The composite RunId encodes the status: "{deployment}#sync#Success" (or …#Failed) — Polysync decodes the status from the RunId without re-calling the service.
  4. Cancel is not supported (synchronous).
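
The endpoint selection and the composite RunId described in the steps above can be illustrated with a short Python sketch. The helper names are hypothetical and the validation in decode_status is an assumption; only the path templates, the api-version, and the "{deployment}#sync#Success" / "#Failed" format come from the steps themselves.

```python
from urllib.parse import quote

API_VERSION = "2024-10-21"


def inference_path(deployment: str, kind: str) -> str:
    """Return the data-plane path for an "embedding" or "chat" deployment."""
    operation = "embeddings" if kind == "embedding" else "chat/completions"
    # URL-encode the deployment name so it is safe as a single path segment.
    return (
        f"/openai/deployments/{quote(deployment, safe='')}"
        f"/{operation}?api-version={API_VERSION}"
    )


def encode_run_id(deployment: str, succeeded: bool) -> str:
    """Encode the final status into the composite RunId."""
    return f"{deployment}#sync#{'Success' if succeeded else 'Failed'}"


def decode_status(run_id: str) -> str:
    """Recover the status from a RunId without calling the service."""
    # Split from the right so a '#' inside the deployment name is tolerated.
    parts = run_id.rsplit("#", 2)
    if len(parts) != 3 or parts[1] != "sync" or parts[2] not in ("Success", "Failed"):
        raise ValueError(f"not a synchronous RunId: {run_id!r}")
    return parts[2]
```

Because the status travels inside the RunId, a later status check is a pure string operation, which is why no polling request is needed.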

Monitor URL

Same as for Foundry Agent Run: the generic Cognitive Services account overview.

Best practices

  • Set OpenAI System Prompt, OpenAI Max Tokens, and OpenAI Temperature on the Job to provide defaults. Downstream Tasks can override the system prompt per execution by setting system_prompt as an Input parameter.
  • For RAG and tool-using flows, use Foundry Agent Run instead — it provides threads and tool semantics.

Troubleshooting

  • DeploymentNotFound during discovery: control-plane authentication failed. API Key auth does not support /deployments?…, so switch the Platform to AAD (Service Principal / Certificate / Polysync SPN) for discovery.
  • Long latency — synchronous chat calls block the dispatcher thread; for high-volume work prefer Azure OpenAI Batch Job.