Foundry Model Deployment Inference Job

The Foundry Model Deployment Inference job type synchronously calls a model deployment (chat or embedding) hosted in an Azure AI Foundry Project. The deployment is identified by its name, which Polysync stores in the External Id. A deployment is classified as embedding when its model name contains `embedding`; otherwise it is classified as chat.

This job type is supported on the Azure AI Foundry Project platform.

Required job fields

External Id — the deployment name.
Job Type — Foundry Model Deployment Inference (set automatically on import).

Job discovery

Control-plane (requires AAD auth):

GET /deployments?api-version=2023-05-01

Each returned deployment becomes a Job, classified as chat or embedding from its model name.
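The following is a minimal Python sketch of the discovery and classification step. It assumes the control-plane response has the usual ARM list shape (`value[].name`, `value[].properties.model.name`). The sample payload, the deployment names, and the function names are hypothetical, and the case-insensitive match is an assumption. HTTP calls and AAD authentication are left out, and this is not Polysync's implementation.

```python
import json

# Hypothetical sample shaped like an ARM "list deployments" response.
SAMPLE_LIST = json.loads("""
{
  "value": [
    {"name": "gpt4o-prod",
     "properties": {"model": {"format": "OpenAI", "name": "gpt-4o", "version": "2024-08-06"}}},
    {"name": "embed-small",
     "properties": {"model": {"format": "OpenAI", "name": "text-embedding-3-small", "version": "1"}}}
  ]
}
""")


def classify(model_name: str) -> str:
    """Return "embedding" if the model name contains "embedding", else "chat".

    Case-insensitive matching is an assumption of this sketch.
    """
    return "embedding" if "embedding" in model_name.lower() else "chat"


def discover_jobs(payload: dict) -> list:
    """Turn each listed deployment into a job record (hypothetical shape)."""
    jobs = []
    for dep in payload.get("value", []):
        model_name = dep.get("properties", {}).get("model", {}).get("name", "")
        jobs.append({
            "external_id": dep["name"],  # the deployment name is the External Id
            "kind": classify(model_name),
        })
    return jobs


if __name__ == "__main__":
    for job in discover_jobs(SAMPLE_LIST):
        print(job["external_id"], job["kind"])
```

Classification uses only the model name, not the deployment name. A deployment named "embed-small" that served a chat model would therefore still be treated as chat.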

Parameter handling

Chat deployments

Parameter	Direction	Role
`system_prompt`	`Input`	Optional; falls back to job attribute `OpenAI System Prompt`.
`user_prompt`	`Input`	Required; becomes the `user` message.
`response`	`Output`	`choices[0].message.content`.
`prompt_tokens`	`Output`	`usage.prompt_tokens`.
`completion_tokens`	`Output`	`usage.completion_tokens`.
`total_tokens`	`Output`	`usage.total_tokens`.
`finish_reason`	`Output`	`choices[0].finish_reason`.
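The output mapping in the table above can be sketched in Python as a lookup over the chat completions response body. The sample response is hypothetical and abbreviated. Returning `None` when `usage` is missing is an assumption of this sketch, not documented Polysync behavior.

```python
import json


def chat_outputs(response: dict) -> dict:
    """Map a chat completions response body to the chat output parameters."""
    choice = response["choices"][0]
    usage = response.get("usage", {})  # assumption: tolerate a missing usage block
    return {
        "response": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Hypothetical, abbreviated response body.
SAMPLE_RESPONSE = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"}, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

if __name__ == "__main__":
    print(json.dumps(chat_outputs(SAMPLE_RESPONSE), indent=2))
```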

Request body (`max_tokens` and `temperature` are optional):

{
  "messages": [
    { "role": "system", "content": "<system_prompt>" },
    { "role": "user",   "content": "<user_prompt>"   }
  ],
  "max_tokens":  <optional>,
  "temperature": <optional>
}
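Here is a hedged Python sketch of how this body could be assembled. It applies the `system_prompt` fallback to the `OpenAI System Prompt` job attribute. The following behaviors are assumptions: sourcing `max_tokens` and `temperature` from the `OpenAI Max Tokens` and `OpenAI Temperature` job attributes, omitting the system message when no prompt is available, and representing job attributes as a plain dict of strings. `build_chat_body` is a hypothetical helper name.

```python
import json


def build_chat_body(inputs: dict, job_attrs: dict) -> dict:
    """Build a chat/completions request body (hypothetical helper).

    system_prompt comes from the task input, else the job attribute
    "OpenAI System Prompt". max_tokens/temperature come from the
    "OpenAI Max Tokens"/"OpenAI Temperature" attributes (assumption).
    """
    if not inputs.get("user_prompt"):
        raise ValueError("user_prompt is required")

    system_prompt = inputs.get("system_prompt") or job_attrs.get("OpenAI System Prompt")
    messages = []
    if system_prompt:  # assumption: omit the system message when none is available
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": inputs["user_prompt"]})

    body = {"messages": messages}
    max_tokens = job_attrs.get("OpenAI Max Tokens")
    if max_tokens is not None:
        body["max_tokens"] = int(max_tokens)
    temperature = job_attrs.get("OpenAI Temperature")
    if temperature is not None:
        body["temperature"] = float(temperature)
    return body


if __name__ == "__main__":
    job = {"OpenAI System Prompt": "You are terse.", "OpenAI Max Tokens": "256"}
    print(json.dumps(build_chat_body({"user_prompt": "Summarize X."}, job), indent=2))
```

Here a task-level `system_prompt` takes precedence over the job attribute, and an empty string is treated as absent.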

Embedding deployments

Parameter	Direction	Role
`input`	`Input`	Text to embed.
`embedding`	`Output`	JSON array from `data[0].embedding`.
`prompt_tokens`	`Output`	`usage.prompt_tokens`.
`total_tokens`	`Output`	`usage.total_tokens`.

Request body: { "input": "<input>" }.
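A minimal Python sketch of the embedding request and output mapping follows. The sample response is hypothetical. Serializing the vector to JSON text for the `embedding` output is an assumption about how the "JSON array" is stored, and the function names are illustrative.

```python
import json


def build_embedding_body(inputs: dict) -> dict:
    """Request body for the embeddings endpoint."""
    return {"input": inputs["input"]}


def embedding_outputs(response: dict) -> dict:
    """Map an embeddings response body to the embedding output parameters."""
    usage = response.get("usage", {})
    return {
        # assumption: the vector is stored as JSON array text
        "embedding": json.dumps(response["data"][0]["embedding"]),
        "prompt_tokens": usage.get("prompt_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Hypothetical, abbreviated response body (real vectors are much longer).
SAMPLE_EMBEDDING = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, -0.2, 0.3]}],
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

if __name__ == "__main__":
    print(build_embedding_body({"input": "hello world"}))
    print(embedding_outputs(SAMPLE_EMBEDDING))
```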

Execution flow

Execution is synchronous; there is no real polling:

Embedding: POST /openai/deployments/{deployment}/embeddings?api-version=2024-10-21
Chat: POST /openai/deployments/{deployment}/chat/completions?api-version=2024-10-21
Data-plane token scope: https://cognitiveservices.azure.com/.default
The response body is parsed and the output parameters are written immediately.
The composite RunId encodes the status as "{deployment}#sync#Success" (or …#Failed), so Polysync decodes the status from the RunId without calling the service again.
Cancel is not supported, because the call completes synchronously.
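The execution steps above can be sketched in Python as URL construction plus RunId encoding and decoding. The endpoint host is hypothetical, and the function names are illustrative. The failed form `{deployment}#sync#Failed` is an assumption that it mirrors the success form. Authentication, the HTTP call itself, and error handling are omitted.

```python
from urllib.parse import quote

API_VERSION = "2024-10-21"  # data-plane api-version used for both endpoints


def inference_url(endpoint: str, deployment: str, kind: str) -> str:
    """Build the data-plane URL for a chat or embedding deployment."""
    path = "embeddings" if kind == "embedding" else "chat/completions"
    return (f"{endpoint.rstrip('/')}/openai/deployments/"
            f"{quote(deployment, safe='')}/{path}?api-version={API_VERSION}")


def encode_run_id(deployment: str, succeeded: bool) -> str:
    """Composite RunId; the Failed suffix form is an assumption of this sketch."""
    return f"{deployment}#sync#{'Success' if succeeded else 'Failed'}"


def decode_status(run_id: str) -> str:
    """Read the status back out of the RunId without calling the service."""
    _deployment, marker, status = run_id.rsplit("#", 2)
    if marker != "sync" or status not in ("Success", "Failed"):
        raise ValueError(f"not a synchronous RunId: {run_id!r}")
    return status


if __name__ == "__main__":
    # Hypothetical resource endpoint.
    print(inference_url("https://example.openai.azure.com/", "gpt4o-prod", "chat"))
    run_id = encode_run_id("gpt4o-prod", True)
    print(run_id, decode_status(run_id))
```

Because the status lives in the RunId itself, a status check is a string parse rather than a network round trip. This is also why there is nothing for a cancel request to act on.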

Monitor URL

Same as Foundry Agent Run: the generic Cognitive Services account overview.

Best practices

Set `OpenAI System Prompt`, `OpenAI Max Tokens`, and `OpenAI Temperature` on the Job to provide defaults. Downstream Tasks can override the system prompt per execution by setting the `system_prompt` Input parameter.
For RAG and tool-using flows, use Foundry Agent Run instead — it provides threads and tool semantics.

Troubleshooting

DeploymentNotFound during discovery: control-plane authentication failed. API Key auth does not support /deployments?…, so switch the Platform to AAD (Service Principal / Certificate / Polysync SPN) for discovery.
Long latency: synchronous chat calls block the dispatcher thread. For high-volume work, prefer Azure OpenAI Batch Job.

Documentation