Databricks Notebook Job

The Databricks Notebook job type runs a single notebook in a Databricks workspace as a one-off run via POST jobs/runs/submit (Jobs API 2.1). The notebook is identified by its workspace path — Polysync stores that path as the Job's External Id.

This job type is supported on the Azure Databricks platform.

Required job fields

  • External Id — the notebook path (e.g., /Users/alice@contoso.com/etl/sales-load).
  • Job Type — Databricks Notebook (set automatically on import).

Compute is taken from the Platform's Compute Type attribute and any cluster-shape attributes (Cluster Version, Cluster Node Type, Number of Worker Nodes, Existing Cluster ID, Existing Instance Pool). The Job does not store compute settings of its own.

Job discovery

Polysync calls workspace/list?path={DefaultNotebookPath} (default /) and recursively descends, importing every object where object_type == "NOTEBOOK" as a Polysync Job. For each notebook, workspace/export is fetched and dbutils.widgets.{text,dropdown,combobox,multiselect} declarations are parsed to seed the Job's parameter list with the widget names and default values.
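
The discovery pass described above can be sketched as follows. This is a minimal illustration, not Polysync's actual importer: `find_notebooks`, `parse_widgets`, the `list_dir` callable, and the regular expression are all hypothetical, and the pattern only handles widget names and defaults written as plain quoted string literals.

```python
import re

# Hypothetical pattern (an assumption, not Polysync's real regex): matches
# dbutils.widgets.<kind>("name", "default", ...) where both arguments are
# simple single- or double-quoted string literals.
WIDGET_RE = re.compile(
    r"""dbutils\.widgets\.(text|dropdown|combobox|multiselect)\(\s*
        (["'])(?P<name>.*?)\2\s*,\s*
        (["'])(?P<default>.*?)\4""",
    re.VERBOSE,
)


def parse_widgets(source: str) -> dict[str, str]:
    """Return {widget_name: default_value} for each declaration found."""
    return {m.group("name"): m.group("default") for m in WIDGET_RE.finditer(source)}


def find_notebooks(list_dir, path="/"):
    """Recursively yield notebook paths under `path`.

    `list_dir(path)` is a hypothetical callable that returns the `objects`
    list of a workspace/list response for that path.
    """
    for obj in list_dir(path):
        if obj["object_type"] == "NOTEBOOK":
            yield obj["path"]
        elif obj["object_type"] == "DIRECTORY":
            # Assumption: only plain directories are descended into.
            yield from find_notebooks(list_dir, obj["path"])
```

In this sketch, each path yielded by `find_notebooks` would become a Job's External Id, and the dictionary returned by `parse_widgets` for that notebook's exported source would seed its parameter list.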

Parameter handling

Databricks exposes notebook widgets as strings only. The provider sends Input and Input&Output parameters in the notebook_task.base_parameters dictionary, exactly as configured:

{
  "notebook_task": {
    "notebook_path": "<external-id>",
    "base_parameters": {
      "<parameter-name>": "<value-coerced-to-string>"
    }
  }
}

Direction      Sent in runs/submit   Updated from response
Input          Yes                   No
Output         No                    No (not supported)
Input&Output   Yes                   No (not supported)

Output parameters are not supported for this job type. The Jobs API does not surface notebook return values in the run status response. Use Databricks's dbutils.notebook.exit() only for logging; downstream Tasks should read results from a shared store (Delta table, ADLS path) rather than expecting Polysync to ferry them.

All values are sent as JSON strings regardless of the Polysync Data Type — pick the Data Type that helps the editor validate the value, but be aware your notebook will always receive a string from dbutils.widgets.get.
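
As a rough illustration of the rules above, the sketch below builds the notebook_task fragment from a list of parameters. The parameter record shape (`name`, `direction`, `value`) and the function names are hypothetical, and `str()` is only an assumed coercion; the exact string formatting Polysync applies to booleans, dates, and similar types is not specified here.

```python
# Directions that are sent in runs/submit, per the text above.
SENT_DIRECTIONS = {"Input", "Input&Output"}


def to_widget_string(value) -> str:
    """Coerce a value to the string a widget will receive.

    Assumption: plain str() conversion; the real formatting rules for
    non-string Data Types may differ.
    """
    return value if isinstance(value, str) else str(value)


def build_notebook_task(external_id: str, parameters: list[dict]) -> dict:
    """Build the notebook_task fragment of a runs/submit request body."""
    base_parameters = {
        p["name"]: to_widget_string(p["value"])
        for p in parameters
        if p["direction"] in SENT_DIRECTIONS  # Output parameters are skipped
    }
    return {
        "notebook_task": {
            "notebook_path": external_id,
            "base_parameters": base_parameters,
        }
    }
```

On the notebook side, a value sent this way arrives as a string, so a numeric parameter would need an explicit conversion such as `int(dbutils.widgets.get("batch_size"))`, where `batch_size` is a made-up widget name.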

Execution flow

  1. Polysync builds the notebook_task body, attaches the compute block resolved from the Platform, then calls POST jobs/runs/submit (Jobs API 2.1).

  2. The response run_id becomes the Polysync RunId.

  3. Status is polled via GET jobs/runs/get?run_id={runId} and decoded from the combination of life_cycle_state and result_state:

    Databricks state                                     Polysync status
    life_cycle_state = PENDING / RUNNING / TERMINATING   Running
    life_cycle_state = SKIPPED                           Cancelled
    life_cycle_state = INTERNAL_ERROR                    Failed
    result_state = SUCCESS                               Success
    result_state = FAILED / TIMEDOUT                     Failed
    result_state = CANCELED                              Cancelled
  4. Cancel is supported via POST jobs/runs/cancel ({ "run_id": <run_id> }).
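
The status mapping in step 3 could be expressed roughly as below. This is a sketch under assumptions: `decode_status` is a hypothetical name, it takes the `state` object of a jobs/runs/get response, and it checks life_cycle_state before result_state. That precedence, and raising an error for combinations the table does not list, are choices made for the example rather than documented behaviour.

```python
# Mappings copied from the state table above.
LIFE_CYCLE_MAP = {
    "PENDING": "Running",
    "RUNNING": "Running",
    "TERMINATING": "Running",
    "SKIPPED": "Cancelled",
    "INTERNAL_ERROR": "Failed",
}

RESULT_MAP = {
    "SUCCESS": "Success",
    "FAILED": "Failed",
    "TIMEDOUT": "Failed",
    "CANCELED": "Cancelled",
}


def decode_status(state: dict) -> str:
    """Map a Databricks run `state` object to a Polysync status string."""
    life_cycle = state.get("life_cycle_state")
    result = state.get("result_state")
    # Assumption: a listed life_cycle_state wins over any result_state.
    if life_cycle in LIFE_CYCLE_MAP:
        return LIFE_CYCLE_MAP[life_cycle]
    if result in RESULT_MAP:
        return RESULT_MAP[result]
    # Combinations outside the table are not covered by this sketch.
    raise ValueError(f"Unmapped run state: {life_cycle!r} / {result!r}")
```

A polling loop would call this after each jobs/runs/get request and stop once the returned status is anything other than Running.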

Monitor URL

{workspace_url}/#job/{run_id}/run/1

If the submit response includes run_page_url, Polysync uses that value verbatim.

Best practices

  • Always declare the notebook's widgets explicitly with dbutils.widgets.text/dropdown — they're what Polysync's parameter importer reads.
  • Don't rely on dbutils.notebook.exit() to return data; persist results to Delta / object storage instead.
  • Use Serverless compute or Existing Instance Pool to keep run startup latency low — runs/submit provisions a new cluster otherwise.

Troubleshooting

  • Run fails with RESOURCE_DOES_NOT_EXIST — the External Id is not a valid notebook path in the workspace. Re-import from the Platform editor to pick the correct path.
  • Widgets not picked up on import — workspace/export may have returned a SOURCE rather than a JUPYTER notebook; ensure dbutils.widgets calls are at the top level of cells (not inside conditional blocks) so the importer's regex can find them.
  • Run stays in PENDING for a long time — the compute block is spinning up a new cluster; switch the Platform's Compute Type to ExistingInteractiveCluster, ExistingInstancePool, or Serverless for faster starts.