Dataflow Classic Template Job

The Dataflow Classic Template job type launches an Apache Beam Classic Template (a template file staged in GCS) on Google Dataflow via POST projects/{project}/locations/{location}/templates:launch?gcsPath={templatePath}. The discovery artefact is an existing Dataflow job that was previously launched from a Classic Template. Polysync stores that job's ID in the External Id field and the template's GCS path in the Dataflow Template Path job attribute.

This job type is supported on the Google Dataflow platform.

Required job fields

  • External Id: a representative Dataflow Job ID launched from the template (used to seed the parameter list).
  • Job Type: Dataflow Classic Template (set automatically on import).
  • Dataflow Template Path (job attribute): required; the gs://…/templates/… path used as the gcsPath launch parameter.

Job discovery

Polysync lists jobs with GET projects/{project}/locations/{location}/jobs?pageSize=100&view=JOB_VIEW_SUMMARY&filter=ACTIVE and falls back to &filter=ALL if no active jobs are found. Jobs are deduplicated by the goog-dataflow-provided-template-name label, and Classic vs Flex is determined by inspecting the running job metadata.
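The discovery steps above can be sketched roughly as follows. This is an illustrative sketch, not Polysync's implementation: the function names and the `fetch_jobs` callable are hypothetical, the `https://dataflow.googleapis.com/v1b3` base URL is assumed as the API root, and skipping unlabelled jobs and keeping the first job per label value are assumptions the text does not confirm.

```python
from urllib.parse import urlencode

# Assumed REST root for the Dataflow API; the source only gives relative paths.
API_ROOT = "https://dataflow.googleapis.com/v1b3"
TEMPLATE_LABEL = "goog-dataflow-provided-template-name"


def list_jobs_url(project, location, job_filter="ACTIVE"):
    """Build the jobs.list URL with the query parameters named in the text."""
    query = urlencode(
        {"pageSize": 100, "view": "JOB_VIEW_SUMMARY", "filter": job_filter}
    )
    return f"{API_ROOT}/projects/{project}/locations/{location}/jobs?{query}"


def dedupe_by_template(jobs):
    """Keep one job per template-name label value.

    Assumption: the first job seen wins, and jobs without the label are skipped.
    """
    seen = {}
    for job in jobs:
        name = job.get("labels", {}).get(TEMPLATE_LABEL)
        if name is not None and name not in seen:
            seen[name] = job
    return list(seen.values())


def discover(fetch_jobs):
    """Try ACTIVE first, then fall back to ALL when nothing is returned.

    fetch_jobs is a hypothetical callable: filter string -> list of job dicts.
    """
    jobs = fetch_jobs("ACTIVE")
    if not jobs:
        jobs = fetch_jobs("ALL")
    return dedupe_by_template(jobs)
```

Passing the HTTP call in as `fetch_jobs` keeps the fallback and deduplication logic testable without credentials.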

Parameter handling

Polysync sends Input + Input&Output parameters as the flat parameters dict in the launch body:

{
  "jobName": "<polysync-name>-<timestamp>",
  "parameters": {
    "<param-1>": "<value-as-string>",
    "<param-2>": "<value-as-string>"
  },
  "environment": {}
}

Direction      Sent in parameters   Updated from response
Input          Yes                  No
Output         No                   ⚠ See below
Input&Output   Yes                  ⚠ See below

Dataflow does not perform type coercion on Classic Template parameters — values are sent as strings, and the template is responsible for parsing them.
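As a rough illustration of the launch request described above, the sketch below filters parameters by direction and stringifies their values. The parameter record shape (`name`, `direction`, `value`), the function names, and the lowercase rendering of booleans are assumptions made for the example, not documented Polysync behaviour.

```python
from urllib.parse import quote

# Directions that are sent in the launch body, per the text above.
SENT_DIRECTIONS = {"Input", "Input&Output"}


def to_param_string(value):
    """Render a value as a string, since the template parses it itself.

    Assumption: booleans are sent as 'true'/'false' rather than Python's
    'True'/'False'.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_launch_request(project, location, template_path, job_name, params):
    """Return (relative_url, body) for templates:launch.

    params is a hypothetical list of dicts with 'name', 'direction', 'value'.
    """
    url = (
        f"projects/{project}/locations/{location}/templates:launch"
        f"?gcsPath={quote(template_path, safe='')}"
    )
    body = {
        "jobName": job_name,
        "parameters": {
            p["name"]: to_param_string(p["value"])
            for p in params
            if p["direction"] in SENT_DIRECTIONS
        },
        "environment": {},
    }
    return url, body
```

Output-only parameters are left out of the body, and the template path is percent-encoded so that the `gs://` prefix survives as a query value.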

Output parameters

On each status poll, Polysync reads environment.sdkPipelineOptions from the Dataflow job and writes any matching Output / Input&Output parameter back. This only fills in values that the pipeline echoes via SDK options; pipelines that don't do that won't surface outputs.
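The write-back described above might look something like the following sketch. It assumes that parameters are matched by name and that the values sit under an `options` key inside `sdkPipelineOptions` (falling back to the top level if that key is absent); both are assumptions, and the parameter record shape is hypothetical.

```python
# Directions that can be updated from the polled job, per the text above.
UPDATED_DIRECTIONS = {"Output", "Input&Output"}


def read_outputs(job, params):
    """Collect values for Output / Input&Output parameters echoed by the job.

    job is the decoded jobs.get response; params is a hypothetical list of
    dicts with 'name' and 'direction'. Returns {name: value-as-string}.
    """
    sdk_options = job.get("environment", {}).get("sdkPipelineOptions", {})
    # Assumption: options are nested under "options"; otherwise use the
    # top-level struct as-is.
    options = sdk_options.get("options", sdk_options)
    updated = {}
    for p in params:
        if p["direction"] in UPDATED_DIRECTIONS and p["name"] in options:
            updated[p["name"]] = str(options[p["name"]])
    return updated
```

A parameter that the pipeline does not echo is simply absent from the result, which matches the caveat that such pipelines won't surface outputs.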

Execution flow

  1. Polysync POSTs the launch body to templates:launch?gcsPath={templatePath}; the response job.id becomes the Polysync RunId.

  2. Status is polled via GET projects/{project}/locations/{location}/jobs/{jobId}?view=JOB_VIEW_ALL and decoded from currentState:

    Dataflow state                                                              Polysync status
    JOB_STATE_PENDING / JOB_STATE_QUEUED                                        Starting
    JOB_STATE_RUNNING                                                           Running
    JOB_STATE_DRAINING / JOB_STATE_CANCELLING / JOB_STATE_RESOURCE_CLEANING_UP  Running
    JOB_STATE_DONE / JOB_STATE_UPDATED / JOB_STATE_DRAINED                      Success
    JOB_STATE_FAILED                                                            Failed
    JOB_STATE_CANCELLED                                                         Cancelled
    (other)                                                                     Unknown

  3. Cancel is supported via PUT projects/{project}/locations/{location}/jobs/{jobId} with body { "requestedState": "JOB_STATE_CANCELLED" }.
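The state mapping and cancel body from the steps above can be expressed as a small lookup. This is a sketch only; the names `STATUS_BY_STATE`, `decode_status`, and `CANCEL_BODY` are made up for the example.

```python
# Dataflow currentState -> Polysync status, as listed in the table above.
STATUS_BY_STATE = {
    "JOB_STATE_PENDING": "Starting",
    "JOB_STATE_QUEUED": "Starting",
    "JOB_STATE_RUNNING": "Running",
    "JOB_STATE_DRAINING": "Running",
    "JOB_STATE_CANCELLING": "Running",
    "JOB_STATE_RESOURCE_CLEANING_UP": "Running",
    "JOB_STATE_DONE": "Success",
    "JOB_STATE_UPDATED": "Success",
    "JOB_STATE_DRAINED": "Success",
    "JOB_STATE_FAILED": "Failed",
    "JOB_STATE_CANCELLED": "Cancelled",
}

# Body for the PUT .../jobs/{jobId} cancel request.
CANCEL_BODY = {"requestedState": "JOB_STATE_CANCELLED"}


def decode_status(job):
    """Map a polled job's currentState to a status; anything else is Unknown."""
    return STATUS_BY_STATE.get(job.get("currentState"), "Unknown")
```

Using `dict.get` with a default covers the "(other)" row, including responses where `currentState` is missing.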

Monitor URL

https://console.cloud.google.com/dataflow/jobs/{location}/{jobId}?project={project}

Best practices

  • Stage Classic Templates under a versioned GCS path (e.g., gs://bucket/templates/v1/MyTemplate) so you can pin Polysync Jobs to a specific build.
  • Use Dataflow Flex Template for new pipelines — Classic Templates are in maintenance mode.
  • Persist pipeline outputs to GCS / BigQuery; don't rely on environment.sdkPipelineOptions for substantial results.

Troubleshooting

  • HTTP 404 on launch: the Dataflow Template Path is wrong, or the runtime identity lacks Storage Object Viewer on the template bucket.
  • Job stuck in JOB_STATE_QUEUED: the region has no available workers for the requested machine type. Inspect Compute Engine quotas.