AWS Glue Job

The AWS Glue Job job type submits an AWS Glue ETL job run via StartJobRun, passing Polysync parameters as Glue Arguments. A Glue job is identified by its name, which Polysync stores in the job's External Id.

This job type is supported on the AWS Glue platform.

Required job fields

  • External Id — the AWS Glue job name.
  • Job Type — AWS Glue Job (set automatically on import).

Job attributes captured from the Glue job definition during discovery (used as compute overrides at run time):

  • Glue Worker Type — G.1X, G.2X, G.4X, G.8X, G.025X, Z.2X, or Standard.
  • Glue Number Of Workers — positive integer.
  • Glue Version — sent as the --glue-version argument.
  • Glue Timeout (Minutes) — positive integer.
  • Glue Max Retries — informational only (Glue honors retries on the job definition, not on StartJobRun).

Compute precedence: job attribute → platform default → Glue job definition.
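
The precedence rule can be sketched as a small resolver. This is an illustration under stated assumptions, not Polysync's implementation: the function name resolve_compute and the attribute keys are hypothetical, and only the StartJobRun field names (WorkerType, NumberOfWorkers, Timeout) come from the Glue API. A value that is unset at both levels is omitted from the request, so the Glue job definition applies.

```python
from typing import Any, Dict

# Hypothetical attribute keys mapped to the StartJobRun fields they override.
OVERRIDE_FIELDS = {
    "glue_worker_type": "WorkerType",
    "glue_number_of_workers": "NumberOfWorkers",
    "glue_timeout_minutes": "Timeout",
}


def resolve_compute(job_attrs: Dict[str, Any],
                    platform_defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return StartJobRun overrides: job attribute first, then platform default."""
    overrides: Dict[str, Any] = {}
    for key, api_field in OVERRIDE_FIELDS.items():
        value = job_attrs.get(key)
        if value is None:
            value = platform_defaults.get(key)
        if value is not None:
            overrides[api_field] = value
        # Otherwise the field is omitted and the Glue job definition applies.
    return overrides
```

For example, a job that sets only the worker type, on a platform that defaults the worker count, would produce both fields and leave the timeout to the Glue definition.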

Job discovery

Polysync discovers jobs by calling ListJobs and GetJobs, paginating via NextToken with MaxResults=100. Default arguments declared in the Glue job's DefaultArguments are imported as Polysync parameters, with Glue's own system arguments filtered out (--job-language, --job-bookmark-option, --enable-metrics, …).
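
A rough sketch of the pagination and filtering, assuming glue is a boto3 Glue client (boto3.client("glue")) or any object with a compatible get_jobs method. The helper names are hypothetical, and the reserved set below contains only the three keys named above; the full list Polysync filters is not shown here.

```python
from typing import Any, Dict, Iterator

# Only the keys named in the text; the complete reserved list is an unknown.
RESERVED_ARGS = {"--job-language", "--job-bookmark-option", "--enable-metrics"}


def iter_glue_jobs(glue: Any, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every job definition, following NextToken until it is absent."""
    kwargs: Dict[str, Any] = {"MaxResults": page_size}
    while True:
        page = glue.get_jobs(**kwargs)
        yield from page.get("Jobs", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # request the next page


def importable_defaults(job: Dict[str, Any]) -> Dict[str, str]:
    """Drop reserved system arguments from a job's DefaultArguments."""
    defaults = job.get("DefaultArguments") or {}
    return {k: v for k, v in defaults.items() if k not in RESERVED_ARGS}
```

NextToken is added to the request only once a page has returned one, since the first call must not carry a token.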

Parameter handling

Input and Input&Output parameters are sent as Glue Arguments:

{
  "JobName": "<external-id>",
  "Arguments": {
    "--<param-1>": "<value-as-string>",
    "--<param-2>": "<value-as-string>"
  }
}

Direction      Sent in Arguments   Updated from response
Input          Yes                 No
Output         No                  No (not supported)
Input&Output   Yes                 No (not supported)

Glue argument keys must start with --. If a Polysync parameter name is missing the prefix, Polysync prepends it automatically. All values are strings — Glue does not perform type coercion on StartJobRun arguments. Internal parameters (those starting with _) are excluded.

Output parameters are not supported. Glue surfaces job-run results only through CloudWatch Logs and metrics; there is no structured output channel.
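
The rules in this section can be sketched as a single function. The name build_glue_arguments and the (name, direction, value) triple layout are hypothetical; the filtering, prefixing, and string conversion follow the description above.

```python
from typing import Any, Dict, Iterable, Tuple

SENT_DIRECTIONS = {"Input", "Input&Output"}


def build_glue_arguments(params: Iterable[Tuple[str, str, Any]]) -> Dict[str, str]:
    """Turn (name, direction, value) triples into a Glue Arguments mapping."""
    arguments: Dict[str, str] = {}
    for name, direction, value in params:
        if direction not in SENT_DIRECTIONS:
            continue  # Output parameters are never sent.
        if name.startswith("_"):
            continue  # Internal parameters are excluded.
        key = name if name.startswith("--") else "--" + name
        arguments[key] = str(value)  # Glue receives every value as a string.
    return arguments
```

Because values arrive as strings, the Glue script itself has to parse numbers, booleans, or dates out of them.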

Execution flow

  1. Polysync calls StartJobRun(JobName, Arguments), optionally including WorkerType, NumberOfWorkers, and Timeout from the resolved compute overrides.

  2. Polysync records a composite RunId of the form "{jobName}/{jobRunId}", because Glue's GetJobRun API requires both components.

  3. Status is polled via GetJobRun(JobName, RunId):

    Glue JobRunState            Polysync status
    STARTING / WAITING          Starting
    RUNNING / STOPPING          Running
    SUCCEEDED                   Success
    STOPPED                     Cancelled
    FAILED / TIMEOUT / ERROR    Failed
    (other / null)              Unknown

  4. Cancel is supported via BatchStopJobRun(JobName, [RunId]).
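
The composite RunId and the status mapping from the steps above can be sketched as follows. The helper names are hypothetical. Splitting on the last "/" is an assumption: it presumes that Glue run IDs never contain a slash, while job names might.

```python
from typing import Optional, Tuple

# Glue JobRunState -> Polysync status, as listed in the table above.
STATE_MAP = {
    "STARTING": "Starting",
    "WAITING": "Starting",
    "RUNNING": "Running",
    "STOPPING": "Running",
    "SUCCEEDED": "Success",
    "STOPPED": "Cancelled",
    "FAILED": "Failed",
    "TIMEOUT": "Failed",
    "ERROR": "Failed",
}


def make_run_id(job_name: str, job_run_id: str) -> str:
    """Combine the job name and Glue run ID into the composite RunId."""
    return f"{job_name}/{job_run_id}"


def split_run_id(composite: str) -> Tuple[str, str]:
    """Recover (job_name, job_run_id); splits on the last slash (assumption)."""
    job_name, _, job_run_id = composite.rpartition("/")
    return job_name, job_run_id


def map_state(state: Optional[str]) -> str:
    """Translate a Glue JobRunState; anything unlisted or null is Unknown."""
    return STATE_MAP.get(state or "", "Unknown")
```

A poller would call split_run_id on the stored RunId, pass both parts to GetJobRun, and feed the returned JobRunState through map_state.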

Monitor URL

https://{region}.console.aws.amazon.com/gluestudio/home?region={region}#/job/{jobName}/run/{runId}
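
As a small illustration, the template can be filled in like this. The function name monitor_url is hypothetical, run_id here means the Glue run ID rather than the composite RunId, and percent-encoding the job name and run ID is an assumption about what the console accepts, not documented behavior.

```python
from urllib.parse import quote


def monitor_url(region: str, job_name: str, run_id: str) -> str:
    """Fill the Glue Studio URL template for one job run."""
    # Percent-encoding is an assumption, added for names with spaces or slashes.
    job = quote(job_name, safe="")
    run = quote(run_id, safe="")
    return (
        f"https://{region}.console.aws.amazon.com/gluestudio/home"
        f"?region={region}#/job/{job}/run/{run}"
    )
```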

Best practices

  • Pre-define Glue jobs with explicit DefaultArguments so Polysync's parameter import seeds the schema correctly.
  • Use the job-level Glue Worker Type and Glue Number Of Workers overrides for one-off compute changes rather than editing the Glue definition.
  • Persist job results to S3; rely on CloudWatch Logs for failure diagnostics.

Troubleshooting

  • InvalidInputException about arguments — confirm parameter keys are valid Glue argument names. Polysync prepends --, but a literal value of key=value will not be split.
  • AccessDenied on StartJobRun — the caller is missing glue:StartJobRun on arn:aws:glue:{region}:{account}:job/{jobName}.
  • Run reaches TIMEOUT — increase Glue Timeout (Minutes) on the Polysync job, or revisit worker type / DPU allocation in the Glue console.