LDX hub API

ExtractDoc Jobs

Create and manage ExtractDoc text extraction jobs


List ExtractDoc job history

GET https://gw.ldxhub.io/extractdoc/jobs

Returns a list of jobs for the authenticated user, ordered by creation date descending. Use GET /extractdoc/jobs/{job_id} to retrieve full details for a specific job.

List ExtractDoc job history › Headers

Authorization · string · required

The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

List ExtractDoc job history › Responses

A list of jobs

ExtractDocJobSummary[]
GET /extractdoc/jobs

curl https://gw.ldxhub.io/extractdoc/jobs \
  --header 'Authorization: Bearer <token>'

Example Responses (application/json)

{
  "data": [
    {
      "job_id": "akv0z92bmdwphs95dxclndk3erh3x8jy",
      "file_id": "akv0ghq3ys9epxbpzi5xsig4gi61cl59",
      "engine": "ki/extract",
      "status": "completed",
      "progress": 100,
      "created_at": "2026-04-28T22:58:54Z",
      "updated_at": "2026-04-28T22:59:01Z",
      "completed_at": "2026-04-28T22:59:00Z",
      "expires_at": "2026-04-29T22:59:00Z",
      "usage": {
        "total_input_characters": 12345,
        "processed_input_characters": 12345,
        "skipped_characters": 0,
        "input_pages": 5
      }
    }
  ]
}
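
The listing call can also be made from a script. The following is a minimal Python sketch using only the standard library, assuming the response body has the shape of the example above. The helper names (build_list_jobs_request, completed_job_ids, list_jobs) are hypothetical and not part of the LDX hub API.

```python
import json
import urllib.request

BASE_URL = "https://gw.ldxhub.io"


def build_list_jobs_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /extractdoc/jobs request."""
    return urllib.request.Request(
        f"{BASE_URL}/extractdoc/jobs",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def completed_job_ids(payload: dict) -> list[str]:
    """Return the job_id of every completed job in a list response.

    Assumes the payload looks like the example response: {"data": [...]}.
    """
    return [
        job["job_id"]
        for job in payload.get("data", [])
        if job.get("status") == "completed"
    ]


def list_jobs(api_key: str) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_list_jobs_request(api_key)) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the header and URL logic easy to inspect without touching the network.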

Create an ExtractDoc job

POST https://gw.ldxhub.io/extractdoc/jobs

Enqueues an asynchronous text extraction job. Upload your file via POST /files first, then submit the returned file_id here. The endpoint returns 202 Accepted immediately with a job_id. Poll GET /extractdoc/jobs/{job_id} until the job finishes. Once the status is completed, download the extracted output via GET /files/{output_file_id}/content.

Supported input formats: PDF, DOCX, XLSX, PPTX. The input format is automatically detected from the uploaded file.

Create an ExtractDoc job › Headers

Authorization · string · required

The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Create an ExtractDoc job › Request Body

ExtractDocCreateJobRequest
engine · string · required
Engine ID obtained from GET /extractdoc/engines.

file_id · string · required
File ID of the input file to process, obtained from POST /files. Supported input formats: PDF, DOCX, XLSX, PPTX. The input format is automatically detected from the uploaded file.

output_format · string · enum · required
Output format. Use 'text' for plain text, or 'jsonl' for a single-line JSON object compatible with StructFlow input.
Enum values: text, jsonl

Create an ExtractDoc job › Responses

Job accepted

Full job details. Returned by GET /extractdoc/jobs/{job_id}.
ExtractDocJobDetail

job_id · string

file_id · string
Input file ID.

engine · string

status · string · enum
Enum values: queued, processing, completed, failed

progress · number · float

output_file_id · string
File ID of the extracted output. Present only when status is completed. Download via GET /files/{output_file_id}/content.

JobError
Job-level error. Present only when job status is failed.

created_at · string · date-time
updated_at · string · date-time
completed_at · string · date-time
expires_at · string · date-time

Usage

POST /extractdoc/jobs

curl https://gw.ldxhub.io/extractdoc/jobs \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <token>' \
  --data '{
    "engine": "ki/extract",
    "file_id": "ako5rgx309ehwlbqprs8s2xxnsp0y85q",
    "output_format": "jsonl"
  }'

Example Request Body (application/json)

{
  "engine": "ki/extract",
  "file_id": "ako5rgx309ehwlbqprs8s2xxnsp0y85q",
  "output_format": "jsonl"
}

Example Responses (application/json)

{
  "job_id": "akv0z92bmdwphs95dxclndk3erh3x8jy",
  "engine": "ki/extract",
  "status": "queued",
  "progress": 0,
  "created_at": "2026-04-28T22:58:54Z",
  "updated_at": "2026-04-28T22:58:54Z",
  "expires_at": "2026-04-29T22:58:54Z"
}
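
As a rough illustration of the request described in this section, the Python sketch below builds the POST body from the three required fields and rejects any output_format outside the documented enum before anything is sent. The function names are hypothetical, and client-side validation is an added convenience, not something the API requires.

```python
import json
import urllib.request

BASE_URL = "https://gw.ldxhub.io"
OUTPUT_FORMATS = ("text", "jsonl")  # enum values documented for output_format


def build_create_job_request(
    api_key: str, engine: str, file_id: str, output_format: str
) -> urllib.request.Request:
    """Build (but do not send) the POST /extractdoc/jobs request."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    body = json.dumps(
        {"engine": engine, "file_id": file_id, "output_format": output_format}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extractdoc/jobs",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(api_key: str, engine: str, file_id: str, output_format: str) -> str:
    """Send the request and return the new job_id (performs a network call)."""
    req = build_create_job_request(api_key, engine, file_id, output_format)
    # urlopen treats any 2xx status, including 202 Accepted, as success.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]
```

The returned job_id is what you would then pass to GET /extractdoc/jobs/{job_id}.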

Get ExtractDoc job status and results

GET https://gw.ldxhub.io/extractdoc/jobs/{job_id}

Returns the current status of an ExtractDoc job. Poll this endpoint until status is completed or failed. Recommended polling interval: 1-5 seconds. When status is completed, use output_file_id to download the extracted output via GET /files/{output_file_id}/content. Results are retained for a limited period after completion (expires_at).
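
The polling pattern described above might look like the following Python sketch. It assumes a caller-supplied fetch_job function (hypothetical) that performs GET /extractdoc/jobs/{job_id} and returns the decoded JSON. The 2-second default interval sits inside the recommended 1-5 second range, and the overall timeout is an added safeguard that the API does not define.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job status is completed or failed, then return the job.

    fetch_job is a hypothetical callable wrapping GET /extractdoc/jobs/{job_id}.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

Passing sleep as a parameter makes the loop easy to exercise in tests without real delays.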

Get ExtractDoc job status and results › path Parameters

job_id · string · required

The unique identifier of the job

Get ExtractDoc job status and results › Headers

Authorization · string · required

The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Get ExtractDoc job status and results › Responses

Job details

Full job details. Returned by GET /extractdoc/jobs/{job_id}.
ExtractDocJobDetail

job_id · string

file_id · string
Input file ID.

engine · string

status · string · enum
Enum values: queued, processing, completed, failed

progress · number · float

output_file_id · string
File ID of the extracted output. Present only when status is completed. Download via GET /files/{output_file_id}/content.

JobError
Job-level error. Present only when job status is failed.

created_at · string · date-time
updated_at · string · date-time
completed_at · string · date-time
expires_at · string · date-time

Usage

GET /extractdoc/jobs/{job_id}

curl https://gw.ldxhub.io/extractdoc/jobs/:job_id \
  --header 'Authorization: Bearer <token>'

Example Responses (application/json)

{
  "job_id": "akv0z92bmdwphs95dxclndk3erh3x8jy",
  "file_id": "akv0ghq3ys9epxbpzi5xsig4gi61cl59",
  "engine": "ki/extract",
  "status": "processing",
  "progress": 50,
  "created_at": "2026-04-28T22:58:54Z",
  "updated_at": "2026-04-28T22:58:57Z",
  "expires_at": "2026-04-29T22:58:54Z"
}
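
Once a job detail response is in hand, a small helper can decide whether the output is ready to download. The Python sketch below is one possible approach: it derives the GET /files/{output_file_id}/content URL only for completed jobs and treats a past expires_at as meaning the output is no longer available. That last point is an assumption based on the retention note for this endpoint, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

BASE_URL = "https://gw.ldxhub.io"


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as 2026-04-28T22:59:00Z into aware datetimes."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def output_content_url(job: dict, now: Optional[datetime] = None) -> str:
    """Return the download URL for a completed, unexpired job.

    Raises RuntimeError for failed or expired jobs and ValueError for jobs
    that are still queued or processing.
    """
    status = job.get("status")
    if status == "failed":
        raise RuntimeError(f"job {job.get('job_id')} failed")
    if status != "completed":
        raise ValueError(f"job is {status!r}; output is not available yet")
    now = now or datetime.now(timezone.utc)
    expires_at = job.get("expires_at")
    # Assumption: output can no longer be fetched once expires_at has passed.
    if expires_at and parse_timestamp(expires_at) <= now:
        raise RuntimeError("output has passed expires_at")
    return f"{BASE_URL}/files/{job['output_file_id']}/content"
```

The resulting URL still needs the same Authorization header as every other request.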
