Hanzo API Documentation

Authentication & privileges

Authentication is provided via HTTP Basic Auth. For anything non-trivial, we recommend creating a separate user for your service.
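As a rough illustration, a Basic Auth request could be prepared with Python's standard library as sketched below. The helper names (basic_auth_header, build_request) and the credentials are placeholders of ours, not part of the API.

```python
import base64
import urllib.request

API_ROOT = "https://portal.hanzoarchives.com/api"


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(path: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated request for an API path."""
    return urllib.request.Request(
        f"{API_ROOT}{path}",
        headers={
            "Authorization": basic_auth_header(username, password),
            "Accept": "application/json",
        },
    )
```

Passing the prepared request to urllib.request.urlopen would send it; in practice the credentials would belong to the dedicated service user recommended above.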
Browsable API

All endpoints should be accessible via their regular URLs in HTML form thanks to our browsable API.

Concurrency control

Concurrency control ensures the correct processing of data under concurrent operations by clients.

We implement optimistic concurrency control using DRF-extensions, and their documentation describes the purpose and workflow pretty well: http://chibisov.github.io/drf-extensions/docs/#concurrency-control

Typical flow:
- Retrieve crawl => 200 (with ETag header)
- Patch crawl (with updated entry_points and If-Match header) => 412
- Retrieve crawl again => 200 (with newer ETag header)
- Patch crawl (with updated entry_points and newer If-Match header) => 200
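The flow above can be sketched as a small retry loop. This is only an illustration: fetch and patch are hypothetical callables that you would implement with your HTTP client of choice, where fetch returns the current ETag and body, and patch sends the changes with an If-Match header and returns the status code and body.

```python
from typing import Any, Callable, Dict, Tuple

# fetch() -> (etag, body); patch(changes, etag) -> (status_code, body)
Fetch = Callable[[], Tuple[str, Dict[str, Any]]]
Patch = Callable[[Dict[str, Any], str], Tuple[int, Dict[str, Any]]]


def patch_with_retry(fetch: Fetch, patch: Patch, changes: Dict[str, Any],
                     max_attempts: int = 3) -> Dict[str, Any]:
    """Retry a PATCH with a fresh ETag whenever the server answers 412."""
    for _ in range(max_attempts):
        etag, _current = fetch()             # GET => 200 (with ETag header)
        status, body = patch(changes, etag)  # PATCH with If-Match: <etag>
        if status == 412:
            continue                         # stale ETag: retrieve again and retry
        if status != 200:
            raise RuntimeError(f"unexpected status {status}: {body}")
        return body
    raise RuntimeError(f"still conflicting after {max_attempts} attempts")
```

A real client would probably also inspect the freshly retrieved body before retrying, since the other client's change may affect what you want to send.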
Error handling

Any errors that occur are reported under an error key in the response. Each key in that dictionary is a field name, and each value is a list of error message strings for that field. The non_field_errors key may also be present, and lists any general validation errors.
```
{
  "docs": "https://portal.hanzoarchives.com/api/docs",
  "error": {
    "organization": [
      "This field is required."
    ],
    "name": [
      "This field is required."
    ]
  }
}
```
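One way to surface these errors to a user is to flatten the dictionary into readable lines. The sketch below assumes the response body has already been read as a string; flatten_errors is a hypothetical helper name.

```python
import json
from typing import Dict, List


def flatten_errors(response_body: str) -> List[str]:
    """Turn the "error" dictionary into "field: message" strings."""
    payload = json.loads(response_body)
    errors: Dict[str, List[str]] = payload.get("error", {})
    lines: List[str] = []
    for field, messages in errors.items():
        # non_field_errors holds general validation errors, so no field prefix.
        prefix = "" if field == "non_field_errors" else f"{field}: "
        lines.extend(prefix + message for message in messages)
    return lines
```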

Identifiers

We use consistent identifiers across all of our endpoints.

Whenever an object relation is referenced, the identifier used follows this quick reference guide:

Name           Identifier
------------   ----------
Archive Unit   crawlkey (Organization code/Archive Unit name)
Crawl          uuid
Export         uuid
Plugin         slug
Scope          slug
Organization   code
User           username
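Because a crawlkey contains a slash and may contain spaces, it needs percent-encoding when placed in a URL. The sketch below reproduces the form seen in the url field of the Archive Unit response examples; archive_unit_url is a hypothetical helper, and how names containing other special characters are handled is not covered here.

```python
from urllib.parse import quote

API_ROOT = "https://portal.hanzoarchives.com/api"


def archive_unit_url(crawlkey: str) -> str:
    """Build the detail URL for an Archive Unit from its crawlkey."""
    # quote() leaves "/" untouched by default, so the organization code and
    # the Archive Unit name stay separate path segments; spaces become %20.
    return f"{API_ROOT}/archive-units/{quote(crawlkey)}"
```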

Pagination

Requests that return multiple items will be paginated to 100 items by default. You can specify further pages with the ?page parameter. For some resources, you can also set a custom page size up to 1000 with the ?per_page parameter.
```
curl 'https://portal.hanzoarchives.com/api/crawls?page=2&per_page=100'
```
Note that page numbering is 1-based and that omitting the ?page parameter will return the first page.

For more information on our pagination implementation, check out GitHub's guide on Traversing with Pagination, on which ours is based.

Link header

The pagination info is included in the Link header. It is important to follow these Link header values instead of constructing your own URLs.
```
Link: <https://portal.hanzoarchives.com/api/crawls?page=2&per_page=100>; rel="next",
      <https://portal.hanzoarchives.com/api/crawls?page=10&per_page=100>; rel="last"
```
The line break is included for readability.

This Link response header contains one or more Hypermedia link relations, some of which may require expansion as URI templates.

The possible rel values are:
```
Name    Description
next    The link relation for the immediate next page of results.
last    The link relation for the last page of results.
first   The link relation for the first page of results.
prev    The link relation for the immediate previous page of results.
```
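A Link header of the shape shown above can be split into its rel values with a small regular expression. This is a minimal sketch rather than a full parser for every Link header variant; parse_link_header is a hypothetical helper name.

```python
import re
from typing import Dict

# Matches one `<url>; rel="name"` entry of a Link header.
LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel value (next, last, first, prev) to its URL."""
    return {rel: url for url, rel in LINK_ENTRY.findall(value)}
```

A client could then keep requesting the "next" URL for as long as that key is present, assuming (as in GitHub's scheme) that the final page omits it.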
Meta headers

In addition to the Link header, we also expose headers that describe the page the response represents and the totals for the full queryset.
```
Name            Description
X-Page          The page number
X-Per-Page      The number of results per page
X-Total         The total number of results
X-Total-Pages   The total number of pages
```
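These headers arrive as strings, so a client will usually want them as integers. The sketch below assumes the response headers are available as a plain mapping; page_info and the output key names are our own choices.

```python
from typing import Dict, Mapping

# Header name -> key used in the returned dictionary (names are our choice).
META_HEADERS = {
    "X-Page": "page",
    "X-Per-Page": "per_page",
    "X-Total": "total",
    "X-Total-Pages": "total_pages",
}


def page_info(headers: Mapping[str, str]) -> Dict[str, int]:
    """Convert the pagination meta headers into integers."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        key: int(lowered[name.lower()])
        for name, key in META_HEADERS.items()
        if name.lower() in lowered
    }
```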
Plugins

Archives subscribe to a particular plugin and scope. The plugins and scopes available to your organization depend on what has been set up for you by our engineers.

For more information on which plugins, scopes and settings are available to your organization, sign in to your account or use the list plugins endpoint.

Request a capture

An archive within Hanzo Archives is the top-level representation of your capture, i.e. it stores all of the information required to perform a crawl of the website(s) you wish to capture. Archives track a plugin (and a scope of that plugin), which instructs our crawler how to interact with the content you want to capture. The plugin/scope also yields settings, which are stored against the archive.

A crawl in Hanzo Archives represents a capture of your archive at a given point in time. The bulk of the configuration is typically performed on the archive, so performing an additional crawl is relatively trivial.

You may have seen the request a capture form on the portal website. This form automates the creation of an archive, a crawl within it, and an export of that crawl, which is essentially how the API works:

Create an archive unit
POST /api/archive-units
```
{
  "name": "Example Archive",
  "crawlkey": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "entry_points": ["http://example.com"],
  "seeds": ["http://example.com"],
  "plugin": "webpage",
  "scope": "one_page_and_one_hop",
  "settings": {
    "all_video": "on",
    "include_referered": "yes"
  },
  "tags": ["my first archive"],
  "teams": ["managers"]
}
```
If omitted, crawlkey is generated from {{ organization code }}/{{ name }} and entry_points is copied from seeds.

For more information on the settings you'll need to pass for your chosen plugin/scope, see the settings documentation.

Create a crawl
POST /api/crawls
```
{
  "organization": "EXAMPLE",
  "archive_unit": "EXAMPLE/Example Archive",
  "status": "requested:user"
}
```
Create an export
POST /api/exports
```
{
  "name": "ESIV-1",
  "organization": "EXAMPLE",
  "crawl": "41f087bc-ae7f-4ce8-9bd8-83e9a5a19373",
  "status": "requested:user",
  "type": "load_file"
}
```
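The three request bodies above depend on one another: the crawl references the Archive Unit by crawlkey, and the export references the crawl by uuid. The sketch below builds the payloads in that order. The function names are hypothetical, and settings are left out even though your plugin/scope may require some.

```python
from typing import Any, Dict, List


def archive_unit_payload(organization: str, name: str, seeds: List[str],
                         plugin: str, scope: str) -> Dict[str, Any]:
    """Body for POST /api/archive-units (crawlkey and entry_points omitted)."""
    return {
        "name": name,
        "organization": organization,
        "seeds": seeds,
        "plugin": plugin,
        "scope": scope,
    }


def crawl_payload(organization: str, name: str) -> Dict[str, Any]:
    """Body for POST /api/crawls, referencing the Archive Unit by crawlkey."""
    return {
        "organization": organization,
        "archive_unit": f"{organization}/{name}",
        "status": "requested:user",
    }


def export_payload(organization: str, crawl_uuid: str, name: str) -> Dict[str, Any]:
    """Body for POST /api/exports, referencing the crawl by its uuid."""
    return {
        "name": name,
        "organization": organization,
        "crawl": crawl_uuid,
        "status": "requested:user",
        "type": "load_file",
    }
```

The crawl_uuid passed to export_payload would come from the uuid field of the create crawl response.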
Settings

Settings for the crawler vary depending on your chosen plugin/scope combination, any custom configuration set up for your organization, and whether you or a Hanzo engineer has customised your Archive Unit.

To see which settings you need to pass when creating or updating an Archive Unit, perform an OPTIONS request on either the list archive units endpoint or the update an archive unit endpoint.

You'll find the information you need within the JSON structure at:

actions.(POST|PUT).plugin.choices[your plugin].scopes[your scope].settings

Any fields marked as field_required with no default value specified must be passed as key: value pairs in the settings property of your create, partial update, or update request.

Validation is performed when an Archive Unit is saved to ensure that the settings required by the plugin (and scope) have been supplied. If required settings are omitted or invalid values are passed, the API will return an error describing the problem(s).
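Walking the OPTIONS response to find the settings you must supply might look like the sketch below. The exact shape of that response is not spelled out here, so the code makes several assumptions: that choices and scopes are mappings keyed by slug, that settings is a list of objects with a "key" entry, and that a default, when present, is exposed under a "default" entry. Adjust the lookups to match what your OPTIONS request actually returns.

```python
from typing import Any, Dict, List


def required_settings(options_body: Dict[str, Any], plugin: str, scope: str,
                      action: str = "POST") -> List[str]:
    """List the setting keys that must be supplied for a plugin/scope."""
    # ASSUMPTION: choices/scopes are keyed by slug and each setting is an
    # object with "key", "field_required" and an optional "default".
    settings = (
        options_body["actions"][action]["plugin"]["choices"][plugin]
        ["scopes"][scope]["settings"]
    )
    return [
        setting["key"]
        for setting in settings
        if setting.get("field_required") and setting.get("default") is None
    ]
```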

List archive units

GET /api/archive-units

Query params

name	string
slug	string
organization	code

Response messages

401	Not authenticated

Response schema

name string
crawlkey string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
jira_issue string (read only)
jira_status string (read only)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array (optional)
teams array (optional)
url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
settings_url string (read only)
plugin_module_settings object (optional)
plugin_modules array (optional)
created_at datetime (read only)
updated_at datetime (read only)

Create an archive unit

POST /api/archive-units

Form params

name	string
crawlkey	string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization	code
entry_points	array[string]	An array of URLs from which any new crawls can be entered via native access (optional)
seeds	array[string]	An array of URLs the crawler starts from for any new crawls (optional)
metadata	object	Additional user metadata store (optional)
notes	string	(optional)
autoexport	boolean	(optional)
plugin	slug	(optional)
scope	slug	(optional)
settings	object	Settings to be passed to the crawler (keys required depends on the plugin/scope) (optional)
tags	array[name]	(optional)
teams	array[slug]	(optional)
plugin_module_settings	object	(optional)
plugin_modules	array[package_name]	(optional)

Response messages

201	The archive unit was created
400	The fields name, organisation must make a unique set
401	Not authenticated

Request examples

Create an archive

{
  "name": "Example Archive",
  "organization": "EXAMPLE",
  "seeds": [
    "http://www.example.com",
    "http://blog.example.com"
  ],
  "plugin": "webpage",
  "scope": "one_page_and_one_hop",
  "settings": {
    "warcloader_url": "http://warcloader-1.hanzoman.com:1647/"
  }
}

Create an archive with auth, custom entry points and metadata

{
  "name": "Example Archive",
  "organization": "EXAMPLE",
  "auth": {
    "username": "test",
    "password": "test"
  },
  "entry_points": [
    "http://www.example.com"
  ],
  "seeds": [
    "http://www.example.com",
    "http://blog.example.com"
  ],
  "plugin": "webpage",
  "scope": "one_page_and_one_hop",
  "settings": {
    "warcloader_url": "http://warcloader-1.hanzoman.com:1647/"
  },
  "metadata": {
    "example_key": "example_value"
  }
}

Request schema

name string
crawlkey string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array[name] (optional)
teams array[slug] (optional)
plugin_module_settings object (optional)
plugin_modules array[package_name] (optional)

Response examples

Create an archive

{
  "name": "Example Archive",
  "crawlkey": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "entry_points": [
    "http://www.example.com",
    "http://blog.example.com"
  ],
  "seeds": [
    "http://www.example.com",
    "http://blog.example.com"
  ],
  "metadata": null,
  "notes": null,
  "jira_issue": null,
  "jira_status": null,
  "autoexport": false,
  "plugin": "webpage",
  "scope": "one_page_and_one_hop",
  "settings": {},
  "tags": [],
  "teams": [],
  "url": "https://portal.hanzoarchives.com/api/archive-units/EXAMPLE/Example%20Archive",
  "portal": "https://portal.hanzoarchives.com/captures/example-archive",
  "portal_url": "https://portal.hanzoarchives.com/captures/example-archive",
  "settings_url": "https://portal.hanzoarchives.com/api/archive-units/EXAMPLE/Example%20Archive/settings",
  "plugin_module_settings": null,
  "plugin_modules": [],
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Create an archive with auth, custom entry points and metadata

{
  "name": "Example Archive",
  "crawlkey": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "entry_points": [
    "http://www.example.com"
  ],
  "seeds": [
    "http://www.example.com",
    "http://blog.example.com"
  ],
  "metadata": {
    "example_key": "example_value"
  },
  "notes": null,
  "jira_issue": null,
  "jira_status": null,
  "autoexport": false,
  "plugin": "webpage",
  "scope": "one_page_and_one_hop",
  "settings": {},
  "tags": [],
  "teams": [],
  "url": "https://portal.hanzoarchives.com/api/archive-units/EXAMPLE/Example%20Archive",
  "portal": "https://portal.hanzoarchives.com/captures/example-archive",
  "portal_url": "https://portal.hanzoarchives.com/captures/example-archive",
  "settings_url": "https://portal.hanzoarchives.com/api/archive-units/EXAMPLE/Example%20Archive/settings",
  "plugin_module_settings": null,
  "plugin_modules": [],
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Response schema

name string
crawlkey string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
jira_issue string (read only)
jira_status string (read only)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array (optional)
teams array (optional)
url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
settings_url string (read only)
plugin_module_settings object (optional)
plugin_modules array (optional)
created_at datetime (read only)
updated_at datetime (read only)

Retrieve an archive unit

GET /api/archive-units/{crawlkey}

Path params

crawlkey	string

Response messages

401	Not authenticated

Response examples

Retrieve an archive

{
  "name": "Example Archive",
  "crawlkey": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "entry_points": [
    "http://example.com"
  ],
  "seeds": [
    "http://example.com"
  ],
  "metadata": null,
  "notes": null,
  "jira_issue": null,
  "jira_status": null,
  "autoexport": false,
  "plugin": "website",
  "scope": "default",
  "settings": {},
  "tags": [],
  "teams": [],
  "url": "https://portal.hanzoarchives.com/api/archive-units/EXAMPLE/Example%20Archive",
  "portal": "https://portal.hanzoarchives.com/captures/example-archive",
  "portal_url": "https://portal.hanzoarchives.com/captures/example-archive",
  "settings_url": "https://portal.hanzoarchives.com/api/archive-units/EXAMPLE/Example%20Archive/settings",
  "plugin_module_settings": null,
  "plugin_modules": [],
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Response schema

name string
crawlkey string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
jira_issue string (read only)
jira_status string (read only)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array (optional)
teams array (optional)
url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
settings_url string (read only)
plugin_module_settings object (optional)
plugin_modules array (optional)
created_at datetime (read only)
updated_at datetime (read only)

Partial update an archive unit

PATCH /api/archive-units/{crawlkey}

Path params

crawlkey	string

Form params

name	string	(optional)
crawlkey	string	A unique identifier for this archive, constructed from {organization_code}/{name} (optional)
organization	code	(optional)
entry_points	array[string]	An array of URLs from which any new crawls can be entered via native access (optional)
seeds	array[string]	An array of URLs the crawler starts from for any new crawls (optional)
metadata	object	Additional user metadata store (optional)
notes	string	(optional)
autoexport	boolean	(optional)
plugin	slug	(optional)
scope	slug	(optional)
settings	object	Settings to be passed to the crawler (keys required depends on the plugin/scope) (optional)
tags	array[name]	(optional)
teams	array[slug]	(optional)
plugin_module_settings	object	(optional)
plugin_modules	array[package_name]	(optional)

Response messages

200	The archive unit was updated
400	The fields name, organisation must make a unique set
401	Not authenticated
412	The archive unit has been updated since you last retrieved it
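A conditional partial update could be prepared as sketched below, tying the 412 response to the concurrency control described earlier. It assumes a JSON request body (as in the request examples) and that the ETag from your last retrieval is sent in If-Match. build_patch_request is a hypothetical helper, and the Authorization header is left out for brevity.

```python
import json
import urllib.request
from urllib.parse import quote

API_ROOT = "https://portal.hanzoarchives.com/api"


def build_patch_request(crawlkey: str, changes: dict, etag: str) -> urllib.request.Request:
    """Prepare (but do not send) a conditional PATCH for an Archive Unit."""
    return urllib.request.Request(
        f"{API_ROOT}/archive-units/{quote(crawlkey)}",
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ETag from the last GET; a stale value should yield a 412.
            "If-Match": etag,
        },
        method="PATCH",
    )
```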

Request schema

name string (optional)
crawlkey string (optional)	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code (optional)
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array[name] (optional)
teams array[slug] (optional)
plugin_module_settings object (optional)
plugin_modules array[package_name] (optional)

Response schema

name string
crawlkey string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
jira_issue string (read only)
jira_status string (read only)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array (optional)
teams array (optional)
url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
settings_url string (read only)
plugin_module_settings object (optional)
plugin_modules array (optional)
created_at datetime (read only)
updated_at datetime (read only)

Update an archive unit

PUT /api/archive-units/{crawlkey}

Path params

crawlkey	string

Form params

name	string
crawlkey	string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization	code
entry_points	array[string]	An array of URLs from which any new crawls can be entered via native access (optional)
seeds	array[string]	An array of URLs the crawler starts from for any new crawls (optional)
metadata	object	Additional user metadata store (optional)
notes	string	(optional)
autoexport	boolean	(optional)
plugin	slug	(optional)
scope	slug	(optional)
settings	object	Settings to be passed to the crawler (keys required depends on the plugin/scope) (optional)
tags	array[name]	(optional)
teams	array[slug]	(optional)
plugin_module_settings	object	(optional)
plugin_modules	array[package_name]	(optional)

Response messages

200	The archive unit was updated
400	The fields name, organisation must make a unique set
401	Not authenticated
412	The archive unit has been updated since you last retrieved it

Request schema

name string
crawlkey string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array[name] (optional)
teams array[slug] (optional)
plugin_module_settings object (optional)
plugin_modules array[package_name] (optional)

Response schema

name string
crawlkey string	A unique identifier for this archive, constructed from {organization_code}/{name}
organization code
entry_points array[string] (optional)	An array of URLs from which any new crawls can be entered via native access
seeds array[string] (optional)	An array of URLs the crawler starts from for any new crawls
metadata object (optional)	Additional user metadata store
notes string (optional)
jira_issue string (read only)
jira_status string (read only)
autoexport boolean (optional)
plugin slug (optional)
scope slug (optional)
settings object (optional)	Settings to be passed to the crawler (keys required depends on the plugin/scope)
tags array (optional)
teams array (optional)
url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
settings_url string (read only)
plugin_module_settings object (optional)
plugin_modules array (optional)
created_at datetime (read only)
updated_at datetime (read only)

Delete an archive unit

DELETE /api/archive-units/{crawlkey}

Path params

crawlkey	string

Response messages

204	The archive unit was deleted
401	Not authenticated

List archive unit settings

GET /api/archive-units/{archive_unit_crawlkey}/settings

Path params

archive_unit_crawlkey	string

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)

Retrieve an archive unit setting

GET /api/archive-units/{archive_unit_crawlkey}/settings/{key}

Path params

archive_unit_crawlkey	string
key	string

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)

List crawls

GET /api/crawls

Query params

name	string
archive_unit	crawlkey
organization	code
status	string
captured_before	datetime
captured_after	datetime
created_before	datetime
created_after	datetime
aggregate	boolean
draft	boolean
partial	boolean
completed_before	datetime
completed_after	datetime

Response messages

401	Not authenticated

Response schema

name string
uuid string
archive_unit crawlkey
organization code (read only)
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
driver string (read only)	Address of the crawl driver
seeds array[string] (optional)	An array of URLs the crawler starts from
crawldata object (read only)	A metadata object maintained by the crawler
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array (optional)
plugin object (read only)	Plugin name, slug, scope and sha as derived on creation
settings object (optional)	Settings derived from the settings cascade on creation
processing int (read only)	The number of pages processing
remaining int (read only)	The number of pages remaining
captured int (read only)	The number of pages captured
errored int (read only)	The number of pages errored
excluded int (read only)	The number of pages excluded
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
captured_at datetime (read only)	The date that the crawler started capturing
capture_last_active_at datetime (read only)	The date that the crawler last reported activity
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)
created_at datetime (read only)
updated_at datetime (read only)

Create a crawl

POST /api/crawls

Form params

name	string
uuid	string
archive_unit	crawlkey
status	string	(optional)
entry_points	array[object]	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance (optional)
seeds	array[string]	An array of URLs the crawler starts from (optional)
metadata	object	Additional user metadata store (optional)
aggregate	boolean	Whether this crawl is an aggregation of one or more other crawls (requires `components`) (optional)
partial	boolean	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation (optional)
components	array[uuid]	(optional)
settings	object	Settings derived from the settings cascade on creation (optional)
first_completed_at	datetime	(optional)
last_completed_at	datetime	(optional)
storage_state	choice	(choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date	datetime	(optional)
retrieved_until	datetime	(optional)
nearline_storage_metadata	object	(optional)

Response messages

201	The crawl was created
400	All entry points must be an object with a uri
400	Requested crawls must not have a captured at date
400	Complete crawls must have at least one entry point
400	Complete crawls must have a captured at date
401	Not authenticated

Request examples

Create a crawl

{
  "archive_unit": "EXAMPLE/Example Archive",
  "status": "requested:user"
}

Create an aggregate crawl

{
  "archive_unit": "EXAMPLE/Example Archive",
  "entry_points": [
    {
      "uri": "http://blog.example.com/new-post-1"
    },
    {
      "uri": "http://blog.example.com/new-post-2"
    },
    {
      "uri": "http://blog.example.com/new-post-3"
    }
  ],
  "status": "requested:user",
  "aggregate": true,
  "components": [
    "a0204d5a-1ec9-4247-b0c8-5bc47be2c134",
    "eb259f2d-62a4-4ae1-a69c-e63ba7410775",
    "66e9f3b2-9bb2-47a8-8fa1-6c2457d11c96"
  ]
}

Create a partial crawl

{
  "archive_unit": "EXAMPLE/Example Archive",
  "entry_points": [
    {
      "uri": "http://blog.example.com/new-post-1"
    },
    {
      "uri": "http://blog.example.com/new-post-2"
    }
  ],
  "seeds": [
    "http://blog.example.com/new-post-1",
    "http://blog.example.com/new-post-2"
  ],
  "status": "requested:user",
  "partial": true
}
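The aggregate request example above could be assembled programmatically along these lines. This is a sketch under our own assumptions: aggregate_crawl_payload is a hypothetical helper, and the client-side checks (well-formed UUIDs, at least one component) are our additions rather than documented server rules.

```python
import uuid
from typing import Any, Dict, Iterable, List


def aggregate_crawl_payload(archive_unit: str, components: Iterable[str],
                            entry_point_uris: Iterable[str]) -> Dict[str, Any]:
    """Body for POST /api/crawls that aggregates existing crawls."""
    # uuid.UUID raises ValueError on malformed identifiers, catching typos early.
    component_ids: List[str] = [str(uuid.UUID(value)) for value in components]
    if not component_ids:
        raise ValueError("an aggregate crawl requires at least one component")
    return {
        "archive_unit": archive_unit,
        # Each entry point must be an object with a "uri" key.
        "entry_points": [{"uri": uri} for uri in entry_point_uris],
        "status": "requested:user",
        "aggregate": True,
        "components": component_ids,
    }
```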

Request schema

name string
uuid string
archive_unit crawlkey
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
seeds array[string] (optional)	An array of URLs the crawler starts from
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array[uuid] (optional)
settings object (optional)	Settings derived from the settings cascade on creation
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (optional) (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack'])
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)

Response examples

Create a crawl

{
  "name": "example-example-archive-202507020350",
  "uuid": "46177e12-ad1f-4d97-9a27-926e26b4ec73",
  "archive_unit": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "status": "requested:user",
  "entry_points": [
    {
      "uri": "http://example.com"
    }
  ],
  "driver": null,
  "seeds": [
    "http://example.com"
  ],
  "crawldata": null,
  "metadata": null,
  "aggregate": false,
  "partial": false,
  "components": [],
  "plugin": {
    "requires_qa": false,
    "name": "Website",
    "sha": null,
    "requires_eng": false,
    "scope": {
      "name": "Default",
      "slug": "default"
    },
    "slug": "website"
  },
  "settings": {
    "warcloader_ingestor_image": "icr.io/chronicle-prod/warcloader/ingestor:1e2b71cd",
    "page_handler_image": "icr.io/chronicle-prod/hanzo-page-handler:5b3ce3b6",
    "single_crawl_per_crawldb": "True",
    "customer_code": "EXAMPLE",
    "db_instance_type": "t3.micro",
    "warcloader_postgres_image": "icr.io/chronicle-prod/warcloader/postgres:1e2b71cd",
    "restrict_domains": "on",
    "job_server_security_groups": "prod/jobservers",
    "db_security_groups": "prod/frontiers",
    "reset_errors": "on",
    "ntp_server": "169.254.169.123",
    "task_log_queue": "s3://hanzo.software/task_queues",
    "portal_api_endpoint": "https://portal.hanzoarchives.com/api/",
    "thomas_image": "icr.io/chronicle-prod/thomas-the-crawl-engine:f0a71682",
    "include_referered": "off",
    "warcloader_timeout": "1200",
    "mitmproxy_image": "icr.io/chronicle-prod/hanzo-mitmproxy:94aaf1c3",
    "proxy_image": "icr.io/chronicle-prod/hanzo-qt-warcproxy:a274c6db",
    "entrypoint": [
      "http://example.com"
    ],
    "frontier_image": "icr.io/chronicle-prod/miyamoto/frontier:5a284946",
    "warcloader_url": "http://warcloader.inf.hanzoman.com:1647",
    "job_server_instance_type": "t3.large",
    "max_depth": "2",
    "extract_handler_image": "icr.io/chronicle-prod/hanzo-extract-handler:58d8ca9a",
    "instance_manager_url": "http://instance-manager.inf.hanzoman.com:1666/",
    "tags": [],
    "frontier_db_image": "icr.io/chronicle-prod/miyamoto/frontier-db:4ec4b271",
    "max_workers": "5",
    "final_frontier_image": "icr.io/chronicle-prod/final-frontier:dde45433",
    "crawl_id": "46177e12-ad1f-4d97-9a27-926e26b4ec73",
    "chromium_image": "icr.io/chronicle-prod/chrome:107.0.5304.87-1_fonts",
    "job_server_ami": "ami-0d8f8c6131ba8d0b0",
    "arkwright_image": "icr.io/chronicle-prod/mr-arkwright:ibm-cloud___20241209_120232",
    "warcloader_queue_server_image": "icr.io/chronicle-prod/warcloader/queue_server:1e2b71cd",
    "manifest_path": "s3://hanzo.manifests/",
    "global_setup": "off",
    "crawl": "example-example-archive-202507020350",
    "customer": "",
    "warcloader_aggregator_image": "icr.io/chronicle-prod/warcloader/aggregator:1e2b71cd",
    "snapshot_video_use_proxy": "no",
    "seeds": [
      "http://example.com"
    ],
    "output_path": "s3://hanzoenterprise/RaC/",
    "au": "Example Archive",
    "capture_scope": "default",
    "job_server_subnets": "prod/semi-private/*",
    "db_ami": "ami-0875fac396b63e4e1",
    "warcloader_crons_image": "icr.io/chronicle-prod/warcloader/crons:1e2b71cd",
    "db_subnets": "prod/private/*"
  },
  "processing": null,
  "remaining": null,
  "captured": null,
  "errored": null,
  "excluded": null,
  "url": "https://portal.hanzoarchives.com/api/crawls/46177e12-ad1f-4d97-9a27-926e26b4ec73",
  "attachments_url": "https://portal.hanzoarchives.com/api/crawls/46177e12-ad1f-4d97-9a27-926e26b4ec73/attachments",
  "portal": "https://portal.hanzoarchives.com/captures/example-archive/46177e12-ad1f-4d97-9a27-926e26b4ec73",
  "portal_url": "https://portal.hanzoarchives.com/captures/example-archive/46177e12-ad1f-4d97-9a27-926e26b4ec73",
  "captured_at": null,
  "capture_last_active_at": null,
  "first_completed_at": null,
  "last_completed_at": null,
  "storage_state": "hot",
  "nearline_storage_after_date": null,
  "retrieved_until": null,
  "nearline_storage_metadata": null,
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Create an aggregate crawl

{
  "name": "example-example-archive-202507020350",
  "uuid": "b9f7a815-0b4d-424c-95ec-6cd55b4d377a",
  "archive_unit": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "status": "requested:user",
  "entry_points": [
    {
      "uri": "http://blog.example.com/new-post-1"
    },
    {
      "uri": "http://blog.example.com/new-post-2"
    },
    {
      "uri": "http://blog.example.com/new-post-3"
    }
  ],
  "driver": null,
  "seeds": [
    "http://example.com"
  ],
  "crawldata": null,
  "metadata": null,
  "aggregate": true,
  "partial": false,
  "components": [],
  "plugin": {
    "requires_qa": false,
    "name": "Website",
    "sha": null,
    "requires_eng": false,
    "scope": {
      "name": "Default",
      "slug": "default"
    },
    "slug": "website"
  },
  "settings": {
    "warcloader_ingestor_image": "icr.io/chronicle-prod/warcloader/ingestor:1e2b71cd",
    "page_handler_image": "icr.io/chronicle-prod/hanzo-page-handler:5b3ce3b6",
    "single_crawl_per_crawldb": "True",
    "customer_code": "EXAMPLE",
    "db_instance_type": "t3.micro",
    "warcloader_postgres_image": "icr.io/chronicle-prod/warcloader/postgres:1e2b71cd",
    "restrict_domains": "on",
    "job_server_security_groups": "prod/jobservers",
    "db_security_groups": "prod/frontiers",
    "reset_errors": "on",
    "ntp_server": "169.254.169.123",
    "task_log_queue": "s3://hanzo.software/task_queues",
    "portal_api_endpoint": "https://portal.hanzoarchives.com/api/",
    "thomas_image": "icr.io/chronicle-prod/thomas-the-crawl-engine:f0a71682",
    "include_referered": "off",
    "warcloader_timeout": "1200",
    "mitmproxy_image": "icr.io/chronicle-prod/hanzo-mitmproxy:94aaf1c3",
    "proxy_image": "icr.io/chronicle-prod/hanzo-qt-warcproxy:a274c6db",
    "entrypoint": [
      "http://blog.example.com/new-post-1",
      "http://blog.example.com/new-post-2",
      "http://blog.example.com/new-post-3"
    ],
    "frontier_image": "icr.io/chronicle-prod/miyamoto/frontier:5a284946",
    "warcloader_url": "http://warcloader.inf.hanzoman.com:1647",
    "job_server_instance_type": "t3.large",
    "max_depth": "2",
    "extract_handler_image": "icr.io/chronicle-prod/hanzo-extract-handler:58d8ca9a",
    "instance_manager_url": "http://instance-manager.inf.hanzoman.com:1666/",
    "tags": [],
    "frontier_db_image": "icr.io/chronicle-prod/miyamoto/frontier-db:4ec4b271",
    "max_workers": "5",
    "final_frontier_image": "icr.io/chronicle-prod/final-frontier:dde45433",
    "crawl_id": "b9f7a815-0b4d-424c-95ec-6cd55b4d377a",
    "chromium_image": "icr.io/chronicle-prod/chrome:107.0.5304.87-1_fonts",
    "job_server_ami": "ami-0d8f8c6131ba8d0b0",
    "arkwright_image": "icr.io/chronicle-prod/mr-arkwright:ibm-cloud___20241209_120232",
    "warcloader_queue_server_image": "icr.io/chronicle-prod/warcloader/queue_server:1e2b71cd",
    "manifest_path": "s3://hanzo.manifests/",
    "global_setup": "off",
    "crawl": "example-example-archive-202507020350",
    "customer": "",
    "warcloader_aggregator_image": "icr.io/chronicle-prod/warcloader/aggregator:1e2b71cd",
    "snapshot_video_use_proxy": "no",
    "seeds": [
      "http://example.com"
    ],
    "output_path": "s3://hanzoenterprise/RaC/",
    "au": "Example Archive",
    "capture_scope": "default",
    "job_server_subnets": "prod/semi-private/*",
    "db_ami": "ami-0875fac396b63e4e1",
    "warcloader_crons_image": "icr.io/chronicle-prod/warcloader/crons:1e2b71cd",
    "db_subnets": "prod/private/*"
  },
  "processing": null,
  "remaining": null,
  "captured": null,
  "errored": null,
  "excluded": null,
  "url": "https://portal.hanzoarchives.com/api/crawls/b9f7a815-0b4d-424c-95ec-6cd55b4d377a",
  "attachments_url": "https://portal.hanzoarchives.com/api/crawls/b9f7a815-0b4d-424c-95ec-6cd55b4d377a/attachments",
  "portal": "https://portal.hanzoarchives.com/captures/example-archive/b9f7a815-0b4d-424c-95ec-6cd55b4d377a",
  "portal_url": "https://portal.hanzoarchives.com/captures/example-archive/b9f7a815-0b4d-424c-95ec-6cd55b4d377a",
  "captured_at": null,
  "capture_last_active_at": null,
  "first_completed_at": null,
  "last_completed_at": null,
  "storage_state": "hot",
  "nearline_storage_after_date": null,
  "retrieved_until": null,
  "nearline_storage_metadata": null,
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Create a partial crawl

{
  "name": "example-example-archive-202507020350",
  "uuid": "46c26554-1bd2-4048-bd5d-21830e42e1e6",
  "archive_unit": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "status": "requested:user",
  "entry_points": [
    {
      "uri": "http://blog.example.com/new-post-1"
    },
    {
      "uri": "http://blog.example.com/new-post-2"
    }
  ],
  "driver": null,
  "seeds": [
    "http://blog.example.com/new-post-1",
    "http://blog.example.com/new-post-2"
  ],
  "crawldata": null,
  "metadata": null,
  "aggregate": false,
  "partial": true,
  "components": [],
  "plugin": {
    "requires_qa": false,
    "name": "Website",
    "sha": null,
    "requires_eng": false,
    "scope": {
      "name": "Default",
      "slug": "default"
    },
    "slug": "website"
  },
  "settings": {
    "warcloader_ingestor_image": "icr.io/chronicle-prod/warcloader/ingestor:1e2b71cd",
    "page_handler_image": "icr.io/chronicle-prod/hanzo-page-handler:5b3ce3b6",
    "single_crawl_per_crawldb": "True",
    "customer_code": "EXAMPLE",
    "db_instance_type": "t3.micro",
    "warcloader_postgres_image": "icr.io/chronicle-prod/warcloader/postgres:1e2b71cd",
    "restrict_domains": "on",
    "job_server_security_groups": "prod/jobservers",
    "db_security_groups": "prod/frontiers",
    "reset_errors": "on",
    "ntp_server": "169.254.169.123",
    "task_log_queue": "s3://hanzo.software/task_queues",
    "portal_api_endpoint": "https://portal.hanzoarchives.com/api/",
    "thomas_image": "icr.io/chronicle-prod/thomas-the-crawl-engine:f0a71682",
    "include_referered": "off",
    "warcloader_timeout": "1200",
    "mitmproxy_image": "icr.io/chronicle-prod/hanzo-mitmproxy:94aaf1c3",
    "proxy_image": "icr.io/chronicle-prod/hanzo-qt-warcproxy:a274c6db",
    "entrypoint": [
      "http://blog.example.com/new-post-1",
      "http://blog.example.com/new-post-2"
    ],
    "frontier_image": "icr.io/chronicle-prod/miyamoto/frontier:5a284946",
    "warcloader_url": "http://warcloader.inf.hanzoman.com:1647",
    "job_server_instance_type": "t3.large",
    "max_depth": "2",
    "extract_handler_image": "icr.io/chronicle-prod/hanzo-extract-handler:58d8ca9a",
    "instance_manager_url": "http://instance-manager.inf.hanzoman.com:1666/",
    "tags": [],
    "frontier_db_image": "icr.io/chronicle-prod/miyamoto/frontier-db:4ec4b271",
    "max_workers": "5",
    "final_frontier_image": "icr.io/chronicle-prod/final-frontier:dde45433",
    "crawl_id": "46c26554-1bd2-4048-bd5d-21830e42e1e6",
    "chromium_image": "icr.io/chronicle-prod/chrome:107.0.5304.87-1_fonts",
    "job_server_ami": "ami-0d8f8c6131ba8d0b0",
    "arkwright_image": "icr.io/chronicle-prod/mr-arkwright:ibm-cloud___20241209_120232",
    "warcloader_queue_server_image": "icr.io/chronicle-prod/warcloader/queue_server:1e2b71cd",
    "manifest_path": "s3://hanzo.manifests/",
    "global_setup": "off",
    "crawl": "example-example-archive-202507020350",
    "customer": "",
    "warcloader_aggregator_image": "icr.io/chronicle-prod/warcloader/aggregator:1e2b71cd",
    "snapshot_video_use_proxy": "no",
    "seeds": [
      "http://blog.example.com/new-post-1",
      "http://blog.example.com/new-post-2"
    ],
    "output_path": "s3://hanzoenterprise/RaC/",
    "au": "Example Archive",
    "capture_scope": "default",
    "job_server_subnets": "prod/semi-private/*",
    "db_ami": "ami-0875fac396b63e4e1",
    "warcloader_crons_image": "icr.io/chronicle-prod/warcloader/crons:1e2b71cd",
    "db_subnets": "prod/private/*"
  },
  "processing": null,
  "remaining": null,
  "captured": null,
  "errored": null,
  "excluded": null,
  "url": "https://portal.hanzoarchives.com/api/crawls/46c26554-1bd2-4048-bd5d-21830e42e1e6",
  "attachments_url": "https://portal.hanzoarchives.com/api/crawls/46c26554-1bd2-4048-bd5d-21830e42e1e6/attachments",
  "portal": "https://portal.hanzoarchives.com/captures/example-archive/46c26554-1bd2-4048-bd5d-21830e42e1e6",
  "portal_url": "https://portal.hanzoarchives.com/captures/example-archive/46c26554-1bd2-4048-bd5d-21830e42e1e6",
  "captured_at": null,
  "capture_last_active_at": null,
  "first_completed_at": null,
  "last_completed_at": null,
  "storage_state": "hot",
  "nearline_storage_after_date": null,
  "retrieved_until": null,
  "nearline_storage_metadata": null,
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Response schema

name string
uuid string
archive_unit crawlkey
organization code (read only)
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
driver string (read only)	Address of the crawl driver
seeds array[string] (optional)	An array of URLs the crawler starts from
crawldata object (read only)	A metadata object maintained by the crawler
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array (optional)
plugin object (read only)	Plugin name, slug, scope and sha as derived on creation
settings object (optional)	Settings derived from the settings cascade on creation
processing int (read only)	The number of pages processing
remaining int (read only)	The number of pages remaining
captured int (read only)	The number of pages captured
errored int (read only)	The number of pages errored
excluded int (read only)	The number of pages excluded
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
captured_at datetime (read only)	The date that the crawler started capturing
capture_last_active_at datetime (read only)	The date that the crawler last reported activity
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)
created_at datetime (read only)
updated_at datetime (read only)

Retrieve a crawl

GET /api/crawls/{uuid}

Path params

uuid	string

Response messages

401	Not authenticated

Response examples

Retrieve a crawl

{
  "name": "-example-archive-202507020350",
  "uuid": "c5b0f8fb-ad10-4006-80ed-160615e6b8b4",
  "archive_unit": "EXAMPLE/Example Archive",
  "organization": "EXAMPLE",
  "status": "crawling",
  "entry_points": [
    {
      "uri": "http://example.com"
    }
  ],
  "driver": null,
  "seeds": [
    "http://example.com"
  ],
  "crawldata": null,
  "metadata": null,
  "aggregate": false,
  "partial": false,
  "components": [],
  "plugin": {
    "requires_qa": false,
    "name": "Website",
    "sha": null,
    "requires_eng": false,
    "scope": {
      "name": "Default",
      "slug": "default"
    },
    "slug": "website"
  },
  "settings": {
    "warcloader_ingestor_image": "icr.io/chronicle-prod/warcloader/ingestor:1e2b71cd",
    "page_handler_image": "icr.io/chronicle-prod/hanzo-page-handler:5b3ce3b6",
    "single_crawl_per_crawldb": "True",
    "customer_code": "EXAMPLE",
    "db_instance_type": "t3.micro",
    "warcloader_postgres_image": "icr.io/chronicle-prod/warcloader/postgres:1e2b71cd",
    "restrict_domains": "on",
    "job_server_security_groups": "prod/jobservers",
    "db_security_groups": "prod/frontiers",
    "reset_errors": "on",
    "ntp_server": "169.254.169.123",
    "task_log_queue": "s3://hanzo.software/task_queues",
    "portal_api_endpoint": "https://portal.hanzoarchives.com/api/",
    "thomas_image": "icr.io/chronicle-prod/thomas-the-crawl-engine:f0a71682",
    "include_referered": "off",
    "warcloader_timeout": "1200",
    "mitmproxy_image": "icr.io/chronicle-prod/hanzo-mitmproxy:94aaf1c3",
    "proxy_image": "icr.io/chronicle-prod/hanzo-qt-warcproxy:a274c6db",
    "entrypoint": [
      "http://example.com"
    ],
    "frontier_image": "icr.io/chronicle-prod/miyamoto/frontier:5a284946",
    "warcloader_url": "http://warcloader.inf.hanzoman.com:1647",
    "job_server_instance_type": "t3.large",
    "max_depth": "2",
    "extract_handler_image": "icr.io/chronicle-prod/hanzo-extract-handler:58d8ca9a",
    "instance_manager_url": "http://instance-manager.inf.hanzoman.com:1666/",
    "tags": [],
    "frontier_db_image": "icr.io/chronicle-prod/miyamoto/frontier-db:4ec4b271",
    "max_workers": "5",
    "final_frontier_image": "icr.io/chronicle-prod/final-frontier:dde45433",
    "crawl_id": "c5b0f8fb-ad10-4006-80ed-160615e6b8b4",
    "chromium_image": "icr.io/chronicle-prod/chrome:107.0.5304.87-1_fonts",
    "job_server_ami": "ami-0d8f8c6131ba8d0b0",
    "arkwright_image": "icr.io/chronicle-prod/mr-arkwright:ibm-cloud___20241209_120232",
    "warcloader_queue_server_image": "icr.io/chronicle-prod/warcloader/queue_server:1e2b71cd",
    "manifest_path": "s3://hanzo.manifests/",
    "global_setup": "off",
    "crawl": "-example-archive-202507020350",
    "customer": "",
    "warcloader_aggregator_image": "icr.io/chronicle-prod/warcloader/aggregator:1e2b71cd",
    "snapshot_video_use_proxy": "no",
    "seeds": [
      "http://example.com"
    ],
    "output_path": "s3://hanzoenterprise/RaC/",
    "au": "Example Archive",
    "capture_scope": "default",
    "job_server_subnets": "prod/semi-private/*",
    "db_ami": "ami-0875fac396b63e4e1",
    "warcloader_crons_image": "icr.io/chronicle-prod/warcloader/crons:1e2b71cd",
    "db_subnets": "prod/private/*"
  },
  "processing": 12,
  "remaining": 243,
  "captured": 145,
  "errored": 15,
  "excluded": 214,
  "url": "https://portal.hanzoarchives.com/api/crawls/c5b0f8fb-ad10-4006-80ed-160615e6b8b4",
  "attachments_url": "https://portal.hanzoarchives.com/api/crawls/c5b0f8fb-ad10-4006-80ed-160615e6b8b4/attachments",
  "portal": "https://portal.hanzoarchives.com/captures/example-archive/c5b0f8fb-ad10-4006-80ed-160615e6b8b4",
  "portal_url": "https://portal.hanzoarchives.com/captures/example-archive/c5b0f8fb-ad10-4006-80ed-160615e6b8b4",
  "captured_at": "2025-07-02T03:50:34Z",
  "capture_last_active_at": "2025-07-02T03:50:34Z",
  "first_completed_at": null,
  "last_completed_at": null,
  "storage_state": "hot",
  "nearline_storage_after_date": null,
  "retrieved_until": null,
  "nearline_storage_metadata": null,
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}
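
The page counters in the example above can be combined into a rough progress figure. The sketch below is illustrative only: it assumes the five counters are disjoint and that captured, errored and excluded pages need no further work, neither of which is stated in this documentation. The function name `summarize_progress` is hypothetical.

```python
def summarize_progress(crawl: dict) -> dict:
    """Summarize the page counters of a crawl object.

    The counters are null (None) until the crawler reports them, so a
    missing or null counter is treated as 0.
    """
    names = ("processing", "remaining", "captured", "errored", "excluded")
    counts = {name: crawl.get(name) or 0 for name in names}
    total = sum(counts.values())
    # Assumption: captured, errored and excluded pages are finished with.
    settled = counts["captured"] + counts["errored"] + counts["excluded"]
    return {
        "total": total,
        "settled": settled,
        "fraction_settled": settled / total if total else 0.0,
    }


# Counter values copied from the "Retrieve a crawl" response example.
example = {
    "processing": 12,
    "remaining": 243,
    "captured": 145,
    "errored": 15,
    "excluded": 214,
}
summary = summarize_progress(example)
print(summary["total"], summary["settled"])  # prints: 629 374
```

A freshly created crawl, whose counters are all null, yields a total of 0 and a fraction of 0.0 rather than raising a division error.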

Response schema

name string
uuid string
archive_unit crawlkey
organization code (read only)
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
driver string (read only)	Address of the crawl driver
seeds array[string] (optional)	An array of URLs the crawler starts from
crawldata object (read only)	A metadata object maintained by the crawler
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array (optional)
plugin object (read only)	Plugin name, slug, scope and sha as derived on creation
settings object (optional)	Settings derived from the settings cascade on creation
processing int (read only)	The number of pages processing
remaining int (read only)	The number of pages remaining
captured int (read only)	The number of pages captured
errored int (read only)	The number of pages errored
excluded int (read only)	The number of pages excluded
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
captured_at datetime (read only)	The date that the crawler started capturing
capture_last_active_at datetime (read only)	The date that the crawler last reported activity
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)
created_at datetime (read only)
updated_at datetime (read only)
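
A minimal sketch of retrieving a crawl with Basic Auth, using only the Python standard library. The `Accept: application/json` header and the helper names are assumptions made for illustration; the URL layout and the ETag response header come from this documentation.

```python
import base64
import json
import urllib.request

API_ROOT = "https://portal.hanzoarchives.com/api"


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials for an HTTP Basic Auth Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_retrieve_crawl_request(uuid: str, username: str, password: str):
    """Build (but do not send) GET /api/crawls/{uuid}."""
    return urllib.request.Request(
        f"{API_ROOT}/crawls/{uuid}",
        headers={
            "Authorization": basic_auth_header(username, password),
            # Assumption: ask for JSON instead of the browsable HTML form.
            "Accept": "application/json",
        },
        method="GET",
    )


def retrieve_crawl(uuid: str, username: str, password: str):
    """Send the request and return (crawl, etag); etag may be None."""
    request = build_retrieve_crawl_request(uuid, username, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("ETag")
```

Keep the returned ETag if you intend to update the crawl afterwards, since updates are guarded by an If-Match header.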

Partial update a crawl

PATCH /api/crawls/{uuid}

Path params

uuid	string

Form params

name	string	(optional)
uuid	string	(optional)
archive_unit	crawlkey	(optional)
status	string	(optional)
entry_points	array[object]	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance (optional)
seeds	array[string]	An array of URLs the crawler starts from (optional)
metadata	object	Additional user metadata store (optional)
aggregate	boolean	Whether this crawl is an aggregation of one or more other crawls (requires `components`) (optional)
partial	boolean	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation (optional)
components	array[uuid]	(optional)
settings	object	Settings derived from the settings cascade on creation (optional)
first_completed_at	datetime	(optional)
last_completed_at	datetime	(optional)
storage_state	choice	(choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date	datetime	(optional)
retrieved_until	datetime	(optional)
nearline_storage_metadata	object	(optional)

Response messages

200	The crawl was updated
400	All entry points must be an object with a uri
400	Requested crawls must not have a captured at date
400	Complete crawls must have at least one entry point
400	Complete crawls must have a captured at date
401	Not authenticated
412	The crawl has been updated since you last retrieved it
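
The first 400 message above can be caught before the request is sent. The following is a hedged client-side sketch, not the server's validation code; the function name and the wording of the returned messages are made up for illustration.

```python
def entry_point_errors(entry_points) -> list:
    """Return a list of problems found in an entry_points value.

    An empty list means every entry is an object with a non-empty
    string "uri" key.
    """
    if not isinstance(entry_points, list):
        return ["entry_points must be an array"]
    errors = []
    for index, entry in enumerate(entry_points):
        is_object = isinstance(entry, dict)
        uri = entry.get("uri") if is_object else None
        if not isinstance(uri, str) or not uri:
            errors.append(f"entry_points[{index}] must be an object with a uri")
    return errors


print(entry_point_errors([{"uri": "http://example.com"}]))  # prints: []
print(entry_point_errors(["http://example.com"]))
```

The second call reports a problem because a bare string was supplied where an object with a uri key is expected.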

Request schema

name string (optional)
uuid string (optional)
archive_unit crawlkey (optional)
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
seeds array[string] (optional)	An array of URLs the crawler starts from
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array[uuid] (optional)
settings object (optional)	Settings derived from the settings cascade on creation
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (optional) (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack'])
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)

Response schema

name string
uuid string
archive_unit crawlkey
organization code (read only)
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
driver string (read only)	Address of the crawl driver
seeds array[string] (optional)	An array of URLs the crawler starts from
crawldata object (read only)	A metadata object maintained by the crawler
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array (optional)
plugin object (read only)	Plugin name, slug, scope and sha as derived on creation
settings object (optional)	Settings derived from the settings cascade on creation
processing int (read only)	The number of pages processing
remaining int (read only)	The number of pages remaining
captured int (read only)	The number of pages captured
errored int (read only)	The number of pages errored
excluded int (read only)	The number of pages excluded
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
captured_at datetime (read only)	The date that the crawler started capturing
capture_last_active_at datetime (read only)	The date that the crawler last reported activity
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)
created_at datetime (read only)
updated_at datetime (read only)
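
The 412 response ties into the optimistic concurrency flow: retrieve the crawl, send its ETag in If-Match, and retrieve again if the update is rejected. The sketch below shows one way to structure that loop. It assumes the endpoint accepts a JSON body (the parameters above are listed as form params), and `get_etag` and `send_patch` are hypothetical callables supplied by the caller.

```python
import json
import urllib.request

API_ROOT = "https://portal.hanzoarchives.com/api"


def build_patch_request(uuid: str, changes: dict, etag: str, auth_header: str):
    """Build (but do not send) PATCH /api/crawls/{uuid} guarded by If-Match."""
    return urllib.request.Request(
        f"{API_ROOT}/crawls/{uuid}",
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": auth_header,
            # Assumption: a JSON request body is accepted.
            "Content-Type": "application/json",
            # Pass the ETag exactly as received, including any quotes.
            "If-Match": etag,
        },
        method="PATCH",
    )


def patch_with_retry(get_etag, send_patch, max_attempts: int = 3) -> int:
    """Re-read the ETag and retry while the update is rejected with 412.

    get_etag() returns the current ETag of the crawl.
    send_patch(etag) sends the update and returns the HTTP status code.
    """
    status = 412
    for _ in range(max_attempts):
        status = send_patch(get_etag())
        if status != 412:
            break
    return status
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for a 412, so a real `send_patch` would catch that exception and return its `code` attribute.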

Update a crawl

PUT /api/crawls/{uuid}

Path params

uuid	string

Form params

name	string
uuid	string
archive_unit	crawlkey
status	string	(optional)
entry_points	array[object]	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance (optional)
seeds	array[string]	An array of URLs the crawler starts from (optional)
metadata	object	Additional user metadata store (optional)
aggregate	boolean	Whether this crawl is an aggregation of one or more other crawls (requires `components`) (optional)
partial	boolean	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation (optional)
components	array[uuid]	(optional)
settings	object	Settings derived from the settings cascade on creation (optional)
first_completed_at	datetime	(optional)
last_completed_at	datetime	(optional)
storage_state	choice	(choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date	datetime	(optional)
retrieved_until	datetime	(optional)
nearline_storage_metadata	object	(optional)

Response messages

200	The crawl was updated
400	All entry points must be an object with a uri
400	Requested crawls must not have a captured at date
400	Complete crawls must have at least one entry point
400	Complete crawls must have a captured at date
401	Not authenticated
412	The crawl has been updated since you last retrieved it

Request schema

name string
uuid string
archive_unit crawlkey
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
seeds array[string] (optional)	An array of URLs the crawler starts from
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array[uuid] (optional)
settings object (optional)	Settings derived from the settings cascade on creation
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (optional) (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack'])
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)

Response schema

name string
uuid string
archive_unit crawlkey
organization code (read only)
status string (optional)
entry_points array[object] (optional)	An array of objects with the URI from which the crawl can be entered via native access as well as the UUID of the page instance
driver string (read only)	Address of the crawl driver
seeds array[string] (optional)	An array of URLs the crawler starts from
crawldata object (read only)	A metadata object maintained by the crawler
metadata object (optional)	Additional user metadata store
aggregate boolean (optional)	Whether this crawl is an aggregation of one or more other crawls (requires `components`)
partial boolean (optional)	Whether this crawl is a partial capture i.e. custom seeds/settings were supplied specifically for use in an aggregation
components array (optional)
plugin object (read only)	Plugin name, slug, scope and sha as derived on creation
settings object (optional)	Settings derived from the settings cascade on creation
processing int (read only)	The number of pages processing
remaining int (read only)	The number of pages remaining
captured int (read only)	The number of pages captured
errored int (read only)	The number of pages errored
excluded int (read only)	The number of pages excluded
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
captured_at datetime (read only)	The date that the crawler started capturing
capture_last_active_at datetime (read only)	The date that the crawler last reported activity
first_completed_at datetime (optional)
last_completed_at datetime (optional)
storage_state choice (choices: ['unknown', 'hot', 'cooling', 'cold', 'repack']) (optional)
nearline_storage_after_date datetime (optional)
retrieved_until datetime (optional)
nearline_storage_metadata object (optional)
created_at datetime (read only)
updated_at datetime (read only)
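
Because PUT replaces the whole crawl, a common pattern is to start from a retrieved crawl and send back only the writable fields. The sketch below derives the field lists from the request schema above; the function name is hypothetical and the checks are client-side conveniences, not a description of the server's behaviour.

```python
# Fields listed in the PUT request schema; everything else is read only.
WRITABLE_CRAWL_FIELDS = (
    "name", "uuid", "archive_unit", "status", "entry_points", "seeds",
    "metadata", "aggregate", "partial", "components", "settings",
    "first_completed_at", "last_completed_at", "storage_state",
    "nearline_storage_after_date", "retrieved_until",
    "nearline_storage_metadata",
)
# Fields the request schema does not mark as optional.
REQUIRED_CRAWL_FIELDS = ("name", "uuid", "archive_unit")


def build_put_payload(crawl: dict, **changes) -> dict:
    """Copy the writable fields of a retrieved crawl and apply changes."""
    unknown = sorted(set(changes) - set(WRITABLE_CRAWL_FIELDS))
    if unknown:
        raise ValueError(f"not writable: {', '.join(unknown)}")
    payload = {k: crawl[k] for k in WRITABLE_CRAWL_FIELDS if k in crawl}
    payload.update(changes)
    missing = [f for f in REQUIRED_CRAWL_FIELDS if payload.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return payload
```

Read-only keys such as `organization`, `plugin` or the page counters are dropped silently when copied from the retrieved crawl, but passing one of them as a change raises an error so that mistakes surface early.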

Delete a crawl

DELETE /api/crawls/{uuid}

Path params

uuid	string

Response messages

204	The crawl was deleted
401	Not authenticated

List crawl attachments

GET /api/crawls/{crawl_uuid}/attachments

Path params

crawl_uuid	string

Query params

name	string
type	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)
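
The `name` and `type` query parameters can be combined with the `?page` parameter described under Pagination. A small URL-building sketch follows; the function name is hypothetical, and the parameter is called `mime_type` in Python only to avoid shadowing the built-in `type`.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://portal.hanzoarchives.com/api"


def attachments_list_url(crawl_uuid, name=None, mime_type=None, page=None):
    """Build the URL for GET /api/crawls/{crawl_uuid}/attachments."""
    params = {"name": name, "type": mime_type, "page": page}
    # Drop filters that were not supplied, then percent-encode the rest.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{API_ROOT}/crawls/{quote(crawl_uuid, safe='')}/attachments"
    return f"{url}?{query}" if query else url


print(attachments_list_url("46177e12-ad1f-4d97-9a27-926e26b4ec73", page=2))
```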

Create a crawl attachment

POST /api/crawls/{crawl_uuid}/attachments

Path params

crawl_uuid	string

Form params

name	string
description	string	(optional)
uri	string	A file or s3 scheme URI to the file
size	int	The size of the file in bytes
type	string	The mimetype of the file
credentials	object	Any credentials required to open the attachment (optional)
metadata	object	Additional user metadata store (optional)

Response messages

201	The attachment was created
401	Not authenticated

Request schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)
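
A hedged sketch of assembling the request body for a new attachment. The four required fields come from the request schema above; guessing the MIME type from the file name is a convenience assumed here, and the bucket name in the usage line is made up.

```python
import mimetypes


def build_attachment_payload(name, uri, size, description=None,
                             metadata=None, content_type=None):
    """Assemble the body for POST /api/crawls/{crawl_uuid}/attachments."""
    if size < 0:
        raise ValueError("size must be a non-negative number of bytes")
    # Prefer an explicit type; otherwise guess from the file extension.
    guessed = mimetypes.guess_type(name)[0]
    payload = {
        "name": name,
        "uri": uri,
        "size": size,
        "type": content_type or guessed or "application/octet-stream",
    }
    if description is not None:
        payload["description"] = description
    if metadata is not None:
        payload["metadata"] = metadata
    return payload


payload = build_attachment_payload(
    "report.pdf", "s3://example-bucket/report.pdf", 1024
)
print(payload["type"])
```

Optional fields are left out of the body entirely when not supplied, rather than being sent as null.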

Retrieve a crawl attachment

GET /api/crawls/{crawl_uuid}/attachments/{name}

Path params

crawl_uuid	string
name	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

Partial update a crawl attachment

PATCH /api/crawls/{crawl_uuid}/attachments/{name}

Path params

crawl_uuid	string
name	string

Form params

name	string	(optional)
description	string	(optional)
uri	string	A file or s3 scheme URI to the file (optional)
size	int	The size of the file in bytes (optional)
type	string	The mimetype of the file (optional)
credentials	object	Any credentials required to open the attachment (optional)
metadata	object	Additional user metadata store (optional)

Response messages

200	The attachment was updated
401	Not authenticated
412	The attachment has been updated since you last retrieved it

Request schema

name string (optional)
description string (optional)
uri string (optional)	A file or s3 scheme URI to the file
size int (optional)	The size of the file in bytes
type string (optional)	The mimetype of the file
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

Update a crawl attachment

PUT /api/crawls/{crawl_uuid}/attachments/{name}

Path params

crawl_uuid	string
name	string

Form params

name	string
description	string	(optional)
uri	string	A file or s3 scheme URI to the file
size	int	The size of the file in bytes
type	string	The mimetype of the file
credentials	object	Any credentials required to open the attachment (optional)
metadata	object	Additional user metadata store (optional)

Response messages

200	The attachment was updated
401	Not authenticated
412	The attachment has been updated since you last retrieved it

Request schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

Delete a crawl attachment

DELETE /api/crawls/{crawl_uuid}/attachments/{name}

Path params

crawl_uuid	string
name	string

Response messages

204	The attachment was deleted
401	Not authenticated
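
Attachments are addressed by name, so names containing spaces or slashes should be percent-encoded when placed in the path. This is a sketch under that assumption, with hypothetical helper names; it builds the DELETE request without sending it.

```python
import urllib.request
from urllib.parse import quote

API_ROOT = "https://portal.hanzoarchives.com/api"


def attachment_url(crawl_uuid: str, name: str) -> str:
    """Build the URL of a single attachment, encoding the name safely."""
    return (
        f"{API_ROOT}/crawls/{quote(crawl_uuid, safe='')}"
        f"/attachments/{quote(name, safe='')}"
    )


def build_delete_attachment_request(crawl_uuid, name, auth_header):
    """Build (but do not send) the DELETE request for an attachment."""
    return urllib.request.Request(
        attachment_url(crawl_uuid, name),
        headers={"Authorization": auth_header},
        method="DELETE",
    )


print(attachment_url("abc", "crawl report.pdf"))
```

A successful deletion is signalled by a 204 status with no response body, so there is nothing to decode afterwards.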

List plugins

GET /api/crawler/plugins

Response messages

401	Not authenticated

Response schema

name string
slug string
auth_provider slug (optional)
url string (read only)
scopes_url string (read only)
settings_url string (read only)

Retrieve a plugin

GET /api/crawler/plugins/{slug}

Path params

slug	string

Response messages

401	Not authenticated

Response schema

name string
slug string
auth_provider slug (optional)
url string (read only)
scopes_url string (read only)
settings_url string (read only)

Retrieve a raw plugin file

GET /api/crawler/plugins/{slug}/{sha}/{path}

Path params

slug	string
sha	string
path	string

Response messages

401	Not authenticated

List scopes

GET /api/crawler/plugins/{plugin_slug}/scopes

Path params

plugin_slug	string

Response messages

401	Not authenticated

Response schema

name string
slug string
url string (read only)
settings_url string (read only)

Retrieve a scope

GET /api/crawler/plugins/{plugin_slug}/scopes/{slug}

Path params

plugin_slug	string
slug	string

Response messages

401	Not authenticated

Response schema

name string
slug string
url string (read only)
settings_url string (read only)

List scope settings

GET /api/crawler/plugins/{plugin_slug}/scopes/{scope_slug}/settings

Path params

plugin_slug	string
scope_slug	string

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)

Retrieve a scope setting

GET /api/crawler/plugins/{plugin_slug}/scopes/{scope_slug}/settings/{key}

Path params

plugin_slug	string
scope_slug	string
key	string

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)

List plugin settings

GET /api/crawler/plugins/{plugin_slug}/settings

Path params

plugin_slug	string

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)

Retrieve a plugin setting

GET /api/crawler/plugins/{plugin_slug}/settings/{key}

Path params

plugin_slug	string
key	string

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)
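
The plugin, scope and settings paths documented above nest in a regular way, so a small helper can assemble them. This is only a sketch: the function name and its parameters are invented for the example, and the slugs and key in the usage lines are borrowed from the sample data elsewhere in this document.

```python
from urllib.parse import quote

API_ROOT = "/api/crawler/plugins"


def plugin_path(plugin_slug, scope_slug=None, setting_key=None, settings=False):
    """Return the path for a plugin, one of its scopes, or a settings resource."""
    parts = [API_ROOT, quote(plugin_slug, safe="")]
    if scope_slug is not None:
        parts += ["scopes", quote(scope_slug, safe="")]
    if settings or setting_key is not None:
        parts.append("settings")
    if setting_key is not None:
        parts.append(quote(setting_key, safe=""))
    return "/".join(parts)


plugin = plugin_path("website")
scope = plugin_path("website", "default")
scope_settings = plugin_path("website", "default", settings=True)
plugin_setting = plugin_path("website", setting_key="max_depth")
```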

List settings

GET /api/crawler/settings

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)

Retrieve a setting

GET /api/crawler/settings/{key}

Path params

key	string

Response messages

401	Not authenticated

Response schema

key string
value string (optional)
field_name string (optional)
field_description string (optional)
field_type choice (choices: ['text', 'integer', 'url', 'email', 'choice', 'boolean']) (optional)
field_options object (optional)
field_required boolean (optional)
type string (read only)
url string (read only)
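
One possible way to work with the setting schema on the client side is to map it onto a small data class. The sketch below assumes the response body has already been decoded from JSON into a dictionary; the class name and the decision to reject unknown `field_type` values are choices made for this example, not behaviour of the API.

```python
from dataclasses import dataclass
from typing import Optional

# The choices listed for field_type in the schema.
FIELD_TYPES = {"text", "integer", "url", "email", "choice", "boolean"}


@dataclass
class Setting:
    key: str
    value: Optional[str] = None
    field_name: Optional[str] = None
    field_description: Optional[str] = None
    field_type: Optional[str] = None
    field_options: Optional[dict] = None
    field_required: Optional[bool] = None
    type: Optional[str] = None  # read only
    url: Optional[str] = None  # read only

    @classmethod
    def from_dict(cls, data: dict) -> "Setting":
        """Build a Setting from a decoded JSON object, ignoring unknown keys."""
        known = {name: data[name] for name in cls.__dataclass_fields__ if name in data}
        setting = cls(**known)
        if setting.field_type is not None and setting.field_type not in FIELD_TYPES:
            raise ValueError(f"unexpected field_type: {setting.field_type!r}")
        return setting


# Hypothetical payload; "max_depth" and "2" mirror the sample crawl settings.
setting = Setting.from_dict({"key": "max_depth", "value": "2", "field_type": "integer"})
```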

Sync plugin modules

POST /api/crawler/sync-plugin-modules

Response messages

201	The object was created
401	Not authenticated

List exports

GET /api/exports

Query params

name	string
slug	string
crawl	uuid
archive_unit	crawlkey
organization	code
status	string

Response messages

401	Not authenticated

Response schema

name string
uuid string
crawl uuid
crawl_details crawl (read only)
organization code
status string (optional)
type choice (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
attachments_count int (read only)
attachments_size int (read only)
exported_at datetime (optional)	The date that the exporter started exporting
created_at datetime (read only)
updated_at datetime (read only)
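
The query params above can be combined with the pagination parameters described earlier in this document. The following sketch builds such a URL; the function name is hypothetical, and whether the exports resource honours `per_page` is an assumption, since only some resources support a custom page size.

```python
from urllib.parse import urlencode

# Filters listed under "Query params" for GET /api/exports.
EXPORT_FILTERS = ("name", "slug", "crawl", "archive_unit", "organization", "status")


def exports_list_url(base_url, page=1, per_page=None, **filters):
    """Return the URL for one page of GET /api/exports with optional filters."""
    unknown = set(filters) - set(EXPORT_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = {key: filters[key] for key in EXPORT_FILTERS if key in filters}
    params["page"] = page
    if per_page is not None:
        if not 1 <= per_page <= 1000:
            raise ValueError("per_page must be between 1 and 1000")
        params["per_page"] = per_page
    return f"{base_url}/api/exports?{urlencode(params)}"


url = exports_list_url(
    "https://portal.hanzoarchives.com",
    organization="EXAMPLE",
    status="requested:user",
    per_page=50,
)
```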

Create an export

POST /api/exports

Form params

name	string
uuid	string
crawl	uuid
organization	code
status	string	(optional)
type	choice	The type of export, a `load_file` export is a standardised concordance load file (choices: ['load_file'])
credentials	object	Any credentials required to open the export (optional)
exported_at	datetime	The date that the exporter started exporting (optional)

Response messages

201	The export was created
400	Uris must be an array of strings or null
400	Uris must be of the file
400	Requested exports must not have an exported at date
400	Complete exports must have at least one uri
400	Complete exports must have an exported at date
401	Not authenticated
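
When one of the 400 responses above is returned, the body follows the error format described under "Error handling". A minimal sketch of flattening such a body into readable lines is shown below; the function name is invented, and placing the export message under `non_field_errors` is an assumption made for the sample.

```python
import json


def flatten_errors(body):
    """Turn an error response body into a list of 'field: message' strings."""
    payload = json.loads(body)
    lines = []
    for field, messages in payload.get("error", {}).items():
        for message in messages:
            lines.append(f"{field}: {message}")
    return lines


# Sample body in the documented shape; the non_field_errors entry is illustrative.
sample_body = json.dumps({
    "docs": "https://portal.hanzoarchives.com/api/docs",
    "error": {
        "organization": ["This field is required."],
        "non_field_errors": ["Complete exports must have an exported at date"],
    },
})
messages = flatten_errors(sample_body)
```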

Request examples

Create an export

{
  "name": "ESIV-1",
  "organization": "EXAMPLE",
  "crawl": "a6db8591-3735-4f1e-b72e-3574351b248e",
  "status": "requested:user",
  "type": "load_file"
}

Request schema

name string
uuid string
crawl uuid
organization code
status string (optional)
type choice (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
exported_at datetime (optional)	The date that the exporter started exporting
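
Two of the documented 400 messages relate `status` to `exported_at`, so a client may want to check a payload before sending it. The sketch below is a guess at how that could look: the function name is hypothetical, and matching statuses by the prefixes "requested" and "complete" is an assumption based on the sample value `requested:user` and the wording of the messages. The server remains the authority on validation.

```python
def check_export_dates(payload):
    """Return a list of problems mirroring two of the documented 400 messages."""
    status = payload.get("status") or ""
    exported_at = payload.get("exported_at")
    problems = []
    # Assumption: requested statuses start with "requested", as in "requested:user".
    if status.startswith("requested") and exported_at is not None:
        problems.append("Requested exports must not have an exported at date")
    # Assumption: finished exports have a status starting with "complete".
    if status.startswith("complete") and exported_at is None:
        problems.append("Complete exports must have an exported at date")
    return problems


ok = check_export_dates({
    "name": "ESIV-1",
    "organization": "EXAMPLE",
    "crawl": "a6db8591-3735-4f1e-b72e-3574351b248e",
    "status": "requested:user",
    "type": "load_file",
})
bad = check_export_dates({"status": "requested:user", "exported_at": "2025-07-02T03:50:34Z"})
```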

Response examples

Create an export

{
  "name": "ESIV-1",
  "uuid": "880e095b-d23d-436a-ad9a-3994f33e49ef",
  "crawl_details": "a6db8591-3735-4f1e-b72e-3574351b248e",
  "organization": "EXAMPLE",
  "status": "requested:user",
  "type": "load_file",
  "credentials": null,
  "url": "https://portal.hanzoarchives.com/api/exports/880e095b-d23d-436a-ad9a-3994f33e49ef",
  "attachments_url": "https://portal.hanzoarchives.com/api/exports/880e095b-d23d-436a-ad9a-3994f33e49ef/attachments",
  "portal": "https://portal.hanzoarchives.com/exports/880e095b-d23d-436a-ad9a-3994f33e49ef",
  "portal_url": "https://portal.hanzoarchives.com/exports/880e095b-d23d-436a-ad9a-3994f33e49ef",
  "exported_at": null,
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Response schema

name string
uuid string
crawl uuid
crawl_details crawl (read only)
organization code
status string (optional)
type choice (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
attachments_count int (read only)
attachments_size int (read only)
exported_at datetime (optional)	The date that the exporter started exporting
created_at datetime (read only)
updated_at datetime (read only)

Retrieve an export

GET /api/exports/{uuid}

Path params

uuid	string

Response messages

401	Not authenticated

Response examples

Retrieve an export

{
  "name": "ESIV-10",
  "uuid": "bcbfe519-15a0-4106-882b-ca4ca6064b2b",
  "crawl": "f7698519-fa24-4dff-92e0-a43a98017882",
  "crawl_details": {
    "name": "-example-archive-202507020350",
    "uuid": "f7698519-fa24-4dff-92e0-a43a98017882",
    "archive_unit": "EXAMPLE/Example Archive",
    "organization": "EXAMPLE",
    "status": "requested:user",
    "auth": null,
    "entry_points": [
      {
        "uri": "http://example.com"
      }
    ],
    "driver": null,
    "seeds": [
      "http://example.com"
    ],
    "crawldata": null,
    "metadata": null,
    "aggregate": false,
    "draft": false,
    "partial": false,
    "components": [],
    "plugin": {
      "requires_qa": false,
      "name": "Website",
      "sha": null,
      "requires_eng": false,
      "scope": {
        "name": "Default",
        "slug": "default"
      },
      "slug": "website"
    },
    "settings": {
      "warcloader_ingestor_image": "icr.io/chronicle-prod/warcloader/ingestor:1e2b71cd",
      "page_handler_image": "icr.io/chronicle-prod/hanzo-page-handler:5b3ce3b6",
      "single_crawl_per_crawldb": "True",
      "customer_code": "EXAMPLE",
      "db_instance_type": "t3.micro",
      "warcloader_postgres_image": "icr.io/chronicle-prod/warcloader/postgres:1e2b71cd",
      "restrict_domains": "on",
      "job_server_security_groups": "prod/jobservers",
      "db_security_groups": "prod/frontiers",
      "reset_errors": "on",
      "ntp_server": "169.254.169.123",
      "task_log_queue": "s3://hanzo.software/task_queues",
      "portal_api_endpoint": "https://portal.hanzoarchives.com/api/",
      "thomas_image": "icr.io/chronicle-prod/thomas-the-crawl-engine:f0a71682",
      "include_referered": "off",
      "warcloader_timeout": "1200",
      "mitmproxy_image": "icr.io/chronicle-prod/hanzo-mitmproxy:94aaf1c3",
      "proxy_image": "icr.io/chronicle-prod/hanzo-qt-warcproxy:a274c6db",
      "entrypoint": [
        "http://example.com"
      ],
      "frontier_image": "icr.io/chronicle-prod/miyamoto/frontier:5a284946",
      "warcloader_url": "http://warcloader.inf.hanzoman.com:1647",
      "job_server_instance_type": "t3.large",
      "max_depth": "2",
      "extract_handler_image": "icr.io/chronicle-prod/hanzo-extract-handler:58d8ca9a",
      "instance_manager_url": "http://instance-manager.inf.hanzoman.com:1666/",
      "tags": [],
      "frontier_db_image": "icr.io/chronicle-prod/miyamoto/frontier-db:4ec4b271",
      "max_workers": "5",
      "final_frontier_image": "icr.io/chronicle-prod/final-frontier:dde45433",
      "crawl_id": "f7698519-fa24-4dff-92e0-a43a98017882",
      "chromium_image": "icr.io/chronicle-prod/chrome:107.0.5304.87-1_fonts",
      "job_server_ami": "ami-0d8f8c6131ba8d0b0",
      "arkwright_image": "icr.io/chronicle-prod/mr-arkwright:ibm-cloud___20241209_120232",
      "warcloader_queue_server_image": "icr.io/chronicle-prod/warcloader/queue_server:1e2b71cd",
      "manifest_path": "s3://hanzo.manifests/",
      "global_setup": "off",
      "crawl": "-example-archive-202507020350",
      "customer": "",
      "warcloader_aggregator_image": "icr.io/chronicle-prod/warcloader/aggregator:1e2b71cd",
      "snapshot_video_use_proxy": "no",
      "seeds": [
        "http://example.com"
      ],
      "output_path": "s3://hanzoenterprise/RaC/",
      "au": "Example Archive",
      "capture_scope": "default",
      "job_server_subnets": "prod/semi-private/*",
      "db_ami": "ami-0875fac396b63e4e1",
      "warcloader_crons_image": "icr.io/chronicle-prod/warcloader/crons:1e2b71cd",
      "db_subnets": "prod/private/*"
    },
    "processing": 12,
    "remaining": 243,
    "captured": 145,
    "errored": 15,
    "excluded": 214,
    "url": "https://portal.hanzoarchives.com/api/crawls/f7698519-fa24-4dff-92e0-a43a98017882",
    "attachments_url": "https://portal.hanzoarchives.com/api/crawls/f7698519-fa24-4dff-92e0-a43a98017882/attachments",
    "portal": "https://portal.hanzoarchives.com/captures/example-archive/f7698519-fa24-4dff-92e0-a43a98017882",
    "portal_url": "https://portal.hanzoarchives.com/captures/example-archive/f7698519-fa24-4dff-92e0-a43a98017882",
    "captured_at": null,
    "capture_last_active_at": null,
    "first_completed_at": null,
    "last_completed_at": null,
    "storage_state": "hot",
    "nearline_storage_after_date": null,
    "retrieved_until": null,
    "nearline_storage_metadata": null,
    "created_at": "2025-07-02T03:50:34Z",
    "updated_at": "2025-07-02T03:50:34Z"
  },
  "organization": "EXAMPLE",
  "status": "exporting",
  "type": "load_file",
  "credentials": null,
  "url": "https://portal.hanzoarchives.com/api/exports/bcbfe519-15a0-4106-882b-ca4ca6064b2b",
  "attachments_url": "https://portal.hanzoarchives.com/api/exports/bcbfe519-15a0-4106-882b-ca4ca6064b2b/attachments",
  "portal": "https://portal.hanzoarchives.com/exports/esiv-10",
  "portal_url": "https://portal.hanzoarchives.com/exports/esiv-10",
  "exported_at": "2025-07-02T03:50:34Z",
  "created_at": "2025-07-02T03:50:34Z",
  "updated_at": "2025-07-02T03:50:34Z"
}

Response schema

name string
uuid string
crawl uuid
crawl_details crawl (read only)
organization code
status string (optional)
type choice (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
attachments_count int (read only)
attachments_size int (read only)
exported_at datetime (optional)	The date that the exporter started exporting
created_at datetime (read only)
updated_at datetime (read only)

Partially update an export

PATCH /api/exports/{uuid}

Path params

uuid	string

Form params

name	string	(optional)
uuid	string	(optional)
crawl	uuid	(optional)
organization	code	(optional)
status	string	(optional)
type	choice	The type of export, a `load_file` export is a standardised concordance load file (choices: ['load_file']) (optional)
credentials	object	Any credentials required to open the export (optional)
exported_at	datetime	The date that the exporter started exporting (optional)

Response messages

200	The export was updated
400	Uris must be an array of strings or null
400	Uris must be of the file
400	Requested exports must not have an exported at date
400	Complete exports must have at least one uri
400	Complete exports must have an exported at date
401	Not authenticated
412	The export has been updated since you last retrieved it

Request schema

name string (optional)
uuid string (optional)
crawl uuid (optional)
organization code (optional)
status string (optional)
type choice (optional) (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
exported_at datetime (optional)	The date that the exporter started exporting

Response schema

name string
uuid string
crawl uuid
crawl_details crawl (read only)
organization code
status string (optional)
type choice (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
attachments_count int (read only)
attachments_size int (read only)
exported_at datetime (optional)	The date that the exporter started exporting
created_at datetime (read only)
updated_at datetime (read only)
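
The 412 response above is part of the optimistic concurrency flow described under "Concurrency control": retrieve, patch with If-Match, and on 412 retrieve again and retry. The sketch below shows that loop with the HTTP calls abstracted away. `get` and `patch` are hypothetical callables standing in for real requests, and `FakeExport` is an in-memory stand-in used only to make the example runnable.

```python
def patch_with_retry(get, patch, changes, max_attempts=3):
    """Apply `changes` via PATCH, re-reading the ETag whenever the answer is 412.

    `get()` must return (etag, resource) and `patch(changes, etag)` must return
    (status_code, resource); both are hypothetical wrappers around HTTP calls.
    """
    for _ in range(max_attempts):
        etag, _current = get()
        status, updated = patch(changes, etag)
        if status == 200:
            return updated
        if status != 412:
            raise RuntimeError(f"unexpected status {status}")
        # 412: the export changed since it was retrieved; loop to get a newer ETag.
    raise RuntimeError("gave up after repeated 412 responses")


class FakeExport:
    """In-memory stand-in for the API, used only to demonstrate the flow."""

    def __init__(self):
        self.version = 1
        self.data = {"name": "ESIV-1", "status": "requested:user"}

    def get(self):
        return f'"{self.version}"', dict(self.data)

    def patch(self, changes, etag):
        if etag != f'"{self.version}"':
            return 412, None
        self.data.update(changes)
        self.version += 1
        return 200, dict(self.data)


server = FakeExport()
get_calls = []


def racy_get():
    """Retrieve the export; after the first read, simulate another client's update."""
    etag, data = server.get()
    if not get_calls:
        server.version += 1
    get_calls.append(etag)
    return etag, data


result = patch_with_retry(racy_get, server.patch, {"status": "exporting"})
```

In this run the first PATCH is rejected with 412 because the version changed after the first retrieval, and the second attempt succeeds with the newer ETag.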

Update an export

PUT /api/exports/{uuid}

Path params

uuid	string

Form params

name	string
uuid	string
crawl	uuid
organization	code
status	string	(optional)
type	choice	The type of export, a `load_file` export is a standardised concordance load file (choices: ['load_file'])
credentials	object	Any credentials required to open the export (optional)
exported_at	datetime	The date that the exporter started exporting (optional)

Response messages

200	The export was updated
400	Uris must be an array of strings or null
400	Uris must be of the file
400	Requested exports must not have an exported at date
400	Complete exports must have at least one uri
400	Complete exports must have an exported at date
401	Not authenticated
412	The export has been updated since you last retrieved it

Request schema

name string
uuid string
crawl uuid
organization code
status string (optional)
type choice (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
exported_at datetime (optional)	The date that the exporter started exporting

Response schema

name string
uuid string
crawl uuid
crawl_details crawl (read only)
organization code
status string (optional)
type choice (choices: ['load_file'])	The type of export, a `load_file` export is a standardised concordance load file
credentials object (optional)	Any credentials required to open the export
url string (read only)
attachments_url string (read only)
portal string (deprecated) (read only)
portal_url string (read only)
attachments_count int (read only)
attachments_size int (read only)
exported_at datetime (optional)	The date that the exporter started exporting
created_at datetime (read only)
updated_at datetime (read only)

Delete an export

DELETE /api/exports/{uuid}

Path params

uuid	string

Response messages

204	The export was deleted
401	Not authenticated

List export attachments

GET /api/exports/{export_uuid}/attachments

Path params

export_uuid	string

Query params

name	string
type	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

Create an export attachment

POST /api/exports/{export_uuid}/attachments

Path params

export_uuid	string

Form params

name	string
description	string	(optional)
uri	string	A file or s3 scheme URI to the file
size	int	The size of the file in bytes
type	string	The mimetype of the file
credentials	object	Any credentials required to open the attachment (optional)
metadata	object	Additional user metadata store (optional)

Response messages

201	The attachment was created
401	Not authenticated

Request schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)
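
The request schema above asks for a URI, a size in bytes and a mimetype. For a local file these can be derived with the standard library, as in the sketch below. The function name and the fallback to `application/octet-stream` are choices made for this example; whether the API can reach a given `file://` URI depends on the deployment and is not covered here.

```python
import mimetypes
import tempfile
from pathlib import Path


def attachment_payload(path, description=None):
    """Describe a local file using the attachment request schema fields."""
    file_path = Path(path).resolve()
    mime_type, _encoding = mimetypes.guess_type(file_path.name)
    payload = {
        "name": file_path.name,
        "uri": file_path.as_uri(),  # file scheme URI
        "size": file_path.stat().st_size,  # size in bytes
        "type": mime_type or "application/octet-stream",
    }
    if description is not None:
        payload["description"] = description
    return payload


# Demonstration against a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "load.txt"
    sample.write_bytes(b"a,b\n1,2\n")
    payload = attachment_payload(sample, description="demo load file")
```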

Retrieve an export attachment

GET /api/exports/{export_uuid}/attachments/{name}

Path params

export_uuid	string
name	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

Partially update an export attachment

PATCH /api/exports/{export_uuid}/attachments/{name}

Path params

export_uuid	string
name	string

Form params

name	string	(optional)
description	string	(optional)
uri	string	A file or s3 scheme URI to the file (optional)
size	int	The size of the file in bytes (optional)
type	string	The mimetype of the file (optional)
credentials	object	Any credentials required to open the attachment (optional)
metadata	object	Additional user metadata store (optional)

Response messages

200	The attachment was updated
401	Not authenticated
412	The attachment has been updated since you last retrieved it

Request schema

name string (optional)
description string (optional)
uri string (optional)	A file or s3 scheme URI to the file
size int (optional)	The size of the file in bytes
type string (optional)	The mimetype of the file
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

Update an export attachment

PUT /api/exports/{export_uuid}/attachments/{name}

Path params

export_uuid	string
name	string

Form params

name	string
description	string	(optional)
uri	string	A file or s3 scheme URI to the file
size	int	The size of the file in bytes
type	string	The mimetype of the file
credentials	object	Any credentials required to open the attachment (optional)
metadata	object	Additional user metadata store (optional)

Response messages

200	The attachment was updated
401	Not authenticated
412	The attachment has been updated since you last retrieved it

Request schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

Delete an export attachment

DELETE /api/exports/{export_uuid}/attachments/{name}

Path params

export_uuid	string
name	string

Response messages

204	The attachment was deleted
401	Not authenticated

Download an export attachment

GET /api/exports/{export_uuid}/attachments/{name}/download

Path params

export_uuid	string
name	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
uri string	A file or s3 scheme URI to the file
size int	The size of the file in bytes
type string	The mimetype of the file
url string (read only)
credentials object (optional)	Any credentials required to open the attachment
metadata object (optional)	Additional user metadata store
created_at datetime (read only)
updated_at datetime (read only)

List folders

GET /api/folders

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
slug string
uuid string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)

Retrieve a folder

GET /api/folders/{uuid}

Path params

uuid	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
slug string
uuid string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)

Partially update a folder

PATCH /api/folders/{uuid}

Path params

uuid	string

Form params

name	string	(optional)
description	string	(optional)
slug	string	(optional)
uuid	string	(optional)

Response messages

200	The folder was updated
401	Not authenticated
412	The folder has been updated since you last retrieved it

Request schema

name string (optional)
description string (optional)
slug string (optional)
uuid string (optional)

Response schema

name string
description string (optional)
slug string
uuid string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)

List investigations

GET /api/investigations

Query params

uuid	string
job_id	string
organization	code
status	enum	(choices: ['matching', 'matched', 'crawling', 'complete', 'error'])
updated_before	datetime
updated_after	datetime
created_before	datetime
created_after	datetime

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
status choice (choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug string
uuid string (optional)
job_id string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)
teams array (read only)
is_archived boolean (optional)
s3_bucket string (read only)
profile_archive_units string (read only)
search_archive_units string (read only)
extra_information object (read only)
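
The investigation list accepts a status filter and several datetime filters. The sketch below builds a query string for them. The function names are hypothetical, only three of the documented filters are covered, and formatting datetimes as UTC in the style of the sample responses (for example `2025-07-02T03:50:34Z`) is an assumption about what the filters accept.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# The status choices listed for investigations.
STATUSES = ("matching", "matched", "crawling", "complete", "error")


def to_api_datetime(value):
    """Format a timezone-aware datetime as UTC, in the style of the samples."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def investigations_query(status=None, updated_after=None, created_before=None):
    """Build the query string for GET /api/investigations."""
    params = {}
    if status is not None:
        if status not in STATUSES:
            raise ValueError(f"status must be one of {STATUSES}")
        params["status"] = status
    if updated_after is not None:
        params["updated_after"] = to_api_datetime(updated_after)
    if created_before is not None:
        params["created_before"] = to_api_datetime(created_before)
    return urlencode(params)


query = investigations_query(
    status="complete",
    updated_after=datetime(2025, 7, 2, 3, 50, 34, tzinfo=timezone.utc),
)
```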

Create an investigation

POST /api/investigations

Form params

name	string
description	string	(optional)
status	choice	(choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug	string
uuid	string	(optional)
job_id	string	(optional)
is_archived	boolean	(optional)

Response messages

201	The investigation was created
401	Not authenticated

Request schema

name string
description string (optional)
status choice (optional) (choices: ['matching', 'matched', 'crawling', 'complete', 'error'])
slug string
uuid string (optional)
job_id string (optional)
is_archived boolean (optional)

Response schema

name string
description string (optional)
status choice (choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug string
uuid string (optional)
job_id string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)
teams array (read only)
is_archived boolean (optional)
s3_bucket string (read only)
profile_archive_units string (read only)
search_archive_units string (read only)
extra_information object (read only)

Retrieve an investigation

GET /api/investigations/{uuid}

Path params

uuid	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
status choice (choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug string
uuid string (optional)
job_id string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)
teams array (read only)
is_archived boolean (optional)
s3_bucket string (read only)
profile_archive_units string (read only)
search_archive_units string (read only)
extra_information object (read only)

Add a search crawl to an investigation

POST /api/investigations/{uuid}/add_search_crawl

Adds a related search crawl to the investigation.

Path params

uuid	string

Form params

name	string
description	string	(optional)
status	choice	(choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug	string
uuid	string	(optional)
job_id	string	(optional)
is_archived	boolean	(optional)

Response messages

201	The investigation was created
401	Not authenticated

Request schema

name string
description string (optional)
status choice (optional) (choices: ['matching', 'matched', 'crawling', 'complete', 'error'])
slug string
uuid string (optional)
job_id string (optional)
is_archived boolean (optional)

Response schema

name string
description string (optional)
status choice (choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug string
uuid string (optional)
job_id string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)
teams array (read only)
is_archived boolean (optional)
s3_bucket string (read only)
profile_archive_units string (read only)
search_archive_units string (read only)
extra_information object (read only)

List matched profiles for an investigation

GET /api/investigations/{uuid}/matched_profiles

Path params

uuid	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
status choice (choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug string
uuid string (optional)
job_id string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)
teams array (read only)
is_archived boolean (optional)
s3_bucket string (read only)
profile_archive_units string (read only)
search_archive_units string (read only)
extra_information object (read only)

List relevant crawls for an investigation

GET /api/investigations/{uuid}/relevant_crawls

Path params

uuid	string

Response messages

401	Not authenticated

Response schema

name string
description string (optional)
status choice (choices: ['matching', 'matched', 'crawling', 'complete', 'error']) (optional)
slug string
uuid string (optional)
job_id string (optional)
url string (read only)
portal_url string (read only)
created_by user (read only)
created_at datetime (read only)
updated_at datetime (read only)
teams array (read only)
is_archived boolean (optional)
s3_bucket string (read only)
profile_archive_units string (read only)
search_archive_units string (read only)
extra_information object (read only)

List organizations

GET /api/organizations

Query params

name	string
slug	string
code	string
jira_reference	string

Response messages

401	Not authenticated

Response schema

name string
slug string (read only)
code string
archive_units_count int (read only)
teams_count int (read only)
users_count int (read only)
has_captures boolean (optional)
has_change boolean (optional)
has_search boolean (optional)
url string (read only)
logo_url string (optional)
portal_url string (read only)
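
List endpoints such as this one are paginated, so collecting every organization means walking the `?page` parameter. The sketch below shows one way to do that. `fetch_page` is a hypothetical callable that performs the request and returns the decoded list of items, and treating a short or empty page as the end of the results is an assumption; the real API may signal the last page differently.

```python
def iter_pages(fetch_page, per_page=100):
    """Yield every item from a paginated list endpoint.

    `fetch_page(page, per_page)` is a hypothetical callable that performs the
    GET with ?page and ?per_page and returns the decoded list of items.
    """
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:
            break  # assumption: a short or empty page means there are no more
        page += 1


# Demonstration with an in-memory list standing in for GET /api/organizations.
ORGANIZATIONS = [{"code": f"ORG{i}"} for i in range(250)]


def fake_fetch(page, per_page):
    """Return one slice of the in-memory list, as a real page request might."""
    start = (page - 1) * per_page
    return ORGANIZATIONS[start:start + per_page]


codes = [org["code"] for org in iter_pages(fake_fetch)]
```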

Create an organization

POST /api/organizations

Form params

name	string
code	string
has_captures	boolean	(optional)
has_change	boolean	(optional)
has_search	boolean	(optional)
logo_url	string	(optional)

Response messages

201	The organization was created
401	Not authenticated

Request schema

name string
code string
has_captures boolean (optional)
has_change boolean (optional)
has_search boolean (optional)
logo_url string (optional)

Response schema

name string
slug string (read only)
code string
archive_units_count int (read only)
teams_count int (read only)
users_count int (read only)
has_captures boolean (optional)
has_change boolean (optional)
has_search boolean (optional)
url string (read only)
logo_url string (optional)
portal_url string (read only)
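
A minimal sketch of preparing this request with Basic Auth, using only the Python standard library. It assumes the form params are sent URL-encoded and that booleans are written as "true"/"false"; neither is confirmed by this page. The host, credentials and the helper name build_create_organization_request are placeholders.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

OPTIONAL_FIELDS = {"has_captures", "has_change", "has_search", "logo_url"}


def build_create_organization_request(base_url, username, password, name, code, **optional):
    """Prepare (but do not send) a POST /api/organizations request."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError("unsupported field(s): %s" % ", ".join(sorted(unknown)))
    fields = {"name": name, "code": code}
    for key, value in sorted(optional.items()):
        # Assumption: booleans are sent as the strings "true"/"false".
        fields[key] = str(value).lower() if isinstance(value, bool) else value
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        base_url.rstrip("/") + "/api/organizations",
        data=urlencode(fields).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder host and credentials, for illustration only.
request = build_create_organization_request(
    "https://portal.example.com", "service-user", "secret",
    name="Acme", code="ACME", has_search=True,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared object to urllib.request.urlopen would send it; a 201 response should then carry the fields listed in the response schema above.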

Retrieve an organization

GET /api/organizations/{slug}

Path params

slug	string

Response messages

401	Not authenticated

Response schema

name string
slug string (read only)
code string
archive_units_count int (read only)
teams_count int (read only)
users_count int (read only)
has_captures boolean (optional)
has_change boolean (optional)
has_search boolean (optional)
url string (read only)
logo_url string (optional)
portal_url string (read only)

Create core for an organization

POST /api/organizations/{slug}/create-core

Path params

slug	string

Form params

name	string
code	string
has_captures	boolean	(optional)
has_change	boolean	(optional)
has_search	boolean	(optional)
logo_url	string	(optional)

Response messages

201	The organization was created
401	Not authenticated

Request schema

name string
code string
has_captures boolean (optional)
has_change boolean (optional)
has_search boolean (optional)
logo_url string (optional)

Response schema

name string
slug string (read only)
code string
archive_units_count int (read only)
teams_count int (read only)
users_count int (read only)
has_captures boolean (optional)
has_change boolean (optional)
has_search boolean (optional)
url string (read only)
logo_url string (optional)
portal_url string (read only)

List retrievals

GET /api/retrievals

Response messages

401	Not authenticated

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)

Create a retrieval

POST /api/retrievals

Form params

crawl	uuid
duration_days	int
error	string	(optional)

Response messages

201	The retrieval was created
401	Not authenticated

Request schema

crawl uuid
duration_days int
error string (optional)

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)
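
The two required fields can be checked on the client before the request is sent. The sketch below is one possible way to do that; the helper name retrieval_payload, the sample uuid and the rule that duration_days must be at least 1 are assumptions made for this example, not documented constraints.

```python
import uuid


def retrieval_payload(crawl, duration_days):
    """Validate and build the form fields for POST /api/retrievals (hypothetical helper)."""
    # Crawls are referenced by uuid (see Identifiers); uuid.UUID raises
    # ValueError on a malformed value and str() normalises it to lower case.
    crawl_uuid = str(uuid.UUID(str(crawl)))
    if isinstance(duration_days, bool) or not isinstance(duration_days, int):
        raise TypeError("duration_days must be an int")
    if duration_days < 1:
        # Assumption: a retrieval has to last at least one day.
        raise ValueError("duration_days must be at least 1")
    return {"crawl": crawl_uuid, "duration_days": duration_days}


# Made-up crawl uuid, for illustration only.
print(retrieval_payload("12345678-1234-5678-1234-567812345678", 30))
```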

Retrieve a retrieval

GET /api/retrievals/{uuid}

Path params

uuid	string

Response messages

401	Not authenticated

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)
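
For clients that prefer typed objects over raw dictionaries, the response schema above maps naturally onto a small data class. This is a sketch under assumptions: it supposes the body is JSON, that datetimes arrive as ISO 8601 strings with a numeric UTC offset, and that unset dates are null. The sample body is made up for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def _parse_dt(value):
    # Assumption: datetimes are ISO 8601 strings; null means "not set yet".
    return datetime.fromisoformat(value) if value else None


@dataclass
class Retrieval:
    uuid: str
    crawl: str
    initiated_date: Optional[datetime]
    completed_date: Optional[datetime]
    until_date: Optional[datetime]
    duration_days: int
    initiated_by: str
    error: Optional[str] = None

    @classmethod
    def from_json(cls, text):
        """Build a Retrieval from a JSON response body."""
        data = json.loads(text)
        return cls(
            uuid=data["uuid"],
            crawl=data["crawl"],
            initiated_date=_parse_dt(data.get("initiated_date")),
            completed_date=_parse_dt(data.get("completed_date")),
            until_date=_parse_dt(data.get("until_date")),
            duration_days=int(data["duration_days"]),
            initiated_by=data["initiated_by"],
            error=data.get("error") or None,
        )


# Made-up response body, for illustration only.
sample = json.dumps({
    "uuid": "0f0e0d0c-0b0a-4908-8706-050403020100",
    "crawl": "12345678-1234-5678-1234-567812345678",
    "initiated_date": "2020-01-01T12:00:00+00:00",
    "completed_date": None,
    "until_date": None,
    "duration_days": 30,
    "initiated_by": "service-user",
    "error": None,
})
retrieval = Retrieval.from_json(sample)
print(retrieval.duration_days, retrieval.completed_date is None)
```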

Partial update a retrieval

PATCH /api/retrievals/{uuid}

Path params

uuid	string

Form params

crawl	uuid	(optional)
duration_days	int	(optional)
error	string	(optional)

Response messages

200	The retrieval was updated
401	Not authenticated
412	The retrieval has been updated since you last retrieved it

Request schema

crawl uuid (optional)
duration_days int (optional)
error string (optional)

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)
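
The 412 response ties into the flow described under Concurrency control: read the resource to obtain its ETag, send the PATCH with an If-Match header, and on 412 read again and retry. The sketch below illustrates that loop against an in-memory stand-in rather than the real API. The send callable, the FakeServer class and the helper name patch_with_retry are all hypothetical, and the path keeps the {uuid} placeholder from the endpoint definition.

```python
def patch_with_retry(send, path, changes, max_attempts=3):
    """PATCH `path`, re-reading the ETag and retrying when the server answers 412.

    `send(method, path, headers, body)` is a hypothetical transport callable
    that returns a `(status, headers, body)` tuple.
    """
    for _ in range(max_attempts):
        status, headers, _body = send("GET", path, {}, None)
        if status != 200:
            raise RuntimeError("GET failed with status %d" % status)
        status, headers, body = send("PATCH", path, {"If-Match": headers["ETag"]}, changes)
        if status == 200:
            return body
        if status != 412:
            raise RuntimeError("PATCH failed with status %d" % status)
        # 412: the resource changed since we read it; loop to fetch a fresh ETag.
    raise RuntimeError("gave up after %d attempts" % max_attempts)


class FakeServer:
    """In-memory stand-in for the API, used only to exercise the helper."""

    def __init__(self):
        self.resource = {"duration_days": 7}
        self.version = 1
        self.interfere_once = True  # simulate one concurrent update

    def send(self, method, path, headers, body):
        etag = '"v%d"' % self.version
        if method == "GET":
            response = (200, {"ETag": etag}, dict(self.resource))
            if self.interfere_once:
                # Another client writes right after our read.
                self.version += 1
                self.interfere_once = False
            return response
        if headers.get("If-Match") != etag:
            return (412, {}, None)
        self.resource.update(body)
        self.version += 1
        return (200, {"ETag": '"v%d"' % self.version}, dict(self.resource))


server = FakeServer()
result = patch_with_retry(server.send, "/api/retrievals/{uuid}", {"duration_days": 14})
print(result)
```

Capping the number of attempts keeps a busy resource from causing an endless retry loop; the limit of three is arbitrary.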

Update a retrieval

PUT /api/retrievals/{uuid}

Path params

uuid	string

Form params

crawl	uuid
duration_days	int
error	string	(optional)

Response messages

200	The retrieval was updated
401	Not authenticated
412	The retrieval has been updated since you last retrieved it

Request schema

crawl uuid
duration_days int
error string (optional)

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)

Delete a retrieval

DELETE /api/retrievals/{uuid}

Path params

uuid	string

Response messages

204	The retrieval was deleted
401	Not authenticated

Complete a retrieval

POST /api/retrievals/{uuid}/complete

Path params

uuid	string

Form params

crawl	uuid
duration_days	int
error	string	(optional)

Response messages

201	The retrieval was created
401	Not authenticated

Request schema

crawl uuid
duration_days int
error string (optional)

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)

Mark a retrieval as errored

POST /api/retrievals/{uuid}/error

Path params

uuid	string

Form params

crawl	uuid
duration_days	int
error	string	(optional)

Response messages

201	The retrieval was created
401	Not authenticated

Request schema

crawl uuid
duration_days int
error string (optional)

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)

Notify backend of a retrieval

POST /api/retrievals/{uuid}/notify-backend

Path params

uuid	string

Form params

crawl	uuid
duration_days	int
error	string	(optional)

Response messages

201	The retrieval was created
401	Not authenticated

Request schema

crawl uuid
duration_days int
error string (optional)

Response schema

uuid string (read only)
crawl uuid
initiated_date datetime (read only)
completed_date datetime (read only)
until_date datetime (read only)
duration_days int
initiated_by string (read only)
error string (optional)

Connection

POST /api/viewer/connection

Response messages

201	The object was created
401	Not authenticated
