downloader

Download manager for asynchronous parallel downloading.

Functions

download(url, to_dir, *[, stem_pattern, ...])

Download a file synchronously.

download_async(url, to_dir, *[, ...])

Download a file asynchronously.

download_parallel(items, *[, ...])

Download multiple files in parallel.

download_parallel_aiter(items, *[, ...])

Download multiple files in parallel, return asynchronous iterator.

validate_stem_pattern(stem_pattern)

Validate parameter stem_pattern of module functions.

Classes

DownloadResult

Dataclass for result information from a finished download.

DownloadSpecs

Dataclass of parameter values for downloader.download_async().

Exceptions

CorruptDownloadError

SHA-256 checksum does not match expected value from API.

xbrl_filings_api.downloader.download(url, to_dir, *, stem_pattern=None, filename=None, sha256=None, timeout=30.0)

Download a file synchronously.

See documentation of download_async.

Parameters:
Return type:

str

async xbrl_filings_api.downloader.download_async(url, to_dir, *, stem_pattern=None, filename=None, sha256=None, timeout=30.0)

Download a file asynchronously.

The directories in parameter to_dir are created if they do not exist. If no filename is given, the name is derived from parameter url. If the file already exists, it is overwritten.

If sha256 does not match the checksum of the downloaded file, xbrl_filings_api.downloader.exceptions.CorruptDownloadError is raised and the suffix ".corrupt" is appended to the name of the downloaded file.

If the download is interrupted, the file is left with the suffix ".unfinished".

If no name could be derived from url, the file will be named file0001, file0002, etc. In this case a new file is always created.
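The naming rules above can be sketched with the standard library alone. The helper derive_filename below is hypothetical and not part of xbrl_filings_api. It assumes that the name is taken from the last segment of the URL path and that the fallback counter picks the first unused fileNNNN name in the target directory; the package may derive names differently.

```python
import posixpath
from pathlib import Path
from urllib.parse import unquote, urlsplit


def derive_filename(url: str, to_dir: Path) -> str:
    """Hypothetical helper: choose a file name for `url` inside `to_dir`."""
    # Use the last segment of the URL path, ignoring query and fragment.
    name = posixpath.basename(unquote(urlsplit(url).path))
    if name:
        return name
    # No usable name in the URL: fall back to file0001, file0002, ...
    # and skip names that already exist so that a new file is created.
    counter = 1
    while (to_dir / f"file{counter:04d}").exists():
        counter += 1
    return f"file{counter:04d}"
```

In this sketch a URL such as "https://example.com/" has an empty last path segment, so the numbered fallback is used.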

Parameters:
  • url (str) – URL to download.

  • to_dir (path-like) – Directory to save the file.

  • stem_pattern (str, optional) – Pattern to add to the filename stems. Placeholder "/name/" is always required.

  • filename (str, optional) – Name to be used for the saved file.

  • sha256 (str, optional) – Expected SHA-256 checksum as a hex string. Case-insensitive. No checksum is calculated if this parameter is not given.

  • timeout (float, default 30.0) – Maximum time in seconds to wait for an initial response from the server.
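The case-insensitive checksum comparison and the ".corrupt" suffix described above could look roughly like the following. Both helpers, sha256_matches and mark_corrupt, are hypothetical names used only for illustration; this is a sketch of the documented behaviour, not the package's own code.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str) -> bool:
    """Hypothetical helper: compare a file's SHA-256 digest with a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read in chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: file.read(64 * 1024), b""):
            digest.update(chunk)
    # hexdigest() is lowercase, so lowercase the expected value before comparing.
    return digest.hexdigest() == expected_hex.lower()


def mark_corrupt(path: Path) -> Path:
    """Hypothetical helper: rename a file by appending the suffix ".corrupt"."""
    return path.rename(path.with_name(path.name + ".corrupt"))
```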

Returns:

Local path where the downloaded file was saved.

Return type:

str

Raises:

xbrl_filings_api.downloader.download_parallel(items, *, max_concurrent=None, timeout=30.0)

Download multiple files in parallel.

The returned list is not guaranteed to follow the order of parameter items.

See documentation of download_parallel_aiter.

Parameters:
Returns:

Contains information on the finished download.

Return type:

list of DownloadResult

async xbrl_filings_api.downloader.download_parallel_aiter(items, *, max_concurrent=None, timeout=30.0)

Download multiple files in parallel, return asynchronous iterator.

The order of parameter items defines the order in which the requests are started. Because downloads take arbitrary amounts of time to finish, the results are not necessarily yielded in the same order. To keep track of individual downloads, both DownloadSpecs and DownloadResult provide an additional attribute info, which accepts a value of any type.

Yielded DownloadResult objects have no path attribute value when the sha256 check fails, even though the file is in fact saved with the filename suffix ".corrupt".

Calls function download_async via parameter items.

Parameters:
  • items (list of DownloadSpecs) – Instances of DownloadSpecs accept the same parameters as function download_async with an additional no-op attribute info.

  • max_concurrent (int or None, default None) – Maximum number of simultaneous downloads allowed at any moment. If None, all downloads will be started immediately. If 1, downloading will be sequential.

  • timeout (float, default 30.0) – Maximum time in seconds to wait for the initial response from the server for a single download.

Yields:

DownloadResult – Contains information on the finished download.

Return type:

AsyncIterator[DownloadResult]
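The interplay between start order, completion order and max_concurrent can be illustrated with plain asyncio. The generator run_bounded and its worker argument are hypothetical stand-ins, and using asyncio.Semaphore with asyncio.as_completed is an assumption about one possible approach, not a description of how the package is implemented.

```python
import asyncio
from typing import Optional


async def run_bounded(items, worker, max_concurrent: Optional[int] = None):
    """Start `worker(item)` for each item and yield results as they finish."""
    # With no limit every job starts immediately; otherwise a semaphore
    # caps how many workers run at the same moment.
    semaphore = (
        asyncio.Semaphore(max_concurrent) if max_concurrent is not None else None
    )

    async def guarded(item):
        if semaphore is None:
            return await worker(item)
        async with semaphore:
            return await worker(item)

    # Tasks are created in the order of `items`, so jobs start in that order.
    tasks = [asyncio.create_task(guarded(item)) for item in items]
    # Results arrive in completion order, which may differ from `items`.
    for next_done in asyncio.as_completed(tasks):
        yield await next_done
```

With a limit of 1 the jobs run one after another in the given order; without a limit a short job can overtake a long one that was started earlier.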

xbrl_filings_api.downloader.validate_stem_pattern(stem_pattern)

Validate parameter stem_pattern of module functions.

Parameters:

stem_pattern (str or None) – Stem pattern parameter.

Raises:

ValueError – When stem pattern is invalid.
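As an illustration of the "/name/" placeholder, the sketch below checks a pattern and applies it to a file name. The functions check_stem_pattern and apply_stem_pattern are hypothetical, and the assumption that the placeholder is replaced by the original stem while the extension is kept is an interpretation of the parameter description; the real validation may enforce additional rules.

```python
from pathlib import PurePath
from typing import Optional

PLACEHOLDER = "/name/"


def check_stem_pattern(stem_pattern: Optional[str]) -> None:
    """Hypothetical stand-in: reject patterns that lack the placeholder."""
    if stem_pattern is not None and PLACEHOLDER not in stem_pattern:
        raise ValueError(
            f"stem_pattern must contain the placeholder {PLACEHOLDER!r}"
        )


def apply_stem_pattern(filename: str, stem_pattern: Optional[str]) -> str:
    """Hypothetical helper: insert the original stem into the pattern."""
    check_stem_pattern(stem_pattern)
    if stem_pattern is None:
        return filename
    name = PurePath(filename)
    # Assumed behaviour: only the stem changes, the extension is preserved.
    return stem_pattern.replace(PLACEHOLDER, name.stem) + name.suffix
```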

class xbrl_filings_api.downloader.DownloadResult

Bases: object

Dataclass for result information from a finished download.

url: str

URL that was downloaded, or whose download was attempted.

path: str | None = None

Path where the downloaded file was saved.

__hash__()

Return hash(self).

__repr__()

Return repr(self).

err: Exception | None = None

Exception raised while the file was being downloaded.

info: Any = None

Value of DownloadSpecs.info for parallel downloads.
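One way to consume these attributes is to group results by their info value and separate successes from failures. The dataclass ResultStandIn below only mirrors the attributes documented here so that the sketch runs without the package, and split_results is a hypothetical helper. It assumes that info values are hashable.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultStandIn:
    """Stand-in mirroring the documented DownloadResult attributes."""

    url: str
    path: Optional[str] = None
    err: Optional[Exception] = None
    info: Any = None


def split_results(results):
    """Group results by `info` into saved paths and failures."""
    saved, failed = {}, {}
    for result in results:
        if result.err is None:
            saved[result.info] = result.path
        else:
            # After a checksum failure `path` is unset on the result; the
            # documented CorruptDownloadError carries the saved path instead.
            failed[result.info] = (result.err, getattr(result.err, "path", None))
    return saved, failed
```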

class xbrl_filings_api.downloader.DownloadSpecs

Bases: object

Dataclass of parameter values for downloader.download_async().

Instances are passed in lists to the parallel download functions as download instructions, and their attribute values eventually become the parameters of download_async(). Attribute info is only for keeping track of downloads and is not passed as a function parameter.

url: str

URL to download.

to_dir: str | PurePath

Directory to save the downloaded file.

stem_pattern: str | None = None

Pattern to add to the filename stems.

Placeholder "/name/" is always required.

__hash__()

Return hash(self).

__repr__()

Return repr(self).

filename: str | None = None

Name to be used for the saved file.

sha256: str | None = None

Expected SHA-256 checksum as a hex string.

Case-insensitive. No checksum is calculated if this parameter is not given.

info: Any = None

Download-specific information.
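A possible usage pattern is to store the position of each URL in info and sort the results afterwards, since the returned order is not guaranteed. The function download_in_order and its downloader argument are hypothetical; the argument exists only so that the sketch can be exercised with a substitute object instead of real network access. The value max_concurrent=3 is an arbitrary choice.

```python
def download_in_order(urls, to_dir, downloader=None):
    """Download `urls` in parallel and return results in the order of `urls`."""
    if downloader is None:
        # Imported lazily; requires the xbrl_filings_api package.
        from xbrl_filings_api import downloader
    # Store each URL's position in `info` so the results can be matched back.
    specs = [
        downloader.DownloadSpecs(url=url, to_dir=to_dir, info=index)
        for index, url in enumerate(urls)
    ]
    results = downloader.download_parallel(specs, max_concurrent=3, timeout=30.0)
    # The returned list may be in completion order, so restore the input order.
    return sorted(results, key=lambda result: result.info)
```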

exception xbrl_filings_api.downloader.CorruptDownloadError

Bases: Exception

SHA-256 checksum does not match expected value from API.

This is a different exception from the one in the top-level package xbrl_filings_api.

path: str

Path where the file was saved.

url: str

URL where the file was downloaded from.

calculated_hash: str

Actual SHA-256 checksum of the file in lowercase hex.

expected_hash: str

Expected SHA-256 checksum of the file in lowercase hex.