FilingSet

class xbrl_filings_api.FilingSet

Bases: object

Mutable set for Filing objects.

Can be initialized with a single argument, an iterable of Filing objects. This class provides an interface similar to, but broader than, the builtin set class: all set-like operators and methods accept arbitrary iterables instead of strict sets. The class implements a mutable set, and isinstance(filingset, collections.abc.MutableSet) is True (virtual subclass).
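A virtual subclass passes isinstance() checks without inheriting from the abstract base class. The sketch below shows the general mechanism with a hypothetical TinyBag class; whether FilingSet uses MutableSet.register() internally is an assumption, as the text above only states that it is a virtual subclass.

```python
from collections.abc import MutableSet


class TinyBag:
    """Hypothetical stand-in; it does not inherit from MutableSet."""

    def __init__(self, items=()):
        self._items = set(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


# Registration makes TinyBag a virtual subclass: isinstance() and
# issubclass() succeed, but no mixin methods are inherited.
MutableSet.register(TinyBag)

bag = TinyBag([1, 2])
is_mutable_set = isinstance(bag, MutableSet)
has_mixin_method = hasattr(bag, "isdisjoint")
```

Because nothing is inherited, a virtual subclass has to define every set method itself, which is consistent with the explicit method list documented below.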

In addition to set functionality it provides certain filing-related attributes and methods.

If working with large sets, in-place operations (e.g. update method and |= operator) are recommended over new set operations (union method and | operator). See section Notes.

Defines operators |, |=, &, &=, -, -=, ^, ^=, <, <=, ==, >, >=, and !=. Instead of just set-like objects, the operators accept any iterables of Filing objects.

As a subclass of APIResource, Filing has a custom __hash__() method whose hash is based on a tuple of the strings 'APIResource', Filing.TYPE, and Filing.api_id. This means that equality checks (== and != operators) and set content uniqueness are based on this tuple. For example, when the actual filing object is not available, the fastest way to check if a filing with api_id '123' is included in the filing set fs is:

('APIResource', Filing.TYPE, '123') in fs

The same applies to the ResourceCollection objects in attributes entities and validation_messages. These collections are, however, lazy iterators.
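The tuple lookup described above relies on the object and the tuple sharing a hash and comparing equal. A minimal sketch of that idea with a builtin set follows; StubResource, its _key() helper and the TYPE value 'filing' are hypothetical stand-ins, not the library's implementation.

```python
class StubResource:
    """Hypothetical stand-in for an APIResource subclass."""

    TYPE = "filing"  # assumed value, for illustration only

    def __init__(self, api_id):
        self.api_id = api_id

    def _key(self):
        return ("APIResource", self.TYPE, self.api_id)

    def __hash__(self):
        # Same hash as the plain tuple, so both land in the same bucket.
        return hash(self._key())

    def __eq__(self, other):
        if isinstance(other, StubResource):
            return self._key() == other._key()
        if isinstance(other, tuple):
            return self._key() == other
        return NotImplemented


fs = {StubResource("123"), StubResource("456")}

found = ("APIResource", "filing", "123") in fs
missing = ("APIResource", "filing", "999") in fs
```

The membership test costs a single hash lookup, so no temporary object has to be built and the set does not have to be scanned.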

Notes

It is possible to combine filing sets from different queries into a single FilingSet without redundant copies of objects. Due to cross-referencing, the operations returning a new set always deep copy all objects to the result set. The in-place operations retain the objects from the left set but deep copy everything from the right set.
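The copying behaviour described above can be illustrated with builtin sets and copy.deepcopy. This is only a sketch of the described semantics: Item, union_new() and update_in_place() are hypothetical names, and the library's actual cross-reference handling is not shown.

```python
import copy


class Item:
    """Hypothetical stand-in for a Filing, hashed by its key."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Item) and self.key == other.key


def union_new(left, right):
    # New-set style: every object from both operands is deep copied.
    result = {copy.deepcopy(item) for item in left}
    result.update(copy.deepcopy(item) for item in right)
    return result


def update_in_place(left, right):
    # In-place style: left keeps its own objects, right is deep copied.
    left.update(copy.deepcopy(item) for item in right)


original = Item("a")
left = {original}
right = {Item("a"), Item("b")}

update_in_place(left, right)
left_keeps_original = any(item is original for item in left)

merged = union_new(left, right)
merged_has_original = any(item is original for item in merged)
```

With large sets the in-place form copies only the right operand, which is the reason the class description recommends it.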

Because the operators work on an iterable basis, the >= operator and the issuperset() method, for example, return True for a FilingSet and any iterable with the same filings, but the result is undefined if the iterable contains any item other than a filing. Operators == and != are, however, never undefined.

Methods

add(elem)

Add and update cross-references.

clear()

Clear the filing set of filings.

copy()

Return shallow copy of FilingSet.

difference(*others)

Return difference FilingSet and update cross-references.

difference_update(*others)

Apply difference to self and update cross-references.

discard(elem)

Discard and update cross-references.

download(files[, to_dir, stem_pattern, ...])

Download files according to parameter files.

download_aiter(files[, to_dir, ...])

Download files and yield DownloadResult objects.

get_pandas_data([attr_names, with_entity, ...])

Get filings as data for pandas.DataFrame constructor.

intersection(*others)

Return intersection FilingSet and update cross-references.

intersection_update(*others)

Apply intersection in self and update cross-references.

isdisjoint(other)

Return True if two filing sets have a null intersection.

issubset(other)

Report whether another filing set contains this set.

issuperset(other)

Report whether this set contains another filing set.

pop()

Remove a filing, return it, and update cross-references.

pop_duplicates([languages, ...])

Pops duplicates of the same enclosure from the set of filings.

remove(elem)

Remove and update cross-references.

symmetric_difference(other)

Return symmetric difference and update cross-references.

symmetric_difference_update(other)

Apply symmetric difference in self and update cross-refs.

to_sqlite(path, *[, update, flags])

Save set to an SQLite3 database.

union(*others)

Return union FilingSet and update cross-references.

update(*others)

Apply union in self and update cross-references.

__repr__()

Return repr with len() of self, entities, validation_messages.

__str__()

Return str(self).

Attributes

columns

List of available columns for filings of this set.

entities

Lazy iterator for entity references in filings.

validation_messages

Lazy iterator for validation message references in filings.

__hash__

entities

Lazy iterator for entity references in filings.

See documentation for ResourceCollection class.

validation_messages

Lazy iterator for validation message references in filings.

See documentation for ResourceCollection class.

property columns: list[str]

List of available columns for filings of this set.

download(files, to_dir=None, *, stem_pattern=None, check_corruption=True, max_concurrent=5)

Download files according to parameter files.

The files parameter accepts three formats:

fs.download('json', to_dir='dir/path')
fs.download(['json', 'package'], to_dir='dir/path')
fs.download({
        'json': DownloadItem(),
        'package': DownloadItem(to_dir=other_dir)
    }, to_dir='dir/path')

The filesystem path of the downloaded file will be saved in the Filing object attributes <file>_download_path such as json_download_path for the downloaded JSON file.

If package files are requested to be downloaded and parameter check_corruption is True, the downloaded package files will be checked through the package_sha256 attribute. If these attribute values do not match the ones calculated from the downloaded files, an exception CorruptDownloadError of the first corrupt file is raised after all downloads have finished. The downloaded files will not be deleted but the filenames will be appended with ending ".corrupt". However, attributes Filing.package_download_path will not store these corrupt paths.
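The corruption check described above amounts to comparing a SHA-256 digest with an expected value and renaming the file on mismatch. A minimal sketch using only the standard library follows; check_package() is a hypothetical helper, not a function of this library, and the real method additionally raises CorruptDownloadError.

```python
import hashlib
import tempfile
from pathlib import Path


def check_package(path, expected_sha256):
    """Return (is_intact, final_path); rename a corrupt file to *.corrupt."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        # Read in chunks so large packages are not loaded into memory.
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() == expected_sha256.lower():
        return True, path
    corrupt_path = path.with_name(path.name + ".corrupt")
    path.rename(corrupt_path)
    return False, corrupt_path


with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "package.zip"
    package.write_bytes(b"example payload")
    good_hash = hashlib.sha256(b"example payload").hexdigest()

    intact, kept_path = check_package(package, good_hash)
    damaged_ok, moved_path = check_package(package, "0" * 64)
    moved_name = moved_path.name
    original_still_exists = package.exists()
```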

The directories in the path of parameter to_dir will be created if they do not exist. By default, filename is derived from download URL. If the file already exists, it will be overwritten.

If download is interrupted, the files will be left with ending ".unfinished".

If no name could be derived from the url attribute, the file will be named file0001, file0002, etc. In this case a new file is always created.

Parameter stem_pattern requires the placeholder "/name/". For example, pattern /name/_second_try changes the original filename 743700XJC24THUPK0S03-2022-12-31-fi.xhtml into 743700XJC24THUPK0S03-2022-12-31-fi_second_try.xhtml. The parameter is not recommended for packages, as their names should not be changed.
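The effect of the "/name/" placeholder can be reproduced with a few lines of standard-library code. The helper apply_stem_pattern() below is hypothetical and only mirrors the renaming rule described above; how the library handles a missing placeholder is an assumption.

```python
from pathlib import PurePosixPath


def apply_stem_pattern(filename, stem_pattern):
    """Insert the original filename stem into the pattern, keep the suffix."""
    if "/name/" not in stem_pattern:
        # Assumed behaviour: the text only says the placeholder is required.
        raise ValueError('stem_pattern must contain the placeholder "/name/"')
    original = PurePosixPath(filename)
    new_stem = stem_pattern.replace("/name/", original.stem)
    return new_stem + original.suffix


renamed = apply_stem_pattern(
    "743700XJC24THUPK0S03-2022-12-31-fi.xhtml", "/name/_second_try"
)
```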

HTTP request timeout is defined in options.timeout_sec.

Parameters:
  • files (str or iterable of str or mapping of {str: DownloadItem}) – All of the str values in annotation are FileStringType literals. DownloadItem attributes override method arguments for the file.

  • to_dir (path-like, optional) – Directory to save the files. Defaults to working directory.

  • stem_pattern (str, optional) – Pattern to add to the filename stems. Placeholder "/name/" is always required.

  • check_corruption (bool, default True) – Raise CorruptDownloadError for any corrupt 'package' file.

  • max_concurrent (int or None, default 5) – Maximum number of simultaneous downloads allowed. Value None means unlimited.

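Raises:

CorruptDownloadError – When parameter check_corruption is True and a downloaded 'package' file is corrupt.

Return type: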

None

See also

Filing.download

For a single filing.

async download_aiter(files, to_dir=None, *, stem_pattern=None, check_corruption=True, max_concurrent=5)

Download files and yield DownloadResult objects.

The function follows the same logic as method download(). See documentation.

Parameters:
  • files (str or iterable of str or mapping of {str: DownloadItem}) – All of the str values in annotation are FileStringType literals. DownloadItem attributes override method arguments for the file.

  • to_dir (path-like, optional) – Directory to save the files. Defaults to working directory.

  • stem_pattern (str, optional) – Pattern to add to the filename stems. Placeholder "/name/" is always required.

  • check_corruption (bool, default True) – Raise CorruptDownloadError for any corrupt 'package' file.

  • max_concurrent (int or None, default 5) – Maximum number of simultaneous downloads allowed. Value None means unlimited.

Yields:

DownloadResult – Contains information on the finished download.

Return type:

AsyncIterator[DownloadResult]

See also

Filing.download_aiter

For a single filing.
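An async iterator of this kind is consumed with async for, and results typically arrive as individual downloads finish. The sketch below imitates that pattern with a semaphore that caps concurrency; fake_fetch() and fake_download_aiter() are hypothetical stand-ins that perform no network access and do not reflect the library's internals.

```python
import asyncio


async def fake_fetch(name, semaphore):
    # The semaphore limits how many "downloads" run at the same time.
    async with semaphore:
        await asyncio.sleep(0)
        return name


async def fake_download_aiter(names, max_concurrent=5):
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [asyncio.ensure_future(fake_fetch(n, semaphore)) for n in names]
    # Yield each result as soon as its task completes.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect():
    results = []
    async for result in fake_download_aiter(
        ["a.json", "b.json", "c.json"], max_concurrent=2
    ):
        results.append(result)
    return results


results = asyncio.run(collect())
```

The order of results depends on completion order, so code consuming the iterator should not assume the input order.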

get_pandas_data(attr_names=None, *, with_entity=False, strip_timezone=True, date_as_datetime=True, include_urls=False, include_paths=False)

Get filings as data for pandas.DataFrame constructor.

A new dataframe can be instantiated by:

import pandas as pd
df = pd.DataFrame(data=filingset.get_pandas_data())

If parameter attr_names is not given, all data attributes except the ones ending in _date_str are extracted. Attributes ending in _download_path are extracted only if at least one file of that type has been downloaded and include_paths is True. Attribute entity_api_id is extracted only if there is at least one entity object in the set and parameter with_entity is False.

Parameters:
  • attr_names (iterable of str, optional) – Valid attribute names of the Filing object, or attributes of its Entity object prefixed with "entity.".

  • with_entity (bool, default False) – When parameter attr_names is not given, include entity attributes to the filing.

  • strip_timezone (bool, default True) – Strip timezone information (always UTC) from datetime values.

  • date_as_datetime (bool, default True) – Convert date values to naive datetime objects so that pandas stores them as datetime64 values.

  • include_urls (bool, default False) – When parameter attr_names is not given, include attributes ending _url.

  • include_paths (bool, default False) – When parameter attr_names is not given, include attributes ending _path.

Returns:

data – Column names are the same as the attributes for resource of this type.

Return type:

dict of {str: list of DataAttributeType}

See also

ResourceCollection.get_pandas_data

For other resources.
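The returned structure is a plain dict of column lists, and the two conversion flags affect only date and datetime values. The following sketch shows those conversions on hand-written data; normalize() is a hypothetical helper, and the column names and values are illustrative rather than real output.

```python
from datetime import date, datetime, timezone


def normalize(value, strip_timezone=True, date_as_datetime=True):
    """Mimic the described handling of datetime and date values."""
    if isinstance(value, datetime):
        # datetime is checked first because it is a subclass of date.
        return value.replace(tzinfo=None) if strip_timezone else value
    if isinstance(value, date) and date_as_datetime:
        return datetime(value.year, value.month, value.day)
    return value


# Illustrative columns only; real data comes from get_pandas_data().
raw = {
    "api_id": ["1", "2"],
    "last_end_date": [date(2022, 12, 31), date(2021, 12, 31)],
    "added_time": [
        datetime(2023, 1, 18, 11, 2, 6, tzinfo=timezone.utc),
        datetime(2022, 2, 1, 9, 0, 0, tzinfo=timezone.utc),
    ],
}

data = {
    column: [normalize(value) for value in values]
    for column, values in raw.items()
}
```

A dict shaped like data can be passed to the pandas.DataFrame constructor in the same way as the method's return value.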

pop_duplicates(languages=['en'], *, use_reporting_date=False, all_markets=False)

Pops duplicates of the same enclosure from the set of filings.

Entities must be available on the FilingSet.

The method searches the FilingSet and leaves only one filing for each group of same entity_api_id, last_end_date pairs, i.e., one filing for each unique enclosure of the same entity for the same financial period. If parameter use_reporting_date is True, grouping is based on entity_api_id, reporting_date instead.

Some entities report on multiple markets. To retain all of these country-specific filings, set parameter all_markets to True. Grouping will then also include country as the last item.

The selected filing from the group is chosen primarily on languages parameter values matched on the Filing.language attribute. Parameter value ['sv', 'fi'] thus means that Swedish filings are preferred, secondarily Finnish, and lastly the ones which have language as None. Value None can be used in the iterable as well. Parameter value None means no language preference.

If more than one filing matches the language (or has language as None), the filings are ordered by their filing_index and the last one is chosen, which in practice is the one with the highest filing number part of filing_index.

Parameters:
  • languages (iterable of str or None, default ['en']) – Preferred languages for the retained filing.

  • use_reporting_date (bool, default False) – Use reporting_date instead of last_end_date when grouping.

  • all_markets (bool, default False) – Append country as the last item in grouping.

Returns:

The set of removed filings.

Return type:

FilingSet
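The grouping and selection rules above can be sketched with plain Python objects. StubFiling and pop_duplicates_sketch() are hypothetical; the filing_index values are placeholders compared as plain strings, and keeping the last filing when no language matches at all is an assumption not stated in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StubFiling:
    api_id: str
    entity_api_id: str
    last_end_date: str
    language: Optional[str]
    filing_index: str


def pop_duplicates_sketch(filings, languages=("en",)):
    """Remove duplicates from the set in place and return the removed ones."""
    groups = defaultdict(list)
    for filing in filings:
        groups[(filing.entity_api_id, filing.last_end_date)].append(filing)

    # languages=None means no preference; otherwise None is tried last.
    preference = [] if languages is None else list(languages)
    if languages is not None and None not in preference:
        preference.append(None)

    removed = set()
    for members in groups.values():
        candidates = []
        for language in preference:
            candidates = [f for f in members if f.language == language]
            if candidates:
                break
        if not candidates:
            candidates = members  # assumption: fall back to the whole group
        keep = max(candidates, key=lambda f: f.filing_index)
        for filing in members:
            if filing is not keep:
                filings.discard(filing)
                removed.add(filing)
    return removed


filings = {
    StubFiling("1", "e1", "2022-12-31", "fi", "idx-0"),
    StubFiling("2", "e1", "2022-12-31", "en", "idx-0"),
    StubFiling("3", "e1", "2022-12-31", "en", "idx-1"),
    StubFiling("4", "e2", "2022-12-31", None, "idx-0"),
}
removed = pop_duplicates_sketch(filings, languages=["en"])
```

In this data the English filing with the later index is kept for entity e1, and the only filing of entity e2 is kept although its language is None.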

to_sqlite(path, *, update=False, flags=<ScopeFlag.GET_ENTITY|GET_VALIDATION_MESSAGES: 6>)

Save set to an SQLite3 database.

The method has the same signature and follows the same rules as the query function to_sqlite(), except that it lacks all of the query parameters.

The flags also default to all tables being turned on. If no additional information is present in the set, the corresponding tables are not created unless they already exist.

Parameters:
  • path (path-like) – Path to the SQLite database.

  • update (bool, default False) – If the database already exists, update it with these records. Old records are updated and new ones are added.

  • flags (ScopeFlag, default GET_ENTITY | GET_VALIDATION_MESSAGES) – Scope of saving. Flag GET_ENTITY will save entity records of filings and GET_VALIDATION_MESSAGES the validation messages.

Raises:
  • FileExistsError – When update is False and the intended save path for the database is an existing file.

  • DatabaseSchemaUnmatchError – When update is True and the file contains a database whose schema does not match the expected format.

  • sqlite3.DatabaseError – For example, when update is True and the file is not a database.

Return type:

None

See also

xbrl_filings_api.to_sqlite

Query and save to SQLite.
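The update semantics described above (refuse to overwrite by default, otherwise update old records and add new ones) can be sketched with the standard sqlite3 module. The function save_rows(), the table name Filing and its two columns are hypothetical and do not reflect the schema the library writes.

```python
import os
import sqlite3
import tempfile


def save_rows(path, rows, update=False):
    """Write (api_id, country) rows; refuse existing files unless update."""
    if os.path.exists(path) and not update:
        raise FileExistsError(path)
    con = sqlite3.connect(path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS Filing "
                "(api_id TEXT PRIMARY KEY, country TEXT)"
            )
            # Existing primary keys are replaced, new ones are inserted.
            con.executemany(
                "INSERT OR REPLACE INTO Filing (api_id, country) "
                "VALUES (?, ?)",
                rows,
            )
    finally:
        con.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "filings.db")
    save_rows(db_path, [("1", "FI")])
    try:
        save_rows(db_path, [("2", "SE")])
        refused = False
    except FileExistsError:
        refused = True
    save_rows(db_path, [("1", "DK"), ("2", "SE")], update=True)

    con = sqlite3.connect(db_path)
    stored = con.execute(
        "SELECT api_id, country FROM Filing ORDER BY api_id"
    ).fetchall()
    con.close()
```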

__repr__()

Return repr with len() of self, entities, validation_messages.

Values len(entities) and len(validation_messages) are only shown if more than zero are present.

Return type:

str

clear()

Clear the filing set of filings.

Return type:

None

union(*others)

Return union FilingSet and update cross-references.

Parameters:

*others (iterable of Filing) – One or more arguments of Filing iterables.

Returns:

A new set which has filings of this set and all others.

Return type:

FilingSet

Raises:

ValueError – When any item in an iterable is not Filing.

update(*others)

Apply union in self and update cross-references.

Parameters:

*others (iterable of Filing) – One or more arguments of Filing iterables.

Raises:

ValueError – When any item in an iterable is not Filing.

Return type:

None

intersection(*others)

Return intersection FilingSet and update cross-references.

Parameters:

*others (iterable of Filing) – One or more arguments of Filing iterables.

Returns:

A new set which has the filings common to this set and all sets in others.

Return type:

FilingSet

Raises:

ValueError – When any item in an iterable is not Filing.

intersection_update(*others)

Apply intersection in self and update cross-references.

Parameters:

*others (iterable of Filing) – One or more arguments of Filing iterables.

Raises:

ValueError – When any item in an iterable is not Filing.

Return type:

None

difference(*others)

Return difference FilingSet and update cross-references.

Parameters:

*others (iterable of Filing) – One or more arguments of Filing iterables.

Returns:

A new set which is this set without the filings found in any of the others.

Return type:

FilingSet

Raises:

ValueError – When any item in an iterable is not Filing.

difference_update(*others)

Apply difference to self and update cross-references.

Parameters:

*others (iterable of Filing) – One or more arguments of Filing iterables.

Raises:

ValueError – When any item in an iterable is not Filing.

Return type:

None

symmetric_difference(other)

Return symmetric difference and update cross-references.

Parameters:

other (iterable of Filing) – An iterable of Filing objects.

Returns:

A new set which has filings in this set or other but not in both.

Return type:

FilingSet

Raises:

ValueError – When any item in parameter other is not Filing.

symmetric_difference_update(other)

Apply symmetric difference in self and update cross-refs.

Parameters:

other (iterable of Filing) – An iterable of Filing objects.

Raises:

ValueError – When any item in parameter other is not Filing.

Return type:

None

add(elem)

Add and update cross-references.

Parameters:

elem (Filing)

Return type:

None

discard(elem)

Discard and update cross-references.

Parameters:

elem (Filing)

Return type:

None

remove(elem)

Remove and update cross-references.

Parameters:

elem (Filing)

Return type:

None

pop()

Remove a filing, return it, and update cross-references.

Return type:

Filing

copy()

Return shallow copy of FilingSet.

Return type:

FilingSet

isdisjoint(other)

Return True if two filing sets have a null intersection.

Parameters:

other (iterable of Filing) – An iterable of Filing objects.

Returns:

True if there are no common filings in the two sets.

Return type:

bool

Raises:

ValueError – When any item in an iterable is not Filing.

issubset(other)

Report whether another filing set contains this set.

Parameters:

other (iterable of Filing) – An iterable of Filing objects.

Returns:

True if other contains all filings in this set.

Return type:

bool

Raises:

ValueError – When any item in an iterable is not Filing.

issuperset(other)

Report whether this set contains another filing set.

Parameters:

other (iterable of Filing) – An iterable of Filing objects.

Returns:

True if this set contains all filings in other.

Return type:

bool

Raises:

ValueError – When any item in an iterable is not Filing.

__hash__ = None