FilingSet¶
- class xbrl_filings_api.FilingSet¶
Bases: object

Mutable set for Filing objects.

Can be initialized with the single argument being an iterable of Filing objects. This class provides a similar but broader interface than the built-in set class. All set-like operators and methods accept iterables instead of strict sets. This class implements a mutable set, and isinstance(filingset, collections.abc.MutableSet) is True (virtual subclass).

In addition to set functionality, it provides certain filing-related attributes and methods.

If working with large sets, in-place operations (e.g. update method and |= operator) are recommended over new set operations (union method and | operator). See section Notes.

Defines operators |, |=, &, &=, -, -=, ^, ^=, <, <=, ==, >, >=, and !=. Instead of just set-like objects, the operators accept any iterables of Filing objects.

Filing objects, as a subclass of APIResource, have a custom __hash__() method and their hash is based on a tuple of strings 'APIResource', Filing.TYPE, and Filing.api_id. This means that equality checks (== and != operators) and set content uniqueness are based on this tuple. For example, when the actual filing object is not available, the fastest way to check if a filing with api_id '123' is included in the filing set fs is:

    ('APIResource', Filing.TYPE, '123') in fs
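The tuple-based membership check can be illustrated with a minimal stand-in. The class FakeResource below is hypothetical and not part of the library. It only assumes, as described above, that the hash comes from the tuple ('APIResource', TYPE, api_id) and that equality follows the hash.

```python
class FakeResource:
    """Hypothetical stand-in for an APIResource subclass such as Filing."""

    TYPE = "filing"  # assumed value, for illustration only

    def __init__(self, api_id):
        self.api_id = api_id

    def _key(self):
        return ("APIResource", self.TYPE, self.api_id)

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        # Equality follows the hash, so a bare key tuple compares equal.
        try:
            return hash(self) == hash(other)
        except TypeError:  # unhashable objects are never equal
            return False


resources = {FakeResource("123"), FakeResource("456")}

# Membership can be tested with the key tuple alone, without the object.
found = ("APIResource", FakeResource.TYPE, "123") in resources
missing = ("APIResource", FakeResource.TYPE, "789") in resources
```

The lookup works because a set first compares hashes and then asks the stored object whether it equals the probe, which here is the key tuple.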
Same applies for ResourceCollection in attributes entities and validation_messages. These collections are, however, lazy iterators.

Notes

It is possible to combine filing sets from different queries into a single FilingSet without redundant copies of objects. Due to cross-referencing, the operations returning a new set always deep copy all objects to the results set. The in-place operations retain the objects from the left set but deep copy everything from the right set.

As the operators work on an iterable basis, for example the >= operator or issuperset() method returns True for a FilingSet and any iterable with the same Filings but is undefined if the iterable contains any item other than a filing. However, operators == and != are never undefined.

Methods
add(elem) – Add and update cross-references.
clear() – Clear the filing set of filings.
copy() – Return shallow copy of FilingSet.
difference(*others) – Return difference FilingSet and update cross-references.
difference_update(*others) – Apply difference to self and update cross-references.
discard(elem) – Discard and update cross-references.
download(files[, to_dir, stem_pattern, ...]) – Download files according to parameter files.
download_aiter(files[, to_dir, ...]) – Download files and yield DownloadResult objects.
get_pandas_data([attr_names, with_entity, ...]) – Get filings as data for pandas.DataFrame constructor.
intersection(*others) – Return intersection FilingSet and update cross-references.
intersection_update(*others) – Apply intersection in self and update cross-references.
isdisjoint(other) – Return True if two filing sets have a null intersection.
issubset(other) – Report whether another filing set contains this set.
issuperset(other) – Report whether this set contains another filing set.
pop() – Remove a filing, return it, and update cross-references.
pop_duplicates([languages, ...]) – Pops duplicates of the same enclosure from the set of filings.
remove(elem) – Remove and update cross-references.
symmetric_difference(other) – Return symmetric difference and update cross-references.
symmetric_difference_update(other) – Apply symmetric difference in self and update cross-refs.
to_sqlite(path, *[, update, flags]) – Save set to an SQLite3 database.
union(*others) – Return union FilingSet and update cross-references.
update(*others) – Apply union in self and update cross-references.
__repr__() – Return repr with len() of self, entities, validation_messages.
__str__() – Return str(self).
Attributes
List of available columns for filings of this set.
entities – Lazy iterator for entity references in filings.
validation_messages – Lazy iterator for validation message references in filings.
- entities¶
Lazy iterator for entity references in filings.
See documentation for ResourceCollection class.
- validation_messages¶
Lazy iterator for validation message references in filings.
See documentation for ResourceCollection class.
- download(files, to_dir=None, *, stem_pattern=None, check_corruption=True, max_concurrent=5)¶
Download files according to parameter files.

The files parameter accepts three formats:

    fs.download('json', to_dir='dir/path')
    fs.download(['json', 'package'], to_dir='dir/path')
    fs.download({
        'json': DownloadItem(),
        'package': DownloadItem(to_dir=other_dir)
    }, to_dir='dir/path')

The filesystem path of the downloaded file will be saved in the Filing object attributes <file>_download_path such as json_download_path for the downloaded JSON file.

If package files are requested to be downloaded and parameter check_corruption is True, the downloaded package files will be checked through the package_sha256 attribute. If these attribute values do not match the ones calculated from the downloaded files, an exception CorruptDownloadError of the first corrupt file is raised after all downloads have finished. The downloaded files will not be deleted, but the ending ".corrupt" will be appended to their filenames. However, attributes Filing.package_download_path will not store these corrupt paths.

The directories in the path of parameter to_dir will be created if they do not exist. By default, the filename is derived from the download URL. If the file already exists, it will be overwritten.

If the download is interrupted, the files will be left with the ending ".unfinished".

If no name could be derived from the url attribute, the file will be named file0001, file0002, etc. In this case a new file is always created.

Parameter stem_pattern requires a placeholder "/name/". For example, pattern /name/_second_try will change original filename 743700XJC24THUPK0S03-2022-12-31-fi.xhtml into 743700XJC24THUPK0S03-2022-12-31-fi_second_try.xhtml. Not recommended for packages as their names should not be changed.

HTTP request timeout is defined in options.timeout_sec.
- Parameters:
files (str or iterable of str or mapping of {str: DownloadItem}) – All of the str values in annotation are FileStringType literals. DownloadItem attributes override method arguments for the file.
to_dir (path-like, optional) – Directory to save the files. Defaults to working directory.
stem_pattern (str, optional) – Pattern to add to the filename stems. Placeholder "/name/" is always required.
check_corruption (bool, default True) – Raise CorruptDownloadError for any corrupt 'package' file.
max_concurrent (int or None, default 5) – Maximum number of simultaneous downloads allowed. Value None means unlimited.
- Raises:
CorruptDownloadError – When attribute Filing.package_sha256 does not match the calculated hash of 'package' file and check_corruption is True.
requests.HTTPError – When HTTP status error occurs.
requests.ConnectionError – When connection fails.
- Return type:
None
See also
Filing.download – For a single filing.
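Two of the file-naming rules described for download() can be sketched with the standard library alone. The helpers apply_stem_pattern and mark_if_corrupt are hypothetical names, not library functions. Raising ValueError for a missing placeholder is an assumption, since the text only states that the placeholder is required.

```python
import hashlib
import tempfile
from pathlib import Path


def apply_stem_pattern(filename, stem_pattern):
    """Insert the original stem in place of the "/name/" placeholder."""
    if "/name/" not in stem_pattern:
        # Assumed behavior: the documentation only says it is required.
        raise ValueError('stem_pattern requires the placeholder "/name/"')
    path = Path(filename)
    return stem_pattern.replace("/name/", path.stem) + path.suffix


def mark_if_corrupt(path, expected_sha256):
    """Return the path, renamed with ".corrupt" if the hash differs."""
    path = Path(path)
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual == expected_sha256.lower():
        return path
    # The file is kept on disk, only its name changes.
    return path.rename(path.with_name(path.name + ".corrupt"))


new_name = apply_stem_pattern(
    "743700XJC24THUPK0S03-2022-12-31-fi.xhtml", "/name/_second_try")

with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "package.zip"
    package.write_bytes(b"not the expected content")
    expected = hashlib.sha256(b"expected content").hexdigest()
    corrupt_name = mark_if_corrupt(package, expected).name
```

Here new_name reproduces the filename from the stem_pattern example above, and corrupt_name shows the ".corrupt" ending added after a hash mismatch.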
- async download_aiter(files, to_dir=None, *, stem_pattern=None, check_corruption=True, max_concurrent=5)¶
Download files and yield DownloadResult objects.

The function follows the same logic as method download(). See documentation.
- Parameters:
files (str or iterable of str or mapping of {str: DownloadItem}) – All of the str values in annotation are FileStringType literals. DownloadItem attributes override method arguments for the file.
to_dir (path-like, optional) – Directory to save the files. Defaults to working directory.
stem_pattern (str, optional) – Pattern to add to the filename stems. Placeholder "/name/" is always required.
check_corruption (bool, default True) – Raise CorruptDownloadError for any corrupt 'package' file.
max_concurrent (int or None, default 5) – Maximum number of simultaneous downloads allowed. Value None means unlimited.
- Yields:
DownloadResult – Contains information on the finished download.
- Return type:
See also
Filing.download_aiter – For a single filing.
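The combination of an asynchronous iterator and the max_concurrent limit can be sketched with asyncio. This is a generic pattern under stated assumptions, not the library's implementation. The names fake_download, download_aiter_sketch, and collect are hypothetical, and a short sleep stands in for the HTTP request.

```python
import asyncio


async def fake_download(name, delay):
    """Hypothetical stand-in for one HTTP download."""
    await asyncio.sleep(delay)
    return f"{name} finished"


async def download_aiter_sketch(jobs, max_concurrent=5):
    """Yield results as they finish, at most max_concurrent at a time."""
    # None means unlimited, so no semaphore is needed in that case.
    limiter = None
    if max_concurrent is not None:
        limiter = asyncio.Semaphore(max_concurrent)

    async def run(name, delay):
        if limiter is None:
            return await fake_download(name, delay)
        async with limiter:
            return await fake_download(name, delay)

    tasks = [asyncio.create_task(run(name, delay)) for name, delay in jobs]
    for next_done in asyncio.as_completed(tasks):
        yield await next_done


async def collect(jobs, max_concurrent=5):
    """Consume the async iterator into a list."""
    return [res async for res in download_aiter_sketch(jobs, max_concurrent)]


results = asyncio.run(
    collect([("slow", 0.05), ("fast", 0.0)], max_concurrent=2))
```

A caller would normally consume the iterator with async for and handle each result as soon as it arrives, instead of collecting everything first.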
- get_pandas_data(attr_names=None, *, with_entity=False, strip_timezone=True, date_as_datetime=True, include_urls=False, include_paths=False)¶
Get filings as data for pandas.DataFrame constructor.

A new dataframe can be instantiated by:

    import pandas as pd
    df = pd.DataFrame(data=filingset.get_pandas_data())

If parameter attr_names is not given, data attributes excluding ones ending _date_str will be extracted. Attributes ending in _download_path will be extracted only if at least one file of this type has been downloaded (and include_paths is True), and entity_api_id only if there is at least one entity object in the set and parameter with_entity is False.
- Parameters:
attr_names (iterable of str, optional) – Valid attribute names of Filing object or attributes of its Entity object prefixed with "entity.".
with_entity (bool, default False) – When parameter attr_names is not given, include entity attributes to the filing.
strip_timezone (bool, default True) – Strip timezone information (always UTC) from datetime values.
date_as_datetime (bool, default True) – Convert date values to naive datetime to be converted to pandas.datetime64 by pandas.
include_urls (bool, default False) – When parameter attr_names is not given, include attributes ending _url.
include_paths (bool, default False) – When parameter attr_names is not given, include attributes ending _path.
- Returns:
data – Column names are the same as the attributes for resource of this type.
- Return type:
dict of {str: list of DataAttributeType}
See also
ResourceCollection.get_pandas_data – For other resources.
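The returned structure, a dict mapping column names to lists, and the effect of strip_timezone and date_as_datetime can be sketched without pandas. The function to_column_data and the sample attribute added_time are hypothetical and used for illustration only. The actual method may differ in details.

```python
from datetime import date, datetime, timezone
from types import SimpleNamespace


def to_column_data(records, attr_names, strip_timezone=True,
                   date_as_datetime=True):
    """Build a {column name: list of values} dict from objects."""
    data = {name: [] for name in attr_names}
    for record in records:
        for name in attr_names:
            value = getattr(record, name)
            # datetime is a subclass of date, so it is checked first.
            if isinstance(value, datetime):
                if strip_timezone:
                    value = value.replace(tzinfo=None)
            elif isinstance(value, date) and date_as_datetime:
                value = datetime(value.year, value.month, value.day)
            data[name].append(value)
    return data


records = [
    SimpleNamespace(
        api_id="1",
        last_end_date=date(2022, 12, 31),
        added_time=datetime(2023, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    ),
]
data = to_column_data(records, ["api_id", "last_end_date", "added_time"])
```

A dict of equal-length lists like data is one of the shapes that the pandas.DataFrame constructor accepts through its data argument.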
- pop_duplicates(languages=['en'], *, use_reporting_date=False, all_markets=False)¶
Pops duplicates of the same enclosure from the set of filings.
Entities must be available on the FilingSet.

The method searches the FilingSet and leaves only one filing for each group of same entity_api_id, last_end_date pairs, i.e., one filing for each unique enclosure of the same entity for the same financial period. If parameter use_reporting_date is True, grouping is based on entity_api_id, reporting_date instead.

Some entities report on multiple markets. To retain all of these country-specific filings, set parameter all_markets to True. Grouping will then also include country as the last item.

The retained filing of each group is chosen primarily by matching the languages parameter values against the Filing.language attribute. Parameter value ['sv', 'fi'] thus means that Swedish filings are preferred, secondarily Finnish, and lastly the ones which have language as None. Value None can be used in the iterable as well. Parameter value None means no language preference.

If there is more than one filing for the language match (or language is None), the filings are ordered by their filing_index and the last one is chosen, which in practice is the one with the highest filing number part of filing_index.
- Parameters:
languages (iterable of str or None, default ['en']) – Preferred languages for the retained filing.
use_reporting_date (bool, default False) – Use reporting_date instead of last_end_date when grouping.
all_markets (bool, default False) – Append country as the last item in grouping.
- Returns:
The set of removed filings.
- Return type:
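The grouping and language preference described for pop_duplicates() can be sketched on a plain list. The function pop_duplicates_sketch is hypothetical. It makes three assumptions: filing_index values are compared as plain strings, a group with no language match falls back to all of its filings, and the sample filing_index values are invented for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def pop_duplicates_sketch(filings, languages=("en",)):
    """Remove duplicates from the list in place and return them."""
    groups = defaultdict(list)
    for filing in filings:
        groups[(filing.entity_api_id, filing.last_end_date)].append(filing)

    # None in last place covers filings without a language.
    preference = [] if languages is None else [*languages, None]
    removed = []
    for members in groups.values():
        candidates = members  # assumed fallback when nothing matches
        for lang in preference:
            matched = [f for f in members if f.language == lang]
            if matched:
                candidates = matched
                break
        # Assumption: string order stands in for filing number order.
        keep = max(candidates, key=lambda f: f.filing_index)
        removed.extend(f for f in members if f is not keep)
    for filing in removed:
        filings.remove(filing)
    return removed


def make(language, index, end="2022-12-31"):
    return SimpleNamespace(entity_api_id="E1", last_end_date=end,
                           language=language, filing_index=index)


filings = [make("fi", "E1-2022-12-31-ESEF-FI-0"),
           make("sv", "E1-2022-12-31-ESEF-FI-1"),
           make("sv", "E1-2022-12-31-ESEF-FI-2"),
           make("en", "E1-2021-12-31-ESEF-FI-0", end="2021-12-31")]
removed = pop_duplicates_sketch(filings, languages=["sv", "fi"])
kept = [f.filing_index for f in filings]
```

With languages ['sv', 'fi'], the 2022 group keeps the Swedish filing with the highest index, and the single 2021 filing is untouched.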
- to_sqlite(path, *, update=False, flags=<ScopeFlag.GET_ENTITY|GET_VALIDATION_MESSAGES: 6>)¶
Save set to an SQLite3 database.
The method has the same signature and follows the same rules as the query function to_sqlite(), except that it has none of the query parameters.

Flags also default to all tables turned on. If no additional information is present in the set, the tables will not be created if they do not exist.
- Parameters:
path (path-like) – Path to the SQLite database.
update (bool, default False) – If the database already exists, update it with these records. Old records are updated and new ones are added.
flags (ScopeFlag, default GET_ENTITY | GET_VALIDATION_MESSAGES) – Scope of saving. Flag GET_ENTITY will save entity records of filings and GET_VALIDATION_MESSAGES the validation messages.
- Raises:
FileExistsError – When update is False and the intended save path for the database is an existing file.
DatabaseSchemaUnmatchError – When update is True and the file contains a database whose schema does not match the expected format.
sqlite3.DatabaseError – For example when update is True and the file is not a database etc.
- Return type:
None
See also
xbrl_filings_api.to_sqlite – Query and save to SQLite.
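The interplay of the update parameter and FileExistsError can be sketched with the standard sqlite3 module. The function save_rows, the table name Filing, and its two columns are hypothetical and do not reflect the schema the library writes. The upsert through INSERT OR REPLACE is likewise an assumption about how old records could be updated.

```python
import sqlite3
import tempfile
from pathlib import Path


def save_rows(path, rows, update=False):
    """Write (api_id, country) rows, refusing to touch an existing file."""
    path = Path(path)
    if path.exists() and not update:
        raise FileExistsError(path)
    con = sqlite3.connect(path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS Filing "
            "(api_id TEXT PRIMARY KEY, country TEXT)")
        # Existing primary keys are overwritten, new ones are added.
        con.executemany(
            "INSERT OR REPLACE INTO Filing (api_id, country) VALUES (?, ?)",
            rows)
        con.commit()
    finally:
        con.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "filings.db"
    save_rows(db_path, [("1", "FI")])
    try:
        save_rows(db_path, [("2", "SE")])  # update=False on existing file
    except FileExistsError:
        refused = True
    else:
        refused = False
    save_rows(db_path, [("1", "DK"), ("2", "SE")], update=True)
    con = sqlite3.connect(db_path)
    stored = con.execute(
        "SELECT api_id, country FROM Filing ORDER BY api_id").fetchall()
    con.close()
```

After the three calls, the second call was refused and the third one replaced record '1' and added record '2'.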
- __repr__()¶
Return repr with len() of self, entities, validation_messages.
Values len(entities) and len(validation_messages) are only shown if more than zero are present.
- Return type:
- clear()¶
Clear the filing set of filings.
- Return type:
None
- union(*others)¶
Return union FilingSet and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more arguments of Filing iterables.
- Returns:
A new set which has filings of this set and all others.
- Return type:
- Raises:
ValueError – When any item in an iterable is not Filing.
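The two contract points shared by the set methods, accepting any iterable and raising ValueError for foreign items, can be sketched with a stand-in type. The class Item and the function checked_union are hypothetical. The sketch leaves out the deep copying and cross-reference updates that the real methods perform.

```python
class Item:
    """Hypothetical stand-in for Filing, identified by api_id."""

    def __init__(self, api_id):
        self.api_id = api_id

    def __hash__(self):
        return hash(self.api_id)

    def __eq__(self, other):
        return isinstance(other, Item) and self.api_id == other.api_id


def checked_union(base, *others):
    """Return a new set from base and any number of iterables of Item."""
    result = set(base)
    for other in others:
        for elem in other:
            if not isinstance(elem, Item):
                raise ValueError(
                    f"expected Item, got {type(elem).__name__}")
            result.add(elem)
    return result


# A list and a generator are both accepted, not only sets.
merged = checked_union(
    {Item("1")},
    [Item("2")],
    (Item(api_id) for api_id in ("2", "3")),
)
merged_ids = sorted(item.api_id for item in merged)

try:
    checked_union({Item("1")}, ["not an item"])
except ValueError as exc:
    error_message = str(exc)
```

Validating inside the loop means an invalid item is reported as soon as it is reached, and the base set is never modified because the result is built on a copy.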
- update(*others)¶
Apply union in self and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more arguments of Filing iterables.
- Raises:
ValueError – When any item in an iterable is not Filing.
- Return type:
None
- intersection(*others)¶
Return intersection FilingSet and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more arguments of Filing iterables.
- Returns:
A new set which has the filings common to this set and every iterable in others.
- Return type:
- Raises:
ValueError – When any item in an iterable is not Filing.
- intersection_update(*others)¶
Apply intersection in self and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more arguments of Filing iterables.
- Raises:
ValueError – When any item in an iterable is not Filing.
- Return type:
None
- difference(*others)¶
Return difference FilingSet and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more arguments of Filing iterables.
- Returns:
A new set which has the filings of this set that are not in any of others.
- Return type:
- Raises:
ValueError – When any item in an iterable is not Filing.
- difference_update(*others)¶
Apply difference to self and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more arguments of Filing iterables.
- Raises:
ValueError – When any item in an iterable is not Filing.
- Return type:
None
- symmetric_difference(other)¶
Return symmetric difference and update cross-references.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
A new set which has filings in this set or other but not in both.
- Return type:
- Raises:
ValueError – When any item in parameter other is not Filing.
- symmetric_difference_update(other)¶
Apply symmetric difference in self and update cross-refs.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Raises:
ValueError – When any item in an iterable is not Filing.
- Return type:
None
- isdisjoint(other)¶
Return True if two filing sets have a null intersection.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
True if there are no common filings in the two sets.
- Return type:
- Raises:
ValueError – When any item in an iterable is not Filing.
- issubset(other)¶
Report whether another filing set contains this set.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
True if other contains all filings in this set.
- Return type:
- Raises:
ValueError – When any item in an iterable is not Filing.
- issuperset(other)¶
Report whether this set contains another filing set.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
True if this set contains all filings in other.
- Return type:
- Raises:
ValueError – When any item in an iterable is not Filing.
- __hash__ = None¶