FilingSet
- class xbrl_filings_api.FilingSet
Bases: object
Mutable set for Filing objects.
Can be initialized with a single argument, an iterable of Filing objects. This class provides an interface similar to, but broader than, that of the builtin set class. All set-like operators and methods accept iterables instead of strict sets. The class implements a mutable set, and isinstance(filingset, collections.abc.MutableSet) is True (virtual subclass).
In addition to set functionality, it provides certain filing-related attributes and methods.
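As a rough illustration of the set interface described above, the sketch below uses a hypothetical TinySet class (not part of xbrl_filings_api) built on collections.abc.MutableSet. It inherits from the ABC instead of registering as a virtual subclass, so it only approximates how FilingSet is put together. The point it shows is that operators such as | and |= can take any iterable, which the builtin set does not allow.

```python
from collections.abc import MutableSet


class TinySet(MutableSet):
    """Hypothetical stand-in for a set-like class; not FilingSet itself."""

    def __init__(self, iterable=()):
        self._items = set(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items.add(item)

    def discard(self, item):
        self._items.discard(item)


left = TinySet(["a", "b"])

# The ABC mixin operators accept any iterable, here a list and a tuple.
merged = left | ["b", "c"]   # returns a new TinySet
left |= ("c", "d")           # updates left in place

# The builtin set operator rejects a plain list.
try:
    {"a"} | ["b"]
    builtin_rejects_list = False
except TypeError:
    builtin_rejects_list = True
```

The in-place form reuses the existing object, which is the same trade-off the class description recommends for large sets.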
If working with large sets, in-place operations (e.g. the update method and the |= operator) are recommended over operations that return a new set (the union method and the | operator). See section Notes.
Defines operators |, |=, &, &=, -, -=, ^, ^=, <, <=, ==, >, >=, and !=. Instead of just set-like objects, the operators accept any iterable of Filing objects.
Filing objects, as a subclass of APIResource, have a custom __hash__() method, and their hash is based on a tuple of the strings 'APIResource', Filing.TYPE, and Filing.api_id. This means that equality checks (the == and != operators) and set content uniqueness are based on this tuple. For example, when the actual filing object is not available, the fastest way to check whether a filing with api_id '123' is included in the filing set fs is:
    ('APIResource', Filing.TYPE, '123') in fs
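The tuple lookup above can be pictured with a small stand-in. The StubFiling class below is hypothetical, its TYPE value is assumed for illustration, and the hash-based __eq__ is only one way such equality could be implemented; the library's own code may differ.

```python
class StubFiling:
    """Hypothetical stand-in for Filing; models only hashing and equality."""

    TYPE = "filing"  # assumed value, for illustration only

    def __init__(self, api_id):
        self.api_id = api_id

    def _key(self):
        return ("APIResource", self.TYPE, self.api_id)

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        # Comparing hashes lets the bare key tuple compare equal to the object.
        try:
            return hash(self) == hash(other)
        except TypeError:
            return NotImplemented


fs = {StubFiling("123"), StubFiling("456"), StubFiling("123")}

# Membership test with the key tuple alone, no StubFiling instance needed.
found = ("APIResource", StubFiling.TYPE, "123") in fs
missing = ("APIResource", StubFiling.TYPE, "999") in fs
```

Because uniqueness follows the same key, the two objects created with api_id "123" collapse into one set member.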
The same applies to the ResourceCollection objects in attributes entities and validation_messages. These collections are, however, lazy iterators.
Notes
It is possible to combine filing sets from different queries into a single FilingSet without redundant copies of objects. Due to cross-referencing, the operations returning a new set always deep copy all objects to the result set. The in-place operations retain the objects from the left set but deep copy everything from the right set.
As the operators work on an iterable basis, the >= operator or the issuperset() method, for example, returns True for a FilingSet and any iterable with the same filings, but the result is undefined if the iterable contains any item other than a filing. However, the operators == and != are never undefined.
Methods
add(elem) – Add and update cross-references.
clear() – Clear the filing set of filings.
copy() – Return shallow copy of FilingSet.
difference(*others) – Return difference FilingSet and update cross-references.
difference_update(*others) – Apply difference to self and update cross-references.
discard(elem) – Discard and update cross-references.
download(files[, to_dir, stem_pattern, ...]) – Download files according to parameter files.
download_aiter(files[, to_dir, ...]) – Download files and yield DownloadResult objects.
get_pandas_data([attr_names, with_entity, ...]) – Get filings as data for pandas.DataFrame constructor.
intersection(*others) – Return intersection FilingSet and update cross-references.
intersection_update(*others) – Apply intersection in self and update cross-references.
isdisjoint(other) – Return True if two filing sets have a null intersection.
issubset(other) – Report whether another filing set contains this set.
issuperset(other) – Report whether this set contains another filing set.
pop() – Remove a filing, return it, and update cross-references.
pop_duplicates([languages, ...]) – Pops duplicates of the same enclosure from the set of filings.
remove(elem) – Remove and update cross-references.
symmetric_difference(other) – Return symmetric difference and update cross-references.
symmetric_difference_update(other) – Apply symmetric difference in self and update cross-refs.
to_sqlite(path, *[, update, flags]) – Save set to an SQLite3 database.
union(*others) – Return union FilingSet and update cross-references.
update(*others) – Apply union in self and update cross-references.
__repr__() – Return repr with len() of self, entities, validation_messages.
__str__() – Return str(self).
Attributes
List of available columns for filings of this set.
Lazy iterator for entity references in filings.
Lazy iterator for validation message references in filings.
- entities
Lazy iterator for entity references in filings.
See documentation for the ResourceCollection class.
- validation_messages
Lazy iterator for validation message references in filings.
See documentation for the ResourceCollection class.
- download(files, to_dir=None, *, stem_pattern=None, check_corruption=True, max_concurrent=5)
Download files according to parameter files.
The files parameter accepts three formats:
    fs.download('json', to_dir='dir/path')
    fs.download(['json', 'package'], to_dir='dir/path')
    fs.download({
        'json': DownloadItem(),
        'package': DownloadItem(to_dir=other_dir)
    }, to_dir='dir/path')
The filesystem path of each downloaded file is saved in the Filing object attribute <file>_download_path, such as json_download_path for the downloaded JSON file.
If package files are requested and parameter check_corruption is True, the downloaded package files are checked against the package_sha256 attribute. If an attribute value does not match the hash calculated from the downloaded file, a CorruptDownloadError for the first corrupt file is raised after all downloads have finished. The corrupt files are not deleted, but the ending ".corrupt" is appended to their filenames. However, the Filing.package_download_path attributes will not store these corrupt paths.
The directories in the path of parameter to_dir are created if they do not exist. By default, the filename is derived from the download URL. If the file already exists, it is overwritten.
If a download is interrupted, the files are left with the ending ".unfinished".
If no name can be derived from the url attribute, the file is named file0001, file0002, etc. In this case a new file is always created.
Parameter stem_pattern requires the placeholder "/name/". For example, the pattern /name/_second_try changes the original filename 743700XJC24THUPK0S03-2022-12-31-fi.xhtml into 743700XJC24THUPK0S03-2022-12-31-fi_second_try.xhtml. It is not recommended for packages, as their names should not be changed.
The HTTP request timeout is defined in options.timeout_sec.
- Parameters:
files (str or iterable of str or mapping of {str: DownloadItem}) – All of the str values in the annotation are FileStringType literals. DownloadItem attributes override method arguments for the file.
to_dir (path-like, optional) – Directory to save the files. Defaults to working directory.
stem_pattern (str, optional) – Pattern to add to the filename stems. Placeholder "/name/" is always required.
check_corruption (bool, default True) – Raise CorruptDownloadError for any corrupt 'package' file.
max_concurrent (int or None, default 5) – Maximum number of simultaneous downloads allowed. Value None means unlimited.
- Raises:
CorruptDownloadError – When attribute Filing.package_sha256 does not match the calculated hash of the 'package' file and check_corruption is True.
requests.HTTPError – When an HTTP status error occurs.
requests.ConnectionError – When the connection fails.
- Return type:
None
See also
Filing.download – For a single filing.
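The filename rules described for download() can be sketched with two small helpers. Both functions are hypothetical and written only for illustration; they are not the library's implementation, the ValueError is an assumed error type, and the sketch assumes that package_sha256 holds a hexadecimal SHA-256 digest.

```python
import hashlib
from pathlib import PurePosixPath

PLACEHOLDER = "/name/"


def apply_stem_pattern(filename, stem_pattern):
    """Rewrite the stem of filename with stem_pattern, keeping the suffix."""
    if PLACEHOLDER not in stem_pattern:
        raise ValueError('stem_pattern must contain the placeholder "/name/"')
    path = PurePosixPath(filename)
    return stem_pattern.replace(PLACEHOLDER, path.stem) + path.suffix


def name_after_check(filename, content, expected_sha256):
    """Return filename, with ".corrupt" appended if the hash does not match."""
    actual = hashlib.sha256(content).hexdigest()
    if actual == expected_sha256.lower():
        return filename
    return filename + ".corrupt"


renamed = apply_stem_pattern(
    "743700XJC24THUPK0S03-2022-12-31-fi.xhtml", "/name/_second_try"
)

payload = b"demo package bytes"          # stand-in for downloaded content
good_hash = hashlib.sha256(payload).hexdigest()
kept = name_after_check("package.zip", payload, good_hash)
flagged = name_after_check("package.zip", payload, "0" * 64)
```

Here renamed reproduces the filename example given in the method description, and flagged shows the ".corrupt" ending on a hash mismatch.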
- async download_aiter(files, to_dir=None, *, stem_pattern=None, check_corruption=True, max_concurrent=5)
Download files and yield DownloadResult objects.
The function follows the same logic as method download(). See its documentation.
- Parameters:
files (str or iterable of str or mapping of {str: DownloadItem}) – All of the str values in the annotation are FileStringType literals. DownloadItem attributes override method arguments for the file.
to_dir (path-like, optional) – Directory to save the files. Defaults to working directory.
stem_pattern (str, optional) – Pattern to add to the filename stems. Placeholder "/name/" is always required.
check_corruption (bool, default True) – Raise CorruptDownloadError for any corrupt 'package' file.
max_concurrent (int or None, default 5) – Maximum number of simultaneous downloads allowed. Value None means unlimited.
- Yields:
DownloadResult – Contains information on the finished download.
- Return type:
See also
Filing.download_aiter – For a single filing.
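To give a feel for an asynchronous iterator that yields one result per finished download while limiting concurrency, here is a self-contained sketch. Everything in it is an assumption made for illustration: StubResult and its fields stand in for DownloadResult, fake_fetch replaces the real HTTP transfer, and the semaphore is just one common way to honour a max_concurrent limit.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for DownloadResult; field names are assumed."""
    url: str
    path: str


async def fake_fetch(url, limiter):
    # The semaphore caps how many transfers run at the same time.
    async with limiter:
        await asyncio.sleep(0.01)  # placeholder for the real transfer
        return StubResult(url=url, path="dir/path/" + url.rsplit("/", 1)[-1])


async def download_aiter_sketch(urls, max_concurrent=5):
    # None is treated as "no limit" by allowing one slot per URL.
    limit = max_concurrent if max_concurrent is not None else max(len(urls), 1)
    limiter = asyncio.Semaphore(limit)
    tasks = [asyncio.create_task(fake_fetch(url, limiter)) for url in urls]
    # Yield each result as soon as its download finishes.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main(urls):
    return [res async for res in download_aiter_sketch(urls, max_concurrent=2)]


urls = [
    "https://example.com/a.json",
    "https://example.com/b.json",
    "https://example.com/c.json",
]
results = asyncio.run(main(urls))
```

Results arrive in completion order, so code that consumes such an iterator should not rely on the order of the requested files.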
- get_pandas_data(attr_names=None, *, with_entity=False, strip_timezone=True, date_as_datetime=True, include_urls=False, include_paths=False)
Get filings as data for pandas.DataFrame constructor.
A new dataframe can be instantiated by:
    import pandas as pd
    df = pd.DataFrame(data=filingset.get_pandas_data())
If parameter attr_names is not given, data attributes excluding the ones ending in _date_str are extracted. Attributes ending in _download_path are extracted only if at least one file of that type has been downloaded and include_paths is True. Attribute entity_api_id is extracted only if there is at least one entity object in the set and parameter with_entity is False.
- Parameters:
attr_names (iterable of str, optional) – Valid attribute names of the Filing object, or entity.-prefixed attribute names of its Entity object.
with_entity (bool, default False) – When parameter attr_names is not given, include entity attributes with the filing.
strip_timezone (bool, default True) – Strip timezone information (always UTC) from datetime values.
date_as_datetime (bool, default True) – Convert date values to naive datetime to be converted to pandas.datetime64 by pandas.
include_urls (bool, default False) – When parameter attr_names is not given, include attributes ending in _url.
include_paths (bool, default False) – When parameter attr_names is not given, include attributes ending in _path.
- Returns:
data – Column names are the same as the attributes for resource of this type.
- Return type:
dict of {str: list of DataAttributeType}
See also
ResourceCollection.get_pandas_data – For other resources.
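The return shape and the two conversion flags can be illustrated without pandas. The normalize helper below is hypothetical, and the column name added_time is made up for the example; the sketch only mirrors the documented behaviour of strip_timezone and date_as_datetime on a dict of column lists.

```python
from datetime import date, datetime, timezone


def normalize(value, strip_timezone=True, date_as_datetime=True):
    """Convert one cell value the way the two flags are documented to."""
    # datetime is a subclass of date, so it has to be tested first.
    if isinstance(value, datetime):
        if strip_timezone:
            return value.replace(tzinfo=None)  # values are always UTC
        return value
    if isinstance(value, date) and date_as_datetime:
        return datetime(value.year, value.month, value.day)
    return value


# Illustrative rows; 'added_time' is an assumed attribute name.
rows = [
    {
        "api_id": "123",
        "last_end_date": date(2022, 12, 31),
        "added_time": datetime(2023, 3, 1, 12, 0, tzinfo=timezone.utc),
    },
    {
        "api_id": "456",
        "last_end_date": date(2021, 12, 31),
        "added_time": datetime(2022, 3, 5, 8, 30, tzinfo=timezone.utc),
    },
]

# dict of {column name: list of values}, the shape pandas.DataFrame accepts.
data = {name: [normalize(row[name]) for row in rows] for name in rows[0]}
```

A dict shaped like data can be passed directly as the data argument of pandas.DataFrame.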
- pop_duplicates(languages=['en'], *, use_reporting_date=False, all_markets=False)
Pops duplicates of the same enclosure from the set of filings.
Entities must be available on the FilingSet.
The method searches the FilingSet and leaves only one filing for each group with the same (entity_api_id, last_end_date) pair, i.e., one filing for each unique enclosure of the same entity for the same financial period. If parameter use_reporting_date is True, grouping is based on (entity_api_id, reporting_date) instead.
Some entities report on multiple markets. To retain all of these country-specific filings, set parameter all_markets to True. Grouping will then also include country as the last item.
The retained filing of each group is chosen primarily by matching the languages parameter values against the Filing.language attribute. Parameter value ['sv', 'fi'] thus means that Swedish filings are preferred, secondarily Finnish, and lastly the ones whose language is None. The value None can be used in the iterable as well. Parameter value None means no language preference.
If more than one filing matches the language (or language is None), the filings are ordered by their filing_index and the last one is chosen, which in practice is the one with the highest filing number part of filing_index.
- Parameters:
languages (iterable of str or None, default ['en']) – Preferred languages for the retained filing.
use_reporting_date (bool, default False) – Use reporting_date instead of last_end_date when grouping.
all_markets (bool, default False) – Append country as the last item in grouping.
- Returns:
The set of removed filings.
- Return type:
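The selection rule for pop_duplicates can be sketched on plain objects. The StubFiling dataclass, the pick_retained helper, and the filing_index strings are all hypothetical, and the fallback used when no language matches is an assumption, since the description above does not cover that case.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StubFiling:
    """Hypothetical stand-in carrying only the attributes used for grouping."""
    entity_api_id: str
    last_end_date: date
    language: Optional[str]
    filing_index: str


def pick_retained(filings, languages=("en",)):
    """Return the one filing kept per (entity_api_id, last_end_date) group."""
    groups = {}
    for filing in filings:
        key = (filing.entity_api_id, filing.last_end_date)
        groups.setdefault(key, []).append(filing)

    if languages is None:
        order = []                      # no language preference
    else:
        order = list(languages)
        if None not in order:
            order.append(None)          # language None comes last

    retained = []
    for members in groups.values():
        pool = members                  # assumed fallback: nothing matched
        for lang in order:
            matches = [f for f in members if f.language == lang]
            if matches:
                pool = matches
                break
        # The last filing by filing_index wins within the chosen pool.
        retained.append(sorted(pool, key=lambda f: f.filing_index)[-1])
    return retained


period = date(2022, 12, 31)
a = StubFiling("e1", period, "fi", "X-2022-12-31-ESEF-FI-0")
b = StubFiling("e1", period, "en", "X-2022-12-31-ESEF-FI-1")
c = StubFiling("e1", period, "en", "X-2022-12-31-ESEF-FI-2")
d = StubFiling("e2", period, None, "Y-2022-12-31-ESEF-DK-0")
filings = [a, b, c, d]

kept_default = pick_retained(filings)
kept_nordic = pick_retained(filings, languages=["sv", "fi"])
popped = [f for f in filings if f not in kept_default]
```

With the default preference the later English filing c is kept for entity e1, while the ['sv', 'fi'] preference keeps the Finnish filing a.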
- to_sqlite(path, *, update=False, flags=<ScopeFlag.GET_ENTITY|GET_VALIDATION_MESSAGES: 6>)
Save set to an SQLite3 database.
The method has the same signature and follows the same rules as the query function to_sqlite(), except that it has none of the query parameters.
Flags also default to all tables turned on. If no additional information is present in the set, the tables will not be created if they do not exist.
- Parameters:
path (path-like) – Path to the SQLite database.
update (bool, default False) – If the database already exists, update it with these records. Old records are updated and new ones are added.
flags (ScopeFlag, default GET_ENTITY | GET_VALIDATION_MESSAGES) – Scope of saving. Flag GET_ENTITY will save the entity records of filings and GET_VALIDATION_MESSAGES the validation messages.
- Raises:
FileExistsError – When update is False and the intended save path for the database is an existing file.
DatabaseSchemaUnmatchError – When update is True and the file contains a database whose schema does not match the expected format.
sqlite3.DatabaseError – For example, when update is True and the file is not a database.
- Return type:
None
See also
xbrl_filings_api.to_sqlite – Query and save to SQLite.
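The default flags value in the signature prints as 6. A minimal sketch with a hypothetical StubScopeFlag enum shows how two single-bit flags combine to that value. The member values 2 and 4 are inferred from the combined value and are an assumption, and the table names in tables_to_save are purely illustrative.

```python
from enum import Flag


class StubScopeFlag(Flag):
    """Hypothetical stand-in for ScopeFlag; member values are assumed."""
    GET_ENTITY = 2
    GET_VALIDATION_MESSAGES = 4


def tables_to_save(flags):
    """Return the tables a save would cover (table names are illustrative)."""
    tables = ["filing"]
    if StubScopeFlag.GET_ENTITY in flags:
        tables.append("entity")
    if StubScopeFlag.GET_VALIDATION_MESSAGES in flags:
        tables.append("validation_message")
    return tables


default_flags = StubScopeFlag.GET_ENTITY | StubScopeFlag.GET_VALIDATION_MESSAGES
all_tables = tables_to_save(default_flags)
entity_only = tables_to_save(StubScopeFlag.GET_ENTITY)
```

Passing a single flag instead of the combined default narrows the scope of what is saved.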
- __repr__()
Return repr with len() of self, entities, validation_messages.
The values len(entities) and len(validation_messages) are only shown if they are greater than zero.
- Return type:
str
- clear()
Clear the filing set of filings.
- Return type:
None
- union(*others)
Return union FilingSet and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more iterables of Filing objects.
- Returns:
A new set which has filings of this set and all others.
- Return type:
FilingSet
- Raises:
ValueError – When any item in an iterable is not a Filing.
- update(*others)
Apply union in self and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more iterables of Filing objects.
- Raises:
ValueError – When any item in an iterable is not a Filing.
- Return type:
None
- intersection(*others)
Return intersection FilingSet and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more iterables of Filing objects.
- Returns:
A new set which has filings common with this set and any set in others.
- Return type:
FilingSet
- Raises:
ValueError – When any item in an iterable is not a Filing.
- intersection_update(*others)
Apply intersection in self and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more iterables of Filing objects.
- Raises:
ValueError – When any item in an iterable is not a Filing.
- Return type:
None
- difference(*others)
Return difference FilingSet and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more iterables of Filing objects.
- Returns:
A new set which is this set without filings in all others.
- Return type:
FilingSet
- Raises:
ValueError – When any item in an iterable is not a Filing.
- difference_update(*others)
Apply difference to self and update cross-references.
- Parameters:
*others (iterable of Filing) – One or more iterables of Filing objects.
- Raises:
ValueError – When any item in an iterable is not a Filing.
- Return type:
None
- symmetric_difference(other)
Return symmetric difference and update cross-references.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
A new set which has filings in this set or other but not in both.
- Return type:
FilingSet
- Raises:
ValueError – When any item in parameter other is not a Filing.
- symmetric_difference_update(other)
Apply symmetric difference in self and update cross-refs.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Raises:
ValueError – When any item in an iterable is not a Filing.
- Return type:
None
- isdisjoint(other)
Return True if two filing sets have a null intersection.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
True if there are no common filings in the two sets.
- Return type:
bool
- Raises:
ValueError – When any item in an iterable is not a Filing.
- issubset(other)
Report whether another filing set contains this set.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
True if other contains all filings in this set.
- Return type:
bool
- Raises:
ValueError – When any item in an iterable is not a Filing.
- issuperset(other)
Report whether this set contains another filing set.
- Parameters:
other (iterable of Filing) – An iterable of Filing objects.
- Returns:
True if this set contains all filings in other.
- Return type:
bool
- Raises:
ValueError – When any item in an iterable is not a Filing.
- __hash__ = None
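Setting __hash__ to None is the standard Python way to mark instances as unhashable, as is the case for the builtin set. The short sketch below demonstrates the effect on a hypothetical Unhashable class; it does not use FilingSet itself.

```python
class Unhashable:
    """Hypothetical class that opts out of hashing, like mutable containers."""
    __hash__ = None


try:
    hash(Unhashable())
    hash_failed = False
except TypeError:
    # Python raises TypeError for any object whose type has __hash__ = None.
    hash_failed = True

# The same error appears when such an object is used as a set member.
try:
    {Unhashable()}
    member_failed = False
except TypeError:
    member_failed = True
```

In practice this means an object of such a class cannot be used as a dictionary key or placed inside another set.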