Currently working on parsing the summaries of business results of companies listed on JPX.
Starting April 23, 2025, we are serving 27,534 documents (Annual Securities Reports, or ASRs), comprising 465,693 HTML files, through a web app for crawling. Some files appear to have failed during the initial crawl, so stay tuned. There's nothing proprietary about the documents, but we'll have to see whether the serving costs are affordable or can somehow be funded.
On November 12, 2024, the FSA of Japan released a new version of the taxonomy. It provides more detail on the consolidated number of employees, but let us (actually, it's only me so far) keep in mind that segment information is specific to each reporting company, which will have some performance implications, or rather, complications.
We began collecting the pay equity ratio from EDINET, starting with the fiscal year ending March 31, 2024.
This month, June 2024, is when filings under the new taxonomy dated December 1, 2023, are being released. Our focus is on stable and proper data ingestion. KNOWN ISSUE: our CMEK sometimes does not become available for a blob until quite a while after its `object_finalize` event. We've set a temporary target of "300s" (corrected) for CMEK availability.
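One way to tolerate the delay is to poll the blob with exponential backoff after `object_finalize`, giving up once the 300-second budget is spent. The sketch below is a generic helper, not our actual ingestion code: `is_available` is a hypothetical callable (for example, one that attempts to read the blob and returns `False` on failure), and the backoff parameters (`initial_delay`, `max_delay`) are assumptions. Only the 300 s figure comes from the entry above.

```python
import time
from typing import Callable


def wait_until_available(
    is_available: Callable[[], bool],  # hypothetical probe, e.g. "try to read the blob"
    timeout: float = 300.0,            # the temporary "300s" target
    initial_delay: float = 1.0,        # assumed backoff start
    max_delay: float = 30.0,           # assumed backoff cap
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_available` with exponential backoff.

    Returns True as soon as the probe succeeds, or False once `timeout`
    seconds have elapsed without success.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if is_available():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Double the delay each round, up to the cap.
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting; a blob that is still unreadable after the deadline can then be re-queued instead of failing the event handler outright.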
Preparing for upcoming updates effective March 31, 2024. The new taxonomy is dated December 1, 2023.
Starting September 18, 2023, company-specific top-line accounting items will be parsed into the data warehouse, retroactively covering the past five years.
Known issue: the current translation table cannot recognize revenues stated under rare or non-standard accounting items.
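A minimal sketch of how a translation-table lookup can surface unrecognized revenue items instead of silently dropping them. The table contents, the element names, and the `translate` function are all hypothetical and illustrative, not a verified extract of the EDINET taxonomy or of our actual table.

```python
from typing import Optional

# Hypothetical translation table: element local name -> canonical item.
# These names are illustrative assumptions only.
TRANSLATION_TABLE: dict[str, str] = {
    "NetSales": "Revenue",
    "OperatingRevenue1": "Revenue",
    "RevenueIFRS": "Revenue",
}


def translate(element: str, unrecognized: list[str]) -> Optional[str]:
    """Map a (possibly prefixed) element name to a canonical item.

    Names missing from the table are appended to `unrecognized`
    so they can be reviewed and added later.
    """
    # Drop a namespace prefix such as "prefix:" if present.
    local_name = element.split(":", 1)[-1]
    canonical = TRANSLATION_TABLE.get(local_name)
    if canonical is None:
        unrecognized.append(element)
    return canonical


misses: list[str] = []
translate("jppfs_cor:NetSales", misses)               # mapped to "Revenue"
translate("hypothetical_ext:RareRevenueItem", misses)  # None; recorded in misses
```

Collecting the misses gives a concrete worklist for extending the table when a filer reports revenue under a rare or non-standard item.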
I began migrating my servers from Debian bullseye to bookworm on the day bookworm was released.
I updated the translation table accordingly on November 13, 2022.
In a release dated November 8, 2022, the FSA of Japan quietly published the EDINET Taxonomy for the coming year, 2023.
https://www.fsa.go.jp/search/20221108.html
The fundamental data pipeline is pretty much done.