- No due date•1/5 issues closed
Alaveteli must balance the benefits of using personal data to hold those in power to account against protecting individuals' right to privacy. There are situations where there is a legitimate interest in retaining personal data while restricting its publication. Our tooling for this is limited, and as a result valuable public records are destroyed or removed from publication because small amounts of personal information cannot be effectively redacted.
No due date•0/4 issues closed
Locating PII within requests is difficult because of the current architecture and search indexing system. We'll explore replacing the indexing system with more modern vector-based technology, which we expect to be more effective, to resolve internationalisation issues, and to reduce external dependencies, making the function easier to operate. We'll also improve text extraction. Currently we are unable to index Zip and Office 2007+ files, which are very common in ATI releases, and our PDF/image text extraction is likely quite limited compared to modern capabilities. This could be done within the current approach by shelling out to improved external tools, or we could consider a document extraction framework such as Apache Tika.
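To make the Zip and Office 2007+ gap concrete, here is a minimal sketch of what extraction could involve. It is written in Python with the standard library only, purely as an illustration; it is not Alaveteli's implementation, and the function names `docx_text` and `extract_zip_text` are hypothetical. It relies on the fact that a `.docx` file is itself a Zip archive whose main text lives in `word/document.xml`. A production version would need to handle many more formats, malformed files, and size limits, which is where external tools or a framework like Apache Tika come in.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Office 2007+ (.docx) documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def extract_zip_text(data: bytes) -> dict[str, str]:
    """Map each supported member of a Zip attachment to its extracted text.

    Only .docx and .txt members are handled in this sketch; anything
    else is skipped rather than guessed at.
    """
    texts = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename
            payload = archive.read(info)
            if name.lower().endswith(".docx"):
                texts[name] = docx_text(payload)
            elif name.lower().endswith(".txt"):
                texts[name] = payload.decode("utf-8", errors="replace")
    return texts
```

The returned mapping of member name to text could then be passed to whichever indexer is in use, so that content inside archives becomes searchable for PII.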
No due date•0/11 issues closed
Alaveteli's architecture makes it challenging to delete or anonymise user data completely. We want to create a hard boundary between the raw email data in the source archive and the active published content, disconnecting the two, and to fill the gaps in functionality so that site administrators can edit and delete all information we publish.
No due date•1/10 issues closed