Intelligent screen reading #19200
fernando-jose-silva started this conversation in Ideas
Replies: 1 comment
This is a huge conceptual change and too broad as an issue. A change like this would need deeper research. For us to be able to consider it, please put together a thorough research proposal which involves user testing from an add-on / fork of NVDA. Here's a copy of our internal engineering project plan to serve as a guide:
# Engineering Project Plan Documentation
This document explains how to complete the engineering project plan template, and then provides an example of a completed plan.
# How to complete the template
- **Project Name:** _A way to refer to the project_
- **External Backlog Link:** _Leave empty or link to doc on Google Drive_
- **Additional Requirements Doc Link:** _Leave empty or link to doc on Google Drive_
## Project Description
_A paragraph containing a high level overview of the project or 'elevator pitch'. This should include an overview of, background to, and purpose of the project._
To help with the elevator pitch, also answer / consider the following:
1. What will the project produce and when?
2. Who is this project for and why do THEY need it?
3. Why are WE doing this project? _Consider how it enhances the success of NVDA & NV Access_
4. How does this project fit into the big picture or the broad plan?
5. What preliminary work has already been done?
## Project Goals
This section gives the broad goals of the project. Add a new level 3 heading 'Objective' for each.
### Objective
_A description of the Objective_
#### Test to measure success:
_Use bullets for clarity._
## Backlog
_This section contains [user-stories](https://en.wikipedia.org/wiki/User_story) that explain the project goals in more detail. Delete this section for larger projects using an external backlog._
It contains a table with 5 columns:
1. **User Story:** A description of the user story
1. **Priority:** One of _Must have_, _Should have_, _Nice to have_
1. **Estimate:** The number of developer days required to implement the story. This should aim to be accurate to within 50% of reality.
1. **Uncertainty:** How similar is this story to other work we have done in the past? As a general guide:
- **2**: We have done similar work many times.
- **3**: Contains some variation on work previously completed.
- **4**: Unlike other work we have done.
1. **Final Estimate:** Estimate * Uncertainty
Separate each of these fields with a vertical bar ('|') character, ending each row with a newline. Leading and trailing vertical bars are not required.
### Total
_The sum of all final estimates_
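The backlog arithmetic above (Final Estimate = Estimate × Uncertainty, summed to give the Total) can be sketched as follows. These helper names are illustrative only and are not part of any NVDA tooling:

```python
def final_estimate(estimate_days: int, uncertainty: int) -> int:
    """Multiply the raw estimate (developer days) by the uncertainty factor (2-4)."""
    if uncertainty not in (2, 3, 4):
        raise ValueError("Uncertainty must be 2, 3 or 4")
    return estimate_days * uncertainty


def backlog_total(stories) -> int:
    """Sum the final estimates of (estimate, uncertainty) pairs."""
    return sum(final_estimate(e, u) for e, u in stories)


# Two stories of 10 days each at uncertainty 4, as in the example plan
# later in this document, total 80 developer days.
print(backlog_total([(10, 4), (10, 4)]))  # 80
```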
## Engineering risks
_What may impact the success of this project, and how can the risk be mitigated? Add a new level 3 heading for each risk._
### Risk Description
_Try to describe the risk. Identifying the type of risk (e.g. technical risk, or lack of domain knowledge) will help when thinking of a mitigation strategy._
#### Management strategy
_Description of how we can mitigate the risk should it occur_
#### Resulting severity
- **Probability:** _How likely is the risk? E.g.:_
1. _low_
2. _medium_
3. _high_
- **Impact:** _Should the risk occur, what is the impact on the success of the project, taking mitigations into account? E.g.:_
1. _low_
2. _medium_
3. _high_
**Final severity:**
_What is the final severity of the risk? Multiply probability by impact to get the final value:_
- **1-3**: Low
- **4-6**: Medium
- **7-9**: High
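The severity calculation above can be sketched as a small function. This is illustrative only, assuming the 1-3 ratings for probability and impact described above:

```python
def final_severity(probability: int, impact: int) -> str:
    """Map probability and impact ratings (1=low, 2=medium, 3=high)
    to a final severity band by multiplying them together."""
    score = probability * impact  # 1..9
    if score <= 3:
        return "Low"
    if score <= 6:
        return "Medium"
    return "High"


# Medium probability (2) * medium impact (2) = 4, i.e. Medium severity,
# matching the first example risk below.
print(final_severity(2, 2))  # Medium
```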
# Example plan
- **Project Name:** Offline image descriptions
- **External Backlog Link:**
- **Additional Requirements Doc Link:**
## Project Description
When an NVDA user encounters an image with a missing or inadequate description, they are at a disadvantage compared to their peers. The implications may be an inability to proceed with their current task, or social alienation. Previous projects have delivered Optical Character Recognition (OCR), and attempts have been made to utilise cloud-based solutions for describing images. This project aims to provide a mechanism for the automatic generation of basic image descriptions.
To help with the elevator pitch, briefly answer the following:
1. What will the project produce and when?
- Automatic description of an image.
- Highlighting and labelling of objects in an image.
2. Who is this project for and why do THEY need it?
- Blind users who may have no other means to understand what an image contains.
- Low vision users who may struggle to identify objects in an image.
3. Why are WE doing this project? (Consider how it enhances the success of NVDA & NV Access)
- To attract and keep more blind and low-vision users via innovative features.
4. How does this project fit into the big picture or the broad plan?
- Expanding into low-vision domain.
5. What preliminary work has already been done?
- Adding OCR support
- Using caption bot for cloud-based image descriptions; this was not accepted into core due to licensing concerns.
## Project Goals
### Objective
NVDA can be used to describe images missing metadata during offline usage.
#### Test to measure success:
- Demonstration using pre-prepared test data.
- Positive feedback from the community about the feature.
## Backlog
User Story | Priority | Estimate | Uncertainty | Final Estimate
------------|---------|-----------|--------------|----------------
As a blind user I can manually trigger an automatic description of an image. This helps me to get more context when there is no description or I am unhappy with the current description. | Must have | 10d | 4 | 40d
As a low vision user I can manually trigger NVDA to highlight and describe objects within an image. This helps me to understand what an image contains and how the subjects of the image relate to one another. | Must have | 10d | 4 | 40d
### Total
80d
## Engineering risks
### Risk Description
- Technical risk: Unable to reduce latency of speech.
- The project team may not be able to solve this problem to the satisfaction of stakeholders.
#### Management strategy
- Halt the project, and run an R&D prototyping project to solve the technical problem before commencing on the rest of the project.
#### Resulting severity
- **Probability:** Medium
- **Impact:** Medium
**Final severity:** Medium
### Risk Description
- Domain knowledge risk: Automatic Image descriptions.
- The project team does not have the specific domain knowledge required to work efficiently on the project.
#### Management strategy
_Example mitigation_
- Consult with experts in the domain for part or full duration of the project.
- Prepare with an introductory course on the domain.
#### Resulting severity
- **Probability:** High
- **Impact:** High, with an expert: Low
**Final severity:** Low
Is your feature request related to a problem? Please describe.
Some Windows screens contain valuable information for the user, which the GUI doesn't provide:
Examples:
File movement screens that show which file is being copied and the copy speed.
File compressors that show which file is being manipulated, the total time, and the estimated completion time.
Windows Settings screens, such as "System > About", which include information about the Windows edition.
Windows Update, which has explanatory texts about power management.
Screens like the Windows Insider settings, which display information about the current channel and unavailable channels.
Windows update unattended installation windows, which show what the installer is doing, copying to the cache, and installations that were not performed.
Office installations that show explanatory texts about the programs to be installed.
And countless others.
All of this can be accessed if the user searches for it, but a user who has never navigated through objects will never know that it exists.
Describe the solution you'd like
The proposal is that, if the user wishes, NVDA could listen for, and even actively search for, updated texts and controls in the window.
For windows where this is not possible, NVDA itself could extract the texts from the window and apply text analysis: for example, text lengths, the texts shown, texts in specific places on the screen, searching for language-localized keywords, among other analyses that people more knowledgeable than me would certainly know how to design.
This is all without using AI; if an AI is available, it becomes even more powerful.
The report could be automatic, or given on user request with a command. I also leave implementation strategies open for discussion.
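As a minimal, AI-free sketch of the kind of text analysis described above, the heuristics (text length, progress-like keywords, percentages) could score candidate strings scraped from a window so that only status-like texts are reported. All names here are hypothetical; a real implementation would live in an NVDA add-on and obtain the texts via object navigation, and the keyword list would need to be localized per language, as noted above:

```python
import re

# Hypothetical keyword list; a real add-on would localize this per language.
PROGRESS_KEYWORDS = ("copying", "installing", "remaining", "estimated",
                     "speed", "complete", "downloading")
PERCENT_RE = re.compile(r"\b\d{1,3}\s?%")


def score_text(text: str) -> int:
    """Rough relevance score: keyword hits, percentages, and useful length."""
    t = text.lower()
    score = sum(2 for kw in PROGRESS_KEYWORDS if kw in t)
    if PERCENT_RE.search(t):
        score += 3
    if 15 <= len(t) <= 200:  # long enough to inform, short enough to speak
        score += 1
    return score


def interesting_texts(texts, threshold=3):
    """Return the texts worth reporting automatically, highest score first."""
    scored = [(score_text(t), t) for t in texts]
    return [t for s, t in sorted(scored, reverse=True) if s >= threshold]


# Only the status line survives; button labels like "OK" are filtered out.
print(interesting_texts([
    "OK",
    "Copying file 12 of 40 - 35% complete, 2 minutes remaining",
    "Cancel",
]))
```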
Describe alternatives you've considered
No response
Additional context
Again, I'm leaving a proof of concept made with ChatGPT and tested by me. I don't want my code implemented as-is, as it has glaring errors; remember, I'm not a programmer, but perhaps it can serve as inspiration.
Attachment: init.py