The Inside Story of NVDA: where do control names and roles come from #NVDA_Internals
For our new friends: if you are a new NVDA user, or for that matter, new to screen readers, parts of what I am about to describe can be overwhelming since it will go into the inner workings of a screen reader. As a developer about to embark on a different journey in life soon, I feel this is one of those opportunities where I can pass on whatever I know about screen reading to help the next group of users and would-be developers. I will do everything I can to make the information digestible.
This Inside Story is part of a series of posts that will go into NVDA objects, a central part of NVDA that makes screen reading possible. People who have been participating in the NVDA users list know that I can go on and on for hours about a single thing, and NVDA objects is one of those. However, since the story of NVDA objects will touch on a key concept in computer programming that will take several posts to unpack (namely object-oriented programming), I’m dividing the whole discussion of NVDA objects into multiple parts. Besides, I want to be practical in most of what I write, especially as I respond to posts coming from this forum and elsewhere.
The NVDA objects series consists of:
As a first step, let’s find out how NVDA can announce control labels (names) and roles without doing screen scraping (hint: accessibility APIs):
A few days ago, a question arose in this thread about what NVDA will say when encountering a gear icon in web browser applications, with the answer being “depending on what the app developer says.” Shortly after, while talking about obtaining system specs, Brian V asked if we can determine the presence of child objects by looking at roles, to which I replied, “perhaps treating roles as a heuristic.”
But how are these related to NVDA objects? Enter accessibility APIs and pointers (yes, for people coming from older programming languages, remember the concept of pointers?). And no, NVDA does not scrape the screen to determine a control’s role, nor analyze icons to find out its label (name)… well, for the most part (there are times when NVDA does rely on screen information, called the “display model,” to determine what a control is, but only in special circumstances). So here is the “real” answer to the linked discussions: NVDA asks the accessibility API for information such as the control’s name (label), role, and other properties. How NVDA does this internally is the focus of the rest of this post.
For the most part, controls on screen have various properties: name, role, state, location, you name it. But how exactly do screen readers know this information? They ask accessibility APIs to fetch it on demand. Think of accessibility APIs such as Microsoft Active Accessibility and UI Automation as a “middleman” that keeps the transactions between a screen reader and an app going. If you think about this statement, you may conclude that control properties are ultimately set by app developers, who tell the app and the accessibility API to expose properties in certain ways for consumption by the screen reader, which in the end announces those properties to users. But that’s only part of it.
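To make the “middleman” idea concrete, here is a toy model in Python. Every name in it is invented for illustration (real accessibility APIs are COM interfaces, not Python dictionaries): the app side exposes control properties, and the screen reader side fetches them on demand.

```python
class AccessibilityAPI:
    """Toy stand-in for the middleman between app and screen reader."""

    def __init__(self):
        self._controls = {}

    def exposeControl(self, controlID, **properties):
        # Called from the app developer's side (the "server").
        self._controls[controlID] = properties

    def getProperty(self, controlID, prop):
        # Called from the screen reader's side (the "client"), on demand.
        return self._controls[controlID].get(prop)


# The app exposes a gear icon's properties; the screen reader asks for them.
api = AccessibilityAPI()
api.exposeControl("settingsGear", name="Settings", role="button")
print(api.getProperty("settingsGear", "name"))  # Settings
print(api.getProperty("settingsGear", "role"))  # button
```

Note how the screen reader never looks at pixels: whatever the app declared through the middleman is exactly what the client gets back.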
You may then ask: since NVDA knows about various accessibility APIs, how does it recognize which one to use to ask a control for its name and role? This happens close to the creation of an NVDA object (one such occasion is handling a focus change event) to represent the screen control you are dealing with. NVDA performs a number of tests to determine which accessibility API to use to interact with a given control (one of them checks the window class name as exposed through the Windows API), and based on the results, it constructs an appropriate NVDA object belonging to an API class (an API class is a collection of NVDA objects for a given accessibility API; I will come back to this concept in a later Inside Story post). But this is still not the complete story.
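A drastically simplified sketch of that selection step, assuming a hypothetical chooseAPI helper (NVDA’s real tests consult much more than the window class name, and the class-name set below is illustrative; Windows.UI.Core.CoreWindow is the window class used by UWP app content):

```python
# Hypothetical, simplified API chooser -- NOT NVDA's actual selection code.
UIA_WINDOW_CLASSES = {
    "Windows.UI.Core.CoreWindow",  # UWP app content windows are served by UIA
}


def chooseAPI(windowClassName: str) -> str:
    """Pick which accessibility API to query, based on window class name."""
    if windowClassName in UIA_WINDOW_CLASSES:
        return "UIA"
    # Everything else falls back to MSAA/IAccessible in this sketch.
    return "IAccessible"


print(chooseAPI("Windows.UI.Core.CoreWindow"))  # UIA
print(chooseAPI("Notepad"))                     # IAccessible
```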
So NVDA has constructed an NVDA object representing the focused control; how does it obtain the control’s name and role? Two pieces make this possible: object properties and an accessibility API representative. Every NVDA object derives from an abstract class appropriately named “NVDAObject” (NVDAObjects.NVDAObject), which defines a base implementation of ways to obtain properties such as name and role. These are defined as “_get_property” getter methods, e.g. _get_name to obtain the control name and _get_role for the role. Doing so allows NVDA to query properties by attribute access (object.something, e.g. object.name for the control label, object.role for the role). But since the base NVDA object is just a blueprint (technically, an “abstract base class”) that either provides a default implementation or does nothing when asked for properties, it cannot be used on its own to announce control labels and roles. This is why almost all NVDA objects derive their power from the base NVDA object and appear as IAccessible/IAccessible2/JAB/UIA objects, a subject for another Inside Story post as it will cover some object-oriented programming concepts.
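The getter pattern described above can be sketched like this. This is a simplification: NVDA’s real base class routes attribute access to _get_ methods through a dedicated property mechanism (with caching), but the idea is the same — obj.name quietly calls _get_name, and subclasses override the getters with real answers.

```python
class NVDAObjectSketch:
    """Simplified stand-in for NVDA's abstract base NVDAObject."""

    def __getattr__(self, attr):
        # Map attribute access (obj.name) to a getter method (_get_name).
        getter = getattr(type(self), "_get_" + attr, None)
        if getter is None:
            raise AttributeError(attr)
        return getter(self)

    def _get_name(self):
        return ""          # the blueprint has no real name to offer

    def _get_role(self):
        return "unknown"   # ...nor a real role


class ButtonSketch(NVDAObjectSketch):
    """A concrete object overrides the getters with real answers."""

    def _get_name(self):
        return "OK"

    def _get_role(self):
        return "button"


print(ButtonSketch().name, ButtonSketch().role)  # OK button
```

Querying the base class directly only yields the empty defaults, which is exactly why the blueprint alone cannot announce anything useful.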
So what allows NVDA to work with different APIs to obtain control properties? An accessibility API representative, technically called an accessibility object or element; in reality, these are pointers (a pointer directs a program to a specific memory location/address; why that is such important material is beyond the scope of this forum, as it touches on various aspects of programming and computer science). Almost all NVDA objects must have a pointer to an accessibility API object or element as an attribute for the “magic” of control property announcement to occur (exceptions exist, including window objects and, of course, the base NVDA object). For example, IAccessible objects include IAccessibleObject, a pointer to the MSAA object representing the control at hand; in the UIA world, this is a UIA element. Although accessibility API objects operate differently and may expose the same property in different ways, NVDA can work across APIs because the same code obtains the same property no matter which API sits underneath.
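A sketch of that idea, with fake stand-ins for the pointers (the two “pointer” classes below are mocks invented for illustration, not real COM interfaces): each NVDA object flavor holds a different API pointer, yet callers use the same .name attribute either way.

```python
class FakeMSAAPointer:
    """Stand-in for an IAccessible COM pointer."""

    def accName(self, childID):
        return "Recycle Bin"


class FakeUIAElement:
    """Stand-in for a UIA element pointer."""

    CurrentName = "Search"


class IAccessibleSketch:
    def __init__(self, acc):
        self.IAccessibleObject = acc  # pointer to the MSAA object

    @property
    def name(self):
        # MSAA exposes the label through the accName method.
        return self.IAccessibleObject.accName(0)


class UIASketch:
    def __init__(self, element):
        self.UIAElement = element  # pointer to the UIA element

    @property
    def name(self):
        # UIA exposes the label through the element's name property.
        return self.UIAElement.CurrentName


# Same attribute access, different accessibility API underneath:
for obj in (IAccessibleSketch(FakeMSAAPointer()), UIASketch(FakeUIAElement())):
    print(obj.name)
```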
To illustrate, suppose you move system focus to the desktop (Windows+D), and NVDA announces the name of the focused desktop icon. But how can NVDA do so without analyzing icons? This is how:
1. A focus change event arrives for the desktop icon, and NVDA’s tests (including the window class name) indicate a control served by MSAA.
2. NVDA constructs an IAccessible NVDA object whose IAccessibleObject attribute is a pointer to the MSAA object for the icon.
3. When the name is needed, the pointer’s accName is consulted; for the role, the MSAA role is fetched and mapped to an NVDA role.
As another example, suppose you open the Windows 10/11 Start menu, and the search box receives focus. The difference from the above example is that NVDA is dealing with a UIA control. This calls for creating a UIA NVDA object containing a pointer to a UIA element (UIAElement). The UIA element is then asked for its name and control type properties, which supply the NVDA object’s name and role, respectively. Just like MSAA control roles, the UIA API handler (UIAHandler) includes a map of UIA control types to NVDA roles.
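Such a map can be sketched as follows. The two control type constants are real UIA values from the UI Automation headers; the role names on the right are simplified strings rather than NVDA’s actual role constants, and the lookup function is invented for illustration.

```python
# Real UIA control type IDs (from UIAutomationClient.h):
UIA_ButtonControlTypeId = 50000
UIA_EditControlTypeId = 50004

# Simplified map of UIA control types to screen reader roles:
UIAControlTypesToNVDARoles = {
    UIA_ButtonControlTypeId: "button",
    UIA_EditControlTypeId: "editable text",
}


def roleForControlType(controlType):
    """Translate a UIA control type into a screen reader role."""
    return UIAControlTypesToNVDARoles.get(controlType, "unknown")


print(roleForControlType(UIA_EditControlTypeId))  # editable text
```

The Start menu search box reports the Edit control type, which is why NVDA announces it as an editable text field rather than reading raw API constants to you.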
Regardless of which accessibility API is in use, two things remain the same: control properties such as name and role are determined by app developers, and a single code path can handle various accessibility APIs. For example, calling focus.name from the NVDA Python Console returns the control label regardless of which accessibility API pointer and property (accName from IAccessibleObject, or the name property from UIAElement) is consulted in the end. The label text is whatever the app developers say the control is, and it is their job to reveal this information to accessibility APIs so screen readers and users can figure out what the control is. This is why in some documentation, apps are called “servers” while screen readers are called “clients.” Remember the “middleman” analogy I used to describe accessibility APIs? This is why.
Let me end with two things: answering the question of app accessibility responsibilities, and keeping up with the changing accessibility workaround landscape. From time to time, someone on this forum asks, “whom do I turn to to make an app accessible?” Is it app developers, screen reader vendors, or both? While both parties are responsible, I put more weight on app accessibility being the responsibility of app developers. Strictly from the perspective of accessibility information retrieval, screen readers are consumers. People can argue that screen readers are producers, but when we think about apps and where they come from, the accessibility and usability of an app are the responsibility of the very people who coded it in the first place. This is why I have advised, and will continue to advise, that the first people to contact regarding app accessibility should be app vendors, not screen reader vendors such as NV Access, because dedication to app accessibility from app vendors benefits more than just NVDA users. I hope the above Inside Story on accessible control labels and roles gave you reasons for my saying this.
Lastly, the accessibility workaround landscape (yes, I did say workaround) is different today from what it was in the 1990s. When MSAA was in its infancy (late 1990s and early 2000s), screen scraping was an effective way to obtain information about controls on screen. People who used JAWS in the 2000s may remember a note from Vispero (formerly Freedom Scientific) about optimal screen resolution settings. Nowadays, it has become the norm to use accessibility APIs such as UIA to retrieve control properties, thanks to continued advocacy from the disability community and ongoing dialogue and standardization efforts such as WCAG (Web Content Accessibility Guidelines) and WAI-ARIA (Web Accessibility Initiative/Accessible Rich Internet Applications). This is why notes such as screen resolutions and optimal screen settings for screen readers no longer apply, at least for NVDA users (for the most part; as I noted above, there are specific situations where NVDA can scrape the screen to obtain information).
The key takeaway from this Inside Story is this: accessibility API’s cannot replace the mindset of app developers.
Hope this post answers and clarifies many things at once.