The Inside Story of NVDA: where do control names and roles come from #NVDA_Internals

Hi all,

For our new friends: if you are a new NVDA user, or for that matter new to screen readers, parts of what I am about to describe can be overwhelming, since we will go into the inner workings of a screen reader. As a developer about to embark on a different journey in life, I feel this is one of those opportunities to pass on whatever I know about screen reading to the next group of users and would-be developers. I will do everything I can to make the information digestible.

This Inside Story is part of a series of posts on NVDA objects, a central part of NVDA that makes screen reading possible. People who have been participating in the NVDA users list know that I can go on and on for hours about a single topic, and NVDA objects are one of those. However, since the story of NVDA objects touches on a key concept in computer programming that will take several posts to unpack (namely, object-oriented programming), I’m dividing the whole discussion into multiple parts. Besides, I want to be practical in most of what I write, especially as I respond to posts from this forum and elsewhere.

The NVDA objects series consists of:

  1. Where do control names and roles come from (this post): I will talk about how exactly NVDA knows what control you are dealing with, stemming from a recent discussion on control labels in apps.
  2. The “actual” anatomy of NVDA objects: this is the one that will get really geeky, as I will talk about classes, getters and setters, attributes, inheritance, and abstraction, in addition to what makes up an NVDA object, hopefully in a way that is understandable.
  3. Overlay classes (very close to add-on development territory): the heart of many add-ons providing improved support for controls and apps. Only after understanding how NVDA objects are organized will this part make sense.

As a first step, let’s find out how NVDA can announce control labels (names) and roles without doing screen scraping (hint: accessibility APIs):


A few days ago, a question arose in this thread about what NVDA will say when encountering a gear icon in web browser applications, with the answer being “it depends on what the app developer says.” Shortly after, while talking about obtaining system specs, Brian V asked if we can determine the presence of child objects by looking at roles, to which I replied, “perhaps by treating roles as a heuristic.”

But how are these related to NVDA objects? Enter accessibility APIs and pointers (yes, for people coming from older programming languages, remember the concept of pointers?). And no, NVDA does not scrape the screen to determine a control’s role, nor does it analyze icons to find out its label (name)… well, for the most part (there are times when NVDA does rely on screen information, called the “display model”, to determine what a control is, but only in special circumstances). So to give you the “real” answer to the linked discussions: NVDA asks an accessibility API for information such as the control’s name (label), role, and other properties. How NVDA does this internally is the focus of the rest of this post.

For the most part, controls on screen have various properties: name, role, state, location, you name it. But how exactly do screen readers learn this information? By asking accessibility APIs to fetch it on demand. Think of accessibility APIs such as Microsoft Active Accessibility (MSAA) and UI Automation (UIA) as a “middleman” that keeps the transactions between a screen reader and an app going. If you think about this statement, you may conclude that control properties are ultimately set by app developers, who tell the app and the accessibility API to expose properties in certain ways for consumption by the screen reader, which in the end announces those properties to users. But that’s only part of it.

You may then ask: since NVDA seems to know about various accessibility APIs, how does the screen reader recognize which accessibility API can be used to ask a control for its name and role? This happens close to the creation of an NVDA object (one such occasion is handling a focus change event) to represent the screen control you are dealing with. NVDA performs a number of tests to determine which accessibility API to use to interact with a given control (one of them is the window class name as exposed through the Windows API), and based on the results, it constructs an appropriate NVDA object representing an API class (an API class is a collection of NVDA objects for a given accessibility API; I will come back to this concept in a later Inside Story post). But this is still not the complete story.
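To make the window class name test concrete, here is a minimal sketch (not NVDA’s actual code) of how a Windows program can ask the Windows API for a window’s class name using Python’s ctypes; the window_class_name helper is my own illustration:

    import ctypes

    user32 = ctypes.windll.user32

    def window_class_name(hwnd: int) -> str:
        # GetClassNameW fills a buffer with the class name of the window.
        buf = ctypes.create_unicode_buffer(256)
        user32.GetClassNameW(hwnd, buf, 256)
        return buf.value

    # For instance, a class name of "SysListView32" suggests a classic
    # list view control, one clue a screen reader can use when choosing an API.
    print(window_class_name(user32.GetForegroundWindow()))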

So NVDA has constructed an NVDA object representing the focused control; how does NVDA obtain its name and role? Two pieces make this possible: object properties and an accessibility API representative. Every NVDA object derives from an abstract class appropriately named “NVDAObject” (NVDAObjects.NVDAObject), which defines a base implementation of ways to obtain properties such as name and role. These are defined as “_get_property” getter methods, e.g. _get_name to obtain the control name, _get_role for the role. Doing so allows NVDA to query properties by attribute access (object.something, e.g. object.name for the control label, object.role for the role). But since the base NVDA object is just a blueprint (technically, an “abstract base class”) that either provides a default implementation or does nothing when properties are asked for, it cannot be used on its own to announce control labels and roles. This is why almost all NVDA objects derive their power from the base NVDA object and appear as IAccessible/IAccessible2/JAB/UIA objects, a subject for another Inside Story post as it will go over some object-oriented programming concepts.
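To illustrate the getter idea, here is a simplified sketch; NVDA’s real implementation uses a custom metaclass to turn _get_ methods into properties, but the effect is the same: attribute access such as obj.name ends up calling _get_name. The FakeButton class below is a made-up stand-in, not a real NVDA class:

    class BaseObject:
        # Blueprint: route attribute access (obj.name) to a getter (_get_name).
        def __getattr__(self, attr):
            getter = getattr(type(self), f"_get_{attr}", None)
            if getter is None:
                raise AttributeError(attr)
            return getter(self)

        def _get_name(self):
            return ""  # default implementation: no name

        def _get_role(self):
            return "unknown"  # default implementation: unknown role

    class FakeButton(BaseObject):
        # A made-up subclass standing in for a real API class.
        def _get_name(self):
            return "OK"

        def _get_role(self):
            return "button"

    obj = FakeButton()
    print(obj.name, obj.role)  # prints: OK button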

So what allows NVDA to work with different APIs to obtain control properties? An accessibility API representative, technically called an accessibility object or element; in reality, these are pointers (a pointer is something that directs a program to a specific memory location/address; why that is such important material is beyond the scope of this forum, as it touches on various aspects of programming and computer science). Almost all NVDA objects must have a pointer to an accessibility API object or element as an attribute in order for the “magic” of control property announcement to occur (exceptions do exist, including window objects and, of course, the base NVDA object). For example, IAccessible objects include IAccessibleObject, a pointer to the MSAA object representing the control at hand; in the UIA world, this is the UIA element (UIAElement). Although accessibility API objects operate differently and may expose the same property in different ways, NVDA can work across APIs because each API class implements the same getters, so the same code retrieves the same property no matter which API is underneath.
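Here is a hedged sketch of that idea: two classes exposing the same name attribute on top of different underlying “pointers”. The attribute names (IAccessibleObject, UIAElement) and the properties they consult (accName, CurrentName) mirror the real ones, but the wrapped objects below are simple stand-ins rather than real COM pointers:

    from types import SimpleNamespace

    class MSAAObject:
        def __init__(self, iaccessible):
            self.IAccessibleObject = iaccessible  # pointer to the MSAA object

        @property
        def name(self):
            return self.IAccessibleObject.accName  # MSAA's name property

    class UIAObject:
        def __init__(self, element):
            self.UIAElement = element  # pointer to the UIA element

        @property
        def name(self):
            return self.UIAElement.CurrentName  # UIA's name property

    # Stand-in "pointers"; in NVDA these are real COM interfaces.
    msaa = MSAAObject(SimpleNamespace(accName="Recycle Bin"))
    uia = UIAObject(SimpleNamespace(CurrentName="Search box"))
    for obj in (msaa, uia):
        print(obj.name)  # same attribute access, different API underneath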

To illustrate, suppose you move system focus to the desktop (Windows+D), and NVDA announces the name of the focused desktop icon. But how can NVDA do so without analyzing icons? This is how:

  1. A system focus event is fired (raised) by the desktop icon.
  2. NVDA recognizes the focus event and determines what kind of control it is dealing with.
  3. After running some tests, NVDA figures out that it is working with an MSAA control, so it constructs an IAccessible NVDA object (NVDAObjects.IAccessible.IAccessible). A key attribute for an MSAA object is IAccessibleObject, so NVDA obtains a pointer to it as well.
  4. Even though it is an MSAA object, NVDA knows that it is a custom MSAA object, so it calls the NVDAObjects.IAccessible.IAccessible.findOverlayClasses method to determine what to do, eventually learning that it is a Dynamic_SysListView32EmittingDuplicateFocusEventsListItemIAccessible object (don’t worry about this lengthy name).
  5. Now that a proper MSAA NVDA object has been created, NVDA asks IAccessibleObject to return properties such as name and role. In MSAA, the “accName” property from IAccessibleObject holds the control name, and “accRole” records the object role. These are retrieved by the _get_name and _get_role getters defined in the IAccessible NVDA object class, respectively.
  6. For the control role, an extra step is performed to let NVDA present roles in a friendly way. NVDA’s MSAA API handler (IAccessibleHandler) has a map of MSAA roles to NVDA roles, and that map is consulted to return the “NVDA role” (see the sketch after this list).
  7. The same steps used to obtain name and role are repeated for other properties such as states, and all of these are then presented to users.
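As promised, here is a sketch of the role mapping in step 6. The MSAA role constants below are the real values from oleacc.h, but the tiny mapping table is a toy stand-in for the much larger one in NVDA’s IAccessibleHandler:

    # Real MSAA role constants (from oleacc.h).
    ROLE_SYSTEM_LISTITEM = 34
    ROLE_SYSTEM_PUSHBUTTON = 43

    # Toy stand-in for IAccessibleHandler's role map.
    MSAA_TO_NVDA_ROLE = {
        ROLE_SYSTEM_LISTITEM: "list item",
        ROLE_SYSTEM_PUSHBUTTON: "button",
    }

    def nvda_role(accRole: int) -> str:
        # Translate a raw accRole value into a friendly NVDA role.
        return MSAA_TO_NVDA_ROLE.get(accRole, "unknown")

    print(nvda_role(ROLE_SYSTEM_LISTITEM))  # what NVDA speaks: list item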


As another example, suppose you open Windows 10/11 Start menu, and the search box receives focus. The difference from the above example is that NVDA is dealing with a UIA object. This calls for creating a UIA NVDA object, containing a pointer to a UIA element (UIAElement). The UIA element in turn is called upon to obtain name and role coming from name and control type properties, respectively. Just like MSAA control roles, UIA API handler (UIAHandler) includes a map of UIA control types to NVDA roles.
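Outside NVDA, you can watch UIA answer these questions yourself. Here is a minimal sketch using comtypes (a library NVDA itself ships with) to ask UIA for the focused element’s name and control type; run it on Windows while focus sits in, say, the Start menu search box:

    import comtypes.client

    # Generate Python bindings for the UIA type library.
    comtypes.client.GetModule("UIAutomationCore.dll")
    from comtypes.gen import UIAutomationClient as UIA

    uia = comtypes.client.CreateObject(
        UIA.CUIAutomation, interface=UIA.IUIAutomation
    )
    element = uia.GetFocusedElement()
    # Name and control type: the raw material for NVDA's label and role.
    print(element.CurrentName, element.CurrentControlType)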

Regardless of which accessibility API is in use, two things remain the same: control properties such as name and role are determined by app developers, and a single source code can handle various accessibility API’s. For example, calling focus.name from NVDA Python Console will return control label regardless of which accessibility API pointer and property (accName from IAccessibleObject/name property from UIAElement) is called in the end. The label text is what the app developers say the control is, and it is the job of developers to reveal this information to accessibility API’s so screen readers and users can figure out what the control is. This is why in some documentation, apps are called “servers” while screen readers are called “clients.” Remember the “middleman” analogy I used to describe accessibility API’s? This is why.
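You can try this yourself: press NVDA+Control+Z to open the NVDA Python Console, where focus is a built-in snapshot variable pointing at the NVDA object with system focus:

    >>> focus.name    # the label, regardless of the API underneath
    >>> focus.role    # the NVDA role (an enum member)
    >>> focus.states  # the set of states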

Let me end with two things: answering the question of app accessibility responsibilities, and keeping up with the changing accessibility workaround landscape. From time to time this forum is asked, “whom do I turn to to make an app accessible?” Is it app developers, screen reader vendors, or both? While both parties are responsible, I tend to put more weight on app accessibility being the responsibility of app developers. Strictly from the perspective of accessibility information retrieval, screen readers are consumers. People can argue that screen readers are producers, but when we think about apps and where they come from, the accessibility and usability of an app are the responsibility of the very people who coded the app in the first place. This is why I have advised, and will continue to advise, that the first people to contact regarding app accessibility should be app vendors, not screen reader vendors such as NV Access, because dedication to app accessibility from app vendors benefits more than just NVDA users. I hope the above Inside Story on accessible control labels and roles gave you reasons for my saying this.

Lastly, the accessibility workaround landscape (yes, I did say workaround) is different today than it was in the 1990s. When MSAA was in its infancy (late 1990s and early 2000s), screen scraping was an effective way to obtain information about controls on screen. People who were using JAWS in the 2000s may remember a note from Vispero (formerly Freedom Scientific) about optimal screen resolution settings. Nowadays it has become the norm to use accessibility APIs such as UIA to retrieve control properties, thanks to continued advocacy from the disability community and ongoing dialogue and standardization efforts such as WCAG (Web Content Accessibility Guidelines) and WAI-ARIA (Web Accessibility Initiative/Accessible Rich Internet Applications). This is why notes about screen resolutions and optimal display settings for screen readers no longer apply, at least for NVDA users (for the most part; as I noted above, there are specific situations where NVDA can scrape the screen to obtain things).

The key takeaway from this Inside Story is this: accessibility APIs cannot replace the mindset of app developers.

Hope this post answers and clarifies many things at once.

Cheers,

Joseph