because if I need to read any images now, I can.
I have to presume those are images that contain text. That may be a given, but when "reading images" is discussed, I would really prefer that the two kinds of reading be differentiated.

The ability to OCR an image containing a lot of text is completely different from image description, and both are valuable.

