Introducing Sotto
Today, I'm announcing that my first commercial project, Sotto, has reached its early alpha stage! This post will serve as the project's page until it has its own website. Because Sotto is a commercial project, its code isn't publicly available, but an earlier open-source version is available here.
Throughout the history of computing, there have been many attempts to pin down the core feature of a computer. It may not be the most fundamental, but I think the most important feature of a computer is how users interact with it. Today, our standard human-computer interface is a keyboard, mouse, and display. That works well for most people, but it poses significant challenges for the 200 million people around the world with macular degeneration, and for the at least 1.2 billion people with some form of visual impairment that cannot easily be treated.
For a person with a visual impairment, especially a highly variable one like macular degeneration, a computer can become exceedingly hard to use. I've seen this in my own grandfather, who has been an inspiration to me my whole life. As he has grown older, he has become almost completely blind. He can still make out blurry shapes and colours out of the corner of one eye, but the harsh light a screen emits (unlike the reflected light by which we see ordinary objects) is almost imperceptible to him, which makes seeing what's on a display extremely challenging. Even with the largest font size and yellow text on a black background, in just five years he has reached the point where he can't see what he's doing at all. For a man whose mind is still incredibly active, and who is currently writing his autobiography, this is incredibly frustrating. However well he takes it, and however little he complains, his situation mirrors that of millions of people around the globe, for everything from memoirs to daily emails. Screen readers offer some help, though many elderly people find them unintuitive; still, there is no reliable program on the market that lets the severely visually impaired not only write text but also edit it, which is the fundamental way we interact with a computer.
So, over the last few months, including a six-week sprint supported by the University of New South Wales, I have built an application designed to solve this. Sotto is a desktop app powered by OpenAI's Whisper speech-to-text (STT) model, which has the highest accuracy of any openly available STT model, so Sotto can capture dictated text with stunning fidelity. I've also designed a simplified keyboard input method that only requires the user to find the spacebar (my grandfather, for example, can no longer find individual keys). From there, they can move the selection through the text a sentence or a paragraph at a time. With the ability to replace sentences, insert text before and after them, and have them read back (this last feature is still in development), Sotto provides a comprehensive yet extremely simple editor that lets blind and visually impaired people produce and edit whatever documents they need.
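To make that navigation model a little more concrete, here is a minimal Python sketch of a cursor that moves by sentence and by paragraph and supports replace, insert-before, and insert-after. It is not Sotto's code, which isn't public. The names `SentenceEditor` and `split_document` and the naive regex sentence splitter are all hypothetical. The post doesn't say how spacebar presses map onto these operations, so key handling is left out, and dictated text is simply assumed to arrive as a plain string (for example, a Whisper transcription).

```python
import re
from dataclasses import dataclass

# Hypothetical, naive sentence splitter: split on whitespace that follows . ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_document(text: str) -> list[list[str]]:
    """Split text into paragraphs (separated by blank lines), each a list of sentences."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [_SENTENCE_END.split(p) for p in paragraphs]


@dataclass
class SentenceEditor:
    """Hypothetical editor whose cursor always selects exactly one sentence."""
    paragraphs: list[list[str]]
    para: int = 0  # index of the current paragraph
    sent: int = 0  # index of the selected sentence within that paragraph

    @classmethod
    def from_text(cls, text: str) -> "SentenceEditor":
        # Fall back to a single empty sentence so the cursor is always valid.
        return cls(split_document(text) or [[""]])

    def current(self) -> str:
        """The selected sentence, e.g. what a read-back feature would speak."""
        return self.paragraphs[self.para][self.sent]

    def next_sentence(self) -> None:
        # Move right within the paragraph, or to the start of the next one.
        if self.sent + 1 < len(self.paragraphs[self.para]):
            self.sent += 1
        elif self.para + 1 < len(self.paragraphs):
            self.para += 1
            self.sent = 0

    def prev_sentence(self) -> None:
        # Move left within the paragraph, or to the end of the previous one.
        if self.sent > 0:
            self.sent -= 1
        elif self.para > 0:
            self.para -= 1
            self.sent = len(self.paragraphs[self.para]) - 1

    def next_paragraph(self) -> None:
        if self.para + 1 < len(self.paragraphs):
            self.para += 1
            self.sent = 0

    def prev_paragraph(self) -> None:
        if self.para > 0:
            self.para -= 1
            self.sent = 0

    def replace(self, dictated: str) -> None:
        self.paragraphs[self.para][self.sent] = dictated

    def insert_before(self, dictated: str) -> None:
        # The new sentence takes the current index, so it becomes the selection.
        self.paragraphs[self.para].insert(self.sent, dictated)

    def insert_after(self, dictated: str) -> None:
        # Advance first, then insert, so the new sentence becomes the selection.
        self.sent += 1
        self.paragraphs[self.para].insert(self.sent, dictated)

    def text(self) -> str:
        return "\n\n".join(" ".join(p) for p in self.paragraphs)
```

A read-back feature could pass `current()` to a text-to-speech engine. Storing the document as paragraphs of sentences keeps every operation a simple list-index update. The cost is the naive splitter, which will wrongly split after abbreviations such as "Dr.".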
From here, my vision is clear. With funding, I will be able to buy the compute needed to further fine-tune the models I'm using, and to pioneer world-first, comprehensive research on prompting strategies that prime Whisper for technical vocabulary, such as the medical terms my grandfather (a doctor all his life) still uses so regularly today. I will release the results of this research publicly. When the first version of Sotto is publicly released, I will offer a simple free version, with the aim of improving as many people's lives as possible, alongside paid versions with further features. These will include:
- AI-aided editing through a command palette
- Automatic optical character recognition to allow editing and annotating other documents like PDFs
- System integration, so that Sotto can be used within other apps
- (Pending a security review) the ability to log in using your voice
These features will form the basis of something I want to appeal far beyond the blind and visually impaired: a universal, voice-powered human-computer interface. This will require breakthroughs in agentic artificial intelligence and text-to-speech quality. But with the technology available today, and the significant research and prototyping I've already put into this, I'm confident that I can help realise a future in which every person on the planet can, at the very least, control documents with their voice and, ideally, control their whole computer using only their words. The productivity gains would be extraordinary, and with AI that can interpret complex commands, I think this will enable, in partnership with existing innovators in this field, a new era of human-computer interaction.
How far I will take this, I don't yet know. All I know right now is that I am committed to making sure that my grandfather, and every other visually impaired person on the planet, has access to software that empowers them to interact with computers as efficiently as possible, regardless of their disability. Visual impairment can happen to anyone. My grandfather is a former Senior Australian of the Year and was vice-president of the International Physicians for the Prevention of Nuclear War when it was awarded the Nobel Peace Prize. He is the most incredible man I have ever met, and his ability to write and convey his ideas to others has been dramatically curtailed by an unseen evil of biology: the natural degradation of the eye. Even if I only get to help him with this, I will consider it a journey worth taking.
Check back on this page soon for further updates, and see here for contact details if you'd like to try out the project or chat about collaborating!