The Copilot Vision cons
Beyond that, Microsoft’s Copilot Vision AI experience has some of the same limits as any AI chatbot at the moment. It wants to validate you, the user. It may nod along, even if you get something wrong. For example, here’s a quick interaction I had:
“How do I draw something in Word?”
“To draw in Word, you’d go to the ‘Draw’ tab on the Ribbon…”
“Okay, so it’s under the Layout tab, right?”
“That’s correct!”
“Nope, it was under the Draw tab.”
“Mm-hm.”
This isn’t an attack on AI chatbots in general, the underlying GPT model from OpenAI, or Copilot itself. It’s just a limitation of the technology — at least at the moment. When interacting with Copilot, ChatGPT, or any other LLM, you need to stay on your toes and question what you’re hearing.
The real limitations of AI voice modes
While voice modes might feel more “futuristic” than typing to an LLM, text-based interactions are simply better and richer at the moment. For one thing, the voice experience still runs on text under the hood: the words you speak are converted to text, the LLM produces text in response, and a separate process reads that text aloud. This is crucial to understand: the LLM cannot hear any emotional tone in your voice. And while the Copilot voice you hear may seem to have an emotional tone of its own, that tone is inserted by the text-to-speech process after the LLM outputs its text.
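To make that flow concrete, here’s a minimal Python sketch of the three-stage pipeline described above. The function names (transcribe, generate_reply, synthesize) are placeholders I made up for illustration, not real Copilot or OpenAI APIs, and each stage is stubbed out so the whole thing runs as-is. The point is structural: the model in the middle only ever sees plain text.

```python
def transcribe(audio: bytes) -> str:
    """Speech-to-text stage: only the words survive.

    Any emotional tone in the speaker's voice is discarded here,
    before the LLM ever sees the input. (Stub standing in for a
    real speech-recognition model.)
    """
    return "How do I draw something in Word?"


def generate_reply(prompt: str) -> str:
    """LLM stage: plain text in, plain text out -- nothing else.

    (Stub standing in for a real language-model call.)
    """
    return "To draw in Word, go to the 'Draw' tab on the Ribbon."


def synthesize(text: str) -> bytes:
    """Text-to-speech stage: any 'emotion' you hear is added here,

    after the LLM has already finished. (Stub standing in for a
    real speech-synthesis model.)
    """
    return text.encode("utf-8")


def voice_turn(audio_in: bytes) -> bytes:
    """One full round trip of a voice-mode conversation."""
    text_in = transcribe(audio_in)      # voice -> text (tone lost here)
    text_out = generate_reply(text_in)  # text -> text (the LLM's only view)
    return synthesize(text_out)         # text -> voice (tone added here)


if __name__ == "__main__":
    print(voice_turn(b"<microphone audio>"))
```

However the real system is implemented, the takeaway is the same: the “voice” is a wrapper around a text exchange, with tone stripped on the way in and layered back on the way out.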