There are a variety of reasons why model makers don’t disclose the particulars of their training data. (Let’s not even get into the issue of whether they have legal rights to do whatever training they did, though it’s tempting to do so, if only to explore the hypocrisy of OpenAI complaining about DeepSeek not getting permission before training on much of its data.)
Speaking of DeepSeek, don’t read too much into the lower cost of its underlying models. Yes, its builders cleverly leveraged open source to find efficiencies and lower pricing, but there’s been little disclosure of how much the Chinese government helped with DeepSeek’s funding, either directly or indirectly.
That said, if DeepSeek is the cudgel that puts downward pressure on genAI pricing, I’m all for it, and IT execs should be, too. But until we see evidence of meaningful price cuts, they should use the lack of data transparency in non-English models to try to bring model makers’ price tags down from the stratosphere.