Shared Chat
Criticisms of OpenAI

One criticism of OpenAI is related to the challenge of evaluating and providing feedback for their language models. The models are constantly updated, making it difficult to compare different versions and track improvements .

Another criticism is that OpenAI has a closed policy when it comes to sharing their models and code. This has led to calls for more open sourcing and greater engagement with the open source community .

There are also concerns about the commoditization of large language models, with some suggesting that there may be a race to the bottom in terms of pricing and quality .

In addition, there are discussions around the need for better benchmarking of language models, especially in terms of real-world use cases and practical considerations like latency, cost, and throughput .

It's important to note that these criticisms are not unique to OpenAI and are part of wider debates and challenges in the field of AI and natural language processing .

1
swyx: He calls it Companions, I call it Waifus and Husbandos. But basically just everything that is interesting and hot in AI, we get a chance to step back from being hosts of the podcast to becoming guests. And so whenever we get the chance to do that, I just love having discussions between podcasters too, just because we know what people are talking about and we get to sit back and take a high level perspective about what's interesting and what people are discussing in the space. So enjoy. This is NLW, Alessio and I on the AI Breakdown.
(someone): Today on the AI Breakdown, I'm joined by the hosts of the Latent Space podcast to discuss everything that happened in AI last month from a technical and developer perspective. From llama to code interpreter to open source debates and beyond, this is your summer technical AI roundup. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our newsletter, and our Discord. Welcome back to the AI Breakdown. Today, I am very excited to be collaborating with the hosts of Latent Space. Latent Space is a podcast focused on AI development and AI engineering. It's hosted by Alessio, who is also a VC investing in AI and other frontier spaces, and Sean, better known as Swix, who is an AI developer and entrepreneur.
2
Alessio Fanelli: And I think there's been more chat online about sometimes when you do reinforcement learning, you don't know what reward and what part of the suggestion the model is anchoring on. Sometimes it's like, oh, this is better. Sometimes the model might be learning more verbose answers, even though they're right the same way. So there's a lot of stuff there to figure out. But I think some examples in the paper are clearly worse. Some of them are not as crazy. But I mean, OpenAI is under a lot of pressure on like the safety and like all the instruction side. And we cannot, like the best thing to do would be, Hey, let's version lock the model and like keep doing evals against each other. Like doing an eval today and an eval like that was like a year ago, there might be like 20 versions in between that you don't even know how the model has changed. So yeah, evals are hard. It's the TLDR.
swyx: I think basically what we're seeing is OpenAI having come to terms with the origin of itself as a research lab where updating models it's just a relatively routine operation versus a product or infrastructure company where it has to have some kind of reliability guarantee to its users. And so OpenAI, I think internally, its researchers are used to one thing, and then the people who come and depend on OpenAI as a product are used to a different thing.
3
(someone): It kind of talks a little bit to the fact that they felt that doing internally wasn't going to get anywhere, or maybe this speaks to some of the middle management type stuff within Google. And then to the point about OpenAI not having a moat, I think for large language models, it will be over time kind of a race to the bottom, just because the switching costs are so low compared with traditional cloud and SaaS. And yeah, there will be differences in quality, but over time, if you look at the limit of these things, I think Sam Altman has been quoted a few times saying that the marginal price of intelligence will go to zero over time. And the marginal price of energy powering that intelligence will also head over time. And in that world, if you're providing large language models, they become commoditized. Like, yeah, what is your mode at that point? I don't know. I think they're extremely well positioned as a team and as a company for leading this space. I'm not that worried about that. But it is something from a strategic point of view to keep in mind about large language models becoming a commodity.
(someone): So it's quite short, so I think it's worth just reading that entire section. It says, epilogue, what about open AI? All of this talk of open source can feel unfair given open AI's current closed policy. Why do we have to share if they won't? That's talking about Google sharing.
4
(someone): And I think OpenAI is very unique in the sense that at least at the present moment, we have so much inbound interest that there's, there is no desire for us to like do that type of developer advocacy work. So it's like more from a developer experience point of view, actually, like how can we enable developers to be successful? And that at the present moment is like building a strong foundation of documentation and things like that. And we had a bunch of amazing folks internally who were who are doing some of this work, but it really wasn't their full-time job. Like they were focused on other things and just helping out here and there. And for me, my full-time job right now is how can we improve the documentation so that people can build the next generation of products and services on top of our API. And it's, yeah, there's so much work that has to happen, but it's, it's, it's been a ton of fun so far.
swyx: I find being in developer relations myself, it's kind of like a fill in the blanks type of thing. You go to where you're needed the most. OpenAI has no problem getting attention. It is more that people are not familiar with the APIs and the best practices around programming for large language models, which is a thing that did not exist three years ago, two years ago, maybe one year ago. I don't know. When did you launch your API? I think you launched DALI first as an API, or I don't know. I don't know the history.
5
swyx: whatever metric you're optimizing for in your social network, if they start to decline, your change will be reverted tomorrow. Whereas here, like we just talked about, it's hard to measure and you don't get that much feedback. There's sort of the thumbs up and down action that you can take in OpenAI, but I'm sure most people don't give feedback at all. So OpenAI has very little feedback to go with on what is actually improving or not improving. And I think this is just normal. It's kind of what we want in a non-ad tracks universe. We've just moved to this subscription economy that everyone is pining for. And this is the result that we're trading off some amount of product feedback, actually.
(someone): Super interesting. So the one other thing before we leave OpenAI ecosystem, the one other big sort of feature announcement from this month was custom instructions. How significant do you think that was as an update?
swyx: So minor. So it is significant in a sense that you get to personalize TrackTBT much more than you previously would have, like it actually will remember facts about you, it will try to obey system prompts about you. You had this in the Playground since forever because you could enter in the system prompt in there and just ChatGPT didn't have it. And this is a rare instance of the ChatGPT team lagging behind the general capabilities of the OpenAI platform. And they just shipped something that could have been there a long time ago.
6
(someone): It kind of talks a little bit to the fact that they felt that doing internally wasn't going to get anywhere, or maybe this speaks to some of the middle management type stuff within Google. And then to the point about OpenAI not having a moat, I think for large language models, it will be over time kind of a race to the bottom, just because the switching costs are so low compared with traditional cloud and SaaS. And yeah, there will be differences in quality, but over time, if you look at the limit of these things, I think Sam Altman has been quoted a few times saying that the marginal price of intelligence will go to zero over time. And the marginal price of energy powering that intelligence will also head over time. And in that world, if you're providing large language models, they become commoditized. Like, yeah, what is your mode at that point? I don't know. I think they're extremely well positioned as a team and as a company for leading this space. I'm not that worried about that. But it is something from a strategic point of view to keep in mind about large language models becoming a commodity.
(someone): So it's quite short, so I think it's worth just reading that entire section. It says, epilogue, what about open AI? All of this talk of open source can feel unfair given open AI's current closed policy. Why do we have to share if they won't? That's talking about Google sharing.
7
(someone): Just the NVIDIA open source driver and this open source repo can launch a CUDA kernel. So rewriting the user space runtime is doable. Rewriting the kernel driver, I don't even have docs. I don't have any docs for the GPU. Like it would just be a massive reverse engineering project. I wasn't complaining about it being slow. I wasn't complaining about PyTorch not compiling. I was complaining about the thing crashing my entire computer. It panics my kernel. And I have to wait five minutes while it reboots because it's a server motherboard and they take five minutes to reboot. So I was like, look, if you guys do not care enough to get me a decent kernel driver, there's no way I'm wasting my time on this, especially when I can use Intel GPUs. Intel GPUs have a stable kernel driver and they have all their hardware documented. You can go and you can find all the register docs on Intel GPUs. So I'm like, why don't I just use these now? There's a downside to them. Their GPU is $350. And you're like, what a deal. It's $350. You know, you get about $350 worth of performance. And if you're paying about 400 for the PCIe slot to put it in, right? Like between the power and all the other stuff, you're like, okay, nevermind.
8
(someone): And I was talking with a researcher at Lockheed Martin yesterday, literally about like, like the version of this that's running of language models running on fighter jets, right. And you talk about like the, the amount of engineering precision and optimization that has to go into to those type of models. And the fact that that you spend so much money, like like training a super distilled, you know, version where milliseconds matter, you know, it's a life or death situation there, you know, and you couldn't even even remotely have a use case there where you could like call out and have, have API calls or something. So I do think there's like keeping in mind the use cases where, where there'll be use cases that I'm more excited about, you know, at the application level where, where, yeah, I want to just have it be super flexible and be able to call out to APIs and have this agentic type, type thing. And then there's also industries and use cases where, where you really need everything baked into the model. Yep.
swyx: Agreed. My, my favorite piece of take on this is I think, GPC 4 as a reasoning engine. which I think came from Nathan at Every.to, which I think, yeah, I see the 100 score over there. Simon, do you have a few seconds on Mojo?
(someone): Sure. So Mojo is a brand new programming language that was just announced a few days ago. It's not actually available yet.
9
(someone): And there's still not a clear way to overcome that, even using functions. That's number one. And number two is that I kind of want to echo what Mayo said before. And while I do appreciate that, again, this is a great advancement. It's super cool. I don't know if I would call it a game changer, because this has existed. Right. Like this, this has happened in various, various frameworks, you know, BlankChain, Guidance, you know, other, other frameworks have guardrails in place. Yes. This, this looks like a very good implementation of this concept of having guardrails and being able to basically force the LLM to do what you want. And yes, it's very, you know. It's beneficial to all of us that OpenAI, that owns these models, actually put some effort into fine-tuning their own models so that this works really well. But I just want to curb the enthusiasm a little bit to just say, this isn't earth-shattering. We haven't seen anything like this before.
(someone): The same could be said about Change.pt though. When Change.pt released,
10
(someone): the role of cheaper and cheaper embeddings on OpenAI's side and also the lock-in into OpenAI's ecosystem versus running them on client-side or, you know, in models for free on localhost.
(someone): Yeah, thanks for having me. Yeah, I think there's definitely two different use cases for these types of things, where what OpenAI is really providing is these very large-scale I mean, any business now that wants to embed all their data or any project that wants to, as you've mentioned, embed a large amount of data, they're going to benefit so greatly from these price reductions. And I mean, as we have some people on the stage here as well with vector databases, I mean, it's only going to accelerate that part of the space right now. And then the other option, which is sort of what I'm, it's funny, I'm not too sure if this is like a battle between these two sides, or it's like just two different use cases, is the, the client side running of these, you know, generating embeddings. And at, well, with the project I'm working on now, Transformers.js is basically running these models client side, running them in specifically, the way I started it was for running in the browser locally. And I think that as we, as I saw from a demo that was created like a week or two ago, there's, there's quite a bit of interest running these things locally.
11
swyx: Licensing issues also come in the form of terms of use, which is not an official license, but it's something you agree to when you use services. So OpenAI has this very famous clause in their terms of service, which basically states that you can't use OpenAI output as input for your training models, which is exactly what the Alpaca and Vicunia students did in Stanford to train their models that now compete with ChatGPT. So this is why in our conversation with Mike Conover, he talked about, he was very excited about Red Pajama, which is an open source replication of Llama, because Llama also has similar licensing issues, like Llama doesn't allow you to use it commercially, right? All these licensing issues, copyright issues, permissions issues are emerging areas that are being litigated. People are coming up with different ways to license this stuff. So for example, Hugging Face has this real license, responsible AI license, that is different from MIT, different from Apache 2. And that's the license that Stable Diffusion is under. But it has never been litigated in a court of law. It's not accepted by the OSI Institute as open source. So it's just unclear. Can you use it? You have to consult your lawyer, quote unquote, which is a real cop out to basically say nobody knows until some judge rules when a case is brought up. So that's the licensing thing.
Alessio Fanelli: There's a lot of work there too, like HuggingFace has built like a PI removal pipeline for like their development.
12
swyx: Yeah. That's true. That's true. My impression is a bunch of you are geniuses. You sit down together in a room and you get all your data, you train your model, like everything very smooth sailing. What's wrong with the image?
(someone): Yeah. So probably a lot of it just in that a lot of our serving infrastructure was already in place before then. So like, Hey, we were able to knock off one of these boxes that I think a lot of other people maybe struggle with. The opens are serving offerings are just, I will say not great in that, in that they aren't customized to transformers and these kinds of workloads where I have high latency and I want to like batch requests and I want to batch requests while keeping latency low. But one of the weird things about generation models is they're like autoregressive, at least for the time being, they're autoregressive. So the latency for a generation is a function of the amount of tokens that you actually end up generating, like that's like the math. And you can imagine, while you're generating the tokens, though, unless you batch a lot, it's going to end up being the case that you're not going to get great flop utilization on the hardware. So there's like a bunch of trade-offs here where if you end up using something completely off the shelf, like one of these serving frameworks, you're going to end up leaving a lot of performance on the table.
13
Alessio Fanelli: And I think there's been more chat online about sometimes when you do reinforcement learning, you don't know what reward and what part of the suggestion the model is anchoring on. Sometimes it's like, oh, this is better. Sometimes the model might be learning more verbose answers, even though they're right the same way. So there's a lot of stuff there to figure out. But I think some examples in the paper are clearly worse. Some of them are not as crazy. But I mean, OpenAI is under a lot of pressure on like the safety and like all the instruction side. And we cannot, like the best thing to do would be, Hey, let's version lock the model and like keep doing evals against each other. Like doing an eval today and an eval like that was like a year ago, there might be like 20 versions in between that you don't even know how the model has changed. So yeah, evals are hard. It's the TLDR.
swyx: I think basically what we're seeing is OpenAI having come to terms with the origin of itself as a research lab where updating models it's just a relatively routine operation versus a product or infrastructure company where it has to have some kind of reliability guarantee to its users. And so OpenAI, I think internally, its researchers are used to one thing, and then the people who come and depend on OpenAI as a product are used to a different thing.
14
(someone): But I think if you look at some of the benchmarks, it is on par or maybe a little shy of some of the Eletheri models. I think that one of the things that you may see here is that the market for foundation models and like the importance of having your own foundation model is actually not that great that like you have a few core trains that people I think of these kind of like stem cells where you know a stem cell is a piece of is a cell that can become more like its surrounding context. It can become anything upon differentiation when it's exposed to eye tissue or kidney tissue These foundation models are archetypal and then under fine-tuning become the specific agent that you have a desire for. And so I think they're expensive to train. They take a long time to train. Even with thousands of GPUs, I think you're still looking at a month to stand up some of these really big models, and that's assuming everything goes correctly. And so what open assistant is doing is I think representative of the next stage which is like open data sets and that's what the dolly release is also about is. I kind of think of it like an upgrade in a video game I don't play a ton of video games but I you know I used to and I'm familiar with the concept of like your character can now double jump. Right. Great. You know, it's like here's a data set that gives it the ability to talk to you. Here's a data set that gives it the ability to answer questions over passages from a vector index.
15
swyx: He calls it Companions, I call it Waifus and Husbandos. But basically just everything that is interesting and hot in AI, we get a chance to step back from being hosts of the podcast to becoming guests. And so whenever we get the chance to do that, I just love having discussions between podcasters too, just because we know what people are talking about and we get to sit back and take a high level perspective about what's interesting and what people are discussing in the space. So enjoy. This is NLW, Alessio and I on the AI Breakdown.
(someone): Today on the AI Breakdown, I'm joined by the hosts of the Latent Space podcast to discuss everything that happened in AI last month from a technical and developer perspective. From llama to code interpreter to open source debates and beyond, this is your summer technical AI roundup. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our newsletter, and our Discord. Welcome back to the AI Breakdown. Today, I am very excited to be collaborating with the hosts of Latent Space. Latent Space is a podcast focused on AI development and AI engineering. It's hosted by Alessio, who is also a VC investing in AI and other frontier spaces, and Sean, better known as Swix, who is an AI developer and entrepreneur.
16
swyx: So you can actually ask the language models to expose their log probabilities and show you how confident they think they are in their answer, which is very important for calibrating whether the language model has the right amount of confidence in itself. And in the GPT-4 paper, they were actually very responsible in disclosing that they used to have about linear correspondence between the amount of confidence and the amount of times it was right. But then adding RLHF onto GPT-4 actually skewed this prediction such that it was more confident than it should be. It was confidently incorrect, as people say. In other words, hallucinating. And that is a problem. So yeah, those are the main issues with benchmarking that we have to deal with.
Alessio Fanelli: Yeah. And a lot of our friends are founders. We work with a lot of founders. If you look at all these benchmarks, all of them just focus on how good of a score they can get. They don't focus on what's actually feasible to use for my product, you know? So I think like production benchmarking is something that doesn't really exist today, but I think we'll see the rise of, and I think the main three drivers are one, latency. You know, how quickly can I infer the answer? Cost. If I'm using this model, how much does each call cost me? Is that in line with my business model? And then throughput.
17
(someone): I think Tesla also has this, uh, dojo supercomputer where they try to have as essentially as fast, um, on-chip memory as possible and removing some of these data transfer back and forth. I think that's a promising direction. The issues I could see, you know, I'm definitely not a hardware expert. One issue is the on-chip memory tends to be really expensive to manufacture, much more expensive per gigabyte compared to off-chip memory. So I talked to, you know, some of my friends are at Cerebras and They have their own stack and compiler and so on, and they can make it work. The other kind of obstacle is, again, with compiler and software framework and so on. For example, if you can run PyTorch on this stuff, lots of people will be using it. But supporting all the operations in PyTorch will take a long time to implement. Of course your people are working on this so i think yeah we can i need these different bets on the hardware side as well. Where has my understanding is has a kind of a longer time scale so you need to design hardware you need to manufacture it maybe on the order of three to five years or something like that so. People are taking different bets, but the AI landscape is changing so fast that it's hard to predict what kind of models will be dominant in, say, three or five years. Or thinking back five years ago, would we have known that Transformer would have been the dominant architecture? Maybe, maybe not.
18
(someone): I think we're seeing that unfold in real time before our eyes. And I think the other interesting angle of this is, to some degree, LLMs, they don't really have switching costs. They are going to become commoditized. At least that's what a lot of people kind of think. To what extent is it a rate in terms of pricing of these things, and they all kind of become roughly the same in terms of their underlying abilities. And open source is going to be actively pushing that forward. And then this is kind of coming from, if it is to be believed, you know, the kind of Google or an insider type mentality around, you know, where is the actual competitive advantage? What should they be focusing on? How can they get back into the game? Uh, when, you know, when, when, when, when currently the, the, the external view of, of Google is that they're kind of spinning their wheels and they have this code red and, you know, it's like they're, they're playing catch up already. Like, uh, you know, could they use the open source community and work with them, which is going to be really, really hard, you know, from a structural perspective, given Google's place in the ecosystem, but a lot, a lot, a lot of jumping off points there.
Alessio Fanelli: I was going to say, I think the post is really focused on how do we get the best model, but it's not focused on how do you build the best product around it.
19
swyx: Let me confirm the tweet, let me find the tweet.
(someone): Okay, because actually I met somebody from Facebook machine learning research a couple of weeks ago and I pressed them on this and they said basically they don't think it'll ever happen because if it happens and then somebody does horrible fascist stuff with this model, all of the headlines will be Mark Zuckerberg releases a monster into the world. So a couple of weeks ago, his feeling was that it's just too risky for them to allow it to be used like that. But a couple of weeks is a couple of months in the AI world. So yeah, it feels to me like strategically, Facebook should be jumping right on this, because this puts them at the very lead of open source innovation around this stuff.
swyx: So I've pinned a tweet talking about Zuck and Zuck saying that Meta will open up Mama. It's from the founder of Obsidian, which gives it a slight bit more credibility, but it is the only tweet that I can find about it. So completely unsourced. We shall see. I mean, I have friends within Meta, I should just go ask them. But yeah, I mean, one interesting angle on the memo actually is that, and they were linking to this in a doc, which is apparently like, Facebook got a bunch of people to do, because they never released it for commercial use, but a lot of people went ahead anyway and optimize and build extensions and stuff. They got a bunch of free work out of open source, which is an interesting strategy.
(someone): I've got exciting piece of news.
20
(someone): Why do we have to share if they won't? That's talking about Google sharing. But the fact of the matter is, we are already sharing everything with them in the form of the steady flow of poached senior researchers. Until we spend that time, secrecy is a moot point. I love that. That's so salty. And in the end, open AI doesn't matter. They are making the same mistakes that we are in their posture relative to open source. And their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance. In this respect, at least, we can make the first move. So the argument this paper is making is that Google should go like meta and just lean right into open sourcing it and engaging with the wider open source community much more deeply. which OpenAI have very much signaled they are not willing to do. But yeah, read the whole thing. The whole thing is full of little snippets like that. It's just super fun.
swyx: Yes, read the whole thing. I also appreciated the timeline because it set a lot of really great context for people who are out of the loop.
Alessio Fanelli: The final conspiracy theory is that this got leaked right before Sundar and Satya and Tamal went to the White House this morning.
swyx: Yeah, did it happen? I haven't caught up.
21
Alessio Fanelli: So that's whatever.
(someone): The other thing is this, this will probably increase the demand, the compute demand on Azure from all of their enterprise customers, right? So, you know, whether they're selling compute to open AI or all the other enterprises they work with, you know, having more models available that, that everyone's using should, should just kind of keep growing that business.
(someone): Not to mention, I think a lot of their Azure customers probably have significant concerns about privacy, about putting sensitive business data through this, and being able to just run inference on your own hardware that you control probably is more appealing to them in some cases than running a REST API and calling out to OpenAI's infrastructure Azure.
(someone): They've got Azure endpoints for the OpenAI models. I'm actually not quite up to speed with the privacy model there, but my understanding is there's not really much difference.
(someone): My hunch is that it doesn't matter if it is, but what matters is what people feel, it's the vibes. And you see so many of these, so many people, so many companies saying, no, absolutely no way we would pump any of our private data through somebody else's model, even if they say they won't use it for training, which they will do. But whereas I guess maybe they're okay with pumping it through Microsoft Azure, but at least it's on our own GPU reserved instances. Maybe that's what's going on here. There's so much paranoia around this space at the moment.
22
(someone): It's been weird to me. It's gone from suspicious to frustrating to just curious. This seems to be such a hard problem. I think what's going on really is that to solve this, you kind of had to re-engineer it to this message-based API, because they have now reserved tokens. They have tokens that only they know exist that they can insert in as like quote marks. And they can have things like a system message. They can tune it in a way that it actually like, you know, like if you could peer into its brain and see like what circuit is it implementing, it's something that pays attention to the system message in the right way that it like understands the difference between what like the OpenAI customer told it or instructions and what the user told it. And I think that's progress in the right direction. It's a sensible target. It makes sense to me, as an outsider, that's how you would go about addressing this, but it's a big change. And so I think they're moving towards it, but it speaks to what a hard problem this is. Because it is since May, I think, since the preamble, where the original discoverers of it, in May, when they put it in responsible disclosure. And so it seems like it's just a deep issue with how
23
(someone): Unfortunately, Alessio was on vacation, so he couldn't help co-host. But fortunately, friend of the pod, Alex, joined in. And that's going to be the first voice that you hear. Alex has been doing a fantastic job running Twitter spaces every Thursday if you want to talk about just general AI stuff, as well as just follow him for his recaps of really great news. So without any further ado, here is our discussion on OpenAI's Functions API and the rest of the June updates.
(someone): For those of you who work with OpenAI 3.5, M4, et cetera, feel free to raise your hand and come up, ask questions as we explore this together. And I'll just say thanks to a few folks who joined me on stage, Aniston and John. And we've been doing some of these updates every Thursday, but this one is an emergency session. So we'll see, maybe Thursday we'll cover some more. So OpenAI today released. an update, the June update with a bunch of stuff. And we'll start with the simple ones, but we're here to discuss kind of the developer things. We'll start with the pricing updates. So 75% reduction in embedding price. This follows a 90% reduction of embedding costs back in, I want to say November, December.
Unknown error occured.