Shared Chat
What ML techniques does perplexity AI use? How do they collect data and train their models?

Perplexity AI uses reinforcement learning techniques, specifically Reinforcement Learning from Human Feedback (RLHF).

As for data collection, Perplexity AI uses contractors to collect feedback from users on whether they like its summaries and completions, and to rate the summaries themselves. Additionally, LLMs (Large Language Models) can be prompted to do the work of the contractors.
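Though the sources don't go deeper than this, rater feedback of the kind described (preferring one summary or completion over another) is typically turned into a reward model with a pairwise loss, which then guides RL fine-tuning. The sketch below shows that standard recipe purely as an illustration; the plain linear "encoder" and all names are stand-ins, not Perplexity's actual architecture.

```python
# Illustrative sketch of standard RLHF reward-model training on rater
# preference pairs; not Perplexity's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a (query, summary) pair; higher = preferred by raters."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # Stand-in for a pretrained LM producing an embedding.
        self.encoder = nn.Linear(hidden_dim, hidden_dim)
        self.score_head = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score_head(torch.relu(self.encoder(features))).squeeze(-1)

def preference_loss(model, chosen, rejected):
    """Bradley-Terry loss: push the preferred summary's score above the rejected one's."""
    margin = model(chosen) - model(rejected)
    return -F.logsigmoid(margin).mean()

# Toy batch: feature vectors for 4 preference pairs from raters.
model = RewardModel()
loss = preference_loss(model, torch.randn(4, 768), torch.randn(4, 768))
loss.backward()  # gradients would then drive an optimizer step
```

The trained reward model's scalar score can then stand in for fresh human labels when fine-tuning the language model with RL.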

Perplexity AI is committed to factual accuracy in their responses. They take a citation-first approach and strive to provide only information that can be cited and backed up by reliable sources.

It's worth mentioning that Perplexity AI is focused on growing their product and user base, with an emphasis on fast iteration and continuous improvement.

For more detailed information on Perplexity AI's techniques and strategies, you can refer to the provided sources.

1
(someone): You need to build an end-to-end platform where users can correct the mistakes of an LLM. And that also means you need to design the platform so the incentives are right for the user, because it could also be used the other way, where users use it to hide information. So we haven't really thought through all these issues thoroughly, but we are committed to figuring these things out over time.
Elad Gil: That makes sense. I guess in addition to that, or maybe related, you've done an impressive amount of research in reinforcement learning. What's unique about the way that Perplexity uses reinforcement learning and how does it tie into these plans?
(someone): We like RLHF, Reinforcement Learning from Human Feedback, where we use contractors to collect feedback from users on whether they like the summaries and completions or not. And we use contractors to do the summary ratings themselves. And these days, even LLMs can be prompted to do the work that contractors do. Anthropic has written a paper on that. So all these things are getting very efficient to do. So that's how we've been thinking about reinforcement learning right now. But we haven't gone beyond that to think of agents and browsers and things like that. We'll probably focus more on the first part for at least the next six months to a year.
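As a rough sketch of the "LLMs can be prompted to do the work that contractors do" point (the direction of the Anthropic paper mentioned), one can prompt a model to grade summaries the way a human rater would. Everything below is hypothetical; `complete` stands in for whatever chat-completion API is in use.

```python
# Sketch of "LLM as rater": prompt a model to grade a summary the way a
# human contractor would. `complete` is a hypothetical stand-in function.
JUDGE_PROMPT = """You are rating a search-result summary.
Question: {question}
Summary: {summary}

Rate the summary from 1 (unhelpful or unfaithful) to 5 (accurate,
well-cited, directly answers the question). Reply with the number only."""

def rate_summary(complete, question: str, summary: str) -> int:
    reply = complete(JUDGE_PROMPT.format(question=question, summary=summary))
    return int(reply.strip()[0])  # naive parse; real code would validate

# Ratings collected this way can supplement or replace contractor labels
# when assembling reward-model training data.
```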
(someone): I want to add on this aspect. Full-blown RLHF is, you know, definitely something we're going to look into. But there are several, many steps that you can
2
(someone): And you were like, uh, this is not going to work. You need to think about distribution. And search was always the core motivation for me and Denis and many others at the company. We were just bouncing around ideas. I was at OpenAI around that time and then I left, and Denis was still at Meta. Then he also left, and Andy came in too to help us incorporate and get the company rolling. So that's how the company started. The space of LLMs was exciting. Generative models were really exciting. And in general, we were motivated about search, whether general search or vertical search. And we were bouncing around several different ideas. One of the ideas that you gave us was working on text-to-SQL. And we were pretty excited about that and started prototyping ideas around it. And I think Denis was also hacking with us on building a Jupyter Notebook extension with Copilot for every cell. And then we were trying it with SQL around databases. But it's all a bunch of nonlinear pathways to eventually get to where we are right now.
Elad Gil: Yeah, absolutely. And hopefully I caveated whatever feedback I gave with "I'm probably wrong," since I think I'm often wrong on directions.
(someone): No, I think whatever you said still applies. Search is tremendously a distribution game as much as a technology game.
Elad Gil: Yeah, I think one of the really impressive things about Perplexity is the rate of iteration.
3
Sarah Guo: Hey, No Priors listeners. Sarah and Elad here with a quick announcement.
Elad Gil: No Priors is now on YouTube.
Sarah Guo: Search for No Priors podcast on YouTube, and I'm cringing while saying this, like, share, subscribe.
Elad Gil: Okay, back to the show.
(someone): So for factual accuracy, our first step towards that was making sure you can only say stuff that you can cite. It's not just that we want to retrofit citations into a chatbot. That's not what perplexity is. In fact, it's more like a citation-first service.
Sarah Guo: This is the No Priors podcast. I'm Sarah Guo. I'm Elad Gil. We invest in, advise, and help start technology companies.
Elad Gil: In this podcast, we're talking with the leading founders and researchers in AI about the biggest questions. With advances in machine learning, the way we search for information online will never be the same. We're back again to talk about the future of search. This week on the No Priors podcast, I'm excited to introduce our next guest from Perplexity AI. Perplexity is a search engine that provides answers to questions in a conversational way and hints at what the future of search might look like. Aravind Srinivas is co-founder and CEO of Perplexity.
4
(someone): And the other thing is obviously prosumerization of this, where we're already beginning to see our Chrome extension pick up rapidly, heading towards 100,000 users. And extensions like Grammarly have a free version and a prosumer version, which has more features. So there are ways to do that through the browser extension as a productivity-assistant sort of thing. We already see some usage like that: every time you're on a site, you can ask it to do things for you. And then, as we keep getting more and more traffic onto our site, say hundreds of millions of people eventually, that becomes ripe ground for serving ads. But we need to not make the mistake that Google did of combining ads into the core search product itself, and figure out an alternative pathway like Facebook did. And that might work out better for us. Subscription-based search has been tried by other companies, and that's something that ChatGPT is also trying. So we don't know yet if it's high-margin enough. And if a bigger behemoth like Google or Bing just put out the same thing, or even 80% as good as you, for free, then you're never going to make it as a subscription product. So we are likely to stay away from that pathway, but we don't know yet. And the final piece is, if Perplexity becomes something that a lot of people want to use for their own internal data, their links or their bookmarks or their company.
5
(someone): The best research scientists are also very good engineers. And we've noticed, through DeepMind and OpenAI, that the companies that made the most progress over the last five or six years were companies with extremely good engineers. And from the beginning I didn't like this view from academics that if you're an engineer, you're probably not smart enough or you're not going to do great things; it turns out it's actually the other way around. So that was also, I think, a motivation. And I feel like you don't need to be this very impressive academic with a PhD to do great things.
(someone): Yeah. This also goes back to the thing Denis said earlier: you want to find the people who really want to get into AI, rather than those who are already in AI. And every company that's gotten big has done this in their early days, including Google. They got a lot of systems people. Jeff Dean was a compilers and systems person, and Urs Hölzle is like a systems professor. They got all these amazing people and told them, hey, you know, guys, we're having the most interesting computer science problems to solve here and we can scale it up. So come work on search. There are a few people they hired from an information retrieval or search background, but most of the celebrity researchers that they have right now are just people who wanted to get into that space for the first time.
6
(someone): He was not in AI. He was a competitive programmer, a trader. He had worked at Quora for a year, but he's as good as anybody can get in picking up new things. So the other thing also is that LLMs are sort of in this weird territory where the people who use the LLMs for building stuff understand it better than the people who actually did gradient descent and trained these models. Like you could find a PhD student at Stanford or Berkeley who would know a lot about how to train the model, but they might not be the best person to build a product with it.
(someone): Yeah, I want to quickly add to this point. So I was at Facebook AI Research in the early days, at the Menlo Park office, right? And at that time, and this was honestly one of my reasons to do a PhD, there was kind of this very exclusive culture: if you don't have a PhD, you're not going to be a research scientist. And I didn't like that too much. So that's why I decided to do a PhD later on. But it turns out, through my experience, the best research scientists are also very good engineers. And we've noticed, through DeepMind and OpenAI, that the companies that made the most progress over the last five or six years were companies with extremely good engineers.
7
(someone): So that's something I'm pretty confident about. And I think asking follow-up questions will become more of the norm. The number of queries in Perplexity that go to at least one follow-up has been increasing ever since we released the chat UI, so that will keep going up. People will get used to this sort of experience where they're encouraged to ask one more question, and they're okay with not getting the answer right away. So that'll happen. The third thing is actions. People will be more deliberate in what they search for and try to execute an action on top of the search results they consumed. So that's definitely likely to happen; it's already happening. If you go to Google and book a flight, you just type "flight from SF to Seattle" and directly click on the book button. So that's going to happen more frequently in the chat UI too, and this will become an assistant more than just a search or answer engine. And the fourth thing, I think, is there will be much less traffic to the actual content sites. Very few links need to be consumed. In Perplexity, in fact, we don't even cite more than five links. It's a deliberate decision. A lot of people ask about this: can you add 10 links or 20 links? Can you just show all the links together? You put the summary at the top, but you also put all the 10 links; the usual "I want both."
8
(someone): Yeah, you could, in theory, scale up the training, scale up the model capacity, and you don't need planning then. You run this reinforcement learning algorithm for a really long time, you have this really big network, and it will eventually learn, in theory at least, how to beat expert humans at Go. But there's a question of, okay, well, how much would you have to scale it up? How much would you have to scale up this raw neural net, the capacity and the training, in order to match the performance that it achieves with Monte Carlo Tree Search? And if you crunch the numbers, it ends up being 100,000x. Now, these models are already costing like $50 million. Clearly, you're not going to be able to scale them by 100,000x. And so then there's a question of, okay, well, what do you do instead? And the answer in AlphaGo is: instead of having all that computation be during training, you also have it spend 30 seconds to figure out what move to make next when it's actually playing the game. That shifts the cost burden from having to pre-compute everything to being able to think on the fly. That's why I think that avenue seems like the piece that's missing.
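To make the train-time versus test-time trade concrete, here is a deliberately tiny sketch of spending compute at decision time. It uses plain Monte Carlo rollouts rather than the full Monte Carlo Tree Search with learned policy and value networks that AlphaGo actually uses, and the game-interface functions (`legal_moves`, `step`, `is_terminal`, `reward`) are assumed placeholders, not from any source.

```python
# Minimal "think at play time" sketch: random rollouts stand in for the
# much richer search AlphaGo performs. All game hooks are hypothetical.
import random

def rollout_value(state, legal_moves, step, is_terminal, reward, depth=50):
    """Play random moves from `state` for up to `depth` steps; return the final reward."""
    for _ in range(depth):
        if is_terminal(state):
            break
        state = step(state, random.choice(legal_moves(state)))
    return reward(state)

def choose_move(state, legal_moves, step, is_terminal, reward, sims=200):
    """Pick the move whose rollouts score best on average. The compute
    spent here, at play time, substitutes for a vastly larger pretrained net."""
    best_move, best_value = None, float("-inf")
    for move in legal_moves(state):
        nxt = step(state, move)
        value = sum(rollout_value(nxt, legal_moves, step, is_terminal, reward)
                    for _ in range(sims)) / sims
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```

Raising `sims` buys playing strength with inference-time compute instead of more training, which is exactly the shift being described.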
Elad Gil: A really random question, because if you look at the human brain, you have these very specialized modules with very specific functions, right? You have the visual cortex for visual processing, you have like different things for emotion in terms of specific modules. Like there's specific parts of the brain that if you ablate, you remove certain emotive or other capabilities, right?
9
(someone): So come work on search. There are a few people they hired from an information retrieval or search background, but most of the celebrity researchers that they have right now are just people who wanted to get into that space for the first time.
Sarah Guo: It's funny, because now the desired pedigree isn't the PhD from whatever academic lab, because the industry labs filled with physicists and engineers and people not necessarily from the domain made the most progress. And so, if you want somebody who's worked on this, I hear the argument that you don't actually need it: you want people who are really smart and motivated. But a lot of companies are looking for somebody who has experience with X billion parameter training runs, and those people now come from OpenAI and DeepMind and such. So it's quite funny how quickly the pedigree has changed. What's been the area of the steepest learning curve, with both of you coming from these research engineering backgrounds?
(someone): For me, it's how to run a company. That's mostly what I'm doing here; I'm not doing much core engineering. So that was not easy at all, but I've had the opportunity to learn from many good people, including you here. If there's an easy way to do it at all, it's getting rapid advice from people. The other thing is, when you're making a mistake, being super brutally honest with yourself, listening to feedback, and quickly course-correcting.
10
(someone): Search is tremendously a distribution game as much as a technology game.
Elad Gil: Yeah, I think one of the really impressive things about Perplexity is the rate of iteration. And to your point, you've gone through things like text-to-SQL and Copilot for the next-gen data stack, and I've always been impressed by how rapidly you've been able to point in a direction, iterate really fast, prototype something, see if it's working, and then move on to the next thing. And to your point, you always had search in the back of your mind. I remember even as you were prototyping these things, you were talking about indexing aspects of Twitter or other data feeds and then providing search on top of them. How did you end up building a team that can iterate that rapidly, as well as a culture of fast iteration? What specific things do you all do as a team to help reinforce that?
(someone): Yeah, I'll take the first part of this question and then also let Denis answer, because he's a big part of why this is happening. We're both basically from an academic background. In general, the culture in academia is that you have hundreds of ideas and you just need to try them out pretty quickly, run a lot of experiments really quickly, get the results, and iterate. So we come from that background, both of us, and that's not really new to us. It's just that, when it comes to trying out new products, it's not just a result you get from running an experiment.
11
(someone): So when you write your first paper, the thing your advisors teach you is that you only write things that you can actually cite. Anything else that you write in the paper is your opinion, not a scientific fact. And that stuck with us pretty closely, and it's why we did the first version, where it's citation-powered search. So for factual accuracy, our first step was making sure you can only say stuff that you can cite. This is a pretty subtle point. It's not just that we want to retrofit citations into a chatbot; that's not what Perplexity is. In fact, it's more like a citation-first service. It'll never say anything that it cannot cite. If people have tried to play with it, asking it things like "tell me, who are you," the way they would with ChatGPT, even for those questions it would still go to a search engine, pull up stuff, and come back with an answer. It's not going to say, "I'm Perplexity, I'm a bot, how are you doing," or something like that. This is because of our obsession with factual accuracy. Even if it doesn't have a personality or a character, we don't care. We only care about the other thing, which is the obsession with truth and accuracy.
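A minimal sketch of that citation-first shape: retrieve before generating, then constrain the model to speak only from what was retrieved. `search` and `complete` are hypothetical stand-ins, and this is the general pattern, not Perplexity's implementation.

```python
# Citation-first answering: retrieve first, then let the model answer
# only from the retrieved snippets. `search` and `complete` are
# hypothetical stand-in functions, not a real API.
def answer(question: str, search, complete, k: int = 5) -> str:
    snippets = search(question, k=k)  # e.g. returns [(url, text), ...]
    sources = "\n".join(f"[{i+1}] {text} ({url})"
                        for i, (url, text) in enumerate(snippets))
    prompt = ("Answer the question using ONLY the numbered sources below. "
              "Cite each claim like [1]. If the sources do not contain the "
              "answer, say you cannot answer.\n\n"
              f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:")
    return complete(prompt)
```

Capping `k` at five mirrors the deliberate at-most-five-links choice mentioned elsewhere in the conversation.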
12
(someone): making very sure that we want to work with this person. So I think that's been helpful.
Sarah Guo: Is there anywhere you've been surprised as you've made hires where you feel like, you know, the signal was wrong or the trial surprised you?
(someone): Yeah, there have been a few exceptions. Obviously, I think that's part of it. But at least compared to my prior experience, there's definitely a smaller chance that you're going to get surprised. In the normal interviews that big companies run, you have four or five meetings, 45 minutes each, and then you basically make a decision after that. Sometimes it works, sometimes it doesn't. But I feel like the way we do things gives us much more confidence that we're going to make the right decision. And it's obviously very useful for us to get the signal, but I also feel it's useful for candidates to make better decisions as well, right? They can understand: do they want to work on the things we work on? Do they want to work at the pace that we do? I think one important thing for us, which many candidates maybe don't want, is the energy we're putting into Perplexity. The work-life balance is maybe not ideal, but that's the only way to beat the competition, iterate very fast, and do great things. So that's why we need to have this alignment at the beginning.
13
(someone): So we are likely to stay away from that pathway, but we don't know yet. And the final piece is, if Perplexity becomes something that a lot of people want to use for their own internal data, their links or their bookmarks or their company, and if we can make it easy for them to build that and become more like a platform which everybody can use, then that's likely to lead to monetization too. So there are so many different rollouts possible here that we don't know yet which one we'll actually go for. But in the short term, we are more focused on growing the product, the users, the traffic, and improving the whole experience. I feel like Google and Microsoft will pretty much build the same thing we have right now. And as we discussed in the first part of the podcast, we need to operate with more velocity, ship more things, and stay ahead of them in terms of the core value of the product itself.
Elad Gil: Yeah, that makes a ton of sense. And I think, to the point before, there's probably lots of paths to monetize once you have a lot of usage. And so it's more just figuring out what's native and natural to the product. You mentioned earlier that one of the things you learned from academia is fast iteration. And I feel like most academics I've worked with are almost the opposite. I actually feel like there's a lot of pre-planning and a lot of discussion and there's less of a bias to action. And so I've been very impressed
14
(someone): But the other thing that we're learning from this is that it does seem the type of data you put in and the kind of fine-tuning, essentially how you weigh the data, has a lot of impact. So with this instruction-tuning stuff, we really have only a few examples of instruction following, but since we do fine-tune the model, it's as if we put a very high weight on it and had lots of examples of that in our training set. I think it's still an open question: for example, if you made a lot of examples of logical puzzles, if you just generate some problems and solutions, would you get a model that's better at logical reasoning? There are other things you can do. I also think a big problem with current models, and I hinted at this before, is that we're just asking them to generate one token at a time. So, for example, you've probably seen this chain-of-thought reasoning thing. If you ask a model a math problem and it just tries to answer how many sheep were there, it might say seven or something, and then it tries to make up the explanation, and it's wrong. But if you tell it, do the explanation first, think step by step, and then answer, it's more likely to be right. But you can imagine other versions of that: if it had a scratch pad, if it had a way to backtrack, to say, you know, this is kind of a dead end, it might become better. So I think stuff like that, which sits around the model (it's still an AI system, but not just one giant DNN), can further improve its abilities.
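The chain-of-thought trick described here is easy to show with two prompt variants; the toy arithmetic and the `complete` function below are illustrative assumptions, not from the source.

```python
# Direct prompt: the model must commit to a number in its first tokens
# and then rationalize it, which is when it tends to be wrong.
DIRECT = ("Q: A farmer has 3 pens with 5 sheep each and sells 4 sheep. "
          "How many sheep remain?\nA:")

# Chain-of-thought prompt: reasoning tokens come first, so the final
# answer can condition on intermediate steps (3 * 5 = 15; 15 - 4 = 11).
STEP_BY_STEP = ("Q: A farmer has 3 pens with 5 sheep each and sells 4 sheep. "
                "How many sheep remain?\n"
                "Think step by step, then give the final answer.\nA:")

def compare(complete):
    """`complete` is a hypothetical text-completion function."""
    return complete(DIRECT), complete(STEP_BY_STEP)
```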
15
(someone): We bring in data from two quite different sources. One is data from human individuals, where we don't get to do experiments, but we have experiments of nature. Each of us is an experiment of nature, where nature has modulated our genetics into different types of activity levels or of individual genes, where some of them behave this way and others behave that way. We can look at that mapping from genotype to phenotype as a surrogate of what a therapeutic intervention would do in those humans. So that's great, but it limits you to those experiments of nature. And the experiments of nature are not necessarily the same as what a therapeutic intervention would do. And so what we've done in parallel is to create our own data in our own wet lab, where we make interventions in cellular systems and measure the phenotypic consequences there, again, using very large-scale data with very high-content modalities. The machine learning is actually used, I would say, in three different ways. One is to interrogate the phenotypic consequences of genetic variation in a human, looking at very high-content data like imaging, where we know machine learning works really well, like different types of omic modalities, transcriptomics, proteomics, and so on, to really understand that mapping between genetics and phenotype. We similarly look at the mapping between genetic interventions, which in this case we get to actually direct ourselves by doing genome editing of cells, and say what is the phenotypic consequences of modulating this gene in this cell background and reading out a large high content data to really understand how cell state responds to these interventions. And so the machine learning is used on each of those two separately and then also to bring them together so that you can kind of
16
Elad Gil: Perplexity is a search engine that provides answers to questions in a conversational way and hints at what the future of search might look like. Aravind Srinivas is co-founder and CEO of Perplexity. He's a former research scientist at OpenAI and completed his PhD in computer science at UC Berkeley. Denis Yarats is a co-founder and Perplexity's CTO. He has a background in machine learning, having worked as a research scientist at Facebook AI and as a machine learning engineer at Quora. Aravind and Denis, welcome to the podcast. Thank you for having us here. Thanks. Thanks so much for joining. So the two of you, alongside Andy Konwinski, created Perplexity around August or so of 2022. Aravind, do you want to give us a little bit of a sense of why you started this company and what the core thesis of Perplexity is?
(someone): Yeah, sure. Actually, Elad, you're our first-ever investor who offered to invest in us. So those were the founding days. In fact, I remember the first-ever idea we talked about in Noe Valley, where we were sitting in the open space opposite Martha's. And I was telling you, oh, it'd be cool to have a visual search engine; the only way to disrupt Google was to not do text-based search, but to actually do it from camera pixels. And you were like, uh, this is not going to work.