Launching the fastest AI inference solution with Cerebras Systems CEO Andrew Feldman
Published: Aug 26, 2024
Duration: 00:53:14
Category: Science & Technology
Chapters: Introduction · Cerebras Systems' Latest Product Announcement · The Challenges of AI Inference · Architectural Innovations in Wafer-Scale Chips · Real-World Applications of AI Inference · Speed vs. Accuracy: Striking the Balance · Overcoming Latency Issues · The Future of AI in Production Environments · Political Science Background · Competing with Industry Giants · Open Source vs. Closed Source in AI Development
Lucas: You're listening to Gradient Dissent, a show about making machine learning work in the real world, and I'm your host, Lucas Biewald.

Andrew: You know, what we've just announced is inference, and we are now the world's fastest at inference. We are at the most accurate level, and we're the cheapest.

Lucas: Today I'm talking with Andrew Feldman, the CEO of Cerebras Systems. Cerebras is a super cool company that makes giant wafer-sized chips that work really well for ML workloads. It's a tough business and a tough company to build, and today they have a really exciting announcement that he came back on the podcast to share with us. I hope you enjoy this conversation.

Lucas: All right, Andrew, thanks for coming back for a second interview.

Andrew: It's good to be back, Lucas.

Lucas: Good to have you. I was actually going over the notes from the last one, and I feel like I really got in all my electrical engineering questions. I appreciate you bearing with me on that one, so I'm going to restrain my technical...

Andrew: I know that you're an electrical engineer just hiding as a software engineer. I know that's exactly what's going on.

Lucas: I want to be an electrical engineer. And I can't believe that your degree is in political science or something. I don't know how you got so technical.

Andrew: No, it was a mistake, both to study political science and not to do more engineering in school.

Lucas: Well, it seems like you made up for that in your career, clearly.

Andrew: Yeah. I think after 30 years of building chips, they've given me an honorary degree. Just don't ask me to solve differential equations.

Lucas: That's exactly right. Well, don't ask me that either. But catch me up: what's happened in the last year? It seems like it's been a big year for you guys.

Andrew: It's been a great year for us. We've built tens of exaflops of compute. We've stood up some of the largest AI training clusters in the world. We have trained models that are in production that have solved fundamental problems in drug design. We still have the premier Arabic-English chat model, a big LLM. And we've published papers in which we have furthered the state of the art in electrical simulation and seismic analysis. So it's been an extraordinary run for us.

Lucas: That's fantastic. And I really want people to go back to the last interview for more technical details, but the highlight for me with you is always that you make these gigantic chips. What's the strength that you've leaned on to do well in so many different domains? What are the axes of the chip market, and what's your sweet spot, where your chips really excel?

Andrew: That's a really thoughtful question, Lucas.

Lucas: Real softball, Andrew.

Andrew: No, no. I think people who tell you their stuff is good at everything are almost always BSing. When you choose to do an architecture, you make some trade-offs; you're going to be good at some things and less good at other things. On the training side, if you fit in one GPU, so that's a small model, you should probably use a GPU. It's a really good machine if you fit in it. But as the training work gets more complicated, say you want to do work on 100 GPUs, a thousand GPUs, or 10,000 GPUs, the actual work of taking your model and distributing it across this compute landscape is punishing. And that's where we excel. That's where the fact that we have such a large chip
makes the distribution of work trivial, and once the work is distributed, the actual training is blazingly fast. Now, that's on the training side. And you know what we've just announced is inference, and we are now the world's fastest at inference. We are at the most accurate level, and we're the cheapest. So you have this trifecta: fastest, top accuracy, and lowest price. And not by a little bit: if you compare us to what you can get from an Nvidia H100 on Azure, we're 20 times faster.

Lucas: And what models are we talking about here?

Andrew: Let's begin with Llama 3.1 8B and 70B. These are the two most popular models, and we're delivering speeds that are impossible to achieve with GPUs. So if you care about latency, how fast your answer is delivered; if you've got an agentic model; if you've got a concatenation of models where the output of one model becomes the input of the next, you care desperately about the amount of time it takes to generate the answer.

Lucas: But those models actually aren't that big, right?

Andrew: No, they're not that big, but generative AI inference is a very hard compute problem. We'll dive down into a little bit of the electrical engineering you love so much and that undoubtedly bores your listeners to tears. Each token that is generated has to go through all the parameters: all the layers and all the parameters. So to generate one token, you have to move all the parameters from memory onto the compute, generate a token, then do that again for the next token, and again for the next token. So if you are running even an 8-billion-parameter model, and the parameters are 16-bit, you're moving huge amounts of data from your memory to your compute over this tiny little pipe, the memory bandwidth. And that's the bottleneck. You're zipping this back and forth, and because we built a big chip, and because all our memory is on the chip, we don't have that problem. So it's blindingly fast. For something like the small models you're describing, at 8B, it's on the order of about 1,750 tokens per second right now, which is more than 20x faster than the hyperscalers on H100s.

Lucas: But aren't the parameters the same at every step? Why does the memory have to transfer the values of the parameters back and forth?

Andrew: Because you have to redo the analysis. You've got all your prompt, and you add a token, and then you need to redo it with the next token, and the next token. Each time, you go through this full cycle of moving the parameters from the memory to the compute, and this process requires this sort of toggling back and forth. And remember, memory bandwidth is one of the weaknesses of the GPUs, so this exact act of generative AI puts pressure on one of the weakest parts of the architecture.
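To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. Single-stream decode speed is capped by how fast all the weights can be streamed from memory for each token. The bandwidth figures are illustrative assumptions (an HBM-class part at roughly 3.35 TB/s, and on-wafer SRAM taken at the 7,000x multiple Andrew quotes), not vendor specifications.

```python
# Back-of-the-envelope: autoregressive decode is memory-bandwidth bound.
# Every generated token must stream all model weights from memory to the
# compute units, so tokens/sec <= memory_bandwidth / weight_bytes.
# Bandwidth values are illustrative assumptions, not vendor specs.

def max_tokens_per_sec(n_params: float, bytes_per_param: int, bw: float) -> float:
    """Upper bound on single-stream decode speed for a dense model."""
    weight_bytes = n_params * bytes_per_param
    return bw / weight_bytes

LLAMA_8B = 8e9            # parameters
FP16 = 2                  # bytes per 16-bit parameter

HBM_BW = 3.35e12          # ~3.35 TB/s: an HBM-class off-chip memory (assumed)
SRAM_BW = 7000 * HBM_BW   # the "7,000 times more memory bandwidth" quoted above

print(f"HBM-bound decode:  {max_tokens_per_sec(LLAMA_8B, FP16, HBM_BW):>12,.0f} tok/s")
print(f"SRAM-bound decode: {max_tokens_per_sec(LLAMA_8B, FP16, SRAM_BW):>12,.0f} tok/s")
```

These are ceilings, not measurements; real systems land well below them once compute, interconnect, and batching enter the picture. But the ratio illustrates why the bottleneck Andrew describes is architectural rather than something software tuning on the same hardware can remove.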
Lucas: Do you have the same accuracy and throughput trade-off with these models? Is it the whole curve that you can let people pick their points on?

Andrew: Sure. We are running today at 16-bit at all stages, so that's the top level of accuracy.

Lucas: Sorry, my bad. I meant the throughput-latency trade-off.

Andrew: Oh, right. So the GPU allows you to add users, but to go slower for every user. That's a trade-off you can make. Those who go really fast have a very small number of users. So some of the guys like Together or Fireworks or Octo, sort of the leaders, go really fast, but they can't stack many users on at a time. We don't have that trade-off. We're blisteringly fast, and we can stack many users simultaneously.

Lucas: I see.

Andrew: And that's a very attractive component, and that's what enables us to drive down the cost, because we can be fast with many users simultaneously on the same equipment.
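The trade-off Andrew sketches can be captured in a toy model: each decode step streams the weights once, plus per-user work such as KV-cache reads, so a bigger batch raises aggregate throughput while slowing every individual user. The two constants below are made-up illustrative values, not measurements of any real system.

```python
# Toy model of the latency/throughput trade-off on shared hardware:
# step_time = fixed cost to stream weights + per-user cost per step.
# Constants are illustrative assumptions only.

WEIGHT_STREAM_S = 1 / 200   # s/step to stream weights (~200 tok/s for one user)
PER_USER_S = 1 / 4000       # extra s/step per concurrent user (KV reads, etc.)

def decode_rates(batch: int) -> tuple[float, float]:
    """Return (per-user tok/s, aggregate tok/s) at a given batch size."""
    step_time = WEIGHT_STREAM_S + batch * PER_USER_S
    return 1 / step_time, batch / step_time

for b in (1, 8, 32, 128):
    per_user, total = decode_rates(b)
    print(f"batch={b:3d}  per-user={per_user:6.1f} tok/s  aggregate={total:8.1f} tok/s")
```

In this toy model a provider picks a point on the curve: small batches for speed per user, large batches for revenue per machine. Andrew's claim is that a much larger memory bandwidth shrinks the fixed weight-streaming term, which flattens the trade-off and lets a service be fast with many users at once.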
Lucas: So you're now running your own service. Does it run any model, or specific models? How does that work?

Andrew: Sure. This is our inference cloud, and today we're running Llama 3.1 8B and 70B, and over the next weeks and months this will expand to a full set of leading models on the service.

Lucas: That's very cool. How long has this been in the works?

Andrew: Oh, about nine months. I think one of the most interesting things we have seen this year is the movement of AI from hobby to production: from a cool thing to tinker with to something you can try to build business applications on top of. I know you guys have seen that at Weights & Biases, and we've seen it across the board. What's interesting is that the amount of AI inference compute you need is a function of the number of users, how often they use it, and the size of the model. So as more people use applications that embed AI, and as they use them more often, the demand for inference compute goes through the roof. Estimates today put it at about 40% of the generative AI compute market, and we expect it to continue to grow faster than the training market, though the training market is continuing to boom.

Lucas: We've obviously noticed an explosion of companies doing inference, and it does make sense that inference should be more of the compute load than training; otherwise these things seem just unbelievably wasteful. But it does seem like inference is a challenging business, because the comparison is so stark between throughput and accuracy on a given model. I feel like we've seen this real boom-and-bust cycle with inference providers, where one company puts out an amazing result, and then a few weeks later another company puts out an amazing result, and it doesn't seem like there's a ton of lock-in when the API is literally a text query. I'm curious how you even think about how many chips to assign to something like this. How do you forecast demand for a launch like this?

Andrew: Well, first, your observation is right: if everybody's using the same hardware, then what you're doing is fighting for software optimizations on that hardware, and that's a tough battle. That's why we went to a fundamentally different architecture. All the GPUs and NPUs are based on HBM; they ride off-chip memory, and they all face the same constraints. We're a wafer-scale chip, an all-SRAM chip, and so we attack this problem and achieve solutions and performance that they can't achieve no matter how much money or GPUs or time or effort, because we have 7,000 times more memory bandwidth, and memory bandwidth is the binding constraint in this workload. You can't go faster than the speed of light, and they can't get more packets between memory and compute than the memory bandwidth allows. So "you've got to attack it architecturally" is the answer to your question. If you want a meaningful and durable advantage, Lucas, you've got to have a better architecture. That's number one. Number two, you asked how you prepare: you make an educated guess. I think people would like CEOs to always have the answer, and to keep up an illusion that there are good equations to tell you how much to prepare for. You sit with your team, you talk to your customers, you look at what others have done, and then you make an educated guess. That's how you start, and then you iterate quickly. In many optimization problems, you begin with a seed and you iterate off the seed. You begin with an educated guess, and you're ready to respond extremely quickly when you get it wrong.

Lucas: Okay, but I think it would be hard to iterate quickly if the iteration requires manufacturing a ton more of your hardware.

Andrew: Step one is: don't have this be your first rodeo. You have a team who's been delivering world-class chips for 30 years, so you have a plan. I think you also have capacity that you've allocated for training, so you have some fungibility. And you've notified your manufacturing partners a long time in advance that you may be seeking to pull forward equipment. I made it sound like you sort of guess, but in the business of having to forecast things that are very hard to forecast, you try to manage all the unknowns. You work with your manufacturing partners to be able to pull forward equipment. You work with your customers to say, "I've got a new launch here; we'll get you on this service; if you need a huge amount more, here's a date you can have it," and you set that expectation in advance. But just about everybody gets it wrong at the start. The forecasting of unknowable things is pretty hard.

Lucas: Totally.

Andrew: And it's the same in every business. We just sometimes pretend that we're better at it than we are, and that there's some science beyond doing all the thoughtful things I described.

Lucas: And where are your boxes going? They're kind of an unusual shape, aren't they? Do you have to build your own data centers? How do you do that?

Andrew: Yeah, that might be a bridge too far. I've not undertaken physical construction of a data center; I like to stay at five nanometers or smaller in the things I build. Our wafer-scale chips fit into a chassis we design, and the chassis fits in a standard data center rack. So we have hundreds of machines here in Sunnyvale, we have machines in Stockton, California, we have machines being stood up as we speak in Dallas, Texas, and we have contracted for power and space in Minneapolis, Minnesota. We have partners that have power and space and our equipment. That's what you do: you rent power and space by the megawatt and build out a big cluster.

Lucas: If your throughput is higher, or your latency is lower at the same accuracy by an order of magnitude, why would you also price it lower than comparable solutions?

Andrew: Oh, I think we intend to crush this market.
I mean, we intend to go at this market with force, and we think the following obtains. You and I both remember when the internet was slow and Netflix was mailing you DVDs. The internet got fast, and they started streaming. The internet got really fast, and then everybody started making little movies. When you make infrastructure fast, you enable multi-billion-dollar markets to sit up on top of it. And I think right now we're in the dial-up era of inference. It is slow, and when you're waiting for your response, you can almost hear that modem noise we used to get when the modems were hooking up. Our desire here is to make this market bigger, and to show people that there are all sorts of things that aren't obtainable when inference is slow that are dazzlingly interesting. Here's a very simple example. If you ask Llama 70B to do a math problem, often it will get it wrong. If you ask instead a harder question, if you say, "walk me through the logic you used to solve this problem," now it will take more tokens, it will walk you through it, and it will frequently get it right. Now imagine you can do the latter task in the time it used to take to do the former task.

Lucas: Yeah.

Andrew: Now you've got a better answer. So here's a very, very simple way in which speed turns into higher-quality answers.

Lucas: Totally.

Andrew: Right? And many of us now are using our own input prompts iteratively: we ask one of these models a question, and then we ask it to improve the response. You write a little bit of code for that, and now if you're 10 times faster, you get 10 iterations in the time one used to take; 20 times faster, 20 iterations. Each time, you're getting better-quality results, and that's what we're trying to deliver to the developer community. That's sort of the vision.

Lucas: We see a lot of customers that increasingly want to build complicated guardrails that involve calling LLMs in parallel, so I could see that really being unlocked.

Andrew: Hugely important. You want to use them for guardrails. You have a security posture; you want to do a check to manage hallucinations. There are all sorts of simple benefits in the existing mode, before you even go to breaking up the prompt into hundreds of mini-prompts and kicking out into an agentic world to reconstruct a response.
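Both patterns from this exchange (iterative self-refinement and guardrail checks fanned out in parallel) are a few lines of client code against any OpenAI-compatible chat endpoint. This is a minimal sketch, not Cerebras's API: the base URL, model name, and prompts are placeholder assumptions.

```python
# Minimal sketch: iterative refinement plus parallel guardrail checks.
# Assumes an OpenAI-compatible chat endpoint; URL and model are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="https://example.com/v1",  # placeholder endpoint
                     api_key="YOUR_KEY")
MODEL = "llama3.1-70b"                                   # placeholder model name

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content or ""

async def refine(question: str, rounds: int = 3) -> str:
    """Feed the model its own draft; faster inference buys more rounds."""
    answer = await ask(question)
    for _ in range(rounds):
        answer = await ask(f"Question: {question}\nDraft answer: {answer}\n"
                           "Walk me through the logic step by step and "
                           "return an improved answer.")
    return answer

async def guarded_answer(question: str) -> str:
    """Run independent guardrail checks concurrently to keep latency low."""
    answer = await refine(question)
    checks = await asyncio.gather(
        ask(f"Reply YES or NO: does this contain unsupported claims?\n{answer}"),
        ask(f"Reply YES or NO: does this reveal sensitive information?\n{answer}"),
    )
    if any(c.strip().upper().startswith("YES") for c in checks):
        return "Answer withheld by guardrails."
    return answer

if __name__ == "__main__":
    print(asyncio.run(guarded_answer("What is 17 * 24 - 9?")))
```

The connection to the speed discussion is the `rounds` parameter and the `asyncio.gather` fan-out: every extra refinement pass or check multiplies token traffic, so the faster the endpoint, the more of this structure fits inside an interactive latency budget.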
Lucas: It's an interesting space that I don't know a ton about. I feel like you've taken this real posture of David versus Goliath with Nvidia, and you've been one of the few people really willing to take shots at Jensen, both to journalists and, I saw, to his face at a conference where you both were. Do you feel like Nvidia is really your target? There's a whole bunch of companies with different takes on chips that are finding the inference market lucrative and finding various axes to compete on. Do you worry about those, or is it really Nvidia?

Andrew: I have taken some shots at Nvidia. They are the Goliath. I think also, as I've told you, there are three great CEOs of the last decade in our industry: there's Hock Tan at Broadcom, there's Lisa Su at AMD, and there's Jensen. What those companies have done is extraordinary, and if 10 years ago you'd had a choice, you'd have wanted to be their investor; you'd have wanted to be in their stocks, all of them. Going after them and competing doesn't mean I don't have enormous respect for each of them, and I think they've achieved legendary status. Now, that doesn't mean I don't think I've built a better product; we think we've built a better product for this work. It just means that we are David against Goliath, and they have approximately all the money.

Lucas: Mhm.

Andrew: And as for fighting with other startups, I wish them well too. The challenge is to beat the H100, and then beat whatever's after that, and whatever's after that. I think spending time duking it out over crumbs isn't the right strategy for startups. The right strategy is to go where the market's big, to go after the biggest players, and to do the things that they do, not a little bit better, but orders of magnitude better.

Lucas: So are you saying that AMD also feels like a target? Because from an ML perspective, they kind of feel like underdogs to most of the audience, I think.

Andrew: Look, from what we see in the market, they're going to have a great year; they're going to sell a ton of their parts. But I think they are the underdog; we're just a bigger underdog than they are. And when you're an underdog, size is not always an advantage.

Lucas: Totally.

Andrew: My old soccer coach used to say, "Andrew, we're small, but we're slow." The hope is that what you get by being smaller is alacrity: speed, the ability to adjust quickly, the ability to hire the absolute best people who just want to do engineering and don't want to deal with the collection of other things that come with being a big company. But look, we're a big customer of AMD. We use their EPYC Genoa processors in our training solution, and we have tremendous respect for the work they're doing. Sometimes you compete with your friends too.

Lucas: Man, you're a little less spicy in my interview than in some other interviews. Maybe I'm not as good an interviewer.

Andrew: Maybe you're witnessing a learning algorithm happening here. No, look, Lisa bought my last company, SeaMicro, and I got a chance to work with her, and she's extraordinary. We have the utmost respect for her and for Mark Papermaster and Victor and the team they've got over there. But I think when you're a professional David, and that's what I am, that's what my career has been, you want to compete against the best. You want to get to the big leagues. You want to face the best fastball and the best curveball. Those are the best CEOs in the business, and asking how we can beat them, that seems to be what's fun about the business. How can we deliver to customers using the advantages that we have: an ability to do extraordinary engineering, an ability to be fearless, to solve problems that others struggle with? And you know, the recent announcements around Nvidia's delays: these are problems we solved years ago.
Lucas: What do you mean?

Andrew: We attacked them years ago; we solved them years ago. And that gives just a sense of what you have to do to be a successful challenger.

Lucas: Sorry, I'm not totally up on the chip world.

Andrew: You're not up on rumors in the chip space? Lucas, why not? Come on, no research? Don't you have a team of researchers who are preparing all this for you? They had some packaging challenges that have caused some serious delays in their next-generation parts, and the problems there have to do with, in our estimate, a set of problems around coefficient of thermal expansion: how you put chips on interposers. These were things that we thought about, and problems we solved, half a decade ago. So when you see work you've done pay off, that's rewarding.

Lucas: Well, okay, I'll say from my perspective: one thing that I've noticed in the past month or so is rumors that maybe chip prices are coming down. I have no idea, but I have had a lot more outreach than typical from random people trying to sell me H100s and things like that. Do you think there's any truth to that? Could there be a contraction in demand?

Andrew: I think that brokers have a flood of parts, and they're the ones who are reaching out to software guys who don't buy hardware, offering parts. That's not a thoughtful sales effort.

Lucas: Right, especially to the CEO of a software company.

Andrew: It means an allocation that had been scheduled for somebody blew up, and there are some thousands of chips that hit the market. I think if you talk to the hyperscalers, they will tell you they don't have enough compute, they don't have enough space, and they don't have enough power to run it all. So among the guys who are consuming big blocks, there remains voracious demand. With partners like our partner in the UAE, who's building a sovereign cloud, as we see sovereign clouds emerging around the world, in the Middle East and in Europe, with customers like Aleph Alpha, customers in Singapore and elsewhere, there remains this huge sucking sound for AI compute. And I think we're just seeing it move into the big financial institutions.

Lucas: Yeah, right.

Andrew: At first, senior guys banged their fists and said, "we need AI," and then there were a bunch of VPs and directors running around. That was last year. And now it's, "oh, it can save a ton of money here, and we can create value with it here, and here's how we have to implement it, and this is what we need to buy." What was discussion in PowerPoint last year is projects and value this year, and I think we'll see that trend for the next two or three years.

Lucas: I like your positivity on this market, Andrew. I'm not sure that I share it. Normally I'm the software-tools guy saying the software-tools market is going to fall apart, that it's a bad time to be in software, in AI tools.
My honest take is that I feel a little nervous about the financial services market. We are seeing a lot of increased demand, for sure, but the ROI on a lot of these projects makes me a little nervous right now. I guess what I would say from my perspective is that in a lot of places we're looking, there are a lot of projects where you go, "hmm, is there really an ROI here?" Are projects just getting funded because they have an AI component? I really don't want to call out any customers that might be listening to this podcast, but I think we could all agree that this happens from time to time.

Andrew: No, look, there are projects, just like there are some startups, where you took a business plan out of your drawer, dusted it off, and added AI in 11 places, and now it's an AI plan. I think those are destined not to be successful. But I think the following: the companies that begin doing thoughtful projects will build a muscle of deploying and using AI, and there is a muscle there. It's probably the same in SaaS software, too: if you just throw it up and use it, it's not nearly as effective. You look at something like NetSuite, or all these tools that we use. The range of value you can get from it runs from "oh, a good ledger accountant could have done that in a leather-bound volume" all the way to the other end: "this is a tool that really helps us run and build the business very differently than we otherwise would." And that is not the tool; that's how the tool is deployed.

Lucas: Totally.

Andrew: And you've seen it. Sometimes your customer success team is working with your customers, saying, "guys, if you do it this way, this software can really help you differently." It's the same with AI. There are going to be some projects that fail, but the guys who own it, who step back and say, "whoa, next time we're going to do this and this and this," when they go the next time, they're going to crush it. So being on the sidelines, waiting until everything is smooth and perfect, in a rapidly changing environment with huge returns to learning, is a really bad strategy. If there are returns to learning, you need to be doing projects. They don't have to be wildly expensive projects, but you have to be engaged with the technology; you have to be trying to solve problems with the technology and with your data. That's the observation. Even if your first AI model fails and doesn't produce value, and what you learn instead is that your data is a mess (which is just about what everybody learns when they start to do real AI: their data is much less clean than they thought), even if you learn only that, and you immediately start to clean and organize and marshal resources to make your data better, that's a win. Because next time, whether it's AI or something else, your data will be in much better shape. So I think those who do will be ahead, for sure, in this space. Is that in line with what you've seen?

Lucas: Yeah, I think that's the positive spin on it. I think it's a good point: you need to be out there, you need to be learning, and you need to be working with customers. And I agree that long-term there's a ton of value here.
I guess we've both gone through boom and bust cycles, and it's incredibly hard to time these things. There's no doubt there's a fair amount of excitement, and a lot of it's warranted. But I think a lot of companies right now are struggling to turn very evocative technology into something that's clearly working on a repeatable basis for use cases they care about. And I think it's most extreme right now, in my opinion, in financial services, where there are so many use cases that really should work, and each one is kind of tricky to get right.

Andrew: You know, there's a famous paper by an economist named Paul David, titled something like "The Dynamo and the Computer." He's an economic historian, and he studied the adoption of electricity in the manufacturing sector. He did this because in the mid-'80s somebody posed the question: we see computers everywhere but not in the productivity statistics. Everybody's got a computer on their desk, we see them everywhere, but we can't find gains in productivity in any data. What he showed, which has now become received wisdom, is that at first a technology is brought in to do what's already being done, a little bit better. Think of the computer: it printed instead of typewriting, it provided word processing, it provided the services of a general ledger and all these other things. But we were pretty good at those things before, and so the gains were modest. When we reorganized our inputs, when we went to the cloud, we unlocked an entirely new way for software to be used, and you saw these massive jumps in productivity. I think there's some truth to that here, too. As we replace things we're already pretty good at with AI, we'll reduce failures, we'll get a little bit better, and the productivity statistics will be disappointing. But when we reorganize more fundamentally and really use what is at the essence of AI, I think you'll see massive productivity jumps, and those will be captured in lower costs and new services and cool applications that make people's lives better. I think that's the path. If at first we use an AI assistant just to double-check doctors' work, since doctors have a failure rate not unlike other professionals, that's great: we have fewer failures. And if we can put a flow together that catches things earlier, now we have a better chance of healing, of bringing the goal of health to the customer. So that's sort of the way I think about it. You're right that a lot of the ordinary projects will produce ordinary results at first, but that's the way technology has been adopted historically.

Lucas: That's a good historical reference. I'm excited; I love it.

Andrew: Yeah, the paper is really interesting. He showed that at first they brought in electric power and just used it as a backup for the belt-driven systems, and it wasn't until after World War II that they said, "holy cow, we can reorganize the shop floor now that we have power and the machines are more reliable," and you saw a huge spike in productivity. And I think it's sort of the same with the things we do.
A little better HR, a little better reading of résumés, is a reduction in cost, but you're not going to get a massive jump in efficiency. If we rethink the way our organizations operate, then you can get a major jump. I think that's what we've seen in the adoption of big technology.

Lucas: Okay, so pulling out the economics...

Andrew: I did that specifically because you ridiculed my undergraduate education. I mean, come on, we had to read books at Stanford in political science. I know that your engineering degree had a lot of reading of books.

Lucas: I think they forced me to take a humanities class. I vaguely remember it: European history from 1840 to World War I.

Andrew: Right, that's good stuff. [Laughter]

Lucas: Okay, but I do want to ask you this. I feel like your business, or the inference business at least, is a real bet on open-source software, which is kind of an interesting question. I've noticed that every CEO betting on open-source software seems to really believe in open-source software definitely winning. From my perspective, it seems a little less clear what wins. What is your feel? Are you sure that open-source models will be the future?

Andrew: I'm not sure. We today have customers that are closed-source and are running inference on our equipment; we don't depend on open source. I think there is a battle, and it's unclear who will win, just as it's unclear who's won in other open-source versus closed-source battles. Think of the number of large enterprises that run a true open-source OS, as opposed to a supported version of the open-source OS. There are lots of other areas where open source crushed it, but here I think it is not clear; it is not obvious. I do think it is healthy for the ecosystem, and good for everybody, that there is this unbelievable Cambrian explosion of innovation, and that it's not just one or two companies, and not just the ones you'd expect. A different way to say it, Lucas, is that the leading closed-source company was a startup nobody had heard of seven years ago. That's an amazing statement: the companies that have been around 40 or 50 or 60 years aren't the leaders in model building. Facebook has dedicated itself to open source, and that's producing unbelievably good models. Teams of 20 or 30 in France are producing extraordinary models. You look around, and there are lots of paths to doing really interesting work here. So I'm not sure what the strategy will be: some sort of hybrid where you begin on top of Llama and do some additional work there, or you pair open source with some closed source to do your prefetch or some portion of the work, or you write glue software that links together hundreds of smaller open-source models, where that glue software is still extremely complicated and provides a ton of value. One of the fun parts about this industry is that it's not storage; it's not stepping along step by step. It's dynamic, it's changing. There are lots of ways to attack the opportunity.
And you're seeing many different strategies play out, and that's really fun. Even when they're other people's strategies, you sit back and you say, "this company, wow, they're going after giant models with a collection of small models; all right, that's an interesting thing to watch." A confederacy of small models. Some other guys say, "we've got advantages in money; let's spend as much as we can to buy as much compute as we can, and we believe that whoever has the biggest compute will win." That's a different strategy. And watching all of these unfold is like a video game. It's really fun.

Lucas: Which company would you describe as that strategy, "we have more money, so we're just going to buy the most compute"? Who are you thinking of there?

Andrew: I think OpenAI has stated that they believe having the biggest compute is one of the foundational elements of winning. I think Elon's strategy of trying to set up some of the biggest training clusters on Earth is that strategy. It may well be the right one; I don't know. We have yet to see.

Lucas: All right, I'm getting some thoughtful takes but not some hot takes.

Andrew: Look, all this media coaching is working.

Lucas: It's inducing ridicule, goddammit. No, no, I've got a couple more that I really want to get to. One thing we didn't talk about last time is AI applied to actually manufacturing chips. We've been hearing a lot about that; a bunch of folks have come on and talked about it. How relevant are recent advances in AI to your actual chip making? Is that part of what's made you successful, or is that sort of separate?

Andrew: The EDA tools that are fundamental to chip design have used ML for a long time. They've used all sorts of statistical methodologies, they've used trees, and they've used all sorts of optimization-theory methodologies for a long time. Now, they haven't used what's sexy now in AI, the generative models, and I think it's still very early there. As is often the case, a new technology is forced to compete with a very mature technology: the guys are pretty darn good at designing transistors and circuits right now. Circuit design is pretty advanced. I don't think we will see machines designing chips, AI designing chips better than people, for three to five years. But I think there are all sorts of parts of every stage of chip design that could be improved, with time wrung out. This isn't replacing insight. There are repetitive tasks in place-and-route, and repetitive tasks in timing closure, that machines ought to be better than people at. So I think what will happen is that AI will be used around the core insight over time, and that will make chip design lower cost and faster. Then it will nibble its way in, over time, into some of the hardest problems, which is converting the desire of the chip architect into logic.
Lucas: But it sounds like you, inside your company, aren't doing a lot of this.

Andrew: No, we're not.

Lucas: Your bet is "we're going to make a giant chip," and that's that?

Andrew: I think it's something different. You need a huge amount of data across many, many chips to do this, and the people who have that data are TSMC and Samsung, or the EDA toolmakers. You want to play where you can win. What you'd like to have is insight over 500 or a thousand chip designs, to do real work, not to do marketing. Everybody can say, "oh, I'm using ML and AI here and there," but what you'd like is the database describing the logic of maybe every chip ever made. With that, I think, you could do a lot.

Lucas: Right, but who has that?

Andrew: Well, nobody. AMD has what they've made, Nvidia has what they've made, and Intel has what they've made. The assembly problem, and this is a general problem wherever data is not easy to aggregate, is what delays the use of data-intensive technologies like AI.

Lucas: Totally.

Andrew: Now, it doesn't delay the marketing, and it doesn't delay some of the wrapper, but for the core insight, we've got some time before it's better than engineers with 30 or 40 years of chip design experience.

Lucas: Awesome. Well, Andrew, thanks so much.

Andrew: How's your business, Lucas? I'll give you a two-minute commercial.

Lucas: I'll tell you, actually, we should totally integrate. There's a really natural integration that we should do with you before you launch this. We now have LLM evaluation and observability tools, so we could...

Andrew: Oh, cool.

Lucas: It's really popular, so it's the perfect example.

Andrew: We're already a customer of yours. And I think this is a perfect example of what I was saying earlier: AI will behave just like this, where people will learn that they're not using it as well as they could.

Lucas: I think that's exactly right.

Andrew: And that's no different from other cool tools. I'm sure there's a ton of stuff in your software stack that we haven't touched yet and that could be valuable to us.

Lucas: Totally. But yeah, we should actually ship this. We have a really nice evaluation system that we could run on you, against other stuff, and show people a third-party assessment of throughput and accuracy and speed on a bunch of benchmarks. We can also set it up so that your customers could do that themselves.

Andrew: Oh, that would be great.

Lucas: Because then it would be easy for them to see the economic value. If you give us your pricing (for people that give us the pricing, we can put it right in the application), people could really starkly see the difference. That would be really cool. So yeah, if you could give us some preview access to it whenever you can, we can sync it up so it updates automatically.

Andrew: We'll get you on today.

Lucas: Oh, perfect, let's do that. Our engineers will love this; they'll play around, and I think it'll blow their minds.

Andrew: Nice. You know, we're getting quotes from engineers that are just unbelievably rewarding. You know how nice it is when you've spent years of your life building something...

Lucas: Totally.

Andrew: ...and someone sends you three lines: "this is really fast."
Right? You come home to your wife and you go, "Today was an okay day." "Why?" And you list eight things that went wrong, "but I got an email from a guy I've never met who used our equipment, and this is what he sent." And your wife looks at you like you're out of your mind. So when you get those, you feel good, and we're getting them just raining down on us.

Lucas: That's awesome.

Andrew: It feels awesome, and I know your guys will be really fired up.

Lucas: Yeah. So look, cool.

Andrew: Thank you for having us on to talk about our new inference capabilities.

Lucas: Yeah, absolutely. We really appreciate it.

Andrew: It's always fun to talk to you, Lucas, and it was lovely to see your family in June. Seeing the little people running around was great.

Lucas: Totally. Watching them with an LLM is a pretty hilarious thing, I'll tell you.

Andrew: Yeah, for sure, for sure.

Lucas: Great to see you, Andrew. Thank you. Well, thanks so much for listening to this episode of Gradient Dissent. Please stay tuned for future episodes.