Building the World's Fastest AI Chip with Cerebras Systems' Co-founder and CTO Sean Lie

Published: Aug 01, 2024 Duration: 00:53:09 Category: Science & Technology

Introduction

Sean: Running ML models on large GPU clusters is really, really hard, to the point where almost every large model is a major research publication in its own right. Companies like OpenAI and Google have teams of hundreds of people not doing the ML, but doing the distributed systems engineering to figure out how to map that ML onto hundreds or thousands of GPUs. At the heart of it, the reason you have that challenge is actually pretty simple: when you take a really massive model and try to fit it onto a bunch of tiny little devices, you're using the wrong tool for the job. A lot of how we think about the problem here at Cerebras, coming back to this whole co-design and vertical design idea, is that you have to right-size the problem and right-size the tool for the problem.

Steve: Welcome to AI in the Real World. I'm Steve Vassallo, a general partner at Foundation Capital. In this series we talk with leading AI researchers about their groundbreaking work and how it's being applied in real businesses today. Joining me is Sean Lie, co-founder and CTO of Cerebras Systems. I've known Sean since early 2016, before leading the Series A investment in Cerebras. At the time, Sean was just wrapping up a three-year stint as chief architect for data center server solutions at AMD, which he joined via AMD's acquisition of his last startup, SeaMicro. Sean is one of the biggest brains I know in and around hardware systems, bringing a rare expertise and passion for advanced architectures, many of which break from traditional approaches to computing. In this episode, Sean guides us through the world of AI hardware innovation. He breaks down the concept of wafer-scale processing, the challenges of building monstrous AI chips, and the massive potential of sparsity. He offers valuable lessons from his journey as a deep tech entrepreneur, including the importance of thinking end to end and planning for scale from day one. We close by peering into AI's future and sharing advice for founders taking on similarly complex system-level problems. It was a really fun and insightful discussion that I think you'll enjoy. And now, here's my conversation with Sean. Welcome, Sean, and thanks for joining us.

Sean: Thank you for having me, Steve.

Sean's background in hardware systems

Steve: Maybe you could kick us off, Sean, with a quick potted bio: your background, and how you found yourself, around 2015, getting started with Andrew and your three other co-founders on this idea that became Cerebras.

Sean: If you think back to the 2015 time frame, the terms AI and ML were not yet what they are today, where everybody knows about them. It was right at the beginning of the modern AI era, when, pioneered by models like AlexNet, one of the very first models deployed and trained on GPUs, we started to see much, much larger neural network models than the industry had ever seen before. And they were getting insanely good results, results that felt almost magical. I think that's what captured our attention, because the ML by itself was just really cool. Back then it was all about ImageNet, and seeing that these models could recognize images, could do things that felt like only humans could do before, was really magical. But what was very different about this particular workload compared to things we'd seen before was that the only way AlexNet achieved what it achieved was by figuring out how to run these neural networks on GPUs, one of the very first models to do so. So it was very clear, at least to us coming at this from the computing world, that this was a very special application, a software workload with an unusually tight dependency on the underlying hardware. In fact, the AlexNet paper, which is now a seminal paper in this space, showed that there was almost no limit to how far you could push this: you make the model bigger and deeper, and it just keeps getting better and better. That's what really captured our imagination, and we thought there was going to be a huge opportunity here to build something specifically for this space.

Cerebras' origin story: Recognizing the need for AI-specific hardware

Sean: That team figured out how to use a graphics processor to run neural networks when everybody else was using CPUs, and that helped a lot; it was less bad than running on general-purpose CPUs. But we asked ourselves a very simple question: what if we were to design something from scratch, made just for this workload? What might that look like? That's ultimately what got this all started. And one of the biggest bets we took was that these models were going to be really big, so we built a really, really big chip.

Steve: I remember one of our get-togethers, probably in that 2015 time frame, before you had even incorporated, and this observation that the workloads were growing far, far faster than Moore's law. We'd been talking about Moore's law beginning to slow or decelerate for some time, but you were seeing the workloads, and it wasn't just at the hyperscalers, the Amazons and the Facebooks. I remember some of our due diligence calls, chatting with one of our portfolio companies in the martech space, and they were experiencing this as they trained their next generation of advertising technologies. So here you are, seeing this opportunity: GPUs are better than CPUs, but not what you'd build if you started from scratch with a clean whiteboard. Did that immediately turn into wafer scale? When did you say, wait a minute, in order to do this we can't screw around with postage-stamp-sized silicon, we're going to have to go big? I feel like in those early days there was almost a fear around saying the words "wafer scale," because it had been such a dangerous thing for so many companies over the last five or six decades. Tell us when it went to wafer scale.
Sean: I think the first thing was the realization of just how unique this computing problem was: how tightly coupled the memory and the compute had to be, and how tightly it all had to be tied together, to solve the problem. That realization honestly took a while to reach, because the entire industry, ourselves included, was used to a paradigm, made popular by Google and the other hyperscalers, in which all you needed to do was scale out using very loosely connected processors. That was what had been driving the computing industry for the decade or two before. So understanding what this workload really needed took time to internalize, because it ran counter to everything else the industry was doing. But looking back, it was actually very obvious. In fact, that very first AlexNet paper even needed two GPUs, so even some of the earliest models already didn't have enough compute on a single chip. When we finally got to the crux of it and understood just how tightly coupled the compute had to be, that's when we started to explore all sorts of different options. And you're right, there was very much a negative connotation to wafer scale. We did not immediately jump there, but we arrived there pretty quickly, because once you realize how much compute and how much interconnect you need, you start thinking through all the ways to do it traditionally, and they're all limited. And then one day we just asked ourselves: what if we could build a much, much larger chip? A lot of these problems would just go away.

Wafer-scale processing and its advantages

Steve: It's interesting, because wafers are now 12 inches in diameter, order of magnitude, and we're basically taking the largest square you can inscribe within a 12-inch wafer. The conventional wisdom is to take these massive wafers, slice them into many smaller chips, and then spend a whole bunch of money on networking gear and copper to reconnect all those pieces and cluster them. Why are we cutting this all up just to reconnect it again? You traded away that set of challenges, which comes with lots of overhead in complexity, cost, and software. I think most folks have realized that much of the software the likes of Nvidia have built over the last decade is about managing what's under the covers: if you could write to a single device, you wouldn't need so much complexity in the software layer to coordinate workloads across so many different systems, or so many different processors. So you saw this opportunity to replace the complexity of interconnecting many systems, but we traded it for a bunch of other challenges. If I think back to some of the crazy stuff we had to do in 2017 and 2018 around thermal management, stuff that today feels like, thank goodness we made it through, these were fundamental challenges in heat transfer. I remember doing some of the math: on a watts-per-square-millimeter basis, we were only off from the business end of a Space Shuttle rocket by a factor of about 20. We're at about half a watt per square millimeter, and if you're pushing, say, STS-135 into space, that's about 10 watts per square millimeter, so that's not that far off. And that's just one example of the things we had to manage; there were lots of other technical challenges. As you look back on the craziness of the tradeoffs we chose to make by building wafer scale, which ones are you most proud of? What feels, in hindsight, like: wow, that was crazy hard, so glad it's behind us?

Technical challenges and innovations in wafer-scale integration

Sean: You're right that the reason this hadn't been done before is that it wasn't easy, and you touched on the fundamental reason: driving enough power in, and ultimately pulling that power back out in the form of cooling, is a fundamentally different challenge when your chip is 60 times larger than a traditional chip. That was absolutely one of the biggest hurdles for us to figure out. We experimented with a lot of different ideas, some of them fairly traditional in terms of power delivery, with conventional power regulators, which are traditionally mounted on a motherboard right next to the processor. You quickly realize there's just not enough copper in the circuit board to carry enough current into the wafer. Where we ended up was a design that allowed us to mount the power regulators directly onto the wafer, in a vertical third dimension, so we are essentially driving current through the circuit board straight into the wafer. That innovation was ultimately a key enabler of this entire wafer-scale integration. Another really big challenge, one we recognized at the beginning but whose magnitude I don't think we really internalized, is thermal expansion. You'll remember the numbers better than I do, but I remember looking at the coefficients of thermal expansion of silicon and copper and thinking, how the hell are we going to manage that? We laugh about it now, but the number of wafers we cracked in the process, as we learned how to do this and invented new techniques for it, was humbling. The problem is that because the silicon is so much larger, as it heats up it expands more, in absolute terms, than a traditional chip. So we developed specialized ways to connect it to the circuit board that allow some variability and flexibility, so the actual connections can move.
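The two back-of-envelope numbers in this exchange are easy to sanity-check. Below is a minimal sketch in Python: the CTE constants are standard approximate material values, while the total power, die edge length, and temperature swing are illustrative assumptions, not Cerebras figures.

```python
# Back-of-envelope checks for the numbers mentioned above.
# All constants are illustrative approximations, not Cerebras specifications.

# --- Power density ---
wafer_power_w = 23_000        # assumed total power for a wafer-scale system
wafer_area_mm2 = 46_000       # roughly the largest square cut from a 300 mm wafer
chip_w_per_mm2 = wafer_power_w / wafer_area_mm2
rocket_w_per_mm2 = 10.0       # rough figure quoted for a Space Shuttle engine
print(f"chip: {chip_w_per_mm2:.2f} W/mm^2")                            # ~0.5
print(f"rocket is ~{rocket_w_per_mm2 / chip_w_per_mm2:.0f}x higher")   # ~20x

# --- Differential thermal expansion: delta_L = alpha * L * delta_T ---
alpha_si = 2.6e-6   # silicon CTE, 1/K (approximate)
alpha_cu = 17e-6    # copper CTE, 1/K (approximate)
edge_mm = 215       # assumed edge length of the wafer-scale die, mm
delta_t = 50        # assumed temperature swing, K
mismatch_um = (alpha_cu - alpha_si) * edge_mm * delta_t * 1000
print(f"Si/Cu mismatch over one edge: ~{mismatch_um:.0f} micrometers")  # ~150
```

Under these assumed numbers, the mismatch works out to well over a hundred micrometers of relative movement across the attach, which gives a feel for why a rigid connection cracks wafers and why the compliant connection Sean describes next was needed.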
Sean: We'll keep the details of that one for later. It's definitely another area where we're particularly proud of the innovation that went into this. In the end, when we step back, I think what really enabled us to do this, where others have failed, is that we thought about the problem truly end to end. This is not just a chip design problem, or just a problem at the fab. You have to think about it throughout the entire technology stack, and be willing to make changes and new inventions in the packaging just to enable the chip.

The importance of end-to-end system design in AI hardware

Steve: I love this point you're bringing up; I think it's often overlooked, and it actually speaks to your background, which would be fun to unpack a little more too. When you're making a very large system change, when you're inventing something that truly is new, not incremental, you really have to build every component of it. In this case there were the challenges we had to overcome around yield, a hard problem unto itself; the packaging we were just talking about; the power and thermal management issues; and then, once you get through all of that and it all works, the question becomes: okay, now how do I program this massively parallel system? You've got to take every piece of this and own it all. I'd love to have you talk for a minute about the kinds of people you need to assemble to take on a system like that. I sometimes talk about them as missionary misfits: folks who spike really hard in one or two domains, and often take those skills and expertise into an adjacent domain and unlock some really interesting insight that probably wasn't obvious to the people programmed, if you will, in that domain. At Cerebras I think we've filled the building, proverbially, with these missionary misfits. Talk a little about what's required when you take on a system challenge like this, one that involves hardware, software, embedded systems, compilers, graph theory. Literally, I don't think there's any domain in systems theory we haven't touched, including fluid dynamics.

Sean: Fully agree. We sometimes joke that we're not a startup, we're six different startups. And you're absolutely right: we realized very early on that if we really wanted to make a difference here, we had to design the entire solution, from working with the fab, to the chip design obviously, to building out the package, power, cooling, and the physical system around it, through all the layers of compiler software, all the way into the machine learning. In some ways that decision is itself a trade-off. On the one hand, there's a humongous opportunity when you do that: it allows you to blur the boundaries between traditionally separate technologies. The flip side, which is what you're getting at, is that it's not easy, because you're effectively having to innovate across the entire technology stack, even in areas you might traditionally think of as solved problems. Server design, for example, is not normally thought of as a highly innovative field, but when you have to accommodate a wafer-sized chip in a package, you have to rethink that as well. So the only way to make it work, in our minds, is to surround yourself with incredible talent across the spectrum. And you hit on a key characteristic that most of our engineers have: not only are they experts in their field, they have the mentality and the open-mindedness to work at the boundary between their field and the adjacent ones, and to be fluid about it. We pride ourselves on having a very vertical co-design mentality on this team; without it, this would simply not be possible.

Introduction of the Wafer-Scale Engine 3 and its capabilities

Steve: Let's shift gears and talk about some of the software challenges. We now have this extraordinary platform, Wafer-Scale Engine 3, which we announced about a month ago. As a reminder, this is the largest and fastest AI chip: roughly 900,000 cores and four trillion transistors, and I'm sure Sean will hit on some of the other ridiculous memory specs. Probably the thing that most excited me is that we can cluster 2,000 of these things, which is just mind-blowing, and I know we'll have customers and partners that actually do that; the fact that it can be done probably means it will be done. So you've got this amazing piece of infrastructure, and yet the thing that makes it most interesting is that it really can be addressed as a single device. Instead of a machine learning engineer having to figure out how to lay a big model, one they hope to iterate on quickly, onto the hardware, it's just insanely simple. In other words, the hardware affords new approaches to the software. I'd love for you to talk about what that means, what it has unlocked, and maybe give a few examples of ways it has excited or even surprised you to the upside.

Sean: Absolutely. For folks who haven't done this before: running ML models on large GPU clusters is really, really hard, to the point where almost every large model is a major research publication in its own right, and companies like OpenAI and Google have teams of hundreds of people not doing the ML, but doing the distributed systems engineering to figure out how to map that ML onto hundreds or thousands of GPUs. At the heart of it, the reason you have that challenge is actually pretty simple:
when you take a really massive model and try to fit it onto a bunch of tiny little devices, you're using the wrong tool for the job. A lot of how we think about the problem here at Cerebras, coming back to this whole co-design and vertical design idea, is that you have to right-size the problem and right-size the tool for the problem. A key capability, since our chip has so much performance and is so large, is that we can run an entire state-of-the-art large model on a single chip.

Steve: Love that.

Sean: And once you can do that, you no longer have to split the model. To get acceleration, all you have to do is replicate. That's called data-parallel replication, and it's actually very simple: you're literally just copying the model, replicating it, and running it all in parallel. We've built an entire cluster architecture custom-built with exactly this model in mind, so that, as you said, from the user's perspective, whether it's one system with one wafer in it, or hundreds, or up to 2,000 in our current generation, the programming model looks exactly the same. That's been revolutionary, and it's really one of the key enablers of what our customers have seen.
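To make the copy-paste intuition concrete, here's a minimal sketch of data-parallel replication in plain Python with NumPy. It's a generic illustration of the technique Sean names, not Cerebras software; the toy linear model, the gradient function, and the replica count are all placeholder assumptions.

```python
import numpy as np

# Minimal illustration of data-parallel replication (not Cerebras code):
# the model is never split; each replica holds a full copy of the weights,
# processes a different shard of the batch, and the gradients are averaged.

def grad(weights, x, y):
    # Gradient of a least-squares loss for a toy linear model y ~ x @ weights.
    return 2 * x.T @ (x @ weights - y) / len(x)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(64, 8)), rng.normal(size=(64, 1))
weights = np.zeros((8, 1))
n_replicas, lr = 4, 0.01

for step in range(100):
    # "Copy-paste" the model: every replica uses the same weights,
    # but sees only its own shard of the global batch.
    shards = zip(np.array_split(x, n_replicas), np.array_split(y, n_replicas))
    grads = [grad(weights, xs, ys) for xs, ys in shards]
    # All-reduce: average the replica gradients, then apply one shared update.
    weights -= lr * np.mean(grads, axis=0)
```

The point of the pattern is that nothing about the model itself is partitioned; scaling out only changes how the batch is sharded and how gradients are averaged, which is why the programming model can look identical at any cluster size.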
Cerebras' approach to scaling AI models

Sean: You were asking about some of the things we're most excited about or proud of. We've had customers come to us and say: we have a ton of GPUs, but we can't get the problem to scale beyond 64 GPUs; we try it on 128 and it doesn't run any faster, and the model doesn't converge any faster. Why is that? Because it's very hard to take the problem, split it into 128 small pieces, and have them all coordinate and communicate with each other in a way that keeps delivering speedup. That same model, which took this particular customer two months to train on 64 processors, we trained on our cluster in, I think, three days.

Steve: Yeah, it was a long weekend; I remember this. And I think on their side it actually wasn't even done at 70 days, so it was still working away. You said something to me maybe a month or so ago that reminded me of the early days of the company and the observation that these workloads were exploding much, much faster than Moore's law could keep up with. If we look from when we launched the Wafer-Scale Engine 1 at Hot Chips, in, I want to say, August of 2019, to today, almost five years, the compute requirements have grown, I think, by 40,000x. Is that roughly where we're at?

Sean: That's absolutely right. Back then the most state-of-the-art language model was BERT, and today, if you consider GPT-4 the state-of-the-art model, there's about a 40,000-times difference between those two models in terms of overall training compute.

Steve: It's insane. Do the next five years look like an extension of that? Is it another 40,000x, or do you think it's 400,000?

Sean: I think it wants to be another 40,000 or more, but I actually think this is a fundamental problem our entire industry is facing: using the current approaches, I don't think there is another 40,000. The math basically doesn't work. At some point you're going to be deploying multiple football fields' worth of silicon, and spending tens or hundreds of billions of dollars, just to train a single model. At some point it just doesn't work. And we haven't even talked about the power requirements.

Steve: Right, the power.

Sean: Exactly. So I think that is the fundamental challenge the whole industry is facing, and it's very much at the heart of what we're trying to solve here at Cerebras.

Steve: I want to switch to sparsity in a minute, but you're reminding me of a concept from Carver Mead. He issued this call to action in 1980 when he told engineers to basically waste transistors. Carver was one of the fathers of neuromorphic computing, and of the concept of very large scale integrated systems, and this idea of wasting transistors seemed out of its time. But think about what it made possible, for example going from command-line interfaces to GUIs. When I think about some of the programming I did as a kid, you had to be so careful and sparing in your use of memory that the idea of moving a mouse around to move a cursor on a screen would have been the most wasteful, computationally irresponsible thing to do. But when you can waste transistors, you can begin to unlock some very interesting use cases, things that would otherwise have been foreclosed, and it opens up a world of creativity. I think about what we're doing, in some sense, as another version of that: waste training, in other words not having to train over the course of 70 days, but being able to train much more quickly, in hours, maybe minutes at some point, so you can build more models and experiment with more ideas much more quickly. I realize it's a 40-year-old metaphor, going back to Carver Mead, but I think there are some interesting parallels. Maybe shifting gears: I feel like one of the other innovations we've really pioneered with the Cerebras system, and again this is less about the hardware and more about the system, is this concept of sparsity: taking advantage of the fact that we know a lot of matrix multiplication is overparameterized, that there are too many many-to-many computations, lots of math that only results in the number zero, which maybe we shouldn't be doing. Talk a little about what we're doing in sparsity and why it should matter.

Sparsity in neural networks and Cerebras' hardware acceleration

Sean: I think you're absolutely right. In a lot of ways, when you think about what neural networks are, the process of training a neural network essentially creates a very overparameterized model, a model with a lot more parameters than it actually needs, and training is, in some intuitive way, discovering which of those parameters are actually important and which are not. Even if you look at the actual structure of these neural network layers, they're all-to-all connected: every neuron connects to every other neuron, because when you start training you don't know which connections are important. The purpose of training the model is to discover the values, and which of those parameters actually matter. So sparsity, this notion of parameters you can remove from the model, is inherent in the neural network architectures themselves, and it's an area that's generally well understood because it is intuitive. However, the way these models are implemented and executed on traditional hardware like GPUs doesn't let you take advantage of the sparsity. The reason, as you alluded to, is that GPUs, and even other popular accelerators like TPUs, are basically dense matrix multiply engines: they put a ton of FLOPs into doing dense matrix multiplies. That means they can't actually accelerate anything when it turns out a bunch of the elements within the matrices are zeros. They'll go right ahead and multiply and add by zero, and guess what: multiplying and adding by zero doesn't change the answer.

Steve: That's interesting math. Let's not do that.

Sean: You don't need a PhD in AI to know that. But it's very hard for a matrix multiply engine to extract out those zeros. We had that observation very early on, and coming back to this whole concept of designing a hardware platform from scratch for the workload, we knew it had to support sparsity, because these models have so much inherent sparsity. So we designed our hardware to accelerate unstructured sparsity: the zeros can be anywhere within the matrices, and every time we see a zero, not only do we save the power, because we don't execute it, we can skip it and move on to the next nonzero element, and you get speedup. And coming back to your earlier point about 40,000 times more compute, I believe sparsity is one of the key ways to get there. With brute-force dense matrix multiplies it's going to be very hard, but we can make the computation much more efficient by skipping all of these redundant parameters. We're very proud that we had this insight at the very beginning, and right now we're the only commercial AI accelerator that accelerates unstructured sparsity. As I said, not only is it producing acceleration now; I really believe this is one of the vectors that will enable the next big step for the whole industry.

Steve: I'm so glad you unpacked that. We're known, particularly because the hardware is so impressive, for chip architecture, system interconnect fabric, data I/O, lots of reasons we're getting so much more out of the hardware. But when I think about our vectors of innovation, sparsity is actually the largest lever, and it's also the one that I think has so much more room to run.
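As a toy illustration of the multiply-by-zero point, here's a sketch in plain Python. It is not how Cerebras' hardware works internally; it just counts multiply-accumulates to show why skipping zeros located anywhere in the matrix (unstructured sparsity) saves work while leaving the answer unchanged. The matrix size and the 90 percent sparsity level are arbitrary assumptions.

```python
import numpy as np

# Toy illustration of unstructured sparsity (not Cerebras internals):
# a dense engine performs every multiply-accumulate; a sparsity-aware
# path skips zero weights entirely, saving both work and power.

def dense_matvec(w, x):
    ops, y = 0, np.zeros(w.shape[0])
    for i in range(w.shape[0]):
        for j in range(w.shape[1]):
            y[i] += w[i, j] * x[j]   # executes even when w[i, j] == 0
            ops += 1
    return y, ops

def sparse_matvec(w, x):
    ops, y = 0, np.zeros(w.shape[0])
    rows, cols = np.nonzero(w)       # zeros can be anywhere: unstructured
    for i, j in zip(rows, cols):
        y[i] += w[i, j] * x[j]       # only nonzero weights are touched
        ops += 1
    return y, ops

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
w[rng.random(w.shape) < 0.9] = 0.0   # prune 90% of the weights
x = rng.normal(size=256)

(yd, dense_ops), (ys, sparse_ops) = dense_matvec(w, x), sparse_matvec(w, x)
assert np.allclose(yd, ys)           # multiplying by zero never changed the answer
print(f"dense: {dense_ops} MACs, sparse: {sparse_ops} MACs "
      f"({dense_ops / sparse_ops:.1f}x fewer)")
```

At 90 percent sparsity the sparse path does roughly a tenth of the work for an identical result, which is the kind of headroom Sean is pointing at.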
Steve: Maybe, without getting into the realm of the proprietary, are there other vectors of innovation you're excited about exploring, that we're pursuing? I think about numeric representation as another area of opportunity; we don't talk much about that. What are some of the other vectors where you think we can gain purchase on this opportunity to deliver more benefit, without having to 40,000x the compute requirements?

Sean: I think numerics is a good example. Obviously we spoke about sparsity, but these all fall under the broad umbrella of ML co-designed with the hardware: thinking about changes you can make in the ML algorithm that allow the hardware to operate a certain way, or changing the way the hardware operates to allow the ML to explore different architectures. I believe that's where the next wave of big step-function improvement is.

Steve: Another towering figure in the field of computing, Alan Kay, said that people who are really serious about software should make their own hardware. You're making the point that if you really want to challenge the old approaches, you've got to be willing to work at every level of the stack. Maybe shifting gears: do you see any other big, untapped opportunities in AI hardware? We're obviously working on a very substantial one with Cerebras, but when you advise entrepreneurs, or people grab you at the end of a keynote to tell you about their cool idea, what are some of the things you're excited about, or at least curious to learn more about?

What's next for AI hardware?

Sean: There are a few areas I'm really excited about, and not surprisingly they all fall into this co-design space. Pushing numerics to the extreme is very interesting. Even on standard hardware today, people are talking about training in four-bit math, and I think you can push it even further, but you can't do that without making changes on the ML side. Once you do, two-bit and even one-bit models start to become promising, or at least viable.
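For a feel of what low-bit numerics involves, here's a minimal sketch of symmetric per-tensor 4-bit quantization of a weight matrix. This is a common textbook scheme chosen purely for illustration, not a description of any specific training recipe Sean has in mind; the tensor shape and scaling rule are assumptions.

```python
import numpy as np

# Minimal sketch of symmetric per-tensor 4-bit quantization (an illustrative
# textbook scheme, not a specific Cerebras or industry training recipe).

def quantize(w, bits=4):
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax             # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(128, 128)).astype(np.float32)
q, scale = quantize(w, bits=4)

# Packed 4-bit codes would take 8x less memory than fp32 (this sketch stores
# them in int8 for simplicity); the price is rounding error in every weight.
err = np.abs(dequantize(q, scale) - w).mean()
print(f"mean abs quantization error: {err:.4f}")
```

Pushing from four bits to two or one shrinks storage further but grows the rounding error, which is why, as Sean notes, the ML side has to change to compensate.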
Sean: The other general area where I feel there's a lot of promise is more dynamic models. Today these models are generally static: they're known exactly up front. The Llama model has 80 layers, each layer is exactly this size with exactly these sublayers, and so on, and then all you're doing is running data through it. I think there's a very promising vector where you change the shape and the size of the model based on the data that runs through it. Imagine a world where the depth of the model changes based on how complicated the data is. If the data that comes in is relatively simple, maybe you don't need such a deep model.

Steve: Right, so simple in, simple path; complex in, complex path. Again, back to this notion of letting the system scale with the workload.

Sean: Exactly. That's another area where even intuitively you can see there might be promise, but it's not actually easy to run those types of models on existing hardware, again because of that rigid matrix multiply engine at the underlying core. So it's another of these opportunities for hardware/ML co-design. I think there's a very rich space here that can be explored, and often isn't, simply because the hardware to explore it on doesn't exist.
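The depth-follows-difficulty idea Sean describes is closely related to what the research literature calls early exit or adaptive computation. Here's a minimal hypothetical sketch: a stack of layers with a confidence check after each one, so easy inputs leave early and hard inputs use the full depth. The layer sizes, the confidence rule, and the threshold are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of a dynamic-depth ("early exit") model: simple inputs
# leave after a few layers, hard inputs use the full stack. The layers,
# confidence rule, and threshold are illustrative assumptions.

rng = np.random.default_rng(0)
DIM, DEPTH = 16, 8
layers = [rng.normal(scale=0.3, size=(DIM, DIM)) for _ in range(DEPTH)]
classifier = rng.normal(scale=0.3, size=(DIM, 2))  # shared exit head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, confidence=0.6):
    p = None
    for depth, w in enumerate(layers, start=1):
        x = np.tanh(x @ w)                 # one "layer" of the stack
        p = softmax(x @ classifier)
        if p.max() >= confidence:          # confident enough: exit early
            return p, depth
    return p, DEPTH                        # hard input: full depth used

probs, depth_used = forward(rng.normal(size=DIM))
print(f"exited after {depth_used} of {DEPTH} layers")
```

Note the irregular control flow: how many layers run depends on the data itself, which is exactly the kind of pattern a fixed dense matrix-multiply pipeline handles poorly.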
Closing thoughts: Advice for deep tech entrepreneurs

Steve: In the last few minutes before we let you go, I'd love to gather some of your thoughts and advice for entrepreneurs who are early in their journey. Imagine the 2016 version of Sean, just as you were starting Cerebras with Andrew and crew. From what you've learned about building a deep, hard tech startup, and all the challenges that I'm sure yielded some epigenetic rewiring of your brain, what would you like to pass on to the younger version of Sean?

Sean: That's a great question. If I were to rewind, the first thing I would warn myself is: don't do it. A version of that, anyway: this is hard. Doing a deep tech startup, especially at the kind of scale we're talking about, is hard. It is by far the hardest thing I've ever done, and it's not for the faint of heart. In terms of advice, really understand what you're getting into. We made a very conscious decision to go after a really massive problem, and if you're going to do that, make sure you're set up to do it across the board. That isn't just hiring the right people, though that's a huge part of it. It's making sure you set up the rest of the business to support it: that you're working with the right investors, people who ultimately understand what you're trying to go after. It affects the business model you're trying to build. One of the lessons that probably took me longer to learn, coming at this from the technical side, is that while deep tech startups are obviously all about the technology, it doesn't work unless you have all the other supporting and surrounding infrastructure, so to speak, and you have to think that through up front. That's something I'm pretty proud we were able to do from the very beginning: it meant we were able to hire the right people, and to work with the right types of investors, like yourself, and the right types of customers, early, almost as partners.

Steve: Yeah, think about how important it is to choose the right, highly aligned design partners, who will forgive you when the thing doesn't work perfectly, and who in some sense become an extension of your team. You're reminding me of a couple of thoughts as I think about lessons to pass on to the next version of Sean, who we want to meet, by the way. Lots of folks who work on deep tech, hard tech, frontier investing forget that it's not just supposed to be hard to do; it must be worth doing. One of the things I'm so proud of you all for, and it was clear from day zero, is that we weren't going after 20 percent better, or 50 percent better; even 10x better gets chipped away at relatively quickly in this domain, as we know. You've got to be three orders of magnitude better for it to really matter, and you went after that: the very first versions of what we were building, in the simulations we were running, were on the order of 1,000x better. So that really matters. The other thought ties back to the thermal management challenges we were joking about at the top of the podcast: debugging very complex systems issues is tricky, because the big problems hide smaller ones; the small ones lurk in the big ones' shadows. When you solve the big problem, there's another set of problems behind it that you couldn't have seen until you solved the big one. So rush to the fight with the biggest problems as quickly as you possibly can, so that you discover the ones hiding in the shadows right behind them. Those are a couple of quick thoughts. Unless there's more advice for entrepreneurs, I'd love your thoughts on this massive stepwise shift toward AI and all the massive markets it's going to influence and disrupt, everything from energy to healthcare, markets that are 15 to 20 percent of GDP. When you think about decades from now, how will this moment we're living through, the last five years and the next five, be remembered? And what excites you about what will be possible in five years that we couldn't even have imagined if we'd done this podcast in 2019, five years ago?

Sean: Absolutely. There are maybe two general ways I think about that. First, there's the societal impact this is going to have. Even if I take the Cerebras hat off, the time we're in is, without a doubt, a pretty magical, once-in-a-generation, or multi-generation, kind of time. What's so exciting is that, as you said, this is going to enable capabilities we can't even imagine right now. We're in a super early phase, and AI is interesting in that it's the type of technology where it's super easy to come up with a bunch of simple use cases, because for pretty much anything you can think of doing, you can imagine an app or an AI that could do it: it can write emails, it can do marketing, it can write tweets, it can be an assistant. But I think the real impact will come when we go beyond those kinds of trivial things, and it's helping us solve healthcare problems we wouldn't even be able to solve without it. Things like that will have the lasting societal impact. That's my bet, and it's why I'm so excited that we get to play a small part in enabling that future. Second, setting aside the general public and society point of view, as a technologist I think this time is going to be remembered as the time when hardware innovation was reborn. If you think about hardware, chip design, and processors ten years ago, and I was in that field, it was kind of boring. You made incremental improvements from generation to generation, but you weren't really enabling anything new. Now we're in an era where, without the hardware, you couldn't enable the ML, and I think there's a reason almost every big hyperscaler is building its own hardware. That's one of the big transitions this whole modern AI field has brought on: a resurgence, a rebirth, of the importance of hardware. And I think that's a great thing, especially given that I'm a hardware guy.

Steve: And it gets back to this point of the interplay between hardware, software, and embedded systems, and the notion that these iterative algorithms, as you were alluding to before, are just super special, in that they can lead to very complex artifacts that transcend the complexity of their antecedents. They create opportunity that couldn't have been predicted, and you need not just the algorithms but the hardware to enable that. Of course, we won't always understand how they're doing it, and we've still got to gain purchase on that specific topic, but I agree: it's the full stack that enables all of this to be possible. Awesome, Sean, so great to hang with you. Are there any other resources, again in the spirit of educating the next generation of entrepreneurs hoping to follow in your big footsteps, books, podcasts, Substacks, anything you'd point folks toward to help them on their journey?

Sean: That's a great question, and I'll start by saying I'm not generally the kind of guy who reads books and listens to podcasts about business or building startups. But if I had to pick something, and this comes back to a lesson I feel I learned the hard way: Reid Hoffman has a great podcast on scaling. As I think back to the early days, and even very recently, one of the hardest things has been figuring out how and when to scale. That's the product itself. That's the team: I can't do it all myself; there are only 24 hours in a day, and I need to sleep for a few of them. That's scaling our customers and the business. One of the things I feel many entrepreneurs, myself included, don't think about early enough is planning to scale. I would tell any aspiring entrepreneur: not that you need to be scaling out of the gate, but you need to be thinking about it, preparing for it, and making sure that what you're doing doesn't lead to a dead end where you have to redo everything to figure out how to get to the next level.

Steve: It's a great point, and it relates to something that's certainly important in our business on the investing side: there's a timing element to so many of these opportunities. We obviously spent a lot of time understanding the market opportunity, and we talked about how compute and AI and ML workloads were exploding back in that window between 2013 and 2016, but we couldn't have predicted that the next inflection after AlexNet would be Transformers in 2016, and this just ridiculous acceleration over the last handful of years as a result of all that effort. It's very hard, nearly impossible, to make the waves, but you want to be ready to ride the wave, and to ride the front half of the wave, because climbing up the back half of the wave is pretty darn hard. I'm just so excited by what we're doing at Cerebras. Before we started this call we were riffing on
some of the things we're going to announce over the next couple of months, so there's more to come. Hopefully folks got something out of this podcast. Sean, I'm hugely grateful for all the time you've spent with us, for all the work you've done at Cerebras, and for your leadership on some very gnarly technical challenges. We'll continue to innovate and launch some amazing things. For anyone out there looking to get hold of Sean, you can find him in the usual spots, and let us know if anything comes up; we'll follow up in the comments and look forward to continuing the conversation. Thanks again, Sean. Great stuff, and congratulations on all the amazing technical and commercial success you've helped drive at Cerebras.

Sean: No problem. Thank you, Steve, for having me today, and of course for all your support over the years here at Cerebras.

Steve: Awesome, thanks. We'll talk to you later.
