Published: Sep 03, 2024
Duration: 00:36:25
Category: People & Blogs
This is Elton Sherwin, the producer of these Rotary podcasts. Today we are going back to a meeting that first aired in March of 2016. It is a presentation and an interview with Brewster Kahle, the founder of the Internet Archive. But first, if I could ask a favor of you: if you are enjoying these podcasts, could you recommend them to just one other person? Also, we have recently released a new podcast series called Development Success Stories. It is a subset of these podcasts that deals specifically with economic development and well-being. We have noticed that these episodes have been particularly popular, so we duplicated them in their own series. If you know individuals who are in public service, work for NGOs or foundations, or who study or teach development economics or public health, Rushton and I and the whole crew at the Rotary E-Club of Silicon Valley would appreciate it if you could recommend our new series to them. It's called Development Success Stories, and it is available on Spotify as a video podcast and on Apple, Amazon, and elsewhere as an audio podcast. On to this week's presentation and interview.

Hello everybody, this is the Rotary E-Club of Silicon Valley. Every week we come to you with stories about innovation, entrepreneurship, and education as they regard service to others, and we are always looking for those interesting stories that can inspire one to see the world in new ways and think about ways to improve their community, big or small, through technology, through old-school means, whatever it might take. To that end, we love getting really interesting people to join us and talk about the work they do. This week our speaker is Brewster Kahle. Brewster Kahle is the founder and digital librarian at the Internet Archive. If you're like me, you knew for years that this was around, but you never really thought about it until you got connected to it, in my case through a really interesting article that got me to go up and visit in San Francisco, at which point I
became a major fan of the kind of work that they are doing. Brewster is a former MIT person who has been involved in startups and who has taken a real passion and made it something very real in terms of building what we'll call the Library of Alexandria 2.0. I'll let him speak to that in more detail, but please welcome Brewster Kahle.

Rushton, thank you very much. I really appreciate meeting you last week and doing this hangout. I thought what I could do is give an overview of what it is we're doing with this digital transition: when everything goes digital, is it better or worse? Hopefully it's better. I could give an overview of what the Internet Archive is and then leave time for Q&A. Does that sound like a good idea? All right. It's not going to be a blithering set of PowerPoint slides, just a few to go over some of the basic ideas of what's in the archive. So I'm going to screen share.

So what is the Internet Archive? The Internet Archive is a 501(c)(3) nonprofit library based in San Francisco, and the idea is to try to build the Library of Alexandria, version two: to make all published works of humankind available to everybody that wants to have access to them. Can we build that vision of the internet? Can we build universal access to all knowledge? Technologically, you can actually calculate it out, and it's actually doable. Can we get there through the rights issues, through the roles and responsibilities of different organizations?

So what are we? Well, we're a library. This is our headquarters building in San Francisco, and it's full of techies, basically. It looks a little bit more like a high-tech organization than your classic library. There aren't books in this building. There are people working away with a big fiber optic line. We've got a series of media types that we're
trying to go do. If we want the universal access vision, we break it down: books, music, video, software, web pages. How much is there, and how do you get it online?

Our smallest collection is our software collection, where I think we've got around 990,000 software titles now. You can actually go to archive.org and launch and run software that was for the IBM PC or the Apple II or the Commodore 64 or the Atari 2600, and it runs in emulation, in JavaScript, in your browser. If you go to archive.org, and I suggest you do try it, go and wander around in the software collection and try to find something you might want to run that you may have used in years gone by. But it could also be productivity software, VisiCalc for instance. So we try to preserve software and bring it to another generation, even though the underlying hardware platform may be long gone.

We also have a physical archive, where we're actually collecting physical materials. We tried to stay away from this and let libraries do it, but we found that libraries are throwing things away. So where we wanted one digital copy of every book ever published, we're now trying to collect one copy of every book ever published, and we've got about 2.3 million physical books now, and also books that are being scanned in lots of libraries. So that's our book collection. We're also collecting movies and music, and we try to store these densely and cost-effectively, with good preservation, where the access copies are the digital ones and these are kept in storage for long-term preservation.

We also have a moving image collection, about two million movies. We either digitize them ourselves or people upload them to the Internet Archive. If you go to archive.org, there's a little upload button. Please try it; go and add them. We also get a lot of physical movies donated, and we've been digitizing educational films, industrial films, old advertisements, those sorts of things, and we make those
available. They're much more popular than I would have imagined. The one in the background here is "Are You Ready for Marriage?" which is a social behavior film from the 50s or 60s, I don't know. It's all pretty weird, but people are very interested in trying to understand the 20th century by using these moving images that weren't necessarily from a Hollywood kind of perspective. We're also doing home movies and the like.

We have lots of audio recordings, a lot of concerts. There's the tradition that the Grateful Dead started, allowing people to record and share their concert recordings as long as nobody makes any money, and it's been copied by lots of other bands. So a lot of bands have given their permission for their fans to store these concert recordings, at very high resolution, on the Internet Archive, as long as no one makes any money, and it's working. We've got 6,000 bands and 130,000 concerts, and lots of other audio recordings. We're getting better at 78s, LPs, and CDs. We just got a collection of 300,000 CDs and 40,000 78 RPM records, and we're starting to get the technologies to bring these somewhat online. You can't go and put all of these online, because people get mad at you over rights issues, and properly so. So we're trying to figure out how to strike the right balance, providing some level of access and maybe directing people back to where to buy things.

We've also been recording a lot of television since the year 2000: Russian, Chinese, Japanese, Iraqi, Al Jazeera, CNN, Fox, 24 hours a day, DVD quality. We've been doing a thing called the Political Ad Archive, politicaladarchive.org, and it is a site that allows you to see all of the political ads in 20 markets in the United States. So we're learning where all of the ads are played and how often, and then we're linking them with LFA and other sunshine organizations to try to say who's behind it, where the money is coming from, and who's behind the organization that's fronting the
money, blah blah blah. So the Political Ad Archive is another service that we do.

We have 4 million ebooks. These are digitized books, and you can search and find and download and play with them. If they're modern books, from the 20th century, then we lend them, one reader at a time, through the openlibrary.org site. So there's archive.org, which has the Wayback Machine and the television archive; there's politicaladarchive.org; and there's openlibrary.org: three sites that you might want to play with.

We have lots and lots of web pages. These are web page collections that we've been archiving from the World Wide Web since 1996. You can type in a URL (it's not by keywords; we don't know how to do that yet) and see past versions of a website. That's a way of being able to see the web as it was, basically. Or if there's a 404, document not found, you can maybe find it in the Wayback Machine. So we're trying to make the web reliable with this, but also to serve researchers, so they can mine the collective knowledge of humankind over time and come up with new ideas.

So those are the basic collections of the Internet Archive. We now have about 25 petabytes of data archived. These are some of the servers we have running here in San Francisco, where lights are blinking as people are using the collections. We also help out by giving away internet access through a rooftop network, mostly in San Francisco and Richmond. That's kind of fun. We're trying to figure out how to be sustainable, with our employees being able to keep working here, so we bought an apartment building to have debt-free housing, and started a credit union to help our employees and other people. That's actually been a real struggle, and we actually shut that down: regulatory pushback, not wanting new ideas around. We lost that one. And we're also looking to
lock the web open, and we're trying to make a more archivable web, a more reliable web. So that's the Internet Archive in a nutshell. Hopefully you can come to our website and participate by uploading, using it, commenting, or just playing around: archive.org, openlibrary.org, as well as politicaladarchive.org. I'll turn off my screen sharing now and we can do some Q&A.

Absolutely wonderful, Brewster. It's so exciting to see the possibilities that are part of this for kids in schools learning to properly build on the ideas of others. That's one of the challenges we face in schools everywhere: how do you get kids to properly cite their sources and understand which ideas are theirs and which ideas are those that they're building off of? To have this universally is a very exciting idea. I'm curious: when you began thinking about this, because you started this in the mid-90s, you must have had some sense of what the Library of Alexandria version two was going to be. What along the way has been different for you? What has evolved in ways that were unexpected?

I've been amazed at how slowly it has all evolved. We're almost 20 years old now, and we are still pretty weak in building these things. Also, we haven't ended up digitizing all of the books that are out there and making them available. Google did, with a bunch of libraries, but they did it in a way that caused people to get mad, and it basically made it difficult to get access to it. We've been digitizing away. When, 20 or 25 years ago, I started in this whole internet space, we said, "Hey, you're going to turn to your computers to answer questions." People just said okay, and they did. But unfortunately a lot of us haven't gone and done the hard work of making sure that the best works of humankind are within reach of our children. They're not. When I was growing up, you
had to haul your bones into a library, but it was all there. Now, on the internet, what's there is really easy to get to, and what's not there is almost impossible to get to, or people just don't do it. So we've got to go and get the rest of everything that's important, put it up on the net, and work through the issues, because if it's not online, it's as if it doesn't exist. Our kids are going to learn from whatever it is they have available to them, and if we just give them junk or superficial material, not the full, in-depth debates of the 20th century, well, we're going to get the generation we deserve. So I feel a great deal of urgency to go and make sure that the best we have to offer is within reach of everyone. Now that they're looking for things on screens, they're using the materials all the time. We get two to three million users every day; we're about the 250th most popular website. Pretty great, but we've got to get all the rest of what should be shared up online.

Well, one of the intriguing pieces of the article that led me to you had to do with this very issue of what people can find online. As teachers, we talk all the time about, "Oh, be careful what you put up; it'll be there forever." But there's this opposite problem of things disappearing off the web, and there was a story, for example, about the loss of the Malaysian Airlines flight over Ukraine that was particularly powerful. I'm wondering if you could tell that story as a way of helping us better understand the need to archive.

So we archive the World Wide Web, and we have these robots that are going around and collecting things as they can, but they may not get to every page in a timely way. So there are about a thousand librarians, and they go and set up crawls. If there's a particular unfolding event, they go and set that up and say, "Okay, you should get these websites at these frequencies," and the like. There's also a button on our site that
says "Save this page right now; this is worth saving." So we operate this in these different ways.

There was an airliner that was shot down over Ukraine, and somebody, a rebel of some form or another, bragged about shooting down this military plane. Well, it turned out that plane wasn't a military plane. It was full of a bunch of civilians and a bunch of people from Holland. And so this person went, within an hour or something, and took down their post that was bragging that they had participated in shooting this thing down. But during the time that it was up, there were several people that had gone to the Internet Archive, archive.org or web.archive.org, and hit the Save Page Now button on that post, and one of the other crawls had actually gotten around to collecting it during the period of time that it was up. That was used as some of the evidence. Everybody denied it afterwards, but there is now this post that was up there that helped at least inform the populace. We learned about it through the popular press using this reference as proof that this happened.

You must also get a lot of requests to take down information that you've archived.

Yes, yes. There's a lot on the web that's not designed for the ages. You can go and put a robot exclusion on your website, but a lot of people don't know how to do that, so they write to us, and if it's their website, then we'll take things out of the Wayback Machine. It may be embarrassing, or it may be a rights issue or something. We only collect things that were publicly available, but if people want things retroactively taken down, we do take things out.

Cool. I've got more questions, but I also want to bring in one of our members who was able to join us. We're recording this on a Friday afternoon, which is often
kind of a tough time to get people in. However, one of our members, Nate Gildart, is in Tokyo, and it is not Friday afternoon for him there. Nate, good to have you.

Happy to be here. Brewster, thanks very much for your presentation. I'm a history teacher, so I'm incredibly intrigued, and I feel incredibly stupid, like, why haven't I known this already? But it's very, very powerful stuff. I'm looking forward to Monday, actually, and taking this back to my classes, because I think my students are really going to be impressed with this as well. I am curious about language. You're archiving millions of books and other resources, so I'm curious: do you work in several different languages, or are you working in English only at the moment? World knowledge obviously involves language, culture, and all that.

So our website is only in English, but the materials that are on the archive are in all sorts of languages. Actually, we're very popular in the Arab world. Things are uploaded, or we digitize books, and we digitize in all languages, and we use optical character recognition software to try to make that more searchable and more usable. So these materials are in lots of different languages, and we try to make them available, but there are a lot of languages that aren't even supported that well by technology.

We did a project in Bali that was kind of fun. There are three million speakers of the Balinese language, but they also speak Malay, which is kind of a mixed language, and so it may be that Balinese will be going away. The Balinese wrote books on palm leaves; they would inscribe manuscripts on palm leaves, and they're still there. So we talked with the Balinese government and said, "Can we just digitize everything ever written in Balinese and put it online for free?" And they said yes. So we photographed all of these, and it's
a collection all in one place. Now we're looking to take that and transcribe it into a computer-readable form, and then try to get that back into the villages, to reinforce smaller languages. We're hoping we can use technology not to wipe out smaller languages but to actually reinforce the economic viability of, and interest in, smaller languages.

Yeah, threatened languages. I would think that even online lessons, that kind of thing, archived, would be a way to keep them from becoming extinct.

Well, there's extinct, and there's just recording it and putting it in a bottle, which is one thing, but what we really want is for people to use the languages. And we want the translations. I think that Google has used their book collection a lot: by digitizing books, they had things in different translations, and they use that to teach their translation systems, which I think is fantastic, so that we can make it not as economically disadvantageous to speak a different language than the people you're hanging out with. So I'm optimistic that we can go and get the larger ones. And then can we make open-source versions? Because there are probably going to be languages that are not commercially viable to support with software investments. So can we get it so that people that are very motivated, maybe very local, can make the other 5,000 languages be supported by our technologies?

Now, when I visited, I remember meeting people from a lot of different countries who are part of your team at the archive in San Francisco. Do you bring in a lot of people who have both tech skills and a variety of culture and language backgrounds, so that you can tap into resources that you otherwise wouldn't be able to?

The Internet Archive staff is a fairly technical staff, and we also have 33 scanning centers in eight countries, because we digitize about a thousand books every day, and those scanning centers are inside libraries all around the
world. Pretty fun. So that's happening, but we're largely a conduit, a technological organization. We don't know Balinese, or chemistry, or whatever, so we're working with others to help empower that to the extent that we can. Think of libraries as engines for research: can we provide resources that they can't easily get in other ways to be able to do their work?

And is that digitization process something that is also a piece of how you create the revenue streams that support the Internet Archive?

Yes. The Internet Archive's budget is about 12 or 15 million a year, and about a third of that comes from libraries digitizing books with our staff. We've also created a desktop scanner called the Scribe that people can go and buy and then scan books, and they get uploaded and are permanently stored and maintained on the archive.org site. They also get a local copy, of course. So that's about a third, and then another third of our income comes from libraries paying us to collect World Wide Web pages, which is pretty exciting, and then there are foundation grants. We have such a high impact ratio; we're a bargain. We don't spend very much, yet we're able to inform and provide materials to between two and three million users every day, and that's an enormous amount of reach for an organization. That's all empowered by the internet.

Well, in order to make that work, obviously you have to have strong technologies backing up your efforts. As you're looking forward, are there particular technologies on the near horizon that you would say are especially promising for the work you do, or perhaps even potentially harmful for the work you do?

The Internet Archive exists, technically, because of open-source software and because of the ever-decreasing cost of computer
storage. We currently buy 8-terabyte hard drives by the stack. Just this last week we had 2.3 petabytes of new storage come online. It's just amazing to me. You know, it's mega, giga, tera, peta, so that many petabytes is now ours; 2.5 just came online this last week. So that's all good news.

But philosophically, how the Internet Archive lives is based on the concept of universal education and openness, and the idea that access to knowledge is a good idea, so good that it should be supported. That's not historically been a long-term thing. It doesn't exist everywhere at any particular time, and it generally doesn't last for very long before something clamps down. One thing that technologically could look bad for us going forward is if we don't have an open system like the World Wide Web, if it really starts to become more closed, like the apps that are on iPhones. The only apps that are on iPhones are the ones that are allowed to be there by Apple. Now, they may be good, but they can be kind of selective, and having one organization decide the set of information that you can get access to, that's a little scary. And if you want to go and change the operating system on your phone, that's called jailbreaking. So we're going to need openness all the way down: the internet, the World Wide Web, and computers. We now see more people coming in with mobile devices, which are anything but open. So I think we have to keep pushing towards making the open structures reinforced and worthwhile; otherwise we will lose it. We will go back to how it was when I was growing up, with a few organizations controlling the distribution of, whether it's textbooks or the news or newspapers, and everybody else was just left to having dinner parties discussing what it is they were told through those other channels. The World Wide Web and this openness made it so anybody could be a publisher. I'd really like
to not lose that.

Cool. Nate, I want to make sure that if you have another question, you have the chance. I've got a final one myself.

I was just curious about one thing you mentioned: kids, or people, searching for things on the internet and sifting through what's junk and what's knowledge, or what's valuable. How do you vet what goes up or not? Or do you just take everything and then scan or record or upload or whatever?

So we pretty much try to take everything we can, technologically, and then we try to get some level of context around things. My kids have a horrifying method of sifting through what's dropped on them. It comes at them at a level of speed that is just daunting, and so they're pretty good at winnowing away the weird stuff. But we've got to give people the background information of where this stuff comes from, the history of these organizations, and a lot of that's pretty obscure. You could judge things by, well, is it a .gov or a .org, and that's not good enough. We need much more information easily brought into this context so that people can try to answer the question, "Am I being fed something here, or is this true?" So we're a collect-all organization; we try to collect and then provide information around it, and to keep from becoming the ones who determine what everybody else can get to.

Right, right. So there are two groups of people I'd like you to talk about, just to finish off our Q&A. One is your team and how they talk to each other. I was wildly impressed when I visited, and I would encourage everybody to go when they're in San Francisco, on a Friday at midday, and listen to the groups as they report in. And then I'd like to talk about the group of people who might be out there who might say, "How can I help? What can I do to help make things happen for the archive?" Tell us about those two groups.
So at the Internet Archive, we're trying an experiment of how open you can be, and we give everything away for free. The closest we have to a staff meeting happens at our Friday lunch. We have an open lunch, and guests come, and Rushton came. Just today there were maybe a dozen people who came from all sorts of places, and they join, and we go around the room. Everybody stands up and introduces themselves and says a little bit of what they're into and what they're doing. If they're staff, they say a little bit of what they did, and those serve as their reports. So everybody does this, whether it's the person that signs the checks or the person that replaces the hard drives. It's not just the manager doing this. We're still a small enough group to be able to do this, and it's worked out really well, because I think it's better than a PowerPoint presentation about an organization, which doesn't give you an idea of who the people are. It's really all about people.

And in terms of how people can participate, how you might want to participate: please, first just go to archive.org. I mean, stop looking at me; I'm pretty ugly. Go to archive.org and just start clicking around, or openlibrary.org or politicaladarchive.org, and just play around a little bit. If you get hooked, then maybe start uploading something, just something. It'll be a little harder than it should be, but try it. There might be something really important that you think should be archived, so just scan it and put it up. For me it was my grandfather's books that he wrote, and photographs, something that's important to you. And start to participate in communities that leave a digital trail. If you leave it at the Internet Archive, we'll do what we can to try to keep it up to date in terms of formats and available to people over the long term. So I hope people just learn to participate in some way or another, or
go to the Donate button, which is probably pretty hard to find, and donate some money. Donate some time, money, materials. Drop by when you're in San Francisco. Write something that deserves to be in the library.

Nice. Awesome. Well, I'll give you the chance to finish off the broadcast in just a few moments, but just to finish off for all of our members and guests: this is the Rotary E-Club of Silicon Valley. Rotary is dedicated to any number of things that are designed to improve the lives of others. We have what's called the Four-Way Test, and it comes to mind listening to Brewster talk about the mission of the archive. The Four-Way Test comes down to these four things, of all we think, say, or do: Is it the truth? Is it fair to all concerned? Will it build goodwill and better friendships? And will it be beneficial to all concerned? That, as a frame of reference for how we think about our world, is one that I would say is not dissimilar to the way the archive is built, to make sure that people have access to information in all sorts of productive ways, for sure.

If you have been productively taking part in this meeting today, you'll find that the links that Brewster mentioned are just past the embedded video that is this talk, the program for this week's meeting, and so you'll be able to get to all of those there. If Brewster thinks of any more along the way, he'll send them to us and we'll get those in as well. But we also want you to do a couple of things. First of all, no matter who you are, let us know you are here. There's the attendance survey, and that lets us know that you are part of it. That gives us a chance to send you a little thank-you. We won't spam you or anything like that, but we'd love to be able to at least send a thanks for joining in. And if you are a visiting Rotarian, then having filled that out properly
with an email address means that you'll get an automated email that you can pass along to your secretary as a make-up for a missed meeting of yours. Then, finally, there is our Disqus section, D-I-S-Q-U-S. You can create your own Disqus account, or you can log in via Facebook or one of the other tools that it allows, and leave a comment. Let us know what you think of this program, of the ideas that Brewster put forward that we've been sharing, and of the other pieces of the meeting. We believe that we can always get better, and we will get better by getting your feedback about what you see and what you think about. So we welcome your thoughts, and we thank you again for joining us. So, Brewster, have you got a last thought to share with all of our members and guests around the world, and everybody that you're with over there?

Please participate in some way or another. Take what it is you cherish, or what you think should be shared, and put it online. Go take the risk, the dive, and make it as broadly available as you can, in such a way that other people can learn from your having been here. Rotary has got a long tradition of helping and being a public service, and being a digital public service is every bit as fun and interesting as in the physical world. Well, thank you very much, Rushton, for the opportunity.

Our honor to have you speak to us, for sure. Everybody, thank you again for joining us. We hope to see you