Welcome to my YouTube channel. I hope you will consider subscribing, and I hope you enjoy this video. In the previous videos we talked about getting data from the NHL, primarily from the NHL API. In this video we want to get data from the PWHL, so we need to scrape the data from the PWHL website. This will be the final video about getting data; in the coming videos we'll talk about cleaning and transforming data, and we'll do all of that in MySQL. But in this video we are going to scrape data from the PWHL. We are working with the PWHL here, but you could easily do the same thing for the AHL. It's the same stats provider, so it's the exact same process.

I'll just open the PWHL website and take a look around. The first thing we want to get is the schedule for all teams, and the easiest way to get that is to select a specific team. We could just pick Boston and click on Schedule. Here it has the schedule of all Boston games, but we are interested in all teams, so we'll select All Teams, and this is the schedule for the regular season in 2024.

One thing that's worth noting is that it says 1 here; this is the season ID. This is important to take note of because it's not very obvious: in this case, season ID 1 means the regular season of the 2024 season. If we select the preseason, you can see it changes to 2, and if we select the playoffs, it changes to 3. Today we are only interested in 1 and 3, the regular season and the playoffs.

So what we're interested in is getting this data. Let's select the regular season again. We want this table right here with all of this data. However, you can't just copy this URL and put it into Excel; that won't work. What we can do is right-click and go to Inspect. This is in Danish, but it should say Inspect on your
computer. Then we need to go to Network and look at JavaScript. There's nothing here because we need to reload the page, but once we do that we get a few links that hold all the information we're interested in. When you're looking at the PWHL or the AHL websites, what you're interested in is these index pages. There's a lot of information here, and some of these you can't use; the ones with "bootstrap" don't work. What we are interested in is this one: this URL right here holds all the information we actually want. There is an Angular callback parameter at the end. We don't want that, so we just copy the URL up to that point and try opening it to see what happens.

You can see that this looks a lot like JSON format. There's a lot of information here that's very difficult to extract just by looking at it, but generally speaking this is JSON. There's just a parenthesis at the beginning and a parenthesis at the end, and that means Excel, or Power Query, can't load the data as a JSON file. We'll get to that part. But all the information is right here. We can change the season: right now it says 1, and if we change that to 3 we get the playoff data. So those are the URLs we're going to need.

Okay, so let's go to Excel now. I want to build a table with the URLs we need to get the schedule. We could just get the schedule and then append with another URL, but it's easier to put it all in a table, create a function, and then invoke that function on a column, just like we did in the previous videos. It's the exact same process. So let's create a table here. We want a table that holds the season ID, and we know right now that we're interested in 1 and 3: 1 for the regular season, 3 for the playoffs. Let's just put
in the season as well. Let's say it's 2023-2024, just to be consistent with the formatting style, even though all games were played in 2024, since the season started in January. Finally, let's put in the season state as well, because you can't get this information from the data; you just have to know it based on which season ID you are calling. So let's say "regular" and "playoffs". This means that when the next season starts, you just add the new season ID to this table. It will likely be number 4, with the 2024-25 season and "regular", once we're adding new information. For now we only have these two rows, so it's going to be a very small table, but if we were to do this with the AHL you would have a lot more season IDs and a lot more seasons.

The last thing we want to do is add a link column with the URLs. Let's copy this link right here. Actually, let's make the range into a table first. Then in here we set the cell equal to the URL, but instead of hard-coding the season as 3, we put in the season ID column. This means that when you add a new season, you just fill out these three columns and this one updates automatically. Okay, that's what we need in order to get our schedule for the 2023-24 season in the PWHL.

Just like last time, we'll add this table to the data model so that we can work with it in Power Query, and we'll rename it to Schedule. Now we need to create a function that we can invoke on this column. We'll select one of these URLs, it doesn't matter which one, and try to load it in order to create a temporary table that we can use to create our function. So we go to From Web; this is exactly what we did with the NHL API. But when we hit Enter here, it won't
work. This is because Power Query is trying to load the data as JSON input, but because of the parentheses at the beginning and the end, that fails. So we have to load it not as a JSON file but as a text file. I'll just hit OK, and now it's loaded as a text file. Now we can remove the parentheses and then parse it as JSON.

We'll add a new column and use the Text.Middle function, which takes the middle part of the text. I don't think you can use Text Between Delimiters with the parentheses as delimiters, because there might be parentheses within quotes in the text as well, so we have to do it this way. We select the text column, Column1, and say the start value has to be 1. In Power Query you start counting from zero, so number 1 is actually the second character: we ignore the first character and start at the second one. The count of characters we take is the length of the full text minus 2, because we don't want the first parenthesis and we don't want the last parenthesis. For that we use the Text.Length function on the text, which counts how many characters there are in total, and then we subtract 2.

Now it should work. We've created a new column where the opening and closing parentheses aren't there. We can right-click and drill down, and now we have the entire text without the parentheses at the beginning and the end. This means we can parse it as JSON, and once we've done that we can make it into a table and start extracting data from it. But we don't want to do that just yet. We just want to create a function that does all of this: it loads the URL as a text file, removes the parentheses, and then loads it
as a JSON file and converts it into a table. That is what we want our function to do, so now we just need to create the function, as we did in the previous videos. We need to create a new parameter; we can call this one "link schedule". It's of type text, and we'll put in the URL. Then we go to the source step and change it from a fixed URL to the "link schedule" parameter. Now we can create a function based on this table and the parameter: we say Create Function and call it "function schedule". This table was just a temporary table that we created in order to build our function, so we'll delete it.

Now we have a function that we can invoke on this column. We say Invoke Custom Function, and it works. Now we can expand this. Obviously the data is not quite as clean as it was when we called the NHL API, but it does work and all the data is in here. We'll expand it, then expand sections to new rows. There are three columns here; I already know, because I've worked with this before, that all the relevant information is in the data column, so let's expand the data column to new rows. Again, I know all the relevant information is in the row column, so let's pick that one. Now we can expand it, and it has all of the schedule data we're interested in. Let's start by expanding all of the fields and then look at them.

We have the game ID and the date. The problem is that the date doesn't have a year, so we will probably have to add the year based on our season column. In this case it works, because if there's no year then it just assumes the current year. So for now we'll leave it as it is, but when we get to the next season and
add new data, at some point it will be a problem. Next you have how many goals the home team scored and how many goals the away team scored. You also have this "if necessary" column. I don't know what it means, but it says 0 all the way down, so we'll remove it. At the end there are columns for game report, game sheet, and game summary, which aren't really relevant, and broadcasters, which is empty. So let's remove the columns that aren't relevant.

Now let's change the names a little. We'll call this one game ID and this one game date. I think in the NHL data we called these home score and away score. Game status is fine. Let's call these home team and away team. Then let's select everything and go to Detect Data Type. You can see the date column is now correct: it just filled in 2024, and it works, so we'll leave it as it is. It will be a problem once you have more than one season of dates and the dates aren't in the current year, but other than that it works fine.

So this is our schedule data. We could also remove the link column and the season ID; I don't think those are particularly interesting, so let's remove them. This is our schedule table, and that part of the video is now done.

Let's go back to the PWHL page. The next thing we want to scrape is play-by-play data, and we also want player information and time-on-ice data. If we go to the stats part of the PWHL site, you'll see information about plus/minus, points, and so on, but there's no time-on-ice information, so we can't just get the data from the stats page. We're going to get the data from the game reports instead: play-by-play data for each game, and then we're
going to get time-on-ice data for each game as well. I hope it will make sense in a little bit. Let's select a random game, it could be this one, and go to the Game Center. There's a lot of information here. What we are interested in is this table right here, because we want the time on ice, and also this part here, the play-by-play data with shot locations. We're not particularly interested in goals, assists, and shots from this table, because we can get those from the play-by-play data anyway, but we need the time on ice.

The problem with the PWHL and the AHL is that they do have time on ice, but not broken down by strength state. So we don't know how much time each player played at even strength, on the power play, shorthanded, and so on. But we can at least get the time on ice per game from this data.

We'll do exactly what we did before: right-click, Inspect, go to the Network tab, and reload the page. Then we have all of these index links. Again, I know the ones we want are at the bottom. This one holds the play-by-play information. This one, actually let's just show it, is a list of all the teams that played in that season. We're not going to take that information now, but it might be worth doing in another situation, so we'll ignore that one for now. What we're also interested in is the game summary, so let's start with that one. Again, we don't want the callback part, so we ignore that and take the rest of the URL. We can open it, and it looks like a JSON file. It's difficult to extract anything useful just by looking at it, but it does have information about each player. It has name, position, and birthday. We're
interested in that information, and it also has the time on ice somewhere in there. So let's copy this URL and go back to Power Query. We want to get the game summary for all of the games in our schedule table, so we'll duplicate the schedule query and call the copy Players. Then we'll remove some of the columns, because we're not really interested in the home score and away score here, and use what's left to get the game summaries. Let's make this into text.

In this URL, the only input we're going to vary is the game ID; here it says game ID equals 2. So we just need the input from this column, and we can use it to create URLs that we can then invoke a function on to get the game summaries. We'll start by creating a column: we put in the URL, and instead of having game ID equal to 2, we set it to the game ID column. That's it, now we have a URL for each game.

Now we need a function that we can invoke on this column, so let's do exactly what we did before. Copy one URL and load it from web. Again there's an error, because the parentheses at the beginning and the end mean it can't be loaded as JSON input. So we load it as a text file, remove the parentheses using Text.Middle and Text.Length minus 2, and drill down. Now we have the text in JSON format, we can parse it as JSON, and we have information we can use.

There's quite a lot of information here. You have details, which we're not going to use for now; you might want to do something with the attendance, for example. What we are interested in today is just the home team and the visiting team, so these two records right
here have information about all the players, coaches, and goaltenders. Let's only keep this row and this row: we go to Keep Rows, Keep Range of Rows, and say the first row is 7 and we only want to keep 2 rows. Now we have the game summary for the home team and the game summary for the away team, and this is what we want our function to do. So again, the function loads the link as a text file, removes the parentheses, parses it as JSON, converts it into a table, and removes all the rows we're not interested in.

Now we'll make it into a function. We create a new parameter, call it "link players", put in the URL, and change the source to the "link players" parameter. Then we create the function and call it "function players". We can delete this table, because it was only there to build the function.

Now we have a function that we can invoke on this column, so let's do that. Then we can expand the tables, and you can see home team and visiting team here. We can expand this, and there's actually a lot of information, so let's expand all of it to begin with and then look at what we're interested in. The first one has information about the team. The second has team statistics: how many shots the home team had, how many shots the away team had, and so on. There's a media column, which we're not interested in today, a coaches column, a skaters column, and a goalies column. What we are interested in is the skaters and the goaltenders. But we can't do it all at once: if we expand skaters, it expands to new rows, and then we'd have to expand goalies to new rows as well, which gets really messy. So we have to do it in two
steps. Let's duplicate this query. We'll use this one to get data for the skaters and the copy to get data for the goalies, and then we'll append the two tables at the end. We'll call the copy Goalies.

For now we're not interested in any of the team stats, media, or coaches, so in this table we'll pick just the skaters column and expand it. What we want is the player info and the stats. It's not really interesting to know which players started on the ice, and status is just whether or not the player is a captain or an assistant captain, so we don't want that either. Now we have these two columns; let's expand all of it. This gives us the player ID, first name and last name, jersey number, position, birthday, and an image URL. You could try opening that one and it should be an image of the player.

In the stats column we get game-by-game stats, because we're looking at the game summary: goals, assists, points, and so on. We're not really interested in those, because we can get them from the play-by-play as well. In this case we only want the time on ice, because we can't get that anywhere else, so let's just keep that one. Now we have the time on ice for each player in each game they played, plus player information such as age and position.

In this data, all players who were on the game report are included, even if they didn't play any minutes. Let's try opening this null one first. These are players where there are some shootout details, but there's no information here, so let's ignore that one. But the data also has players who were on the game sheet, so to speak, but didn't play a single shift. So you have some players
here that I don't think we want to include in our data. So let's select everything except for these two values. Now we're only including players who actually played in the game.

Now we'll do exactly the same for the goaltenders. We only want the goalies column, and we expand it. We don't want status, but it is interesting to know whether the goaltender was the starting goalie or not, so we'll keep the starting field. Then we expand the info, which is exactly the same info as in the players table. In stats you have goals against, shots against, and so on, and here we just take the time on ice. I think we'll rename this one to "starting goaltender". Since we're appending the two tables in a second, this column has to have the same name as the time-on-ice column in the players table, so we change the name to "time on ice". There are a lot of null values here: this data includes not only the goalie who played but also the backup, even if the backup never got on the ice. So let's remove all of those, and now we're only including the goalies who actually played in the game.

Now we can append the goaltenders to the Players table: we say Append and append Goalies. You can see a new column was added, which is correct. If we go to position and click Load More, we can see that goaltenders are now included. For some reason there is one player who has no position. Maybe she played both as a forward and as a defender, I don't know. We could leave it like that for now, since I don't know which position is the correct one, but let's just remove this one.

So we have all the information we need here. We can detect data types; let's make this one a number, though I don't think it's important. And let's remove the link as well; we don't want that. Yes, so now we
have a table with the schedule and a table with player information and time on ice. The last thing we want to get is the play-by-play, and the starting point will again be the schedule table. So let's duplicate that, call the new table "play by play", and look at it. Some of this information we're not really interested in, so let's remove some of the columns: home score, away score, game status, venue name. Let's keep the home team and away team; those might be interesting. We'll make this into text, and then we need to find the URL we're going to use, so that we can create a new column with the links, create a function, and invoke that function on the column, just like before.

Let's go back to the browser. We know this one was the correct one: it says Game Center play-by-play and it has a game ID. So we just need to change the game ID from 2 to a parameter, as before. Let's copy this URL, go back to Power Query, create a new column, put in the URL, and change the game ID to the game ID column. That's it.

Now we need to create a function, with the exact same approach as before. We load one URL, and again it gives us an error because of the leading and trailing parentheses. So we load it as a text file, remove the parentheses, drill down, parse it, and convert it into a table. Now we could expand it and get all the play-by-play data, but we don't want to do that just yet; we just want a function that does all of this. We add a new parameter, "link play by play", put in the URL, change the source, and create the function, "function play by play". The function is now created, so we can delete this table; we don't need it anymore. Then we invoke our function on the link column, and now we can expand the data: expand, expand, so this
gives us the play-by-play data. It has all of the events, and there's a lot of detail on each event. We have to click Load More, otherwise you won't get all the data, and you can see there's quite a lot of information here, so we'll load it all. This is not very clean data. We have to do a lot of data cleaning, and we'll do some of it today right here in our query, but we can't do all of it. Some of it I would recommend doing in another tool, but not in this video; in the next video we'll be working in MySQL to do a little data cleaning and transformation. Still, we want to do a little bit here as well.

Every time you have a player column, like this one for the goalie coming in, you need to expand it, and what we are interested in is actually just the player ID. The problem is that expanding changes the name of the column to "id", and I don't think there's anything we can do about that. For the period we also want just the ID, which is the period number, so now you have two id columns; we'll change the names afterwards. Then we do the same for the home player and the visiting player: the home player is the home player taking a face-off, and the visiting player is the visiting player taking the face-off.

Then you have the x location and the y location. These are a little weird, because the x-axis goes from 0 to 600 and the y-axis goes from 0 to 300. That doesn't match the rink, which is 200 ft by 85 ft, so you do need to make a conversion. You would divide the x value by 3 to get the data in feet. The weirdest part is that y goes to 300 while x goes to 600, even though on a PWHL rink the length is not twice the width, so the conversion factor is not the same for the two axes when transforming these into feet.
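The conversion just described can be sketched outside Power Query. This is a minimal Python sketch, not something done in the video. It assumes the raw x runs from 0 to 600 along the 200 ft length and the raw y runs from 0 to 300 across the 85 ft width, as stated above. The function names `to_feet` and `to_centered` are made up for illustration, and which corner is the origin and which way each axis points are assumptions, since that isn't stated here.

```python
# Rink dimensions in feet and the raw coordinate ranges described above.
RINK_LENGTH_FT = 200.0
RINK_WIDTH_FT = 85.0
RAW_X_MAX = 600.0
RAW_Y_MAX = 300.0


def to_feet(raw_x, raw_y):
    """Scale raw coordinates to feet, using a separate factor per axis."""
    x_ft = raw_x * RINK_LENGTH_FT / RAW_X_MAX  # same as dividing x by 3
    y_ft = raw_y * RINK_WIDTH_FT / RAW_Y_MAX   # same as dividing y by about 3.53
    return x_ft, y_ft


def to_centered(raw_x, raw_y):
    """Optional extra step (an assumption): move the origin to center ice,
    so x spans -100..100 and y spans -42.5..42.5."""
    x_ft, y_ft = to_feet(raw_x, raw_y)
    return x_ft - RINK_LENGTH_FT / 2, y_ft - RINK_WIDTH_FT / 2


if __name__ == "__main__":
    print(to_feet(600, 300))      # far corner of the raw grid, in feet
    print(to_centered(300, 150))  # middle of the raw grid, relative to center ice
```

With these factors, x is divided by 3 and y by roughly 3.53, which is why a single divisor can't be used for both axes.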
That's just something to be aware of. We won't be doing any of that today; it's just to say that the x and y coordinates are quite weird in the PWHL data and in the AHL data, and we need to convert them into feet. In the NHL data, the face-off dot in the middle is the (0, 0) location, x goes from -100 to +100, and y goes from -42.5 to +42.5. I would probably transform the data into that format, but we won't be doing that today, even though it's relatively easy to do.

"Home win": what is that? It's just whether or not the home player won the face-off, nothing else. Then you have a shooter column, where we'll expand the player ID, and a goalie column, where we'll also expand the player ID. You have the shooter team ID, an "is goal" column, and a shot quality column. From the looks of it, this is just based on shot location: "quality" is a shot from the slot area, and "non-quality" is a shot from outside it. Then shot type: in the NHL you have a shot type for all shots, but in the PWHL you only have shot types on goals; otherwise it just says "default".

Then you have a goal ID and the team ID of the goal-scoring team. You have the goal number, which I think is the count of goals the player has scored in the season up to this point, so I don't think we'll include it in the data today. Then you have the scorer; let's just pick the ID. You have an assists column. The problem is that some goals have two assists, so this would expand to new rows, and we don't want that because it would mess up the data. We need to do something about the assists, but for now we'll keep the column as it is. You have the assist numbers; again, that's just how many assists the player has in the season up to this point. Then we have properties; let's expand that, so this gives us
information about what kind of goal it was: a power-play goal, empty-net goal, short-handed goal, and so on. Then you have plus and minus players. You only have those on goals, and these columns would also expand to new rows, just like the assists one: for each goal you have up to five players. So we want to expand this not to new rows but to new columns, which is a little complicated. It's the same thing with the minus players, and we'll get back to that.

Then we have the penalty-taking team, the player taking the penalty, and the player serving the penalty. We don't have the player drawing the penalty; that information is in the NHL data, but not here. We have whether or not the penalty resulted in a power play, and whether it was a bench minor. Finally, we have a column called shooter team, but this is just the team of a penalty-shot shooter. There's only a team ID here when it's a shootout, or maybe also on a penalty shot within the game, I don't know, but in a shootout for sure you have the team ID of the shooter.

Now we have a lot of id columns, so we need to rename them all. Let's do that. This one was goalie coming in. There's actually no information in goalie going out; it's a completely empty column. In the AHL data you do have that information, so we'll leave it in; maybe they'll add it later. This one was period. This was home player; let's say face-off home player, since the home player only applies to face-offs. This was away player face-off. Location, location. This one was shooter, this was the goalie, and this was the goal-scoring team; let's call it scorer team. This was the scorer. Then we have all of these goal types: let's say power-play goal, short-handed goal, empty-net goal, penalty-shot goal. Insurance goal: I don't know what to abbreviate
that to, so let's just keep it. Game-winning goal. Then we have the plus and minus players; we'll leave them as they are for now. Here we have the penalty-taking team; let's call it pen team. You would definitely do more cleaning on this afterwards anyway, so for now we're just giving the columns names so that you understand what they are, and you can clean them afterwards. So these were, let's say, taken by and served by, and finally we had, I don't know, let's call it team on shootouts.

Okay, so now we've changed the names and expanded everything. The next part I want to look into is how to get this information not by expanding to new rows but by expanding to new columns, so that we have one column for each player who was on the ice for the team scoring the goal. This is a little complicated. I'm not 100% sure why it works, but I found the formula on the internet, it does work, and I understand what it does, which I assume is the most important part.

We'll add a new step right after this one and I'll put the formula in. It's Table.TransformColumns, and we use this step as the table, which just means it takes the table from the last step. Now we need a transformation; let's start with the plus players, so the transformation is applied to that column. What we want is each List.Transform on the underscore, which is just how this syntax works in Power Query, and then inside that another each, where we take Record.Field of the underscore, and the field we're interested in is the id field. I'll explain the formula afterwards as best I can. That's it, so let's run it and see what happened.

Okay, so here you have the plus players, so it says Error in
all the empty rows but down here you see a list so now we have a list of player IDs compared to in the minus column we have a list of records and then each record holds information about player ID's name and so on and so on and so on so basically what our function does our step right here what that does is instead of having a list of Records we now have a list of player IDs uh so this is important because now we can instead of expanding to new rows we can extract the values because now we have a list of values but let's just start off by doing the same for the minus players and the assist players so let's just copy this one and let's create a new Step put it in and then instead of having this we'll just say custom one meaning it takes the table from the custom One Step then instead of doing this on the plus players we'll do it on the minus Players let's just copy it and say enter so now we did the same on the minus place and finally we need to do the same on the assists so let's just copy it in and say now the previous step was custom two and the column is now assist let's hit enter and it should work here so there was only one assist on this particular goal but sometimes there'll be two and okay so you have a lot of Errors let's just ignore those for now we'll remove them afterwards but now we can actually extract the values so we can say extract and then we'll just put in a space between uh each value so now it extracted the player ID here and then we'll do the same the plus players and the minus players so let's just say extract values put in space as the delimit so now you can see these were the five players on the ice uh for this goal and they are just uh uh that's just a the limiter of a space in between each value so let's just do the same for the minus players and now we have something that we can use so uh so there's a function on on here that just says split column so now we can split the column based on a delimiter so let's just use space as the 
delimiter and don't quote anything. Now it takes each of the minus players and puts them into a new column. What happened here? Why do we only have two columns? Clearly there are five players there, right? Let's just try it again. I don't know what happened, that's super weird. Let's try it over here and see. Hm, I don't think there's usually a number here. Maybe if you leave it blank it works? No. Okay, it doesn't matter, let's just put in five, because we know there are five players. I don't think it usually does this, but apparently we need to say five, and now we have five columns. Then we'll do the same with this one: remove the quotes, split into five columns, and yes, it works. Then we'll do the same for the assists, split into two columns, and it works. So now we have the primary assist here and the secondary assist here.

We also have a lot of errors, but let's remove all of those, because they were just empty entries anyway. Ah, it's because I didn't remove the errors before I expanded them to new columns; that's why there was a problem. Ideally we would remove the errors before we extract the values, and then there would be no problem. It works in this order as well, though, so let's just say replace errors and replace them with null values. Now we have the information in separate columns rather than on separate rows. It is a little bit complicated, but at least it works. I'm not really sure I want to do any more right now. Oh, we need to remove the errors here as well: replace errors with null, and that's it.

There are a lot of other things you need to do to get this data cleaner and more ready for use, but we don't want to do much more right now. One thing I just want to mention is that whenever there's a goal, you do have duplicate values in the data set. Let's go here. Okay, so here we have a goal, but the shot above the goal is actually also
a goal. Let's take a look at this. You can see the time is exactly the same and the location is exactly the same. There's no shooter when the event is a goal, but there is a shooter when the event is a shot that is also a goal, and you can see it's marked as a goal right here. So you have to take care of that somehow as well: you need to combine these two rows so that you have all the information in just one row, which makes a lot more sense. But anyway, we won't be doing more right now; we'll leave it as it is.

Let's detect the data types and remove the columns that we're not really interested in. Let's remove the link. We could remove this one as well, since there's nothing in it. Let's remove this one and the assist numbers, and I think we'll keep the rest as it is for now.

Now we can load all of the data. To start with, let's just create the connection, and then we can load the tables we're actually interested in. Okay, so we have these four tables here. We're interested in the schedule and in the players table. We're not interested in the goalies, because that information is already appended into the players table. Let's load these one at a time: put it in a table, in a new worksheet, and add it to the data model as well. Fine, it works, and we have the schedule here. Let's do the same for the players: put it into a table. This takes a little while, because now we're not calling an API, we're actually scraping data from the different websites. It's not that slow, but it does take a little while. So now we have all the player information, including time on ice for each game.

Finally we'll get the play-by-play data. Again, this needs quite a lot of data cleaning, but we won't be doing that today. This video was more about getting data from the PWHL and not so much about cleaning it. As I said, you can do this for the PWHL, but it also
works for the AHL website. It's the exact same stats provider, so it's set up the exact same way.

Now we have the play-by-play data set up the way we want. Again, it's not very clean, and we forgot to change the time, so now it says something weird. We might want to keep this as text, so let's go here. See, the time was changed into a time type; let's keep it as text. Now it's easier to read.

Anyway, that's what we wanted to do in this video, and this was the last video about getting data. It's just to illustrate that it's actually quite easy to get data from the PWHL website and the AHL website. It's a little bit complicated to clean and work with the data, but getting the data itself is quite easy to do.
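
For anyone who wants to see the logic of the "expand to columns instead of rows" step outside Power Query, here is a minimal sketch in plain Python. It is not the formula used in the video, just the same three ideas in order: turn a list of records into a list of IDs, join the IDs with a space, and split into a fixed number of columns. The sample rows, the `id` and `name` keys, and the column names `plus_players` and `assists` are assumptions made up for illustration, not the real field names from the site.

```python
# Sketch only: mirrors the three Power Query steps described in the video.
# All field and column names below are hypothetical.

def record_ids(records, key="id"):
    """List of records -> list of IDs (the List.Transform / Record.Field idea)."""
    if not isinstance(records, list):
        # Empty cell: Power Query shows an error here, which the video
        # later replaces with null. Here we return None directly.
        return None
    return [rec[key] for rec in records]


def join_ids(ids, delimiter=" "):
    """List of IDs -> one delimited string (the 'extract values' idea)."""
    if ids is None:
        return None
    return delimiter.join(str(i) for i in ids)


def split_fixed(text, count, delimiter=" "):
    """Delimited string -> exactly `count` values, padded with None
    (the 'split column' idea, with the column count given explicitly)."""
    parts = text.split(delimiter) if text else []
    parts = parts[:count]
    return parts + [None] * (count - len(parts))


def expand(row, column, count):
    """Run all three steps on one column of one row."""
    values = split_fixed(join_ids(record_ids(row[column])), count)
    return {f"{column}_{i + 1}": v for i, v in enumerate(values)}


# Hypothetical play-by-play rows: one goal and one plain shot.
rows = [
    {
        "event": "goal",
        "plus_players": [
            {"id": 11, "name": "A"},
            {"id": 12, "name": "B"},
            {"id": 13, "name": "C"},
        ],
        "assists": [{"id": 12, "name": "B"}],
    },
    {"event": "shot", "plus_players": None, "assists": None},
]

for row in rows:
    flat = {
        "event": row["event"],
        **expand(row, "plus_players", 5),  # up to five skaters on the ice
        **expand(row, "assists", 2),       # primary and secondary assist
    }
    print(flat)
```

Handling the empty cells up front, as `record_ids` does here, corresponds to the point made in the video that replacing the errors before extracting the values would have avoided the trouble with the split. Note that the IDs come out as text after the split, so they would still need a type conversion afterwards.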