Hi everyone, welcome, John here. In today's video we're going to be looking at getting out more product information, but this time we're going to be looking specifically at Shopify stores. There's a nice little trick to this and it's really simple. It's actually a lot easier than having to go through each and every individual product page and scraping the data that way. I believe it's specific to Shopify stores, and I'll show you that now.

The website we're going to be scraping is this one, a footwear store selling boots. It's pretty cool. As you can see, the products pop up like this. We could try to go through each individual product like this and get the data out through the HTML, but the trick is basically just adding /products.json after the URL for the Shopify store. If we let that load up and I click on raw data and then pretty print, you'll see that we get back JSON information for all the products on the store. This is really, really useful, so we can just get everything out this way.

If I go back to this view here, you'll see that it's only coming back with 29 products. The actual limit is 250, and if you add ?limit=250 like this, it brings back up to 250 results. As you can see here, this website actually has 84. If the Shopify site you're looking at maxes out and has more than 250, add an ampersand and page=1 and run through the pagination that way: the first page, the second page, and so on. We don't need to do that for this one, fortunately, because there are only 84 results.

What we want to do is use Python requests and the JSON library to manipulate that data and get it out. This is the information that we're probably going to get. You've got the ID; the title, which is the name of the product; the handle, which is the bit that goes at the end of the URL, so if you want to construct a URL for every one of these products you could do that; the date it was created;
the type; the vendor, which is important in some stores, possibly not so much in this one; and the variants, which in this case are the sizes. This could be anything: you know when you go on a Shopify store and you have a drop-down menu on the product. Each variant has the price and whether it is available or not. What this means is, if I find one that says false, this specific variant is out of stock. What it doesn't show, though, is products that are hidden from the back end, from the admin panel on Shopify; they won't appear in this JSON list. It will include products that are listed but out of stock, and this is what false means. So there could be hidden products that they haven't set to sell yet, but what you could do is write a script to check for new products every day and then notify yourself from this, which is quite cool.

There's the SKU information, which is probably not totally relevant to you, but I've put it in there anyway, and the variant's compare_at_price. To see what this is, let's find one with data in it. The way that Shopify works is that to put an item on sale you have the old price and the new price, so this was the old price and this is the new price. You could go through all of these and put in a formula to show how much discount you're getting; this one was 295 and is now 150, although that's out of stock. Then of course the image URL: I've picked just the first image, just to get an idea of what the product looks like. If you're doing data analysis you maybe don't want the image, but I put it in there anyway.

All right, so let's get going. The first thing we want to do is import requests, we're going to need that, and then import json, we're going to need that as well. We want to set our url to the JSON data file there, like this, and then do r = requests.get(url). Now we want to get our variable, which I'm going to call data, and
then I'm going to do r, for what we've got back, and then .json(), like that. What that's going to do is load this JSON object into our data variable, so if I print that out we should get all the information that we've just seen on the web page back in our Python script. There we go, you can see it turning up.

All we need to do now is go through this data and get out the bits that we want. To do that we need to go back to our raw data so we can see how the JSON is structured. We can see right away that we have products as a key and then this square bracket, which in Python is the start of a list, so whenever we go to get some information out we need to remember that. If we look at the first one and want to get, let's say, the title of the first product, let's print that out: we need data, then products, which we just saw, and because we want the first item in the list it's index zero, and now we can call title, I think it was called. Let's run that, and we can see that we've got the first one here.

We're going to want all of the information, so we need to loop through it. If we get rid of these bits and do for item in data['products'], and print out the item's title again, this is going to give us all the titles. We can see we've got them all here, every one of our products.

To get out more information we can go back to our JSON and see what we've got here. The other parts that I called out were the handle, the created date and so on, so I'm just going to quickly type some more of these out. Right, all I've done there is add a few more fields in and print it out again to the terminal. We can see that we've got our information there: that's the title, that's the handle, that was the created date, so when the item was created, and then the product type. Because the variants are in their own little thing here,
we actually need to loop through each variant inside our loop to get that information out, and this is where the best information is. It's got the SKU, it's got the price, and it's got compare_at_price, which determines whether it's on sale or not. The way we want to do that, if I just comment this print statement out for the moment, is within our for loop here we do another for loop. I'm going to call it for variant in, and then we do item['variants'], because we're inside our item here but we want to look inside the variants. So this has become our item, and now we want to look inside this, which I've called variant.

In here let's get the price for now. So let's do price = variant and again call the key, price, like that. If we print price just to make sure we're in the right place, you see we've got a load of prices coming out there. I'm going to add in a few more: sku = variant['sku'], and what else should we do? Available is a very handy one, so available = variant['available']. Let's print price, sku and available. There we go. Now we're getting somewhere: we've got the price, the SKU of the item, and whether it's available to buy or not. Great, so I'm going to leave it there on that part and move on.

I want to talk a little bit about how I get the images out as well. Because the image is inside the same product, which we have as item, we will have another for loop, but we want it not inside this one, otherwise we'll end up with a lot of extra data or it won't work at all. So here I'm going to put in for image in item['images'] (it's item because we're inside this for loop and not this one, and images, I think it's called), and then I'm going to do print(image['src']). Quite often this fails. It doesn't this time around, which is good.
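The loop structure described so far can be sketched roughly like this. The sample data is made up for illustration and only mimics the shape of a products.json response; the key names (products, variants, images, sku, price, available, src) are the ones mentioned in the video, and the helper names variant_rows and image_urls are hypothetical.

```python
# Made-up sample shaped like a Shopify products.json response.
# Prices are written as strings here; check the real response for your store.
data = {
    "products": [
        {
            "title": "Example Boot",
            "handle": "example-boot",
            "variants": [
                {"sku": "EB-8", "price": "150.00", "available": True},
                {"sku": "EB-9", "price": "150.00", "available": False},
            ],
            "images": [{"src": "https://example.com/boot.jpg"}],
        }
    ]
}


def variant_rows(data):
    """Outer loop over products, inner loop over each product's variants."""
    rows = []
    for item in data["products"]:
        for variant in item["variants"]:
            rows.append(
                (item["title"], variant["sku"], variant["price"], variant["available"])
            )
    return rows


def image_urls(data):
    """The image loop sits beside the variant loop, not inside it."""
    urls = []
    for item in data["products"]:
        for image in item["images"]:
            urls.append(image["src"])
    return urls


for row in variant_rows(data):
    print(row)
print(image_urls(data))
```

Keeping the image loop at the same level as the variant loop avoids repeating every image once per variant, which is the extra data the video warns about.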
Sometimes you'll find that some products won't actually have an image, which will cause you problems because it will just fail. From my experience of doing this, the image is the field they're most likely not to have, so sometimes I put it in a try and except. So if we do try, let's put it here, and then except, and we'll just say the image is equal to None. What we're going to do is change this print to an assignment, so the image source is set to that, and get rid of the print.

Now we're going to create a dictionary for each and every product and then add it to a blank list, which I do quite often. So let's create our blank list up here, product_list, like this. The dictionary goes inside the last for loop, because we want to go through all of this, and then through these, and then add those. Let's do product = and open our dictionary, starting with title, and then handle, and so on.

Okay, so what I've done here is basically just create a product dictionary and add in the fields that we've collected: the title, handle, created date and product type (we don't need this print statement anymore), then the image source, and we'll change that to the same name, so if there's no image and this fails it just puts None in instead. Then from the variants, the price, SKU and available; we've got those there. Right underneath here we do product_list.append and then our product, like that. Now if we come out of all the loops completely and print the product list, hopefully we don't get any errors and we get a nice long list of everything back. Let's look at the last one in here: we can see we have the title, the handle and so on, the price and the SKU, whether it's available, and an image. That's great.

The easiest way to get this information into a CSV file, in my opinion, is to use pandas, so let's import that as pd, which is just the standard. Then right down here, instead of
printing this product list, we're going to create a data frame with it. df is equal to pd.DataFrame, just like this; it's a very basic data frame, and we can just put in our product list. All that's going to do is take the information in our product list and create a data frame for us. I've got other videos on how to use pandas and how to do this, so I'm going to go through this quite quickly. You'll find them on my channel, and I'll link to them as well, so you can get a better explanation if you've not used pandas like this before. Then we just do df.to_csv and give it a name; we'll just do testrun.csv. I like to have a print statement at the end just to make sure I know it's all done, so let's just say save to file. If we run that, we should get nothing except for save to file. Done.

Now, opening this up, you can see right away that we've got the title and handle for every product, when it was created, the product type, the SKU, the price, availability and image. Any information in this JSON you can go and get; you can pick out anything that you want. If you wanted the body HTML, which is the description, it's all here. Sometimes the tags are really useful; some people use those.

I'll just recap real quickly what we've done. We've got our URL; every Shopify store can have this part put after its URL, and remember the limit is 250, so if you need to use pagination just change the page number. We've used requests to get the information, and we've put the JSON object that we got back into our data variable. We create a blank list so we can add the products to it, and we're basically just looping through the JSON to get out the information that we want. You can change these to suit any bits of information that you like. All Shopify stores structure the data like this, so you're pretty safe with these names. I tend to put the try and except around the
images because sometimes stores don't have images, especially for old products that are just lingering, but you could use this on any other part that's failing, or just experiment and see what you need to do. We created our product dictionary with all the information we've got, added it to the list, and used pandas to create a data frame so we can quickly and easily export that to CSV. If you notice, on this one we've got an extra index column. That's because in here I didn't do index=False; all that will do is remove this column and leave us with the rest.

So hopefully that has been useful for you guys. If you're looking at scraping product data, check straight away to see if it's a Shopify store. They can all have the products.json pulled out like this, and as I said just now, it's all very strict and very regular, so all the names are the same: the handle, the title. You could quite easily loop through a lot of Shopify stores, get information out like this and create a big data set. Or you could scrape your favorite stores a couple of times a week, or every day on a cron job, and tell yourself whether there are new products available or products have gone on sale. You can check the availability, all this sort of stuff; it can all be done with this. So give it a go: find a store, see if products.json works after the URL, and see what information you can get out. Cheers guys, bye.
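For reference, the whole walkthrough could be pulled together roughly as below. This is a sketch under assumptions, not the exact script from the video: the function names (fetch_products, discount_percent, flatten, save_csv) are hypothetical, the page query parameter is my reading of the pagination step, and the field names created_at, product_type and compare_at_price are assumed to match what the store's products.json returns. The try and except here catches only KeyError and IndexError rather than everything.

```python
def fetch_products(store_url, limit=250):
    """Walk ?limit=...&page=1, 2, ... until a page comes back empty."""
    import requests  # imported here so the helpers below run without it installed

    products, page = [], 1
    while True:
        r = requests.get(
            f"{store_url}/products.json",
            params={"limit": limit, "page": page},
            timeout=30,
        )
        r.raise_for_status()
        batch = r.json()["products"]
        if not batch:
            return products
        products.extend(batch)
        page += 1


def discount_percent(price, compare_at_price):
    """Percentage off the old price, or None when there is no old price."""
    if not compare_at_price:
        return None
    old, new = float(compare_at_price), float(price)
    if old <= 0:
        return None
    return round((old - new) / old * 100, 1)


def flatten(products):
    """One dictionary per variant, with the product fields repeated on each row."""
    product_list = []
    for item in products:
        try:
            image_src = item["images"][0]["src"]  # first image only
        except (KeyError, IndexError):
            image_src = None  # product has no image
        for variant in item["variants"]:
            compare_at = variant.get("compare_at_price")
            product_list.append({
                "title": item["title"],
                "handle": item["handle"],
                "created": item["created_at"],
                "product_type": item["product_type"],
                "sku": variant["sku"],
                "price": variant["price"],
                "compare_at_price": compare_at,
                "discount_pct": discount_percent(variant["price"], compare_at),
                "available": variant["available"],
                "image": image_src,
            })
    return product_list


def save_csv(product_list, path):
    """Export with pandas; index=False drops the extra index column."""
    import pandas as pd

    pd.DataFrame(product_list).to_csv(path, index=False)
    print("save to file")
```

With a placeholder store address, usage would look like `save_csv(flatten(fetch_products("https://example-store.com")), "testrun.csv")`. For the sale item mentioned in the video, `discount_percent("150", "295")` works out to 49.2.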