

How to Find Shopify Stores & Emails with Scrapebox

Ivan SEO

Updated on Feb 01, 2023


What's up guys, Ivan here with goodivin.com, and in this video I'm going to show you a cool trick for scraping Shopify websites with ScrapeBox. We're not only going to scrape the Shopify websites themselves, we're also going to scrape the emails from them.

First, a brief example of something I did recently. It took me a minute to figure out the right combination of things in ScrapeBox. Someone asked me about scraping e-commerce sites, so I went and worked out the footprints for e-commerce websites, starting with Shopify, and then I got an order from another person who wanted generic Shopify scrapes. I ran several tests to collect the URLs properly based on particular footprint combinations, and I was able to complete that job.

Here's one of the main files from it: 5,600 e-commerce niches at 200 results each in Yahoo. I also have a general list of e-commerce footprints I plan on collecting, so in the future you should see videos for all of those footprints as well. And here's a run where I took 10 e-commerce niches and hit local cities with them; you can see 43k results here. People say that e-commerce is not local, but that's not true. Once I ran it with local cities, it's amazing how much the list expanded. I combined those 10 niches with local cities and the 5,600 categories at larger result quantities, and that gave me the final list.

Let's take a look at a sample. This sample is 26,000 verified, valid emails, collected across roughly 5,600 niches at 200 results per keyword, so the raw total should be about 5,600 × 200. That's a pretty good number, though of course it gets knocked down once you remove the duplicates.
That's the scale of URLs we're looking at, and for the 10-niche run it would be 10 × 43,000, which is also quite large. Anyway, that was just an example. Let's go over to ScrapeBox and take a brief look at the methodology.

For the cities run, you take a list of cities (I'll put the lists in the description below this video) and permutate them against your keywords using the merge tag. You grab the files from the desktop and ScrapeBox mashes them together. This is the long-term potential of the method: I ran a number of tests and variations, and this approach gave me a really large list. I honestly don't remember the exact number of results per keyword I used. I doubt it was 200, because that would be gigantic; for the 43k list it was probably a 1:1 ratio across the 10 keywords, or maybe 30 results each, which would make it much larger. The number of keywords here is huge, but it's hard to say exactly what the settings were, so we'll try something smaller in a second to give you an example.
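The merge step above, permuting a niche list against a city list with the footprint in front, can be sketched in Python. The niche and city values here are small placeholder samples, not the full lists from the video:

```python
from itertools import product

# Placeholder samples; in practice these would be the full niche
# list and the top-5,000-cities list mentioned in the video.
niches = ["clothes", "fitness gear", "kitchen equipment"]
cities = ["fort worth texas", "austin texas", "denver colorado"]

# Mimic ScrapeBox's keyword merge: prepend the Shopify CDN
# footprint to every niche-city permutation.
footprint = '"cdn.shopify.com"'
queries = [f"{footprint} {niche} {city}" for niche, city in product(niches, cities)]

print(len(queries))   # 3 niches x 3 cities = 9 queries
print(queries[0])
```

With the real inputs (10 niches x 5,000 cities) this same permutation is what produces the ~90,000-keyword list used later in the video.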
Along with your keyword, you also want a footprint up top. Looking through the dropdowns, I used broad match, exact match, and phrase match, plus a site footprint. The footprint itself is the CDN. Let me open a browser and show you: if you search for something like "clothes fort worth texas" together with cdn.shopify.com, it will pretty reliably return Shopify websites. Right-click one of the results, go to View Page Source, and look for the CDN footprint; you'll see cdn.shopify.com in the source. That's the general strategy for any e-commerce platform: find something like the CDN, or some other consistent marker tied to that platform, in the page source of the e-commerce website. In this case it's pretty straightforward, and you can even include the slashes and so on as part of the cleanup phase, which I'll show you in a second.

I don't like using site: operators much anymore, because I've found that Yahoo and Bing often throw up blocks when you use them. It doesn't always happen, but a lot of the time it does. An exact match or even a phrase match is okay, but plain broad match works fine too; I've tested it 15 pages deep, and every single result I checked was a Shopify website, which is wild.

So let's put cdn.shopify.com up in the footprint field. Rather than merging against the full city list, which is huge, let's use 10 popular e-commerce niches: clothes, fitness gear, kitchen equipment, cosmetics, books, pet care products, rugs, musical instruments, supplements, and toys. There's a whole lot more you could do. This list comes partially from general web searches and partially from Google Shopping categories (I'll put a Pastebin of it below this video too); it's the Google Shopping category tree broken down to the tail end of the descriptions.
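The footprint check described above, looking for cdn.shopify.com in the page source, is easy to script. This sketch runs the check on hard-coded sample HTML rather than fetching live pages, and the page contents are invented for illustration:

```python
def looks_like_shopify(html: str) -> bool:
    """Return True if the page source carries the Shopify CDN footprint."""
    return "cdn.shopify.com" in html

# Toy page sources; real ones come from View Page Source in a browser.
shopify_page = '<html><link href="https://cdn.shopify.com/s/files/1/theme.css"></html>'
plain_page = "<html><body>no ecommerce platform here</body></html>"

print(looks_like_shopify(shopify_page))  # True
print(looks_like_shopify(plain_page))    # False
```

The same substring test is what the Page Scanner footprint does later in the video, just applied in bulk across the harvested list.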
Let's grab a chunk of these niches, create a quick file, and throw them in there, because you have to merge everything together. Leave the footprint in place, take the top 5,000 US cities by population, and merge those with the niche file. That comes out to about ninety thousand keywords, which is still large, so let's do the top 30 results and start harvesting. I'll use the Yahoo engine, since it's a little more flexible, confirm 30 results, and hit go. It pauses for a second, then off it goes; open the gates. I'll skip ahead to when it's finished.

Okay, I stopped it prematurely so I wouldn't have to wait so long. It looks like we processed about 104,000 URLs, with some misses in there; it varies, and sometimes Bing gives better results than Yahoo, or the other way around. After removing duplicates it dropped down to about 18k. Who knows exactly which keywords got processed; they're pretty random: waffle irons, wakeboard parts, walk-behind equipment, walking aid accessories, wall clocks, wall shelves and ledges, all sorts of random stuff. Anyway, once you have your sample, that's about 18,000 URLs.
Let's export this as a throwaway one-two-three file. Next you want the Page Scanner add-on, so we can confirm the Shopify footprint across the sample. Before that, make sure duplicates are removed, remove entries which are not URLs, and remove URLs with file extensions, which takes out all the images and similar files; that leaves what we actually want to save.

In the Page Scanner, load the URLs from the harvester, then under the platform mask hit Edit and deselect all the built-in platforms, setting it to none. My footprint is just the CDN string with the slashes around it; super simple. Add a new entry, name it "shopify", hit enter, scroll down to confirm it's there and saved, and click Update just in case, then close.

Check your settings too: I keep the maximum at 200, then 15 connections, and a read timeout of roughly 30. That's my default, but you're welcome to try other values. I find that if your read timeout isn't at least 30, you can run into a lot of problems in a lot of different areas of ScrapeBox. Hit Start, and it should go pretty quickly; jumping ahead, you can see it's already detecting all kinds of pages, and most of them will be Shopify footprints. Let's jump to the final results.
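The pre-scan cleanup ScrapeBox performs here (remove duplicates, drop entries that are not URLs, drop URLs with image extensions) can be approximated in a few lines of Python; the sample entries below are made up:

```python
from urllib.parse import urlparse

harvested = [
    "https://example-store.myshopify.com/products/rug",
    "https://example-store.myshopify.com/products/rug",  # duplicate
    "not a url at all",                                  # junk entry
    "https://cdn.shopify.com/s/files/1/photo.jpg",       # image asset
]

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")

def keep(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False  # drop entries that are not URLs
    return not parsed.path.lower().endswith(IMAGE_EXTS)  # drop image files

# dict.fromkeys removes duplicates while preserving order
cleaned = list(dict.fromkeys(u for u in harvested if keep(u)))
print(cleaned)  # only the product URL survives, once
```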
Okay, it finished, but it looks like some of the threads are locked up, which happens regularly. The Page Scanner add-on session looks empty, so I'll force it to close and hope it pushes the results into the file. It did. You have to do this all the time, if you haven't figured that out yet; you constantly have to force-close ScrapeBox processes.

Let's drop the results back in. The count only dropped by about a thousand, which is pretty good, so remove duplicates and export to the same one-two-three file. Notice the export uses the pipe symbol as a delimiter with "shopify" after each URL; I'd rather it just exported the URLs without the footprint name, but that makes sense when you have multiple footprints. So we have to cut these strings down: run "remove URLs not containing" with "shopify", which cut the list down by about 6k, and anything that does contain "shopify" in the line is probably fine.

I've also been trimming the e-commerce lists down to the root domain, because there's no reason to keep a lot of duplicate URLs in e-commerce; you really just want the main home page, or maybe one other page. When you scrape a large amount of data there will be duplicate URLs, since every shop obviously has multiple products. That's different from the service side of things, where you're scraping directories; here you're going straight to the Shopify websites through the footprint. So let's trim: there's "trim URLs to domain" and "trim to root"; let's try trim to root.
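The two list operations above, keeping only confirmed Shopify hits and trimming each URL to its root domain, can be sketched like this. The pipe-delimited url|footprint line format mirrors the Page Scanner export shown in the video, and the store names are invented:

```python
from urllib.parse import urlparse

scanned = [
    "https://shirts-r-us.com/products/blue-tee|shopify",
    "https://shirts-r-us.com/products/red-tee|shopify",
    "https://randomblog.net/post/hello|other",
]

roots = []
for line in scanned:
    url, _, platform = line.partition("|")  # footprint name follows the pipe
    if platform != "shopify":
        continue  # keep only confirmed Shopify hits
    parsed = urlparse(url)
    roots.append(f"{parsed.scheme}://{parsed.netloc}/")  # trim to root domain

# Multiple product pages collapse to one root per shop
deduped = list(dict.fromkeys(roots))
print(deduped)  # ['https://shirts-r-us.com/']
```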
That worked, though I'm not totally clear why; "trim URLs to domain level" and "trim to root" sound like the same thing, and I need to figure out the difference. Remove duplicate URLs again and the list drops to about 8k. It actually dropped a bit further than I expected; I probably collapsed some multi-level subdomains, which I maybe shouldn't have done, but let's export to the one-two-three file anyway.

So to recap: we went from roughly 100k URLs down to around 18-20k after the first culls, and after all the little cuts we got down to about 8k. That might not sound like a lot, but remember I stopped the harvest at maybe 10% of the scrape. Whether you run this variation or one with far more keywords and more page results, you can build huge samples; mine was about a hundred thousand URLs, and if you let the URLs accumulate before running them through the Page Scanner, you can get into the tens of millions pretty easily.

From here you just run the grab on whatever you're trying to get: emails, phone numbers, metadata. Let's do Grab Emails and save the URLs along with the emails; make sure no filter is checked, and start. That finished pretty quickly: about 5,000 emails from those 8,000 pages, which is pretty good.
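As the video notes later, ScrapeBox picks emails out of pages with regular expressions. A simple regex-based version of the email grab looks roughly like this; the page source and addresses are invented:

```python
import re

# A common, permissive email pattern (not a full RFC 5322 validator)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

page_source = """
<footer>Contact us: support@example-shop.com or
<a href="mailto:orders@example-shop.com">orders</a></footer>
"""

# set() removes duplicates; sorted() gives a stable order
emails = sorted(set(EMAIL_RE.findall(page_source)))
print(emails)  # ['orders@example-shop.com', 'support@example-shop.com']
```

Running a pattern like this over each of the ~8,000 trimmed pages is what produces the ~5,000-email sample described above.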
Now go into the Data folder and grab both of the exported files. It's weird how ScrapeBox splits them like that; I really don't like it. Anyway, close this, clear the list, load the files, and remove duplicates. Typically I export with the URLs included for client deliveries and things like that. I was going to add this file to the description below in case you want to test it, so let's check: this one has all the dupes, and that one is the 5,000 emails. Normally I'd just combine them all and scrub them, but the formatting is off, so the scrub doesn't work right; there's even an image in there, and "remove with extensions" won't catch it because the file isn't in the format it expects.

Once you have your emails, you want to take them to whatever your email verification tool is. I'll export them as a four-five-six file and copy it over to my verification setup. I'm opening one of my instances of Atomic Mail Verifier; that's one option, and G-Lock Advanced Email Verifier is another. I've tested both, and they come out with relatively similar results. I'm also flirting with the idea of recreating this process myself, so it's a bit faster and I don't have to pay for it.

First, open the file in Notepad++ to clean up syntax problems before running the verification. Change the encoding to ANSI and scroll down to the bottom.
I don't spot any syntax errors, which is unusual, though there are plenty of duplicates and bogus entries. I do need to change the delimiter from the pipe character to a semicolon: find and replace, replace all, save, close, and transfer the file.

There are a ton of image entries in here, which isn't good; I could have sworn I deleted those, but it probably picked up multiple images per page. Delete the duplicates, which takes it from 5k to about 4.3k, and start verifying... "domain does not exist"? That's not true, so let's stop. Sometimes there's a glitch with the formatting. Back in the file, I'll replace the single semicolon with three semicolons to give the parser some space so it can't misread the domain, then delete everything in the tool and re-add the file.

Something weird is happening; I've never seen this error before, so I'll just shut the program down entirely (sorry for the noises, and it looks like I closed the wrong instance first). Reopen it, delete duplicates again, and there it goes. By the way, the emails get picked up from ScrapeBox using regular expressions, which is why those patterns look the way they do.

I'm still getting "domain does not exist", so let's check some of these entries. This one, portrait galia store.com, is definitely a live website, so what's the problem? It's obviously being marked invalid, but if the tool can't detect a real website, that's a serious issue. Oh, you know what, it's looking at the image rows; that's kind of doofus of me.
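The delimiter fix, swapping the pipe separator for the semicolon the verifier expects, is a one-line transform per row; the sample rows here are hypothetical:

```python
# Exported rows use "|" between the email and its source URL;
# the verifier expects ";" instead.
raw_lines = [
    "support@example-shop.com|https://example-shop.com/",
    "owner@another-store.net|https://another-store.net/",
]

fixed = [line.replace("|", ";") for line in raw_lines]
print(fixed[0])  # support@example-shop.com;https://example-shop.com/
```

This is the same edit done with find-and-replace in Notepad++ above, just scripted.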
Of course the domain doesn't exist for the image entries. I was thinking it was reading the other column, and that's not how it works; my apologies, I had a brain blip there. This will be finished in a moment, so let's skip ahead.

Okay, it's basically finished. It tends to hang at the last minute, a bit like ScrapeBox; I don't know what it is about these programs that can't cut ties with their threads. Stop it, and you can see about 2,100 emails came back absolutely valid. That's about half the sample, which is actually really good. Some are uncertain, mostly from email forwarding, and some will be from bans and things like that, because I always verify from the same IP; I want to find the ones that are absolutely clean.

Let's export these, keeping the columns, with both valid and uncertain results, so I can give them to you in a Pastebin below. For the file name, I'll describe the run: random e-commerce keywords, 30 results, the top 5k US towns, Yahoo, scraped, scrubbed, verified, valid and uncertain. Normally I save these as text files, but let's make this one a CSV so I can convert it easily to Markdown. Move it into place, then open my little CSV-to-Markdown converter and choose the file we just saved. It's awesome how fast this thing is; I only discovered it recently. Convert, and copy the result to my clipboard.
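A minimal CSV-to-Markdown conversion like the converter used here can be written in a few lines; the CSV content below is a made-up two-row sample, not the real export:

```python
import csv
import io

csv_text = (
    "email,status\n"
    "support@example-shop.com,valid\n"
    "owner@another-store.net,uncertain\n"
)

rows = list(csv.reader(io.StringIO(csv_text)))
header, body = rows[0], rows[1:]

# Build a Markdown table: header row, separator row, then data rows
lines = ["| " + " | ".join(header) + " |",
         "| " + " | ".join("---" for _ in header) + " |"]
lines += ["| " + " | ".join(row) + " |" for row in body]
print("\n".join(lines))
```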
Now open Pastebin and start a new paste, drop the Markdown in, and in a second I'll put the link to this video in the paste as well. Set it to unlisted for now. Checking the file name, I realize I didn't even put "shopify" in it; a little late now, but I'll title the paste "random verified Shopify email sample", set the format to Markdown, never expire, unlisted, file it under ScrapeBox, and create the paste. There it is. I'll send this to you so you can take a look in your own time; check some of those sites and maybe some of those emails.

That's pretty much it. I know there was a lot involved here, but I wanted to show you how easy it is to mash together some e-commerce and product-related keywords, throw in a Shopify footprint, and maybe add some locations or increase the number of results. There's a little bit of flex in the variations you can run, and you can get emails and contact information really quickly and verify it really quickly. Within the course of this video, over a short period of time, I was able to get about 2,000 valid emails.

It's super easy to do if you've got ScrapeBox. Check out my other tutorials: I have a ScrapeBox playlist on my channel where I add all my ScrapeBox videos, with all kinds of instruction on setting up ScrapeBox, configuring the settings, getting good proxies, setting up a VPS, and so on and so forth. Make sure to like, comment, and subscribe. I appreciate you guys watching, and I'll catch you in the next videos on setting up other footprints for other e-commerce things, and on harvester configurations. Thanks, bye.
