From Alibaba.com to an Independent Site: Scraping Alibaba.com Data with Web Scraper


1. Install Web Scraper

Open Chrome, go to the Chrome Web Store, search for "Web Scraper", then install and enable the extension.

2. Open Web Scraper

3. Import the Alibaba.com Sitemap

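4. Example Sitemap JSON

The following sitemap scrapes a single Alibaba.com product page. It collects the product title, images, video link, prices, and description. To scrape several products in one run, add more URLs to the startUrl array, separated by commas.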

{"_id":"Alibaba_Single_Product","startUrl":["https://www.alibaba.com/product-detail/Mini-Smart-Sealing-Machine-Electric-Driven_1601220476064.html"],"selectors":[{"delay":2000,"elementLimit":500,"id":"Scroll","multiple":false,"parentSelectors":["_root"],"selector":".ipH7B span.BEv27","type":"SelectorElementScroll"},{"id":"Title","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h1","type":"SelectorText"},{"extractAttribute":"src","id":"Images","parentSelectors":["_root"],"selector":".image-list-slider img","type":"SelectorGroup"},{"extractAttribute":"src","id":"Video Link","multiple":false,"parentSelectors":["_root"],"selector":".react-dove-video video","type":"SelectorElementAttribute"},{"id":"Prices Item Group","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".price-list","type":"SelectorHTML"},{"id":"Module Attributes","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".layout-left > .module_attribute","type":"SelectorHTML"},{"id":"Module Lead Time","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".layout-left > .module_lead","type":"SelectorHTML"},{"id":"Module Service","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".layout-left > .module_service","type":"SelectorHTML"},{"id":"Module Description","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".layout-left > .module_description","type":"SelectorHTML"}]}
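Editing a one-line JSON string by hand gets error-prone once many product URLs are involved. The Python sketch below appends URLs to startUrl programmatically. The function name add_start_urls and the second URL in the demo are illustrative assumptions and are not part of Web Scraper itself; the demo sitemap is shortened to an empty selectors list for brevity.

```python
import json


def add_start_urls(sitemap_json: str, new_urls: list[str]) -> str:
    """Return the sitemap JSON with new_urls appended to startUrl (duplicates skipped)."""
    sitemap = json.loads(sitemap_json)

    # startUrl may be stored as a single string or as a list; normalize to a list.
    start_urls = sitemap.get("startUrl", [])
    if isinstance(start_urls, str):
        start_urls = [start_urls]
    start_urls = list(start_urls)

    for url in new_urls:
        if url not in start_urls:
            start_urls.append(url)
    sitemap["startUrl"] = start_urls

    # Compact separators keep the result on one line, ready to paste into the import box.
    return json.dumps(sitemap, ensure_ascii=False, separators=(",", ":"))


# Shortened demo sitemap (selectors omitted); the second URL is a hypothetical placeholder.
demo = (
    '{"_id":"Alibaba_Single_Product","startUrl":'
    '["https://www.alibaba.com/product-detail/Mini-Smart-Sealing-Machine-Electric-Driven_1601220476064.html"],'
    '"selectors":[]}'
)
print(add_start_urls(demo, ["https://www.alibaba.com/product-detail/hypothetical-product_0000000000.html"]))
```

Paste the printed JSON into the sitemap import box in place of the original string.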

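The following sitemap scrapes a single Alibaba.com product list page. It collects each product's page URL and price range. To scrape several list pages in one run, add more URLs to the startUrl array, separated by commas.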

{"_id":"Alibaba_List_Page","startUrl":["https://szwei.en.alibaba.com/productgrouplist-829442874/Thermoforming_Machine.html"],"selectors":[{"id":"Products","parentSelectors":["wrapper_for_Products_Price Range"],"type":"SelectorElementAttribute","selector":"a.title-link","multiple":false,"extractAttribute":"href"},{"id":"Price Range","parentSelectors":["wrapper_for_Products_Price Range"],"type":"SelectorText","selector":"span.num","multiple":false,"regex":""},{"id":"wrapper_for_Products_Price Range","parentSelectors":["_root"],"type":"SelectorElement","selector":"div.vertical","multiple":true}]}
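The product URLs collected by the list-page sitemap can serve as the startUrl values for the single-product sitemap. The sketch below pulls them out of a CSV export and turns relative or protocol-relative links into absolute URLs. It assumes the export has a Products column (named after the selector id) and a web-scraper-start-url column; check the header row of your own export, since these names are assumptions here. The function name product_urls_from_export and the product URL in the sample data are hypothetical.

```python
import csv
import io
from urllib.parse import urljoin


def product_urls_from_export(
    csv_text: str,
    url_column: str = "Products",                  # assumed: column named after the selector id
    base_column: str = "web-scraper-start-url",    # assumed: page the row was scraped from
) -> list[str]:
    """Return unique absolute product URLs from a list-page export, in first-seen order."""
    urls: list[str] = []
    seen: set[str] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        href = (row.get(url_column) or "").strip()
        if not href:
            continue  # skip rows where the link selector matched nothing
        # urljoin resolves "//host/path" and "/path" against the page the row came from.
        absolute = urljoin(row.get(base_column) or "", href)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Tiny made-up export for illustration; real exports contain one row per product.
sample = (
    "web-scraper-order,web-scraper-start-url,Products,Price Range\n"
    "1-1,https://szwei.en.alibaba.com/productgrouplist-829442874/Thermoforming_Machine.html,"
    "//www.alibaba.com/product-detail/hypothetical_1.html,100-200\n"
)
print(product_urls_from_export(sample))
```

When reading a real export from disk, opening the file with encoding="utf-8-sig" avoids a stray byte-order mark ending up in the first column name.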

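5. Start Scraping

Click [Sitemaps] to view the full list, select the Sitemap you want to use, and start scraping. The scrape timing settings can stay at their default values. Wait for the scrape to finish, then export the data in Excel or CSV format.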
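Several fields in the single-product sitemap (price list, attributes, lead time, service, description) use the SelectorHTML type, so the export contains raw HTML fragments. When plain text is needed for the independent site, something like the following sketch can strip the markup using only Python's standard library. The helper name html_to_text and the sample fragment are illustrative assumptions; real Alibaba.com markup will be more complex.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = data.strip()
        if text:
            self.parts.append(text)


def html_to_text(fragment: str) -> str:
    """Return the visible text of an HTML fragment, with pieces joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.parts)


# Made-up fragment standing in for an exported "Module Attributes" cell.
sample = '<div class="module_attribute"><h3>Key attributes</h3><p>Voltage: 220V</p></div>'
print(html_to_text(sample))
```

Keeping the original HTML column alongside the extracted text is usually worthwhile, because tables and images inside the description are lost in the plain-text version.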

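6. Create a Custom Sitemap

If you have purchased our site-building service, you can submit a request describing the platform and the data you want to collect. We will create the Sitemap and send it to you, and we can also collect the data on your behalf.

To create a Sitemap yourself, refer to the official video tutorials. Our Web Scraper Sitemap creation and data collection services start at 200 yuan.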
