A Python Tool for Bulk Image Crawling from Search Engines, Explained

Posted 2024/12/25 by Anonymous


A Python image crawler package

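Recently, while working on some image classification tasks, I needed to crawl extra images from search engines to expand our training set. Working in AI really is not easy. The package I used is icrawler, which can be installed with pip: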

pip install icrawler

Here is my crawler code:

import os

from icrawler.builtin import BaiduImageCrawler, BingImageCrawler, GoogleImageCrawler

# Search keywords to crawl (Chinese queries: "smoking pedestrian" x2,
# "pedestrian answering a phone", "pedestrian making a call", "pedestrian using a phone")
list_word = ['抽烟 行人', '吸烟 行人', '接电话 行人', '打电话 行人', '玩手机 行人']

for word in list_word:
    # Bing crawler
    # Save path: one sub-folder per keyword under ./bing (portable across OSes)
    bing_storage = {'root_dir': os.path.join('bing', word)}
    # Arguments, in order: parser thread count, downloader thread count,
    # and the storage path set above
    bing_crawler = BingImageCrawler(parser_threads=2,
                                    downloader_threads=4,
                                    storage=bing_storage)
    # Start crawling: keyword + maximum number of images
    bing_crawler.crawl(keyword=word, max_num=2000)

    # Baidu crawler
    # baidu_storage = {'root_dir': os.path.join('baidu', word)}
    # baidu_crawler = BaiduImageCrawler(parser_threads=2,
    #                                   downloader_threads=4,
    #                                   storage=baidu_storage)
    # baidu_crawler.crawl(keyword=word, max_num=2000)

    # Google crawler
    # google_storage = {'root_dir': os.path.join('google', word)}
    # google_crawler = GoogleImageCrawler(parser_threads=4,
    #                                     downloader_threads=4,
    #                                     storage=google_storage)
    # google_crawler.crawl(keyword=word, max_num=2000)
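The parser_threads and downloader_threads arguments above set how many worker threads icrawler uses at each stage. According to icrawler's documentation, a crawler is built from a feeder, a parser and a downloader connected by FIFO queues. The following is only a toy, stdlib-only illustration of that parser-to-downloader idea, not icrawler's actual code: run_pipeline, parse and download are hypothetical names, the list of pages stands in for the feeder, and download is whatever callable you pass in (in the test it just returns a string instead of fetching anything).

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_pipeline(pages: Iterable[str],
                 parse: Callable[[str], List[str]],
                 download: Callable[[str], str],
                 parser_threads: int = 2,
                 downloader_threads: int = 4) -> List[str]:
    """Toy parser -> downloader pipeline; hypothetical, not icrawler's code."""
    page_q = queue.Queue()   # pages waiting to be parsed (the "feeder" output)
    url_q = queue.Queue()    # image URLs waiting to be downloaded
    results: List[str] = []
    lock = threading.Lock()  # protects `results` across downloader threads
    stop = object()          # sentinel telling a worker to exit

    def parser_worker() -> None:
        while True:
            page = page_q.get()
            if page is stop:
                break
            for url in parse(page):   # one page yields many image URLs
                url_q.put(url)

    def downloader_worker() -> None:
        while True:
            url = url_q.get()
            if url is stop:
                break
            saved = download(url)
            with lock:
                results.append(saved)

    parsers = [threading.Thread(target=parser_worker) for _ in range(parser_threads)]
    downloaders = [threading.Thread(target=downloader_worker)
                   for _ in range(downloader_threads)]
    for t in parsers + downloaders:
        t.start()

    for page in pages:
        page_q.put(page)
    # One sentinel per parser; wait until every URL has been queued
    for _ in parsers:
        page_q.put(stop)
    for t in parsers:
        t.join()
    # Sentinels go in after all URLs (FIFO), so no URL is skipped
    for _ in downloaders:
        url_q.put(stop)
    for t in downloaders:
        t.join()
    return results
```

More downloader threads help when most of the time is spent waiting on the network, which is typically the case when fetching thousands of images.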

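This crawler library supports multithreaded crawling on several search engines (Baidu, Bing and Google), although the Google crawler needs a VPN or proxy when run from mainland China. The example above uses Bing; the Baidu and Google versions are also in the loop, just commented out, and you can uncomment them to run all three. A Python crawler library like this is a real pleasure to use.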
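If you do want all three engines, the repeated blocks in the loop can be folded into a table of engines. This is a minimal sketch, assuming each crawler class accepts the same parser_threads, downloader_threads and storage arguments and the crawl(keyword=..., max_num=...) call used in the code above. crawl_all and its engines dictionary are hypothetical names, the thread counts are fixed at 2 and 4 for simplicity, and the engines run one after another rather than in parallel. Passing the crawler classes in as arguments also lets you test the function with a stand-in class that never touches the network.

```python
import os
from typing import Callable, Dict, Iterable, List, Tuple


def crawl_all(keywords: Iterable[str],
              engines: Dict[str, Callable],
              max_num: int = 2000,
              out_dir: str = ".") -> List[Tuple[str, str, str]]:
    """Crawl every keyword on every engine (hypothetical helper).

    `engines` maps a folder name (e.g. "bing") to a crawler class such as
    icrawler's BingImageCrawler. Images for each engine/keyword pair go to
    out_dir/<engine>/<keyword>. Returns the (engine, keyword, root_dir)
    triples in the order they were crawled.
    """
    done: List[Tuple[str, str, str]] = []
    for word in keywords:
        for name, crawler_cls in engines.items():
            root_dir = os.path.join(out_dir, name, word)
            crawler = crawler_cls(parser_threads=2,
                                  downloader_threads=4,
                                  storage={"root_dir": root_dir})
            # Runs sequentially: each crawl finishes before the next starts
            crawler.crawl(keyword=word, max_num=max_num)
            done.append((name, word, root_dir))
    return done
```

For example, after from icrawler.builtin import BingImageCrawler, BaiduImageCrawler, calling crawl_all(list_word, {"bing": BingImageCrawler, "baidu": BaiduImageCrawler}) produces the same bing/<keyword> and baidu/<keyword> folder layout as the loop above; add "google": GoogleImageCrawler only if a proxy is available.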

