A Golang crawler that scrapes images from Douban groups and submits them to a Chevereto image host through its API

Source: GK导航    Date: 2022-9-6    Posted by: admin

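An earlier post covered a Python crawler that scrapes images from Douban groups and submits them to a Chevereto image host through its API. Later, with some spare time, I wrote a Golang script that does the same job.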

Related post: Uploading images through the API in the free edition of Chevereto (illustrated tutorial)

Image host address: http://788to.com

Before using the script, set up a Golang environment and install the one required package:

go get github.com/PuerkitoBio/goquery

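The script accepts three parameters:

-u  URL of the group's discussion list, ending in start=, for example: https://www.douban.com/group/meituikong/discussion?start=

-e  The start= value at which to stop. The script requests offsets 0, 25, 50, ... for as long as the offset stays below this value.

-k  Chevereto API key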
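As a rough illustration of how -u and -e combine, the sketch below parses the three flags from an explicit argument list and expands them into the listing URLs that would be requested. The names config, parseArgs and pageURLs are hypothetical and exist only for this example. The step of 25 is taken from the loop in the script.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// config holds the three script parameters (hypothetical type for this sketch).
type config struct {
	base string // listing URL ending in "start="
	end  int    // offsets are generated while they stay below this value
	key  string // Chevereto API key
}

// parseArgs parses -u, -e and -k from args using the script's defaults.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("get-douban-image", flag.ContinueOnError)
	fs.StringVar(&c.base, "u", "https://www.douban.com/group/meituikong/discussion?start=", "Group Url")
	fs.IntVar(&c.end, "e", 100, "End Start Int Value")
	fs.StringVar(&c.key, "k", "laoji.org", "Chevereto Key")
	err := fs.Parse(args)
	return c, err
}

// pageURLs appends each offset 0, step, 2*step, ... below end to base.
func pageURLs(base string, end, step int) []string {
	var urls []string
	for start := 0; start < end; start += step {
		urls = append(urls, base+strconv.Itoa(start))
	}
	return urls
}

func main() {
	c, err := parseArgs([]string{"-e=60"})
	if err != nil {
		fmt.Println("bad arguments:", err)
		return
	}
	for _, u := range pageURLs(c.base, c.end, 25) {
		fmt.Println(u)
	}
}
```

With -e=60 this yields the offsets 0, 25 and 50, so three listing pages would be fetched.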

A complete example invocation:

go run get-douban-image.go -u="https://www.douban.com/group/265201/discussion?start=" -e="700" -k="laoji.org"

Git repository: https://github.com/qsbaq/doubanImage

The source follows. It is for demonstration only; the code in the Git repository is authoritative:

package main

import (
	"bytes"
	"encoding/json"
	"flag"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"regexp"
	"strconv"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"
)

// imagePattern matches large-size group topic images hosted on doubanio.com.
var imagePattern = regexp.MustCompile(`https://[a-z0-9]+\.doubanio\.com/view/group_topic/large/public/\w+\.jpg`)

var wg sync.WaitGroup

// GetUrl fetches rawURL and returns the response body, or nil on failure.
func GetUrl(rawURL string) []byte {
	ret, err := http.Get(rawURL)
	if err != nil {
		log.Println(rawURL, err)
		return nil
	}
	defer ret.Body.Close()
	data, err := io.ReadAll(ret.Body)
	if err != nil {
		log.Println(rawURL, err)
		return nil
	}
	return data
}

// getImage scans one topic page for image URLs and submits each one to Chevereto.
func getImage(image_url string, k string) {
	body := string(GetUrl(image_url))
	for _, value := range imagePattern.FindAllString(body, -1) {
		submit_url := "http://788to.com/api/1/upload/?key=" + url.QueryEscape(k) + "&source=" + url.QueryEscape(value)
		fmt.Println(submit_url)
		return_json := GetUrl(submit_url)
		res := make(map[string]interface{})
		if err := json.Unmarshal(return_json, &res); err != nil {
			log.Printf("%s -> invalid response: %v\n", value, err)
			continue
		}
		log.Printf("%s -> %v\n", value, res["status_code"])
	}
}

// getGroupList fetches one page of the discussion list and visits every topic on it.
func getGroupList(target_url string, k string) {
	defer wg.Done()
	fmt.Printf("Begin Url : %s\n", target_url)
	doc, err := goquery.NewDocumentFromReader(bytes.NewReader(GetUrl(target_url)))
	if err != nil {
		log.Println(err)
		return
	}
	// Each topic title in the listing table links to the topic page.
	doc.Find("td.title a").Each(func(i int, s *goquery.Selection) {
		if href, ok := s.Attr("href"); ok {
			getImage(href, k)
		}
	})
}

func main() {
	k := flag.String("k", "laoji.org", "Chevereto Key")
	endStartInt := flag.Int("e", 100, "End Start Int Value")
	defaultUrl := flag.String("u", "https://www.douban.com/group/meituikong/discussion?start=", "Group Url")
	flag.Parse()
	for i := 0; i < *endStartInt; i += 25 {
		wg.Add(1)
		go getGroupList(*defaultUrl+strconv.Itoa(i), *k)
		time.Sleep(3 * time.Second) // start one listing page every 3 seconds
	}
	wg.Wait()
}
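The extraction and submit steps in getImage can be tried offline, without touching Douban or the image host. The sketch below is one possible way to do that, not the repository's code. It runs an image pattern with escaped dots and bounded character classes over a made-up HTML fragment, so that two image URLs on the same line are matched separately, and it builds the upload URL with url.Values so that both the key and the source are escaped. The helper names extractImages and buildUploadURL are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// imageRe matches large-size group topic images on any doubanio.com image host.
var imageRe = regexp.MustCompile(`https://[a-z0-9]+\.doubanio\.com/view/group_topic/large/public/\w+\.jpg`)

// extractImages returns every image URL found in html, in order of appearance.
func extractImages(html string) []string {
	return imageRe.FindAllString(html, -1)
}

// buildUploadURL composes the upload request URL used by the script,
// query-escaping both the API key and the source image URL.
func buildUploadURL(host, key, source string) string {
	q := url.Values{}
	q.Set("key", key)
	q.Set("source", source)
	return host + "/api/1/upload/?" + q.Encode()
}

func main() {
	// Made-up fragment with two images on one line.
	page := `<img src="https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg">` +
		`<img src="https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg">`
	for _, img := range extractImages(page) {
		fmt.Println(buildUploadURL("http://788to.com", "laoji.org", img))
	}
}
```

A greedy pattern such as `(.*)` would treat everything between the first https:// and the last .jpg on a line as a single match, which is why the character classes here are bounded.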

Output:

2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p44724331.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p65569545.jpg -> 200
2017/02/10 08:18:10 https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg -> 200
Begin Url : https://www.douban.com/group/265201/discussion?start=500
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p47029205.jpg -> 200
2017/02/10 08:18:10 https://img5.doubanio.com/view/group_topic/large/public/p33682186.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979344.jpg -> 200
2017/02/10 08:18:11 https://img5.doubanio.com/view/group_topic/large/public/p47029206.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979345.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p48717685.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p50772901.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p45223799.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p47758309.jpg -> 200
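The number after each arrow is the status_code field of the JSON returned by the image host. A minimal sketch of reading that field into a typed struct instead of a map is shown below. It assumes that a status_code of 200 signals a successful upload, as the output suggests. The uploadResult type, the uploadOK helper and the sample bodies are hypothetical and reduced to this one field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uploadResult keeps only the field the script logs; other fields are ignored.
type uploadResult struct {
	StatusCode int `json:"status_code"`
}

// uploadOK reports whether body decodes as JSON and carries status_code 200.
func uploadOK(body []byte) (bool, error) {
	var r uploadResult
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	return r.StatusCode == 200, nil
}

func main() {
	// Hypothetical response bodies, including one that is not JSON at all.
	samples := []string{`{"status_code":200}`, `{"status_code":400}`, `not json`}
	for _, s := range samples {
		ok, err := uploadOK([]byte(s))
		fmt.Printf("%q -> ok=%v err=%v\n", s, ok, err)
	}
}
```

Returning the decode error separately makes it possible to tell a rejected upload apart from a response that was not JSON, for example an HTML error page.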