Project: Marmot | HTTP Download

An HTTP download helper that supports many features, such as cookie persistence and HTTP(S) and SOCKS5 proxies.

1. Introduction

World Wide Web robots are also known as spiders or crawlers. The principle is to imitate a normal client: construct a specific HTTP request, send it to a given host, and read the data it returns. The web holds a huge amount of information, and collecting it by hand (for example, copying and pasting from web pages) is time-consuming and laborious, which gave rise to the data acquisition industry.

Batch access to public network data does not break the law, but indiscriminate, uncontrolled, aggressive crawling can make other people's services unstable, so most resource providers filter out requests that look forged. As a result, even small-scale batch data acquisition has become a problem. Other needs, such as API development and automated software testing, rely on the same technical principle. That is why this (very simple) project came into the world.

Marmot is easy to understand, much like Python's requests library (not quite there yet, though). It enhances the native Golang HTTP library, handles some trivial logic for you (such as collecting information and checking parameters), and adds some fault-tolerance mechanisms (such as locking and closing streams in time, so that highly concurrent runs proceed without accidents). It provides a human-friendly API that you can reuse often. It conveniently supports cookie persistence and crawler proxy settings, as well as general settings such as HTTP request headers, timeout/pause, and data upload/post. It supports all HTTP methods (POST/PUT/GET/DELETE/...) and has a built-in spider pool and browser UA pool, which makes it easy to develop distributed spiders with UA + cookie persistence.

The library is simple and practical: a few lines of code can replace earlier spaghetti code, and it has been used in some large projects.

Main uses: WeChat development / API integration / automated testing / ticket-grabbing scripts / voting plug-ins / data crawling

A default worker is now supported, so you can get started easily:

lesson1.go

package main

import (
	"fmt"
	"github.com/hunterhug/marmot/miner"
)

func main() {
	miner.SetLogLevel(miner.DEBUG)

	// Use Default Worker, You can Also New One:
	//worker, _ := miner.New(nil)
	//worker = miner.NewWorkerWithNoProxy()
	//worker = miner.NewAPI()
	//worker, _ = miner.NewWorkerWithProxy("socks5://127.0.0.1:1080")
	worker := miner.Clone()
	_, err := worker.SetUrl("https://www.bing.com").Go()
	if err != nil {
		fmt.Println(err.Error())
	} else {
		fmt.Println(worker.ToString())
	}
}

See the example directory, such as lesson or practice, for more.

2. How To Use

You can get it by:

go get -v github.com/hunterhug/marmot/miner

2.1 The First Step

There are four kinds of worker:

worker, err := miner.NewWorker("http://xx:[email protected]:808") // proxy worker, format: protocol://user(optional):password(optional)@ip:port, alias to New(), supports http(s) and socks5
worker, err := miner.NewWorker(nil) // normal worker, keeps cookies by default, alias to New()
worker := miner.NewAPI() // API worker, does not keep cookies
worker, err := miner.NewWorkerByClient(&http.Client{}) // you can also pass in an http.Client if you want
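
The difference between a worker that keeps cookies and one that does not can be pictured with the standard library alone. The sketch below is not Marmot's implementation; it only assumes that "keeping cookies" corresponds to an http.Client with a cookie jar. The helper names newClient and rememberedCookies and the example.com URLs are made up for illustration, and nothing is sent over the network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// newClient returns an http.Client. With keepCookies set, the client gets a
// cookie jar and remembers cookies between requests; without it, no cookies
// are stored.
func newClient(keepCookies bool) (*http.Client, error) {
	client := &http.Client{}
	if keepCookies {
		jar, err := cookiejar.New(nil)
		if err != nil {
			return nil, err
		}
		client.Jar = jar
	}
	return client, nil
}

// rememberedCookies stores one cookie for loginURL in the client's jar (if
// there is one) and reports how many cookies the client would send to nextURL.
func rememberedCookies(client *http.Client, loginURL, nextURL string) (int, error) {
	login, err := url.Parse(loginURL)
	if err != nil {
		return 0, err
	}
	next, err := url.Parse(nextURL)
	if err != nil {
		return 0, err
	}
	if client.Jar == nil {
		return 0, nil // nothing is kept without a jar
	}
	client.Jar.SetCookies(login, []*http.Cookie{{Name: "session", Value: "abc"}})
	return len(client.Jar.Cookies(next)), nil
}

func main() {
	for _, keep := range []bool{true, false} {
		client, err := newClient(keep)
		if err != nil {
			fmt.Println(err)
			return
		}
		n, err := rememberedCookies(client, "https://example.com/login", "https://example.com/profile")
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("keepCookies=%v remembered=%d\n", keep, n)
	}
}
```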

If you want to use a worker more than once, call the Clone() method to get a new worker. Cloning isolates each worker's HTTP request and response; otherwise you must handle concurrency carefully yourself.
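
Why cloning helps with concurrency can be shown with a toy type. The request struct and its clone method below are hypothetical stand-ins, not Marmot's Worker or its Clone(); the sketch only illustrates the general idea that each goroutine works on its own copy of the mutable request state.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// request is a hypothetical stand-in for a worker's per-request state.
type request struct {
	URL    string
	Header http.Header
}

// clone returns a deep copy, so changes to the copy never touch the original.
func (r *request) clone() *request {
	return &request{URL: r.URL, Header: r.Header.Clone()}
}

// newBase builds the shared template that every goroutine clones.
func newBase() *request {
	return &request{
		URL:    "https://www.bing.com",
		Header: http.Header{"User-Agent": {"demo"}},
	}
}

// fanOut gives every goroutine its own clone of base and returns the
// X-Task header value each clone ended up with.
func fanOut(base *request, n int) []string {
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r := base.clone() // private copy: no shared mutable state
			r.Header.Set("X-Task", fmt.Sprint(i))
			results[i] = r.Header.Get("X-Task")
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	base := newBase()
	fmt.Println(fanOut(base, 3))
	fmt.Println("base untouched:", base.Header.Get("X-Task") == "")
}
```

Without the clone, all goroutines would write to the same header map, which is a data race in Go.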

2.2 The Second Step

Camouflage our worker:

worker.SetUrl("https://www.bing.com") // required: set the url you want to request
worker.SetMethod(miner.GET) // optional: set the http method: POST/GET/PUT/POSTJSON and so on
worker.SetWaitTime(2) // optional: set timeout of http request
worker.SetUa(miner.RandomUa()) // optional: set the browser user agent, see miner/config/ua.txt
worker.SetRefer("https://www.bing.com") // optional: set the http request Referer
worker.SetHeaderParam("diyheader", "diy") // optional: set a custom http header
worker.SetBData([]byte("file data")) // optional: set binary data for post or put
worker.SetFormParam("username","hunterhug") // optional: set form data for post or put
worker.SetCookie("xx=dddd") // optional: set an initial cookie; for some websites you can log in, press F12 and copy the cookie
worker.SetCookieByFile("/root/cookie.txt") // optional: set a cookie that is stored in a file
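
For comparison, here is roughly what the same kind of settings look like when a request is assembled by hand with the standard library. This is a sketch, not what Marmot does internally; buildRequest is a made-up helper, the user agent string is a placeholder, and the request is only built, never sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// buildRequest assembles a form POST by hand, roughly mirroring the
// settings listed above. Nothing is sent over the network.
func buildRequest(target string) (*http.Request, error) {
	form := url.Values{}
	form.Set("username", "hunterhug")

	req, err := http.NewRequest(http.MethodPost, target, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.Header.Set("User-Agent", "Mozilla/5.0 (placeholder UA)")
	req.Header.Set("Referer", "https://www.bing.com")
	req.Header.Set("diyheader", "diy")  // custom header
	req.Header.Set("Cookie", "xx=dddd") // initial cookie
	return req, nil
}

func main() {
	req, err := buildRequest("https://www.bing.com")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL)
	for name, values := range req.Header {
		fmt.Println(name+":", strings.Join(values, ", "))
	}
}
```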

2.3 The Third Step

Run our worker:

body, err := worker.Go() // if you called SetMethod(), Go() automatically uses one of the following; otherwise it uses Get()
body, err := worker.Get() // default
body, err := worker.Post() // post form request, data filled by SetFormParam()
body, err := worker.PostJSON() // post JSON request, data filled by SetBData()
body, err := worker.PostXML() // post XML request, data filled by SetBData()
body, err := worker.PostFILE() // upload file, data filled by SetBData(); you should also call SetFileInfo(fileName, fileFormName string)
body, err := worker.Delete() // you know!
body, err := worker.Put() // other http methods...
body, err := worker.PutJSON() // put JSON request
body, err := worker.PutXML()
body, err := worker.PutFILE()
body, err := worker.OtherGo("OPTIONS", "application/x-www-form-urlencoded") // other http methods, such as OPTIONS; cannot send binary data
body, err := worker.OtherGoBinary("OPTIONS", "application/x-www-form-urlencoded") // other http methods, such as OPTIONS; sends binary data only
body, err := worker.GoByMethod("POST") // overrides SetMethod(); equivalent to SetMethod() followed by Go()
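
A file upload needs both a file name and a form field name because the data travels as a multipart/form-data body. The sketch below builds such a body with the standard library and parses it back the way a server would. It illustrates the wire format only and is not Marmot's PostFILE() implementation; upload, buildUpload and readUpload are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
)

// upload holds a finished multipart body and the values needed to send it.
type upload struct {
	Body        *bytes.Buffer
	ContentType string // Content-Type header value, including the boundary
	Boundary    string
}

// buildUpload wraps data in a multipart/form-data body with one file part.
func buildUpload(fieldName, fileName string, data []byte) (*upload, error) {
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)
	part, err := w.CreateFormFile(fieldName, fileName)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(data); err != nil {
		return nil, err
	}
	// Close writes the final boundary; without it the body is incomplete.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return &upload{Body: body, ContentType: w.FormDataContentType(), Boundary: w.Boundary()}, nil
}

// readUpload parses the first part of a multipart body, as a server would,
// and returns its form field name, file name and contents.
func readUpload(body io.Reader, boundary string) (string, string, []byte, error) {
	r := multipart.NewReader(body, boundary)
	part, err := r.NextPart()
	if err != nil {
		return "", "", nil, err
	}
	data, err := io.ReadAll(part)
	if err != nil {
		return "", "", nil, err
	}
	return part.FormName(), part.FileName(), data, nil
}

func main() {
	up, err := buildUpload("file", "cookie.txt", []byte("file data"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("Content-Type:", up.ContentType)
	field, name, data, err := readUpload(up.Body, up.Boundary)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("field=%s file=%s data=%q\n", field, name, data)
}
```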

2.4 The Fourth Step

Handle the returned data. All data is returned as binary (a byte slice), which you can immediately store in a new variable:

fmt.Println(string(body)) // convert the returned bytes directly
fmt.Println(worker.ToString()) // use the worker method: after the http response the data is kept in the field Raw, so just call ToString()
fmt.Println(worker.JsonToString()) // some JSON data includes Chinese and other multibyte characters, such as 我爱你,我的小绵羊,사랑해
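
JSON responses often carry multibyte characters as \uXXXX escapes, which are hard to read when printed raw. The sketch below shows one way to turn them back into readable text with the standard library. It is an assumption about the kind of problem JsonToString() addresses, not a description of how it is implemented; readableJSON is a made-up helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// readableJSON decodes a JSON document and encodes it again, so that
// \uXXXX escapes are written out as the characters they stand for.
func readableJSON(raw []byte) (string, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // also keep <, > and & unescaped
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	// Encode appends a newline; trim it for a clean single-line result.
	return string(bytes.TrimSpace(buf.Bytes())), nil
}

func main() {
	raw := []byte(`{"msg":"\u6211\u7231\u4f60"}`)
	s, err := readableJSON(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(s) // prints {"msg":"我爱你"}
}
```

Decoding into interface{} sorts object keys and turns numbers into float64, so this round trip suits display and logging better than exact re-serialization.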

Attention:

After each request to a URL, you should overwrite the HTTP request headers before the next request; otherwise the headers you set earlier still apply. If you only want to clear the post data, use Clear(); to clear the HTTP headers as well, use ClearAll(). I suggest using Clone() to avoid the problem altogether.

2.5 Other

Hook:

SetBeforeAction(fc func(context.Context, *Worker))
SetAfterAction(fc func(context.Context, *Worker))
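
The hook signatures suggest callbacks that run around each request. The toy below shows that pattern in isolation. toyWorker is a hypothetical stand-in rather than Marmot's Worker, and the exact moment at which Marmot invokes its hooks is an assumption here (before and after the request, as the names imply).

```go
package main

import (
	"context"
	"fmt"
)

// toyWorker is a hypothetical stand-in, not Marmot's Worker type.
type toyWorker struct {
	URL    string
	Log    []string
	before func(context.Context, *toyWorker)
	after  func(context.Context, *toyWorker)
}

// SetBeforeAction registers a callback to run before the request.
func (w *toyWorker) SetBeforeAction(fc func(context.Context, *toyWorker)) { w.before = fc }

// SetAfterAction registers a callback to run after the request.
func (w *toyWorker) SetAfterAction(fc func(context.Context, *toyWorker)) { w.after = fc }

// run calls the before hook, a placeholder for the real request, and then
// the after hook. Unset hooks are skipped.
func (w *toyWorker) run(ctx context.Context) {
	if w.before != nil {
		w.before(ctx, w)
	}
	w.Log = append(w.Log, "request "+w.URL)
	if w.after != nil {
		w.after(ctx, w)
	}
}

// demo wires up both hooks and returns the order in which things happened.
func demo() []string {
	w := &toyWorker{URL: "https://www.bing.com"}
	w.SetBeforeAction(func(_ context.Context, w *toyWorker) {
		w.Log = append(w.Log, "before")
	})
	w.SetAfterAction(func(_ context.Context, w *toyWorker) {
		w.Log = append(w.Log, "after")
	})
	w.run(context.Background())
	return w.Log
}

func main() {
	for _, step := range demo() {
		fmt.Println(step)
	}
}
```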

License

Copyright [2016-2022] [github.com/hunterhug]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
