This branch is an HTML-scan variant of WeReadScan that integrates the script developed by Sec-ant.
Thanks to Sec-ant, this variant of WeReadScan is more efficient.
For more details about Sec-ant's project, visit https://github.com/Sec-ant/weread-scraper
pip install WeReadScan-HTML
This package depends on Selenium, so some basic familiarity with Selenium is assumed.
Talk is cheap, so here is the code.
from selenium.webdriver import Chrome, ChromeOptions
from WeReadScan import WeRead
# options
chrome_options = ChromeOptions()
# headless mode is optional; remove the next line to show the browser window
chrome_options.add_argument('--headless')
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_argument('disable-infobars')
chrome_options.add_argument('log-level=3')
# launch Webdriver
print('Webdriver launching...')
driver = Chrome(options=chrome_options)
print('Webdriver launched.')
with WeRead(driver) as weread:
    # log in to grab the whole book
    weread.login()
    # scan the first book by its URL
    weread.scan2html('https://weread.qq.com/web/reader/2c632ef071a486a92c60226kc81322c012c81e728d9d180')
    # scan the second book by its URL
    weread.scan2html('https://weread.qq.com/web/reader/a9c32f40717db77aa9c9171kc81322c012c81e728d9d180')
Just write your code the way the demo shows.
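If you want to scan more than two books, repeating the scan2html call by hand gets tedious. The following is a minimal sketch, not part of WeReadScan itself: scan_all is a hypothetical helper name, and it assumes only that the object passed in has the scan2html(url) method used in the demo. It scans each URL in turn and collects failures instead of stopping at the first error.

```python
from typing import Any, Iterable, List, Tuple


def scan_all(weread: Any, urls: Iterable[str]) -> List[Tuple[str, str]]:
    """Call weread.scan2html(url) for every URL.

    `weread` is assumed to be the object created by `with WeRead(driver) as weread`
    in the demo; this hypothetical helper relies only on its scan2html method.
    Returns a list of (url, error message) pairs for the books that failed.
    """
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            weread.scan2html(url)
        except Exception as exc:
            # Record the failure and continue, so one bad book
            # does not abort the rest of the batch.
            failures.append((url, str(exc)))
    return failures
```

Inside the with block of the demo, after weread.login(), you could call failures = scan_all(weread, book_urls), where book_urls is your own list of reader URLs, and then print or retry whatever ends up in failures.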