doc update and better versioning method

Romain Dorgueil
2016-12-29 17:27:06 +01:00
parent b39d51071f
commit 8b63b0bf44
15 changed files with 197 additions and 114 deletions

docs/guide/crawlers.rst

Web crawlers with Bonobo
========================

.. todo:: Bonobo-Selenium is in an early alpha stage, and things will change. This section gives a brief
   overview but is neither complete nor definitive.

Writing web crawlers with Bonobo and Selenium is easy.

First, install **bonobo-selenium**:


.. code-block:: shell-session

   $ pip install bonobo-selenium


The idea is to have each callable crawl one kind of thing and delegate drill-downs to the callables further down
the chain. An example chain could be:


.. graphviz::

   digraph {
       rankdir = LR;
       login -> paginate -> list -> details -> "ExcelWriter(...)";
   }

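The delegation idea can be illustrated without a browser. The following is a minimal, plain-Python sketch, not
Bonobo's API: ``run_chain`` is a hypothetical helper, and it assumes each step takes one input and either returns a
single value or yields several, each of which is passed on to the next step.

.. code-block:: python

   import inspect
   from typing import Any, Callable, Iterator


   def _outputs(step: Callable[[Any], Any], value: Any) -> Iterator[Any]:
       """Call one step and normalize its result to an iterator of outputs."""
       result = step(value)
       if inspect.isgenerator(result):
           yield from result  # generator steps may emit many values
       elif result is not None:
           yield result  # plain functions emit at most one value


   def run_chain(steps, value=None):
       """Hypothetical helper: push ``value`` through ``steps`` depth-first.

       Every output of steps[0] is immediately fed to steps[1], and so on.
       """
       if not steps:
           yield value
           return
       head, rest = steps[0], steps[1:]
       for out in _outputs(head, value):
           yield from run_chain(rest, out)


   # Toy steps standing in for paginate -> list -> details (no browser involved).
   def paginate(_):
       yield from (1, 2)


   def list_items(page):
       for i in range(2):
           yield f"item-{page}-{i}"


   def details(item):
       return {"id": item, "length": len(item)}


   rows = list(run_chain([paginate, list_items, details]))
   print(rows)

Because ``run_chain`` recurses into each output before moving on, one item is fully drilled down before its next
sibling is processed. Bonobo's actual execution model may differ; this only shows the shape of the data flow.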

Each step would do the following:

* ``login()`` is in charge of opening an authenticated session in the browser.
* ``paginate()`` opens each page of a hypothetical list and passes it to the next step.
* ``list()`` takes every item of a page and yields it.
* ``details()`` extracts the data you're interested in.
* ... and the writer saves it somewhere.
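
The steps listed above can be sketched in plain Python against a fake, in-memory site instead of a real browser.
This is an illustration under assumptions, not bonobo-selenium code: ``FAKE_SITE``, the session dict and all
function bodies are hypothetical, ``list_items()`` replaces ``list()`` to avoid shadowing the Python built-in, and a
CSV writer on an ``io.StringIO`` stands in for ``ExcelWriter(...)``.

.. code-block:: python

   import csv
   import io

   # Hypothetical stand-in for a paginated website: page number -> items on that page.
   FAKE_SITE = {
       1: [{"name": "alpha", "price": "10"}, {"name": "beta", "price": "20"}],
       2: [{"name": "gamma", "price": "30"}],
   }


   def login(username, password):
       """Open an 'authenticated session' (here just a dict, not a browser)."""
       return {"user": username, "authenticated": bool(password)}


   def paginate(session):
       """Yield each page of the list, in order."""
       if not session["authenticated"]:
           raise PermissionError("not logged in")
       for page_number in sorted(FAKE_SITE):
           yield FAKE_SITE[page_number]


   def list_items(page):
       """Yield every item found on one page."""
       yield from page


   def details(item):
       """Keep only the fields we care about, converting types on the way."""
       return {"name": item["name"].title(), "price": int(item["price"])}


   def write_csv(rows, stream):
       """Save rows somewhere; a CSV stream stands in for ExcelWriter(...)."""
       writer = csv.DictWriter(stream, fieldnames=["name", "price"])
       writer.writeheader()
       writer.writerows(rows)


   session = login("demo", "secret")
   # The nested comprehension plays the role of the chain.
   rows = [
       details(item)
       for page in paginate(session)
       for item in list_items(page)
   ]
   out = io.StringIO()
   write_csv(rows, out)
   print(out.getvalue())

Each function only knows how to handle one kind of input, so replacing the fake site with real Selenium calls would
presumably change the function bodies but not the overall shape of the chain.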