doc update and better versionning method

Romain Dorgueil
2016-12-29 17:27:06 +01:00
parent b39d51071f
commit 8b63b0bf44
15 changed files with 197 additions and 114 deletions

3
docs/_static/custom.css vendored Normal file

@ -0,0 +1,3 @@
svg {
border: 2px solid green
}

2
docs/_static/graphs.css vendored Normal file

@ -0,0 +1,2 @@
.node {
}

33
docs/guide/crawlers.rst Normal file

@ -0,0 +1,33 @@
Web crawlers with Bonobo
========================
.. todo:: Bonobo-Selenium is at a very alpha stage, and things will change. This section is here to give a brief
    overview but is neither complete nor definitive.

Writing web crawlers with Bonobo and Selenium is easy.

First, install **bonobo-selenium**:

.. code-block:: shell-session

    $ pip install bonobo-selenium

The idea is to have one callable crawl one thing and delegate drill-downs to callables further down the chain.

An example chain could be:

.. graphviz::

    digraph {
        rankdir = LR;
        login -> paginate -> list -> details -> "ExcelWriter(...)";
    }

Where each step would do the following:

* `login()` is in charge of opening an authenticated session in the browser.
* `paginate()` opens each page of a fictional list and passes it to the next step.
* `list()` takes every list item and yields it.
* `details()` extracts the data you're interested in.
* ... and the writer saves it somewhere.
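
The division of labour described above can be sketched with plain Python generators. This is only an illustration under stated assumptions: it does not use Selenium or the bonobo-selenium API, and ``FAKE_SITE``, ``list_items`` (renamed to avoid shadowing the built-in ``list``), the function bodies and the ``run_chain`` helper are all hypothetical.

```python
# Hypothetical stand-in for a website: page number -> items listed on that page.
FAKE_SITE = {
    1: [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}],
    2: [{"id": 3, "name": "baz"}],
}


def login():
    # A real crawler would open an authenticated browser session here.
    yield {"user": "demo"}


def paginate(session):
    # Pass each page of the listing to the next step.
    for page_number in sorted(FAKE_SITE):
        yield FAKE_SITE[page_number]


def list_items(page):
    # Yield every item found on one page.
    yield from page


def details(item):
    # Extract only the data we are interested in, as a new dict.
    yield {"name": item["name"].upper()}


def run_chain(first, *steps):
    # Naive sequential runner (an assumption, not Bonobo's execution model):
    # every output of one step becomes an input of the next one.
    rows = list(first())
    for step in steps:
        rows = [out for row in rows for out in step(row)]
    return rows


rows = run_chain(login, paginate, list_items, details)
print(rows)
```

Each callable only knows about its own level of the site, which is what lets the drill-down be delegated step by step.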


@ -1,4 +1,8 @@
Guides
======
.. todo:: write the fucking doc!
.. toctree::
:maxdepth: 2
purity
crawlers

128
docs/guide/purity.rst Normal file

@ -0,0 +1,128 @@
Pure components and space complexity
====================================
The nature of components, and how data flows from one to another, makes them not so easy to write correctly.
Fortunately, with a few hints, you will be able to understand why and how they should be written.

The major problem we have is that one message can go through more than one component, and at the same time. If you
want to be safe, you tend to :func:`copy.copy()` everything between two calls to two different components, but that
means a lot of memory would be wasted on copies that are never modified.

Instead, we chose the opposite: copies are never made, and you should not modify the inputs of your component in place
before yielding them. That mostly means that you want to recreate dicts and lists before yielding (or returning) them.
Numeric values, strings and tuples being immutable in Python, modifying a variable of one of those types already
produces a different instance.
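
The hazard can be reproduced without Bonobo at all. In the sketch below, ``careless_upper``, ``careful_upper`` and the ``same_message`` alias are hypothetical names; the alias stands in for what a second component would receive when no copy is made.

```python
def careless_upper(row):
    # Mutates its input in place: every other holder of `row` sees the change.
    row["name"] = row["name"].upper()
    return row


def careful_upper(row):
    # Builds a new dict and leaves the input untouched.
    return {**row, "name": row["name"].upper()}


# Case 1: in-place mutation leaks to whoever shares the same object.
message = {"name": "bonobo"}
same_message = message  # no copy: both names point at one dict
careless_upper(message)
seen_by_sibling_after_mutation = same_message["name"]

# Case 2: returning a new dict keeps the shared input intact.
message = {"name": "bonobo"}
same_message = message
result = careful_upper(message)
seen_by_sibling_after_copy = same_message["name"]

print(seen_by_sibling_after_mutation, seen_by_sibling_after_copy, result)
```

Only the component that actually changes something pays for a new object; everything else keeps sharing the original.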
Numbers
=======
You can't go wrong with numbers. All of the following are correct.

.. code-block:: python

    def do_your_number_thing(n: int) -> int:
        return n

    def do_your_number_thing(n: int) -> int:
        yield n

    def do_your_number_thing(n: int) -> int:
        return n + 1

    def do_your_number_thing(n: int) -> int:
        yield n + 1

    def do_your_number_thing(n: int) -> int:
        # correct, but bad style
        n += 1
        return n

    def do_your_number_thing(n: int) -> int:
        # correct, but bad style
        n += 1
        yield n

The same is true with other numeric types, so don't be shy. Operate like crazy, my friend.
Tuples
======
Tuples are immutable, so you risk nothing.

.. code-block:: python

    def do_your_tuple_thing(t: tuple) -> tuple:
        return ('foo', ) + t

    def do_your_tuple_thing(t: tuple) -> tuple:
        return t + ('bar', )

    def do_your_tuple_thing(t: tuple) -> tuple:
        # correct, but bad style
        t += ('baaaz', )
        return t
Strings
=======
You know the drill, strings are immutable, blablabla ... Examples left as an exercise for the reader.
Dicts
=====
So, now it gets interesting. Dicts are mutable, which means you can mess things up badly here if you're not cautious.

For example, doing the following may cause unexpected problems:

.. code-block:: python

    def mutate_my_dict_like_crazy(d: dict) -> dict:
        # Bad! Don't do that!
        d.update({
            'foo': compute_something()
        })
        # Still bad! Don't mutate the dict!
        d['bar'] = compute_anotherthing()
        return d

The problem is easy to understand: as **Bonobo** won't make copies of your dict, the same dict will be passed along the
transformation graph, and mutations will be seen by components downstream of the output, but also upstream. Let's see
a more obvious example of something you should not do:
.. code-block:: python

    def mutate_my_dict_and_yield() -> dict:
        d = {}
        for i in range(100):
            # Bad! Don't do that!
            d['index'] = i
            yield d

Here, the same dict is yielded in each iteration, and its state when the next component in the chain is called is
undetermined.
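
To make the problem visible, here is a small sketch that buffers the generator's output before reading it, which is roughly what happens whenever a consumer runs later than the producer. The shortened range and the ``collected``, ``indexes`` and ``all_same_object`` names are only for illustration.

```python
def mutate_my_dict_and_yield():
    d = {}
    for i in range(3):
        # Bad: the same dict object is updated and yielded every time.
        d["index"] = i
        yield d


# Buffer everything first, then look at what was received.
collected = list(mutate_my_dict_and_yield())
indexes = [row["index"] for row in collected]
all_same_object = all(row is collected[0] for row in collected)

print(indexes)          # every entry shows the last value written
print(all_same_object)  # the list holds one dict, three times
```

The earlier values are gone: every reference points at the single dict, which only remembers the last write.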
Now let's see how to do it correctly:

.. code-block:: python

    def new_dicts_like_crazy(d: dict) -> dict:
        # Creating a new dict is correct.
        return {
            **d,
            'foo': compute_something(),
            'bar': compute_anotherthing(),
        }

    def new_dict_and_yield() -> dict:
        for i in range(100):
            # Different dict each time.
            yield {
                'index': i
            }

I hear you think «Yeah, but if I create like millions of dicts ...». The answer is simple: using dicts like this will
create a lot of them, but also free a lot, because as soon as all the future components that take a given dict as input
are done with it, the dict will be garbage collected. Youplaboum!
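
The claim about freed memory can be observed with weak references. This is a sketch under assumptions: ``Row`` is a hypothetical dict subclass (built-in dicts cannot be weakly referenced), the loop stands in for a downstream component, and the immediate release relies on CPython's reference counting, so other interpreters may free objects later.

```python
import gc
import weakref


class Row(dict):
    """Plain dict subclass, used only so that weak references can be taken."""


def produce():
    for i in range(3):
        # A brand new object for every message.
        yield Row(index=i)


watchers = []
for row in produce():
    # The "consumer" keeps no strong reference beyond the loop variable.
    watchers.append(weakref.ref(row))

del row       # drop the last strong reference
gc.collect()  # not needed on CPython, harmless elsewhere
alive = [ref() is not None for ref in watchers]

print(alive)
```

Once nothing downstream holds a row any more, its weak reference goes dead, meaning the object has been reclaimed.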


@ -1,22 +0,0 @@
bonobo.ext.console package
==========================
Submodules
----------
bonobo.ext.console.plugin module
--------------------------------
.. automodule:: bonobo.ext.console.plugin
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.ext.console
:members:
:undoc-members:
:show-inheritance:


@ -1,30 +0,0 @@
bonobo.ext.jupyter package
==========================
Submodules
----------
bonobo.ext.jupyter.plugin module
--------------------------------
.. automodule:: bonobo.ext.jupyter.plugin
:members:
:undoc-members:
:show-inheritance:
bonobo.ext.jupyter.widget module
--------------------------------
.. automodule:: bonobo.ext.jupyter.widget
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.ext.jupyter
:members:
:undoc-members:
:show-inheritance:


@ -1,46 +0,0 @@
bonobo.ext package
==================
Subpackages
-----------
.. toctree::
bonobo.ext.console
bonobo.ext.jupyter
Submodules
----------
bonobo.ext.couchdb_ module
--------------------------
.. automodule:: bonobo.ext.couchdb_
:members:
:undoc-members:
:show-inheritance:
bonobo.ext.opendatasoft module
------------------------------
.. automodule:: bonobo.ext.opendatasoft
:members:
:undoc-members:
:show-inheritance:
bonobo.ext.selenium module
--------------------------
.. automodule:: bonobo.ext.selenium
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.ext
:members:
:undoc-members:
:show-inheritance:


@ -58,7 +58,10 @@ Let's chain the three components together and run the transformation:
digraph {
rankdir = LR;
"generate_data" -> "uppercase" -> "output";
stylesheet = "../_static/graphs.css";
BEGIN [shape="point"];
BEGIN -> "generate_data" -> "uppercase" -> "output";
}
We use the :func:`bonobo.run` helper that hides the underlying object composition necessary to actually run the