[doc] cleanup & refactorings
@@ -1,8 +0,0 @@
-Docker Extension
-================
-
-.. todo:: The `bonobo-docker` package is at a very alpha stage, and things will change. This section is here to give a
-    brief overview but is neither complete nor definitive.
-
-Read the introduction: https://www.bonobo-project.org/with/docker
-
@@ -1,41 +0,0 @@
-Jupyter Extension
-=================
-
-There is a builtin plugin that integrates (somewhat minimalistically, for now) bonobo within jupyter notebooks, so
-you can read the execution status of a graph within a nice (ok, not so nice) html/javascript widget.
-
-See https://github.com/jupyter-widgets/widget-cookiecutter for the base template used.
-
-Installation
-::::::::::::
-
-Install `bonobo` with the **jupyter** extra::
-
-    pip install bonobo[jupyter]
-
-Install the jupyter extension::
-
-    jupyter nbextension enable --py --sys-prefix widgetsnbextension
-    jupyter nbextension enable --py --sys-prefix bonobo.ext.jupyter
-
-Development
-:::::::::::
-
-You should favor yarn over npm to install node packages. If you prefer to use npm, it's up to you to adapt the code.
-
-To install the widget for development, make sure you're using an editable install of bonobo (see install document)::
-
-    jupyter nbextension install --py --symlink --sys-prefix bonobo.ext.jupyter
-    jupyter nbextension enable --py --sys-prefix bonobo.ext.jupyter
-
-If you want to change the javascript, you should run webpack in watch mode in some terminal::
-
-    cd bonobo/ext/jupyter/js
-    yarn install
-    ./node_modules/.bin/webpack --watch
-
-To compile the widget into a distributable version (which gets packaged on PyPI when a release is made), just run
-webpack::
-
-    ./node_modules/.bin/webpack
-
@@ -1,42 +0,0 @@
-Selenium Extension
-==================
-
-.. todo:: The `bonobo-selenium` package is at a very alpha stage, and things will change. This section is here to give a
-    brief overview but is neither complete nor definitive.
-
-
-Writing web crawlers with Bonobo and Selenium is easy.
-
-First, install **bonobo-selenium**:
-
-.. code-block:: shell-session
-
-    $ pip install bonobo-selenium
-
-The idea is to have one callable crawl one thing and delegate drill downs to callables further away in the chain.
-
-An example chain could be:
-
-.. graphviz::
-
-    digraph {
-        rankdir = LR;
-        login -> paginate -> list -> details -> "ExcelWriter(...)";
-    }
-
-Where each step would do the following:
-
-* `login()` is in charge of opening an authenticated session in the browser.
-* `paginate()` opens each page of a fictitious list and passes it to the next step.
-* `list()` takes every list item and yields it.
-* `details()` extracts the data you're interested in.
-* ... and the writer saves it somewhere.
-
-Installation
-::::::::::::
-
-Overview
-::::::::
-
-Details
-:::::::
@@ -1,16 +0,0 @@
-SQLAlchemy Extension
-====================
-
-.. todo:: The `bonobo-sqlalchemy` package is at a very alpha stage, and things will change. This section is here to
-    give a brief overview but is neither complete nor definitive.
-
-Read the introduction: https://www.bonobo-project.org/with/sqlalchemy
-
-Installation
-::::::::::::
-
-Overview
-::::::::
-
-Details
-:::::::
11
docs/guide/graphs.rst
Normal file
@@ -0,0 +1,11 @@
+Graphs
+======
+
+Writing graphs
+::::::::::::::
+
+Debugging graphs
+::::::::::::::::
+
+Executing graphs
+::::::::::::::::
@@ -6,18 +6,9 @@ Here are a few guides and best practices to work with bonobo.
 .. toctree::
     :maxdepth: 2

-    purity
+    graphs
     transformations
     services
+    environment
+    purity

-There are also a few extensions that ease the use of the library with third party tools. Each integration is
-available as an optional extra dependency, and the maturity stage of each extension varies.
-
-.. toctree::
-    :maxdepth: 2
-
-    ext/docker
-    ext/jupyter
-    ext/selenium
-    ext/sqlalchemy
-
@@ -1,34 +1,39 @@
-Pure transformations
-====================
+Best Practices
+==============

 The nature of components, and how the data flow from one to another, can be a bit tricky.
 Hopefully, they should be very easy to write with a few hints.

-The major problem we have is that one message (underlying implementation: :class:`bonobo.structs.bags.Bag`) can go
-through more than one component, and at the same time. If you wanna be safe, you tend to :func:`copy.copy()` everything
-between two calls to two different components, but that's very expensive.
+Pure transformations
+::::::::::::::::::::

-Instead, we chose the opposite: copies are never made, and you should not modify in place the inputs of your
-component before yielding them, and that mostly means that you want to recreate dicts and lists before yielding (or
-returning) them. Numeric values, strings and tuples being immutable in python, modifying a variable of one of those
-type will already return a different instance.
+One “message” (a.k.a :class:`bonobo.Bag` instance) may go through more than one component, and at the same time.
+To ensure your code is safe, one could :func:`copy.copy()` each message on each transformation input but that's quite
+expensive, especially because it may not be needed.
+
+Instead, we chose the opposite: copies are never made, and you should not modify in place the inputs of your
+component before yielding them, which mostly means that you want to recreate dicts and lists before yielding if
+their values changed.
+
+Numeric values, strings and tuples being immutable in python, modifying a variable of one of those types will already
+return a different instance.

 Examples will be shown with `return` statements, of course you can do the same with `yield` statements in generators.

 Numbers
-:::::::
+-------

 In python, numbers are immutable. So you can't be wrong with numbers. All of the following are correct.

 .. code-block:: python

-    def do_your_number_thing(n: int) -> int:
+    def do_your_number_thing(n):
         return n

-    def do_your_number_thing(n: int) -> int:
+    def do_your_number_thing(n):
         return n + 1

-    def do_your_number_thing(n: int) -> int:
+    def do_your_number_thing(n):
         # correct, but bad style
         n += 1
         return n
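Editor's illustration (not part of the commit): the ``n += 1`` variant above is safe because augmented assignment on an immutable value rebinds the local name instead of changing the caller's object. A minimal sketch in plain Python, with no bonobo involved; ``increment`` and ``extend`` are hypothetical names used only for this example.

```python
def increment(n):
    # Rebinds the local name `n` to a new int; the caller's value is untouched.
    n += 1
    return n


def extend(t):
    # Tuples are immutable too: `+=` builds a new tuple and rebinds `t`.
    t += ('baaaz',)
    return t


number = 41
row = ('foo', 'bar')

new_number = increment(number)
new_row = extend(row)

# The originals are unchanged, so any other consumer of them stays safe.
assert number == 41
assert row == ('foo', 'bar')
assert new_row is not row
```

The same reasoning applies to strings, which is why the tuple and string hunks that follow carry no warning.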
@@ -37,37 +42,37 @@ The same is true with other numeric types, so don't be shy.


 Tuples
-::::::
+------

 Tuples are immutable, so you risk nothing.

 .. code-block:: python

-    def do_your_tuple_thing(t: tuple) -> tuple:
+    def do_your_tuple_thing(t):
         return ('foo', ) + t

-    def do_your_tuple_thing(t: tuple) -> tuple:
+    def do_your_tuple_thing(t):
         return t + ('bar', )

-    def do_your_tuple_thing(t: tuple) -> tuple:
+    def do_your_tuple_thing(t):
         # correct, but bad style
         t += ('baaaz', )
         return t

 Strings
-:::::::
+-------

-You know the drill, strings are immutable.
+You know the drill, strings are immutable, too.

 .. code-block:: python

-    def do_your_str_thing(t: str) -> str:
+    def do_your_str_thing(t):
         return 'foo ' + t + ' bar'

-    def do_your_str_thing(t: str) -> str:
+    def do_your_str_thing(t):
         return ' '.join(('foo', t, 'bar', ))

-    def do_your_str_thing(t: str) -> str:
+    def do_your_str_thing(t):
         return 'foo {} bar'.format(t)

 You can, if you're using python 3.6+, use `f-strings <https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_,
@@ -75,15 +80,15 @@ but the core bonobo libraries won't use it to stay 3.5 compatible.


 Dicts
-:::::
+-----

 So, now it gets interesting. Dicts are mutable. It means that you can mess things up if you're not cautious.

-For example, doing the following may cause unexpected problems:
+For example, doing the following may (will) cause unexpected problems:

 .. code-block:: python

-    def mutate_my_dict_like_crazy(d: dict) -> dict:
+    def mutate_my_dict_like_crazy(d):
         # Bad! Don't do that!
         d.update({
             'foo': compute_something()
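Editor's illustration (not part of the commit): the in-place ``update()`` is a problem because, when the same dict is handed to two consumers, a mutation made by one is visible to the other. The sketch below uses plain Python rather than bonobo, and ``add_foo_in_place``, ``add_foo_pure`` and the sample rows are hypothetical names made up for the example.

```python
def add_foo_in_place(d):
    # Mutates the dict it received: every other holder of `d` sees the change.
    d.update({'foo': 1})
    return d


def add_foo_pure(d):
    # Builds a new dict and leaves the input untouched.
    return {**d, 'foo': 1}


shared = {'id': 42}
seen_by_other_branch = shared  # same object, as if sent to a second component

add_foo_in_place(shared)
leaked = 'foo' in seen_by_other_branch  # the mutation is visible elsewhere

clean = {'id': 42}
result = add_foo_pure(clean)
untouched = 'foo' not in clean  # the input was not modified
```

With the in-place version, whichever consumer runs second sees a different message depending on timing; the pure version removes that dependency.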
@@ -112,7 +117,7 @@ Now let's see how to do it correctly:

 .. code-block:: python

-    def new_dicts_like_crazy(d: dict) -> dict:
+    def new_dicts_like_crazy(d):
         # Creating a new dict is correct.
         return {
             **d,
@@ -120,7 +125,7 @@ Now let's see how to do it correctly:
             'bar': compute_anotherthing(),
         }

-    def new_dict_and_yield() -> dict:
+    def new_dict_and_yield():
         d = {}
         for i in range(100):
             # Different dict each time.
@@ -133,8 +138,8 @@ I bet you think «Yeah, but if I create like millions of dicts ...».
 Let's say we chose the opposite way and copied the dict outside the transformation (in fact, `it's what we did in bonobo's
 ancestor <https://github.com/rdcli/rdc.etl/blob/dev/rdc/etl/io/__init__.py#L187>`_). This means you will also create the
 same number of dicts, the difference is that you won't even notice it. Also, it means that if you want to yield the same
-dict 1 million times , going "pure" makes it efficient (you'll just yield the same object 1 million times) while going "copy
-crazy" will create 1 million objects.
+dict 1 million times, going "pure" makes it efficient (you'll just yield the same object 1 million times) while going
+"copy crazy" would create 1 million identical objects.

 Using dicts like this will create a lot of dicts, but also free them as soon as all the future components that take this dict
 as input are done. Also, one important thing to note is that most primitive data structures in python are immutable, so creating
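Editor's illustration (not part of the commit): the "pure" versus "copy crazy" trade-off described in this hunk can be checked in plain Python by counting distinct objects. This is only a sketch under the assumption that a copying framework would call ``copy.copy()`` once per yielded message; ``pure_repeat`` and ``copying_repeat`` are hypothetical names, and 1000 repetitions stand in for the million mentioned in the text.

```python
import copy


def pure_repeat(row, times):
    # "Pure" style: the very same object is yielded every time.
    for _ in range(times):
        yield row


def copying_repeat(row, times):
    # "Copy crazy" style: one shallow copy per yielded message.
    for _ in range(times):
        yield copy.copy(row)


row = {'id': 1}
pure = list(pure_repeat(row, 1000))
copied = list(copying_repeat(row, 1000))

# Count distinct objects while all of them are still alive in the lists.
distinct_pure = len({id(r) for r in pure})
distinct_copied = len({id(r) for r in copied})
```

Here ``distinct_pure`` is 1 while ``distinct_copied`` equals the number of repetitions, which is the cost the paragraph above argues against paying by default.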