[doc] cleanup & refactorings

Romain Dorgueil
2017-10-03 08:37:46 +02:00
parent 2ab48080e6
commit d936e164ac
10 changed files with 97 additions and 45 deletions

docs/extension/index.rst

@@ -0,0 +1,14 @@
+Extensions
+==========
+
+Extensions contain everything needed to work with a few popular third-party tools.
+Most of them are available as optional extra dependencies, and the maturity stage of each may vary.
+
+.. toctree::
+    :maxdepth: 2
+
+    docker
+    jupyter
+    selenium
+    sqlalchemy

docs/guide/graphs.rst

@@ -0,0 +1,11 @@
+Graphs
+======
+
+Writing graphs
+::::::::::::::
+
+Debugging graphs
+::::::::::::::::
+
+Executing graphs
+::::::::::::::::

@@ -6,18 +6,9 @@ Here are a few guides and best practices to work with bonobo.
 .. toctree::
     :maxdepth: 2
 
-    purity
+    graphs
     transformations
     services
     environment
+    purity
-
-There is a also few extensions that ease the use of the library with third party tools. Each integration is
-available as an optional extra dependency, and the maturity stage of each extension vary.
-
-.. toctree::
-    :maxdepth: 2
-
-    ext/docker
-    ext/jupyter
-    ext/selenium
-    ext/sqlalchemy

@@ -1,34 +1,39 @@
-Pure transformations
-====================
+Best Practices
+==============
 
 The nature of components, and how the data flow from one to another, can be a bit tricky.
 Hopefully, they should be very easy to write with a few hints.
 
-The major problem we have is that one message (underlying implementation: :class:`bonobo.structs.bags.Bag`) can go
-through more than one component, and at the same time. If you wanna be safe, you tend to :func:`copy.copy()` everything
-between two calls to two different components, but that's very expensive.
+Pure transformations
+::::::::::::::::::::
 
-Instead, we chose the opposite: copies are never made, and you should not modify in place the inputs of your
-component before yielding them, and that mostly means that you want to recreate dicts and lists before yielding (or
-returning) them. Numeric values, strings and tuples being immutable in python, modifying a variable of one of those
-type will already return a different instance.
+One “message” (a.k.a. :class:`bonobo.Bag` instance) may go through more than one component, and at the same time.
+To ensure your code is safe, one could :func:`copy.copy()` each message on each transformation input, but that's quite
+expensive, especially because it may not be needed.
+
+Instead, we chose the opposite: copies are never made, so you should not modify the inputs of your component in
+place before yielding them. This mostly means that you want to recreate dicts and lists before yielding them if
+their values changed.
+
+Numeric values, strings and tuples are immutable in python, so "modifying" a variable of one of those types
+actually binds it to a different instance.
 
 Examples will be shown with `return` statements; of course, you can do the same with `yield` statements in generators.
 
 Numbers
-:::::::
+-------
 
 In python, numbers are immutable, so you can't go wrong with them. All of the following are correct.
 
 .. code-block:: python
 
-    def do_your_number_thing(n: int) -> int:
+    def do_your_number_thing(n):
         return n
 
-    def do_your_number_thing(n: int) -> int:
+    def do_your_number_thing(n):
         return n + 1
 
-    def do_your_number_thing(n: int) -> int:
+    def do_your_number_thing(n):
         # correct, but bad style
         n += 1
         return n
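The claim that even the `n += 1` variant is safe can be checked with a minimal, plain-Python sketch. No bonobo is involved, and the input value `41` is an arbitrary assumption. On an int, `+=` binds the local name to a new object, so the caller's value is left alone.

```python
def do_your_number_thing(n):
    # correct, but bad style: `+=` on an int does not modify the object,
    # it binds the local name `n` to a brand new int.
    n += 1
    return n


original = 41
result = do_your_number_thing(original)

print(original)            # 41: the caller's value is unchanged
print(result)              # 42
print(result is original)  # False: a different object was returned
```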
@@ -37,37 +42,37 @@ The same is true with other numeric types, so don't be shy.
 
 
 Tuples
-::::::
+------
 
 Tuples are immutable, so you risk nothing.
 
 .. code-block:: python
 
-    def do_your_tuple_thing(t: tuple) -> tuple:
+    def do_your_tuple_thing(t):
         return ('foo', ) + t
 
-    def do_your_tuple_thing(t: tuple) -> tuple:
+    def do_your_tuple_thing(t):
         return t + ('bar', )
 
-    def do_your_tuple_thing(t: tuple) -> tuple:
+    def do_your_tuple_thing(t):
         # correct, but bad style
         t += ('baaaz', )
         return t
 
 Strings
-:::::::
+-------
 
-You know the drill, strings are immutable.
+You know the drill, strings are immutable, too.
 
 .. code-block:: python
 
-    def do_your_str_thing(t: str) -> str:
+    def do_your_str_thing(t):
         return 'foo ' + t + ' bar'
 
-    def do_your_str_thing(t: str) -> str:
+    def do_your_str_thing(t):
         return ' '.join(('foo', t, 'bar', ))
 
-    def do_your_str_thing(t: str) -> str:
+    def do_your_str_thing(t):
         return 'foo {} bar'.format(t)
 
 You can, if you're using python 3.6+, use `f-strings <https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_,
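The same reasoning for tuples and strings can be checked with a minimal, plain-Python sketch. The `do_your_list_thing` function is hypothetical and is added only for contrast. On a mutable list, `+=` extends the caller's object in place, which is exactly the kind of mutation this guide warns about.

```python
def do_your_tuple_thing(t):
    # correct, but bad style: builds a new tuple and rebinds `t`
    t += ('baaaz', )
    return t


def do_your_str_thing(s):
    # same story for strings: a new str object is created
    s += ' bar'
    return s


def do_your_list_thing(items):
    # hypothetical, NOT safe: for a list, `+=` extends the caller's object in place
    items += ['baaaz']
    return items


message_tuple = ('foo', )
message_str = 'foo'
message_list = ['foo']

new_tuple = do_your_tuple_thing(message_tuple)
new_str = do_your_str_thing(message_str)
new_list = do_your_list_thing(message_list)

print(message_tuple, new_tuple)  # ('foo',) ('foo', 'baaaz')
print(message_str, new_str)      # foo foo bar
print(message_list, new_list)    # ['foo', 'baaaz'] ['foo', 'baaaz']
```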
@@ -75,15 +80,15 @@ but the core bonobo libraries won't use it to stay 3.5 compatible.
 
 
 Dicts
-:::::
+-----
 
 So, now it gets interesting. Dicts are mutable, which means that you can mess things up if you're not cautious.
 
-For example, doing the following may cause unexpected problems:
+For example, doing the following will cause unexpected problems:
 
 .. code-block:: python
 
-    def mutate_my_dict_like_crazy(d: dict) -> dict:
+    def mutate_my_dict_like_crazy(d):
         # Bad! Don't do that!
         d.update({
             'foo': compute_something()
@@ -112,7 +117,7 @@ Now let's see how to do it correctly:
 
 .. code-block:: python
 
-    def new_dicts_like_crazy(d: dict) -> dict:
+    def new_dicts_like_crazy(d):
         # Creating a new dict is correct.
         return {
             **d,
@@ -120,7 +125,7 @@ Now let's see how to do it correctly:
             'bar': compute_anotherthing(),
         }
 
-    def new_dict_and_yield() -> dict:
+    def new_dict_and_yield():
         d = {}
         for i in range(100):
             # Different dict each time.
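The dict hazard can be reproduced without bonobo by handing the same dict object to two plain functions. This roughly simulates the no-copy behavior described in this guide. It is a minimal sketch under that assumption:

- `compute_something()` is a hypothetical stand-in returning `42`.
- `read_foo()` is a hypothetical second component.
- `new_dicts_like_crazy()` is simplified to set only the `foo` key.

```python
def compute_something():
    # Hypothetical stand-in for the guide's compute_something().
    return 42


def mutate_my_dict_like_crazy(d):
    # Bad! Don't do that! This changes the shared input object.
    d.update({'foo': compute_something()})
    return d


def new_dicts_like_crazy(d):
    # Creating a new dict is correct: the input stays untouched.
    # (Simplified: the guide's version also sets a 'bar' key.)
    return {**d, 'foo': compute_something()}


def read_foo(d):
    # Hypothetical second component reading the same message.
    return d['foo']


# Pure: both "components" receive the very same dict object, no copy is made.
pure_input = {'foo': 'original'}
pure_output = new_dicts_like_crazy(pure_input)
seen_after_pure = read_foo(pure_input)
print(seen_after_pure)  # original

# Impure: the second component now sees the first one's modification.
impure_input = {'foo': 'original'}
impure_output = mutate_my_dict_like_crazy(impure_input)
seen_after_impure = read_foo(impure_input)
print(seen_after_impure)  # 42
```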
@@ -133,8 +138,8 @@ I bet you think «Yeah, but if I create like millions of dicts ...».
 Let's say we chose the opposite way and copied the dict outside the transformation (in fact, `it's what we did in bonobo's
 ancestor <https://github.com/rdcli/rdc.etl/blob/dev/rdc/etl/io/__init__.py#L187>`_). This means you will also create the
 same number of dicts; the difference is that you won't even notice it. Also, it means that if you want to yield the same
-dict 1 million times , going "pure" makes it efficient (you'll just yield the same object 1 million times) while going "copy
-crazy" will create 1 million objects.
+dict 1 million times, going "pure" makes it efficient (you'll just yield the same object 1 million times) while going
+"copy crazy" would create 1 million identical objects.
 
 Using dicts like this will create a lot of dicts, but also free them as soon as all the future components that take this dict
 as input are done. Also, one important thing to note is that most primitive data structures in python are immutable, so creating

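The efficiency argument in the hunk above can be illustrated with a scaled-down, plain-Python sketch. It uses 1,000 yields instead of 1 million, and both generator names are hypothetical. The "pure" generator hands out one object repeatedly, while the "copy crazy" one allocates a new dict on every yield. All items are kept alive in lists so that `id()` values cannot be reused and the counts stay meaningful.

```python
import copy


def pure_generator(d, times):
    # "pure": yield the very same dict object again and again
    for _ in range(times):
        yield d


def copy_crazy_generator(d, times):
    # "copy crazy": make a defensive copy before every yield
    for _ in range(times):
        yield copy.copy(d)


message = {'foo': 'bar'}

pure_items = list(pure_generator(message, 1000))
copied_items = list(copy_crazy_generator(message, 1000))

# Every item is still referenced by a list, so no id() value can be recycled.
print(len({id(item) for item in pure_items}))    # 1
print(len({id(item) for item in copied_items}))  # 1000
```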
@@ -7,6 +7,7 @@ Bonobo
     install
     tutorial/index
     guide/index
+    extension/index
     reference/index
     faq
     contribute/index

@@ -1,6 +1,7 @@
 Installation
 ============
 
+
 Create an ETL project
 :::::::::::::::::::::
 
@@ -15,6 +16,7 @@ Creating a project and starting to write code should take less than a minute:
 Once you have bootstrapped a project, you can start editing the default example transformation by editing
 `my-etl-project/main.py`. Now, you can head to :doc:`tutorial/index`.
 
+
 Other installation options
 ::::::::::::::::::::::::::
 
@@ -27,6 +29,7 @@ You can install it directly from the `Python Package Index <https://pypi.python.
 
     $ pip install bonobo
 
+
 Install from source
 -------------------
 
@@ -39,6 +42,13 @@ below).
 
     $ pip install git+https://github.com/python-bonobo/bonobo.git@develop#egg=bonobo
 
+.. note::
+
+    Here, we use the `develop` branch, which is the upcoming, unreleased minor version. It's the way to "live on the
+    edge", either to test your codebase against a future release, or to test unreleased features. You can use this
+    technique to install any branch you want, and even a branch in your own repository.
+
+
 Editable install
 ----------------
 
@@ -48,9 +58,11 @@ of your python interpreter.
 
 .. code-block:: shell-session
 
-    $ pip install --editable git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo
+    $ pip install --editable git+https://github.com/python-bonobo/bonobo.git@develop#egg=bonobo
 
-.. note:: You can also use the `-e` flag instead of the long version.
+.. note:: You can also use `-e`, the shorthand version of `--editable`.
+
+.. note:: Once again, we use `develop` here. New features should go to `develop`, while bugfixes can go to `master`.
 
 If you can't find the "source" directory, try running this:
 
@@ -58,6 +70,9 @@ If you can't find the "source" directory, try running this:
 
     $ python -c "import bonobo; print(bonobo.__path__)"
 
+Local clone
+-----------
+
 Another option is to have a "local" editable install, which means you create the clone by yourself and make an editable install
 from the local clone.
 
@@ -78,10 +93,25 @@ I usually name the git remote for the main bonobo repository "upstream", and my
 Of course, replace my github username with the one you used to fork bonobo. You should be good to go!
 
 
-Windows support
-:::::::::::::::
+Supported platforms
+:::::::::::::::::::
 
-There are minor issues on the windows platform, mostly due to the fact bonobo was not developed by experienced windows
+Linux, OSX and other Unixes
+---------------------------
+
+The bonobo test suite runs continuously on Linux, and core developers use both OSX and Linux machines. Also, there are
+jobs running on production Linux machines every day, so support for those platforms should be excellent.
+
+If you're using some esoteric UNIX machine, there may be surprises (although we're not aware of any yet). We do not
+officially support those platforms, but if you can fix the problems on those systems, we'll be glad to integrate
+your patches (as long as they are tested on both the existing Linux environments and your own system).
+
+Windows
+-------
+
+Windows support is decent, as a few contributors helped us test and fix the quirks.
+
+There may still be minor issues on the Windows platform, mostly due to the fact that bonobo was not developed by Windows
 users.
 
 We're trying to look into that, but the energy available to provide serious support on Windows is very limited.