[doc] cleanup & refactorings
14
docs/extension/index.rst
Normal file
@ -0,0 +1,14 @@
|
|||||||
|
Extensions
|
||||||
|
==========
|
||||||
|
|
||||||
|
Extensions contain everything needed to work with a few popular third-party tools.
|
||||||
|
|
||||||
|
Most of them are available as optional extra dependencies, and the maturity stage of each may vary.
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
|
||||||
|
docker
|
||||||
|
jupyter
|
||||||
|
selenium
|
||||||
|
sqlalchemy
|
||||||
11
docs/guide/graphs.rst
Normal file
@ -0,0 +1,11 @@
|
|||||||
|
Graphs
|
||||||
|
======
|
||||||
|
|
||||||
|
Writing graphs
|
||||||
|
::::::::::::::
|
||||||
|
|
||||||
|
Debugging graphs
|
||||||
|
::::::::::::::::
|
||||||
|
|
||||||
|
Executing graphs
|
||||||
|
::::::::::::::::
|
||||||
@ -6,18 +6,9 @@ Here are a few guides and best practices to work with bonobo.
|
|||||||
.. toctree::
|
.. toctree::
|
||||||
:maxdepth: 2
|
:maxdepth: 2
|
||||||
|
|
||||||
purity
|
graphs
|
||||||
transformations
|
transformations
|
||||||
services
|
services
|
||||||
environment
|
environment
|
||||||
|
purity
|
||||||
|
|
||||||
There are also a few extensions that ease the use of the library with third party tools. Each integration is
|
|
||||||
available as an optional extra dependency, and the maturity stage of each extension varies.
|
|
||||||
|
|
||||||
.. toctree::
|
|
||||||
:maxdepth: 2
|
|
||||||
|
|
||||||
ext/docker
|
|
||||||
ext/jupyter
|
|
||||||
ext/selenium
|
|
||||||
ext/sqlalchemy
|
|
||||||
|
|||||||
@ -1,34 +1,39 @@
|
|||||||
Pure transformations
|
Best Practices
|
||||||
====================
|
==============
|
||||||
|
|
||||||
The nature of components, and how the data flow from one to another, can be a bit tricky.
|
The nature of components, and how the data flow from one to another, can be a bit tricky.
|
||||||
Hopefully, they should be very easy to write with a few hints.
|
Hopefully, they should be very easy to write with a few hints.
|
||||||
|
|
||||||
The major problem we have is that one message (underlying implementation: :class:`bonobo.structs.bags.Bag`) can go
|
Pure transformations
|
||||||
through more than one component, and at the same time. If you wanna be safe, you tend to :func:`copy.copy()` everything
|
::::::::::::::::::::
|
||||||
between two calls to two different components, but that's very expensive.
|
|
||||||
|
|
||||||
Instead, we chose the opposite: copies are never made, and you should not modify in place the inputs of your
|
One “message” (a.k.a. a :class:`bonobo.Bag` instance) may go through more than one component, possibly at the same time.
|
||||||
component before yielding them, and that mostly means that you want to recreate dicts and lists before yielding (or
|
To ensure your code is safe, one could :func:`copy.copy()` each message on each transformation input, but that's quite
|
||||||
returning) them. Numeric values, strings and tuples being immutable in python, modifying a variable of one of those
|
expensive, especially because it may not be needed.
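The hazard can be sketched without bonobo at all. In the minimal example below, plain dicts stand in for messages, and ``add_total`` and ``keys_seen`` are hypothetical components invented for illustration, not part of the library.

```python
import copy


def add_total(row):
    # Hypothetical component that (wrongly) mutates its input in place.
    row["total"] = row["price"] * row["qty"]
    return row


def keys_seen(row):
    # Hypothetical sibling component that only reads its input.
    return sorted(row)


# Both components receive the very same object, so the mutation leaks.
message = {"price": 2, "qty": 3}
add_total(message)
leaked = keys_seen(message)

# Copying each input avoids the leak, at the cost of one extra object per call.
message = {"price": 2, "qty": 3}
add_total(copy.copy(message))
isolated = keys_seen(message)

print(leaked)    # ['price', 'qty', 'total']
print(isolated)  # ['price', 'qty']
```

The copy protects the sibling component, but it is paid on every call, even when the component never changes anything.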
|
||||||
type will already return a different instance.
|
|
||||||
|
Instead, we chose the opposite: copies are never made, so you should not modify in place the inputs of your
|
||||||
|
component before yielding them, which mostly means that you want to recreate dicts and lists before yielding if
|
||||||
|
their values changed.
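For lists, the rule might look like the sketch below. The function names are hypothetical and only plain Python is used; treat it as an illustration of the idea, not as bonobo API.

```python
def append_in_place(items):
    # Bad: mutates the list that other components may also receive.
    items.append("new")
    return items


def append_pure(items):
    # Good: builds a new list, the input is left untouched.
    return [*items, "new"]


def passthrough(items):
    # Also fine: nothing changed, so no new object is needed.
    return items


shared = ["a", "b"]
append_in_place(shared)
print(shared)  # ['a', 'b', 'new']

source = ["a", "b"]
result = append_pure(source)
print(source)  # ['a', 'b']
print(result)  # ['a', 'b', 'new']
print(passthrough(source) is source)  # True
```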
|
||||||
|
|
||||||
|
Numeric values, strings and tuples being immutable in Python, modifying a variable of one of those types will already
|
||||||
|
return a different instance.
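A quick way to convince yourself is to compare what the caller sees after an augmented assignment. The helper names below are made up for this sketch.

```python
def bump(n):
    n += 1  # rebinds the local name to a new int
    return n


def extend_tuple(t):
    t += ("bar",)  # rebinds the local name to a new tuple
    return t


def extend_list(values):
    values += ["bar"]  # lists are mutable: this changes the caller's object
    return values


number, pair, items = 1, ("foo",), ["foo"]
bump(number)
extend_tuple(pair)
extend_list(items)

print(number, pair, items)  # 1 ('foo',) ['foo', 'bar']
```

Only the list changed on the caller's side, which is why lists (and dicts) need the extra care described above.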
|
||||||
|
|
||||||
Examples will be shown with `return` statements; of course, you can do the same with `yield` statements in generators.
|
Examples will be shown with `return` statements; of course, you can do the same with `yield` statements in generators.
|
||||||
|
|
||||||
Numbers
|
Numbers
|
||||||
:::::::
|
-------
|
||||||
|
|
||||||
In Python, numbers are immutable, so you can't go wrong with numbers. All of the following are correct.
|
In Python, numbers are immutable, so you can't go wrong with numbers. All of the following are correct.
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
def do_your_number_thing(n: int) -> int:
|
def do_your_number_thing(n):
|
||||||
return n
|
return n
|
||||||
|
|
||||||
def do_your_number_thing(n: int) -> int:
|
def do_your_number_thing(n):
|
||||||
return n + 1
|
return n + 1
|
||||||
|
|
||||||
def do_your_number_thing(n: int) -> int:
|
def do_your_number_thing(n):
|
||||||
# correct, but bad style
|
# correct, but bad style
|
||||||
n += 1
|
n += 1
|
||||||
return n
|
return n
|
||||||
@ -37,37 +42,37 @@ The same is true with other numeric types, so don't be shy.
|
|||||||
|
|
||||||
|
|
||||||
Tuples
|
Tuples
|
||||||
::::::
|
------
|
||||||
|
|
||||||
Tuples are immutable, so you risk nothing.
|
Tuples are immutable, so you risk nothing.
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
def do_your_tuple_thing(t: tuple) -> tuple:
|
def do_your_tuple_thing(t):
|
||||||
return ('foo', ) + t
|
return ('foo', ) + t
|
||||||
|
|
||||||
def do_your_tuple_thing(t: tuple) -> tuple:
|
def do_your_tuple_thing(t):
|
||||||
return t + ('bar', )
|
return t + ('bar', )
|
||||||
|
|
||||||
def do_your_tuple_thing(t: tuple) -> tuple:
|
def do_your_tuple_thing(t):
|
||||||
# correct, but bad style
|
# correct, but bad style
|
||||||
t += ('baaaz', )
|
t += ('baaaz', )
|
||||||
return t
|
return t
|
||||||
|
|
||||||
Strings
|
Strings
|
||||||
:::::::
|
-------
|
||||||
|
|
||||||
You know the drill, strings are immutable.
|
You know the drill, strings are immutable, too.
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
def do_your_str_thing(t: str) -> str:
|
def do_your_str_thing(t):
|
||||||
return 'foo ' + t + ' bar'
|
return 'foo ' + t + ' bar'
|
||||||
|
|
||||||
def do_your_str_thing(t: str) -> str:
|
def do_your_str_thing(t):
|
||||||
return ' '.join(('foo', t, 'bar', ))
|
return ' '.join(('foo', t, 'bar', ))
|
||||||
|
|
||||||
def do_your_str_thing(t: str) -> str:
|
def do_your_str_thing(t):
|
||||||
return 'foo {} bar'.format(t)
|
return 'foo {} bar'.format(t)
|
||||||
|
|
||||||
You can, if you're using python 3.6+, use `f-strings <https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_,
|
You can, if you're using python 3.6+, use `f-strings <https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_,
|
||||||
@ -75,15 +80,15 @@ but the core bonobo libraries won't use it to stay 3.5 compatible.
|
|||||||
|
|
||||||
|
|
||||||
Dicts
|
Dicts
|
||||||
:::::
|
-----
|
||||||
|
|
||||||
So, now it gets interesting. Dicts are mutable. It means that you can mess things up if you're not cautious.
|
So, now it gets interesting. Dicts are mutable. It means that you can mess things up if you're not cautious.
|
||||||
|
|
||||||
For example, doing the following may cause unexpected problems:
|
For example, doing the following may (will) cause unexpected problems:
|
||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
def mutate_my_dict_like_crazy(d: dict) -> dict:
|
def mutate_my_dict_like_crazy(d):
|
||||||
# Bad! Don't do that!
|
# Bad! Don't do that!
|
||||||
d.update({
|
d.update({
|
||||||
'foo': compute_something()
|
'foo': compute_something()
|
||||||
@ -112,7 +117,7 @@ Now let's see how to do it correctly:
|
|||||||
|
|
||||||
.. code-block:: python
|
.. code-block:: python
|
||||||
|
|
||||||
def new_dicts_like_crazy(d: dict) -> dict:
|
def new_dicts_like_crazy(d):
|
||||||
# Creating a new dict is correct.
|
# Creating a new dict is correct.
|
||||||
return {
|
return {
|
||||||
**d,
|
**d,
|
||||||
@ -120,7 +125,7 @@ Now let's see how to do it correctly:
|
|||||||
'bar': compute_anotherthing(),
|
'bar': compute_anotherthing(),
|
||||||
}
|
}
|
||||||
|
|
||||||
def new_dict_and_yield() -> dict:
|
def new_dict_and_yield():
|
||||||
d = {}
|
d = {}
|
||||||
for i in range(100):
|
for i in range(100):
|
||||||
# Different dict each time.
|
# Different dict each time.
|
||||||
@ -133,8 +138,8 @@ I bet you think «Yeah, but if I create like millions of dicts ...».
|
|||||||
Let's say we chose the opposite way and copied the dict outside the transformation (in fact, `it's what we did in bonobo's
|
Let's say we chose the opposite way and copied the dict outside the transformation (in fact, `it's what we did in bonobo's
|
||||||
ancestor <https://github.com/rdcli/rdc.etl/blob/dev/rdc/etl/io/__init__.py#L187>`_). This means you will also create the
|
ancestor <https://github.com/rdcli/rdc.etl/blob/dev/rdc/etl/io/__init__.py#L187>`_). This means you will also create the
|
||||||
same number of dicts, the difference is that you won't even notice it. Also, it means that if you want to yield the same
|
same number of dicts, the difference is that you won't even notice it. Also, it means that if you want to yield the same
|
||||||
dict 1 million times , going "pure" makes it efficient (you'll just yield the same object 1 million times) while going "copy
|
dict 1 million times, going "pure" makes it efficient (you'll just yield the same object 1 million times) while going
|
||||||
crazy" will create 1 million objects.
|
"copy crazy" would create 1 million identical objects.
|
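The difference between the two approaches can be counted directly. This is a rough sketch under the assumption that a copying framework would call :func:`copy.copy()` once per yielded value; ``emit_pure`` and ``emit_with_copies`` are hypothetical names, and 1000 rows are used instead of a million to keep it fast.

```python
import copy


def emit_pure(row, times):
    # "Pure" style: the same object is yielded again and again.
    for _ in range(times):
        yield row


def emit_with_copies(row, times):
    # "Copy crazy" style: one new, identical object per yield.
    for _ in range(times):
        yield copy.copy(row)


row = {"foo": "bar"}
pure_ids = {id(r) for r in emit_pure(row, 1000)}

# Keep the copies alive in a list so that their ids are all distinct.
copies = list(emit_with_copies(row, 1000))
copy_ids = {id(r) for r in copies}

print(len(pure_ids))  # 1
print(len(copy_ids))  # 1000
```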
||||||
|
|
||||||
Using dicts like this will create a lot of dicts, but also free them as soon as all the future components that take this dict
|
Using dicts like this will create a lot of dicts, but also free them as soon as all the future components that take this dict
|
||||||
as input are done. Also, one important thing to note is that most primitive data structures in python are immutable, so creating
|
as input are done. Also, one important thing to note is that most primitive data structures in python are immutable, so creating
|
||||||
|
|||||||
@ -7,6 +7,7 @@ Bonobo
|
|||||||
install
|
install
|
||||||
tutorial/index
|
tutorial/index
|
||||||
guide/index
|
guide/index
|
||||||
|
extension/index
|
||||||
reference/index
|
reference/index
|
||||||
faq
|
faq
|
||||||
contribute/index
|
contribute/index
|
||||||
|
|||||||
@ -1,6 +1,7 @@
|
|||||||
Installation
|
Installation
|
||||||
============
|
============
|
||||||
|
|
||||||
|
|
||||||
Create an ETL project
|
Create an ETL project
|
||||||
:::::::::::::::::::::
|
:::::::::::::::::::::
|
||||||
|
|
||||||
@ -15,6 +16,7 @@ Creating a project and starting to write code should take less than a minute:
|
|||||||
Once you bootstrapped a project, you can start editing the default example transformation by editing
|
Once you bootstrapped a project, you can start editing the default example transformation by editing
|
||||||
`my-etl-project/main.py`. Now, you can head to :doc:`tutorial/index`.
|
`my-etl-project/main.py`. Now, you can head to :doc:`tutorial/index`.
|
||||||
|
|
||||||
|
|
||||||
Other installation options
|
Other installation options
|
||||||
::::::::::::::::::::::::::
|
::::::::::::::::::::::::::
|
||||||
|
|
||||||
@ -27,6 +29,7 @@ You can install it directly from the `Python Package Index <https://pypi.python.
|
|||||||
|
|
||||||
$ pip install bonobo
|
$ pip install bonobo
|
||||||
|
|
||||||
|
|
||||||
Install from source
|
Install from source
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
@ -39,6 +42,13 @@ below).
|
|||||||
|
|
||||||
$ pip install git+https://github.com/python-bonobo/bonobo.git@develop#egg=bonobo
|
$ pip install git+https://github.com/python-bonobo/bonobo.git@develop#egg=bonobo
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
Here, we use the `develop` branch, which is the incoming unreleased minor version. It's the way to "live on the
|
||||||
|
edge", either to test your codebase with a future release, or to test unreleased features. You can use this
|
||||||
|
technique to install any branch you want, and even a branch in your own repository.
|
||||||
|
|
||||||
|
|
||||||
Editable install
|
Editable install
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
@ -48,9 +58,11 @@ of your python interpreter.
|
|||||||
|
|
||||||
.. code-block:: shell-session
|
.. code-block:: shell-session
|
||||||
|
|
||||||
$ pip install --editable git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo
|
$ pip install --editable git+https://github.com/python-bonobo/bonobo.git@develop#egg=bonobo
|
||||||
|
|
||||||
.. note:: You can also use the `-e` flag instead of the long version.
|
.. note:: You can also use `-e`, the shorthand version of `--editable`.
|
||||||
|
|
||||||
|
.. note:: Once again, we use `develop` here. New features should go to `develop`, while bugfixes can go to `master`.
|
||||||
|
|
||||||
If you can't find the "source" directory, try running this:
|
If you can't find the "source" directory, try running this:
|
||||||
|
|
||||||
@ -58,6 +70,9 @@ If you can't find the "source" directory, try running this:
|
|||||||
|
|
||||||
$ python -c "import bonobo; print(bonobo.__path__)"
|
$ python -c "import bonobo; print(bonobo.__path__)"
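If you prefer to do the lookup from a script, something like the following may help. It is a small sketch using only the standard library; ``package_dirs`` is a hypothetical helper, and ``json`` is used as a stand-in so that the example runs even where bonobo is not installed.

```python
import importlib.util


def package_dirs(name):
    # Locate a top-level package without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The package is not installed in this interpreter.
        return []
    # Plain modules (as opposed to packages) have no search locations.
    return list(spec.submodule_search_locations or [])


# Replace "json" with "bonobo" once it is installed.
print(package_dirs("json"))
```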
|
||||||
|
|
||||||
|
Local clone
|
||||||
|
-----------
|
||||||
|
|
||||||
Another option is to have a "local" editable install, which means you create the clone by yourself and make an editable install
|
Another option is to have a "local" editable install, which means you create the clone by yourself and make an editable install
|
||||||
from the local clone.
|
from the local clone.
|
||||||
|
|
||||||
@ -78,10 +93,25 @@ I usually name the git remote for the main bonobo repository "upstream", and my
|
|||||||
|
|
||||||
Of course, replace my github username by the one you used to fork bonobo. You should be good to go!
|
Of course, replace my github username by the one you used to fork bonobo. You should be good to go!
|
||||||
|
|
||||||
Windows support
|
Supported platforms
|
||||||
:::::::::::::::
|
:::::::::::::::::::
|
||||||
|
|
||||||
There are minor issues on the windows platform, mostly due to the fact bonobo was not developed by experienced windows
|
Linux, OSX and other Unixes
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
The Bonobo test suite runs continuously on Linux, and core developers use both OSX and Linux machines. Also, there are jobs
|
||||||
|
running on production Linux machines every day, so the support for those platforms should be excellent.
|
||||||
|
|
||||||
|
If you're using some esoteric UNIX machine, there can be surprises (although we're not aware of any yet). We do not
|
||||||
|
officially support those platforms, but if you can actually fix the problems on those systems, we'll be glad to integrate
|
||||||
|
your patches (as long as they are tested, on both existing Linux environments and your own systems).
|
||||||
|
|
||||||
|
Windows
|
||||||
|
-------
|
||||||
|
|
||||||
|
Windows support is decent, as a few contributors helped us test and fix the quirks.
|
||||||
|
|
||||||
|
There may still be minor issues on the Windows platform, mostly because bonobo was not developed by Windows
|
||||||
users.
|
users.
|
||||||
|
|
||||||
We're trying to look into that, but the energy available to provide serious support on Windows is very limited.
|
We're trying to look into that, but the energy available to provide serious support on Windows is very limited.
|
||||||
|
|||||||