[doc] Updating guides in documentation

This commit is contained in:
Romain Dorgueil
2017-10-08 13:13:20 +02:00
parent 736f73d5ce
commit 7fa9a2be5b
12 changed files with 557 additions and 111 deletions

View File

@ -1,8 +1,90 @@
Transformations
===============
Here is some guidelines on how to write transformations, to avoid the convention-jungle that could happen without
a few rules.
Transformations are the smallest building blocks in Bonobo ETL.
They are written using standard python callables (or iterables, if you're writing transformations that have no input,
a.k.a extractors).
Definitions
:::::::::::
Transformation
The base building block of Bonobo, anything you would insert in a graph as a node. Mostly, a callable or an iterable.
Extractor
Special case transformation that use no input. It will be only called once, and its purpose is to generate data,
either by itself or by requesting it from an external service.
Loader
Special case transformation that feed an external service with data. For convenience, it can also yield the data but
a "pure" loader would have no output (although yielding things should have no bad side effect).
Callable
Anything one can call, in python. Can be a function, a python builtin, or anything that implements `__call__`
Iterable
Something we can iterate on, in python, so basically anything you'd be able to use in a `for` loop.
Function based transformations
::::::::::::::::::::::::::::::
The most basic transformations are function-based. Which means that you define a function, and it will be used directly
in a graph.
.. code-block:: python
def get_representation(row):
return repr(row)
graph = bonobo.Graph(
[...],
get_representation,
[...],
)
It does not allow any configuration, but if it's an option, prefer it as it's simpler to write.
Class based transformations
:::::::::::::::::::::::::::
For less basic use cases, you'll want to use classes to define some of your transformations. It's also a better choice
to build reusable blocks, as you'll be able to create parametrizable transformations that the end user will be able to
configure at the last minute.
Configurable
------------
.. autoclass:: bonobo.config.Configurable
Options
-------
.. autoclass:: bonobo.config.Option
Services
--------
.. autoclass:: bonobo.config.Service
Methods
-------
.. autoclass:: bonobo.config.Method
ContextProcessors
-----------------
.. autoclass:: bonobo.config.ContextProcessor
Naming conventions
@ -44,50 +126,35 @@ can be used as a graph node, then use camelcase names:
upper = Apply(str.upper)
Function based transformations
::::::::::::::::::::::::::::::
Testing
:::::::
As Bonobo use plain old python objects as transformations, it's very easy to unit test your transformations using your
favourite testing framework. We're using pytest internally for Bonobo, but it's up to you to use the one you prefer.
If you want to test a transformation with the surrounding context provided (for example, service instances injected, and
context processors applied), you can use :class:`bonobo.execution.NodeExecutionContext` as a context processor and have
bonobo send the data to your transformation.
The most basic transformations are function-based. Which means that you define a function, and it will be used directly
in a graph.
.. code-block:: python
def get_representation(row):
return repr(row)
from bonobo.constants import BEGIN, END
from bonobo.execution import NodeExecutionContext
graph = bonobo.Graph(
[...],
get_representation,
)
with NodeExecutionContext(
JsonWriter(filename), services={'fs': ...}
) as context:
# Write a list of rows, including BEGIN/END control messages.
context.write(
BEGIN,
Bag({'foo': 'bar'}),
Bag({'foo': 'baz'}),
END
)
It does not allow any configuration, but if it's an option, prefer it as it's simpler to write.
Class based transformations
:::::::::::::::::::::::::::
A lot of logic is a bit more complex, and you'll want to use classes to define some of your transformations.
The :class:`bonobo.config.Configurable` class gives you a few toys to write configurable transformations.
Options
-------
.. autoclass:: bonobo.config.Option
Services
--------
.. autoclass:: bonobo.config.Service
Methods
-------
.. autoclass:: bonobo.config.Method
ContextProcessors
-----------------
.. autoclass:: bonobo.config.ContextProcessor
# Out of the bonobo main loop, we need to call `step` explicitely.
context.step()
context.step()