Documenting transformations and configurables.
This commit is contained in:
32
docs/_templates/index.html
vendored
32
docs/_templates/index.html
vendored
@ -3,10 +3,7 @@
|
||||
{% block body %}
|
||||
|
||||
<div style="border: 2px solid red; font-weight: bold; margin: 1em; padding: 1em">
|
||||
Bonobo is currently <strong>ALPHA</strong> software. That means that the doc is not finished, and that
|
||||
some APIs will change.<br>
|
||||
There are a lot of missing sections, including comparison with other tools. But if you're looking for a
|
||||
replacement for X, unless X is an ETL, bonobo is probably not what you want.
|
||||
Bonobo is <strong>ALPHA</strong> software. Some APIs will change.
|
||||
</div>
|
||||
|
||||
<h1 style="text-align: center">
|
||||
@ -16,26 +13,12 @@
|
||||
|
||||
<p>
|
||||
{% trans %}
|
||||
<strong>Bonobo</strong> is a line-by-line data-processing toolkit for python 3.5+ emphasizing simple and
|
||||
atomic data transformations defined using a directed graph of plain old python callables (functions and
|
||||
generators).
|
||||
<strong>Bonobo</strong> is a line-by-line data-processing toolkit for python 3.5+ (extract-transform-load
|
||||
framework) emphasizing simple and atomic data transformations defined using a directed graph of plain old
|
||||
python objects (functions, iterables, generators, ...).
|
||||
{% endtrans %}
|
||||
</p>
|
||||
|
||||
<p>
|
||||
{% trans %}
|
||||
<strong>Bonobo</strong> is a extract-transform-load framework that uses python code to define transformations.
|
||||
{% endtrans %}
|
||||
</p>
|
||||
|
||||
<p>
|
||||
{% trans %}
|
||||
<strong>Bonobo</strong> is your own data-monkey army. Tedious and repetitive data-processing incoming? Give
|
||||
it a try!
|
||||
{% endtrans %}
|
||||
</p>
|
||||
|
||||
|
||||
<h2 style="margin-bottom: 0">{% trans %}Documentation{% endtrans %}</h2>
|
||||
|
||||
<table class="contentstable">
|
||||
@ -95,8 +78,9 @@
|
||||
</li>
|
||||
<li>
|
||||
{% trans %}
|
||||
<b>Dependency injection:</b> Abstract the transformation dependencies to easily switch data sources and
|
||||
used libraries, allowing to easily test your transformations.
|
||||
<b>Service injection:</b> Abstract the transformation dependencies to easily switch data sources and
|
||||
dependant libraries. You'll be able to specify the concrete implementations or configurations at
|
||||
runtime, for example to switch a database connection string or an API endpoint.
|
||||
{% endtrans %}
|
||||
</li>
|
||||
<li>
|
||||
@ -107,7 +91,7 @@
|
||||
</li>
|
||||
<li>
|
||||
{% trans %}
|
||||
Work in progress: read the <a href="https://www.bonobo-project.org/roadmap">roadmap</a>.
|
||||
Bonobo is young, and the todo-list is huge. Read the <a href="https://www.bonobo-project.org/roadmap">roadmap</a>.
|
||||
{% endtrans %}
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
@ -10,6 +10,7 @@ There are a few things that you should know while writing transformations graphs
|
||||
:maxdepth: 2
|
||||
|
||||
purity
|
||||
transformations
|
||||
services
|
||||
|
||||
Third party integrations
|
||||
|
||||
@ -1,20 +1,18 @@
|
||||
Services and dependencies (draft implementation)
|
||||
================================================
|
||||
Services and dependencies
|
||||
=========================
|
||||
|
||||
:Status: Draft implementation
|
||||
:Stability: Alpha
|
||||
:Last-Modified: 28 apr 2017
|
||||
:Last-Modified: 20 may 2017
|
||||
|
||||
Most probably, you'll want to use external systems within your transformations. Those systems may include databases,
|
||||
apis (using http, for example), filesystems, etc.
|
||||
You'll probably want to use external systems within your transformations. Those systems may include databases, apis
|
||||
(using http, for example), filesystems, etc.
|
||||
|
||||
You can start by hardcoding those services. That does the job, at first.
|
||||
|
||||
If you're going a little further than that, you'll feel limited, for a few reasons:
|
||||
|
||||
* Hardcoded and tightly linked dependencies make your transformations hard to test, and hard to reuse.
|
||||
* Processing data on your laptop is great, but being able to do it on different systems (or stages), in different
|
||||
environments, is more realistic? You probably want to contigure a different database on a staging environment,
|
||||
* Processing data on your laptop is great, but being able to do it on different target systems (or stages), in different
|
||||
environments, is more realistic. You'll want to contigure a different database on a staging environment,
|
||||
preprod environment or production system. Maybe you have silimar systems for different clients and want to select
|
||||
the system at runtime. Etc.
|
||||
|
||||
@ -52,10 +50,11 @@ injected to your calls under the parameter name "database".
|
||||
Function-based transformations
|
||||
------------------------------
|
||||
|
||||
No implementation yet, but expect something similar to CBT API, maybe using a `@Service(...)` decorator.
|
||||
No implementation yet, but expect something similar to CBT API, maybe using a `@Service(...)` decorator. See
|
||||
`issue #70 <https://github.com/python-bonobo/bonobo/issues/70>`_.
|
||||
|
||||
Execution
|
||||
---------
|
||||
Provide implementation at run time
|
||||
----------------------------------
|
||||
|
||||
Let's see how to execute it:
|
||||
|
||||
|
||||
89
docs/guide/transformations.rst
Normal file
89
docs/guide/transformations.rst
Normal file
@ -0,0 +1,89 @@
|
||||
Transformations
|
||||
===============
|
||||
|
||||
Here is some guidelines on how to write transformations, to avoid the convention-jungle that could happen without
|
||||
a few rules.
|
||||
|
||||
|
||||
Naming conventions
|
||||
::::::::::::::::::
|
||||
|
||||
The naming convention used is the following.
|
||||
|
||||
If you're naming something which is an actual transformation, that can be used directly as a graph node, then use
|
||||
underscores and lowercase names:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# instance of a class based transformation
|
||||
filter = Filter(...)
|
||||
|
||||
# function based transformation
|
||||
def uppercase(s: str) -> str:
|
||||
return s.upper()
|
||||
|
||||
If you're naming something which is configurable, that will need to be instanciated or called to obtain something that
|
||||
can be used as a graph node, then use camelcase names:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# configurable
|
||||
class ChangeCase(Configurable):
|
||||
modifier = Option(default='upper')
|
||||
def call(self, s: str) -> str:
|
||||
return getattr(s, self.modifier)()
|
||||
|
||||
# transformation factory
|
||||
def Apply(method):
|
||||
@functools.wraps(method)
|
||||
def apply(s: str) -> str:
|
||||
return method(s)
|
||||
return apply
|
||||
|
||||
# result is a graph node candidate
|
||||
upper = Apply(str.upper)
|
||||
|
||||
|
||||
Function based transformations
|
||||
::::::::::::::::::::::::::::::
|
||||
|
||||
The most basic transformations are function-based. Which means that you define a function, and it will be used directly
|
||||
in a graph.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def get_representation(row):
|
||||
return repr(row)
|
||||
|
||||
graph = bonobo.Graph(
|
||||
[...],
|
||||
get_representation,
|
||||
)
|
||||
|
||||
|
||||
It does not allow any configuration, but if it's an option, prefer it as it's simpler to write.
|
||||
|
||||
|
||||
Class based transformations
|
||||
:::::::::::::::::::::::::::
|
||||
|
||||
A lot of logic is a bit more complex, and you'll want to use classes to define some of your transformations.
|
||||
|
||||
The :class:`bonobo.config.Configurable` class gives you a few toys to write configurable transformations.
|
||||
|
||||
Options
|
||||
-------
|
||||
|
||||
.. autoclass:: bonobo.config.Option
|
||||
|
||||
Services
|
||||
--------
|
||||
|
||||
.. autoclass:: bonobo.config.Service
|
||||
|
||||
Method
|
||||
------
|
||||
|
||||
.. autoclass:: bonobo.config.Method
|
||||
|
||||
|
||||
@ -1,23 +1,29 @@
|
||||
First steps
|
||||
===========
|
||||
|
||||
We tried hard to make **Bonobo** simple. We use simple python, and we believe it should be simple to learn.
|
||||
Bonobo uses simple python and should be quick and easy to learn.
|
||||
|
||||
What is Bonobo?
|
||||
:::::::::::::::
|
||||
|
||||
Bonobo is an ETL (Extract-Transform-Load) framework for python 3.5. The goal is to define data-transformations, with
|
||||
python code in charge of handling similar shaped independant lines of data.
|
||||
|
||||
Bonobo *is not* a statistical or data-science tool. If you're looking for a data-analysis tool in python, use Pandas.
|
||||
|
||||
Bonobo is a lean manufacturing assembly line for data that let you focus on the actual work instead of the plumbery.
|
||||
|
||||
|
||||
Tutorial
|
||||
::::::::
|
||||
|
||||
We strongly advice that even if you're an advanced python developper, you go through the whole tutorial for two
|
||||
reasons: that should be sufficient to do anything possible with **Bonobo** and that's a good moment to learn the few
|
||||
concepts you'll see everywhere in the software.
|
||||
|
||||
If you're not familiar with python, you should first read :doc:`python`.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
tut01
|
||||
tut02
|
||||
|
||||
|
||||
What's next?
|
||||
::::::::::::
|
||||
|
||||
@ -39,3 +45,4 @@ Read about integrating external tools with bonobo
|
||||
* :doc:`../guide/ext/jupyter`: run transformations within jupyter notebooks.
|
||||
* :doc:`../guide/ext/selenium`: run
|
||||
* :doc:`../guide/ext/sqlalchemy`: everything you need to interract with SQL databases.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user