Merge remote-tracking branch 'upstream/develop'

This commit is contained in:
Romain Dorgueil
2018-01-08 07:41:16 +01:00
231 changed files with 7133 additions and 3877 deletions

View File

@ -1,3 +1,47 @@
svg {
border: 2px solid green
}
}
div.related {
width: 940px;
margin: 30px auto 0 auto;
}
@media screen and (max-width: 875px) {
div.related {
visibility: hidden;
display: none;
}
}
.brand {
font-family: 'Ubuntu', 'goudy old style', 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro', serif;
font-size: 0.9em;
}
div.sphinxsidebar h3 {
margin: 30px 0 10px 0;
}
div.admonition p.admonition-title {
font-family: 'Ubuntu', 'goudy old style', 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro', serif;
}
div.sphinxsidebarwrapper {
padding: 0;
}
div.note {
border: 0;
}
div.admonition {
padding: 20px;
}
.last {
margin-bottom: 0 !important;
}
pre {
padding: 6px 20px;
}

View File

@ -4,17 +4,8 @@
{%- block extrahead %}
{{ super() }}
<style>
div.related {
width: 940px;
margin: 30px auto 0 auto;
}
@media screen and (max-width: 875px) {
div.related {
visibility: hidden;
display: none;
}
}
</style>
<link href="https://fonts.googleapis.com/css?family=Ubuntu" rel="stylesheet">
{% endblock %}
{%- block footer %}

View File

@ -1,22 +1,21 @@
<h3>About Bonobo</h3>
<p>
Bonobo is a data-processing toolkit for python 3.5+, with emphasis on simplicity, atomicity and testability. Oh,
and performances, too!
Bonobo is a data-processing toolkit for python 3.5+, your Swiss Army knife for everyday data.
</p>
<h3>Other Formats</h3>
<p>
You can download the documentation in other formats as well:
Download the docs...
</p>
<ul>
<li><a href="http://readthedocs.org/projects/bonobo/downloads/pdf/master/">as PDF</a></li>
<li><a href="http://readthedocs.org/projects/bonobo/downloads/htmlzip/master/">as zipped HTML</a></li>
<li><a href="http://readthedocs.org/projects/bonobo/downloads/epub/master/">as EPUB</a></li>
<li><a href="http://readthedocs.org/projects/bonobo/downloads/pdf/master/" title="Bonobo ETL documentation as PDF">... as PDF</a></li>
<li><a href="http://readthedocs.org/projects/bonobo/downloads/htmlzip/master/" title="Bonobo ETL documentation as zipped HTML">... as zipped HTML</a></li>
<li><a href="http://readthedocs.org/projects/bonobo/downloads/epub/master/" title="Bonobo ETL documentation as EPUB">... as EPUB</a></li>
</ul>
<h3>Useful Links</h3>
<ul>
<li><a href="https://www.bonobo-project.org/">Bonobo ETL</a></li>
<li><a href="http://pypi.python.org/pypi/bonobo">Bonobo ETL @ PyPI</a></li>
<li><a href="http://github.com/python-bonobo/bonobo">Bonobo ETL @ GitHub</a></li>
<li><a href="https://www.bonobo-project.org/">Bonobo's homepage</a></li>
<li><a href="http://pypi.python.org/pypi/bonobo">Package on PyPI</a></li>
<li><a href="http://github.com/python-bonobo/bonobo">Source code on GitHub</a></li>
</ul>

View File

@ -1,10 +1,12 @@
<a href="{{ pathto(master_doc) }}" style="border: none">
<h1 style="text-align: center; margin: 0;">
<img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo" style="width: 48px; height: 48px; vertical-align: bottom"/>
Bonobo
<img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo" style="width: 40px; height: 40px; vertical-align: bottom"/>
<span class="brand">
Bonobo
</span>
</h1>
</a>
<p style="text-align: center">
<p style="text-align: center" class="first">
Data processing for humans.
</p>

docs/changelog-0.6.rst (new file, 131 lines)
View File

@ -0,0 +1,131 @@
Bonobo 0.6.0
::::::::::::
* Removes dead snippet. (Romain Dorgueil)
* Example datasets are now stored by bonobo minor version. (Romain Dorgueil)
* Removing datasets from the repository. (Romain Dorgueil)
* For some obscure reason, coverage is broken under python 3.7 making the test suite fail, disabled python3.7 in travis waiting for it to be fixed. (Romain Dorgueil)
* [tests] adding a spec to magicmock of nodes to avoid it being seen as partially configured nodes (Romain Dorgueil)
* Adds an OrderFields transformation factory, update examples. (Romain Dorgueil)
* Check partially configured transformations that are function based (aka transformation factories) on execution context setup. (Romain Dorgueil)
* Fix PrettyPrinter, output verbosity is now slightly more discreet. (Romain Dorgueil)
* Inheritance of bags and better jupyter output for pretty printer. (Romain Dorgueil)
* Documentation cosmetics. (Romain Dorgueil)
* Simple "examples" command that just show examples for now. (Romain Dorgueil)
* Rewriting Bags from scratch using a namedtuple approach, along with other (less major) updates. (Romain Dorgueil)
* Adding services to naive execution (Kenneth Koski)
* Fix another typo in `run` (Daniel Jilg)
* Fix two typos in the ContextProcessor documentation (Daniel Jilg)
* Core: refactoring contexts with more logical responsibilities, stopping to rely on kwargs ordering for compat with python3.5 (Romain Dorgueil)
* Simplification of node execution context, handle_result is now in step() as it is the only logical place where this will actually be called. (Romain Dorgueil)
* Less strict CSV processing, to allow dirty input. (Romain Dorgueil)
* [stdlib] Adds Update(...) and FixedWindow(...) to the standard nodes provided with bonobo. (Romain Dorgueil)
* Adds a benchmarks directory with small scripts to test performances of things. (Romain Dorgueil)
* Moves jupyter extension to both bonobo.contrib.jupyter (for the jupyter widget) and to bonobo.plugins (for the executor-side plugin). (Romain Dorgueil)
* Fix examples with new module paths. (Romain Dorgueil)
* IOFormats: if no kwargs, then try with one positional argument. (Romain Dorgueil)
* Adds a __getattr__ dunder to ValueHolder to enable getting attributes, and especially method calls, on contained objects. (Romain Dorgueil)
* Moves ODS extension to contrib module. (Romain Dorgueil)
* Moves google extension to contrib module. (Romain Dorgueil)
* Moves django extension to contrib module. (Romain Dorgueil)
* Update graphs.rst (CW Andrews)
* Adds argument parser support to django extension. (Romain Dorgueil)
* Trying to understand conda... (Romain Dorgueil)
* Trying to understand conda... (Romain Dorgueil)
* Trying to understand conda... (Romain Dorgueil)
* Update conda conf so readthedocs can maybe build. (Romain Dorgueil)
* Working on the new version of the tutorial. Only Step1 implemented. (Romain Dorgueil)
* Adds a "bare" template, containing the very minimum you want to have in 90% of cases. (Romain Dorgueil)
* Fix default logging level, adds options to default template. (Romain Dorgueil)
* Skip failing order test for python 3.5 (temporary). (Romain Dorgueil)
* Switch to stable mondrian. (Romain Dorgueil)
* Moves timer to statistics utilities. (Romain Dorgueil)
* Adds basic test for convert command. (Romain Dorgueil)
* [tests] adds node context lifecycle test. (Romain Dorgueil)
* Small changes in events, and associated tests. (Romain Dorgueil)
* [core] Moves bonobo.execution context related package to new bonobo.execution.contexts package, also moves bonobo.strategies to new bonobo.execution.strategies package, so everything related to execution is now contained under the bonobo.execution package. (Romain Dorgueil)
* Remove the sleep() in tick() that causes a minimum execution time of 2*PERIOD, more explicit status display and a small test case for console plugin. (Romain Dorgueil)
* [tests] Fix path usage for python 3.5 (Romain Dorgueil)
* Adds a test for default file init command. (Romain Dorgueil)
* Adds 3.7-dev target to travis runner. (Romain Dorgueil)
* Update requirements with first whistle stable. (Romain Dorgueil)
* [core] Refactoring to use an event dispatcher in the main thread. (Romain Dorgueil)
* Update to mondrian 0.4a0. (Romain Dorgueil)
* Fix imports. (Romain Dorgueil)
* Removing old error handler. (Romain Dorgueil)
* [errors] Move error handling in transformations to use mondrian. (Romain Dorgueil)
* [logging] Switching to mondrian, who got all our formatting code. (Romain Dorgueil)
* Adds argument parser support in default template. (Romain Dorgueil)
* Adds the ability to initialize a package from bonobo init. (Romain Dorgueil)
* Still cleaning up. (Romain Dorgueil)
* [examples] comments. (Romain Dorgueil)
* Update dependencies, remove python-dotenv. (Romain Dorgueil)
* Remove unused argument. (Romain Dorgueil)
* Remove files in examples that are not used anymore. (Romain Dorgueil)
* Refactoring the runner to go more towards standard python, also adds the ability to use bonobo argument parser from standard python execution. (Romain Dorgueil)
* Removes cookiecutter. (Romain Dorgueil)
* Switch logger setup to mondrian (deps). (Romain Dorgueil)
* Module registry reimported as it is needed for "bonobo convert". (Romain Dorgueil)
* [core] Simplification: as truthfully stated by Maik at Pycon.DE sprint «let's try not to turn python into javascript». (Romain Dorgueil)
* [core] still refactoring env-related stuff towards using __main__ blocks (but with argparser, if needed). (Romain Dorgueil)
* [core] Refactoring of commands to move towards a more pythonic way of running the jobs. Commands are now classes, and bonobo "graph" related commands now hooks into bonobo.run() calls so it will use what you actually put in your __main__ block. (Romain Dorgueil)
* Minor test change. (Romain Dorgueil)
* [core] Change the token parsing part in prevision of different flags. (Romain Dorgueil)
* Support line-delimited JSON (Michael Penkov)
* Update Makefile/setup. (Romain Dorgueil)
* [tests] simplify assertion (Romain Dorgueil)
* Issue #134: use requests.get as a context manager (Michael Penkov)
* Issue #134: use requests instead of urllib (Michael Penkov)
* update Projectfile with download entry point (Michael Penkov)
* Issue #134: update documentation (Michael Penkov)
* Issue #134: add a `bonobo download url` command (Michael Penkov)
* commands.run: Enable relative imports in main.py (Stefan Zimmermann)
* adapt tutorial "Working with files" to the latest develop version (Peter Uebele)
* Add a note about the graph variable (Michael Penkov)
* [tests] trying to speed up the init test. (Romain Dorgueil)
* [tests] bonobo.util.objects (Romain Dorgueil)
* [nodes] Removing draft quality factory from bonobo main package, will live in separate personal package until it is good enough to live here. (Romain Dorgueil)
* [tests] rename factory test and move bag detecting so any bag is returned as is as an output. (Romain Dorgueil)
* [core] Still refactoring the core behaviour of bags, starting to be much simpler. (Romain Dorgueil)
* Fix python 3.5 os.chdir not accepting LocalPath (arimbr)
* Remove unused shutil import (arimbr)
* Use pytest tmpdir fixture and add more init tests (arimbr)
* Check if target directory is empty instead of current directory and remove overwrite_if_exists argument (arimbr)
* Remove dispatcher as it is not a dependency, for now, and as such breaks the continuous integration (yes, again.). (Romain Dorgueil)
* Remove dispatcher as it is not a dependency, for now, and as such breaks the continuous integration. (Romain Dorgueil)
* Code formatting. (Romain Dorgueil)
* [core] Testing and fixing new args/kwargs behaviour. (Romain Dorgueil)
* [core] simplification of result interpretation. (Romain Dorgueil)
* [tests] fix uncaptured output in test_commands (Romain Dorgueil)
* Documentation for new behaviour. (Romain Dorgueil)
* [django, misc] adds create_or_update to djangos ETLCommand class, adds getitem/setitem/contains dunders to ValueHolder. (Romain Dorgueil)
* [core] (..., dict) means Bag(..., **dict) (Romain Dorgueil)
* [django, google] Implements basic extensions for django and google oauth systems. (Romain Dorgueil)
* Test tweak to work for Windows CI. (cwandrews)
* Updated requirements files using edgy-project. (cwandrews)
* Updated Projectfile to include python-dotenv dependency. (cwandrews)
* Add tests for bonobo init new directory and init within empty directory (arimbr)
* Update environment.rst (CW Andrews)
* Update environment.rst (CW Andrews)
* Cast env_dir to string before passing to load_dotenv as passing a PosixPath to load_dotenv raises an exception in 3.5. (cwandrews)
* Updated environment documentation in guides to account for env files. (cwandrews)
* Added more tests and moved all env and env file testing to classes (it might make more sense to just move them to separate files?). (cwandrews)
* Moved env vars tests to class. (cwandrews)
* Updated .env >>> .env_one to include in repo (.env ignored). (cwandrews)
* [core] Refactoring IOFormats so there is one and only obvious way to send it. (Romain Dorgueil)
* Set cookiecutter overwrite_if_exists parameter to True if current directory is empty (arimbr)
* [cli/util] fix requires to use the right stack frame, remove --print as "-" does the job (Romain Dorgueil)
* [cli] Adds a --filter option to "convert" command, allowing to use arbitrary filters to a command line conversion. Also adds --print and "-" output to pretty print to terminal instead of file output. (Romain Dorgueil)
* [cli] convert, remove useless import. (Romain Dorgueil)
* [config] adds a __doc__ constructor kwarg to set option documentation inline. (Romain Dorgueil)
* [doc] formatting (Romain Dorgueil)
* [cli] adds ability to override reader/writer options from cli convert. (Romain Dorgueil)
* comparison to None|True|False should be 'if cond is None:' (mouadhkaabachi)
* Fixed bug involved in finding env when running module. (cwandrews)
* Moved default-env-file tests to class. (cwandrews)
* Small adjustment to test parameters. (cwandrews)
* Added tests for running file with combinations of multiple default env files, env files, and env vars. Also reorganized environment directory in examples. (cwandrews)
* Updated requirements.txt and requirements-dev.txt to include python-dotenv and dependencies. (cwandrews)
* default-env-file, default-env, and env-file now in place alongside env. default-env-file and default-env both use os.environ.setdefault so as not to overwrite existing variables (system environment) while env-file and env will overwrite existing variables. All four allow for multiple values (***How might this affect multiple default-env and default-env-file values, I expect that unlike env-file and env the first passed variables would win). (cwandrews)
* Further Refactored the setting of env vars passed via the env flag. (cwandrews)
* Refactored setting of env vars passed via the env flag. (cwandrews)

View File

@ -1,6 +1,25 @@
Changelog
=========
Unreleased
::::::::::
* Cookiecutter usage is removed. Since bonobo now uses either a single file (it's up to you to get python
imports working as you want) or a regular fully fledged python package, we no longer need it.
New features
------------
Command line
............
* `bonobo download /examples/datasets/coffeeshops.txt` now downloads the coffeeshops example
Graphs and Nodes
................
* New `LdjsonReader` and `LdjsonWriter` nodes for handling `line-delimited JSON <https://en.wikipedia.org/wiki/JSON_Streaming>`_.
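For reference, line-delimited JSON is simply one JSON document per line of text. The sketch below is a minimal, bonobo-independent illustration of the format; the helper names are made up here and are not part of bonobo's API:

```python
import io
import json

def write_ldjson(rows, fp):
    # One JSON document per line: the "line-delimited JSON" format.
    for row in rows:
        fp.write(json.dumps(row) + "\n")

def read_ldjson(fp):
    # Yield one parsed document per non-empty line.
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

buf = io.StringIO()
write_ldjson([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], buf)
buf.seek(0)
rows = list(read_ldjson(buf))
```

Because each line is an independent document, the format can be read and written as a stream, which is what makes it a natural fit for node-based processing.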
v.0.5.0 - 5 October 2017
::::::::::::::::::::::::

View File

@ -1,8 +1,9 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import datetime
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('_themes'))
@ -36,8 +37,8 @@ master_doc = 'index'
# General information about the project.
project = 'Bonobo'
copyright = '2012-2017, Romain Dorgueil'
author = 'Romain Dorgueil'
copyright = '2012-{}, {}'.format(datetime.datetime.now().year, author)
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@ -185,3 +186,10 @@ epub_exclude_files = ['search.html']
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
rst_epilog = """
.. |bonobo| replace:: **Bonobo**
.. |longversion| replace:: v.{version}
""".format(version=version, )

View File

@ -4,8 +4,6 @@ Jupyter Extension
There is a builtin plugin that integrates (somewhat minimalistically, for now) bonobo within jupyter notebooks, so
you can read the execution status of a graph within a nice (ok, not so nice) html/javascript widget.
See https://github.com/jupyter-widgets/widget-cookiecutter for the base template used.
Installation
::::::::::::

View File

@ -23,25 +23,76 @@ simply to use the optional ``--env`` argument when running bonobo from the shell
syntax ``VAR_NAME=VAR_VALUE``. Multiple environment variables can be passed by using multiple ``--env`` / ``-e`` flags
(i.e. ``bonobo run --env FIZZ=buzz ...`` and ``bonobo run --env FIZZ=buzz --env Foo=bar ...``). Additionally, in bash
you can also set environment variables by listing those you wish to set before the `bonobo run` command with space
separating the key-value pairs (i.e. ``FIZZ=buzz bonobo run ...`` or ``FIZZ=buzz FOO=bar bonobo run ...``).
separating the key-value pairs (i.e. ``FIZZ=buzz bonobo run ...`` or ``FIZZ=buzz FOO=bar bonobo run ...``). Additionally,
bonobo is able to pull environment variables from local '.env' files rather than having to pass each key-value pair
individually at runtime. Importantly, a strict 'order of priority' is followed when setting environment variables, so
it is advisable to read and understand the order listed below to prevent unexpected behavior.
The order of priority is from lower to higher with the higher "winning" if set:
1. default values
``os.getenv("VARNAME", default_value)``
The user/writer/creator of the graph is responsible for setting these.
2. ``--default-env-file`` values
Specifies a file to read default env values from. Each variable in the file is used only if it isn't already set in the system environment (system environment variables are not overwritten).
3. ``--default-env`` values
Works like #2, but the default ``NAME=value`` pairs are passed individually, one ``key=value`` pair per ``--default-env`` flag, rather than gathered from a specified file.
4. system environment values
Env vars already set at the system level. Note that env vars passed via ``NAME=value bonobo run ...`` fall here in the order of priority.
5. ``--env-file`` values
Env vars specified here are set like those in #2, except that these values take priority over those set at the system level.
6. ``--env`` values
Env vars set using the ``--env`` / ``-e`` flag work like #3 but take priority over all other env vars.
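The default-vs-override behaviour can be modelled in a few lines of plain python. This is a toy sketch of the priority order, not bonobo's actual implementation, and the function name is invented for illustration; it relies on the same ``setdefault`` semantics that the ``--default-env`` flags use against ``os.environ``:

```python
def resolve_environment(system_env, default_env=None, env=None):
    # Toy model of the priority order above: defaults lose to the
    # system environment, --env values win over everything.
    resolved = dict(system_env)
    for key, value in (default_env or {}).items():
        resolved.setdefault(key, value)   # like os.environ.setdefault: no overwrite
    for key, value in (env or {}).items():
        resolved[key] = value             # overwrite unconditionally
    return resolved

resolved = resolve_environment(
    {"FIZZ": "system"},
    default_env={"FIZZ": "default", "FOO": "default"},
    env={"SECRET_TOKEN": "secret123"},
)
```

Here ``FIZZ`` keeps its system value, ``FOO`` falls back to the default, and ``SECRET_TOKEN`` is forced in by the ``--env``-style mapping.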
Examples
::::::::
The examples below demonstrate setting one or more variables using these methods:
.. code-block:: bash
# Using one environment variable via --env flag:
# Using one environment variable via a --env or --default-env flag:
bonobo run csvsanitizer --env SECRET_TOKEN=secret123
bonobo run csvsanitizer --default-env SECRET_TOKEN=secret123
# Using multiple environment variables via -e (env) flag:
# Using multiple environment variables via -e (env) and --default-env flags:
bonobo run csvsanitizer -e SRC_FILE=inventory.txt -e DST_FILE=inventory_processed.csv
# Using one environment variable inline (bash only):
bonobo run csvsanitizer --default-env SRC_FILE=inventory.txt --default-env DST_FILE=inventory_processed.csv
# Using one environment variable inline (bash-like shells only):
SECRET_TOKEN=secret123 bonobo run csvsanitizer
# Using multiple environment variables inline (bash only):
# Using multiple environment variables inline (bash-like shells only):
SRC_FILE=inventory.txt DST_FILE=inventory_processed.csv bonobo run csvsanitizer
*Though not-yet implemented, the bonobo roadmap includes implementing environment / .env files as well.*
# Using an env file for default env values:
bonobo run csvsanitizer --default-env-file .env
# Using an env file for env values:
bonobo run csvsanitizer --env-file '.env.private'
ENV File Structure
::::::::::::::::::
The file structure for env files is incredibly simple: the only text in the file should be ``NAME=value`` pairs, one pair per line, as in the example below.
.. code-block:: text
# .env
DB_USER='bonobo'
DB_PASS='cicero'
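For illustration, such a file could be parsed along these lines. This is a hypothetical sketch, not bonobo's actual parser:

```python
def parse_env_file(text):
    # Minimal sketch: blank lines and '#' comments are skipped,
    # simple surrounding quotes are stripped from values.
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        variables[name.strip()] = value.strip().strip("'\"")
    return variables

env = parse_env_file("# .env\nDB_USER='bonobo'\nDB_PASS='cicero'\n")
```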
Accessing Environment Variables from within the Graph Context
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

View File

@ -1,9 +1,8 @@
Graphs
======
Graphs are the glue that ties transformations together. It's the only data-structure bonobo can execute directly. Graphs
must be acyclic, and can contain as much nodes as your system can handle. Although this number can be rather high in
theory, extreme practical cases usually do not exceed hundreds of nodes (and this is already extreme, really).
Graphs are the glue that ties transformations together. They are the only data-structure bonobo can execute directly. Graphs
must be acyclic, and can contain as many nodes as your system can handle. In theory that number can be rather high; in practice, even extreme use cases rarely exceed a few hundred nodes.
Definitions
@ -50,7 +49,7 @@ Non-linear graphs
Divergences / forks
-------------------
To create two or more divergent data streams ("fork"), you should specify `_input` kwarg to `add_chain`.
To create two or more divergent data streams ("forks"), you should specify the `_input` kwarg to `add_chain`.
.. code-block:: python
@ -74,12 +73,12 @@ Resulting graph:
"b" -> "f" -> "g";
}
.. note:: Both branch will receive the same data, at the same time.
.. note:: Both branches will receive the same data and at the same time.
Convergences / merges
Convergence / merges
---------------------
To merge two data streams ("merge"), you can use the `_output` kwarg to `add_chain`, or use named nodes (see below).
To merge two data streams, you can use the `_output` kwarg to `add_chain`, or use named nodes (see below).
.. code-block:: python
@ -88,7 +87,7 @@ To merge two data streams ("merge"), you can use the `_output` kwarg to `add_cha
graph = bonobo.Graph()
# Here we mark _input to None, so normalize won't get the "begin" impulsion.
# Here we set _input to None, so normalize won't start on its own but only after it receives input from the other chains.
graph.add_chain(normalize, store, _input=None)
# Add two different chains
@ -122,7 +121,7 @@ Resulting graph:
Named nodes
:::::::::::
Using above code to create convergences can lead to hard to read code, because you have to define the "target" stream
Using the above code to create convergences often leads to code which is hard to read, because you have to define the "target" stream
before the streams that logically go to the beginning of the transformation graph. To overcome that, one can use
"named" nodes:
@ -194,7 +193,7 @@ You can also run a python module:
$ bonobo run -m my.own.etlmod
In each case, bonobo's CLI will look for an instance of :class:`bonobo.Graph` in your file/module, create the plumbery
In each case, bonobo's CLI will look for an instance of :class:`bonobo.Graph` in your file/module, create the plumbing
needed to execute it, and run it.
If you're in an interactive terminal context, it will use :class:`bonobo.ext.console.ConsoleOutputPlugin` for display.

View File

@ -41,7 +41,7 @@ instances.
class JoinDatabaseCategories(Configurable):
database = Service('orders_database')
def call(self, database, row):
def __call__(self, database, row):
return {
**row,
'category': database.get_category_name_for_sku(row['sku'])

View File

@ -32,6 +32,100 @@ Iterable
Something we can iterate on, in python, so basically anything you'd be able to use in a `for` loop.
Concepts
::::::::
Whatever kind of transformation you want to use, there are a few common concepts you should know about.
Input
-----
All input is retrieved via the call arguments. Each line of input means one call to the callable provided. Arguments
will be, in order:
* Injected dependencies (database, http, filesystem, ...)
* Position based arguments
* Keyword based arguments
You'll see below how to pass each of those.
Output
------
Each callable can return/yield different things (all examples will use yield, but if there is only one output per input
line, you can also return your output row and expect the exact same behaviour).
Let's see the rules (first to match wins).
1. A flag, optionally followed by something else, marks a special behaviour. If the flag supports it, the remaining part of
the output line will be interpreted using the same rules, and some flags can be combined.
**NOT_MODIFIED**
**NOT_MODIFIED** tells bonobo to use the input row unmodified as the output.
*CANNOT be combined*
Example:
.. code-block:: python
from bonobo import NOT_MODIFIED
def output_will_be_same_as_input(*args, **kwargs):
yield NOT_MODIFIED
**APPEND**
**APPEND** tells bonobo to append this output to the input (positional arguments will equal `input_args + output_args`,
keyword arguments will equal `{**input_kwargs, **output_kwargs}`).
*CAN be combined, but not with itself*
.. code-block:: python
from bonobo import APPEND
def output_will_be_appended_to_input(*args, **kwargs):
yield APPEND, 'foo', 'bar', {'eat_at': 'joe'}
**LOOPBACK**
**LOOPBACK** tells bonobo that this output must be looped back into our own input queue, allowing to create the stream
processing version of recursive algorithms.
*CAN be combined, but not with itself*
.. code-block:: python
from bonobo import LOOPBACK
def output_will_be_sent_to_self(*args, **kwargs):
yield LOOPBACK, 'Hello, I am the future "you".'
**CHANNEL(...)**
**CHANNEL(...)** tells bonobo that this output does not use the default channel and is routed through another path.
This is something you should probably not use unless your data flow design is complex; if you're not certain you need
it, it probably means it is not the feature you're looking for.
*CAN be combined, but not with itself*
.. code-block:: python
from bonobo import CHANNEL
def output_will_be_routed_to_errors(*args, **kwargs):
yield CHANNEL("errors"), 'That is not cool.'
2. Once all flags are "consumed", the remaining part is interpreted.
* If it is a :class:`bonobo.Bag` instance, then it's used directly.
* If it is a :class:`dict` then a kwargs-only :class:`bonobo.Bag` will be created.
* If it is a :class:`tuple` then an args-only :class:`bonobo.Bag` will be created, unless its last argument is a
:class:`dict` in which case an args+kwargs :class:`bonobo.Bag` will be created.
* If it's something else, it will be used to create a one-arg-only :class:`bonobo.Bag`.
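These rules can be modelled in plain python. The sketch below is only an illustration of the dispatch logic: it substitutes an ``(args, kwargs)`` pair for a real :class:`bonobo.Bag`, and it skips the first rule (in bonobo itself, actual ``Bag`` instances pass through unchanged):

```python
def interpret_output(value):
    # Toy model of the interpretation rules, first match wins.
    if isinstance(value, dict):
        return (), value                  # dict -> kwargs-only
    if isinstance(value, tuple):
        if value and isinstance(value[-1], dict):
            return value[:-1], value[-1]  # tuple ending with dict -> args + kwargs
        return value, {}                  # plain tuple -> args-only
    return (value,), {}                   # anything else -> one positional argument
```

So ``yield {'name': 'foo'}`` behaves like keyword output, ``yield ('a', 'b')`` like positional output, and ``yield ('a', {'k': 1})`` mixes both.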
Function based transformations
::::::::::::::::::::::::::::::
@ -112,7 +206,7 @@ can be used as a graph node, then use camelcase names:
# configurable
class ChangeCase(Configurable):
modifier = Option(default='upper')
def call(self, s: str) -> str:
def __call__(self, s: str) -> str:
return getattr(s, self.modifier)()
# transformation factory

View File

@ -1,20 +1,39 @@
Installation
============
Create an ETL project
:::::::::::::::::::::
Creating a project and starting to write code should take less than a minute:
First, install the framework:
.. code-block:: shell-session
$ pip install --upgrade bonobo cookiecutter
$ bonobo init my-etl-project
$ bonobo run my-etl-project
$ pip install --upgrade bonobo
Once you bootstrapped a project, you can start editing the default example transformation by editing
`my-etl-project/main.py`. Now, you can head to :doc:`tutorial/index`.
Create a simple job:
.. code-block:: shell-session
$ bonobo init my-etl.py
And let's go for a test drive:
.. code-block:: shell-session
$ python my-etl.py
Congratulations, you ran your first Bonobo ETL job.
Now, you can head to :doc:`tutorial/index`.
.. note::
It's often best to start with a single file then move it into a project
(which, in python, needs to live in a package).
You can read more about this topic in the :doc:`guide/packaging` section,
along with pointers on how to move this first file into an existing fully
featured python package.
Other installation options
@ -29,6 +48,12 @@ You can install it directly from the `Python Package Index <https://pypi.python.
$ pip install bonobo
To upgrade an existing installation, use `--upgrade`:
.. code-block:: shell-session
$ pip install --upgrade bonobo
Install from source
-------------------
@ -81,18 +106,29 @@ from the local clone.
  $ git clone git@github.com:python-bonobo/bonobo.git
$ cd bonobo
$ pip install --editable .
You can develop on this clone, but you probably want to add your own repository if you want to push code back and make pull requests.
I usually name the git remote for the main bonobo repository "upstream", and my own repository "origin".
.. code-block:: shell-session
$ git remote rename origin upstream
$ git remote add origin git@github.com:hartym/bonobo.git
$ git fetch --all
Of course, replace my github username with the one you used to fork bonobo. You should be good to go!
Preview versions
----------------
Sometimes, pre-release versions are available (before a major release, for example). By default, pip does not install
pre-releases, to avoid accidental upgrades to potentially unstable software, but you can easily opt in:
.. code-block:: shell-session
$ pip install --upgrade --pre bonobo
Supported platforms
:::::::::::::::::::
@ -117,4 +153,3 @@ users.
We're trying to look into that, but the energy available to provide serious support on windows is very limited.
If you have experience in this domain and you're willing to help, you're more than welcome!

View File

@ -16,16 +16,6 @@ Syntax: `bonobo convert [-r reader] input_filename [-w writer] output_filename`
to read from csv and write to csv too (or other format) but adding a geocoder filter that would add some fields.
Bonobo Init
:::::::::::
Create an empty project, ready to use bonobo.
Syntax: `bonobo init`
Requires `cookiecutter`.
Bonobo Inspect
::::::::::::::

View File

@ -1,54 +0,0 @@
Internal roadmap notes
======================
Things that should be thought about and/or implemented, but that I don't know where to store.
Graph and node level plugins
::::::::::::::::::::::::::::
* Enhancers or node-level plugins
* Graph level plugins
* Documentation
Command line interface and environment
::::::::::::::::::::::::::::::::::::::
* How do we manage environment ? .env ?
* How do we configure plugins ?
Services and Processors
:::::::::::::::::::::::
* ContextProcessors not clean (a bit better, but still not in love with the api)
Next...
:::::::
* Release process specialised for bonobo. With changelog production, etc.
* Document how to upgrade version, like, minor need change badges, etc.
* Windows console looks crappy.
* bonobo init --with sqlalchemy,docker; cookiecutter?
* logger, verbosity level
External libs that looks good
:::::::::::::::::::::::::::::
* dask.distributed
* mediator (event dispatcher)
Version 0.4
:::::::::::
* SQLAlchemy 101
Design decisions
::::::::::::::::
* initialize / finalize better than start / stop ?
Minor stuff
:::::::::::
* Should we include datasets in the repo or not? As they may change, grow, and even eventually have licenses we can't use,
it's probably best if we don't.

docs/tutorial/1-init.rst (new file, 258 lines)
View File

@ -0,0 +1,258 @@
Part 1: Let's get started!
==========================
To get started with |bonobo|, you need to install it in a working python 3.5+ environment (you should use a
`virtualenv <https://virtualenv.pypa.io/>`_).
.. code-block:: shell-session
$ pip install bonobo
Check that the installation worked, and that you're using a version that matches this tutorial (written for bonobo
|longversion|).
.. code-block:: shell-session
$ bonobo version
See :doc:`/install` for more options.
Create an ETL job
:::::::::::::::::
Since Bonobo 0.6, it's easy to bootstrap a simple ETL job using just one file.
We'll start here, and the later stages of the tutorial will guide you toward refactoring this into a python package.
.. code-block:: shell-session
$ bonobo init tutorial.py
This will create a simple job in a `tutorial.py` file. Let's run it:
.. code-block:: shell-session
$ python tutorial.py
Hello
World
- extract in=1 out=2 [done]
- transform in=2 out=2 [done]
- load in=2 [done]
If you have a similar result, then congratulations! You just ran your first |bonobo| ETL job.
Inspect your graph
::::::::::::::::::
The basic building blocks of |bonobo| are **transformations** and **graphs**.
**Transformations** are simple python callables (like functions) that handle a transformation step for a line of data.
**Graphs** are sets of transformations, with directional links between them defining the data-flow that will happen
at runtime.
To inspect the graph of your first transformation (you must install graphviz first to do so), run:
.. code-block:: shell-session
$ bonobo inspect --graph tutorial.py | dot -Tpng -o tutorial.png
Open the generated `tutorial.png` file to have a quick look at the graph.
.. graphviz::
digraph {
rankdir = LR;
"BEGIN" [shape="point"];
"BEGIN" -> {0 [label="extract"]};
{0 [label="extract"]} -> {1 [label="transform"]};
{1 [label="transform"]} -> {2 [label="load"]};
}
This gives you an easy view of your graph's structure. For such a simple graph it's pretty much useless, but as
you write more complex transformations, it will become helpful.
Read the Code
:::::::::::::
Before we write our own job, let's look at the code we have in `tutorial.py`.
Import
------
.. code-block:: python
import bonobo
The highest level APIs of |bonobo| are all contained within the top level **bonobo** namespace.
If you're a beginner with the library, stick to using only those APIs (they also are the most stable APIs).
If you're an advanced user (and you'll be one quite soon), you can safely use second level APIs.
The third level APIs are considered private, and you should not use them unless you're hacking on |bonobo| directly.
Extract
-------
.. code-block:: python
def extract():
yield 'hello'
yield 'world'
This is a first transformation, written as a python generator, that will send some strings, one after the other, to its
output.
Transformations that take no input and yield a variable number of outputs are usually called **extractors**. You'll
encounter a few different types: some purely generate data (like here), some use an external service (a
database, for example) and some use a filesystem (which is considered an external service too).
Extractors do not need to have their input connected to anything, and they will be called exactly once when the graph is
executed.
Transform
---------
.. code-block:: python
def transform(*args):
yield tuple(
map(str.title, args)
)
This is a second transformation. It will get called a bunch of times, once for each input row it gets, and apply some
logic on the input to generate the output.
This is the most **generic** case: for each line of input, you can generate zero, one or many lines of output.
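For instance, a transformation can fan one input row out into several output rows. Here is a minimal sketch (the `split_words` node is hypothetical, not part of the generated file):

```python
def split_words(row):
    # One input row, many output rows: yield once per word.
    for word in row.split():
        yield word
```

Used in a graph, such a node would simply be chained between two others, like `transform` above.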
Load
----
.. code-block:: python
def load(*args):
print(*args)
This is the third and last transformation in our "hello world" example. It will apply some logic to each row, and have
absolutely no output.
Transformations that take input and yield nothing are also called **loaders**. Like extractors, you'll encounter
different types, to work with various external systems.
Please note that, as a convenience and because the cost is marginal, most builtin `loaders` will send their
inputs to their output unmodified, so you can easily chain more than one loader, or apply more transformations after a
given loader.
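A hand-written loader can follow the same pass-through convention by yielding its input back. This is a sketch of the idea, not the actual builtin implementation:

```python
def load(row):
    # Do the "write" work (here we just print), then pass the row along
    # unmodified so another loader or transformation can be chained after it.
    print(row)
    yield row
```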
Graph Factory
-------------
.. code-block:: python
def get_graph(**options):
graph = bonobo.Graph()
graph.add_chain(extract, transform, load)
return graph
All our transformations were defined above, but nothing ties them together, for now.
This "graph factory" function is in charge of the creation and configuration of a :class:`bonobo.Graph` instance, that
will be executed later.
By no means is |bonobo| limited to simple graphs like this one. You can add as many chains as you want, and each chain
can contain as many nodes as you want.
Services Factory
----------------
.. code-block:: python
def get_services(**options):
return {}
This is the "services factory", that we'll use later to connect to external systems. Let's skip this one, for now.
(we'll dive into this topic in :doc:`4-services`)
Main Block
----------
.. code-block:: python
if __name__ == '__main__':
parser = bonobo.get_argument_parser()
with bonobo.parse_args(parser) as options:
bonobo.run(
get_graph(**options),
services=get_services(**options)
)
Here, the real thing happens.
Without diving into too much detail for now, the :func:`bonobo.parse_args` context manager will allow our job to
be configured later, and although we don't really need it right now, it does no harm either.
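Since :func:`bonobo.get_argument_parser` hands back a standard :mod:`argparse` parser, you can extend it with your own options before parsing. The sketch below uses plain argparse to show the mechanics (the `--limit` option is a made-up example, not a bonobo flag):

```python
import argparse

# bonobo.get_argument_parser() returns a regular argparse.ArgumentParser,
# so adding your own job options works exactly like this:
parser = argparse.ArgumentParser()
parser.add_argument('--limit', type=int, default=10)

# Parsing an explicit list here stands in for bonobo.parse_args(parser):
options = vars(parser.parse_args(['--limit', '5']))
```

The resulting `options` dict is what gets forwarded to `get_graph(**options)` and `get_services(**options)`.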
Reading the output
::::::::::::::::::
Let's run this job once again:
.. code-block:: shell-session
$ python tutorial.py
Hello
World
- extract in=1 out=2 [done]
- transform in=2 out=2 [done]
- load in=2 [done]
The console output contains two things.
* First, it contains the real output of your job (what was :func:`print`-ed to `sys.stdout`).
* Second, it displays the execution status (on `sys.stderr`). Each line contains a "status" character, the node name,
numbers and a human readable status. This status evolves in real time, and lets you follow a job's progress
while it's running.
* Status character:
* “ ” means that the node was not yet started.
* “`-`” means that the node finished its execution.
* “`+`” means that the node is currently running.
* “`!`” means that the node had problems running.
* Numerical statistics:
* “`in=...`” shows the input lines count, also known as the amount of calls to your transformation.
* “`out=...`” shows the output lines count.
* “`read=...`” shows the count of reads applied to an external system, if the transformation supports it.
* “`write=...`” shows the count of writes applied to an external system, if the transformation supports it.
* “`err=...`” shows the count of exceptions that happened while running the transformation. Note that an exception will abort
a call, but the execution will move on to the next row.
Wrap up
:::::::
That's all for this first step.
You now know:
* How to create a new job (using a single file).
* How to inspect the content of a job.
* What should go in a job file.
* How to execute a job file.
* How to read the console output.
It's now time to jump to :doc:`2-jobs`.
docs/tutorial/2-jobs.rst Normal file
@ -0,0 +1,66 @@
Part 2: Writing ETL Jobs
========================
What's an ETL job ?
:::::::::::::::::::
In |bonobo|, an ETL job is a formal definition of an executable graph.
Each node of a graph will be executed in isolation from the other nodes, and the data is passed from one node to the
next using FIFO queues, managed by the framework. It's transparent to the end-user, though, and you'll only use
function arguments (for inputs) and return/yield values (for outputs).
Each input row of a node will cause one call to this node's callable. Each output is cast internally as a tuple-like
data structure (or more precisely, a namedtuple-like data structure), and for one given node, each output row must
have the same structure.
If you return/yield something which is not a tuple, bonobo will create a tuple of one element.
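That coercion rule can be sketched in plain python (an approximation of the behaviour described above, not bonobo's actual internal code):

```python
def coerce_output(value):
    # Tuples pass through untouched; anything else is wrapped in a 1-tuple.
    if isinstance(value, tuple):
        return value
    return (value,)
```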
Properties
----------
|bonobo| assists you with defining the data-flow of your data engineering process, and then streams data through your
callable graphs.
* Each node call will process one row of data.
* Queues that flow the data between nodes are standard first-in, first-out (FIFO) python :class:`queue.Queue` instances.
* Each node runs in parallel with the others.
* The default execution strategy uses threading: each node runs in a separate thread.
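The queues between nodes are plain standard-library FIFOs, so rows always come out in the order they went in:

```python
from queue import Queue

# A node's input queue, as used between bonobo nodes.
q = Queue()
for row in ('a', 'b', 'c'):
    q.put(row)

rows = [q.get() for _ in range(3)]
assert rows == ['a', 'b', 'c']  # first in, first out
```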
Fault tolerance
---------------
Node execution is fault tolerant.
If an exception is raised from a node call, this call will be aborted, but bonobo will continue the execution
with the next row (after outputting the stack trace and incrementing the "err" counter for the node context).
This allows ETL jobs to ignore faulty data and do their best to process the valid rows of a dataset.
Some errors are fatal, though.
If you pass a 2-element tuple to a node that takes 3 arguments, |bonobo| will raise a :class:`bonobo.errors.UnrecoverableTypeError` and exit the
current graph execution as fast as it can (finishing the node executions that are in progress, but not
starting new ones even if there are remaining input rows).
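You can reproduce the underlying problem in plain python: unpacking a 2-element row into a 3-argument callable is a :class:`TypeError`, which bonobo surfaces as the unrecoverable error described above:

```python
def needs_three(a, b, c):
    return a, b, c

row = ('x', 'y')  # one element short

try:
    # This is what happens when the short row reaches the node:
    needs_three(*row)
    failed = False
except TypeError:
    failed = True
```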
Let's write a sample data integration job
:::::::::::::::::::::::::::::::::::::::::
Let's create a sample application.
The goal of this application will be to extract all the fablabs in the world using an open-data API, normalize this
data and, for now, display it. We'll then build on this foundation in the next steps to write to files, databases, etc.
Moving forward
::::::::::::::
You now know:
* How to ...
**Next: :doc:`3-files`**
docs/tutorial/3-files.rst Normal file
@ -0,0 +1,22 @@
Part 3: Working with Files
==========================
* Filesystems
* Reading files
* Writing files
* Writing files to S3
* Atomic writes ???
Moving forward
::::::::::::::
You now know:
* How to ...
**Next: :doc:`4-services`**
@ -0,0 +1,207 @@
Part 4: Services and Configurables
==================================
In the last section, we used a few new tools.
Class-based transformations and configurables
:::::::::::::::::::::::::::::::::::::::::::::
Bonobo is a bit dumb: if something is callable, it considers it can be used as a transformation, and it's up to the
user to provide callables that logically fit in a graph.
You can use plain python objects with a `__call__()` method, and it will just work.
As a lot of transformations need common machinery, there are a few tools to quickly build transformations, most of
them requiring your class to subclass :class:`bonobo.config.Configurable`.
Configurables allow you to use the following features:
* You can add **Options** (using the :class:`bonobo.config.Option` descriptor). Options can be positional or keyword
based, can have a default value, and will be consumed from the constructor arguments.
.. code-block:: python
from bonobo.config import Configurable, Option
class PrefixIt(Configurable):
prefix = Option(str, positional=True, default='>>>')
def __call__(self, row):
return self.prefix + ' ' + row
prefixer = PrefixIt('$')
* You can add **Services** (using the :class:`bonobo.config.Service` descriptor). Services are a subclass of
:class:`bonobo.config.Option`, sharing the same basics, but specialized in the definition of "named services" that
will be resolved at runtime (i.e. for which we will provide an implementation at runtime). We'll dive more into that
in the next section.
.. code-block:: python
from bonobo.config import Configurable, Option, Service
class HttpGet(Configurable):
url = Option(default='https://jsonplaceholder.typicode.com/users')
http = Service('http.client')
def __call__(self, http):
resp = http.get(self.url)
for row in resp.json():
yield row
http_get = HttpGet()
* You can add **Methods** (using the :class:`bonobo.config.Method` descriptor). :class:`bonobo.config.Method` is a
subclass of :class:`bonobo.config.Option` that allows passing callables as parameters, either to the class constructor,
or by using the class as a decorator.
.. code-block:: python
from bonobo.config import Configurable, Method
class Applier(Configurable):
apply = Method()
def __call__(self, row):
return self.apply(row)
@Applier
def Prefixer(self, row):
return 'Hello, ' + row
prefixer = Prefixer()
* You can add **ContextProcessors**, which are an advanced feature we won't introduce here. If you're familiar with
pytest, you can think of them as pytest fixtures, execution-wise.
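To make the `Option` semantics concrete, here is roughly what the `PrefixIt` example above amounts to in plain python (an approximation without the descriptor machinery bonobo provides):

```python
class PrefixIt:
    # What Option(str, positional=True, default='>>>') buys you, by hand:
    # a positional constructor argument with a string type and a default.
    def __init__(self, prefix='>>>'):
        self.prefix = str(prefix)

    def __call__(self, row):
        return self.prefix + ' ' + row
```

Usage stays the same: `PrefixIt('$')` yields a node that prefixes rows with `$`, and `PrefixIt()` falls back to the `>>>` default.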
Services
::::::::
The motivation behind services is mostly separation of concerns, testability and deployability.
Usually, your transformations will depend on services (like a filesystem, an http client, a database, a rest api, ...).
Those services can very well be hardcoded in the transformations, but there are two main drawbacks:
* You won't be able to change the implementation depending on the current environment (development laptop versus
production servers, bug-hunting session versus execution, etc.)
* You won't be able to test your transformations without testing the associated services.
To avoid hardcoding those dependencies, we define Services on the configurable, which are basically string options
holding the service names, and we provide an implementation at the last possible moment.
There are two ways of providing implementations:
* Either file-wide, by providing a `get_services()` function that returns a dict of named implementations (we did so
with filesystems in the previous step, :doc:`3-files`).
* Or directory-wide, by providing a `get_services()` function in a specially named `_services.py` file.
The first is simpler if you only have one transformation graph in one file; the second allows grouping coherent
transformations together in a directory and sharing the implementations.
Let's see how to use it, starting from the previous service example:
.. code-block:: python
from bonobo.config import Configurable, Option, Service
class HttpGet(Configurable):
url = Option(default='https://jsonplaceholder.typicode.com/users')
http = Service('http.client')
def __call__(self, http):
resp = http.get(self.url)
for row in resp.json():
yield row
We defined an "http.client" service, which should obviously have a `get()` method returning responses that have a
`json()` method.
Let's provide two implementations for it. The first one uses `requests <http://docs.python-requests.org/>`_,
which coincidentally satisfies the described interface:
.. code-block:: python
import bonobo
import requests
def get_services():
return {
'http.client': requests
}
graph = bonobo.Graph(
HttpGet(),
print,
)
If you run this code, you should see some mock data returned by the webservice we called (assuming it's up and you can
reach it).
Now, the second implementation will replace that with a mock, used for testing purposes:
.. code-block:: python
class HttpResponseStub:
def json(self):
return [
{'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}},
{'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': {'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}, 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}},
]
class HttpStub:
def get(self, url):
return HttpResponseStub()
def get_services():
return {
'http.client': HttpStub()
}
graph = bonobo.Graph(
HttpGet(),
print,
)
Since the `Graph` definition stays exactly the same, you can easily substitute the `_services.py` file depending on your
environment (the way you do this is out of bonobo's scope and heavily depends on your usual way of managing
configuration files on different platforms).
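For example, a single `get_services()` could itself pick the implementation from an environment variable. The `APP_ENV` name and the stub class below are assumptions for illustration, not bonobo conventions:

```python
import os

class StubHttpClient:
    """Canned responses, for tests and offline development."""
    def get(self, url):
        return {'status': 'ok', 'url': url}

def get_services():
    # Swap implementations based on the environment the job runs in.
    if os.environ.get('APP_ENV') == 'test':
        return {'http.client': StubHttpClient()}
    import requests  # the real client, only needed outside of tests
    return {'http.client': requests}
```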
Starting with bonobo 0.5, you can use service injections with function-based
transformations too, using the :func:`bonobo.config.requires` decorator to mark a dependency.
.. code-block:: python
from bonobo.config import requires
@requires('http.client')
def http_get(http):
resp = http.get('https://jsonplaceholder.typicode.com/users')
for row in resp.json():
yield row
Read more
:::::::::
* :doc:`/guide/services`
* :doc:`/reference/api_config`
Moving forward
::::::::::::::
You now know:
* How to ...
**Next: :doc:`5-packaging`**
@ -0,0 +1,28 @@
Part 5: Projects and Packaging
==============================
Until now, we have worked with a single file managing a job.
Real life often involves more complicated setups, with relations and imports between different files.
This section will describe the options available to move this file into a package, either a new one or something
that already exists in your own project.
Data processing is something a wide variety of tools may want to include, and thus |bonobo| does not enforce any
kind of project structure, as the target structure will be dictated by the hosting project. For example, a `pipelines`
sub-package would perfectly fit a django or flask project, or even a regular package, but it's up to you to choose the
structure of your project.
This section is about sets of jobs working together within a project.
Let's see how to move from the current status to a package.
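One possible layout for such a package, shown purely as an illustration (these file and directory names are assumptions, since |bonobo| does not enforce any of them):

```
tutorial/
├── __init__.py
├── _services.py      # shared get_services() for the whole package
└── jobs/
    ├── __init__.py
    └── fablabs.py    # get_graph() and its transformations
```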
Moving forward
::::::::::::::
You now know:
* How to ...
docs/tutorial/django.rst Normal file
@ -0,0 +1,3 @@
Working with Django
===================
@ -1,9 +1,6 @@
First steps
===========
What is Bonobo?
:::::::::::::::
Bonobo is an ETL (Extract-Transform-Load) framework for python 3.5+. The goal is to define data-transformations, with
python code in charge of handling similarly shaped, independent lines of data.
@ -14,50 +11,45 @@ Bonobo is a lean manufacturing assembly line for data that let you focus on the
Bonobo uses simple python and should be quick and easy to learn.
Tutorial
::::::::
.. note::
Good documentation is not easy to write. We do our best to make it better and better.
Although all content here should be accurate, you may feel a lack of completeness, for which we plead guilty and
apologize.
If you're stuck, please come and ask on our `slack channel <https://bonobo-slack.herokuapp.com/>`_, we'll figure
something out.
If you're not stuck but had trouble understanding something, please consider contributing to the docs (via GitHub
pull requests).
**Tutorials**
.. toctree::
:maxdepth: 2
:maxdepth: 1
tut01
tut02
tut03
tut04
1-init
2-jobs
3-files
4-services
5-packaging
What's next?
::::::::::::
**Integrations**
Read a few examples
-------------------
.. toctree::
:maxdepth: 1
* :doc:`../reference/examples`
django
notebooks
sqlalchemy
Read about best development practices
-------------------------------------
**What's next?**
* :doc:`../guide/index`
* :doc:`../guide/purity`
Once you're familiar with all the base concepts, you can...
Read about integrating external tools with bonobo
-------------------------------------------------
* Read the :doc:`Guides </guide/index>` to have a deep dive in each concept.
* Explore the :doc:`Extensions </extension/index>` to widen the possibilities.
* Open the :doc:`References </reference/index>` and start hacking like crazy.
* :doc:`../extension/docker`: run transformation graphs in isolated containers.
* :doc:`../extension/jupyter`: run transformations within jupyter notebooks.
* :doc:`../extension/selenium`: crawl the web using a real browser and work with the gathered data.
* :doc:`../extension/sqlalchemy`: everything you need to interract with SQL databases.
**You're not alone!**
Good documentation is not easy to write.
Although all content here should be accurate, you may feel a lack of completeness, for which we plead guilty and
apologize.
If you're stuck, please come to the `Bonobo Slack Channel <https://bonobo-slack.herokuapp.com/>`_ and we'll figure it
out.
If you're not stuck but had trouble understanding something, please consider contributing to the docs (using GitHub
pull requests).
@ -0,0 +1,4 @@
Working with Jupyter Notebooks
==============================
@ -1,11 +0,0 @@
Just enough Python for Bonobo
=============================
.. todo::
This is a work in progress and it is not yet available. Please come back later or even better, help us write this
guide!
This guide is intended to help programmers or enthusiasts grasp the python basics necessary to use Bonobo. It
should definitely not be considered a general python introduction, nor a deep dive into details.
@ -0,0 +1,4 @@
Working with SQL Databases
==========================
@ -1,8 +1,7 @@
Let's get started!
==================
To begin with Bonobo, you need to install it in a working python 3.5+ environment, and you'll also need cookiecutter
to bootstrap your project.
To get started with Bonobo, you need to install it in a working python 3.5+ environment:
.. code-block:: shell-session
@ -14,21 +13,24 @@ See :doc:`/install` for more options.
Create an empty project
:::::::::::::::::::::::
Your ETL code will live in ETL projects, which are basically a bunch of files, including python code, that bonobo
can run.
Your ETL code will live in standard python files and packages.
.. code-block:: shell-session
$ bonobo init tutorial
$ bonobo create tutorial.py
This will create a `tutorial` directory (`content description here <https://www.bonobo-project.org/with/cookiecutter>`_).
This will create a simple example job in a `tutorial.py` file.
To run this project, use:
Now, try to execute it:
.. code-block:: shell-session
$ bonobo run tutorial
$ python tutorial.py
Congratulations, you just ran your first ETL job!
.. todo:: XXX **CHANGES NEEDED BELOW THIS POINTS BEFORE 0.6** XXX
Write a first transformation
::::::::::::::::::::::::::::
@ -105,6 +107,9 @@ To do this, it needs to know what data-flow you want to achieve, and you'll use
The `if __name__ == '__main__':` section is not required, unless you want to run it directly using the python
interpreter.
The name of the `graph` variable is arbitrary, but this variable must be global and available unconditionally.
Do not put it in its own function or in the `if __name__ == '__main__':` section.
Execute the job
:::::::::::::::
@ -128,9 +133,9 @@ Rewrite it using builtins
There is a much simpler way to describe an equivalent graph:
.. literalinclude:: ../../bonobo/examples/tutorials/tut01e02.py
:language: python
:language: python
The `extract()` generator has been replaced by a list, as Bonobo will interpret non-callable iterables as a no-input
The `extract()` generator has been replaced by a list, as Bonobo will interpret non-callable iterables as a no-input
generator.
This example is also available in :mod:`bonobo.examples.tutorials.tut01e02`, and you can also run it as a module:
@ -59,13 +59,7 @@ available in **Bonobo**'s repository:
.. code-block:: shell-session
$ curl https://raw.githubusercontent.com/python-bonobo/bonobo/master/bonobo/examples/datasets/coffeeshops.txt > `python3 -c 'import bonobo; print(bonobo.get_examples_path("datasets/coffeeshops.txt"))'`
.. note::
The "example dataset download" step will be easier in the future.
https://github.com/python-bonobo/bonobo/issues/134
$ bonobo download examples/datasets/coffeeshops.txt
.. literalinclude:: ../../bonobo/examples/tutorials/tut02e01_read.py
:language: python
@ -30,7 +30,7 @@ Configurables allows to use the following features:
class PrefixIt(Configurable):
prefix = Option(str, positional=True, default='>>>')
def call(self, row):
def __call__(self, row):
return self.prefix + ' ' + row
prefixer = PrefixIt('$')
@ -48,7 +48,7 @@ Configurables allows to use the following features:
url = Option(default='https://jsonplaceholder.typicode.com/users')
http = Service('http.client')
def call(self, http):
def __call__(self, http):
resp = http.get(self.url)
for row in resp.json():
@ -68,7 +68,7 @@ Configurables allows to use the following features:
class Applier(Configurable):
apply = Method()
def call(self, row):
def __call__(self, row):
return self.apply(row)
@Applier
@ -114,7 +114,7 @@ Let's see how to use it, starting from the previous service example:
url = Option(default='https://jsonplaceholder.typicode.com/users')
http = Service('http.client')
def call(self, http):
def __call__(self, http):
resp = http.get(self.url)
for row in resp.json():