Major update to documentation, removing deprecated docs and adding the new syntax to graph building options.

This commit is contained in:
Romain Dorgueil
2019-06-01 14:08:25 +02:00
parent c998708923
commit e84440df8c
23 changed files with 434 additions and 883 deletions

@ -1,9 +0,0 @@
.. warning::

    This tutorial was written for |bonobo| 0.5, while the current stable version is |bonobo| 0.6.

    Please be aware that some things changed.
    A summary of changes is available in the `migration guide from 0.5 to 0.6 <https://news.bonobo-project.org/migration-guide-for-bonobo-0-6-alpha-c1d36b0a9d35>`_.

@ -1,65 +0,0 @@
First steps
===========
.. include:: _outdated_note.rst
What is Bonobo?
:::::::::::::::
Bonobo is an ETL (Extract-Transform-Load) framework for Python 3.5+. The goal is to define data transformations, with
Python code in charge of handling independent, similarly shaped lines of data.
Bonobo *is not* a statistical or data-science tool. If you're looking for a data-analysis tool in Python, use Pandas.
Bonobo is a lean manufacturing assembly line for data that lets you focus on the actual work instead of the plumbing
(execution contexts, parallelism, error handling, console output, logging, ...).
Bonobo uses plain Python and should be quick and easy to learn.
Tutorial
::::::::
.. note::

    Good documentation is not easy to write. We do our best to keep improving it.

    Although all content here should be accurate, you may find it incomplete, for which we plead guilty and
    apologize.

    If you're stuck, please come and ask on our `slack channel <https://bonobo-slack.herokuapp.com/>`_; we'll figure
    something out.

    If you're not stuck but had trouble understanding something, please consider contributing to the docs (via GitHub
    pull requests).

.. toctree::
    :maxdepth: 2

    tut01
    tut02
    tut03
    tut04

What's next?
::::::::::::
Read a few examples
-------------------
* :doc:`/reference/examples`
Read about best development practices
-------------------------------------
* :doc:`/guide/index`
* :doc:`/guide/purity`
Read about integrating external tools with bonobo
-------------------------------------------------
* :doc:`/extension/docker`: run transformation graphs in isolated containers.
* :doc:`/extension/jupyter`: run transformations within jupyter notebooks.
* :doc:`/extension/selenium`: crawl the web using a real browser and work with the gathered data.
* :doc:`/extension/sqlalchemy`: everything you need to interact with SQL databases.

@ -1,13 +0,0 @@
Just enough Python for Bonobo
=============================
.. include:: _outdated_note.rst

.. todo::

    This is a work in progress and is not available yet. Please come back later or, even better, help us write this
    guide!

This guide is intended to help programmers and enthusiasts grasp the Python basics necessary to use Bonobo. It
should definitely not be considered a general Python introduction, nor a deep dive into details.

@ -1,202 +0,0 @@
Let's get started!
==================
.. include:: _outdated_note.rst
To begin with Bonobo, you need to install it in a working python 3.5+ environment, and you'll also need cookiecutter
to bootstrap your project.

.. code-block:: shell-session

    $ pip install bonobo cookiecutter

See :doc:`/install` for more options.
Create an empty project
:::::::::::::::::::::::
Your ETL code will live in ETL projects, which are basically a bunch of files, including python code, that bonobo
can run.

.. code-block:: shell-session

    $ bonobo init tutorial

This will create a `tutorial` directory (`content description here <https://www.bonobo-project.org/with/cookiecutter>`_).
To run this project, use:

.. code-block:: shell-session

    $ bonobo run tutorial

Write a first transformation
::::::::::::::::::::::::::::
Open `tutorial/main.py`, and delete all the code in it.
A transformation can be anything Python can call. The simplest transformations are functions and generators.
Let's write one:

.. code-block:: python

    def transform(x):
        return x.upper()

Easy.

.. note::

    This function is very similar to :func:`str.upper`, which you can use directly.

Let's write two more transformations for the "extract" and "load" steps. In this example, we'll generate the data from
scratch, and we'll use stdout to "simulate" data-persistence.

.. code-block:: python

    def extract():
        yield 'foo'
        yield 'bar'
        yield 'baz'


    def load(x):
        print(x)

Bonobo makes no distinction between generators (yielding functions) and regular functions. In all cases, it will
iterate over what is returned, and a regular function will simply be treated as a generator that yields only once.
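
To make this concrete, here is a minimal sketch in plain Python. It does not use Bonobo and is not how Bonobo is implemented internally. A hypothetical `iter_outputs()` helper calls a transformation and always returns an iterator, whether the callable returned a single value or a generator:

.. code-block:: python

    import types


    def iter_outputs(transformation, *args):
        """Hypothetical helper: call a transformation and normalize its result to an iterator."""
        result = transformation(*args)
        if isinstance(result, types.GeneratorType):
            # A generator: each yielded value is one output.
            return result
        # A regular return value is treated like a generator that yields once.
        return iter([result])


    def extract():
        yield 'foo'
        yield 'bar'
        yield 'baz'


    def transform(x):
        return x.upper()


    outputs = [y for x in iter_outputs(extract) for y in iter_outputs(transform, x)]
    print(outputs)  # ['FOO', 'BAR', 'BAZ']

Chaining the helper reproduces the `extract` / `transform` flow above, without any threads.
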

.. note::

    Once again, you should use the builtin :func:`print` directly instead of this `load()` function.

Create a transformation graph
:::::::::::::::::::::::::::::
Among other features, Bonobo will mainly help you with the following:

* Execute the transformations in independent threads.
* Pass the outputs of one thread to the inputs of other threads.

To do this, it needs to know what data-flow you want to achieve, and you'll use a :class:`bonobo.Graph` to describe it.

.. code-block:: python

    import bonobo

    graph = bonobo.Graph(extract, transform, load)

    if __name__ == '__main__':
        bonobo.run(graph)

.. graphviz::

    digraph {
        rankdir = LR;
        stylesheet = "../_static/graphs.css";

        BEGIN [shape="point"];
        BEGIN -> "extract" -> "transform" -> "load";
    }

.. note::

    The `if __name__ == '__main__':` section is not required, unless you want to run it directly using the Python
    interpreter.

Execute the job
:::::::::::::::
Save `tutorial/main.py` and execute your transformation again:

.. code-block:: shell-session

    $ bonobo run tutorial

This example is available in :mod:`bonobo.examples.tutorials.tut01e01`, and you can also run it as a module:

.. code-block:: shell-session

    $ bonobo run -m bonobo.examples.tutorials.tut01e01

Rewrite it using builtins
:::::::::::::::::::::::::
There is a much simpler way to describe an equivalent graph:

.. literalinclude:: ../../bonobo/examples/tutorials/tut01e02.py
    :language: python

The `extract()` generator has been replaced by a list, as Bonobo will interpret non-callable iterables as a no-input
generator.
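
Here is a rough sketch of that rule. It uses a hypothetical `as_node()` helper written in plain Python, not Bonobo's API. Callables are kept as they are, and any other iterable is wrapped in a generator that takes no input and yields its items:

.. code-block:: python

    def as_node(obj):
        """Hypothetical helper: turn a graph element into a callable node."""
        if callable(obj):
            return obj

        def source():
            # A non-callable iterable becomes a generator that takes no input.
            yield from obj

        return source


    extract = as_node(['foo', 'bar', 'baz'])
    transform = as_node(str.upper)

    results = [transform(row) for row in extract()]
    print(results)  # ['FOO', 'BAR', 'BAZ']
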
This example is also available in :mod:`bonobo.examples.tutorials.tut01e02`, and you can also run it as a module:

.. code-block:: shell-session

    $ bonobo run -m bonobo.examples.tutorials.tut01e02

You can now jump to the next part (:doc:`tut02`), or read a small summary of concepts and definitions introduced here
below.
Takeaways
:::::::::
① The :class:`bonobo.Graph` class is used to represent a data-processing pipeline.
It can represent simple list-like linear graphs, like here, but it can also represent much more complex graphs, with
forks and joins.
This is what the graph we defined looks like:

.. graphviz::

    digraph {
        rankdir = LR;
        BEGIN [shape="point"];
        BEGIN -> "iter(['foo', 'bar', 'baz'])" -> "str.upper" -> "print";
    }

② `Transformations` are simple Python callables. Whatever can be called can be used as a `transformation`. Callables can
either `return` or `yield` data to send it to the next step. Regular functions (using `return`) should be preferred if
each call is guaranteed to return exactly one result, while generators (using `yield`) should be preferred if the
number of output lines for a given input varies.
③ The `Graph` instance, or `transformation graph`, is executed using an `ExecutionStrategy`. You won't use it directly,
but :func:`bonobo.run` creates an instance of :class:`bonobo.ThreadPoolExecutorStrategy` under the hood (the default
strategy). The actual behavior of an execution depends on the chosen strategy, but the default should be fine for most
cases.
④ Before actually executing the `transformations`, the `ExecutionStrategy` instance wraps each component in an
`execution context`, whose responsibility is to hold the state of the transformation. This lets you keep
`transformations` stateless, while still allowing you to add external state if required. We'll expand on this later.
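
As an illustration only, here is a tiny hypothetical `ExecutionContextSketch` class. It shows the idea of keeping state in a wrapper instead of in the transformation itself. Bonobo's real execution contexts do much more (queues, lifecycle, error handling); this sketch only shows where the state lives:

.. code-block:: python

    class ExecutionContextSketch:
        """Hypothetical wrapper holding per-node state outside the transformation."""

        def __init__(self, transformation):
            self.transformation = transformation
            self.processed = 0  # the state lives here, not in the transformation

        def send(self, row):
            self.processed += 1
            return self.transformation(row)


    def transform(x):
        # Stateless: the output depends only on the input.
        return x.upper()


    context = ExecutionContextSketch(transform)
    outputs = [context.send(row) for row in ['foo', 'bar', 'baz']]
    print(outputs, context.processed)  # ['FOO', 'BAR', 'BAZ'] 3
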
Concepts and definitions
::::::::::::::::::::::::
* **Transformation**: a callable that takes input (as call parameters) and returns output(s), either as its return value
  or by yielding values (a.k.a. returning a generator).
* **Transformation graph (or Graph)**: a set of transformations tied together in a :class:`bonobo.Graph` instance, which
  is a directed acyclic graph (or DAG).
* **Node**: a graph element, usually a transformation in a graph.
* **Execution strategy (or strategy)**: a way to run a transformation graph. Its responsibility is mainly to parallelize
  (or not) the transformations, on one or more processes and/or computers, and to set up the right queuing mechanism
  for transformations' inputs and outputs.
* **Execution context (or context)**: a wrapper around a node that holds its state. If the node needs state, there
  are tools available in Bonobo to feed it to the transformation using additional call parameters, keeping
  transformations stateless.
Next
::::
Time to jump to the second part: :doc:`tut02`.

@ -1,123 +0,0 @@
Working with files
==================
.. include:: _outdated_note.rst
Bonobo would be pointless if the aim was just to uppercase small lists of strings.
In fact, Bonobo should not be used if you don't expect any gain from parallelization/distribution of tasks.
Some background...
::::::::::::::::::
Let's take the following graph:

.. graphviz::

    digraph {
        rankdir = LR;
        BEGIN [shape="point"];
        BEGIN -> "A" -> "B" -> "C";
        "B" -> "D";
    }

When run, the execution strategy wraps every component in a thread (assuming you're using the default
:class:`bonobo.strategies.ThreadPoolExecutorStrategy`).
Bonobo will send each line of data to the input node's thread (here, `A`). Each time `A` *yields* or *returns*
something, that value is pushed onto `B`'s input :class:`queue.Queue` and is consumed by `B`'s thread. Meanwhile,
`A` keeps running until it's done.
When more than one node is linked as the output of a node (for example, with `B`, `C`, and `D`), the same thing
happens, except that each result coming out of `B` will be sent to both `C`'s and `D`'s input :class:`queue.Queue`.
One thing to keep in mind here is that as the objects are passed from thread to thread, you need to write "pure"
transformations (see :doc:`/guide/purity`).
You generally don't have to think about it. Just be aware that your nodes will run in parallel, so you don't need to
worry too much about nodes running blocking operations: as soon as a line of output is ready, the next nodes will
start consuming it.
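
The mechanism described above can be approximated with the standard library alone. The following sketch uses plain Python threads and :class:`queue.Queue`. It is not Bonobo's actual strategy code, and the `END` sentinel and helper names are made up for the example. It wires `A -> B -> C` and `B -> D`, so each result of `B` is pushed to both `C`'s and `D`'s queues:

.. code-block:: python

    import queue
    import threading

    END = object()  # hypothetical sentinel marking the end of the stream


    def run_source(rows, outboxes):
        """Node A: push every row to the downstream queues."""
        for row in rows:
            for outbox in outboxes:
                outbox.put(row)
        for outbox in outboxes:
            outbox.put(END)


    def run_node(func, inbox, outboxes):
        """Node B: transform each row, then push the result to every output queue."""
        while True:
            row = inbox.get()
            if row is END:
                for outbox in outboxes:
                    outbox.put(END)
                return
            result = func(row)
            for outbox in outboxes:
                outbox.put(result)


    def run_sink(inbox, collected):
        """Nodes C and D: consume rows until the end of the stream."""
        while True:
            row = inbox.get()
            if row is END:
                return
            collected.append(row)


    b_inbox, c_inbox, d_inbox = queue.Queue(), queue.Queue(), queue.Queue()
    c_rows, d_rows = [], []

    threads = [
        threading.Thread(target=run_source, args=(['foo', 'bar', 'baz'], [b_inbox])),  # A
        threading.Thread(target=run_node, args=(str.upper, b_inbox, [c_inbox, d_inbox])),  # B
        threading.Thread(target=run_sink, args=(c_inbox, c_rows)),  # C
        threading.Thread(target=run_sink, args=(d_inbox, d_rows)),  # D
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    print(c_rows, d_rows)  # ['FOO', 'BAR', 'BAZ'] ['FOO', 'BAR', 'BAZ']

Note that `C` and `D` receive the *same* objects. With immutable strings this is harmless. If one node mutated a shared object, however, the other would see the change, which is why pure transformations matter.
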
That being said, let's manipulate some files.
Reading a file
::::::::::::::
There are a few component builders available in **Bonobo** that let you read from (or write to) files.
All readers work the same way. They need a filesystem to work with, and open a "path" they will read from.
* :class:`bonobo.CsvReader`
* :class:`bonobo.FileReader`
* :class:`bonobo.JsonReader`
* :class:`bonobo.PickleReader`
We'll use a text file that was generated using Bonobo from the "liste-des-cafes-a-un-euro" dataset made available by
Mairie de Paris under the Open Database License (ODbL). You can `explore the original dataset
<https://opendata.paris.fr/explore/dataset/liste-des-cafes-a-un-euro/information/>`_.
You'll need the `"coffeeshops.txt" example dataset <https://github.com/python-bonobo/bonobo/blob/master/bonobo/examples/datasets/coffeeshops.txt>`_,
available in **Bonobo**'s repository:

.. code-block:: shell-session

    $ curl https://raw.githubusercontent.com/python-bonobo/bonobo/master/bonobo/examples/datasets/coffeeshops.txt > `python3 -c 'import bonobo; print(bonobo.get_examples_path("datasets/coffeeshops.txt"))'`

.. note::

    The "example dataset download" step will be easier in the future.

    https://github.com/python-bonobo/bonobo/issues/134

.. literalinclude:: ../../bonobo/examples/tutorials/tut02e01_read.py
    :language: python

You can also run this example as a module (but you'll still need the dataset...):

.. code-block:: shell-session

    $ bonobo run -m bonobo.examples.tutorials.tut02e01_read

.. note::

    Don't focus too much on the `get_services()` function for now. It is required, with this exact name, but we'll get
    into that in a few minutes.

Writing to files
::::::::::::::::
Let's split each line of this file on the first comma, and store a JSON file mapping coffee shop names to their
addresses.
As with readers, the following classes are available to write files:

* :class:`bonobo.CsvWriter`
* :class:`bonobo.FileWriter`
* :class:`bonobo.JsonWriter`
* :class:`bonobo.PickleWriter`
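
Before looking at the actual example file, here is a sketch of the splitting step using :meth:`str.partition`. The function name `split_on_first_comma` and the input line are made up for illustration, and the bundled example may implement this step differently:

.. code-block:: python

    def split_on_first_comma(line):
        """Hypothetical transformation: map a coffee shop name to its address."""
        name, _, address = line.partition(',')
        return {name.strip(): address.strip()}


    print(split_on_first_comma('Cafe A, 1 Example Street, Paris'))
    # {'Cafe A': '1 Example Street, Paris'}

Only the first comma splits the line, so commas inside the address are kept.
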
Let's write a first implementation:

.. literalinclude:: ../../bonobo/examples/tutorials/tut02e02_write.py
    :language: python

(run it with :code:`bonobo run -m bonobo.examples.tutorials.tut02e02_write` or :code:`bonobo run myfile.py`)
If you read the output file, you'll see that the "map" part of the problem is missing.
Let's extend :class:`bonobo.io.JsonWriter` to finish the job:

.. literalinclude:: ../../bonobo/examples/tutorials/tut02e03_writeasmap.py
    :language: python

(run it with :code:`bonobo run -m bonobo.examples.tutorials.tut02e03_writeasmap` or :code:`bonobo run myfile.py`)
It should produce a nice map.
We favored a somewhat hackish solution here, instead of constructing a map in Python and then passing the whole thing
to :func:`json.dumps`, because we want to work with streams. If you have to construct the whole data structure in
Python, you'll lose a lot of Bonobo's benefits.
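
Here is a minimal sketch of that streaming approach, using only :mod:`json` and an in-memory buffer. The `StreamingJsonMapWriter` class is hypothetical and is not the implementation of :class:`bonobo.JsonWriter`. It writes each key/value pair as soon as the pair arrives, so the full map never has to be held in memory:

.. code-block:: python

    import io
    import json


    class StreamingJsonMapWriter:
        """Hypothetical sketch: write {key: value} pairs as one JSON object, row by row."""

        def __init__(self, fp):
            self.fp = fp
            self.first = True
            self.fp.write('{')

        def write(self, row):
            # Each row is a dict, e.g. {'name': 'address'}.
            for key, value in row.items():
                self.fp.write('' if self.first else ',')
                self.fp.write('\n' + json.dumps(key) + ': ' + json.dumps(value))
                self.first = False

        def close(self):
            self.fp.write('\n}\n')


    buffer = io.StringIO()
    writer = StreamingJsonMapWriter(buffer)
    for row in [{'Cafe A': '1 Example Street'}, {'Cafe B': '2 Example Street'}]:
        writer.write(row)
    writer.close()

    print(json.loads(buffer.getvalue()))  # {'Cafe A': '1 Example Street', 'Cafe B': '2 Example Street'}
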
Next
::::
Time to write some more advanced transformations, with service dependencies: :doc:`tut03`.

@ -1,202 +0,0 @@
Configurables and Services
==========================
.. include:: _outdated_note.rst

.. note::

    This section lacks completeness, sorry for that (but you can still read it!).

In the last section, we used a few new tools.
Class-based transformations and configurables
:::::::::::::::::::::::::::::::::::::::::::::
Bonobo is a bit dumb. If something is callable, it considers it can be used as a transformation, and it's up to the
user to provide callables that logically fit in a graph.
You can use plain python objects with a `__call__()` method, and it will just work.
As a lot of transformations need common machinery, there are a few tools to quickly build transformations, most of
them requiring your class to subclass :class:`bonobo.config.Configurable`.
Configurables allow you to use the following features:

* You can add **Options** (using the :class:`bonobo.config.Option` descriptor). Options can be positional, or keyword
  based, can have a default value and will be consumed from the constructor arguments.

  .. code-block:: python

      from bonobo.config import Configurable, Option

      class PrefixIt(Configurable):
          prefix = Option(str, positional=True, default='>>>')

          def call(self, row):
              return self.prefix + ' ' + row

      prefixer = PrefixIt('$')

* You can add **Services** (using the :class:`bonobo.config.Service` descriptor). Services are a subclass of
  :class:`bonobo.config.Option`, sharing the same basics, but specialized in the definition of "named services" that
  will be resolved at runtime (a.k.a. for which we will provide an implementation at runtime). We'll dive more into
  that in the next section.

  .. code-block:: python

      from bonobo.config import Configurable, Option, Service

      class HttpGet(Configurable):
          url = Option(default='https://jsonplaceholder.typicode.com/users')
          http = Service('http.client')

          def call(self, http):
              resp = http.get(self.url)

              for row in resp.json():
                  yield row

      http_get = HttpGet()

* You can add **Methods** (using the :class:`bonobo.config.Method` descriptor). :class:`bonobo.config.Method` is a
  subclass of :class:`bonobo.config.Option` that allows you to pass callable parameters, either to the class
  constructor, or by using the class as a decorator.

  .. code-block:: python

      from bonobo.config import Configurable, Method

      class Applier(Configurable):
          apply = Method()

          def call(self, row):
              return self.apply(row)

      @Applier
      def Prefixer(self, row):
          return 'Hello, ' + row

      prefixer = Prefixer()

* You can add **ContextProcessors**, which are an advanced feature we won't introduce here. If you're familiar with
  pytest, you can think of them as pytest fixtures, execution-wise.
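
To illustrate the pytest-fixture analogy, here is a plain-Python sketch of a generator-based setup/teardown processor. It is *not* Bonobo's ContextProcessor API, and `open_counter` and `run_with_processor` are hypothetical names. Code before `yield` runs when the node starts, and code after it runs once the node is done:

.. code-block:: python

    def open_counter(context):
        """Hypothetical processor: setup before `yield`, teardown after it."""
        state = {'rows': 0}  # setup: create the state the node will use
        yield state  # hand the state over while the node runs
        # teardown: runs once the node is done
        context['log'].append('processed {} rows'.format(state['rows']))


    def run_with_processor(processor, rows):
        """Hypothetical runner driving the processor around a node's lifetime."""
        context = {'log': []}
        steps = processor(context)
        state = next(steps)  # run the setup part and receive the state
        for _ in rows:
            state['rows'] += 1
        next(steps, None)  # resume the generator so the teardown part runs
        return context['log']


    print(run_with_processor(open_counter, ['foo', 'bar']))  # ['processed 2 rows']
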
Services
::::::::
The motivation behind services is mostly separation of concerns, testability and deployability.
Usually, your transformations will depend on services (like a filesystem, an http client, a database, a rest api, ...).
Those services can very well be hardcoded in the transformations, but there are two main drawbacks:
* You won't be able to change the implementation depending on the current environment (development laptop versus
production servers, bug-hunting session versus execution, etc.)
* You won't be able to test your transformations without testing the associated services.
To overcome those caveats of hardcoding things, we define Services in the configurable, which are basically
string-options of the service names, and we provide an implementation at the last moment possible.
There are two ways of providing implementations:
* File-wide, by providing a `get_services()` function that returns a dict of named implementations (we did so
  with filesystems in the previous step, :doc:`tut02`).
* Directory-wide, by providing a `get_services()` function in a specially named `_services.py` file.
The first is simpler if you only have one transformation graph in one file; the second allows you to group coherent
transformations together in a directory and share the implementations.
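
The idea behind service resolution can be sketched without Bonobo. A node only declares the *names* of the services it needs, and a runner looks them up in the dict returned by `get_services()`. In this sketch, `HttpGetSketch`, its `requires` attribute and `call_with_services()` are hypothetical, and `FakeHttp` stands in for a real HTTP client:

.. code-block:: python

    class FakeResponse:
        def json(self):
            return [{'id': 1}, {'id': 2}]


    class FakeHttp:
        def get(self, url):
            return FakeResponse()


    def get_services():
        # Named implementations, provided at the last possible moment.
        return {'http.client': FakeHttp()}


    class HttpGetSketch:
        # Hypothetical: the node only declares the *name* of the service it needs.
        requires = ('http.client',)
        url = 'https://jsonplaceholder.typicode.com/users'

        def __call__(self, http):
            for row in http.get(self.url).json():
                yield row


    def call_with_services(node, services):
        """Resolve the node's declared service names and pass the implementations."""
        args = [services[name] for name in node.requires]
        return list(node(*args))


    result = call_with_services(HttpGetSketch(), get_services())
    print(result)  # [{'id': 1}, {'id': 2}]
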
Let's see how to use it, starting from the previous service example:

.. code-block:: python

    from bonobo.config import Configurable, Option, Service

    class HttpGet(Configurable):
        url = Option(default='https://jsonplaceholder.typicode.com/users')
        http = Service('http.client')

        def call(self, http):
            resp = http.get(self.url)

            for row in resp.json():
                yield row

We defined an "http.client" service, that obviously should have a `get()` method, returning responses that have a
`json()` method.
Let's provide two implementations for that. The first one will use `requests <http://docs.python-requests.org/>`_,
which coincidentally satisfies the described interface:

.. code-block:: python

    import bonobo
    import requests

    def get_services():
        return {
            'http.client': requests
        }

    graph = bonobo.Graph(
        HttpGet(),
        print,
    )

If you run this code, you should see some mock data returned by the webservice we called (assuming it's up and you can
reach it).
Now, the second implementation will replace that with a mock, used for testing purposes:

.. code-block:: python

    import bonobo

    class HttpResponseStub:
        def json(self):
            return [
                {'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}},
                {'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': {'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}, 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}},
            ]

    class HttpStub:
        def get(self, url):
            return HttpResponseStub()

    def get_services():
        return {
            'http.client': HttpStub()
        }

    graph = bonobo.Graph(
        HttpGet(),
        print,
    )

Since the `Graph` definition stays exactly the same, you can easily substitute the `_services.py` file depending on
your environment (how you do this is outside of Bonobo's scope and heavily depends on your usual way of managing
configuration files on different platforms).
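
If you prefer a single `_services.py` file over swapping files, one possible way to pick an implementation per environment is to branch on an environment variable. This is a hypothetical approach: the variable name `APP_ENV` is an assumption and is not something Bonobo reads. `requests` is only imported when the real client is needed:

.. code-block:: python

    import os


    class HttpResponseStub:
        def json(self):
            return [{'id': 1, 'name': 'Leanne Graham'}]


    class HttpStub:
        def get(self, url):
            return HttpResponseStub()


    def get_services(environ=None):
        """Hypothetical: choose the 'http.client' implementation from an environment variable."""
        environ = os.environ if environ is None else environ
        if environ.get('APP_ENV') == 'test':
            return {'http.client': HttpStub()}
        import requests  # only needed for the real implementation
        return {'http.client': requests}


    services = get_services({'APP_ENV': 'test'})
    print(services['http.client'].get('any url').json())  # [{'id': 1, 'name': 'Leanne Graham'}]
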
Starting with bonobo 0.5 (not yet released), you will be able to use service injections with function-based
transformations too, using the `bonobo.config.requires` decorator to mark a dependency.

.. code-block:: python

    from bonobo.config import requires

    @requires('http.client')
    def http_get(http):
        resp = http.get('https://jsonplaceholder.typicode.com/users')

        for row in resp.json():
            yield row

Read more
:::::::::
* :doc:`/guide/services`
* :doc:`/reference/api_config`
Next
::::
:doc:`tut04`.

@ -1,216 +0,0 @@
Working with databases
======================
.. include:: _outdated_note.rst
Databases (and especially SQL databases here) are not the focus of Bonobo, thus support for them is not (and will
never be) included in the main package. Instead, working with databases is done using third-party, well-maintained and
specialized packages, like SQLAlchemy, or other database access libraries from the Python cheese shop.

.. note::

    The SQLAlchemy extension is not complete yet. Things may not be optimal, and some APIs will change. You can still
    try it, of course.

    Consider the following document as a "preview" (yes, it should work; yes, it may break in the future).

    Also, note that for early development stages, we explicitly support only PostgreSQL, although it may work well
    with `any other database supported by SQLAlchemy <http://docs.sqlalchemy.org/en/latest/core/engines.html#supported-databases>`_.

First, read https://www.bonobo-project.org/with/sqlalchemy for instructions on how to install. You **do need** the
bleeding edge version of `bonobo` and `bonobo-sqlalchemy` to make this work.
Requirements
::::::::::::
Once you installed `bonobo_sqlalchemy` (read https://www.bonobo-project.org/with/sqlalchemy to use bleeding edge
version), install the following additional packages:

.. code-block:: shell-session

    $ pip install -U python-dotenv psycopg2 awesome-slugify

Those packages are not required by the extension, but `python-dotenv` will help us configure the database DSN, and
`psycopg2` is required by SQLAlchemy to connect to PostgreSQL databases. Also, we'll use a slugifier to create unique
identifiers for the database (maybe not what you'd do in the real world, but sufficient for example purposes).
Configure a database engine
:::::::::::::::::::::::::::
Open your `_services.py` file and replace the code:

.. code-block:: python

    import bonobo, dotenv, logging, os

    from bonobo_sqlalchemy.util import create_postgresql_engine

    dotenv.load_dotenv(dotenv.find_dotenv())
    logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

    def get_services():
        return {
            'fs': bonobo.open_examples_fs('datasets'),
            'fs.output': bonobo.open_fs(),
            'sqlalchemy.engine': create_postgresql_engine(**{
                'name': 'tutorial',
                'user': 'tutorial',
                'pass': 'tutorial',
            })
        }

`create_postgresql_engine` is a tiny function that builds the DSN from reasonable defaults, which you can override
either by providing kwargs or with system environment variables. If you want to override something, open the `.env`
file and add values for one or more of `POSTGRES_NAME`, `POSTGRES_USER`, `POSTGRES_PASS`, `POSTGRES_HOST`,
`POSTGRES_PORT`. Please note that kwargs always take precedence over the environment, but that you should prefer
environment variables for anything that may differ from one platform to another.
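
The precedence rule (kwargs first, then environment variables, then defaults) can be sketched as follows. `build_postgresql_dsn` is a hypothetical helper, not the actual `create_postgresql_engine` implementation. Its default values and its `password` keyword are assumptions:

.. code-block:: python

    import os
    from urllib.parse import quote

    # Assumed defaults; the real create_postgresql_engine may use different ones.
    DEFAULTS = {'name': 'postgres', 'user': 'postgres', 'password': '', 'host': 'localhost', 'port': '5432'}
    ENV_VARS = {'name': 'POSTGRES_NAME', 'user': 'POSTGRES_USER', 'password': 'POSTGRES_PASS',
                'host': 'POSTGRES_HOST', 'port': 'POSTGRES_PORT'}


    def build_postgresql_dsn(environ=None, **kwargs):
        """Hypothetical helper: kwargs win over environment variables, which win over defaults."""
        environ = os.environ if environ is None else environ
        settings = {}
        for key, default in DEFAULTS.items():
            if key in kwargs:
                settings[key] = kwargs[key]
            else:
                settings[key] = environ.get(ENV_VARS[key], default)
        return 'postgresql://{}:{}@{}:{}/{}'.format(
            quote(settings['user'], safe=''),
            quote(settings['password'], safe=''),
            settings['host'],
            settings['port'],
            settings['name'],
        )


    dsn = build_postgresql_dsn(
        environ={'POSTGRES_HOST': 'db.example', 'POSTGRES_NAME': 'from_env'},
        name='tutorial', user='tutorial', password='tutorial',
    )
    print(dsn)  # postgresql://tutorial:tutorial@db.example:5432/tutorial

Here `name` comes from the kwargs even though `POSTGRES_NAME` is set, `host` comes from the environment, and `port` falls back to the assumed default.
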
Add database operation to the graph
:::::::::::::::::::::::::::::::::::
Let's create a `tutorial/pgdb.py` job:

.. code-block:: python

    import bonobo
    import bonobo_sqlalchemy

    from bonobo.examples.tutorials.tut02e03_writeasmap import graph, split_one_to_map

    graph = graph.copy()
    graph.add_chain(
        bonobo_sqlalchemy.InsertOrUpdate('coffeeshops'),
        _input=split_one_to_map
    )

Notes here:

* We use the code from :doc:`tut02`, which is bundled with Bonobo in the `bonobo.examples.tutorials` package.
* We "fork" the graph, by creating a copy and appending a new "chain", starting at a point that exists in the other
  graph.
* We use :class:`bonobo_sqlalchemy.InsertOrUpdate`, whose role, in case it is not obvious, is to create database rows
  if they do not exist yet, or to update the existing row, based on a "discriminant" criterion (by default, "id").
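
To illustrate the insert-or-update behavior, here is a sketch that uses the standard library's :mod:`sqlite3` instead of SQLAlchemy and PostgreSQL, so it runs without a database server. `insert_or_update()` is a hypothetical helper, not `bonobo_sqlalchemy`'s implementation. It first tries an `UPDATE` on the discriminant column and falls back to an `INSERT` when no row matched:

.. code-block:: python

    import sqlite3


    def insert_or_update(conn, table, row, discriminant='id'):
        """Hypothetical helper: update the row matching `discriminant`, or insert it."""
        columns = [column for column in row if column != discriminant]
        assignments = ', '.join('{} = ?'.format(column) for column in columns)
        cursor = conn.execute(
            'UPDATE {} SET {} WHERE {} = ?'.format(table, assignments, discriminant),
            [row[column] for column in columns] + [row[discriminant]],
        )
        if cursor.rowcount == 0:
            # No existing row matched the discriminant: create it.
            conn.execute(
                'INSERT INTO {} ({}) VALUES ({})'.format(table, ', '.join(row), ', '.join('?' for _ in row)),
                list(row.values()),
            )


    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE coffeeshops (id TEXT PRIMARY KEY, name TEXT, address TEXT)')
    for row in [
        {'id': 'cafe-a', 'name': 'Cafe A', 'address': 'old address'},
        {'id': 'cafe-a', 'name': 'Cafe A', 'address': 'new address'},
    ]:
        insert_or_update(conn, 'coffeeshops', row)

    print(conn.execute('SELECT id, address FROM coffeeshops').fetchall())  # [('cafe-a', 'new address')]

Running it twice with the same `id` leaves a single, updated row. Table and column names are interpolated directly into the SQL, so only use trusted identifiers.
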
If we run this transformation (with `bonobo run tutorial/pgdb.py`), we should get an error:

.. code-block:: text

    | File ".../lib/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
    | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
    | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: database "tutorial" does not exist
    |
    |
    | The above exception was the direct cause of the following exception:
    |
    | Traceback (most recent call last):
    | File ".../bonobo-devkit/bonobo/bonobo/strategies/executor.py", line 45, in _runner
    | node_context.start()
    | File ".../bonobo-devkit/bonobo/bonobo/execution/base.py", line 75, in start
    | self._stack.setup(self)
    | File ".../bonobo-devkit/bonobo/bonobo/config/processors.py", line 94, in setup
    | _append_to_context = next(_processed)
    | File ".../bonobo-devkit/bonobo-sqlalchemy/bonobo_sqlalchemy/writers.py", line 43, in create_connection
    | raise UnrecoverableError('Could not create SQLAlchemy connection: {}.'.format(str(exc).replace('\n', ''))) from exc
    | bonobo.errors.UnrecoverableError: Could not create SQLAlchemy connection: (psycopg2.OperationalError) FATAL: database "tutorial" does not exist.

The database we requested does not exist. It is not Bonobo's role to do database administration, so there is
no tool here to create either the database or the tables we want to use.
Create database and table
:::::::::::::::::::::::::
There are however tools in `sqlalchemy` to manage tables, so we'll create the database by ourselves, and ask sqlalchemy
to create the table:

.. code-block:: shell-session

    $ psql -U postgres -h localhost
    psql (9.6.1, server 9.6.3)
    Type "help" for help.

    postgres=# CREATE ROLE tutorial WITH LOGIN PASSWORD 'tutorial';
    CREATE ROLE
    postgres=# CREATE DATABASE tutorial WITH OWNER=tutorial TEMPLATE=template0 ENCODING='utf-8';
    CREATE DATABASE

Now, let's use a little trick and add this section to `pgdb.py`:

.. code-block:: python

    import sys
    from sqlalchemy import Table, Column, String, Integer, MetaData

    def main():
        from bonobo.commands.run import get_default_services
        services = get_default_services(__file__)

        if len(sys.argv) == 1:
            return bonobo.run(graph, services=services)
        elif len(sys.argv) == 2 and sys.argv[1] == 'reset':
            engine = services.get('sqlalchemy.engine')
            metadata = MetaData()

            coffee_table = Table(
                'coffeeshops',
                metadata,
                Column('id', String(255), primary_key=True),
                Column('name', String(255)),
                Column('address', String(255)),
            )

            metadata.drop_all(engine)
            metadata.create_all(engine)
        else:
            raise NotImplementedError('I do not understand.')

    if __name__ == '__main__':
        main()

.. note::

    We're using a private API of Bonobo here, which is unsatisfactory, discouraged, and may change. Some way to get the
    service dictionary will be added to the public API in a future release of Bonobo.

Now run:

.. code-block:: shell-session

    $ python tutorial/pgdb.py reset

Database and table should now exist.
Format the data
:::::::::::::::
Let's prepare our data for the database, and change the `.add_chain(..)` call to do it prior to `InsertOrUpdate(...)`:

.. code-block:: python

    from slugify import slugify_url

    def format_for_db(row):
        name, address = list(row.items())[0]
        return {
            'id': slugify_url(name),
            'name': name,
            'address': address,
        }

    # ...

    graph = graph.copy()
    graph.add_chain(
        format_for_db,
        bonobo_sqlalchemy.InsertOrUpdate('coffeeshops'),
        _input=split_one_to_map
    )

Run!
::::
You can now run the script (either with `bonobo run tutorial/pgdb.py` or directly with the python interpreter, as we
added a "main" section) and the dataset should be inserted in your database. If you run it again, no new rows are
created.
Note that as we forked the graph from :doc:`tut02`, the transformation also writes the data to `coffeeshops.json`, as
before.