Working on the new version of the tutorial. Only Step1 implemented.

2017-11-05 19:41:27 +01:00
parent eb393331cd
commit 8f3c4252b4
13 changed files with 586 additions and 43 deletions
--- a/bonobo/_api.py
+++ b/bonobo/_api.py
@ -10,16 +10,33 @@ __all__ = []


 def register_api(x, __all__=__all__):
+    """Register a function as being part of Bonobo's API, then return the original function."""
    __all__.append(get_name(x))
    return x


+def register_graph_api(x, __all__=__all__):
+    """
+    Register a function as being part of Bonobo's API, after checking that its signature contains the right parameters
+    to work correctly, then return the original function.
+    """
+    from inspect import signature
+    parameters = list(signature(x).parameters)
+    required_parameters = {'plugins', 'services', 'strategy'}
+    assert parameters[0] == 'graph', 'First parameter of a graph api function must be "graph".'
+    assert required_parameters.intersection(
+        parameters) == required_parameters, 'Graph api functions must define the following parameters: ' + ', '.join(
+        sorted(required_parameters))
+
+    return register_api(x, __all__=__all__)
+
+
 def register_api_group(*args):
    for attr in args:
        register_api(attr)


-@register_api
+@register_graph_api
 def run(graph, *, plugins=None, services=None, strategy=None):
    """
    Main entry point of bonobo. It takes a graph and creates all the necessary plumbery around to execute it.
@ -82,8 +99,8 @@ def _inspect_as_graph(graph):
 _inspect_formats = {'graph': _inspect_as_graph}


-@register_api
-def inspect(graph, *, format):
+@register_graph_api
+def inspect(graph, *, plugins=None, services=None, strategy=None, format):
    if not format in _inspect_formats:
        raise NotImplementedError(
            'Output format {} not implemented. Choices are: {}.'.format(
--- a/docs/_static/custom.css
+++ b/docs/_static/custom.css
@ -1,3 +1,19 @@
 svg {
    border: 2px solid green
-}
+}
+
+div.related {
+    width: 940px;
+    margin: 30px auto 0 auto;
+}
+
+@media screen and (max-width: 875px) {
+    div.related {
+        visibility: hidden;
+        display: none;
+    }
+}
+
+.brand {
+    font-family: 'Ubuntu', 'goudy old style', 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro', serif;
+}
--- a/docs/_templates/base.html
+++ b/docs/_templates/base.html
@ -4,17 +4,8 @@
 {%- block extrahead %}
 {{ super() }}
 <style>
-    div.related {
-        width: 940px;
-        margin: 30px auto 0 auto;
-    }
-    @media screen and (max-width: 875px) {
-        div.related {
-            visibility: hidden;
-            display: none;
-        }
-    }
 </style>
+<link href="https://fonts.googleapis.com/css?family=Ubuntu" rel="stylesheet">
 {% endblock %}

 {%- block footer %}
--- a/docs/conf.py
+++ b/docs/conf.py
@ -186,3 +186,12 @@ epub_exclude_files = ['search.html']

 # Example configuration for intersphinx: refer to the Python standard library.
 intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+rst_epilog = """
+.. |bonobo| replace:: **Bonobo**
+   
+.. |longversion| replace:: v.{version}
+
+""".format(
+    version=version,
+)
--- a/docs/tutorial/1-init.rst
+++ b/docs/tutorial/1-init.rst
@ -0,0 +1,258 @@
+Part 1: Let's get started!
+==========================
+
+To get started with |bonobo|, you need to install it in a working python 3.5+ environment (you should use a
+`virtualenv <https://virtualenv.pypa.io/>`_).
+
+.. code-block:: shell-session
+
+    $ pip install bonobo
+
+Check that the installation worked, and that you're using a version that matches this tutorial (written for bonobo
+|longversion|).
+
+.. code-block:: shell-session
+
+    $ bonobo version
+
+See :doc:`/install` for more options.
+
+
+Create an ETL job
+:::::::::::::::::
+
+Since Bonobo 0.6, it's easy to bootstrap a simple ETL job using just one file.
+
+We'll start here, and the later stages of the tutorial will guide you toward refactoring this to a python package.
+
+.. code-block:: shell-session
+
+    $ bonobo init tutorial.py
+
+This will create a simple job in a `tutorial.py` file. Let's run it:
+
+.. code-block:: shell-session
+
+    $ python tutorial.py
+    Hello
+    World
+     - extract in=1 out=2 [done]
+     - transform in=2 out=2 [done]
+     - load in=2 [done]
+
+If you have a similar result, then congratulations! You just ran your first |bonobo| ETL job.
+
+
+Inspect your graph
+::::::::::::::::::
+
+The basic building blocks of |bonobo| are **transformations** and **graphs**.
+
+**Transformations** are simple python callables (like functions) that handle a transformation step for a line of data.
+
+**Graphs** are a set of transformations, with directional links between them to define the data-flow that will happen
+at runtime.
+
+To inspect the graph of your first transformation (you must install graphviz first to do so), run:
+
+.. code-block:: shell-session
+
+    $ bonobo inspect --graph tutorial.py | dot -Tpng -o tutorial.png
+
+Open the generated `tutorial.png` file to have a quick look at the graph.
+
+.. graphviz::
+
+    digraph {
+      rankdir = LR;
+      "BEGIN" [shape="point"];
+      "BEGIN" -> {0 [label="extract"]};
+      {0 [label="extract"]} -> {1 [label="transform"]};
+      {1 [label="transform"]} -> {2 [label="load"]};
+    }
+
+This makes the structure of your graph easy to see. For such a simple graph it is not very useful, but it will help as
+you write more complex transformations.
+
+
+Read the Code
+:::::::::::::
+
+Before we write our own job, let's look at the code we have in `tutorial.py`.
+
+
+Import
+------
+
+.. code-block:: python
+
+    import bonobo
+
+
+The highest level APIs of |bonobo| are all contained within the top level **bonobo** namespace.
+
+If you're a beginner with the library, stick to using only those APIs (they also are the most stable APIs).
+
+If you're an advanced user (and you'll be one quite soon), you can safely use second level APIs.
+
+The third level APIs are considered private, and you should not use them unless you're hacking on |bonobo| directly.
+
+
+Extract
+-------
+
+.. code-block:: python
+
+    def extract():
+        yield 'hello'
+        yield 'world'
+
+This is the first transformation, written as a python generator, that will send some strings, one after the other, to its
+output.
+
+Transformations that take no input and yield a variable number of outputs are usually called **extractors**. You'll
+encounter a few different types, either purely generating the data (like here), using an external service (a
+database, for example) or using some filesystem (which is considered an external service too).
+
+Extractors do not need to have their input connected to anything, and will be called exactly once when the graph is
+executed.
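
To illustrate an extractor that reads from a source instead of yielding literals, here is a minimal sketch in plain Python. The `extract_cities` name and the in-memory CSV text are hypothetical stand-ins for a real file or database; nothing in it is part of Bonobo's API.

```python
import csv
import io

# Hypothetical sample data standing in for a file on disk.
SAMPLE_CSV = "name,country\nParis,FR\nBerlin,DE\n"


def extract_cities():
    """Take no input and yield one dict per CSV row."""
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for row in reader:
        yield dict(row)


rows = list(extract_cities())
```

Because it is a generator that takes no arguments, a function like this could presumably be used as the first node of a chain in the same way as `extract` above.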
+
+
+Transform
+---------
+
+.. code-block:: python
+
+    def transform(*args):
+        yield tuple(
+            map(str.title, args)
+        )
+
+This is the second transformation. It will be called once for each input row it gets, and applies some logic to the
+input to generate the output.
+
+This is the most **generic** case. For each input row, you can generate zero, one or many lines of output.
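
The "zero, one or many" behaviour can be sketched with a plain generator. The `split_words` function below is a hypothetical example, not taken from the generated `tutorial.py`.

```python
def split_words(line):
    """Yield zero, one or many output rows for a single input row."""
    for word in line.split():
        yield word


empty = list(split_words(''))                 # no output row at all
single = list(split_words('hello'))           # exactly one output row
many = list(split_words('hello big world'))   # several output rows
```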
+
+
+Load
+----
+
+.. code-block:: python
+
+    def load(*args):
+        print(*args)
+
+This is the third and last transformation in our "hello world" example. It will apply some logic to each row, and have
+absolutely no output.
+
+Transformations that take input and yield nothing are also called **loaders**. Like extractors, you'll encounter
+different types, to work with various external systems.
+
+Please note that as a convenience, and because the cost is marginal, most builtin `loaders` will send their
+inputs to their output, so you can easily chain more than one loader, or apply more transformations after a given
+loader was applied.
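
The pass-through idea can be sketched in plain Python. This is only an illustration under assumptions: the `store` and `shout` functions and the in-memory `stored` list are hypothetical, and Bonobo's builtin loaders may implement pass-through with a different mechanism.

```python
stored = []  # hypothetical in-memory "external system"


def store(*args):
    """Record the row, then hand it on unchanged."""
    stored.append(args)
    yield args


def shout(row):
    """A further step applied after the loader."""
    return tuple(value.upper() for value in row)


# Feed two rows through the loader, then through the next step by hand.
results = [shout(out) for row in [('hello',), ('world',)] for out in store(*row)]
```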
+
+
+Graph Factory
+-------------
+
+.. code-block:: python
+
+    def get_graph(**options):
+        graph = bonobo.Graph()
+        graph.add_chain(extract, transform, load)
+        return graph
+
+All our transformations were defined above, but nothing ties them together, for now.
+
+This "graph factory" function is in charge of the creation and configuration of a :class:`bonobo.Graph` instance, that
+will be executed later.
+
+By no means is |bonobo| limited to simple graphs like this one. You can add as many chains as you want, and each chain
+can contain as many nodes as you want.
+
+
+Services Factory
+----------------
+
+.. code-block:: python
+
+    def get_services(**options):
+        return {}
+
+This is the "services factory", that we'll use later to connect to external systems. Let's skip this one, for now.
+
+(we'll dive into this topic in :doc:`4-services`)
+
+
+Main Block
+----------
+
+.. code-block:: python
+
+    if __name__ == '__main__':
+        parser = bonobo.get_argument_parser()
+        with bonobo.parse_args(parser) as options:
+            bonobo.run(
+                get_graph(**options),
+                services=get_services(**options)
+            )
+
+Here, the real thing happens.
+
+Without diving into too many details for now, using the :func:`bonobo.parse_args` context manager will allow our job to
+be configurable later, and although we don't really need it right now, it does no harm either.
+
+Reading the output
+::::::::::::::::::
+
+Let's run this job once again:
+
+.. code-block:: shell-session
+
+    $ python tutorial.py
+    Hello
+    World
+     - extract in=1 out=2 [done]
+     - transform in=2 out=2 [done]
+     - load in=2 [done]
+
+The console output contains two things.
+
+* First, it contains the real output of your job (what was :func:`print`-ed to `sys.stdout`).
+* Second, it displays the execution status (on `sys.stderr`). Each line contains a "status" character, the node name,
+  numbers and a human readable status. This status will evolve in real time, and lets you follow a job's progress
+  while it's running.
+
+  * Status character:
+
+    * “ ” means that the node was not yet started.
+    * “`-`” means that the node finished its execution.
+    * “`+`” means that the node is currently running.
+    * “`!`” means that the node had problems running.
+
+  * Numerical statistics:
+
+    * “`in=...`” shows the input lines count, also known as the number of calls to your transformation.
+    * “`out=...`” shows the output lines count.
+    * “`read=...`” shows the count of reads applied to an external system, if the transformation supports it.
+    * “`write=...`” shows the count of writes applied to an external system, if the transformation supports it.
+    * “`err=...`” shows the count of exceptions that happened while running the transformation. Note that an exception will abort
+      a call, but the execution will move to the next row.
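
To make the anatomy of a status line concrete, here is a small sketch that splits one into its parts. The layout is inferred only from the sample output above, so the regular expression is an assumption; `parse_status_line` is a hypothetical helper, not a Bonobo function, and real terminal output may carry extra formatting.

```python
import re

# Assumed layout, based on the sample output above:
# "<status char> <node name> <key>=<number> ... [<status>]"
STATUS_LINE = re.compile(
    r'^\s*(?P<flag>[-+!])?\s*(?P<name>\S+)\s+(?P<stats>.*?)\s*\[(?P<status>[^\]]+)\]\s*$'
)


def parse_status_line(line):
    """Return the parts of a status line as a dict, or None if it does not match."""
    match = STATUS_LINE.match(line)
    if match is None:
        return None
    stats = {key: int(value) for key, value in re.findall(r'(\w+)=(\d+)', match.group('stats'))}
    return {
        'flag': match.group('flag') or ' ',  # a blank flag means "not yet started"
        'name': match.group('name'),
        'stats': stats,
        'status': match.group('status'),
    }


parsed = parse_status_line(' - transform in=2 out=2 [done]')
```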
+
+
+Moving forward
+::::::::::::::
+
+That's all for this first step.
+
+You now know:
+
+* How to create a new job file.
+* How to inspect the content of a job file.
+* What should go in a job file.
+* How to execute a job file.
+* How to read the console output.
+
+**Next:** :doc:`2-jobs`
--- a/docs/tutorial/2-jobs.rst
+++ b/docs/tutorial/2-jobs.rst
@ -0,0 +1,12 @@
+Part 2: Writing ETL Jobs
+========================
+
+
+Moving forward
+::::::::::::::
+
+You now know:
+
+* How to ...
+
+**Next:** :doc:`3-files`
--- a/docs/tutorial/3-files.rst
+++ b/docs/tutorial/3-files.rst
@ -0,0 +1,12 @@
+Part 3: Working with Files
+==========================
+
+
+Moving forward
+::::::::::::::
+
+You now know:
+
+* How to ...
+
+**Next:** :doc:`4-services`
--- a/docs/tutorial/4-services.rst
+++ b/docs/tutorial/4-services.rst
@ -0,0 +1,210 @@
+Part 4: Services and Configurables
+==================================
+
+.. note::
+
+    This section lacks completeness, sorry for that (but you can still read it!).
+
+In the last section, we used a few new tools.
+
+Class-based transformations and configurables
+:::::::::::::::::::::::::::::::::::::::::::::
+
+Bonobo is a bit dumb. If something is callable, it considers it can be used as a transformation, and it's up to the
+user to provide callables that logically fit in a graph.
+
+You can use plain python objects with a `__call__()` method, and it will just work.
+
+As a lot of transformations need common machinery, there are a few tools to quickly build transformations, most of
+them requiring your class to subclass :class:`bonobo.config.Configurable`.
+
+Configurables let you use the following features:
+
+* You can add **Options** (using the :class:`bonobo.config.Option` descriptor). Options can be positional, or keyword
+  based, can have a default value and will be consumed from the constructor arguments.
+
+    .. code-block:: python
+
+        from bonobo.config import Configurable, Option
+
+        class PrefixIt(Configurable):
+            prefix = Option(str, positional=True, default='>>>')
+
+            def call(self, row):
+                return self.prefix + ' ' + row
+
+        prefixer = PrefixIt('$')
+
+* You can add **Services** (using the :class:`bonobo.config.Service` descriptor). Services are a subclass of
+  :class:`bonobo.config.Option`, sharing the same basics, but specialized in the definition of "named services" that
+  will be resolved at runtime (that is, for which we will provide an implementation at runtime). We'll dive more into
+  that in the next section.
+
+    .. code-block:: python
+
+        from bonobo.config import Configurable, Option, Service
+
+        class HttpGet(Configurable):
+            url = Option(default='https://jsonplaceholder.typicode.com/users')
+            http = Service('http.client')
+
+            def call(self, http):
+                resp = http.get(self.url)
+
+                for row in resp.json():
+                    yield row
+
+        http_get = HttpGet()
+
+
+* You can add **Methods** (using the :class:`bonobo.config.Method` descriptor). :class:`bonobo.config.Method` is a
+  subclass of :class:`bonobo.config.Option` that lets you pass callable parameters, either to the class constructor,
+  or using the class as a decorator.
+
+    .. code-block:: python
+
+        from bonobo.config import Configurable, Method
+
+        class Applier(Configurable):
+            apply = Method()
+
+            def call(self, row):
+                return self.apply(row)
+
+        @Applier
+        def Prefixer(self, row):
+            return 'Hello, ' + row
+
+        prefixer = Prefixer()
+
+* You can add **ContextProcessors**, which are an advanced feature we won't introduce here. If you're familiar with
+  pytest, you can think of them as pytest fixtures, execution wise.
+
+Services
+::::::::
+
+The motivation behind services is mostly separation of concerns, testability and deployability.
+
+Usually, your transformations will depend on services (like a filesystem, an http client, a database, a rest api, ...).
+Those services can very well be hardcoded in the transformations, but this has two main drawbacks:
+
+* You won't be able to change the implementation depending on the current environment (development laptop versus
+  production servers, bug-hunting session versus execution, etc.)
+* You won't be able to test your transformations without testing the associated services.
+
+To overcome those caveats of hardcoding things, we define Services in the configurable, which are basically
+string-options of the service names, and we provide an implementation at the last moment possible.
+
+There are two ways of providing implementations:
+
+* Either file-wide, by providing a `get_services()` function that returns a dict of named implementations (we did so
+  with filesystems in the previous step, :doc:`tut02`)
+* Or directory-wide, by providing a `get_services()` function in a specially named `_services.py` file.
+
+The first is simpler if you only have one transformation graph in one file, the second lets you group coherent
+transformations together in a directory and share the implementations.
+
+Let's see how to use it, starting from the previous service example:
+
+.. code-block:: python
+
+    from bonobo.config import Configurable, Option, Service
+
+    class HttpGet(Configurable):
+        url = Option(default='https://jsonplaceholder.typicode.com/users')
+        http = Service('http.client')
+
+        def call(self, http):
+            resp = http.get(self.url)
+
+            for row in resp.json():
+                yield row
+
+We defined an "http.client" service, that obviously should have a `get()` method, returning responses that have a
+`json()` method.
+
+Let's provide two implementations for that. The first one will be using `requests <http://docs.python-requests.org/>`_,
+which coincidentally satisfies the described interface:
+
+.. code-block:: python
+
+    import bonobo
+    import requests
+
+    def get_services():
+        return {
+            'http.client': requests
+        }
+
+    graph = bonobo.Graph(
+        HttpGet(),
+        print,
+    )
+
+If you run this code, you should see some mock data returned by the webservice we called (assuming it's up and you can
+reach it).
+
+Now, the second implementation will replace that with a mock, used for testing purposes:
+
+.. code-block:: python
+
+    class HttpResponseStub:
+        def json(self):
+            return [
+                {'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}},
+                {'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': {'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}, 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}},
+            ]
+
+    class HttpStub:
+        def get(self, url):
+            return HttpResponseStub()
+
+    def get_services():
+        return {
+            'http.client': HttpStub()
+        }
+
+    graph = bonobo.Graph(
+        HttpGet(),
+        print,
+    )
+
+Since the `Graph` definition stays exactly the same, you can easily substitute the `_services.py` file depending on your
+environment (how you do this is outside the scope of bonobo and heavily depends on your usual way of managing
+configuration files on different platforms).
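
The testability argument can be sketched without Bonobo at all. In the plain-Python example below, the `http_get` function mirrors the body of `HttpGet.call` but receives the service as an explicit argument; the recording `HttpStub` and the shortened stub data are assumptions made for illustration.

```python
class HttpResponseStub:
    def json(self):
        # Shortened, hypothetical stand-in for the real payload.
        return [{'id': 1, 'username': 'Bret'}, {'id': 2, 'username': 'Antonette'}]


class HttpStub:
    def __init__(self):
        self.requested = []  # remember every URL that was asked for

    def get(self, url):
        self.requested.append(url)
        return HttpResponseStub()


def http_get(http, url='https://jsonplaceholder.typicode.com/users'):
    """Same logic as HttpGet.call, with the service passed in explicitly."""
    resp = http.get(url)
    for row in resp.json():
        yield row


stub = HttpStub()
usernames = [row['username'] for row in http_get(stub)]
```

No network access is needed here, which is the point of providing the implementation from outside the transformation.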
+
+Starting with bonobo 0.5 (not yet released), you will be able to use service injections with function-based
+transformations too, using the `bonobo.config.requires` decorator to mark a dependency.
+
+.. code-block:: python
+
+    from bonobo.config import requires
+
+    @requires('http.client')
+    def http_get(http):
+        resp = http.get('https://jsonplaceholder.typicode.com/users')
+
+        for row in resp.json():
+            yield row
+
+
+Read more
+:::::::::
+
+* :doc:`/guide/services`
+* :doc:`/reference/api_config`
+
+Next
+::::
+
+:doc:`tut04`.
+
+
+Moving forward
+::::::::::::::
+
+You now know:
+
+* How to ...
+
+**Next:** :doc:`5-packaging`
--- a/docs/tutorial/5-packaging.rst
+++ b/docs/tutorial/5-packaging.rst
@ -0,0 +1,11 @@
+Part 5: Projects and Packaging
+==============================
+
+
+Moving forward
+::::::::::::::
+
+You now know:
+
+* How to ...
+
--- a/docs/tutorial/django.rst
+++ b/docs/tutorial/django.rst
@ -0,0 +1,3 @@
+Working with Django
+===================
+
--- a/docs/tutorial/index.rst
+++ b/docs/tutorial/index.rst
@ -17,47 +17,43 @@ Bonobo uses simple python and should be quick and easy to learn.
 Tutorial
 ::::::::

-.. note::
+.. toctree::
+    :maxdepth: 1

-    Good documentation is not easy to write. We do our best to make it better and better.
+    1-init
+    2-jobs
+    3-files
+    4-services
+    5-packaging

-    Although all content here should be accurate, you may feel a lack of completeness, for which we plead guilty and
-    apologize.
-
-    If you're stuck, please come and ask on our `slack channel <https://bonobo-slack.herokuapp.com/>`_, we'll figure
-    something out.
-
-    If you're not stuck but had trouble understanding something, please consider contributing to the docs (via GitHub
-    pull requests).
+More
+::::

 .. toctree::
-    :maxdepth: 2
-
-    tut01
-    tut02
-    tut03
-    tut04
+    :maxdepth: 1

+    django
+    notebooks
+    sqlalchemy

 What's next?
 ::::::::::::

-Read a few examples
-------------------
+* :doc:`The Bonobo Guide <../guide/index>`
+* :doc:`Extensions <../extension/index>`

-* :doc:`../reference/examples`

-Read about best development practices
-------------------------------------
+We're there!
+::::::::::::

-* :doc:`../guide/index`
-* :doc:`../guide/purity`
+Good documentation is not easy to write.

-Read about integrating external tools with bonobo
-------------------------------------------------
+Although all content here should be accurate, you may feel a lack of completeness, for which we plead guilty and
+apologize.

-* :doc:`../extension/docker`: run transformation graphs in isolated containers.
-* :doc:`../extension/jupyter`: run transformations within jupyter notebooks.
-* :doc:`../extension/selenium`: crawl the web using a real browser and work with the gathered data.
-* :doc:`../extension/sqlalchemy`: everything you need to interract with SQL databases.
+If you're stuck, please come to the `Bonobo Slack Channel <https://bonobo-slack.herokuapp.com/>`_ and we'll figure it
+out.
+
+If you're not stuck but had trouble understanding something, please consider contributing to the docs (using GitHub
+pull requests).

--- a/docs/tutorial/notebooks.rst
+++ b/docs/tutorial/notebooks.rst
@ -0,0 +1,4 @@
+Working with Jupyter Notebooks
+==============================
+
+
--- a/docs/tutorial/sqlalchemy.rst
+++ b/docs/tutorial/sqlalchemy.rst
@ -0,0 +1,4 @@
+Working with SQL Databases
+==========================
+
+