[docs] rewriting the tutorial.

2018-01-14 15:26:04 +01:00
parent c311b05a42
commit 9af5d80171
4 changed files with 119 additions and 178 deletions
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -188,7 +188,11 @@ epub_copyright = copyright
 epub_exclude_files = ['search.html']

 # Example configuration for intersphinx: refer to the Python standard library.
-intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+intersphinx_mapping = {
+    'python': ('https://docs.python.org/3', None),
+    'fs': ('https://docs.pyfilesystem.org/en/latest/', None),
+    'requests': ('http://docs.python-requests.org/en/master/', None),
+}

 rst_epilog = """
 .. |bonobo| replace:: **Bonobo**
--- a/docs/tutorial/3-files.rst
+++ b/docs/tutorial/3-files.rst
@@ -1,8 +1,6 @@
 Part 3: Working with Files
 ==========================

-.. include:: _wip_note.rst
-
 Writing to the console is nice, but let's be serious, real world will require us to use files or external services.

 Let's see how to use a few builtin writers and both local and remote filesystems.
--- a/docs/tutorial/4-services.rst
+++ b/docs/tutorial/4-services.rst
@@ -1,201 +1,99 @@
-Part 4: Services and Configurables
-==================================
+Part 4: Services
+================

-.. include:: _wip_note.rst
+All external dependencies (like filesystems, network clients, database connections, etc.) should be provided to
+transformations as services. This allows great flexibility, including the ability to test your transformations in
+isolation from the external world, and it is friendly to the infrastructure guys (and if you're one of them, it's also
+nice to treat yourself well).

-In the last section, we used a few new tools.
-
-Class-based transformations and configurables
-:::::::::::::::::::::::::::::::::::::::::::::
-
-Bonobo is a bit dumb. If something is callable, it considers it can be used as a transformation, and it's up to the
-user to provide callables that logically fits in a graph.
-
-You can use plain python objects with a `__call__()` method, and it ill just work.
-
-As a lot of transformations needs common machinery, there is a few tools to quickly build transformations, most of
-them requiring your class to subclass :class:`bonobo.config.Configurable`.
-
-Configurables allows to use the following features:
-
-* You can add **Options** (using the :class:`bonobo.config.Option` descriptor). Options can be positional, or keyword
-  based, can have a default value and will be consumed from the constructor arguments.
-
-    .. code-block:: python
-
-        from bonobo.config import Configurable, Option
-
-        class PrefixIt(Configurable):
-            prefix = Option(str, positional=True, default='>>>')
-
-            def __call__(self, row):
-                return self.prefix + ' ' + row
-
-        prefixer = PrefixIt('$')
-
-* You can add **Services** (using the :class:`bonobo.config.Service` descriptor). Services are a subclass of
-  :class:`bonobo.config.Option`, sharing the same basics, but specialized in the definition of "named services" that
-  will be resolved at runtime (a.k.a for which we will provide an implementation at runtime). We'll dive more into that
-  in the next section
-
-    .. code-block:: python
-
-        from bonobo.config import Configurable, Option, Service
-
-        class HttpGet(Configurable):
-            url = Option(default='https://jsonplaceholder.typicode.com/users')
-            http = Service('http.client')
-
-            def __call__(self, http):
-                resp = http.get(self.url)
-
-                for row in resp.json():
-                    yield row
-
-        http_get = HttpGet()
+In the last section, we used the `fs` service to access filesystems. We'll now go further by switching our `requests`
+call to use the `http` service, so we can swap the `requests` session at runtime. We'll use it to add an HTTP cache,
+which is a great way to avoid hammering a remote API.


-* You can add **Methods** (using the :class:`bonobo.config.Method` descriptor). :class:`bonobo.config.Method` is a
-  subclass of :class:`bonobo.config.Option` that allows to pass callable parameters, either to the class constructor,
-  or using the class as a decorator.
+Default services
+::::::::::::::::

-    .. code-block:: python
+By default, |bonobo| provides only two services:

-        from bonobo.config import Configurable, Method
+* `fs`, a :obj:`fs.osfs.OSFS` object to access files.
+* `http`, a :obj:`requests.Session` object to access the Web.

-        class Applier(Configurable):
-            apply = Method()

-            def __call__(self, row):
-                return self.apply(row)
+Overriding services
+:::::::::::::::::::

-        @Applier
-        def Prefixer(self, row):
-            return 'Hello, ' + row
-
-        prefixer = Prefixer()
-
-* You can add **ContextProcessors**, which are an advanced feature we won't introduce here. If you're familiar with
-  pytest, you can think of them as pytest fixtures, execution wise.
-
-Services
-::::::::
-
-The motivation behind services is mostly separation of concerns, testability and deployability.
-
-Usually, your transformations will depend on services (like a filesystem, an http client, a database, a rest api, ...).
-Those services can very well be hardcoded in the transformations, but there is two main drawbacks:
-
-* You won't be able to change the implementation depending on the current environment (development laptop versus
-  production servers, bug-hunting session versus execution, etc.)
-* You won't be able to test your transformations without testing the associated services.
-
-To overcome those caveats of hardcoding things, we define Services in the configurable, which are basically
-string-options of the service names, and we provide an implementation at the last moment possible.
-
-There are two ways of providing implementations:
-
-* Either file-wide, by providing a `get_services()` function that returns a dict of named implementations (we did so
-  with filesystems in the previous step, :doc:`tut02`)
-* Either directory-wide, by providing a `get_services()` function in a specially named `_services.py` file.
-
-The first is simpler if you only have one transformation graph in one file, the second allows to group coherent
-transformations together in a directory and share the implementations.
-
-Let's see how to use it, starting from the previous service example:
+You can override the default services, or define your own services, by providing a dictionary to the `services=`
+argument of :obj:`bonobo.run`:

 .. code-block:: python

-    from bonobo.config import Configurable, Option, Service
-
-    class HttpGet(Configurable):
-        url = Option(default='https://jsonplaceholder.typicode.com/users')
-        http = Service('http.client')
-
-        def __call__(self, http):
-            resp = http.get(self.url)
-
-            for row in resp.json():
-                yield row
-
-We defined an "http.client" service, that obviously should have a `get()` method, returning responses that have a
-`json()` method.
-
-Let's provide two implementations for that. The first one will be using `requests <http://docs.python-requests.org/>`_,
-that coincidally satisfies the described interface:
-
-.. code-block:: python
-
-    import bonobo
    import requests

    def get_services():
+        http = requests.Session()
+        http.headers = {'User-Agent': 'Monkeys!'}
        return {
-            'http.client': requests
+            'http': http
        }

-    graph = bonobo.Graph(
-        HttpGet(),
-        print,
-    )
+Switching requests to use the service
+:::::::::::::::::::::::::::::::::::::

-If you run this code, you should see some mock data returned by the webservice we called (assuming it's up and you can
-reach it).
-
-Now, the second implementation will replace that with a mock, used for testing purposes:
+Let's replace the :obj:`requests.get` call we used in the first steps with a call to the `http` service:

 .. code-block:: python

-    class HttpResponseStub:
-        def json(self):
-            return [
-                {'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}},
-                {'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': {'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}, 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}},
-            ]
+    from bonobo.config import use

-    class HttpStub:
-        def get(self, url):
-            return HttpResponseStub()
+    @use('http')
+    def extract_fablabs(http):
+        yield from http.get(FABLABS_API_URL).json().get('records')

-    def get_services():
-        return {
-            'http.client': HttpStub()
-        }
+Tadaa, done! You're no longer tied to a specific implementation, but to whatever :obj:`requests`-compatible object the
+user wants to provide.

-    graph = bonobo.Graph(
-        HttpGet(),
-        print,
-    )
+Adding cache
+::::::::::::

-The `Graph` definition staying the exact same, you can easily substitute the `_services.py` file depending on your
-environment (the way you're doing this is out of bonobo scope and heavily depends on your usual way of managing
-configuration files on different platforms).
+Let's demonstrate the flexibility of this approach by adding some local cache for HTTP requests, to avoid hammering the
+API endpoint as we run our tests.

-Starting with bonobo 0.5 (not yet released), you will be able to use service injections with function-based
-transformations too, using the `bonobo.config.requires` decorator to mark a dependency.
+First, let's install `requests-cache`:
+
+.. code-block:: shell-session
+
+    $ pip install requests-cache
+
+Then, let's switch the implementation, conditionally.

 .. code-block:: python

-    from bonobo.config import requires
+    def get_services(use_cache=False):
+        if use_cache:
+            from requests_cache import CachedSession
+            http = CachedSession('http.cache')
+        else:
+            import requests
+            http = requests.Session()

-    @requires('http.client')
-    def http_get(http):
-        resp = http.get('https://jsonplaceholder.typicode.com/users')
+        return {
+            'http': http
+        }

-        for row in resp.json():
-            yield row
+Then in the main block, let's add support for a `--use-cache` argument:

+.. code-block:: python

-Read more
-:::::::::
+    if __name__ == '__main__':
+        parser = bonobo.get_argument_parser()
+        parser.add_argument('--use-cache', action='store_true', default=False)

-* :doc:`/guide/services`
-* :doc:`/reference/api_config`
+        with bonobo.parse_args(parser) as options:
+            bonobo.run(get_graph(**options), services=get_services(**options))

-Next
-::::
-
-:doc:`tut04`.
+And you're done! Now, you can turn the cache on or off using the `--use-cache` argument on the command line when
+running your job.


 Moving forward
@@ -203,6 +101,9 @@ Moving forward

 You now know:

-* How to ...
+* How to use builtin service implementations
+* How to override a service
+* How to define your own service
+* How to tune the default argument parser

 It's now time to jump to :doc:`5-packaging`.
--- a/docs/tutorial/5-packaging.rst
+++ b/docs/tutorial/5-packaging.rst
@@ -1,32 +1,67 @@
 Part 5: Projects and Packaging
 ==============================

-.. include:: _wip_note.rst
-
 Until then, we worked with one file managing a job.

 Real life often involves more complicated setups, with relations and imports between different files.

-This section will describe the options available to move this file into a package, either a new one or something
-that already exists in your own project.
-
 Data processing is something a wide variety of tools may want to include, and thus |bonobo| does not enforce any
-kind of project structure, as the targert structure will be dicated by the hosting project. For example, a `pipelines`
+kind of project structure, as the target structure will be dictated by the hosting project. For example, a `pipelines`
 sub-package would perfectly fit a django or flask project, or even a regular package, but it's up to you to chose the
 structure of your project.

- is about set of jobs working together within a project.

-Let's see how to move from the current status to a package.
+Imports mechanism
+:::::::::::::::::
+
+|bonobo| does not interfere with how the Python import mechanism works. In particular, it won't add anything to your
+`sys.path`, unlike some popular projects, because we're not sure that's something you want.
+
+If you want to use imports, you should move your script into a Python package, and it's up to you to set it up
+correctly.
+
+
+Moving into an existing project
+:::::::::::::::::::::::::::::::
+
+The first, and quite popular, option is to move your ETL job file into a package that already exists.
+
+For example, it can be your existing software, possibly using frameworks like django, flask, twisted, celery...
+You name it!
+
+We suggest (though nothing is compulsory) that you decide on a namespace that will hold all your ETL pipelines and move
+all your jobs into it. For example, it can be `mypkg.pipelines`.
+
+
+Creating a brand new package
+::::::::::::::::::::::::::::
+
+If you're starting a project with the data-engineering part, you may not have a Python package yet. As it can be a bit
+tedious to set up right, there is a helper, using `Medikit <http://medikit.rdc.li/en/latest/>`_, that you can use to
+create a brand new project:
+
+.. code-block:: shell-session
+
+    $ bonobo init --package pipelines
+
+Answer a few questions, and you should now have a `pipelines` package, with an example transformation in it.
+
+You can now follow the instructions on how to install it (`pip install --editable pipelines`), and the import mechanism
+will work "just right" in it.
+
+
+Common stuff
+::::::::::::
+
+You'll probably want to separate the `get_services()` factory from your pipelines, and just import it, as the
+dependencies may very well be project-wide.
+
+But hey, it's just python! You're at home, now!


 Moving forward
 ::::::::::::::

-You now know:
-
-* How to ...
-
 That's the end of the tutorial, you should now be familiar with all the basics.

 A few appendixes to the tutorial can explain how to integrate with other systems (we'll use the "fablabs" application
@@ -40,6 +75,9 @@ created in this tutorial and extend it):
 Then, you can either jump head-first into your code, or get a better grasp of all the concepts by
 :doc:`reading the full bonobo guide </guide/index>`.

+You should also `join the slack community <https://bonobo-slack.herokuapp.com/>`_ and ask all your questions there! No
+need to stay alone, and the only stupid question is the one nobody asks!
+
 Happy data flows!
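
Note on the Part 4 changes above: the new text says that services make it possible to test transformations in isolation from the external world. The following is a minimal sketch of that idea, not part of the commit. `StubHttp`, `StubResponse`, the sample records, and the placeholder value of `FABLABS_API_URL` are all hypothetical names introduced here for illustration. The `@use('http')` decorator is left out so the sketch runs without bonobo installed; the stub is simply passed as the `http` argument.

```python
# Hypothetical stand-ins for a requests session and response.
# They only implement the two methods extract_fablabs() relies on.
class StubResponse:
    def __init__(self, payload):
        self._payload = payload

    def json(self):
        # Return the canned payload instead of decoding a real HTTP body.
        return self._payload


class StubHttp:
    def __init__(self, payload):
        self._payload = payload
        self.requested_urls = []  # Records every URL passed to get().

    def get(self, url):
        self.requested_urls.append(url)
        return StubResponse(self._payload)


# Placeholder URL (assumption): the tutorial defines the real one elsewhere.
FABLABS_API_URL = 'https://example.invalid/fablabs'


def extract_fablabs(http):
    # Same body as in the diff above, minus the @use('http') decorator.
    yield from http.get(FABLABS_API_URL).json().get('records')


stub = StubHttp({'records': [{'name': 'Lab A'}, {'name': 'Lab B'}]})
rows = list(extract_fablabs(stub))
```

Because the transformation only depends on an object with a `get()` method whose result has a `json()` method, no network access is needed to exercise it.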