[docs] rewriting the tutorial.

2018-01-14 15:26:04 +01:00
parent c311b05a42
commit 9af5d80171
4 changed files with 119 additions and 178 deletions
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -188,7 +188,11 @@ epub_copyright = copyright
 epub_exclude_files = ['search.html']
 # Example configuration for intersphinx: refer to the Python standard library.
-intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+intersphinx_mapping = {
+    'python': ('https://docs.python.org/3', None),
+    'fs': ('https://docs.pyfilesystem.org/en/latest/', None),
+    'requests': ('http://docs.python-requests.org/en/master/', None),
+}
 rst_epilog = """
 .. |bonobo| replace:: **Bonobo**
--- a/docs/tutorial/3-files.rst
+++ b/docs/tutorial/3-files.rst
@@ -1,8 +1,6 @@
 Part 3: Working with Files
 ==========================
-.. include:: _wip_note.rst
 Writing to the console is nice, but let's be serious: the real world will require us to use files or external services.
 Let's see how to use a few builtin writers and both local and remote filesystems.
--- a/docs/tutorial/4-services.rst
+++ b/docs/tutorial/4-services.rst
@@ -1,201 +1,99 @@
-Part 4: Services and Configurables
+Part 4: Services
-==================================
+================
-.. include:: _wip_note.rst
+All external dependencies (like filesystems, network clients, database connections, etc.) should be provided to
 transformations as a service. It allows great flexibility, including the ability to test your transformations isolated
 from the external world, and being friendly to the infrastructure guys (and if you're one of them, it's also nice to
 treat yourself well).
-In the last section, we used a few new tools.
+In the last section, we used the `fs` service to access filesystems. We'll go even further by switching our `requests`
-
+call to use the `http` service, so we can switch the `requests` session at runtime. We'll use it to add an http cache,
-Class-based transformations and configurables
+which is a great thing to avoid hammering a remote API.
 :::::::::::::::::::::::::::::::::::::::::::::
 Bonobo is a bit dumb. If something is callable, it considers it can be used as a transformation, and it's up to the
 user to provide callables that logically fits in a graph.
 You can use plain python objects with a `__call__()` method, and it ill just work.
 As a lot of transformations needs common machinery, there is a few tools to quickly build transformations, most of
 them requiring your class to subclass :class:`bonobo.config.Configurable`.
 Configurables allows to use the following features:
 * You can add **Options** (using the :class:`bonobo.config.Option` descriptor). Options can be positional, or keyword
  based, can have a default value and will be consumed from the constructor arguments.
    .. code-block:: python
        from bonobo.config import Configurable, Option
        class PrefixIt(Configurable):
            prefix = Option(str, positional=True, default='>>>')
            def __call__(self, row):
                return self.prefix + ' ' + row
        prefixer = PrefixIt('$')
 * You can add **Services** (using the :class:`bonobo.config.Service` descriptor). Services are a subclass of
  :class:`bonobo.config.Option`, sharing the same basics, but specialized in the definition of "named services" that
  will be resolved at runtime (a.k.a for which we will provide an implementation at runtime). We'll dive more into that
  in the next section
    .. code-block:: python
        from bonobo.config import Configurable, Option, Service
        class HttpGet(Configurable):
            url = Option(default='https://jsonplaceholder.typicode.com/users')
            http = Service('http.client')
            def __call__(self, http):
                resp = http.get(self.url)
                for row in resp.json():
                    yield row
        http_get = HttpGet()
-* You can add **Methods** (using the :class:`bonobo.config.Method` descriptor). :class:`bonobo.config.Method` is a
+Default services
-  subclass of :class:`bonobo.config.Option` that allows to pass callable parameters, either to the class constructor,
+::::::::::::::::
  or using the class as a decorator.
-    .. code-block:: python
+By default, |bonobo| provides only two services:
-        from bonobo.config import Configurable, Method
+* `fs`, a :obj:`fs.osfs.OSFS` object to access files.
 * `http`, a :obj:`requests.Session` object to access the Web.
        class Applier(Configurable):
            apply = Method()
-            def __call__(self, row):
+Overriding services
-                return self.apply(row)
+:::::::::::::::::::
-        @Applier
+You can override the default services, or define your own services, by providing a dictionary to the `services=`
-        def Prefixer(self, row):
+argument of :obj:`bonobo.run`:
            return 'Hello, ' + row
        prefixer = Prefixer()
 * You can add **ContextProcessors**, which are an advanced feature we won't introduce here. If you're familiar with
  pytest, you can think of them as pytest fixtures, execution wise.
 Services
 ::::::::
 The motivation behind services is mostly separation of concerns, testability and deployability.
 Usually, your transformations will depend on services (like a filesystem, an http client, a database, a rest api, ...).
 Those services can very well be hardcoded in the transformations, but there is two main drawbacks:
 * You won't be able to change the implementation depending on the current environment (development laptop versus
  production servers, bug-hunting session versus execution, etc.)
 * You won't be able to test your transformations without testing the associated services.
 To overcome those caveats of hardcoding things, we define Services in the configurable, which are basically
 string-options of the service names, and we provide an implementation at the last moment possible.
 There are two ways of providing implementations:
 * Either file-wide, by providing a `get_services()` function that returns a dict of named implementations (we did so
  with filesystems in the previous step, :doc:`tut02`)
 * Either directory-wide, by providing a `get_services()` function in a specially named `_services.py` file.
 The first is simpler if you only have one transformation graph in one file, the second allows to group coherent
 transformations together in a directory and share the implementations.
 Let's see how to use it, starting from the previous service example:
 .. code-block:: python
    from bonobo.config import Configurable, Option, Service
    class HttpGet(Configurable):
        url = Option(default='https://jsonplaceholder.typicode.com/users')
        http = Service('http.client')
        def __call__(self, http):
            resp = http.get(self.url)
            for row in resp.json():
                yield row
 We defined an "http.client" service, that obviously should have a `get()` method, returning responses that have a
 `json()` method.
 Let's provide two implementations for that. The first one will be using `requests <http://docs.python-requests.org/>`_,
 that coincidally satisfies the described interface:
 .. code-block:: python
    import bonobo
    import requests
    def get_services():
        http = requests.Session()
        http.headers = {'User-Agent': 'Monkeys!'}
        return {
-            'http.client': requests
+            'http': http
        }
-    graph = bonobo.Graph(
+Switching requests to use the service
-        HttpGet(),
+:::::::::::::::::::::::::::::::::::::
        print,
    )
-If you run this code, you should see some mock data returned by the webservice we called (assuming it's up and you can
+Let's replace the :obj:`requests.get` call we used in the first steps to use the `http` service:
 reach it).
 Now, the second implementation will replace that with a mock, used for testing purposes:
 .. code-block:: python
-    class HttpResponseStub:
+    from bonobo.config import use
        def json(self):
            return [
                {'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}},
                {'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': {'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}, 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}},
            ]
-    class HttpStub:
+    @use('http')
-        def get(self, url):
+    def extract_fablabs(http):
-            return HttpResponseStub()
+        yield from http.get(FABLABS_API_URL).json().get('records')
-    def get_services():
+Tadaa, done! You're no longer tied to a specific implementation, but to whatever :obj:`requests` compatible object the
-        return {
+user wants to provide.
            'http.client': HttpStub()
        }
-    graph = bonobo.Graph(
+Adding cache
-        HttpGet(),
+::::::::::::
        print,
    )
-The `Graph` definition staying the exact same, you can easily substitute the `_services.py` file depending on your
+Let's demonstrate the flexibility of this approach by adding some local cache for HTTP requests, to avoid hammering the
-environment (the way you're doing this is out of bonobo scope and heavily depends on your usual way of managing
+API endpoint as we run our tests.
 configuration files on different platforms).
-Starting with bonobo 0.5 (not yet released), you will be able to use service injections with function-based
+First, let's install `requests-cache`:
-transformations too, using the `bonobo.config.requires` decorator to mark a dependency.
+
 .. code-block:: shell-session
    $ pip install requests-cache
 Then, let's switch the implementation, conditionally.
 .. code-block:: python
-    from bonobo.config import requires
+    def get_services(use_cache=False):
        if use_cache:
            from requests_cache import CachedSession
            http = CachedSession('http.cache')
        else:
            import requests
            http = requests.Session()
-    @requires('http.client')
+        return {
-    def http_get(http):
+            'http': http
-        resp = http.get('https://jsonplaceholder.typicode.com/users')
+        }
-        for row in resp.json():
+Then in the main block, let's add support for a `--use-cache` argument:
            yield row
 .. code-block:: python
-Read more
+    if __name__ == '__main__':
-:::::::::
+        parser = bonobo.get_argument_parser()
        parser.add_argument('--use-cache', action='store_true', default=False)
-* :doc:`/guide/services`
+        with bonobo.parse_args(parser) as options:
-* :doc:`/reference/api_config`
+            bonobo.run(get_graph(**options), services=get_services(**options))
-Next
+And you're done! Now, you can turn the cache on or off using the `--use-cache` argument on the command line when
-::::
+running your job.
 :doc:`tut04`.
 Moving forward
@@ -203,6 +101,9 @@ Moving forward
 You now know:
-* How to ...
+* How to use builtin service implementations
+* How to override a service
+* How to define your own service
+* How to tune the default argument parser
 It's now time to jump to :doc:`5-packaging`.
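To illustrate the testability claim made in Part 4, here is a minimal sketch of exercising a service-based transformation against a stand-in for the `http` service. It deliberately leaves out the `@use('http')` decorator so that it runs without bonobo installed. `FakeSession`, `FakeResponse`, the sample records and the URL value are hypothetical names made up for this example; they are not part of bonobo or requests.

```python
# Hypothetical placeholder URL; the tutorial defines its own FABLABS_API_URL.
FABLABS_API_URL = 'https://example.com/api/fablabs'


class FakeResponse:
    """Stand-in for a requests response (assumption: only .json() is needed)."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class FakeSession:
    """Stand-in for the `http` service; records URLs and never hits the network."""

    def __init__(self, payload):
        self.payload = payload
        self.requested = []

    def get(self, url):
        self.requested.append(url)
        return FakeResponse(self.payload)


def extract_fablabs(http):
    # Same body as the tutorial's transformation, minus the @use('http') decorator.
    yield from http.get(FABLABS_API_URL).json().get('records')


# Inject the fake service by hand, as a test would do.
fake_http = FakeSession({'records': [{'name': 'Lab A'}, {'name': 'Lab B'}]})
rows = list(extract_fablabs(fake_http))
```

Because the transformation only relies on `get()` and `json()`, any object offering those two methods can be injected, which is what makes the real session, a cached session and a test double interchangeable.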
--- a/docs/tutorial/5-packaging.rst
+++ b/docs/tutorial/5-packaging.rst
@@ -1,32 +1,67 @@
 Part 5: Projects and Packaging
 ==============================
 .. include:: _wip_note.rst
 Until now, we worked with one file managing a job.
 Real life often involves more complicated setups, with relations and imports between different files.
 This section will describe the options available to move this file into a package, either a new one or something
 that already exists in your own project.
 Data processing is something a wide variety of tools may want to include, and thus |bonobo| does not enforce any
-kind of project structure, as the targert structure will be dicated by the hosting project. For example, a `pipelines`
+kind of project structure, as the target structure will be dictated by the hosting project. For example, a `pipelines`
 sub-package would perfectly fit a django or flask project, or even a regular package, but it's up to you to choose the
 structure of your project.
 is about set of jobs working together within a project.
-Let's see how to move from the current status to a package.
+Imports mechanism
 :::::::::::::::::
 |bonobo| does not enforce anything on how the python import mechanism works. In particular, it won't add anything to your
 `sys.path`, unlike some popular projects, because we're not sure that's something you want.
 If you want to use imports, you should move your script into a python package, and it's up to you to set it up
 correctly.
 Moving into an existing project
 :::::::::::::::::::::::::::::::
 The first, and quite popular, option is to move your ETL job file into a package that already exists.
 For example, it can be your existing software, possibly using a framework like django, flask, twisted, celery...
 Name yours!
 We suggest, but nothing is compulsory, that you decide on a namespace that will hold all your ETL pipelines and move all
 your jobs in it. For example, it can be `mypkg.pipelines`.
 Creating a brand new package
 ::::::::::::::::::::::::::::
 If you're starting a project with the data-engineering part, you may not have a python package yet. As
 it can be a bit tedious to set up right, there is a helper, using `Medikit <http://medikit.rdc.li/en/latest/>`_, that
 you can use to create a brand new project:
 .. code-block:: shell-session
    $ bonobo init --package pipelines
 Answer a few questions, and you should now have a `pipelines` package, with an example transformation in it.
 You can now follow the instructions on how to install it (`pip install --editable pipelines`), and the import mechanism
 will work "just right" in it.
 Common stuff
 ::::::::::::
 Probably, you'll want to separate the `get_services()` factory from your pipelines, and just import it, as the
 dependencies may very well be project wide.
 But hey, it's just python! You're at home, now!
 Moving forward
 ::::::::::::::
 You now know:
 * How to ...
 That's the end of the tutorial, you should now be familiar with all the basics.
 A few appendixes to the tutorial can explain how to integrate with other systems (we'll use the "fablabs" application
@@ -40,6 +75,9 @@ created in this tutorial and extend it):
 Then, you can either jump head-first into your code, or get a better grasp of all the concepts by
 :doc:`reading the full bonobo guide </guide/index>`.
 You should also `join the slack community <https://bonobo-slack.herokuapp.com/>`_ and ask all your questions there! No
 need to stay alone, and the only stupid question is the one nobody asks!
 Happy data flows!
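As a rough sketch of the "Common stuff" advice in Part 5, the `get_services()` factory can be kept as a plain function that several pipelines import and feed with parsed command line options. The example below uses only the standard library `argparse` module instead of `bonobo.get_argument_parser()`, so it runs without bonobo. `PlainSession` and `CachingSession` are hypothetical stand-ins for `requests.Session` and `requests_cache.CachedSession`, and `parse_options` is a helper name invented for this example.

```python
import argparse


class PlainSession:
    """Hypothetical stand-in for requests.Session."""


class CachingSession(PlainSession):
    """Hypothetical stand-in for requests_cache.CachedSession."""

    def __init__(self, cache_name):
        self.cache_name = cache_name


def get_services(use_cache=False):
    # Pick the implementation as late as possible, as Part 4 does.
    http = CachingSession('http.cache') if use_cache else PlainSession()
    return {'http': http}


def parse_options(argv):
    # Plain argparse here; the tutorial builds its parser with bonobo.get_argument_parser().
    parser = argparse.ArgumentParser()
    parser.add_argument('--use-cache', action='store_true', default=False)
    # argparse exposes --use-cache as the `use_cache` key once converted to a dict.
    return vars(parser.parse_args(argv))


default_services = get_services(**parse_options([]))
cached_services = get_services(**parse_options(['--use-cache']))
```

Keeping the factory free of any graph definition means each pipeline module can import it and pass the result to `services=` without duplicating the choice of implementation.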