[docs] rewriting the tutorial.

Romain Dorgueil
2018-01-14 14:25:42 +01:00
parent 8900c567d9
commit c311b05a42
19 changed files with 207 additions and 534 deletions

@ -3,7 +3,7 @@ Part 3: Working with Files
.. include:: _wip_note.rst
Writing to the console is nice, but using files is probably more realistic.
Writing to the console is nice, but let's be serious: the real world will require us to use files or external services.
Let's see how to use a few builtin writers and both local and remote filesystems.
@ -11,50 +11,129 @@ Let's see how to use a few builtin writers and both local and remote filesystems
Filesystems
:::::::::::
In |bonobo|, files are accessed within a **filesystem** service which must be something with the same interface as
`fs' FileSystem objects <https://docs.pyfilesystem.org/en/latest/builtin.html>`_. As a default, you'll get an instance
of a local filesystem mapped to the current working directory as the `fs` service. You'll learn more about services in
the next step, but for now, let's just use it.
In |bonobo|, files are accessed within a **filesystem** service (a `fs' FileSystem object
<https://docs.pyfilesystem.org/en/latest/builtin.html>`_).
As a default, you'll get an instance of a local filesystem mapped to the current working directory as the `fs` service.
You'll learn more about services in the next step, but for now, let's just use it.
Writing using the service
:::::::::::::::::::::::::
Writing to files
::::::::::::::::
Although |bonobo| contains helpers to write to common file formats, let's start by writing it manually.
To write in a file, we'll need to have an open file handle available during the whole transformation life.
We'll use a context processor to do so. A context processor is something very much like a
:obj:`contextlib.contextmanager`, that |bonobo| will use to run a setup/teardown logic on objects that need to have
the same lifecycle as a job execution.
Let's write one that just handles opening and closing the file:
.. code-block:: python
    from bonobo.config import use
    from bonobo.constants import NOT_MODIFIED
    def with_opened_file(self, context):
        with open('output.txt', 'w+') as f:
            yield f
    @use('fs')
    def write_repr_to_file(*row, fs):
        with fs.open('output.txt', 'a+') as f:
            print(row, file=f)
        return NOT_MODIFIED
Now, we need to write a `writer` transformation, and apply this context processor on it:
Then, update the `get_graph(...)` function, by adding `write_repr_to_file` just before your `PrettyPrinter()` node.
.. code-block:: python
Let's try to run that and think about what happens.
    from bonobo.config import use_context_processor
Each time a row comes to this node, the output file is opened in "append or create" mode, a line is written, and the file
is closed.
    @use_context_processor(with_opened_file)
    def write_repr_to_file(f, *row):
        f.write(repr(row))
This is **NOT** how you want to do things. Let's rewrite it so our `open(...)` call becomes execution-wide.
The `f` parameter will contain the value yielded by the context processors, in order of appearance (you can chain
multiple context processors).
Please note that the :func:`bonobo.config.use_context_processor` decorator will modify the function in place, but won't
modify its behaviour. If you want to call it outside of a |bonobo| job context, it's your responsibility to provide
the right parameters (here, the opened file).
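To make the setup/teardown lifecycle concrete, here is a minimal plain-Python sketch of what a context processor does conceptually. It does not use |bonobo| at all: `run_with_context` is a hypothetical driver written only for this illustration (it is not the library's implementation), and the temporary output path is likewise an assumption.

```python
import os
import tempfile


def with_opened_file(path):
    # Setup: open the file once, before any row is processed.
    with open(path, 'w+') as f:
        yield f
    # Teardown: leaving the ``with`` block closes the file.


def write_repr_to_file(f, *row):
    # The opened file comes first, then the row values.
    f.write(repr(row) + '\n')


def run_with_context(processor, transformation, rows):
    """Hypothetical driver: setup, one call per row, then teardown."""
    resource = next(processor)  # run the setup part, up to the ``yield``
    try:
        for row in rows:
            transformation(resource, *row)
    finally:
        # Resume the generator so the teardown part runs.
        for _ in processor:
            pass


path = os.path.join(tempfile.mkdtemp(), 'output.txt')
run_with_context(with_opened_file(path), write_repr_to_file, [(1, 'a'), (2, 'b')])

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

The file is opened exactly once for the whole run, which is the point of giving the handle the same lifecycle as the job execution.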
Using the filesystem
::::::::::::::::::::
We opened the output file using a hardcoded filename and filesystem implementation. Writing flexible jobs includes the
ability to change the load targets at runtime, and |bonobo| suggests using the `fs` service to achieve this with files.
Let's rewrite our context processor to use it.
.. code-block:: python
    def with_opened_file(self, context):
        with context.get_service('fs').open('output.txt', 'w+') as f:
            yield f
The interface does not change much, but this small change allows the end-user to change the filesystem implementation at
runtime, which is great to handle different environments (local development, staging servers, production, ...).
Note that |bonobo| provides only a few services with a default implementation (actually, only `fs` and `http`), but
you can define all the services you want, depending on your system. You'll learn more about this in the next tutorial
chapter.
Using a different filesystem
::::::::::::::::::::::::::::
* Filesystems
To change the `fs` implementation, you need to provide your implementation in the dict returned by `get_services()`.
* Reading files
Let's write to a remote location, which will be an Amazon S3 bucket. First, we need to install the driver:
* Writing files
.. code-block:: shell-session
* Writing files to S3
    pip install fs-s3fs
* Atomic writes ???
Then, just provide the correct bucket to :func:`bonobo.open_fs`:
.. code-block:: python
    def get_services(**options):
        return {
            'fs': bonobo.open_fs('s3://bonobo-examples')
        }
.. note::
    You must provide a bucket for which you have the write permission, and it's up to you to set up your Amazon
    credentials in such a way that `boto` can access your AWS account.
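Because the filesystem is just a service, the target can also be chosen when the job starts. The sketch below picks the filesystem URL from an environment variable. The variable name `OUTPUT_FS_URL` and the helper `get_fs_url` are hypothetical names introduced for this illustration, and `bonobo` is imported lazily so that the URL selection can be tried without it installed.

```python
import os


def get_fs_url(environ=None):
    # Hypothetical helper: read the target from OUTPUT_FS_URL (an assumed
    # variable name). Returns None when the variable is not set.
    environ = os.environ if environ is None else environ
    return environ.get('OUTPUT_FS_URL')


def get_services(**options):
    import bonobo  # imported here so the helper above works without bonobo

    # Passing None is assumed to fall back to the current working directory.
    return {'fs': bonobo.open_fs(get_fs_url())}


print(get_fs_url({'OUTPUT_FS_URL': 's3://bonobo-examples'}))
print(get_fs_url({}))
```

With this in place, the same job could write locally during development and to a bucket elsewhere, without touching the graph.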
Using builtin writers
:::::::::::::::::::::
Until now, to better understand what happens, we implemented our writers ourselves.
|bonobo| contains writers for a variety of standard file formats, and you're probably better off using builtin writers.
Let's use a :obj:`bonobo.CsvWriter` instance instead, by replacing our custom transformation in the graph factory
function:
.. code-block:: python
    def get_graph(**options):
        graph = bonobo.Graph()
        graph.add_chain(
            ...
            bonobo.CsvWriter('output.csv'),
        )
        return graph
Reading from files
::::::::::::::::::
Reading from files is done using the same logic as writing, except that you'll probably have only one call to a reader.
Our example application does not include reading from files, but you can read the file we just wrote by using a
:obj:`bonobo.CsvReader` instance.
Atomic writes
:::::::::::::
.. include:: _todo.rst
Moving forward
@ -62,6 +141,10 @@ Moving forward
You now know:
* How to ...
* How to use the filesystem (`fs`) service.
* How to read from files.
* How to write to files.
* How to substitute a service at runtime.
It's now time to jump to :doc:`4-services`.