[docs] rewriting the tutorial.
@@ -3,7 +3,7 @@ Part 3: Working with Files

.. include:: _wip_note.rst

Writing to the console is nice, but using files is probably more realistic.
Writing to the console is nice, but let's be serious: the real world will require us to use files or external services.

Let's see how to use a few builtin writers and both local and remote filesystems.

@@ -11,50 +11,129 @@ Let's see how to use a few builtin writers and both local and remote filesystems
Filesystems
:::::::::::

In |bonobo|, files are accessed within a **filesystem** service which must be something with the same interface as
`fs' FileSystem objects <https://docs.pyfilesystem.org/en/latest/builtin.html>`_. By default, you'll get an instance
of a local filesystem mapped to the current working directory as the `fs` service. You'll learn more about services in
the next step, but for now, let's just use it.
In |bonobo|, files are accessed within a **filesystem** service (a `fs' FileSystem object
<https://docs.pyfilesystem.org/en/latest/builtin.html>`_).

By default, you'll get an instance of a local filesystem mapped to the current working directory as the `fs` service.
You'll learn more about services in the next step, but for now, let's just use it.


Writing using the service
:::::::::::::::::::::::::
Writing to files
::::::::::::::::

Although |bonobo| contains helpers to write to common file formats, let's start by writing to a file manually.
To write to a file, we need an open file handle available for the whole lifetime of the transformation.

We'll use a context processor to do so. A context processor is something very much like a
:obj:`contextlib.contextmanager`, that |bonobo| will use to run setup/teardown logic on objects that need to have
the same lifecycle as a job execution.

Let's write one that just handles opening and closing the file:

.. code-block:: python

    from bonobo.config import use
    from bonobo.constants import NOT_MODIFIED
    def with_opened_file(self, context):
        with open('output.txt', 'w+') as f:
            yield f

    @use('fs')
    def write_repr_to_file(*row, fs):
        with fs.open('output.txt', 'a+') as f:
            print(row, file=f)
        return NOT_MODIFIED
Now, we need to write a `writer` transformation, and apply this context processor to it:

Then, update the `get_graph(...)` function, by adding `write_repr_to_file` just before your `PrettyPrinter()` node.
.. code-block:: python

Let's try to run that and think about what happens.
    from bonobo.config import use_context_processor

Each time a row comes to this node, the output file is opened in "append or create" mode, a line is written, and the file
is closed.
    @use_context_processor(with_opened_file)
    def write_repr_to_file(f, *row):
        f.write(repr(row))

This is **NOT** how you want to do things. Let's rewrite it so our `open(...)` call becomes execution-wide.
The `f` parameter will contain the value yielded by the context processors, in order of appearance (you can chain
multiple context processors).

Please note that the :func:`bonobo.config.use_context_processor` decorator will modify the function in place, but won't
modify its behaviour. If you want to call it out of the |bonobo| job context, it's your responsibility to provide
the right parameters (and here, the opened file).


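The idea of keeping one file handle open for the whole run, instead of reopening the file for every row, can be sketched in plain Python. This is only an analogy built on :obj:`contextlib.contextmanager`, not |bonobo|'s own machinery: the names `opened_file` and `run` are hypothetical, and `run` merely stands in for a job execution.

```python
import contextlib


@contextlib.contextmanager
def opened_file(path):
    # Setup: open the file once. Teardown: the with-block closes it
    # when the caller leaves its own with-statement.
    with open(path, 'w+') as f:
        yield f


def write_repr_to_file(f, *row):
    # One line per row; the already-open handle is supplied by the caller.
    f.write(repr(row) + '\n')


def run(rows, path):
    # Hypothetical stand-in for a job execution: the same handle is
    # reused for every row, then closed exactly once at the end.
    with opened_file(path) as f:
        for row in rows:
            write_repr_to_file(f, *row)
```

Calling `run([('foo', 1), ('bar', 2)], 'output.txt')` opens the file a single time and writes one line per row.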
Using the filesystem
::::::::::::::::::::

We opened the output file using a hardcoded filename and filesystem implementation. Writing flexible jobs includes the
ability to change the load targets at runtime, and |bonobo| suggests using the `fs` service to achieve this with files.

Let's rewrite our context processor to use it.

.. code-block:: python

    def with_opened_file(self, context):
        with context.get_service('fs').open('output.txt', 'w+') as f:
            yield f

The interface does not change much, but this small change allows the end-user to change the filesystem implementation at
runtime, which is great to handle different environments (local development, staging servers, production, ...).

Note that |bonobo| only provides a few services with a default implementation (actually, only `fs` and `http`), but
you can define all the services you want, depending on your system. You'll learn more about this in the next tutorial
chapter.


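Why an injected `fs` object makes the target swappable can be illustrated without |bonobo| at all. The sketch below assumes only that the service exposes an `open(path, mode)` method; `LocalDirFS` and `MemoryFS` are hypothetical stand-ins written for this example, not PyFilesystem classes, and the writer is the tutorial's function without its decorator and return value.

```python
import io
import os
import tempfile


class LocalDirFS:
    """Hypothetical stand-in: files live under a root directory."""

    def __init__(self, root):
        self.root = root

    def open(self, path, mode='r'):
        return open(os.path.join(self.root, path), mode)


class _MemoryFile(io.StringIO):
    """Text buffer that saves its content back to the store on close."""

    def __init__(self, store, path, initial):
        super().__init__(initial)
        self.seek(0, io.SEEK_END)  # start at the end, as append mode would
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class MemoryFS:
    """Hypothetical stand-in: files live in a dict (append semantics only)."""

    def __init__(self):
        self.files = {}

    def open(self, path, mode='a+'):
        return _MemoryFile(self.files, path, self.files.get(path, ''))


def write_repr_to_file(*row, fs):
    # The writer only relies on fs.open(...), so any object with that
    # method can be passed in.
    with fs.open('output.txt', 'a+') as f:
        print(row, file=f)


local = LocalDirFS(tempfile.mkdtemp())
memory = MemoryFS()
for fs in (local, memory):
    write_repr_to_file('foo', 1, fs=fs)
    write_repr_to_file('bar', 2, fs=fs)
```

The writer code is identical in both cases; only the object passed as `fs` decides where the lines end up.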
Using a different filesystem
::::::::::::::::::::::::::::

* Filesystems
To change the `fs` implementation, you need to provide your implementation in the dict returned by `get_services()`.

* Reading files
Let's write to a remote location, which will be an Amazon S3 bucket. First, we need to install the driver:

* Writing files
.. code-block:: shell-session

* Writing files to S3
    pip install fs-s3fs

* Atomic writes ???
Then, just provide the correct bucket to :func:`bonobo.open_fs`:

.. code-block:: python

    def get_services(**options):
        return {
            'fs': bonobo.open_fs('s3://bonobo-examples')
        }

.. note::

    You must provide a bucket for which you have write permission, and it's up to you to set up your Amazon
    credentials in such a way that `boto` can access your AWS account.


Using builtin writers
:::::::::::::::::::::

Until now, to get a better understanding of what happens, we implemented our writers ourselves.

|bonobo| contains writers for a variety of standard file formats, and you're probably better off using builtin writers.

Let's use a :obj:`bonobo.CsvWriter` instance instead, by replacing our custom transformation in the graph factory
function:

.. code-block:: python

    def get_graph(**options):
        graph = bonobo.Graph()
        graph.add_chain(
            ...
            bonobo.CsvWriter('output.csv'),
        )
        return graph

Reading from files
::::::::::::::::::

Reading from files is done using the same logic as writing, except that you'll probably have only one call to a reader.

Our example application does not include reading from files, but you can read the file we just wrote by using a
:obj:`bonobo.CsvReader` instance.


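To get a feel for what a CSV write followed by a read looks like, here is a round trip using only Python's standard :mod:`csv` module. It is a sketch for illustration, not how :obj:`bonobo.CsvWriter` or :obj:`bonobo.CsvReader` are implemented; the sample rows and the temporary path are assumptions made for the example.

```python
import csv
import os
import tempfile

# Hypothetical sample data and a throwaway target path.
rows = [('foo', 1), ('bar', 2)]
path = os.path.join(tempfile.mkdtemp(), 'output.csv')

# Write: one CSV record per row.
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# Read: every field comes back as a string, so types are not preserved.
with open(path, newline='') as f:
    read_back = [tuple(record) for record in csv.reader(f)]
```

Note that `read_back` contains `'1'` and `'2'` as strings: plain CSV carries no type information, so any conversion is up to the reading side.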
Atomic writes
:::::::::::::

.. include:: _todo.rst


Moving forward
@@ -62,6 +141,10 @@ Moving forward

You now know:

* How to ...
* How to use the filesystem (`fs`) service.
* How to read from files.
* How to write to files.
* How to substitute a service at runtime.

It's now time to jump to :doc:`4-services`.