Files
bonobo/docs/tutorial/3-files.rst
2018-01-14 15:26:04 +01:00

149 lines
4.7 KiB
ReStructuredText

Part 3: Working with Files
==========================
Writing to the console is nice, but let's be serious, real world will require us to use files or external services.
Let's see how to use a few builtin writers and both local and remote filesystems.
Filesystems
:::::::::::
In |bonobo|, files are accessed within a **filesystem** service (a `fs' FileSystem object
<https://docs.pyfilesystem.org/en/latest/builtin.html>`_).
As a default, you'll get an instance of a local filesystem mapped to the current working directory as the `fs` service.
You'll learn more about services in the next step, but for now, let's just use it.
Writing to files
::::::::::::::::
To write in a file, we'll need to have an open file handle available during the whole transformation life.
We'll use a context processor to do so. A context processor is something very much like a
:obj:`contextlib.contextmanager`, that |bonobo| will use to run a setup/teardown logic on objects that need to have
the same lifecycle as a job execution.
Let's write one that just handle opening and closing the file:
.. code-block:: python
def with_opened_file(self, context):
with open('output.txt', 'w+') as f:
yield f
Now, we need to write a `writer` transformation, and apply this context processor on it:
.. code-block:: python
from bonobo.config import use_context_processor
@use_context_processor(with_opened_file)
def write_repr_to_file(f, *row):
f.write(repr(row))
The `f` parameter will contain the value yielded by the context processors, in order of appearance (you can chain
multiple context processors).
Please note that the :func:`bonobo.config.use_context_processor` decorator will modify the function in place, but won't
modify its behaviour. If you want to call it out of the |bonobo| job context, it's your responsibility to provide
the right parameters (and here, the opened file).
Using the filesystem
::::::::::::::::::::
We opened the output file using a hardcoded filename and filesystem implementation. Writing flexible jobs include the
ability to change the load targets at runtime, and |bonobo| suggest to use the `fs` service to achieve this with files.
Let's rewrite our context processor to use it.
.. code-block:: python
def with_opened_file(self, context):
with context.get_service('fs').open('output.txt', 'w+') as f:
yield f
Interface does not change much, but this small change allows the end-user to change the filesystem implementation at
runtime, which is great to handle different environments (local development, staging servers, production, ...).
Note that |bonobo| only provide very few services with default implementation (actually, only `fs` and `http`), but
you can define all the services you want, depending on your system. You'll learn more about this in the next tutorial
chapter.
Using a different filesystem
::::::::::::::::::::::::::::
To change the `fs` implementation, you need to provide your implementation in the dict returned by `get_services()`.
Let's write to a remote location, which will be an Amazon S3 bucket. First, we need to install the driver:
.. code-block:: shell-session
pip install fs-s3fs
Then, just provide the correct bucket to :func:`bonobo.open_fs`:
.. code-block:: python
def get_services(**options):
return {
'fs': bonobo.open_fs('s3://bonobo-examples')
}
.. note::
You must provide a bucket for which you have the write permission, and it's up to you to setup your amazon
credentials in such a way that `boto` can access your AWS account.
Using builtin writers
:::::::::::::::::::::
Until then, and to have a better understanding of what happens, we implemented our writers ourselves.
|bonobo| contains writers for a variety of standard file formats, and you're probably better off using builtin writers.
Let's use a :obj:`bonobo.CsvWriter` instance instead, by replacing our custom transformation in the graph factory
function:
.. code-block:: python
def get_graph(**options):
graph = bonobo.Graph()
graph.add_chain(
...
bonobo.CsvWriter('output.csv'),
)
return graph
Reading from files
::::::::::::::::::
Reading from files is done using the same logic as writing, except that you'll probably have only one call to a reader.
Our example application does not include reading from files, but you can read the file we just wrote by using a
:obj:`bonobo.CsvReader` instance.
Atomic writes
:::::::::::::
.. include:: _todo.rst
Moving forward
::::::::::::::
You now know:
* How to use the filesystem (`fs`) service.
* How to read from files.
* How to write to files.
* How to substitute a service at runtime.
It's now time to jump to :doc:`4-services`.