Doc update

This commit is contained in:
Romain Dorgueil
2018-01-01 22:18:21 +01:00
parent 7d4fb1dff0
commit f640e358b4
3 changed files with 49 additions and 32 deletions

View File

@ -249,10 +249,10 @@ That's all for this first step.
You now know:
* How to create a new job file.
* How to inspect the content of a job file.
* How to create a new job (using a single file).
* How to inspect the content of a job.
* What should go in a job file.
* How to execute a job file.
* How to read the console output.
**Jump to** :doc:`2-jobs`
It's now time to jump to :doc:`2-jobs`.

View File

@ -4,31 +4,56 @@ Part 2: Writing ETL Jobs
What's an ETL job?
:::::::::::::::::::
- data flow, stream processing
- each node, first in first out
- parallelism
In |bonobo|, an ETL job is a formal definition of an executable graph.
Each node has input rows; each row triggers one call, and the row is passed to the call as ``*args``.
Each node of a graph will be executed in isolation from the other nodes, and the data is passed from one node to the
next using FIFO queues, managed by the framework. It's transparent to the end-user, though, and you'll only use
function arguments (for inputs) and return/yield values (for outputs).
Each call can have outputs, sent either using return, or yield.
Each input row of a node will cause one call to this node's callable. Each output is cast internally as a tuple-like
data structure (or more precisely, a namedtuple-like data structure), and for one given node, each output row must
have the same structure.
Each output row is stored internally as a tuple (or a namedtuple-like structure), and for a given node, each output row must have the same structure (the same number of fields).
If you return/yield something that is not a tuple, bonobo will create a tuple of one element.
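To make this concrete, here is a minimal sketch of such a job, using the ``Graph``/``run`` API described in these docs (the node names and data are made up for illustration):

.. code-block:: python

    import bonobo

    def extract():
        # A node without inputs: each yield produces one output row.
        yield 'hello'
        yield 'world'

    def transform(value):
        # One call per input row; the row's fields arrive as positional arguments.
        return value.title()

    def load(value):
        # A terminal node: consume rows, produce nothing.
        print(value)

    graph = bonobo.Graph()
    graph.add_chain(extract, transform, load)

    if __name__ == '__main__':
        bonobo.run(graph)

Here ``transform`` returns a plain string, so bonobo wraps it into a one-element row before passing it on to ``load``.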
Properties
----------
By default, exceptions are not fatal in bonobo. If a call raises an error, then bonobo will display the stack trace, increment the "err" counter for this node, and move on to the next input row.
|bonobo| assists you with defining the data-flow of your data engineering process, and then streams data through your
callable graphs.
Some errors are fatal, though. For example, if you pass a 2-element tuple to a node that takes 3 arguments, bonobo will raise an UnrecoverableTypeError and exit the current execution.
* Each node call will process one row of data.
* Queues that move the data between nodes are standard first-in, first-out (FIFO) Python :class:`queue.Queue` instances.
* Each node runs in parallel with the others.
* The default execution strategy uses threading, and each node runs in a separate thread (a rough illustration of this model follows the list).
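As a rough mental model of these properties (and **not** bonobo's actual implementation), you can picture two nodes connected by a standard FIFO queue, each running in its own thread, with one call per input row:

.. code-block:: python

    import queue
    import threading

    END = object()  # sentinel marking the end of the stream

    def extract(out_q):
        for row in [('foo',), ('bar',)]:
            out_q.put(row)  # one output row at a time, in FIFO order
        out_q.put(END)

    def load(in_q):
        while True:
            row = in_q.get()
            if row is END:
                break
            print(*row)  # one call per input row, passed as *args

    q = queue.Queue()
    threads = [
        threading.Thread(target=extract, args=(q,)),
        threading.Thread(target=load, args=(q,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

The framework takes care of all of this plumbing for you; the sketch only shows where the queues and threads sit.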
Fault tolerance
---------------
Node execution is fault tolerant.
If an exception is raised from a node call, then this call will be aborted, but bonobo will continue the execution
with the next row (after outputting the stack trace and incrementing the "err" counter for the node context).
This allows ETL jobs to ignore faulty data and do their best to process the valid rows of a dataset.
Some errors are fatal, though.
If you pass a 2-element tuple to a node that takes 3 arguments, |bonobo| will raise an :class:`bonobo.errors.UnrecoverableTypeError` and exit the
current graph execution as fast as it can (finishing the node executions that are already in progress, but not
starting new ones even if there are remaining input rows).
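As a concrete (hypothetical) example of a node written with this behaviour in mind:

.. code-block:: python

    def parse_price(raw_price):
        # A deliberately fragile node: float() raises ValueError on malformed
        # input. With the default fault tolerance described above, such a row
        # is reported and skipped, and the job keeps processing the other rows.
        return float(raw_price)

A row carrying ``'n/a'`` would be reported and dropped, while numeric rows keep flowing through the rest of the graph.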
Let's write a sample data integration job
:::::::::::::::::::::::::::::::::::::::::
Let's create a sample application.
The goal of this application will be to extract all the fablabs in the world using an open-data API, normalize this
data and, for now, display it. We'll then build on this foundation in the next steps to write to files, databases, etc.
Let's write one
:::::::::::::::
We'll create a job to do the following (a rough skeleton is sketched after the list):
* Extract all the FabLabs from an open data API
* Apply a bit of formatting
* Geocode the address and normalize it, if we can
* Display it (in the next step, we'll learn about writing the result to a file).
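Here is a rough skeleton of such a job. The API URL, field names and the ``requests`` dependency are placeholders for this sketch, not the tutorial's final code, and the geocoding step is left out for now:

.. code-block:: python

    import bonobo
    import requests

    FABLABS_API_URL = 'https://example.com/fablabs.json'  # placeholder URL

    def extract_fablabs():
        # Yield one row per fablab returned by the (hypothetical) open data API.
        for record in requests.get(FABLABS_API_URL).json():
            yield record

    def format_fablab(record):
        # "A bit of formatting": keep a couple of fields and tidy them up.
        return record.get('name', '').strip(), record.get('address', '').strip()

    def display(name, address):
        print(name, '-', address)

    graph = bonobo.Graph()
    graph.add_chain(extract_fablabs, format_fablab, display)

    if __name__ == '__main__':
        bonobo.run(graph)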
Moving forward

View File

@ -1,9 +1,6 @@
First steps
===========
What is Bonobo?
:::::::::::::::
Bonobo is an ETL (Extract-Transform-Load) framework for Python 3.5. The goal is to define data transformations, with
Python code in charge of handling similarly shaped, independent lines of data.
@ -14,8 +11,7 @@ Bonobo is a lean manufacturing assembly line for data that let you focus on the
Bonobo uses simple Python and should be quick and easy to learn.
Tutorial
::::::::
**Tutorials**
.. toctree::
:maxdepth: 1
@ -26,8 +22,8 @@ Tutorial
4-services
5-packaging
More
::::
**Integrations**
.. toctree::
:maxdepth: 1
@ -36,9 +32,7 @@ More
notebooks
sqlalchemy
What's next?
::::::::::::
**What's next?**
Once you're familiar with all the base concepts, you can...
@ -46,9 +40,7 @@ Once you're familiar with all the base concepts, you can...
* Explore the :doc:`Extensions </extension/index>` to widen the possibilities.
* Open the :doc:`References </reference/index>` and start hacking like crazy.
You're not alone!
:::::::::::::::::
**You're not alone!**
Good documentation is not easy to write.