Tighter dependencies, and rewriting a bit of the documentation.
@ -1,23 +1,57 @@
Contributing
============

Contributing to bonobo is simple. Although we don't have a complete guide on this topic for now, the best way is to fork
Contributing to bonobo is usually done this way:

* Discuss ideas in the `issue tracker <https://github.com/python-bonobo/bonobo>`_ or on `Slack <https://bonobo-slack.herokuapp.com/>`_.
* Fork the `repository <https://github.com/python-bonobo>`_.
* Think about what happens for existing userland code if your patch is applied.
* Open a pull request early with your code to continue the discussion as you're writing code.
* Try to write simple tests, and a few lines of documentation.

Although we don't have a complete guide on this topic for now, the best way is to fork
the GitHub repository and send pull requests.

A few guidelines...

* Starting at 1.0, the system needs to be 100% backward compatible. The best way to do so is to ensure the actual expected
  behavior is unit tested before making any change. See http://semver.org/.
* There can be changes before 1.0, even backward incompatible changes. There should be a reason for a BC break, but
  I think it's best for the speed of development right now.
* The core should stay as light as possible.
* Coding standards are enforced using yapf. That means that you can code the way you want, we just ask you to run
  `make format` before committing your changes so everybody follows the same conventions.
* The general rule for anything you're not sure about is "open a GitHub issue to discuss the point".
* A more formal proposal process will come the day we feel the need for it.

Tools
:::::

Issues: https://github.com/python-bonobo/bonobo/issues

Roadmap: https://www.bonobo-project.org/roadmap

Slack: https://bonobo-slack.herokuapp.com/

Guidelines
::::::::::

* We tend to use `semantic versioning <http://semver.org/>`_. This should be 100% true once we reach 1.0, but until then we will fail
  and learn. Anyway, the user effort for each BC-break is a real pain, and we want to keep that in mind.
* The 1.0 milestone has one goal: create a solid foundation we can rely on, in terms of API. To reach that, we want to keep it as
  minimalist as possible, considering only a few userland tools as the public API.
* Put more simply, the core should stay as light as possible.
* Let's not fight over coding standards. We enforce them using `yapf <https://github.com/google/yapf#yapf>`_, and a `make format` call
  should reformat the whole codebase for you. We encourage you to run it before making a pull request, and it will be run before each
  release anyway, so we can focus on things that have value instead of details.
* Tests are important. One obvious reason is that we want to have a stable and working system, but one less obvious reason is that
  it forces better design, making sure responsibilities are well separated and the scope of each function is clear. More often than not,
  the "one and only obvious way to do it" will be obvious once you write the tests.
* Documentation is important. It's the only way people can actually understand what the system does, and userless software is pointless.
  One book I read a long time ago said that half the energy spent building something should be devoted to explaining what you're doing
  and why, and that's probably one of the best pieces of advice I have read (although, like every good piece of advice, it's easier to
  repeat than to apply).

License
:::::::

`Bonobo is released under the Apache license <https://github.com/python-bonobo/bonobo/blob/0.2/LICENSE>`_.

License for non lawyers
:::::::::::::::::::::::

Use it, change it, hack it, brew it, eat it.

For pleasure, non-profit, profit or basically anything else, except stealing credit.

Provided without warranty.

@ -1,48 +1,40 @@
Pure transformations
====================

The nature of components, and how the data flow from one to another, make them not so easy to write correctly.
Hopefully, with a few hints, you will be able to understand why and how they should be written.
The nature of components, and how the data flow from one to another, can be a bit tricky.
Hopefully, they should be very easy to write with a few hints.

The major problem we have is that one message can go through more than one component, and at the same time. If you
want to be safe, you tend to :func:`copy.copy()` everything between two calls to two different components, but that
would waste a lot of memory on copies that are never modified.
The major problem we have is that one message (underlying implementation: :class:`bonobo.structs.bags.Bag`) can go
through more than one component, and at the same time. If you want to be safe, you tend to :func:`copy.copy()` everything
between two calls to two different components, but that's very expensive.

Instead of that, we chose the opposite: copies are never made, and you should not modify the inputs of your
component in place before yielding them. That mostly means that you want to recreate dicts and lists before yielding (or
returning) them. Numeric values, strings and tuples being immutable in Python, modifying a variable of one of those
types will already return a different instance.

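As a minimal sketch of that rule in plain Python (no bonobo involved; the function names below are hypothetical and only serve the illustration), compare what happens to the caller's object when a number is "modified" and when a list is:

```python
def bump_number(n: int) -> int:
    # ints are immutable: += rebinds the local name, the caller's value is untouched
    n += 1
    return n


def extend_in_place(items: list) -> list:
    # lists are mutable: append() changes the very object the caller passed in
    items.append(0)
    return items


def extend_pure(items: list) -> list:
    # builds a new list, so the input stays as it was
    return items + [0]


original_number = 41
result_number = bump_number(original_number)

shared = [1, 2]
mutated = extend_in_place(shared)

source = [1, 2]
fresh = extend_pure(source)
```

Here `mutated` and `shared` are the same object, so any other component holding `shared` sees the change, while `fresh` is a distinct list and `source` is left alone.
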
Examples will be shown with `return` statements; of course, you can do the same with `yield` statements in generators.

Numbers
:::::::

You can't go wrong with numbers. All of the following are correct.
In Python, numbers are immutable. So you can't go wrong with numbers. All of the following are correct.

.. code-block:: python

    def do_your_number_thing(n: int) -> int:
        return n

    def do_your_number_thing(n: int) -> int:
        yield n

    def do_your_number_thing(n: int) -> int:
        return n + 1

    def do_your_number_thing(n: int) -> int:
        yield n + 1

    def do_your_number_thing(n: int) -> int:
        # correct, but bad style
        n += 1
        return n

    def do_your_number_thing(n: int) -> int:
        # correct, but bad style
        n += 1
        yield n

The same is true with other numeric types, so don't be shy.

The same is true with other numeric types, so don't be shy. Operate like crazy, my friend.

Tuples
::::::

@ -65,12 +57,27 @@ Tuples are immutable, so you risk nothing.

Strings
:::::::

You know the drill, strings are immutable, blablabla ... Examples left as an exercise for the reader.
You know the drill, strings are immutable.

.. code-block:: python

    def do_your_str_thing(t: str) -> str:
        return 'foo ' + t + ' bar'

    def do_your_str_thing(t: str) -> str:
        return ' '.join(('foo', t, 'bar', ))

    def do_your_str_thing(t: str) -> str:
        return 'foo {} bar'.format(t)

You can, if you're using Python 3.6+, use `f-strings <https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_,
but the core bonobo libraries won't use them, to stay 3.5 compatible.

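For reference, here is what the f-string variant of the same transformation could look like. It is only a sketch for userland code running on Python 3.6 or later, not something taken from the bonobo codebase:

```python
def do_your_str_thing(t: str) -> str:
    # f-string version; requires Python 3.6 or later
    return f'foo {t} bar'


result = do_your_str_thing('baz')
```

Like the other variants, it builds a new string and leaves `t` untouched.
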
Dicts
:::::

So, now it gets interesting. Dicts are mutable. It means that you can mess things up badly here if you're not cautious.
So, now it gets interesting. Dicts are mutable. It means that you can mess things up if you're not cautious.

For example, doing the following may cause unexpected problems:

@ -86,8 +93,8 @@ For example, doing the following may cause unexpected problems:
        return d

The problem is easy to understand: as **Bonobo** won't make copies of your dict, the same dict will be passed along the
transformation graph, and mutations will be seen in components downstream of the output, but also upstream. Let's see
a more obvious example of something you should not do:
transformation graph, and mutations will be seen in components downstream of the output (and also upstream). Let's see
a more obvious example of something you should *not* do:

.. code-block:: python

@ -98,7 +105,8 @@ a more obvious example of something you should not do:
            d['index'] = i
            yield d

Here, the same dict is yielded in each iteration, and its state when the next component in the chain is called is undetermined.
Here, the same dict is yielded in each iteration, and its state when the next component in the chain is called is undetermined
(how many mutations happened since the `yield`? Hard to tell...).

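The effect can be reproduced without bonobo at all. In the sketch below, `bad_producer` is a hypothetical stand-in for a component that reuses one dict, and `list()` plays the role of a downstream consumer that only looks at the rows after the producer has moved on:

```python
def bad_producer(count: int):
    d = {}
    for i in range(count):
        d['index'] = i
        # the same dict object is yielded on every iteration
        yield d


# The consumer keeps references to what it received and reads them later.
rows = list(bad_producer(3))
indexes = [row['index'] for row in rows]
```

Every entry of `rows` is the same object, so `indexes` only reflects the last mutation, not the value each row had when it was yielded.
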
Now let's see how to do it correctly:

@ -120,9 +128,17 @@ Now let's see how to do it correctly:
                'index': i
            }

I hear you think «Yeah, but if I create like millions of dicts ...». The answer is simple. Using dicts like this will
create a lot, but also free a lot because as soon as all the future components that take this dict as input are done,
the dict will be garbage collected. Youplaboum!
I hear you think «Yeah, but if I create like millions of dicts ...».

Let's say we chose the opposite way and copied the dict outside the transformation (in fact, `it's what we did in bonobo's
ancestor <https://github.com/rdcli/rdc.etl/blob/dev/rdc/etl/io/__init__.py#L187>`_). This means you would also create the
same number of dicts, the difference being that you wouldn't even notice it. Also, it means that if you want to yield the same
dict 1 million times, going "pure" makes it efficient (you'll just yield the same object 1 million times) while going "copy
crazy" will create 1 million objects.

Using dicts like this will create a lot of dicts, but also free them as soon as all the future components that take this dict
as input are done. Also, one important thing to note is that most primitive data structures in Python are immutable, so creating
a new dict will of course create a new envelope, but the unchanged objects inside won't be duplicated.

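The "new envelope, same contents" point can be checked directly. This is a small sketch in plain Python; `with_index` and the sample data are made up for the illustration:

```python
payload = ('a', 'large', 'immutable', 'tuple')
row = {'payload': payload, 'index': 0}


def with_index(d: dict, i: int) -> dict:
    # build a new dict (the "envelope") instead of mutating d;
    # the values are shared by reference, not copied
    return {**d, 'index': i}


new_row = with_index(row, 1)
```

`new_row` is a different dict from `row`, yet `new_row['payload']` is the very same tuple object as `row['payload']`, so only the small outer structure is allocated again.
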
Last thing, copies made in the "pure" approach are explicit, and usually, explicit is better than implicit.


@ -8,3 +8,4 @@ References

   commands
   api
   examples
