Merge pull request #191 from hartym/develop

Develop
Romain Dorgueil
2017-10-13 17:42:43 +02:00
committed by GitHub
14 changed files with 242 additions and 112 deletions

10
CREDITS.rst Normal file

@ -0,0 +1,10 @@
Credits
=======
Logo
::::
Created by Sarah GHIGLIANO and available on The Noun Project.
License: https://creativecommons.org/licenses/by/3.0/us/
Source: https://thenounproject.com/Ghigliano/collection/animals/?i=320941


@ -34,20 +34,19 @@ Data-processing for humans.
Bonobo is an extract-transform-load framework for python 3.5+ (see comparisons with other data tools). Bonobo is an extract-transform-load framework for python 3.5+ (see comparisons with other data tools).
Bonobo uses plain old python objects (functions, generators and iterators), allows to link them in a directed graph and Bonobo uses plain old python objects (functions, generators and iterators), allows them to be linked together in a directed graph, and then executed using a parallelized strategy, without having to worry about the underlying complexity.
execute them using a parallelized strategy, without having to worry about the underlying complexity.
Developpers can focus on writing simple and atomic operations, that are by-design easy to unit-test, while the Developers can focus on writing simple and atomic operations, that are easy to unit-test by-design, while the focus of the
framework focus on applying them concurrently to rows of data. framework is to apply them concurrently to rows of data.
One thing to note: write pure transformations and you'll be safe. One thing to note: write pure transformations and you'll be safe.
Bonobo is a young rewrite of an old python2.7 tool that ran millions of transformations per day for years on production, Bonobo is a young rewrite of an old python2.7 tool that ran millions of transformations per day for years on production.
so as though it may not yet be complete or fully stable (please, allow us to reach 1.0), the basics are there. Although it may not yet be complete or fully stable (please, allow us to reach 1.0), the basics are there.
---- ----
*Bonobo is under heavy development, we're making the best efforts to keep the core as stable as possible but we also need to move forward. Please allow us to reach 1.0 stability and our sincere apologies for anything we'd break in the process (feel free to complain on issues, so we notice breakages we did not expect)* *Bonobo is under heavy development, we're doing our best to keep the core as stable as possible while still moving forward. Please allow us to reach 1.0 stability and our sincere apologies for anything we break in the process (feel free to complain on issues, allowing us to correct breakages we did not expect)*
---- ----


@ -2,6 +2,10 @@ import mimetypes
import os import os
import bonobo import bonobo
from bonobo.commands.util.arguments import parse_variable_argument
from bonobo.util import require
from bonobo.util.iterators import tuplize
from bonobo.util.python import WorkingDirectoryModulesRegistry
SHORTCUTS = { SHORTCUTS = {
'csv': 'text/csv', 'csv': 'text/csv',
@ -23,7 +27,7 @@ READER = 'reader'
WRITER = 'writer' WRITER = 'writer'
def resolve_factory(name, filename, factory_type): def resolve_factory(name, filename, factory_type, options=None):
""" """
Try to resolve which transformation factory to use for this filename. User eventually provided a name, which has Try to resolve which transformation factory to use for this filename. User eventually provided a name, which has
priority, otherwise we try to detect it using the mimetype detection on filename. priority, otherwise we try to detect it using the mimetype detection on filename.
@ -42,6 +46,11 @@ def resolve_factory(name, filename, factory_type):
if _ext in SHORTCUTS: if _ext in SHORTCUTS:
name = SHORTCUTS[_ext] name = SHORTCUTS[_ext]
if options:
options = dict(map(parse_variable_argument, options))
else:
options = dict()
if not name in REGISTRY: if not name in REGISTRY:
raise RuntimeError( raise RuntimeError(
'Could not resolve {factory_type} factory for {filename} ({name}). Try providing it explicitly using -{opt} <format>.'. 'Could not resolve {factory_type} factory for {filename} ({name}). Try providing it explicitly using -{opt} <format>.'.
@ -49,19 +58,49 @@ def resolve_factory(name, filename, factory_type):
) )
if factory_type == READER: if factory_type == READER:
return REGISTRY[name][0] return REGISTRY[name][0], options
elif factory_type == WRITER: elif factory_type == WRITER:
return REGISTRY[name][1] return REGISTRY[name][1], options
else: else:
raise ValueError('Invalid factory type.') raise ValueError('Invalid factory type.')
def execute(input, output, reader=None, reader_options=None, writer=None, writer_options=None, options=None): @tuplize
reader = resolve_factory(reader, input, READER)(input) def resolve_filters(filters):
writer = resolve_factory(writer, output, WRITER)(output) registry = WorkingDirectoryModulesRegistry()
for f in filters:
try:
mod, attr = f.split(':', 1)
yield getattr(registry.require(mod), attr)
except ValueError:
yield getattr(bonobo, f)
def execute(
input,
output,
reader=None,
reader_option=None,
writer=None,
writer_option=None,
option=None,
filter=None,
):
reader_factory, reader_option = resolve_factory(reader, input, READER, (option or []) + (reader_option or []))
if output == '-':
writer_factory, writer_option = bonobo.PrettyPrinter, {}
else:
writer_factory, writer_option = resolve_factory(writer, output, WRITER, (option or []) + (writer_option or []))
filters = resolve_filters(filter or [])
graph = bonobo.Graph() graph = bonobo.Graph()
graph.add_chain(reader, writer) graph.add_chain(
reader_factory(input, **reader_option),
*filters,
writer_factory(output, **writer_option),
)
return bonobo.run( return bonobo.run(
graph, services={ graph, services={
@ -71,11 +110,44 @@ def execute(input, output, reader=None, reader_options=None, writer=None, writer
def register(parser): def register(parser):
parser.add_argument('input') parser.add_argument('input', help='Input filename.')
parser.add_argument('output') parser.add_argument('output', help='Output filename.')
parser.add_argument('--' + READER, '-r') parser.add_argument(
parser.add_argument('--' + WRITER, '-w') '--' + READER,
# parser.add_argument('--reader-option', '-ro', dest='reader_options') '-r',
# parser.add_argument('--writer-option', '-wo', dest='writer_options') help='Choose the reader factory if it cannot be detected from extension, or if detection is wrong.'
# parser.add_argument('--option', '-o', dest='options') )
parser.add_argument(
'--' + WRITER,
'-w',
help='Choose the writer factory if it cannot be detected from extension, or if detection is wrong (use - for console pretty print).'
)
parser.add_argument(
'--filter',
'-f',
dest='filter',
action='append',
help='Add a filter between input and output',
)
parser.add_argument(
'--option',
'-O',
dest='option',
action='append',
help='Add a named option to both reader and writer factories (i.e. foo="bar").',
)
parser.add_argument(
'--' + READER + '-option',
'-' + READER[0].upper(),
dest=READER + '_option',
action='append',
help='Add a named option to the reader factory.',
)
parser.add_argument(
'--' + WRITER + '-option',
'-' + WRITER[0].upper(),
dest=WRITER + '_option',
action='append',
help='Add a named option to the writer factory.',
)
return execute return execute
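The interplay between the repeatable flags registered above and the `(option or []) + (reader_option or [])` expression in `execute` can be illustrated with a small standalone sketch. It assumes nothing about Bonobo itself: the program name and the option keys (`encoding`, `delimiter`) are hypothetical, and a plain `split('=', 1)` stands in for `parse_variable_argument`, so no JSON decoding happens here.

```python
import argparse

# Hypothetical parser mirroring the flags registered in this commit.
parser = argparse.ArgumentParser(prog='convert-sketch')
parser.add_argument('input')
parser.add_argument('output')
parser.add_argument('--option', '-O', dest='option', action='append')
parser.add_argument('--reader-option', '-R', dest='reader_option', action='append')
parser.add_argument('--writer-option', '-W', dest='writer_option', action='append')

args = parser.parse_args([
    'in.csv', 'out.json',
    '-O', 'encoding=latin-1',
    '-R', 'delimiter=;',
    '-R', 'encoding=utf-8',
])


def combine(shared, specific):
    # action='append' leaves the attribute as None when a flag never appears,
    # hence the "or []" guards before concatenating.
    return (shared or []) + (specific or [])


reader_raw = combine(args.option, args.reader_option)
writer_raw = combine(args.option, args.writer_option)

# Simplified stand-in for dict(map(parse_variable_argument, ...)).
# Later entries win, so a reader-specific value overrides a shared one.
reader_options = dict(item.split('=', 1) for item in reader_raw)
writer_options = dict(item.split('=', 1) for item in writer_raw)

print(reader_options)
print(writer_options)
```

Because the shared options come first in the concatenation, the factory-specific flags take precedence whenever both define the same key.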


@ -0,0 +1,26 @@
import json
def parse_variable_argument(arg):
try:
key, val = arg.split('=', 1)
except ValueError:
return arg, True
try:
val = json.loads(val)
except json.JSONDecodeError:
pass
return key, val
def test_parse_variable_argument():
assert parse_variable_argument('foo=bar') == ('foo', 'bar')
assert parse_variable_argument('foo="bar"') == ('foo', 'bar')
assert parse_variable_argument('sep=";"') == ('sep', ';')
assert parse_variable_argument('foo') == ('foo', True)
if __name__ == '__main__':
test_parse_variable_argument()
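Beyond the string cases covered by the test above, the JSON decoding step means that numbers, booleans and lists are converted too, while anything that is not valid JSON stays a plain string. The sketch below repeats the helper so it runs on its own; the example keys (`limit`, `header`, `fields`, `path`, `verbose`) are made up for illustration.

```python
import json


def parse_variable_argument(arg):
    # Same logic as the helper added in this commit.
    try:
        key, val = arg.split('=', 1)
    except ValueError:
        # No "=" at all: treat the argument as a boolean flag.
        return arg, True
    try:
        val = json.loads(val)
    except json.JSONDecodeError:
        # Not valid JSON: keep the raw string.
        pass
    return key, val


examples = ['limit=10', 'header=true', 'fields=["a", "b"]', 'path=a=b', 'verbose']
parsed = dict(parse_variable_argument(e) for e in examples)

for key, value in parsed.items():
    print(key, repr(value), type(value).__name__)
```

Note that only the first `=` splits key from value, so `path=a=b` yields the string `a=b`.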


@ -53,13 +53,15 @@ class Option:
_creation_counter = 0 _creation_counter = 0
def __init__(self, type=None, *, required=True, positional=False, default=None): def __init__(self, type=None, *, required=True, positional=False, default=None, __doc__=None):
self.name = None self.name = None
self.type = type self.type = type
self.required = required if default is None else False self.required = required if default is None else False
self.positional = positional self.positional = positional
self.default = default self.default = default
self.__doc__ = __doc__ or self.__doc__
# This hack is necessary for python3.5 # This hack is necessary for python3.5
self._creation_counter = Option._creation_counter self._creation_counter = Option._creation_counter
Option._creation_counter += 1 Option._creation_counter += 1


@ -70,7 +70,21 @@ def _count_counter(self, context):
context.send(Bag(counter._value)) context.send(Bag(counter._value))
def _shorten(s, w):
if w and len(s) > w:
s = s[0:w - 3] + '...'
return s
class PrettyPrinter(Configurable): class PrettyPrinter(Configurable):
max_width = Option(
int,
required=False,
__doc__='''
If set, truncates the output values longer than this to this width.
'''
)
def call(self, *args, **kwargs): def call(self, *args, **kwargs):
formater = self._format_quiet if settings.QUIET.get() else self._format_console formater = self._format_quiet if settings.QUIET.get() else self._format_console
@ -82,7 +96,10 @@ class PrettyPrinter(Configurable):
def _format_console(self, i, item, value): def _format_console(self, i, item, value):
return ' '.join( return ' '.join(
((' ' if i else ''), str(item), '=', str(value).strip().replace('\n', '\n' + CLEAR_EOL), CLEAR_EOL) (
(' ' if i else ''), str(item), '=', _shorten(str(value).strip(),
self.max_width).replace('\n', '\n' + CLEAR_EOL), CLEAR_EOL
)
) )


@ -50,6 +50,7 @@ class FileHandler(Configurable):
eol = Option(str, default='\n') # type: str eol = Option(str, default='\n') # type: str
mode = Option(str) # type: str mode = Option(str) # type: str
encoding = Option(str, default='utf-8') # type: str encoding = Option(str, default='utf-8') # type: str
fs = Service('fs') # type: str fs = Service('fs') # type: str
@ContextProcessor @ContextProcessor


@ -9,14 +9,23 @@ class _RequiredModule:
class _RequiredModulesRegistry(dict): class _RequiredModulesRegistry(dict):
@property
def pathname(self):
return os.path.join(os.getcwd(), os.path.dirname(inspect.getfile(inspect.stack()[2][0])))
def require(self, name): def require(self, name):
if name not in self: if name not in self:
bits = name.split('.') bits = name.split('.')
pathname = os.path.join(os.getcwd(), os.path.dirname(inspect.getfile(inspect.stack()[1][0]))) filename = os.path.join(self.pathname, *bits[:-1], bits[-1] + '.py')
filename = os.path.join(pathname, *bits[:-1], bits[-1] + '.py')
self[name] = _RequiredModule(runpy.run_path(filename, run_name=name)) self[name] = _RequiredModule(runpy.run_path(filename, run_name=name))
return self[name] return self[name]
class WorkingDirectoryModulesRegistry(_RequiredModulesRegistry):
@property
def pathname(self):
return os.getcwd()
registry = _RequiredModulesRegistry() registry = _RequiredModulesRegistry()
require = registry.require require = registry.require
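The path-based loading used by `require` can be sketched independently of Bonobo. The example below resolves a dotted name against a base directory and executes the file with `runpy.run_path`, as the registry does; `load_from_directory`, the `pkg/filters.py` file and the `SimpleNamespace` wrapper are assumptions made for the illustration (the diff does not show how `_RequiredModule` is implemented).

```python
import os
import runpy
import tempfile
from types import SimpleNamespace


def load_from_directory(base_dir, dotted_name):
    # 'pkg.filters' -> <base_dir>/pkg/filters.py, same scheme as in require().
    bits = dotted_name.split('.')
    filename = os.path.join(base_dir, *bits[:-1], bits[-1] + '.py')
    namespace = runpy.run_path(filename, run_name=dotted_name)
    # Expose the public names as attributes (stand-in for _RequiredModule).
    public = {k: v for k, v in namespace.items() if not k.startswith('__')}
    return SimpleNamespace(**public)


with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway module so the sketch is self-contained.
    os.makedirs(os.path.join(tmp, 'pkg'))
    with open(os.path.join(tmp, 'pkg', 'filters.py'), 'w') as f:
        f.write('def upper(s):\n    return s.upper()\n')

    mod = load_from_directory(tmp, 'pkg.filters')
    result = mod.upper('bonobo')

print(result)
```

Overriding only `pathname`, as `WorkingDirectoryModulesRegistry` does, changes the base directory while the name-to-file mapping stays the same.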


@ -14,11 +14,11 @@ def get_path():
def update_context(app, pagename, templatename, context, doctree): def update_context(app, pagename, templatename, context, doctree):
context['alabaster_version'] = version.__version__ context['alabaster_version'] = version.__version__
def setup(app): def setup(app):
# add_html_theme is new in Sphinx 1.6+ # add_html_theme is new in Sphinx 1.6+
if hasattr(app, 'add_html_theme'): if hasattr(app, 'add_html_theme'):
theme_path = os.path.abspath(os.path.dirname(__file__)) theme_path = os.path.abspath(os.path.dirname(__file__))
app.add_html_theme('alabaster', theme_path) app.add_html_theme('alabaster', theme_path)
app.connect('html-page-context', update_context) app.connect('html-page-context', update_context)
return {'version': version.__version__, return {'version': version.__version__, 'parallel_read_safe': True}
'parallel_read_safe': True}


@ -7,82 +7,74 @@ from pygments.token import Keyword, Name, Comment, String, Error, \
# Originally based on FlaskyStyle which was based on 'tango'. # Originally based on FlaskyStyle which was based on 'tango'.
class Alabaster(Style): class Alabaster(Style):
background_color = "#f8f8f8" # doesn't seem to override CSS 'pre' styling? background_color = "#f8f8f8" # doesn't seem to override CSS 'pre' styling?
default_style = "" default_style = ""
styles = { styles = {
# No corresponding class for the following: # No corresponding class for the following:
#Text: "", # class: '' #Text: "", # class: ''
Whitespace: "underline #f8f8f8", # class: 'w' Whitespace: "underline #f8f8f8", # class: 'w'
Error: "#a40000 border:#ef2929", # class: 'err' Error: "#a40000 border:#ef2929", # class: 'err'
Other: "#000000", # class 'x' Other: "#000000", # class 'x'
Comment: "italic #8f5902", # class: 'c'
Comment: "italic #8f5902", # class: 'c' Comment.Preproc: "noitalic", # class: 'cp'
Comment.Preproc: "noitalic", # class: 'cp' Keyword: "bold #004461", # class: 'k'
Keyword.Constant: "bold #004461", # class: 'kc'
Keyword: "bold #004461", # class: 'k' Keyword.Declaration: "bold #004461", # class: 'kd'
Keyword.Constant: "bold #004461", # class: 'kc' Keyword.Namespace: "bold #004461", # class: 'kn'
Keyword.Declaration: "bold #004461", # class: 'kd' Keyword.Pseudo: "bold #004461", # class: 'kp'
Keyword.Namespace: "bold #004461", # class: 'kn' Keyword.Reserved: "bold #004461", # class: 'kr'
Keyword.Pseudo: "bold #004461", # class: 'kp' Keyword.Type: "bold #004461", # class: 'kt'
Keyword.Reserved: "bold #004461", # class: 'kr' Operator: "#582800", # class: 'o'
Keyword.Type: "bold #004461", # class: 'kt' Operator.Word: "bold #004461", # class: 'ow' - like keywords
Punctuation: "bold #000000", # class: 'p'
Operator: "#582800", # class: 'o'
Operator.Word: "bold #004461", # class: 'ow' - like keywords
Punctuation: "bold #000000", # class: 'p'
# because special names such as Name.Class, Name.Function, etc. # because special names such as Name.Class, Name.Function, etc.
# are not recognized as such later in the parsing, we choose them # are not recognized as such later in the parsing, we choose them
# to look the same as ordinary variables. # to look the same as ordinary variables.
Name: "#000000", # class: 'n' Name: "#000000", # class: 'n'
Name.Attribute: "#c4a000", # class: 'na' - to be revised Name.Attribute: "#c4a000", # class: 'na' - to be revised
Name.Builtin: "#004461", # class: 'nb' Name.Builtin: "#004461", # class: 'nb'
Name.Builtin.Pseudo: "#3465a4", # class: 'bp' Name.Builtin.Pseudo: "#3465a4", # class: 'bp'
Name.Class: "#000000", # class: 'nc' - to be revised Name.Class: "#000000", # class: 'nc' - to be revised
Name.Constant: "#000000", # class: 'no' - to be revised Name.Constant: "#000000", # class: 'no' - to be revised
Name.Decorator: "#888", # class: 'nd' - to be revised Name.Decorator: "#888", # class: 'nd' - to be revised
Name.Entity: "#ce5c00", # class: 'ni' Name.Entity: "#ce5c00", # class: 'ni'
Name.Exception: "bold #cc0000", # class: 'ne' Name.Exception: "bold #cc0000", # class: 'ne'
Name.Function: "#000000", # class: 'nf' Name.Function: "#000000", # class: 'nf'
Name.Property: "#000000", # class: 'py' Name.Property: "#000000", # class: 'py'
Name.Label: "#f57900", # class: 'nl' Name.Label: "#f57900", # class: 'nl'
Name.Namespace: "#000000", # class: 'nn' - to be revised Name.Namespace: "#000000", # class: 'nn' - to be revised
Name.Other: "#000000", # class: 'nx' Name.Other: "#000000", # class: 'nx'
Name.Tag: "bold #004461", # class: 'nt' - like a keyword Name.Tag: "bold #004461", # class: 'nt' - like a keyword
Name.Variable: "#000000", # class: 'nv' - to be revised Name.Variable: "#000000", # class: 'nv' - to be revised
Name.Variable.Class: "#000000", # class: 'vc' - to be revised Name.Variable.Class: "#000000", # class: 'vc' - to be revised
Name.Variable.Global: "#000000", # class: 'vg' - to be revised Name.Variable.Global: "#000000", # class: 'vg' - to be revised
Name.Variable.Instance: "#000000", # class: 'vi' - to be revised Name.Variable.Instance: "#000000", # class: 'vi' - to be revised
Number: "#990000", # class: 'm'
Number: "#990000", # class: 'm' Literal: "#000000", # class: 'l'
Literal.Date: "#000000", # class: 'ld'
Literal: "#000000", # class: 'l' String: "#4e9a06", # class: 's'
Literal.Date: "#000000", # class: 'ld' String.Backtick: "#4e9a06", # class: 'sb'
String.Char: "#4e9a06", # class: 'sc'
String: "#4e9a06", # class: 's' String.Doc: "italic #8f5902", # class: 'sd' - like a comment
String.Backtick: "#4e9a06", # class: 'sb' String.Double: "#4e9a06", # class: 's2'
String.Char: "#4e9a06", # class: 'sc' String.Escape: "#4e9a06", # class: 'se'
String.Doc: "italic #8f5902", # class: 'sd' - like a comment String.Heredoc: "#4e9a06", # class: 'sh'
String.Double: "#4e9a06", # class: 's2' String.Interpol: "#4e9a06", # class: 'si'
String.Escape: "#4e9a06", # class: 'se' String.Other: "#4e9a06", # class: 'sx'
String.Heredoc: "#4e9a06", # class: 'sh' String.Regex: "#4e9a06", # class: 'sr'
String.Interpol: "#4e9a06", # class: 'si' String.Single: "#4e9a06", # class: 's1'
String.Other: "#4e9a06", # class: 'sx' String.Symbol: "#4e9a06", # class: 'ss'
String.Regex: "#4e9a06", # class: 'sr' Generic: "#000000", # class: 'g'
String.Single: "#4e9a06", # class: 's1' Generic.Deleted: "#a40000", # class: 'gd'
String.Symbol: "#4e9a06", # class: 'ss' Generic.Emph: "italic #000000", # class: 'ge'
Generic.Error: "#ef2929", # class: 'gr'
Generic: "#000000", # class: 'g' Generic.Heading: "bold #000080", # class: 'gh'
Generic.Deleted: "#a40000", # class: 'gd' Generic.Inserted: "#00A000", # class: 'gi'
Generic.Emph: "italic #000000", # class: 'ge' Generic.Output: "#888", # class: 'go'
Generic.Error: "#ef2929", # class: 'gr' Generic.Prompt: "#745334", # class: 'gp'
Generic.Heading: "bold #000080", # class: 'gh' Generic.Strong: "bold #000000", # class: 'gs'
Generic.Inserted: "#00A000", # class: 'gi' Generic.Subheading: "bold #800080", # class: 'gu'
Generic.Output: "#888", # class: 'go' Generic.Traceback: "bold #a40000", # class: 'gt'
Generic.Prompt: "#745334", # class: 'gp'
Generic.Strong: "bold #000000", # class: 'gs'
Generic.Subheading: "bold #800080", # class: 'gu'
Generic.Traceback: "bold #a40000", # class: 'gt'
} }


@ -21,26 +21,19 @@
{{ relbar() }} {{ relbar() }}
<div class="footer"> <div class="footer">
{% if show_copyright %}&copy;{{ copyright }}.{% endif %} &copy; 2012-2017, <a href="https://romain.dorgueil.net" target="_blank">Romain Dorgueil</a> |
{% if theme_show_powered_by|lower == 'true' %} <a href="https://www.bonobo-project.org/" target="_blank">Bonobo ETL</a>
{% if show_copyright %}|{% endif %}
Powered by <a href="http://sphinx-doc.org/">Sphinx {{ sphinx_version }}</a> {%- if show_source and has_source and sourcename %}
&amp; <a href="https://github.com/bitprophet/alabaster">Alabaster {{ alabaster_version }}</a> | <a href="{{ pathto('_sources/' + sourcename, true)|e }}" rel="nofollow" target="_blank">{{ _('Page source') }}</a>
{% endif %}
{%- if show_source and has_source and sourcename %}
{% if show_copyright or theme_show_powered_by %}|{% endif %}
<a href="{{ pathto('_sources/' + sourcename, true)|e }}"
rel="nofollow">{{ _('Page source') }}</a>
{%- endif %} {%- endif %}
</div> </div>
{% if theme_github_banner|lower != 'false' %} <a href="https://github.com/python-bonobo/bonobo" class="github">
<a href="https://github.com/{{ theme_github_user }}/{{ theme_github_repo }}" class="github">
<img style="position: absolute; top: 0; right: 0; border: 0;" <img style="position: absolute; top: 0; right: 0; border: 0;"
src="{{ pathto('_static/' ~ theme_github_banner, 1) if theme_github_banner|lower != 'true' else 'https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png' }}" src="{{ pathto('_static/' ~ theme_github_banner, 1) if theme_github_banner|lower != 'true' else 'https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png' }}"
alt="Fork me on GitHub" class="github"/> alt="Fork me on GitHub" class="github"/>
</a> </a>
{% endif %}
{% if theme_analytics_id %} {% if theme_analytics_id %}
<script type="text/javascript"> <script type="text/javascript">
@ -59,4 +52,12 @@
})(); })();
</script> </script>
{% endif %} {% endif %}
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-4678258-14"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-4678258-14');
</script>
{%- endblock %} {%- endblock %}


@ -13,8 +13,7 @@ as input.
By default, it uses a thread pool to execute all functions in parallel, and handle the movement of data rows in the By default, it uses a thread pool to execute all functions in parallel, and handle the movement of data rows in the
directed graph using simple fifo queues. directed graph using simple fifo queues.
It allows the user to focus on the content of the transformations, and not optimizing blocking or long operations, nor It allows the user to focus on the content of the transformations, rather than worrying about optimized blocking, long operations, threads, or subprocesses.
thinking about threads or subprocesses.
It's lean manufacturing for data. It's lean manufacturing for data.
@ -34,7 +33,7 @@ The main reasons about why 3.5+:
* Creating a tool that works well under both python 2 and 3 is a lot more work. * Creating a tool that works well under both python 2 and 3 is a lot more work.
* Python 3 is nearly 10 years old. Consider moving on. * Python 3 is nearly 10 years old. Consider moving on.
* Python 3.5 contains syntaxic sugar that makes working with data a lot more convenient. * Python 3.5+ contains syntactic sugar that makes working with data a lot more convenient (and fun).
Can a graph contain another graph? Can a graph contain another graph?


@ -68,6 +68,8 @@ processing while `B` and `C` are working.
BEGIN2 -> "B" -> "C"; BEGIN2 -> "B" -> "C";
} }
Now, we feed `C` with the output of both `A` and `B`. This is neither a "join" nor a "cartesian product". It is just two different
pipes plugged into `C`'s input, and whichever yields data will see that data fed to `C`, one row at a time.
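The difference from a join can be illustrated with plain threads and a FIFO queue. This is not Bonobo's implementation, only a minimal sketch under that assumption: two hypothetical producers stand in for `A` and `B`, and a shared queue stands in for the input of `C`.

```python
import queue
import threading


def producer(name, rows, out):
    # Each producer pushes its own rows, one at a time, into the shared input.
    for row in rows:
        out.put((name, row))


def run_fan_in():
    c_input = queue.Queue()  # stands in for the input of C
    threads = [
        threading.Thread(target=producer, args=('A', [1, 2, 3], c_input)),
        threading.Thread(target=producer, args=('B', ['x', 'y'], c_input)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    received = []
    while not c_input.empty():
        received.append(c_input.get())
    return received


received = run_fan_in()
print(len(received))  # 3 rows from A plus 2 rows from B
print(received)
```

`C` sees five rows in total: not six, as a cartesian product of the two inputs would give, and not rows paired up as in a join. The interleaving between `A` and `B` rows depends on scheduling, while the order within each producer is preserved.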
What is it not? What is it not?
::::::::::::::: :::::::::::::::