API Reference
Flatten module
- class spoonbill.flatten.TableFlattenConfig(split, pretty_headers=False, headers=<factory>, repeat=<factory>, unnest=<factory>, only=<factory>, name='')
Table specific flattening configuration
- Parameters:
split (bool) – Split child arrays to separate tables
pretty_headers (bool) – Use human friendly headers extracted from schema
headers (Mapping[str, str]) – User edited headers to override automatically extracted
unnest (List[str]) – List of columns to output from child to parent table
repeat (List[str]) – List of columns to clone in child tables
name (str) – Overwrite table name
- class spoonbill.flatten.FlattenOptions(selection, exclude=<factory>, count=False)
Flattening configuration
- Parameters:
- selection: Mapping[str, TableFlattenConfig]#
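As a rough illustration of how the two configuration classes fit together, the sketch below defines stand-in dataclasses that mirror the documented signatures and builds one options object. The stand-ins are not the real spoonbill classes; the element types of `only` and `exclude`, the table names `tenders` and `parties`, and the path `/tender/id` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Mapping


# Stand-in mirroring the documented TableFlattenConfig signature (not the real class).
@dataclass
class TableFlattenConfig:
    split: bool
    pretty_headers: bool = False
    headers: Mapping[str, str] = field(default_factory=dict)
    repeat: List[str] = field(default_factory=list)
    unnest: List[str] = field(default_factory=list)
    only: List[str] = field(default_factory=list)  # element type is an assumption
    name: str = ""


# Stand-in mirroring the documented FlattenOptions signature (not the real class).
@dataclass
class FlattenOptions:
    selection: Mapping[str, TableFlattenConfig]
    exclude: List[str] = field(default_factory=list)  # element type is an assumption
    count: bool = False


# Hypothetical table names and column path, chosen only for illustration.
options = FlattenOptions(
    selection={
        "tenders": TableFlattenConfig(split=True, repeat=["/tender/id"]),
        "parties": TableFlattenConfig(split=False, name="organizations"),
    },
    count=True,
)
```

With the library installed, the same construction should presumably work with the classes imported from `spoonbill.flatten`, since only the documented keyword arguments are used.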
- class spoonbill.flatten.Flattener(options, tables, language='en')
Data flattener
In order to export data correctly, Flattener requires previously analyzed table data. During the process, the flattener may add columns that are not based on schema analysis, such as itemsCount. In every generated row, depending on the table type, the flattener always adds a few autogenerated columns. For the root table:
* rowID
* id
* ocid
For child tables, this list is extended with a parentID column.
- Parameters:
options (FlattenOptions) – Flattening options
language – Language to use for the human-readable headings
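The autogenerated columns described for Flattener can be pictured with a small sketch. It only encodes the column names listed above; the helper `empty_row` and the use of `None` as a placeholder value are hypothetical and not part of spoonbill.

```python
# Column names taken from the Flattener description above.
ROOT_COLUMNS = ("rowID", "id", "ocid")
CHILD_COLUMNS = ROOT_COLUMNS + ("parentID",)


def empty_row(is_root):
    """Return a row holding only the autogenerated columns (hypothetical helper)."""
    names = ROOT_COLUMNS if is_root else CHILD_COLUMNS
    return {name: None for name in names}


root_row = empty_row(is_root=True)
child_row = empty_row(is_root=False)
```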
CLI module
cli.py - Command line interface related routines
- class spoonbill.cli.CommaSeparated
Click option type to convert comma separated string into list
- convert(value, param, ctx)
Convert the value to the correct type. This is not called if the value is None (the missing value).
This must accept string values from the command line, as well as values that are already the correct type. It may also convert other compatible types.
The param and ctx arguments may be None in certain situations, such as when converting prompt input.
If the value cannot be converted, call fail() with a descriptive message.
- Parameters:
value – The value to convert.
param – The parameter that is using this type to convert its value. May be None.
ctx – The current context that arrived at this value. May be None.
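The conversion contract described for convert() can be sketched without Click as a plain function. This is a minimal stand-in, not the CommaSeparated implementation: the function name, the whitespace trimming, the dropping of empty items, and the use of `ValueError` in place of `fail()` are all assumptions.

```python
def convert_comma_separated(value):
    """Turn 'a,b,c' into ['a', 'b', 'c']; pass lists through unchanged (sketch)."""
    # Values that are already the correct type are accepted as-is.
    if isinstance(value, (list, tuple)):
        return list(value)
    # String values from the command line are split on commas.
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    # Stand-in for calling fail() with a descriptive message.
    raise ValueError(f"cannot convert {value!r} to a list of strings")


parsed = convert_comma_separated("tenders, awards,,parties")
```

None is not handled here because, as noted above, the conversion is not called for a missing value.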
Spec module
- class spoonbill.spec.Column(id, path, title, type, hits=0, header=<factory>)
A container for column information.
- Parameters:
- class spoonbill.spec.Table(name, path, total_rows=0, parent=<factory>, is_root=False, is_combined=False, splitted=False, rolled_up=False, columns=<factory>, combined_columns=<factory>, additional_columns=<factory>, arrays=<factory>, titles=<factory>, child_tables=<factory>, types=<factory>, array_columns=<factory>, array_positions=<factory>, preview_rows=<factory>, preview_rows_combined=<factory>)
A container for table information.
- Parameters:
name (str) – Table name
path (List[str]) – List of paths to gather data to this table
total_rows (int) – Total available rows in this table
parent (object) – Parent table, None if this table is root table
is_root (bool) – This table is root table
is_combined (bool) – This table contains data collected from different paths
splitted (bool) – This table should be split
rolled_up (bool) – This table should be separated from its parent
columns (Mapping[str, Column]) – Columns extracted from schema for split version of this table
combined_columns (Mapping[str, Column]) – Columns extracted from schema for unsplit version of this table
additional_columns (Mapping[str, Column]) – Columns identified in dataset but not in schema
arrays (Mapping[str, int]) – Table array columns and maximum items (not the total count) in each array
titles (Mapping[str, str]) – All human-friendly column titles, extracted from the schema
types (Mapping[str, List[str]]) – All paths matched to this table with corresponding object type on each path
preview_rows (Sequence[dict]) – Generated preview for split version of this table
preview_rows_combined (Sequence[dict]) – Generated preview for unsplit version of this table
- missing_rows(split=True)
Return the columns that are available in the schema, but not present in the analyzed data.
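One way to picture this lookup is to filter schema columns by their hit count. The sketch below assumes that a column with `hits == 0` was never seen in the analyzed data; the reduced `Column` stand-in, the helper `missing_columns`, and the sample paths are hypothetical and may differ from how spoonbill decides this.

```python
from dataclasses import dataclass


# Reduced stand-in for spoonbill.spec.Column (the header field is omitted).
@dataclass
class Column:
    id: str
    path: str
    title: str
    type: str
    hits: int = 0


def missing_columns(columns):
    """Return schema columns that never received a value (assumes hits == 0)."""
    return {key: col for key, col in columns.items() if col.hits == 0}


# Hypothetical column paths, for illustration only.
columns = {
    "/tender/id": Column("/tender/id", "/tender/id", "Tender ID", "string", hits=3),
    "/tender/title": Column("/tender/title", "/tender/title", "Title", "string"),
}
missing = missing_columns(columns)
```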
- add_column(path, item_type, title, *, propagated=False, additional=False, abs_path=None, header=[])
Add a new column to the table.
- Parameters:
path – The column’s path
item_type – The column’s expected type
title – Column title
combined_only – Make this column available only in combined version of table
propagated – Add column to parent table
additional – Mark this column as missing in schema
abs_path – The column’s full JSON path
- inc_column(abs_path, path)
Increment the number of non-empty cells in the column.
- Parameters:
abs_path – The column’s full JSON path
path – The column’s JSON path without array indexes
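The relationship between the two path arguments can be illustrated with a short sketch. It assumes slash-separated paths in which array indexes appear as purely numeric segments; the helpers `strip_array_indexes` and `count_cell` and the `hits` counter are hypothetical and only approximate what inc_column does.

```python
from collections import Counter


def strip_array_indexes(abs_path):
    """Drop numeric segments: '/tender/items/0/id' becomes '/tender/items/id'."""
    return "/".join(part for part in abs_path.split("/") if not part.isdigit())


# Hypothetical per-column counter standing in for Column.hits.
hits = Counter()


def count_cell(abs_path):
    """Record one non-empty cell under the index-free column path."""
    path = strip_array_indexes(abs_path)
    hits[path] += 1
    return path


for cell in ("/tender/items/0/id", "/tender/items/1/id", "/tender/id"):
    count_cell(cell)
```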
- spoonbill.spec.add_child_table(table, pointer, parent_key, key)
Create and append a new child table to the given table.
- Parameters:
table – The parent table to the newly created table
pointer – Path to which table should match
parent_key – Field name of the new table’s parent object, used to generate the table name
key – Field name of the new table’s object, used to generate the table name
- Returns:
Child table
Stats module
- class spoonbill.stats.DataPreprocessor(schema, root_tables, combined_tables=None, tables=None, table_threshold=5, total_items=0, language='en', multiple_values=False, pkg_type=None, with_preview=True)
Data analyzer
Processes the given schema and, based on this, extracts information from the iterable dataset.
- Parameters:
schema (Mapping) – The dataset’s schema
root_tables (Mapping[str, List]) – The paths which should become root tables
combined_tables (Mapping[str, List]) – The paths which should become tables that combine data from different locations
tables (Mapping[str, Table]) – Use these table objects instead of parsing the schema
table_threshold – The maximum array length, before it is recommended to split out a child table
total_items – The total objects processed
language – Language to use for the human-readable headings
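The role of table_threshold can be sketched against a mapping shaped like Table.arrays (array path to the maximum number of items seen). This is an illustration only: the helper name, the sample paths, and the strict greater-than comparison are assumptions, and the library may apply the threshold differently.

```python
def arrays_to_split(arrays, table_threshold=5):
    """List array paths whose longest array exceeds the threshold (sketch)."""
    return [path for path, max_items in arrays.items() if max_items > table_threshold]


# Hypothetical analysis result: array path -> maximum items seen in one array.
arrays = {"/tender/items": 12, "/tender/documents": 2, "/parties": 5}
recommended = arrays_to_split(arrays)
```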
- get_table(path)
Get the table that best matches the given path.
- Parameters:
path – A path
- Returns:
A table
- add_preview_row(rows, item_id, parent_key)
Append a mostly-empty row to the previews.
This is important to do, because other code uses an index of -1 to access and update the current row.
- Parameters:
rows – The Rows object
item_id – Object id
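The reason for appending a mostly-empty row first is easier to see in a sketch. A plain list stands in for the preview here, and the helper names and the keys placed in the new row are assumptions; the point is only that later updates address the current row through index -1.

```python
def append_preview_row(preview, item_id, parent_key):
    """Append a mostly-empty row so that preview[-1] is the current row (sketch)."""
    preview.append({"id": item_id, "parentID": parent_key})


def set_cell(preview, column, value):
    """Update the current row, which is always the last one appended."""
    preview[-1][column] = value


preview = []
append_preview_row(preview, "item-1", "release-1")
set_cell(preview, "/tender/title", "Road works")
append_preview_row(preview, "item-2", "release-1")
set_cell(preview, "/tender/title", "Bridge repair")
```

Without the append step, the second `set_cell` call would overwrite the first row instead of filling a new one.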
- process_items(releases, with_preview=True)
Analyze releases.
Iterates over every release to calculate metrics and optionally generates previews for combined and split versions of each table.
- Parameters:
releases – The releases to analyze
with_preview – Whether to generate previews for each table
Writer modules
- class spoonbill.writers.base_writer.BaseWriter(workdir, tables, options, schema)
Base writer class
- __init__(workdir, tables, options, schema)
- Parameters:
workdir – Working directory
tables – The table objects
options – Flattening options