Library Usage#
File Analyzer#
Analyze file#
To create analyzer object, use:
from spoonbill import FileAnalyzer
from spoonbill.common import ROOT_TABLES, COMBINED_TABLES
analyzer = FileAnalyzer(
'.',
schema=path_to_schema,
root_tables=ROOT_TABLES,
combined_tables=COMBINED_TABLES,
language='en',
table_threshold=5,
)
To analyze file and track progress, use:
for bytes_read, count in analyzer.analyze_file(path_to_file):
print(f'analyzed {count} ({bytes_read})')
Storing state#
To dump state file after analysis, use:
analyzer.dump_to_file('analyzed.state')
Note
This sile may be re-used for new instance of analyzer. Can be used to omit analysis step in case of multiple flatteting of the same file.
To restore from state, use:
analyzer = FileAnalyzer('.', state_file='analyzed.state')
Flattener#
Flattening options#
To create flattening options and extract only table and split if its possible,(for example, tenders) use:
from spoonbill.flatten import FlattenOptions
options = FlattenOptions({"selection": {"tenders": {"split": True}}})
To select multiple tables (for example, tender and parties), use:
from spoonbill.flatten import FlattenOptions
options = FlattenOptions(**{
"selection": {
"tenders": {"split": True},
"parties": {"split": True}
}
})
Flatten file#
To flatten file, use:
from spoonbill import FileFlattener
flattener = FileFlattener(
'.',
options,
analyzer,
csv=True, # Generate csv files
xlsx=True, # Generate xlsx files
language='en',
)
for count in flattener.flatten_file(filename):
print(f'Flattened {count} items')
Note
Please note that flattening routine requires data to be analyzed beforehand.