Documents and Fragments¶

The geniza.corpus application is the heart of this project. The most important models are Document and Fragment, with a number of supporting models to track the source of the fragment, document type, languages and scripts used in a document, etc.

models¶

class geniza.corpus.models.Collection(*args, **kwargs)[source]¶

Collection or library that holds Geniza fragments

exception DoesNotExist¶

exception MultipleObjectsReturned¶

property full_name¶: attempt to combine library and collection name into a human readable format

natural_key()[source]¶: natural key: tuple of name and library

class geniza.corpus.models.CollectionManager(*args, **kwargs)[source]¶

Custom manager for Collection with natural key lookup

get_by_natural_key(name, library)[source]¶: get by natural key: combination of name and library

class geniza.corpus.models.Dating(*args, **kwargs)[source]¶

An inferred date for a document.

exception DoesNotExist¶

exception MultipleObjectsReturned¶

property standard_date_display¶: Standard date in human-readable format for document details pages

class geniza.corpus.models.Document(*args, **kwargs)[source]¶

A unified document such as a letter or legal document that appears on one or more fragments.

exception DoesNotExist¶

exception MultipleObjectsReturned¶

admin_thumbnails()[source]¶: generate html for thumbnails of all iiif images, for image reordering UI in admin

all_languages()[source]¶: comma delimited string of all primary languages for this document

all_secondary_languages()[source]¶: comma delimited string of all secondary languages for this document

attribution()[source]¶: Generate a tuple of three attribution components for use in IIIF manifests or wherever images/transcriptions need attribution.

property available_digital_content¶: Helper method for the ITT viewer to collect all available panels into a list

clean_fields(exclude: Collection[str] | None = None) → None¶: Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.

property collection¶: collection (abbreviation) for associated fragments

property collections¶: collection objects for associated fragments

dating_range()[source]¶: Return the start and end of the document’s possible date range, as PartialDate objects, including standardized document dates and inferred Datings, if any exist.

property default_translation¶: The first translation footnote that is in the current language, or the first translation footnote ordered alphabetically by source if one is not available in the current language.

digital_editions()[source]¶: All footnotes for this document where the document relation includes digital edition.

digital_footnotes()[source]¶: All footnotes for this document where the document relation includes digital edition or digital translation.

digital_translations()[source]¶: All footnotes for this document where the document relation includes digital translation.

editions()[source]¶: All footnotes for this document where the document relation includes edition.

editors()[source]¶: All unique authors of digital editions for this document.

property formatted_citation¶: a formatted citation for display at the bottom of Document detail pages

property fragment_historical_shelfmarks¶: Property to display set of all historical shelfmarks on the document

fragment_urls()[source]¶: List of external URLs to view the Document’s Fragments.

property fragments_by_provenance¶: Associated fragments ordered by provenance_display, if set

fragments_other_docs()[source]¶: List of other documents that are on the same fragment(s) as this document (does not include suppressed documents). Returns a list of Document objects.

classmethod from_manifest_uri(uri)[source]¶: Given a manifest URI (as used in transcription annotations), find a Document matching its pgpid

get_absolute_url()[source]¶: url for this document

get_deferred_fields() → set[str]¶: Return a set containing names of deferred fields for this instance.

has_digital_content()[source]¶: Helper method for the ITT viewer on the public front-end to determine whether a document has any images, digital editions, or digital translations.

has_image()[source]¶: Admin display field indicating if document has a IIIF image.

has_transcription()[source]¶: Admin display field indicating if document has a transcription.

has_translation()[source]¶

Helper method to determine if document has a translation.

Returns:: Whether document has translation
Return type:: bool

iiif_images(filter_side=False, with_placeholders=False, thumbnail=False)[source]¶

Dict of IIIF images and labels for images of the Document’s Fragments, keyed on canvas.

Parameters:

filter_side – if TextBlocks have side info, filter images by side (default: False)
with_placeholders – if there are digital editions with canvases missing images, include placeholder images for each additional canvas (default: False)

iiif_urls()[source]¶: List of IIIF urls for images of the Document’s Fragments.

index_data()[source]¶: data for indexing in Solr

is_public()[source]¶: admin display field indicating if doc is public or suppressed

classmethod items_to_index()[source]¶: Custom logic for finding items to be indexed when indexing in bulk.

list_thumbnail()[source]¶: generate html for thumbnail of first image, for display in related documents lists

merge_with(merge_docs, rationale, user=None)[source]¶: Merge the specified documents into this one. Combines all metadata into this document, adds the merged documents into list of old PGP IDs, and creates a log entry documenting the merge, including the rationale.

classmethod prep_index_chunk(chunk)[source]¶: Prefetch related information when indexing in chunks (modifies queryset chunk in place)

property primary_lang_code¶: Primary language code for this document, when there is only one primary language set and it has an ISO code available. Returns None if unset or unavailable.

property primary_script¶: Primary script for this document, if shared across all primary languages.

refresh_from_db(using: str | None = None, fields: Iterable[str] | None = None, from_queryset: QuerySet[Any] | None = None) → None¶

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

property related_documents¶: List of other documents with any of the same shelfmarks as this document; does not include suppressed documents. Queries Solr and returns a list of dict objects.

save(*args, **kwargs)[source]¶

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

property shelfmark¶: shelfmarks for associated fragments

property shelfmark_display¶: Label for this document; by default, based on the combined shelfmarks from all certain associated fragments; uses shelfmark_override if set

solr_dating_range()[source]¶: Return the document’s dating range, including inferred, as a Solr date range.

sources()[source]¶: All unique sources attached to footnotes on this document.

status¶: status of record; currently choices are public or suppressed

property title¶: Short title for identifying the document, e.g. via search.

classmethod total_to_index()[source]¶: class method to efficiently count the number of documents to index in Solr

class geniza.corpus.models.DocumentEventRelation(*args, **kwargs)[source]¶

A relationship between a document and an event

exception DoesNotExist¶

exception MultipleObjectsReturned¶

class geniza.corpus.models.DocumentQuerySet(*args: Any, **kwargs: Any)[source]¶

get_by_any_pgpid(pgpid)[source]¶: Find a document by current or old pgpid

metadata_prefetch()[source]¶: Returns a further QuerySet that has been prefetched for relevant document information.

class geniza.corpus.models.DocumentSignalHandlers[source]¶

Signal handlers for indexing Document records when related records are saved or deleted.

static related_change(instance, raw, mode)[source]¶: reindex all associated documents when related data is changed

static related_delete(sender, instance=None, raw=False, **_kwargs)[source]¶: reindex associated documents when a related object is deleted

static related_save(sender, instance=None, raw=False, **_kwargs)[source]¶: reindex associated documents when a related object is saved

class geniza.corpus.models.DocumentType(*args, **kwargs)[source]¶

Controlled vocabulary of document types.

exception DoesNotExist¶

exception MultipleObjectsReturned¶

clean_fields(exclude: Collection[str] | None = None) → None¶: Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.

get_deferred_fields() → set[str]¶: Return a set containing names of deferred fields for this instance.

class property objects_by_label¶: A dict of object instances keyed on English display label, used for search form and search results, which should be based on Solr facet and query responses (indexed in English).

refresh_from_db(using: str | None = None, fields: Iterable[str] | None = None, from_queryset: QuerySet[Any] | None = None) → None¶

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

class geniza.corpus.models.DocumentTypeManager(*args, **kwargs)[source]¶

Custom manager for DocumentType with natural key lookup

get_by_natural_key(name)[source]¶: natural key lookup, based on name

class geniza.corpus.models.Fragment(*args, **kwargs)[source]¶

A single fragment or multifragment held by a particular library or archive.

exception DoesNotExist¶

exception MultipleObjectsReturned¶

static admin_thumbnails(images, labels, canvases=[], selected=[])[source]¶: Convenience method for generating IIIF thumbnails HTML from lists of images and labels; separated for reuse in Document

property attribution¶: Generate an attribution for this fragment

clean()[source]¶: Custom validation and cleaning; currently only clean_iiif_url()

clean_iiif_url()[source]¶: Remove redundant manifest parameter from IIIF url when present

iiif_images(allow_network_reqs=True)[source]¶: IIIF image URLs for this fragment. Returns a list of IIIFImageClient and corresponding list of labels, or None if this fragment has no IIIF url associated.

property iiif_provenance¶: Generate a provenance statement for this fragment from IIIF

iiif_thumbnails(selected=[])[source]¶: html for thumbnails of iiif image, for display in admin

natural_key()[source]¶: natural key: shelfmark

save(*args, **kwargs)[source]¶: Remember how shelfmarks have changed by keeping a semicolon-delimited list in the old_shelfmarks field
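The shelfmark history kept by save() can be illustrated with a small standalone function. This is only a sketch: the helper name `track_old_shelfmark`, the `"; "` separator, and the de-duplication behavior are assumptions for illustration, not the actual Fragment.save() implementation.

```python
def track_old_shelfmark(old_shelfmarks: str, previous: str, current: str) -> str:
    """Return an updated semicolon-delimited history of former shelfmarks.

    Hypothetical helper; the real logic lives in Fragment.save().
    """
    # nothing to record if the fragment is new or the shelfmark is unchanged
    if not previous or previous == current:
        return old_shelfmarks
    # split the existing history, dropping empty entries
    history = [s.strip() for s in old_shelfmarks.split(";") if s.strip()]
    # record the previous shelfmark only once
    if previous not in history:
        history.append(previous)
    return "; ".join(history)
```

Called with an empty history, a previous shelfmark, and a new one, the function returns a history containing just the previous shelfmark; later changes append to it.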

class geniza.corpus.models.FragmentManager(*args, **kwargs)[source]¶

Custom manager for Fragment with natural key lookup

get_by_natural_key(shelfmark)[source]¶: get fragment by natural key: shelfmark

class geniza.corpus.models.LanguageScript(*args, **kwargs)[source]¶

Combination language and script

exception DoesNotExist¶

exception MultipleObjectsReturned¶

natural_key()[source]¶: natural key: tuple of language and script

class geniza.corpus.models.LanguageScriptManager(*args, **kwargs)[source]¶

Custom manager for LanguageScript with natural key lookup

get_by_natural_key(language, script)[source]¶: get by natural key: combination of language and script

class geniza.corpus.models.PermalinkMixin[source]¶: Mixin to generate a permalink for Django model objects by removing language code from the object’s absolute URL.

class geniza.corpus.models.Provenance(*args, **kwargs)[source]¶

A provenance designation for a Fragment.

exception DoesNotExist¶

exception MultipleObjectsReturned¶

clean_fields(exclude: Collection[str] | None = None) → None¶: Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.

get_deferred_fields() → set[str]¶: Return a set containing names of deferred fields for this instance.

refresh_from_db(using: str | None = None, fields: Iterable[str] | None = None, from_queryset: QuerySet[Any] | None = None) → None¶

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

class geniza.corpus.models.TagSignalHandlers[source]¶

Signal handlers for taggit.Tag records.

static tagged_item_change(sender, instance, action, **kwargs)[source]¶: Ensure document (=instance) is indexed after the tags m2m relationship is saved and the list of tags is pulled from the database, on any tag change.

static unidecode_tag(sender, instance, **kwargs)[source]¶: Convert saved tags to ascii, stripping diacritics.
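As a rough illustration of stripping diacritics from tag names, the sketch below uses only the standard library. The handler's name suggests the third-party unidecode package; this example swaps in `unicodedata` instead, which is an approximation: characters with no ASCII decomposition are dropped rather than transliterated. The function name `to_ascii` is hypothetical.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate ASCII conversion (stand-in for a transliteration library)."""
    # decompose accented characters into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # drop anything that is not plain ASCII, i.e. the combining marks
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

With this approach, tags that differ only in diacritics collapse to the same ASCII form.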

class geniza.corpus.models.TextBlock(*args, **kwargs)[source]¶

The portion of a document that appears on a particular fragment.

exception DoesNotExist¶

exception MultipleObjectsReturned¶

property side¶: Recto/verso side information based on selected image indices

thumbnail()[source]¶: iiif thumbnails for this TextBlock, with selected images highlighted

geniza.corpus.models.detach_document_logentries(sender, instance, **kwargs)[source]¶

Document pre-delete signal handler.

To avoid deleting log entries caused by the generic relation from document to log entries, clear out object id for associated log entries before deleting the document.

dates¶

class geniza.corpus.dates.Calendar[source]¶

Codes for supported calendars

ANNO_MUNDI = 'am'¶: Anno Mundi calendar (Hebrew)

HIJRI = 'h'¶: Hijri calendar (Islamic)

KHARAJI = 'k'¶: Kharaji calendar

SELEUCID = 's'¶: Seleucid calendar

SELEUCID_OFFSET = 3449¶: offset for Seleucid calendar: Anno Mundi - 3449

can_convert = ['am', 'h', 's']¶: calendars that can be converted to Julian/Gregorian
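The Seleucid offset is plain year arithmetic. The sketch below restates the documented relationship (Seleucid = Anno Mundi - 3449); the function names are hypothetical, and how geniza.corpus.dates applies the offset internally is not shown here. It is at least consistent with the Seleucid code sharing the Hebrew converter in calendar_converter.

```python
SELEUCID_OFFSET = 3449  # value documented on Calendar.SELEUCID_OFFSET


def seleucid_to_anno_mundi(seleucid_year: int) -> int:
    """Seleucid = Anno Mundi - 3449, so Anno Mundi = Seleucid + 3449."""
    return seleucid_year + SELEUCID_OFFSET


def anno_mundi_to_seleucid(am_year: int) -> int:
    """Inverse of the conversion above."""
    return am_year - SELEUCID_OFFSET
```

For example, Seleucid year 1458 corresponds to Anno Mundi 4907 under this offset.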

class geniza.corpus.dates.DocumentDateMixin(*args, **kwargs)[source]¶

Mixin for document date fields (original and standardized), and related logic for displaying, converting, and validating dates.

clean()[source]¶: Require doc_date_original and doc_date_calendar to be set if either one is present.
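The paired-field rule in clean() can be sketched without Django. The function name and the use of ValueError are assumptions made to keep the example dependency-free; the real method presumably raises a Django ValidationError.

```python
def check_original_date(doc_date_original: str, doc_date_calendar: str) -> bool:
    """Return True if both fields are set or both are empty.

    Hypothetical stand-in for the validation in DocumentDateMixin.clean().
    """
    # exactly one of the two being set is the invalid case
    if bool(doc_date_original) != bool(doc_date_calendar):
        raise ValueError("Original date and calendar must be set together")
    return True
```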

property document_date¶: Property: formatted display of combined original and standardized dates

property end_date¶: Return the end date of the document’s standardized date or date range, if set.

classmethod get_document_date(doc_date_standard, original_date)[source]¶: Generate formatted display of combined original and standardized dates

property original_date¶: Generate formatted display for the document’s original/historical date

property parsed_date¶: Parse standard date (if set) and return as dictionary of start/end PartialDate objects

solr_date_range()[source]¶: Return a Solr date range for the document’s standardized date.

standardize_date(update=False)[source]¶: Convert the document’s original date to a standardized date, if possible. If update is requested, will store the converted value on doc_date_standard

property start_date¶: Return the start date of the document’s standardized date or date range, if set.

class geniza.corpus.dates.PartialDate(str)[source]¶

Simple partial date object to handle parsing and display of dates in the format YYYY, YYYY-MM, or YYYY-MM-DD. Display format is based on known precision of year, month, or day.

display_format = {'day': 'j F Y', 'month': 'F Y', 'year': 'Y'}¶: public display format based on date precision

static get_date_range(old_range, new_range)[source]¶: Compute the union (widest possible date range) between two PartialDate ranges.
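The "widest possible range" computation can be sketched as follows. This example uses `datetime.date` values as a stand-in for PartialDate objects and treats `None` as "unset"; both choices, and the function name, are assumptions rather than the actual method's behavior.

```python
from datetime import date


def union_date_range(old_range, new_range):
    """Return the widest (start, end) pair covering both input ranges."""
    # collect the known endpoints, ignoring unset (None) values
    starts = [r[0] for r in (old_range, new_range) if r[0] is not None]
    ends = [r[1] for r in (old_range, new_range) if r[1] is not None]
    # earliest start and latest end give the union of the two ranges
    return (min(starts) if starts else None, max(ends) if ends else None)
```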

iso_format = {'day': '%Y-%m-%d', 'month': '%Y-%m', 'year': '%Y'}¶: ISO format based on date precision

isoformat(mode='min', fmt='precision')[source]¶

Display partial date in ISO format. By default, will display YYYY, YYYY-MM, or YYYY-MM-DD according to known precision. If min or max is requested, will display YYYY-MM-DD for earliest or latest date based on known precision.

Parameters:

mode – how to fill in unknowns: min, or max (default: min)
fmt – format: precision (default), isoformat, or numeric

num_fmt = '%Y%m%d'¶: numeric format for indexing and sorting

numeric_format(mode='min')[source]¶: Date in numeric format for sorting; max or min for unknowns. See isoformat() for more details.
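To make the precision and min/max behavior concrete, here is a minimal standalone sketch. `SimplePartialDate` is a hypothetical class written for this example, not the geniza PartialDate; it formats values by hand and uses the standard library's proleptic Gregorian calendar rules, which may differ from how the real class treats historical dates.

```python
import calendar
from datetime import date


class SimplePartialDate:
    """Hypothetical stand-in for PartialDate (not the geniza class)."""

    def __init__(self, value: str):
        # accept YYYY, YYYY-MM, or YYYY-MM-DD
        parts = [int(part) for part in value.split("-")]
        self.year = parts[0]
        self.month = parts[1] if len(parts) > 1 else None
        self.day = parts[2] if len(parts) > 2 else None
        # precision is the most specific component that is known
        self.precision = ("year", "month", "day")[len(parts) - 1]

    def _filled(self, mode: str) -> date:
        """Fill unknown month/day with the earliest (min) or latest (max) value."""
        month = self.month or (1 if mode == "min" else 12)
        if self.day:
            day = self.day
        elif mode == "min":
            day = 1
        else:
            # last day of the month (proleptic Gregorian leap-year rules)
            day = calendar.monthrange(self.year, month)[1]
        return date(self.year, month, day)

    def isoformat(self, mode: str = "min", fmt: str = "precision") -> str:
        if fmt == "precision":
            # only show the components that are actually known
            known = [f"{self.year:04d}"]
            known += [f"{n:02d}" for n in (self.month, self.day) if n]
            return "-".join(known)
        filled = self._filled(mode)
        # numeric format drops the separators: YYYYMMDD
        sep = "" if fmt == "numeric" else "-"
        return sep.join(
            [f"{filled.year:04d}", f"{filled.month:02d}", f"{filled.day:02d}"]
        )
```

A year-only value therefore displays as `YYYY` by default, but expands to the first or last day of that year when a full date is requested for indexing or sorting.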

geniza.corpus.dates.calendar_converter = {'am': convertdate.hebrew, 'h': convertdate.islamic, 's': convertdate.hebrew}¶: mapping between supported calendars and corresponding convertdate module

geniza.corpus.dates.convert_hebrew_date(historic_date)[source]¶: Convert a date in the Hebrew Anno Mundi calendar to the Julian or Gregorian calendar

geniza.corpus.dates.convert_islamic_date(historic_date)[source]¶: Convert a date in the Islamic Hijri calendar to the Julian or Gregorian calendar

geniza.corpus.dates.convert_seleucid_date(historic_date)[source]¶: Convert a date in the Greek Seleucid calendar to the Julian or Gregorian calendar

geniza.corpus.dates.display_date_range(earliest, latest)[source]¶: display a date range or single date in isoformat

geniza.corpus.dates.get_calendar_date(converter, year, month=None, day=None, mode=None)[source]¶: Convert a date from a supported calendar and return as a datetime.date or tuple of dates for a date range, when the conversion is ambiguous. Takes year and optional month and day.

geniza.corpus.dates.get_calendar_month(convertdate_module, month)[source]¶

Convert month name to month number for the specified calendar.

Parameters:

convertdate_module – convertdate calendar module to use
month – string month name

Returns:: month number
Return type:: int

geniza.corpus.dates.get_hebrew_month(month_name)[source]¶: Convert Hebrew month name to month number. Supports local month name aliases for alternate spellings.

geniza.corpus.dates.get_islamic_month(month_name)[source]¶: Convert Islamic month name to month number; works with or without accents, and supports local month-name overrides.

geniza.corpus.dates.re_original_date = re.compile('(?:(?P<weekday>\\w+day),? )?(?:(?P<day>\\d+) )?(?:(?P<month>[^\\d]+( I{1,2})?) )?(?P<year>\\d{3,4})')¶: regular expression for extracting information from original date string
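The pattern is easier to read when exercised on a few inputs. The sketch below copies the expression from the repr above, with the doubled backslashes unescaped; the helper `parse_original_date` and the choice of `match()` over `search()` are assumptions for illustration, and the sample date strings are invented.

```python
import re

# pattern copied from the repr above, with doubled backslashes unescaped
re_original_date = re.compile(
    r"(?:(?P<weekday>\w+day),? )?(?:(?P<day>\d+) )?"
    r"(?:(?P<month>[^\d]+( I{1,2})?) )?(?P<year>\d{3,4})"
)


def parse_original_date(text: str):
    """Return the named groups as a dict, or None if the text does not match."""
    match = re_original_date.match(text)
    return match.groupdict() if match else None


# weekday, day, and month are all optional; only the year is required
full = parse_original_date("Thursday, 12 Sivan 4795")
year_only = parse_original_date("1256")
```

Components missing from the input come back as `None` in the resulting dictionary, so callers can tell a year-only date from a fully specified one.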

geniza.corpus.dates.standard_date_display(standard_date)[source]¶: Display a standardized CE date in human readable format.

geniza.corpus.dates.standardize_date(historic_date, calendar)[source]¶: convert a historic date in text format, in the specified calendar, to a standard date range

metadata export¶

class geniza.corpus.metadata_export.AdminDocumentExporter(queryset=None, progress=False)[source]¶

get_export_data_dict(doc)[source]¶: Adding certain fields to DocumentExporter.get_export_data_dict that are admin-only.

class geniza.corpus.metadata_export.AdminFragmentExporter(queryset=None, progress=False)[source]¶

Admin fragment export variant; adds notes, review, and admin url fields.

get_export_data_dict(fragment)[source]¶

A given Exporter class (DocumentExporter, FootnoteExporter, etc) must implement this function. It ought to return a dictionary of exported information for a given object.

Parameters:: obj (object) – Model object (document, footnote, etc)
Raises:: NotImplementedError – This method must be implemented by subclasses

class geniza.corpus.metadata_export.DocumentExporter(queryset=None, progress=False)[source]¶

A subclass of geniza.common.metadata_export.Exporter that exports information relating to Document. Extends get_queryset() and get_export_data_dict().

get_export_data_dict(doc)[source]¶

Get back data about a document in dictionary format.

Parameters:: doc (Document) – A given Document object
Returns:: Dictionary of data about the document
Return type:: dict

get_queryset()[source]¶

Applies some prefetching to the base Exporter’s get_queryset functionality.

Returns:: Custom-given query set or query set of all documents
Return type:: QuerySet

model¶: alias of Document

class geniza.corpus.metadata_export.FragmentExporter(queryset=None, progress=False)[source]¶

A subclass of geniza.common.metadata_export.Exporter that exports information relating to Fragment.

get_export_data_dict(fragment)[source]¶

A given Exporter class (DocumentExporter, FootnoteExporter, etc) must implement this function. It ought to return a dictionary of exported information for a given object.

Parameters:: obj (object) – Model object (document, footnote, etc)
Raises:: NotImplementedError – This method must be implemented by subclasses

get_queryset()[source]¶

Applies some prefetching to the base Exporter’s get_queryset functionality.

Returns:: Custom-given query set or query set of all documents
Return type:: QuerySet

model¶: alias of Fragment

class geniza.corpus.metadata_export.PublicDocumentExporter(queryset=None, progress=False)[source]¶

Public version of the document exporter. It can, for example, override get_queryset() to ensure that only public documents are exported.

get_queryset()[source]¶

Applies some prefetching to the base Exporter’s get_queryset functionality.

Returns:: Custom-given query set or query set of all documents
Return type:: QuerySet

class geniza.corpus.metadata_export.PublicFragmentExporter(queryset=None, progress=False)[source]¶

Public version of the fragment exporter; limits fragments to those associated with public documents. Unassociated fragments or fragments only linked to suppressed documents are not included.

get_queryset()[source]¶

Applies some prefetching to the base Exporter’s get_queryset functionality.

Returns:: Custom-given query set or query set of all documents
Return type:: QuerySet

views¶

class geniza.corpus.views.DocumentAddTranscriptionView(**kwargs)[source]¶

get_context_data(**kwargs)[source]¶: Pass form with autocomplete to context

model¶: alias of Document

page_title()[source]¶: Title of add transcription/translation page

post(request, *args, **kwargs)[source]¶: Create footnote linking source to document, then redirect to edit transcription/translation view

class geniza.corpus.views.DocumentAnnotationListView(**kwargs)[source]¶

Generate a IIIF Annotation List for a document to make transcription content available for inclusion in local IIIF manifest.

get(request, *args, **kwargs)[source]¶: handle GET request: construct and return JSON annotation list

viewname = 'corpus-uris:document-annotations'¶: bound name of this view, for use in generating absolute url for redirect

class geniza.corpus.views.DocumentDetailBase(**kwargs)[source]¶

View mixin to handle lastmodified and redirects for documents with old PGPIDs. Overrides get request in the case of a 404, looking for any records with passed PGPID in old_pgpids, and if found, redirects to that document with current PGPID.

get(request, *args, **kwargs)[source]¶: extend GET to check for old pgpid and redirect on 404

get_solr_lastmodified_filters()[source]¶: Filter solr last modified query by pgpid

class geniza.corpus.views.DocumentDetailView(**kwargs)[source]¶

public display of a single Document

get_absolute_url()[source]¶: Get the permalink to this page.

get_context_data(**kwargs)[source]¶: extend context data to add page metadata

get_queryset(*args, **kwargs)[source]¶: Don’t show document if it isn’t public

model¶: alias of Document

page_description()[source]¶: page description, for metadata; uses truncated document description

page_title()[source]¶: page title, for metadata; uses document title

viewname = 'corpus:document'¶: bound name of this view, for use in generating absolute url for redirect

class geniza.corpus.views.DocumentManifestView(**kwargs)[source]¶

Generate a IIIF Presentation manifest for a document, incorporating available canvases and attaching transcription content via annotation.

get(request, *args, **kwargs)[source]¶: extend GET to check for old pgpid and redirect on 404

viewname = 'corpus-uris:document-manifest'¶: bound name of this view, for use in generating absolute url for redirect

class geniza.corpus.views.DocumentMerge(**kwargs)[source]¶

form_class¶: alias of DocumentMergeForm

form_valid(form)[source]¶: Merge the selected documents into the primary document.

get_form_kwargs()[source]¶: Return the keyword arguments for instantiating the form.

get_initial()[source]¶: Return the initial data to use for forms on this view.

get_success_url()[source]¶: Return the URL to redirect to after processing a valid form.

class geniza.corpus.views.DocumentScholarshipView(**kwargs)[source]¶

List of Footnote references for a single Document

get_context_data(**kwargs)[source]¶: extend context data to add page metadata

get_queryset(*args, **kwargs)[source]¶: Prefetch footnotes, and don’t show the page if there are none.

page_description()[source]¶: page description, for metadata; uses truncated document description

page_title()[source]¶: page title, for metadata; uses document title

viewname = 'corpus:document-scholarship'¶: bound name of this view, for use in generating absolute url for redirect

class geniza.corpus.views.DocumentSearchView(**kwargs)[source]¶

dispatch(request, *args, **kwargs)[source]¶: Wrap the dispatch method to add a last modified header if one is available, then return a conditional response.

form_class¶: alias of DocumentSearchForm

get_apd_link(query)[source]¶: Generate a link to the Arabic Papyrology Database (APD) search page using the entered query, converting any Hebrew script to Arabic with Regex

get_applied_filter_labels(form, field, filters)[source]¶: return a list of objects with field/value pairs, and translated labels, one for each applied filter

get_boolfield_label(form, fieldname)[source]¶: Return a label dict for a boolean field (works differently than other fields)

get_context_data(**kwargs)[source]¶: extend context data to add page metadata, highlighting, and update form with facets

get_form_kwargs()[source]¶: get form arguments from request and configured defaults

get_paginate_by(queryset)[source]¶: Try to get pagination from GET request query, if there is none fallback to the original.

get_queryset()[source]¶: Perform requested search and return solr queryset

get_solr_sort(sort_option, exclude_inferred=False)[source]¶: Return solr sort field for user-selected sort option; generates random sort field using solr random dynamic field; otherwise uses solr sort field from solr_sort

last_modified()[source]¶: override last modified from solr mixin to not return a value when sorting by random

model¶: alias of Document

solr_lastmodified_filters = {'item_type_s': 'document'}¶: solr query filter for getting last modified date

class geniza.corpus.views.DocumentTranscribeView(**kwargs)[source]¶

View for the Transcription/Translation Editor page that uses annotorious-tahqiq

get_context_data(**kwargs)[source]¶: Pass annotation configuration and TinyMCE API key to page context

page_title()[source]¶: Title of transcription/translation editor page

viewname = 'corpus:document-transcribe'¶: bound name of this view, for use in generating absolute url for redirect

class geniza.corpus.views.DocumentTranscriptionText(**kwargs)[source]¶

Return transcription as plain text for download

get(request, *args, **kwargs)[source]¶: extend GET to check for old pgpid and redirect on 404

viewname = 'corpus:document-transcription-text'¶: bound name of this view, for use in generating absolute url for redirect

class geniza.corpus.views.RelatedDocumentView(**kwargs)[source]¶

List of Document objects that are related to specific Document (e.g., by occurring on the same shelfmark).

get_context_data(**kwargs)[source]¶: extend context data to add page metadata

page_description()[source]¶: page description, for metadata; uses truncated document description

page_title()[source]¶: page title, for metadata; uses document title

viewname = 'corpus:related-documents'¶: bound name of this view, for use in generating absolute url for redirect

class geniza.corpus.views.SolrDateRangeMixin[source]¶

Mixin for solr-based views with start and end date fields to get the full range of dates across the solr queryset.

get_range_stats(queryset_cls, field_name)[source]¶

Return the min and max for range fields based on Solr stats.

Returns:: Dictionary keyed on form field name with a tuple of (min, max) as integers. If stats are not returned from the field, the key is not added to a dictionary.
Return type:: dict

class geniza.corpus.views.SourceAutocompleteView(**kwargs)[source]¶

get_queryset()[source]¶: sources filtered by entered query, or all sources, ordered by author last name

class geniza.corpus.views.TagMerge(**kwargs)[source]¶

Class-based view for merging tags, closely adapted from DocumentMerge.

form_class¶: alias of TagMergeForm

form_valid(form)[source]¶: Merge the selected tags into the primary tag.

get_form_kwargs()[source]¶: Return the keyword arguments for instantiating the form.

get_initial()[source]¶: Return the initial data to use for forms on this view.

get_success_url()[source]¶: Return the URL to redirect to after processing a valid form.

static merge_tags(primary_tag, secondary_tags, user)[source]¶: Merge secondary_tags into primary_tag: tag all documents tagged with any of the secondary_tags with the primary_tag, then delete all secondary_tags, and record the change with a LogEntry.

geniza.corpus.views.old_pgp_edition(editions)[source]¶: output footnote and source information in a format similar to old pgp metadata editor/editions.

geniza.corpus.views.old_pgp_tabulate_data(queryset)[source]¶: Takes a Document queryset and yields rows of data for serialization as csv in pgp_metadata_for_old_site()

geniza.corpus.views.pgp_metadata_for_old_site(request)[source]¶: Stream metadata in CSV format for index and display in the old PGP site.

template tags¶

geniza.corpus.templatetags.corpus_extras.all_doc_relations(footnotes)[source]¶: For scholarship records list: list doc relations for all footnotes.

geniza.corpus.templatetags.corpus_extras.alphabetize(value)[source]¶: Lowercases, then alphabetizes, a list of strings
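Lowercasing before sorting matters because Python orders uppercase letters ahead of lowercase ones. A minimal sketch of that behavior, with Django filter registration omitted and the function name chosen for this example:

```python
def alphabetize_sketch(value):
    """Lowercase each string, then sort; hypothetical stand-in for the filter."""
    # sorting the lowercased strings avoids "Z" sorting before "a"
    return sorted(s.lower() for s in value)
```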

geniza.corpus.templatetags.corpus_extras.dict_item(dictionary, key)[source]¶

Template filter to allow accessing dictionary value by variable key. Example use:

{{ mydict|dict_item:keyvar }}

geniza.corpus.templatetags.corpus_extras.format_attribution(attribution)[source]¶: format attribution for local manifests (deprecated)

geniza.corpus.templatetags.corpus_extras.get_document_label(result_doc)[source]¶: Helper method to construct an appropriate aria-label for a document link with a fallback in case of a missing shelfmark.

geniza.corpus.templatetags.corpus_extras.has_location_or_url(footnotes)[source]¶: For scholarship records list: return True if any footnote in the list has a URL or location.

geniza.corpus.templatetags.corpus_extras.iiif_image(img, args)[source]¶

Add options to resize or otherwise change the display of an IIIF image; expects an instance of piffle.image.IIIFImageClient. Provide the method and arguments as a filter string, e.g.:

{{ myimg|iiif_image:"size:width=225,height=255" }}

geniza.corpus.templatetags.corpus_extras.iiif_info_json(images)[source]¶: Add /info.json to a list of IIIF image IDs and dump to JSON, for OpenSeaDragon to parse.
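A minimal sketch of the idea, assuming each image ID is already a string URL; the example URLs are hypothetical.

```python
import json


def iiif_info_json(images):
    """Append /info.json to each image ID and serialize the list as JSON."""
    return json.dumps([f"{image}/info.json" for image in images])


serialized = iiif_info_json(
    ["https://iiif.example.org/image/a", "https://iiif.example.org/image/b"]
)
```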

geniza.corpus.templatetags.corpus_extras.index(item, i)[source]¶

Template filter to allow accessing a list element by a variable index. Example use:

{{ mylist|index:forloop.counter0 }}

geniza.corpus.templatetags.corpus_extras.is_index_cards(source)[source]¶: For scholarship records list: indicate whether or not a source record relates to Goitein index cards.

geniza.corpus.templatetags.corpus_extras.pgp_urlize(text)[source]¶: Find all instances of “PGPID #” in the passed text, and convert each to a link to the referenced document.
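One way to implement this kind of replacement is a regular expression with a substitution callback. The sketch below is an assumption, not the project's code: the pattern, the url_template parameter, and the /documents/{pgpid}/ path are all hypothetical, and it does no HTML escaping.

```python
import re

# Hypothetical pattern: the literal "PGPID", whitespace, then digits
PGPID_RE = re.compile(r"\bPGPID\s+(\d+)\b")


def pgp_urlize(text, url_template="/documents/{pgpid}/"):
    """Wrap each "PGPID <number>" reference in a link to that document."""

    def link(match):
        url = url_template.format(pgpid=match.group(1))
        return f'<a href="{url}">{match.group(0)}</a>'

    return PGPID_RE.sub(link, text)


html = pgp_urlize("See PGPID 123 and PGPID 45.")
```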

geniza.corpus.templatetags.corpus_extras.process_citation(source)[source]¶: For scholarship records list: handle grouped citations by passing to Footnote.display_multiple class method.

geniza.corpus.templatetags.corpus_extras.querystring_replace(context, **kwargs)[source]¶

Template tag to simplify retaining querystring parameters when paging through search results with active filters. Example use:

<a href="?{% querystring_replace page=paginator.next_page_number %}">
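The underlying technique can be sketched with the standard library alone. This version is illustrative: it takes the current querystring as a string, whereas the real tag receives the template context.

```python
from urllib.parse import parse_qs, urlencode


def querystring_replace(current_querystring, **kwargs):
    """Return the querystring with the given parameters added or replaced."""
    params = parse_qs(current_querystring)
    for key, value in kwargs.items():
        # replace any existing values for this key with the new one
        params[key] = [str(value)]
    return urlencode(params, doseq=True)


next_page = querystring_replace("q=letter&doctype=Legal&page=1", page=2)
```

Active filters such as q and doctype are carried over unchanged, so only the page number differs between pagination links.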

geniza.corpus.templatetags.corpus_extras.shelfmark_wrap(shelfmark)[source]¶: Wrap individual shelfmarks in a span within a combined shelfmark, to avoid wrapping mid-shelfmark
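A minimal sketch, assuming that combined shelfmarks are joined with " + "; that separator is an assumption for illustration. The caller's CSS would then need to prevent line breaks inside each span.

```python
from html import escape


def shelfmark_wrap(shelfmark, separator=" + "):
    """Wrap each individual shelfmark of a combined shelfmark in a span."""
    return separator.join(
        f"<span>{escape(mark)}</span>" for mark in shelfmark.split(separator)
    )


wrapped = shelfmark_wrap("T-S 8J22.21 + T-S NS J193")
```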

geniza.corpus.templatetags.corpus_extras.translate_url(context, lang_code)[source]¶: Translate current full path into requested language by code.

manage commands¶

class geniza.corpus.management.commands.add_fragment_urls.Command(*args, **options)[source]¶

Takes a CSV of shelfmarks and view URLs and/or IIIF URLs, and updates the corresponding Fragment records in the database with those URLs. Expects the CSV header ‘shelfmark’ and one or both of ‘url’ and ‘iiif_url’.
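The header requirement can be checked before processing any rows. The helper below is a hypothetical sketch and is not part of the command; the sample shelfmark and URL are made up.

```python
import csv
import io


def has_required_headers(fieldnames):
    """True if 'shelfmark' and at least one of 'url' / 'iiif_url' are present."""
    names = set(fieldnames or [])
    return "shelfmark" in names and bool(names & {"url", "iiif_url"})


reader = csv.DictReader(io.StringIO("shelfmark,url\nT-S 1,https://example.com/1\n"))
ok = has_required_headers(reader.fieldnames)
```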

add_arguments(parser)[source]¶: Entry point for subclassed commands to add custom arguments.

add_fragment_urls(row)[source]¶: add view and iiif urls to fragment and save if a match is found for the shelfmark

handle(*args, **options)[source]¶: The actual logic of the command. Subclasses must implement this method.

log_change(fragment, message)[source]¶: create a log entry so there is a record of adding/updating urls

view_to_iiif_url(url)[source]¶: Generate IIIF Manifest URL based on view url, if it can be determined automatically

Importing IIIF manifests to be cached in the database.

class geniza.corpus.management.commands.import_manifests.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶

Import IIIF manifests into the local database.

add_arguments(parser)[source]¶: Entry point for subclassed commands to add custom arguments.

associate_manifests()[source]¶: update fragments with iiif urls to add foreign keys to the new manifests

handle(*args, **kwargs)[source]¶: The actual logic of the command. Subclasses must implement this method.

Script to consolidate redundant or duplicate document records. The script has two modes:

Report mode looks for merge candidates based on duplicate shelfmark combinations, document type, and descriptions. To generate a report of potential merges and actions to be taken:

python manage.py merge_documents report

Merge mode takes a CSV file in the same format generated by the report and merges documents as specified. There should be one row for each document that is part of any group of documents to be merged. Required fields are:

group id: unique identifier for each set of documents to be merged
action: must be MERGE to merge documents; if not, rows will be ignored
pgpid: document PGPID
role: “primary” for the main document in each group

Example use:

python manage.py merge_documents merge join-documents.csv
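To illustrate the expected input, the sketch below reads a merge CSV and groups rows by group id, skipping any row whose action is not MERGE. The column names follow the required fields listed above. The function name, the sample data, and the exact-match comparison on MERGE are assumptions.

```python
import csv
import io
from collections import defaultdict


def load_merge_groups(csv_file):
    """Group CSV rows by 'group id', keeping only rows with action MERGE."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["action"] != "MERGE":
            continue  # rows with any other action are ignored
        groups[row["group id"]].append(row)
    return dict(groups)


sample = io.StringIO(
    "group id,action,pgpid,role\n"
    "1,MERGE,100,primary\n"
    "1,MERGE,101,\n"
    "2,SKIP,200,primary\n"
)
groups = load_merge_groups(sample)
```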

class geniza.corpus.management.commands.merge_documents.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶

Merge documents that are variations of the same joins, based on shelfmark, document type, and description

add_arguments(parser)[source]¶: Entry point for subclassed commands to add custom arguments.

generate_report(report_rows, path)[source]¶: in report mode, generate a csv file of merge candidates

get_merge_candidates()[source]¶: identify merge candidates from the database. Looks for documents associated with multiple fragments, and then groups documents by combination of sorted shelfmarks and document type. Returns a dictionary of candidates. Key is sorted shelfmark + type, value is list of documents in that group.
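The grouping step can be sketched without a database. Here documents are hypothetical dicts with pgpid, shelfmarks, and type keys, and the group key is a tuple of the sorted shelfmarks and the document type. The real command works on Django querysets and may build its key differently.

```python
from collections import defaultdict


def group_by_join(documents):
    """Group multi-fragment documents by sorted shelfmarks and document type."""
    candidates = defaultdict(list)
    for doc in documents:
        if len(doc["shelfmarks"]) < 2:
            continue  # only documents on multiple fragments are considered
        key = (tuple(sorted(doc["shelfmarks"])), doc["type"])
        candidates[key].append(doc["pgpid"])
    return dict(candidates)


docs = [
    {"pgpid": 1, "shelfmarks": ["B", "A"], "type": "Letter"},
    {"pgpid": 2, "shelfmarks": ["A", "B"], "type": "Letter"},
    {"pgpid": 3, "shelfmarks": ["A"], "type": "Letter"},
    {"pgpid": 4, "shelfmarks": ["A", "B"], "type": "Legal"},
]
candidates = group_by_join(docs)
```

Sorting the shelfmarks first means that the same join recorded in a different order lands in the same group.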

group_merge_candidates(joins)[source]¶: process candidates identified in get_merge_candidates() to determine which ones should be merged

handle(*args, **options)[source]¶: The actual logic of the command. Subclasses must implement this method.

load_report(path)[source]¶: Load a report .csv file and generate list of merge groups.

merge_group(group_id, group)[source]¶: Takes a group identifier and a list of dicts for the group; there should be one primary record (role = primary) and one or more non-primary records. Each entry should have a pgpid; the status from the primary record will be used as merge rationale. Will find and merge the documents if possible.

class geniza.corpus.management.commands.convert_dates.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶

Report on or update historical date conversions for current data

add_arguments(parser)[source]¶: Entry point for subclassed commands to add custom arguments.

clean_standard_dates()[source]¶: Find documents with standardized dates that are set but don’t match the validation pattern and correct the ones that can be fixed.

handle(*args, **options)[source]¶: The actual logic of the command. Subclasses must implement this method.

report(dated_docs, report_path)[source]¶: Generate a CSV report of documents with dates and converted standard dates

standardize_dates(dated_docs)[source]¶: Reconvert and update documents whose historical dates use calendars that support conversion.

class geniza.corpus.management.commands.generate_fixtures.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶

add_arguments(parser)[source]¶: Entry point for subclassed commands to add custom arguments.

handle(*args, **options)[source]¶

Export fixtures from the database, given a list of PGPIDs.

Example usage with PGPIDs 1, 20, and 1234: python manage.py generate_fixtures 1 20 1234

class geniza.corpus.management.commands.export_metadata.Command(stdout=None, stderr=None, no_color=False, force_color=False)[source]¶

add_arguments(parser)[source]¶: Entry point for subclassed commands to add custom arguments.

handle(*args, **options)[source]¶: The actual logic of the command. Subclasses must implement this method.

print(*x, **y)[source]¶: A stdout-friendly method of printing for manage.py commands

class geniza.corpus.management.commands.export_metadata.MetadataExportRepo(local_path=None, remote_url=None, print_func=None, progress=True)[source]¶

Utility class with functionality for generating metadata exports and committing to git.

default_commit_msg = 'Automated metadata export from PGP'¶: default commit message

export_data(lastrun, sync=False)[source]¶: generate all exports

get_commit_message(modifying_users=None, msg=None)[source]¶: Construct a commit message. Uses the default commit with optional addendum specified by msg parameter, constructs a co-author commit if there are any modifying users, and combines with the base commit message.
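A rough sketch of how such a message might be assembled, using git's Co-authored-by trailer convention. Representing users as (name, email) tuples and joining the addendum with a colon are assumptions; the project's actual formatting may differ.

```python
DEFAULT_COMMIT_MSG = "Automated metadata export from PGP"


def get_commit_message(modifying_users=None, msg=None):
    """Build a commit message with optional addendum and co-author trailers.

    modifying_users is a hypothetical list of (name, email) tuples.
    """
    summary = f"{DEFAULT_COMMIT_MSG}: {msg}" if msg else DEFAULT_COMMIT_MSG
    parts = [summary]
    if modifying_users:
        trailers = "\n".join(
            f"Co-authored-by: {name} <{email}>" for name, email in modifying_users
        )
        parts.append(trailers)
    # a blank line separates the summary from the trailers
    return "\n\n".join(parts)


message = get_commit_message([("Ada Example", "ada@example.com")], msg="manual run")
```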

get_modifying_users(log_entries)[source]¶: Given a LogEntry queryset, return a User queryset for the set of users who are associated with any of the log entries.

get_path_csv(docname)[source]¶: generate export path based on export type

repo_add(filename=None)[source]¶: add modified files to git

repo_commit(modifying_users=None, msg=None)[source]¶: commit changes to local git repository

repo_origin()[source]¶: check if git repository has a remote origin

repo_pull()[source]¶: pull changes from remote

repo_push()[source]¶: push changes to remote git repository

sync_remote()[source]¶: synchronize with remote git repository

Princeton Geniza Project