annotations.tasks module

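Tasks for getting text into the annotation workflow: handling file uploads, extracting content from PDF and text files, retrieving or scraping remote content, tokenizing it, and saving the resulting text instance.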

annotations.tasks.extract_pdf_file(uploaded_file)

Extract the content of an uploaded PDF file and return it.

Parameters:uploaded_file (InMemoryUploadedFile) – The uploaded PDF file
Returns:Content of the PDF file
Return type:string
annotations.tasks.extract_text_file(uploaded_file)

Read an uploaded text file and return its content.

Parameters:uploaded_file (InMemoryUploadedFile) – The uploaded text file
Returns:Content of the text file
Return type:string
annotations.tasks.get_manager(name)
annotations.tasks.handle_file_upload(request, form)

Handle the uploaded file and route it to the corresponding handler.

Parameters:
  • request (django.http.HttpRequest) – The HTTP request associated with the upload
  • form (django.forms.Form) – The form with uploaded content
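
The routing step described above can be sketched with a dispatch table keyed on content type. This is a minimal illustration rather than the module's actual implementation: the names `route_upload_sketch`, `HANDLERS`, and `FakeUpload` are hypothetical, the choice of content type as the routing key is an assumption, and the PDF handler is left as a stub because the PDF library used is not stated here. It assumes only that the uploaded file exposes `content_type` and `read()`, as Django's uploaded file objects do.

```python
import io


def extract_text_sketch(uploaded_file):
    """Decode the raw bytes of a text upload (UTF-8 is an assumption)."""
    return uploaded_file.read().decode("utf-8")


def extract_pdf_sketch(uploaded_file):
    """Stub: real PDF extraction would need a PDF library."""
    raise NotImplementedError("PDF extraction is not part of this sketch")


# Hypothetical mapping from content type to handler.
HANDLERS = {
    "text/plain": extract_text_sketch,
    "application/pdf": extract_pdf_sketch,
}


def route_upload_sketch(uploaded_file):
    """Pick a handler from the file's content type and return its result."""
    handler = HANDLERS.get(uploaded_file.content_type)
    if handler is None:
        raise ValueError(
            "Unsupported content type: {}".format(uploaded_file.content_type)
        )
    return handler(uploaded_file)


class FakeUpload(io.BytesIO):
    """Stand-in for an uploaded file, so the sketch runs without Django."""

    def __init__(self, data, content_type):
        super().__init__(data)
        self.content_type = content_type
```

Calling `route_upload_sketch(FakeUpload(b"hello", "text/plain"))` goes through the text handler, while an unlisted content type raises `ValueError` instead of being silently ignored.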
annotations.tasks.retrieve(repository, resource)

Get the content of a resource.

Parameters:
  • repository (Repository) – The repository from which to retrieve the resource.
  • resource (unicode or int) – Identifier by which the resource can be retrieved from repository.
Returns:content
Return type:unicode

annotations.tasks.save_text_instance(tokenized_content, text_title, date_created, is_public, user, uri=None)

Create and save a text instance from the given parameters.

Parameters:
  • tokenized_content (String) – The tokenized text
  • text_title (String) – The title of the text instance
  • date_created (Date) – The date to be associated with text instance
  • is_public (Boolean) – Whether the text content is public or not
  • user (User) – The user who saved the text content
annotations.tasks.scrape(url)

Retrieve text content from a website.

Parameters:url (unicode) – Location of web resource.
Returns:textData – Metadata and text content retrieved from url.
Return type:dict
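
As a rough illustration of the kind of dict such a function might return, the sketch below builds one from an HTML string using only the standard library. It is not the module's implementation: fetching the page is deliberately left out, and the helper name `text_data_from_html` and the keys `uri`, `title`, and `content` are assumptions made for this example.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the page title and visible text, skipping script and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def text_data_from_html(url, html):
    """Return metadata and text content for already-fetched HTML.

    The key names are hypothetical; the real textData layout is not
    documented above.
    """
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return {
        "uri": url,
        "title": "".join(collector.title_parts).strip(),
        "content": " ".join(collector.text_parts),
    }
```

Keeping the parsing separate from the network request makes the text extraction easy to exercise with a literal HTML string.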
annotations.tasks.tokenize(content, delimiter=u' ')

Before a text can be annotated, its “annotatable” tokens must be wrapped in <word></word> tags with arbitrary IDs.

Parameters:
  • content (unicode) – The text to tokenize.
  • delimiter (unicode) – Character or sequence by which to split and join tokens.
Returns:tokenizedContent
Return type:unicode
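
The split, wrap, and join behaviour described for tokenize can be sketched as follows. This is an assumption-laden illustration, not the module's code: the function name `tokenize_sketch`, the `id` attribute, and the sequential numbering are choices made for the example (the documentation only says the IDs are arbitrary), and it is written for Python 3, where `unicode` is simply `str`.

```python
def tokenize_sketch(content, delimiter=" "):
    """Wrap each delimiter-separated token in a <word> tag.

    Tokens receive sequential integer IDs here; any unique value would
    satisfy "arbitrary IDs". Note that str.split raises ValueError if
    the delimiter is an empty string.
    """
    wrapped = []
    next_id = 0
    for token in content.split(delimiter):
        if token == "":
            # Consecutive delimiters yield empty strings; keep them
            # unwrapped so the original spacing survives the join.
            wrapped.append(token)
            continue
        wrapped.append('<word id="{}">{}</word>'.format(next_id, token))
        next_id += 1
    return delimiter.join(wrapped)
```

For example, `tokenize_sketch("a b")` yields `<word id="0">a</word> <word id="1">b</word>`. The sketch does not escape markup characters inside tokens, which real content containing `<` or `&` would likely need.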