annotations.tasks module

We should probably write some documentation.

annotations.tasks.extract_pdf_file(uploaded_file)

Extract a PDF file and return its content

Parameters:	uploaded_file (InMemoryUploadedFile) – The uploaded PDF file
Returns:	Content of the PDF file
Return type:	string

annotations.tasks.extract_text_file(uploaded_file)

Extract a text file and return its content

Parameters:	uploaded_file (InMemoryUploadedFile) – The uploaded text file
Returns:	Content of the text file
Return type:	string
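
As a rough illustration of what extracting a text upload involves, here is a minimal sketch. It is not the module's implementation: the name `extract_text_sketch`, the `encoding` parameter, and the choice to replace undecodable bytes are all assumptions. `io.BytesIO` stands in for `InMemoryUploadedFile`, which offers the same `seek()` and `read()` calls.

```python
import io


def extract_text_sketch(uploaded_file, encoding="utf-8"):
    """Read a file-like upload and return its content as a string.

    Hypothetical helper; the encoding and error policy are assumptions.
    """
    # Rewind in case the file was already read, e.g. during form validation.
    uploaded_file.seek(0)
    raw = uploaded_file.read()
    # Uploads normally yield bytes; decode them, replacing undecodable bytes.
    if isinstance(raw, bytes):
        return raw.decode(encoding, errors="replace")
    return raw


# io.BytesIO is a stand-in for Django's InMemoryUploadedFile.
upload = io.BytesIO("Some plain text.\n".encode("utf-8"))
text = extract_text_sketch(upload)
```

Replacing undecodable bytes keeps the upload from failing on a stray byte, at the cost of silently altering the text; a stricter variant could raise instead.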

annotations.tasks.handle_file_upload(request, form)

Handle an uploaded file and route it to the corresponding handler

Parameters:	request (django.http.HttpRequest)
	form (django.forms.Form) – The form with uploaded content
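
One plausible way to route an upload to a handler is a lookup table keyed by content type. The sketch below assumes that approach; the documentation does not say how the real function decides. `HANDLERS`, `route_upload`, and the two stub handlers are hypothetical names, and `SimpleNamespace` stands in for an uploaded file, which in Django exposes a `content_type` attribute.

```python
from types import SimpleNamespace


def _handle_text(uploaded_file):
    # Stub: a real handler would return the decoded file content.
    return "text handler"


def _handle_pdf(uploaded_file):
    # Stub: a real handler would return the text extracted from the PDF.
    return "pdf handler"


# Hypothetical registry mapping MIME types to handler functions.
HANDLERS = {
    "text/plain": _handle_text,
    "application/pdf": _handle_pdf,
}


def route_upload(uploaded_file):
    """Pick a handler by content type and return its result."""
    try:
        handler = HANDLERS[uploaded_file.content_type]
    except KeyError:
        raise ValueError(
            "Unsupported content type: %s" % uploaded_file.content_type
        )
    return handler(uploaded_file)


result = route_upload(SimpleNamespace(content_type="application/pdf"))
```

With a table like this, supporting a new file type means adding one entry rather than extending a chain of conditionals.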

annotations.tasks.retrieve(repository, resource)

Get the content of a resource.

Parameters:	repository (`Repository`)
	resource (unicode or int) – Identifier by which the resource can be retrieved from `repository`.
Returns:	content
Return type:	unicode

annotations.tasks.save_text_instance(tokenized_content, text_title, date_created, is_public, user, uri=None)

Create and save a text instance from the given parameters

Parameters:	tokenized_content (String) – The tokenized text
	text_title (String) – The title of the text instance
	date_created (Date) – The date to be associated with the text instance
	is_public (Boolean) – Whether the text content is public or not
	user (User) – The user who saved the text content
	uri – Optional; defaults to None

annotations.tasks.scrape(url)

Retrieve text content from a website.

Parameters:	url (unicode) – Location of web resource.
Returns:	textData – Metadata and text content retrieved from `url`.
Return type:	dict
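
To give a sense of what a dict of metadata and text content might look like, the sketch below builds one from an HTML string with the standard library's `html.parser`. It is not the module's implementation. The keys `uri`, `title`, and `content` are assumptions, as are the names `_TextCollector` and `text_data_from_html`. The network fetch is left out so that the example runs offline.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the <title> and visible text, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def text_data_from_html(url, html):
    """Return a dict of metadata and text content (hypothetical keys)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return {
        "uri": url,
        "title": "".join(collector.title_parts).strip(),
        "content": " ".join(collector.text_parts),
    }


page = (
    "<html><head><title>Example</title><style>p {}</style></head>"
    "<body><p>Hello</p><p>world.</p></body></html>"
)
text_data = text_data_from_html("http://example.com/", page)
```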

annotations.tasks.tokenize(content, delimiter=u' ')

In order to annotate a text, we must first wrap “annotatable” tokens in <word></word> tags, with arbitrary IDs.

Parameters:	content (unicode)
	delimiter (unicode) – Character or sequence by which to split and join tokens.
Returns:	tokenizedContent
Return type:	unicode
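
The wrapping step described above can be sketched as follows. This is an illustration under assumptions, not the module's code: the name `tokenize_sketch`, the sequential integer IDs, the `id` attribute format, and the escaping of markup characters inside tokens are all choices made for the example.

```python
import html


def tokenize_sketch(content, delimiter=" "):
    """Wrap each non-empty token in <word> tags with a sequential ID."""
    wrapped = []
    next_id = 0
    for token in content.split(delimiter):
        if token.strip():
            # Escape markup characters so the token cannot break the tags.
            safe = html.escape(token, quote=False)
            wrapped.append('<word id="%d">%s</word>' % (next_id, safe))
            next_id += 1
        else:
            # Keep empty tokens so runs of delimiters survive the round trip.
            wrapped.append(token)
    return delimiter.join(wrapped)


tokenized = tokenize_sketch("Hello world")
```

Splitting and rejoining on the same delimiter preserves the original spacing, so stripping the tags from the result gives back the escaped input text.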

VogonWeb