annotations.tasks module
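Helper functions for extracting text from uploaded files, retrieving and scraping remote content, tokenizing text for annotation, and saving text instances.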
annotations.tasks.extract_pdf_file(uploaded_file)
Extract a PDF file and return its content.
Parameters: uploaded_file (InMemoryUploadedFile) – The uploaded PDF file.
Returns: The content of the PDF file as a string.
annotations.tasks.extract_text_file(uploaded_file)
Extract a text file and return its content.
Parameters: uploaded_file (InMemoryUploadedFile) – The uploaded text file.
Returns: The content of the text file as a string.
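For illustration, extracting a plain-text upload can be as simple as reading its bytes and decoding them. The sketch below is written for Python 3 and is not the module's actual implementation: it assumes the upload behaves like Django's UploadedFile (a read() method and an optional charset attribute) and falls back to UTF-8, which is an assumption about the uploaded data.

```python
import io


def extract_text_file(uploaded_file):
    """Sketch: return the decoded text of an uploaded file.

    Assumes a file-like object with read(); Django's UploadedFile also
    carries a charset attribute, which is used when it is set.
    """
    # Fall back to UTF-8 when no charset accompanied the upload (assumption).
    encoding = getattr(uploaded_file, "charset", None) or "utf-8"
    return uploaded_file.read().decode(encoding)


# Usage with an in-memory stand-in for an uploaded file.
print(extract_text_file(io.BytesIO("Grüße".encode("utf-8"))))
```

Reading the whole file at once is fine for small uploads; for large files, Django's chunks() method avoids holding everything in memory.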
annotations.tasks.handle_file_upload(request, form)
Handle the uploaded file and route it to the corresponding handler.
Parameters:
- request (django.http.HttpRequest) – The incoming HTTP request.
- form (django.forms.Form) – The form with the uploaded content.
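The routing step can be pictured as a dispatch table keyed on content type. This is a hypothetical Python 3 sketch, not the module's code: route_upload, HANDLERS and EXTENSION_TYPES are illustrative names, the file is assumed to have already been taken out of the submitted form, and the extension fallback is an assumed way of handling clients that send no usable content type. The two extract functions are stand-ins for the real ones documented in this module.

```python
import os


# Stand-ins for the module's extract_text_file / extract_pdf_file.
def extract_text_file(uploaded_file):
    return uploaded_file.read().decode("utf-8")


def extract_pdf_file(uploaded_file):
    raise NotImplementedError("PDF extraction needs a PDF library")


# Hypothetical routing tables: content type -> handler, extension -> content type.
HANDLERS = {
    "application/pdf": extract_pdf_file,
    "text/plain": extract_text_file,
}
EXTENSION_TYPES = {".pdf": "application/pdf", ".txt": "text/plain"}


def route_upload(uploaded_file):
    """Dispatch an uploaded file to the handler registered for its type."""
    content_type = getattr(uploaded_file, "content_type", None)
    if content_type not in HANDLERS:
        # Fall back to the file extension when the declared type is unknown.
        extension = os.path.splitext(uploaded_file.name)[1].lower()
        content_type = EXTENSION_TYPES.get(extension)
    handler = HANDLERS.get(content_type)
    if handler is None:
        raise ValueError("Unsupported upload: {}".format(uploaded_file.name))
    return handler(uploaded_file)
```

Adding a new file type then means registering one more handler rather than extending an if/elif chain.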
annotations.tasks.retrieve(repository, resource)
Get the content of a resource.
Parameters:
- repository (Repository) – The repository from which the resource is retrieved.
- resource (unicode or int) – Identifier by which the resource can be retrieved from repository.
Returns: The content of the resource.
Return type: unicode
annotations.tasks.save_text_instance(tokenized_content, text_title, date_created, is_public, user, uri=None)
Create and save a text instance from the given parameters.
Parameters:
- tokenized_content (String) – The tokenized text.
- text_title (String) – The title of the text instance.
- date_created (Date) – The date to associate with the text instance.
- is_public (Boolean) – Whether the text content is public.
- user (User) – The user who saved the text content.
annotations.tasks.scrape(url)
Retrieve text content from a website.
Parameters: url (unicode) – Location of the web resource.
Returns: textData – Metadata and text content retrieved from url.
Return type: dict
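One way to produce such a dictionary with only the Python 3 standard library is to fetch the page with urllib.request.urlopen and collect its title and visible text with html.parser.HTMLParser. This is a sketch under assumptions: the dictionary keys url, title and content, and the helper names parse_page and _TextExtractor, are hypothetical and may not match what scrape actually returns.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML document."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._open = []  # currently open title/script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP or tag == "title":
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        if not self._open:
            text = data.strip()
            if text:
                self.chunks.append(text)
        elif self._open[-1] == "title":
            self.title += data
        # Text inside script/style is ignored.


def parse_page(page, url):
    """Hypothetical helper: turn an HTML string into metadata plus text."""
    parser = _TextExtractor()
    parser.feed(page)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "content": " ".join(parser.chunks)}


def scrape(url):
    """Sketch: fetch url and return its title and visible text."""
    with urlopen(url) as response:
        # Use the charset the server declares, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        page = response.read().decode(charset, errors="replace")
    return parse_page(page, url)
```

Keeping the parsing in a separate function lets it be tested on an HTML string without any network access.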
annotations.tasks.tokenize(content, delimiter=u' ')
Wrap “annotatable” tokens in <word></word> tags with arbitrary IDs. A text must be tokenized this way before it can be annotated.
Parameters:
- content (unicode) – The text to tokenize.
- delimiter (unicode) – Character or sequence by which to split and join tokens.
Returns: tokenizedContent – The content with each token wrapped in <word></word> tags.
Return type: unicode
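A minimal sketch of this wrapping step, written for Python 3: it splits on delimiter, wraps each non-empty token in a <word> element with a sequential id attribute, and joins the pieces back with the same delimiter. The id attribute name and the numbering scheme are assumptions (the documentation only says the IDs are arbitrary), and tokens are HTML-escaped so that characters such as < cannot break the markup; the real function may do neither.

```python
import html
import itertools


def tokenize(content, delimiter=" "):
    """Sketch: wrap each delimiter-separated token in <word></word> tags."""
    ids = itertools.count()  # sequential IDs; any unique scheme would do
    pieces = []
    for token in content.split(delimiter):
        if token:
            # Escape the token so a stray "<" or "&" cannot corrupt the markup.
            pieces.append('<word id="{}">{}</word>'.format(next(ids), html.escape(token)))
        else:
            # Keep empty pieces (from repeated delimiters) so spacing survives the join.
            pieces.append(token)
    return delimiter.join(pieces)


print(tokenize("Call me Ishmael."))
# <word id="0">Call</word> <word id="1">me</word> <word id="2">Ishmael.</word>
```

Because the same delimiter is used to split and to join, stripping the <word> tags from the result gives back the original text, up to the HTML escaping.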