Intermediate Text Analysis: Annotation #8

@salekinsirajus

Problem Statement
The goal of this task is to design and implement an algorithm that analyzes the entire content of a resume. The deliverable is not a perfectly annotated resume, as that is an almost impossible task. Rather, we want to annotate 80% or more of the content with reasonable accuracy.
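
One way to make the 80% target measurable is to track what fraction of the text ends up inside at least one annotation. The sketch below is only an assumption about how coverage could be defined (non-whitespace characters covered by a span); the function name `annotation_coverage` and the `(start, end, label)` span format are hypothetical and not part of the existing backend.

```python
def annotation_coverage(text, spans):
    """Return the fraction of non-whitespace characters covered by a span.

    `spans` is a list of (start, end, label) tuples with `end` exclusive.
    This span format is an assumption made for illustration.
    """
    covered = [False] * len(text)
    for start, end, _label in spans:
        # Clamp to the text bounds so bad offsets cannot raise IndexError.
        for i in range(max(0, start), min(len(text), end)):
            covered[i] = True

    total = sum(1 for ch in text if not ch.isspace())
    if total == 0:
        return 0.0
    hit = sum(1 for i, ch in enumerate(text) if covered[i] and not ch.isspace())
    return hit / total
```

A check such as `annotation_coverage(text, spans) >= 0.8` could then serve as a rough acceptance test. Note that it says nothing about accuracy, which would need labeled examples to evaluate.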

Files Changed
Implement this in the backend, similar to how issue #4 was done.

Approaches
There are multiple ways to attack this problem; please consider the pros and cons of each:

  • Search and Find: look for specific things in the content, running multiple passes. Use the schema as a catalog of the things you are searching for.
  • Identify as You Go: for every word (or set of words) you encounter, attempt to classify it based on your catalog. As with the first approach, you can run multiple passes.
  • Combination: the strategy could incorporate both of the aforementioned approaches.
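
As a starting point for the whiteboarding session, the three approaches could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed design: the `CATALOG` entries, the `TOKEN_RULES` patterns, and all function names are hypothetical, since the actual schema is not shown in this issue.

```python
import re

# Hypothetical catalog standing in for the real schema (assumption).
CATALOG = {
    "skill": ["python", "machine learning", "sql"],
    "degree": ["bachelor of science", "master of science"],
}

# Hypothetical token-level rules for the identify-as-you-go pass (assumption).
TOKEN_RULES = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("year", re.compile(r"(19|20)\d{2}")),
]


def search_and_find(text, catalog=CATALOG):
    """Search and Find: look up every catalog phrase in the text."""
    spans = []
    for label, phrases in catalog.items():
        for phrase in phrases:
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                spans.append((m.start(), m.end(), label))
    return spans


def identify_as_you_go(text, taken=(), rules=TOKEN_RULES):
    """Identify as You Go: walk the tokens in order and classify each one.

    Tokens overlapping a span in `taken` are skipped, so this can run as a
    second pass after search_and_find.
    """
    spans = []
    for m in re.finditer(r"[^\s,;()]+", text):
        if any(s < m.end() and m.start() < e for s, e, _ in taken):
            continue
        token = m.group().rstrip(".:")  # drop trailing sentence punctuation
        for label, rule in rules:
            if rule.fullmatch(token):
                spans.append((m.start(), m.start() + len(token), label))
                break
    return spans


def annotate(text):
    """Combination: phrase search first, then classify what is left."""
    spans = search_and_find(text)
    spans += identify_as_you_go(text, taken=spans)
    return sorted(spans)
```

Calling `annotate(resume_text)` returns `(start, end, label)` tuples sorted by position. Running the phrase search first lets multi-word catalog entries win over single-token rules; the opposite order is equally possible and is one of the trade-offs worth discussing.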

Note
This issue needs a substantial research/whiteboarding session before implementation.
