A sophisticated trie-based autocomplete implementation in Java that builds sentence completions from inserted text data with support for arbitrary prefix queries.
- Trie Data Structure: Uses a prefix tree to store and retrieve sentence completions efficiently
- Multi-word Support: Handles complete sentences and phrases, not just individual words
- Flexible Prefix Search: Supports searching with any prefix, not limited to single letters
- Case Insensitive: All input is normalized to lowercase for consistent matching
- Memory Efficient: Shares common prefixes between different sentences
- Optimized Performance: HashMap-based child storage for O(1) lookups
- Comprehensive Test Data: Includes 200+ sample sentences for robust testing
AutoComplete autoComplete = new AutoComplete();
// Insert training sentences
autoComplete.insert("I love programming");
autoComplete.insert("I love ice cream");
autoComplete.insert("I love chocolate cake");
// Get completions for any prefix
autoComplete.discover("i love"); // Returns all sentences starting with "i love"Creates a new autocomplete instance with an empty trie.
Adds a sentence to the autocomplete trie.
- Parameters:
sentence- The sentence to add (automatically converted to lowercase) - Example:
autoComplete.insert("Hello world how are you"); - Complexity: O(L) where L is the sentence length in words
Finds and prints all sentences starting with the given prefix.
- Parameters:
prefix- The prefix to search for (supports multi-word prefixes) - Output: Prints list of matching sentences to console
- Example:
autoComplete.discover("I love");finds all sentences starting with "I love" - Complexity: O(P + N) where P is prefix length and N is number of results
- Insert: O(L) where L is the sentence length in words
- Discover: O(P + N) where P is prefix length and N is number of matching completions
- Space: O(T) where T is the total number of unique word sequences
| Feature | Implementation | Benefit |
|---|---|---|
| Child Lookup | HashMap | O(1) vs O(C) linear search |
| Case Handling | Lowercase normalization | Consistent matching |
| Prefix Matching | Multi-word support | Flexible query capability |
| Memory Usage | Shared prefixes | Efficient storage |
- Input normalization: Convert sentence to lowercase
- Word splitting: Split sentence into individual words
- Trie traversal: Navigate/create nodes for each word in sequence
- Path creation: Build complete path from root to leaf
- Prefix parsing: Split search prefix into words
- Trie navigation: Follow prefix path in the trie
- Subtree traversal: Recursively collect all completions from matching node
- Result formatting: Combine prefix with found completions
Root (HashMap)
├── "i" → Node
│ └── children (HashMap)
│ ├── "love" → Node
│ │ └── children (HashMap)
│ │ ├── "programming" → Node (leaf)
│ │ ├── "ice" → Node
│ │ │ └── children (HashMap)
│ │ │ └── "cream" → Node (leaf)
│ │ └── "chocolate" → Node
│ │ └── children (HashMap)
│ │ └── "cake" → Node (leaf)
│ ├── "want" → Node
│ │ └── children (HashMap)
│ │ ├── "to" → Node
│ │ │ └── children (HashMap)
│ │ │ ├── "travel" → Node (leaf)
│ │ │ └── "learn" → Node
│ │ │ └── children (HashMap)
│ │ │ └── "java" → Node (leaf)
│ │ └── "pizza" → Node
│ │ └── children (HashMap)
│ │ └── "for" → Node
│ │ └── children (HashMap)
│ │ └── "dinner" → Node (leaf)
│ └── "am" → Node
│ └── children (HashMap)
│ ├── "learning" → Node
│ │ └── children (HashMap)
│ │ └── "algorithms" → Node (leaf)
│ └── "happy" → Node
│ └── children (HashMap)
│ └── "today" → Node (leaf)
└── "hello" → Node
└── children (HashMap)
├── "world" → Node
│ └── children (HashMap)
│ └── "how" → Node
│ └── children (HashMap)
│ └── "are" → Node
│ └── children (HashMap)
│ └── "you" → Node (leaf)
└── "there" → Node
└── children (HashMap)
└── "friend" → Node (leaf)
The implementation includes comprehensive test data across multiple categories:
Personal Expressions
- "I love..." (programming, ice cream, chocolate cake, etc.)
- "I want..." (to travel, to learn Java, pizza for dinner, etc.)
- "I am..." (learning algorithms, happy today, coding right now, etc.)
- "I think..." (this is awesome, programming is fun, etc.)
Questions & Interactions
- "How..." (are you doing, can I help, long will this take, etc.)
- "What..." (is your name, time is it, do you think, etc.)
- "Where..." (is the library, are you going, can I find help, etc.)
- "When..." (will this be ready, can we start, did you arrive, etc.)
Greetings & Daily Life
- "Hello..." (world, there friend, everyone welcome, etc.)
- "The..." (weather is nice, cat is sleeping, sun is shining, etc.)
- "Today..." / "Tomorrow..." / "Life..." / "Love..." expressions
// Input: autoComplete.discover("i love");
// Output:
[i love programming, i love ice cream, i love chocolate cake, i love sunny days,
i love winter snow, i love playing guitar, i love watching movies, i love reading novels,
i love cooking pasta, i love morning coffee, i love my mom yay, i love my pops]
// Input: autoComplete.discover("hello");
// Output:
[hello world how are you, hello there friend, hello everyone welcome,
hello beautiful morning, hello my dear friend, hello from the other side,
hello sunshine good morning, hello stranger nice day, hello again old friend,
hello lovely people]autoComplete.insert("I LOVE UPPERCASE");
autoComplete.insert("i love lowercase");
autoComplete.insert("I Love Mixed Case");
autoComplete.discover("i love"); // Finds all three variantsautoComplete.discover("i want to"); // Finds "i want to travel", "i want to learn java", etc.
autoComplete.discover("hello there"); // Finds "hello there friend"
autoComplete.discover("the weather"); // Finds "the weather is nice today", "the weather is terrible"- 200+ sample sentences across diverse topics
- Multiple prefix patterns (I, Hello, The, How, What, etc.)
- Various sentence structures and lengths
- Real-world conversational data
- Memory: Efficient sharing of common prefixes reduces storage overhead
- Speed: HashMap-based lookups provide consistent O(1) child access
- Flexibility: Supports arbitrary prefix lengths without performance degradation
- Lowercase normalization: Eliminates case sensitivity issues
- Recursive traversal: Efficient depth-first search for completions
- StringBuilder usage: Minimizes string concatenation overhead
- HashMap storage: O(1) average case for node lookups
- Java 8 or higher
- No external dependencies
- Minimum 256MB heap space for large datasets
The main method includes comprehensive test data with:
- 100+ "I" prefix variations across different sentence patterns
- Multiple question word patterns (How, What, Where, When, Who, Why)
- Common conversation starters (Hello, Today, Tomorrow, Life, Love)
- Technical content (Programming, Learning related phrases)
- Case sensitivity tests (UPPERCASE, lowercase, Mixed Case)
MIT License - feel free to use and modify as needed.