Abstract
We define the family of n-gram embeddings from strings over a finite alphabet into the semimodule $\mathbb{N}^K$. We classify all $\zeta \in \mathbb{N}^K$ that are valid images of strings under such embeddings, as well as all $\zeta$ whose inverse image consists of exactly one string (we call such $\zeta$ uniquely decodable). We prove that for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language.
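To make the notions in the abstract concrete, here is a minimal Python sketch. It is not the paper's construction. It assumes the embedding simply counts every length-n substring with a sliding window, so that K is the number of possible n-grams, and it tests unique decodability by brute force over strings of the same length. The names `ngram_embedding` and `preimage` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def ngram_embedding(s, n=2):
    """Count each length-n substring of s.

    The Counter is a sparse stand-in for a vector in N^K (assumption:
    one coordinate per possible n-gram, no padding at the string ends).
    """
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def preimage(s, alphabet, n=2):
    """Brute force: every string over `alphabet`, of the same length as s,
    whose n-gram counts equal those of s.

    For len(s) >= n the total count fixes the length, so searching one
    length is enough. The search is exponential in len(s).
    """
    target = ngram_embedding(s, n)
    return [
        "".join(t)
        for t in product(alphabet, repeat=len(s))
        if ngram_embedding("".join(t), n) == target
    ]


# "abab" is the only string with bigram counts {ab: 2, ba: 1}.
print(preimage("abab", "ab"))
# "aabab" and "abaab" share the counts {aa: 1, ab: 2, ba: 1}.
print(preimage("aabab", "ab"))
```

Under these assumptions, an image is uniquely decodable exactly when `preimage` returns a single string. The brute-force search is only meant to illustrate the definition on short strings.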
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 271-284 |
Number of pages | 14 |
Journal | Theoretical Computer Science |
Volume | 329 |
Issue number | 1-3 |
State | Published - 13 Dec 2004 |
Externally published | Yes |
Keywords
- Embedding
- Finite state automaton
- Finite transducer
- N-gram
- String
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science