Uniquely decodable n-gram embeddings

We define the family of n-gram embeddings from strings over a finite alphabet into the semimodule ℕ^K. We classify all ζ ∈ ℕ^K that are valid images of strings under such embeddings, as well as all ζ whose inverse image consists of exactly one string (we call such ζ uniquely decodable). We prove that for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language.
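As an illustration only, the following Python sketch shows one plausible reading of these notions. It assumes the embedding simply counts each contiguous length-n substring of the input, so that a vector in ℕ^K is represented sparsely as a `Counter`; the paper's exact definition of the family may differ. The function names `ngram_embedding`, `preimage`, and `is_uniquely_decodable` are hypothetical, and the exhaustive search stands in for the paper's classification rather than reproducing it.

```python
from collections import Counter
from itertools import product


def ngram_embedding(s, n):
    """Count every contiguous length-n substring of s.

    Assumption: this counting map is what the abstract means by an
    n-gram embedding; the result is a sparse vector of counts.
    """
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def preimage(s, n, alphabet):
    """List all strings over `alphabet` with the same n-gram counts as s.

    Brute force, exponential in len(s). Assumes len(s) >= n, in which
    case any string with the same counts must have the same length.
    """
    target = ngram_embedding(s, n)
    candidates = ("".join(t) for t in product(sorted(alphabet), repeat=len(s)))
    return [c for c in candidates if ngram_embedding(c, n) == target]


def is_uniquely_decodable(s, n, alphabet):
    """True if s is the only string with its n-gram count vector."""
    return len(preimage(s, n, alphabet)) == 1


if __name__ == "__main__":
    print(ngram_embedding("abab", 2))
    print(is_uniquely_decodable("abab", 2, "ab"))
    print(preimage("aabba", 2, "ab"))
```

Under these assumptions, "abab" is uniquely decodable for n = 2 over the alphabet {a, b}, while "aabba" is not, because "abbaa", "baabb", and "bbaab" contain the same bigrams with the same counts.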

Original language: English
Pages (from-to): 271-284
Number of pages: 14
Journal: Theoretical Computer Science
Issue number: 1-3
State: Published - 13 Dec 2004
  • Embedding
  • Finite state automaton
  • Finite transducer
  • N-gram
  • String

  • Theoretical Computer Science
  • General Computer Science


