Uniquely decodable n-gram embeddings

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

We define the family of n-gram embeddings from strings over a finite alphabet into the semimodule ℕ^K. We classify all ζ ∈ ℕ^K that are valid images of strings under such embeddings, as well as all ζ whose inverse image consists of exactly one string (we call such ζ uniquely decodable). We prove that, for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language.
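
As an informal illustration of the terms used in the abstract, the sketch below treats an n-gram embedding as the vector of counts of each length-n substring, and tests unique decodability by brute force. This reading is an assumption: the article's formal definition may differ (for example, in how string boundaries are handled), and the function names `ngram_embedding`, `preimage`, and `is_uniquely_decodable` are hypothetical, not taken from the article.

```python
from collections import Counter
from itertools import product


def ngram_embedding(s, n):
    """Count every length-n substring of s (assumed definition of the embedding)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def preimage(target, alphabet, n, length):
    """Brute force: all strings of the given length whose embedding equals target."""
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    return [c for c in candidates if ngram_embedding(c, n) == target]


def is_uniquely_decodable(s, alphabet, n):
    """True if s is the only string over alphabet with the same n-gram counts."""
    if len(s) < n:
        # Under this assumed definition every shorter string maps to the zero
        # vector, so the sketch only handles strings with at least one n-gram.
        raise ValueError("string must contain at least one n-gram")
    # A string of length m >= n has exactly m - n + 1 n-grams, so any string
    # with the same counts has the same length; searching that length suffices.
    return preimage(ngram_embedding(s, n), alphabet, n, len(s)) == [s]


if __name__ == "__main__":
    print(is_uniquely_decodable("abab", "ab", 2))           # True
    print(preimage(ngram_embedding("abba", 2), "ab", 2, 4))  # ['abba', 'babb', 'bbab']
```

Under these assumptions, "abab" is uniquely decodable from its bigram counts, while "abba" is not, because "babb" and "bbab" contain the same bigrams. The exhaustive search is exponential in the string length and is meant only to make the definitions concrete; it does not reflect the classification or the automaton construction in the article.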

Original language: English
Pages (from-to): 271-284
Number of pages: 14
Journal: Theoretical Computer Science
Volume: 329
Issue number: 1-3
State: Published - 13 Dec 2004
Externally published: Yes

Keywords

  • Embedding
  • Finite state automaton
  • Finite transducer
  • N-gram
  • String

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
