Multivariate classification of cannabis chemovars based on their terpene and cannabinoid profiles

Matan Birenboim, Daniel Chalupowicz, Dalia Maurer, Shimon Barel, Yaira Chen, Elazar Fallik, Tarin Paz-Kagan, Tal Rapaport, Alona Sadeh, David Kengisbuch, Jakob A. Shimshoni

Research output: Contribution to journal, Article, peer-review

12 Scopus citations


Cannabis is used to treat various medical conditions, and lines are commonly classified according to their total concentrations of Δ9-tetrahydrocannabinol (THC) and cannabidiol (CBD). Based on their ratio of total THC to total CBD, cannabis cultivars are typically assigned to high-THC, high-CBD, or hybrid classes. While cultivars from the same class have similar compositions of major cannabinoids, their levels of other cannabinoids and their terpene compositions may differ substantially. Therefore, a more comprehensive and accurate classification of medicinal cannabis cultivars, based on a large number of cannabinoids and terpenes, is needed. For this purpose, three different chemometric-based classification models were constructed using three sets of chemical profiles. We examined those models to determine which provides the most accurate “chemovar” classification. This was done by analyzing profiles of cannabinoids, terpenes, and the combination of these substances using the multivariate partial least squares-discriminant analysis (PLS-DA) technique. The chemical profiles were selected from the three major classes of medicinal cannabis that are most commonly prescribed to patients in Israel: high-THC, high-CBD, and hybrid. We studied the correlations between cannabinoids and terpenes to identify major bio-indicators representing the plant's terpene and cannabinoid content. All three PLS-DA models provided highly accurate classifications, utilizing six to nine latent variables with an overall misclassification error ranging from 2 to 11% in cross-validation (CV). The PLS-DA model applied to the combined cannabinoid-and-terpene profile did the best job of differentiating between the chemovars in terms of misclassification error, sensitivity, specificity, and accuracy. The combined cannabinoid-and-terpene PLS-DA profile had cross-validation and prediction misclassification errors of 4% and 0%, respectively.
This is the first study to demonstrate the highly accurate classification of samples of medicinal cannabis based on their cannabinoid and terpene profiles, as compared to cannabinoid profiles alone. Furthermore, our correlation analysis indicated that 11 cannabinoids and terpenes might serve as bio-indicators for 32 different active compounds. These findings suggest that the use of multivariate statistics could assist in breeding studies and serve as a tool for minimizing the mislabeling of cannabis inflorescences.
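
The abstract compares the models by misclassification error, sensitivity, specificity, and accuracy. As a minimal sketch of how those four figures are conventionally derived from a three-class confusion matrix, the Python below computes them one class at a time (one-vs-rest). The function names and the confusion-matrix counts are hypothetical and purely illustrative; they are not the study's data, and the study's own software and exact metric definitions are not stated here.

```python
from typing import Dict, List

# Class labels as named in the abstract.
CLASSES = ["high-THC", "high-CBD", "hybrid"]


def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest sensitivity, specificity and accuracy for each class.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Assumes every class has at least one
    true sample and at least one sample belonging to another class.
    """
    total = sum(sum(row) for row in confusion)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # class k, predicted k
        fn = sum(confusion[k]) - tp                   # class k, predicted other
        fp = sum(row[k] for row in confusion) - tp    # other, predicted k
        tn = total - tp - fn - fp                     # other, predicted other
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / total,
        }
    return metrics


def misclassification_error(confusion: List[List[int]]) -> float:
    """Fraction of all samples assigned to the wrong class."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1 - correct / total


# Hypothetical counts for illustration only (rows: true, columns: predicted).
EXAMPLE_CONFUSION = [
    [18, 0, 2],   # true high-THC
    [0, 20, 0],   # true high-CBD
    [1, 0, 9],    # true hybrid
]

if __name__ == "__main__":
    print(misclassification_error(EXAMPLE_CONFUSION))
    for name, values in per_class_metrics(EXAMPLE_CONFUSION, CLASSES).items():
        print(name, values)
```

Applied once to cross-validation predictions and once to held-out predictions, the same calculation would give the two kinds of error that the abstract reports separately.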

Original language: English
Article number: 113215
State: Published - 1 Aug 2022
Externally published: Yes


  • Cannabaceae
  • Cannabinoids
  • Cannabis sativa
  • Chemical composition
  • Chemovar classification
  • Correlation matrix
  • Partial least square-discriminant analysis (PLS-DA)
  • Terpenes

ASJC Scopus subject areas

  • Horticulture
  • Molecular Biology
  • Biochemistry
  • Plant Science

