TY - JOUR
T1 - Hallucination Alleviation-based smart decision for early soybean cultivation in greenhouse and field scenarios
AU - Zhang, Chuankun
AU - Jiang, Dan
AU - Wan, Tianyu
AU - Rao, Yuan
AU - Jin, Xiu
AU - Wang, Tan
AU - Wang, Xiaobo
AU - Li, Jiajia
AU - Zhang, Wu
AU - Shao, Xing
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/11/1
Y1 - 2025/11/1
N2 - The selection of optimal cultivation techniques at different growth stages of crops has a significant influence on the monitoring of plant growth and the potential for yield stability. The advent of Multimodal Large Language Models (MLLMs) presents a promising avenue for the formulation of beneficial cultivation decisions. However, the pervasive issue of hallucination within MLLMs represents a substantial obstacle to their application in agricultural scenarios. Particularly in the context of smart soybean cultivation, the effectiveness of the prompt may significantly influence the occurrence of hallucinations, which could lead to inappropriate or even detrimental agricultural practices. In this paper, we propose an innovative method of multimodal prompt fine-tuning, comprising both text and visual prompts, with the aim of offering a practical solution to the issue of MLLM hallucinations while avoiding the high training costs of MLLMs. Specifically, a context-aware prompt is presented as an effective way of refining the relevance of the generated agricultural content. This is accomplished through the establishment of contextual keywords relevant to soybean cultivation, the formulation of detailed queries about these concepts, the validation of responses against visual knowledge via a self-criticism mechanism, and the inclusion of in-context examples for in-context learning. Additionally, the proposed method introduces a multimodal collaboration strategy that provides soybean localization and contour information at early growth stages to facilitate fine-grained analysis. Extensive experimental results demonstrate that the proposed multimodal prompt fine-tuning method significantly alleviates hallucinations in MLLMs. Compared with the text-only prompt baseline, the proposed method reduces the hallucination rate by 54.17%, enabling open-source models to achieve performance comparable to that of closed-source models in soybean cultivation decision-making in terms of hydroponics, weed control, and pest management during early soybean growth stages. Therefore, the proposed method offers a promising solution for alleviating hallucination in practical agricultural applications while reducing the computational burden required for MLLMs to generate appropriate crop cultivation recommendations.
AB - The selection of optimal cultivation techniques at different growth stages of crops has a significant influence on the monitoring of plant growth and the potential for yield stability. The advent of Multimodal Large Language Models (MLLMs) presents a promising avenue for the formulation of beneficial cultivation decisions. However, the pervasive issue of hallucination within MLLMs represents a substantial obstacle to their application in agricultural scenarios. Particularly in the context of smart soybean cultivation, the effectiveness of the prompt may significantly influence the occurrence of hallucinations, which could lead to inappropriate or even detrimental agricultural practices. In this paper, we propose an innovative method of multimodal prompt fine-tuning, comprising both text and visual prompts, with the aim of offering a practical solution to the issue of MLLM hallucinations while avoiding the high training costs of MLLMs. Specifically, a context-aware prompt is presented as an effective way of refining the relevance of the generated agricultural content. This is accomplished through the establishment of contextual keywords relevant to soybean cultivation, the formulation of detailed queries about these concepts, the validation of responses against visual knowledge via a self-criticism mechanism, and the inclusion of in-context examples for in-context learning. Additionally, the proposed method introduces a multimodal collaboration strategy that provides soybean localization and contour information at early growth stages to facilitate fine-grained analysis. Extensive experimental results demonstrate that the proposed multimodal prompt fine-tuning method significantly alleviates hallucinations in MLLMs. Compared with the text-only prompt baseline, the proposed method reduces the hallucination rate by 54.17%, enabling open-source models to achieve performance comparable to that of closed-source models in soybean cultivation decision-making in terms of hydroponics, weed control, and pest management during early soybean growth stages. Therefore, the proposed method offers a promising solution for alleviating hallucination in practical agricultural applications while reducing the computational burden required for MLLMs to generate appropriate crop cultivation recommendations.
KW - Hallucination alleviation
KW - MLLMs
KW - Prompt engineering
KW - Smart decision
KW - Soybean growth
UR - https://www.scopus.com/pages/publications/105011963652
U2 - 10.1016/j.compag.2025.110811
DO - 10.1016/j.compag.2025.110811
M3 - Article
AN - SCOPUS:105011963652
SN - 0168-1699
VL - 238
JO - Computers and Electronics in Agriculture
JF - Computers and Electronics in Agriculture
M1 - 110811
ER -