TY - GEN
T1 - Covert Adversarial Actuators in Finite MDPs
AU - Santi, Edoardo David
AU - Chen, Gongpu
AU - Gunduz, Deniz
AU - Cohen, Asaf
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
AB - We consider a Markov decision process (MDP) in which actions prescribed by the controller are executed by a separate actuator, which may behave adversarially. At each time step, the controller selects and transmits an action to the actuator; however, the actuator may deviate from the intended action to degrade the control reward. Given that the controller observes only the sequence of visited states, we investigate whether the actuator can covertly deviate from the controller's policy to minimize the controller's reward without being detected. We establish conditions for covert adversarial behavior over an infinite time horizon and formulate an optimization problem to determine the optimal adversarial policy under these conditions. Additionally, we derive the asymptotic error exponents for detection in two scenarios: (1) a binary hypothesis testing framework, where the actuator either follows the prescribed policy or a known adversarial strategy, and (2) a composite hypothesis testing framework, where the actuator may employ any stationary policy. For the latter case, we also propose an optimization problem to maximize the adversary's performance.
UR - https://www.scopus.com/pages/publications/105021982077
U2 - 10.1109/ISIT63088.2025.11195349
DO - 10.1109/ISIT63088.2025.11195349
M3 - Conference contribution
AN - SCOPUS:105021982077
T3 - IEEE International Symposium on Information Theory - Proceedings
BT - ISIT 2025 - 2025 IEEE International Symposium on Information Theory, Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 2025 IEEE International Symposium on Information Theory, ISIT 2025
Y2 - 22 June 2025 through 27 June 2025
ER -