Caches are commonly used in DSP architectures, as an alternative to fast on-chip memory, to improve performance by reducing the average memory access latency. In this paper we propose a new approach to instruction cache performance enhancement that uses a priori knowledge of the program flow to improve on the commonly used LRU replacement algorithm. To improve replacement decisions in set-associative caches, we develop a new profile-based algorithm that predicts which code blocks will be reused. The proposed algorithm lets the user influence cache performance by combining existing LRU hardware with cache-dedicated software commands. Simulation results on Starcore's SC140e DSP platform show a 2-5% improvement in cycle count over the LRU policy for an MPEG4 application. Further significant improvement can be achieved when using memories with longer access latencies.
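
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the method proposed here. It assumes a hypothetical set of profile-derived "keep" blocks that the replacement logic avoids evicting, and falls back to plain LRU otherwise. The names `simulate`, `keep_blocks`, `cycles_saved`, and the toy `TRACE` are illustrative assumptions, as is the simple model in which each avoided miss saves one miss penalty.

```python
from collections import OrderedDict

# Toy instruction-block trace (assumption): block 0 is reused between
# streaming blocks that would otherwise push it out of a 2-way set.
TRACE = [0, 1, 2, 0, 3, 4, 0, 5, 6, 0]


def simulate(trace, num_sets, ways, keep_blocks=frozenset()):
    """Count misses in a set-associative cache.

    keep_blocks is a hypothetical profile-derived set of blocks predicted
    to be reused. The victim is the least recently used block that is not
    in keep_blocks; if every block in the set is marked, plain LRU applies.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        if len(lines) >= ways:
            # OrderedDict iterates from least to most recently used.
            victim = next((b for b in lines if b not in keep_blocks), None)
            if victim is None:
                victim = next(iter(lines))  # fall back to plain LRU
            del lines[victim]
        lines[block] = True
    return misses


def cycles_saved(misses_base, misses_new, miss_penalty):
    """Cycles saved under the assumption that each miss costs miss_penalty."""
    return (misses_base - misses_new) * miss_penalty


if __name__ == "__main__":
    lru = simulate(TRACE, num_sets=1, ways=2)
    hinted = simulate(TRACE, num_sets=1, ways=2, keep_blocks=frozenset({0}))
    print(lru, hinted)
    for penalty in (10, 40):
        print(penalty, cycles_saved(lru, hinted, penalty))
```

On this toy trace, plain LRU misses on all 10 accesses, while the hinted policy misses 7 times. Under the stated cost model the saving grows linearly with the miss penalty (30 cycles at a penalty of 10, 120 cycles at a penalty of 40), which is consistent with the claim that slower memories benefit more.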