With the rapid development of the social economy, consumer demand is becoming increasingly diversified. To satisfy this demand, enterprises tend to improve their competitiveness by offering differentiated products, and how to price such products has become a widely studied question. Traditionally, customers' preferences are assumed to be independent and identically distributed. When the distribution is known, companies can readily make pricing decisions for differentiated products. However, this assumption may not hold in practice, especially for products that are updated rapidly. In this paper, a dynamic pricing policy for differentiated products under incomplete information is developed. An adaptive multi-armed bandit algorithm based on reinforcement learning is proposed to balance exploration and exploitation. Numerical examples show that the frequency of price adjustment significantly affects the total profit: the more opportunities there are to adjust the price, the higher the total profit. Furthermore, experiments show that the proposed dynamic pricing policy outperforms other algorithms, such as Softmax and UCB1.
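To make the exploration and exploitation trade-off concrete, the following is a minimal sketch of the standard UCB1 rule, one of the baselines named above, applied to a discrete set of candidate prices. It is not the adaptive algorithm proposed in this paper. The function name `ucb1_pricing`, the price grid, and the Bernoulli purchase model (one customer per period who buys with a fixed, unknown probability at each price) are assumptions made purely for illustration.

```python
import math
import random


def ucb1_pricing(prices, purchase_prob, horizon, seed=0):
    """Run UCB1 over a grid of candidate prices (illustrative sketch only).

    prices        -- candidate prices, one per arm (hypothetical grid)
    purchase_prob -- assumed probability of a sale at each price; unknown
                     to the seller and used here only to simulate demand
    horizon       -- number of pricing periods (price-adjustment chances)
    """
    rng = random.Random(seed)
    n_arms = len(prices)
    counts = [0] * n_arms          # times each price has been posted
    mean_reward = [0.0] * n_arms   # running mean of normalised profit
    max_price = max(prices)        # used to scale rewards into [0, 1]
    total_profit = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: post every price once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: mean_reward[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        # Simulated customer response under the assumed demand model.
        sold = rng.random() < purchase_prob[arm]
        profit = prices[arm] if sold else 0.0

        # Incremental update of the running mean for the chosen price.
        counts[arm] += 1
        mean_reward[arm] += (profit / max_price - mean_reward[arm]) / counts[arm]
        total_profit += profit

    return counts, total_profit


if __name__ == "__main__":
    counts, total = ucb1_pricing(
        prices=[4.0, 6.0, 8.0],
        purchase_prob=[0.8, 0.6, 0.3],
        horizon=5000,
    )
    print("times each price was posted:", counts)
    print("total simulated profit:", total)
```

In this sketch the `horizon` argument plays the role of the number of price-adjustment opportunities, so rerunning it with different horizons gives a rough feel for how the index rule shifts from exploring prices to exploiting the most profitable one.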