Many planning and control problems can be modelled as sequential decision-making processes. Reinforcement learning is considered a reliable framework for solving such problems in the presence of uncertainty. In this regard, obtaining an environment that replicates the characteristics of the real system has been a major challenge. In this master's thesis, the idea of employing a discrete-event simulation model of a production cell to train a reinforcement learning scheduler is investigated. The focus is on energy optimization while meeting a demanded production rate. The thesis was conducted as part of a joint project between University West, Chalmers University of Technology, and Volvo Group Trucks Operations.