While scheduling has been studied extensively using operations research and heuristic methods, both paradigms face challenges in incorporating uncertainties into the problem and in operating in real time. Reinforcement learning is a well-documented approach for solving various planning and control problems. In this thesis, the application of reinforcement learning to production scheduling is studied. Inverse scheduling is defined as the problem of finding the number of jobs and their durations that meet an input capacity demand. A reinforcement learning framework is proposed for solving inverse scheduling while optimizing energy consumption. Discrete-event simulation is used to model the environment, including its uncertainties. Due to the inherent characteristics of the problem at hand, deep neural networks are used to approximate the policy. The trained agent can be used both for scheduling (production planning) and rescheduling (production control). A production cell is studied as a testbed, and using the trained agent, a six percent reduction in energy consumption is observed in simulation, based on a proposed energy signature. The strengths and weaknesses of the proposed framework are presented, and the essential features underlying its success are discussed. To generalize the approach to large-scale problems, further modifications are needed in both modelling and implementation. The thesis was conducted as part of SmoothIT, a joint project between University West, Chalmers University of Technology, and Volvo Group Trucks Operations.