Project #2 for the Udacity Deep Reinforcement Learning course.
This project uses the Unity Reacher environment.
In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.
The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.
This project uses the distributed (second) version that contains 20 identical agents, each with its own copy of the environment. The environment is considered solved when the average (over the last 100 episodes) score across the agents is at least 30.
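For reference, here is a minimal sketch of how that solve criterion can be tracked during training; the variable names (`scores_window`, `agent_scores`) are illustrative and not taken from the project code.

```python
from collections import deque
import numpy as np

# Mean score across the 20 agents, one entry per episode, last 100 episodes only.
scores_window = deque(maxlen=100)

# Inside the training loop, after each episode finishes:
# agent_scores holds the 20 per-agent returns for that episode.
agent_scores = np.zeros(20)              # placeholder for illustration
scores_window.append(np.mean(agent_scores))

# The environment counts as solved once 100 episodes have been recorded
# and their average score is at least 30.
solved = len(scores_window) == 100 and np.mean(scores_window) >= 30.0
```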
To set up your Python environment to run the code in this repository, follow the instructions below.
- Create (and activate) a new environment with Python 3.6.

    - __Linux__ or __Mac__:
    ```bash
    conda create --name drlnd python=3.6
    source activate drlnd
    ```
    - __Windows__:
    ```bash
    conda create --name drlnd python=3.6
    activate drlnd
    ```
- Clone the repository and navigate to the `python/` folder. Then, install several dependencies.

    ```bash
    git clone https://github.com/udacity/deep-reinforcement-learning.git
    cd deep-reinforcement-learning/python
    pip install .
    ```

- Download the Reacher environment from one of the links below. Select the environment that matches your operating system:
- Version 2: Twenty (20) Agents
- Linux: click here
- Mac OSX: click here
- Windows (64-bit): click here
- Place the file in this directory, and unzip (or decompress) the file.
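Once the environment file is in place, it can be loaded with the `unityagents` package installed above. The sketch below assumes the macOS build (`Reacher.app`); adjust the file name for your platform.

```python
from unityagents import UnityEnvironment
import numpy as np

# File name depends on the downloaded build, e.g. 'Reacher.app' (macOS) or
# 'Reacher_Linux/Reacher.x86_64' (Linux); point it at your unzipped file.
env = UnityEnvironment(file_name='Reacher.app')

brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)                   # 20 in the distributed version
state_size = env_info.vector_observations.shape[1]  # 33
action_size = brain.vector_action_space_size        # 4

# Step all 20 agents with random actions clipped to [-1, 1].
actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
env_info = env.step(actions)[brain_name]
rewards = env_info.rewards                          # list of 20 rewards
```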
Running the code in the repository and training the agent is straightforward: open the `Continuous_Control.ipynb` Jupyter notebook and run its cells to train the agent.
The policy is implemented in PyTorch and was trained on a MacBook Pro using the CPU only.
The policy was trained with the PPO (Proximal Policy Optimization) algorithm. See the report file for a detailed description of the model and the training process.
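For context, the heart of PPO is the clipped surrogate objective. Below is a minimal PyTorch sketch of that loss; the function name, tensor names, and clipping value are illustrative and not taken from this repository's code.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017).

    new_log_probs: log pi_theta(a|s) under the current policy
    old_log_probs: log pi_theta_old(a|s) from the policy that collected the data
    advantages:    advantage estimates for the same (state, action) pairs
    """
    ratio = torch.exp(new_log_probs - old_log_probs)            # r_t(theta)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the elementwise minimum of the clipped and unclipped objectives,
    # negated so the result can be minimized with a standard optimizer.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```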