self-driving car AI
Self-driving cars are poised to become one of the most impactful technologies of the 21st century. Yet, to work on them, you must be at one of the few companies that have taken on the task. Even then, you would be working in isolation from other groups doing the same thing. This does not have to be the case. Linux, PostgreSQL, and ROS are just a few examples of a different approach -- where companies, researchers, and individuals work together on foundational technology to provide a common platform for bringing better products to market. DeepDrive aims to be such a platform for self-driving cars powered by deep learning. By combining a highly realistic driving simulation with hooks for perception and control, DeepDrive gives anyone with a computer the opportunity to build better self-driving cars.
The current generation of self-driving cars struggles in urban settings at speeds above 25 mph, requiring people to be ever-ready at the wheel. This problem is aggravated by the high risk of accidents, which limits how much new approaches can be tested on real roads. Modern video games like GTAV, however, present a world where you can test self-driving car AIs in large, complex urban areas replete with realistic roads, weather, pedestrians, cyclists, and vehicles, at zero risk or cost. Another important advantage of this type of simulation is that you can quickly run the car through a barrage of safety-critical situations, some of which may occur only once every several million miles in reality. This dramatically decreases the time involved in properly vetting cars while giving transparency into the decisions different types of AIs will make.
The high fidelity and vast open world provided by GTAV also present the most complex virtual RL environment to date for testing sensorimotor AI. This allows a new level of testing for safety in AI, as existing environments don’t offer the same opportunities for studying reward hacking, distributional shift, and negative side effects. Finally, developing self-driving cars out in the open provides a level of transparency not usually seen in AI applications, in an area where it is crucial to have visibility into both the safety and correctness of the system.
An initial 8-layer neural net with the AlexNet architecture is being made available, along with the dataset it was trained on. Training was done on raw image input from a forward-mounted camera, regressed against steering, throttle, yaw, and forward-speed control values produced by an in-game AI. This model is able to steer the car to stay in its lane, stop for other cars, and work well in a variety of weather and lighting conditions, as shown below:
Prior work on extracting depth buffer for LIDAR simulation:
Work is in progress on an Amazon Machine Image as well as integration with OpenAI Gym to facilitate easy experimentation in the simulator.
Please let me know what other types of setups and environments you'd like support for at craig@deepdrive.io.
Tips and Tricks
- Adding examples of course correction to the training data is crucial. NVIDIA does this by simulating rotation of real-world images, and ALVINN created its own simulation for adding variety to its training. Since we are already in simulation, course correction can be added by stopping recording, steering the car off course, and then recording the actions and images taken during course correction. This was done at three levels of severity. Levels one and two consisted of driving the car with a previous model for one and two seconds respectively, then recording the corrective actions taken by the in-game AI. The most severe level consisted of performing a random action (hard right, hard left, brake, or hard acceleration) and relinquishing control to the in-game AI after 230ms.
- Training takes less than one day on a GeForce 980 starting from pretrained ImageNet weights, using 0.002 as the base learning rate and decaying by 10x when performance plateaus. See the model for all hyperparameters used.
- Don’t mirror images without mirroring targets. Horizontally asymmetric targets like steering, rotational velocity, and direction need to be negated in order to correlate with mirrored images. Mirroring was not done in the initial release, but is likely a good direction for improved generalization.
- Predicting steering, speed, etc. ahead of time works by adding future targets to the last fully connected layer, but causes more overfitting (e.g. 2x worse test performance with 3 frames, or 775ms, of advance data). So if you predict the future, it's probably a good idea to add more regularization. Only the current desired state of the vehicle is output by the initial model; however, support for arbitrary prediction duration is provided.
- Feeding the current steering, speed, etc. as input to the net causes overreliance on these inputs vs. the images, so follow advice from ALVINN and feed random values to these units during the early stages of training so that the image garners more influence over the output. This was also not done in the initial release; however, examples are provided on how to do so.
- DeepDriving at Princeton suggests that detecting lane markers and other cars for determining the desired steering and throttle is a more tractable strategy than directly inferring steering and throttle from the image. It’s possible to get lane and nearby car location from GTAV for this approach although it is not yet implemented.
- The car currently has no preference for turning left or right and will wobble until a certain direction is obvious. To combat this, adversarial manipulation of input pixels could be used to perturb activations towards a desired direction.
- Reinforcement learning is a very promising direction for future improvement. A DQN was attempted before supervised learning, but the agent developed unsatisfactorily circuitous paths to a goal given on-road/off-road and distance based rewards. Using supervised pre-training and more precise rewards will likely improve this.
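The mirroring tip above can be made concrete with a small sketch. The `Sample` layout and its field names are hypothetical and chosen only for illustration; they are not DeepDrive's actual data format. The point is just that flipping the image left-to-right must be paired with negating the signed, horizontally asymmetric targets, while symmetric targets such as throttle and speed stay untouched.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Sample:
    """Hypothetical training sample; field names are illustrative assumptions."""
    image: List[List[float]]  # rows of pixel values (height x width)
    steering: float           # signed: negative = left, positive = right
    yaw_rate: float           # signed rotational velocity
    throttle: float           # horizontally symmetric
    speed: float              # horizontally symmetric


def mirror(sample: Sample) -> Sample:
    """Flip the image left-to-right and negate the asymmetric targets."""
    flipped = [row[::-1] for row in sample.image]
    return replace(
        sample,
        image=flipped,
        steering=-sample.steering,
        yaw_rate=-sample.yaw_rate,
    )
```

Mirroring twice should return the original sample, which makes for a cheap sanity check when adding this kind of augmentation to a data pipeline.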
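The learning-rate schedule mentioned in the training tip (base rate 0.002, decayed by 10x when performance plateaus) could look roughly like the sketch below. Only the base rate and the 10x factor come from the text; the `patience` and `min_delta` parameters, and the definition of "plateau" as a run of evaluations without improvement, are assumptions made for this example.

```python
class PlateauDecay:
    """Decay the learning rate by a fixed factor when validation loss stalls."""

    def __init__(self, base_lr=0.002, factor=0.1, patience=3, min_delta=1e-4):
        self.lr = base_lr
        self.factor = factor        # 0.1 means "decay by 10x"
        self.patience = patience    # assumed: evaluations to wait before decaying
        self.min_delta = min_delta  # assumed: smallest change counted as progress
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        """Record one validation result and return the learning rate to use next."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr
```

Calling `step` once per validation pass keeps the schedule independent of any particular training framework.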
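For the ALVINN-style advice on vehicle-state inputs, one possible sketch is shown below: during an assumed warm-up period, the real steering and speed values are replaced with uniform random values so the net cannot lean on them. The `RANGES` table, the `warmup_steps` cutoff, and the hard switch from noise to real values are all assumptions for illustration, not the scheme used in the examples shipped with the project.

```python
import random

# Assumed value ranges for each state input; adjust to the real units in use.
RANGES = {
    "steering": (-1.0, 1.0),
    "speed": (0.0, 40.0),
}


def state_inputs(true_state, step, warmup_steps=10000, rng=random):
    """Return the vehicle-state inputs to feed the net at a given training step.

    Before `warmup_steps`, every value is replaced with uniform noise drawn
    from its range; afterwards the true values are passed through unchanged.
    """
    if step >= warmup_steps:
        return dict(true_state)
    return {name: rng.uniform(*RANGES[name]) for name in true_state}
```

A gradual blend from noise to real values would be a natural variation, at the cost of one more schedule to tune.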