Car Pricing: Machine Learning

Article written in collaboration with the consulting firm GALEA & associés.


The car insurance market is at a turning point. The market is already saturated, margins are thin and autonomous vehicles are on the horizon, while new entrants are intensifying competition. In addition, the Hamon law (which makes it easier to change insurer) and the growth of pay-as-you-drive offers are making it increasingly difficult to retain customers, and good risks in particular.


In the coming years, motor insurers will therefore continue to refine their ability to individualise their rates while respecting, as far as possible, the principle of risk pooling on which insurance is based. Those who succeed in charging each policyholder a “fair price” will be able to build loyalty among their members while maintaining technical balance. Conversely, poorly adapted rates will lead to more and more adverse selection. The pricing process therefore appears to be the main lever for technical excellence.

As part of their work, Galea’s consultants conducted a study to test two ways of improving premium calculation: 

First, premium calculation is classically based on a generalised linear model (GLM). The first idea is to compare the results of this model with those of data science approaches. Do machine learning models such as CART, Random Forest or XGBoost improve predictions and refine pricing criteria?

Second, the model is enriched with new external data, notably telematics data provided by our partner Ellis-Car. Does integrating these data make it possible to isolate specific behaviours that the historical data available to insurers do not detect?

This study was conducted in partnership with Ellis-Car, which offers a solution for vehicle fleets and private individuals that combines on-board telematics, training and profitability.

The start-up offers a geolocation and driving profiling solution for corporate fleets that requires only a smartphone. Developed in the academic world, winner of several awards and tuned on hundreds of millions of kilometres driven, the start-up's self-learning algorithms can detect in real time any deviation in a driver's behaviour relative to all other drivers. A system of voice and visual alerts significantly improves driver behaviour, to the company's benefit. These improvements are also lasting, thanks to the gamification of the user experience.


The Ellis-Car algorithm is based on several cartographic layers, which are fed by numerous open data sources: weather, traffic, road visibility, road signs, accident history, etc. These layers are also enriched by every journey a driver makes, so that driving behaviour can be compared with the entire knowledge base and the risk estimated.

The study:


First, the data science methods were compared with the GLM approach:

Galea carried out a study on the claims experience of a motor insurer for its civil liability cover. The objective was to model the number and cost of claims of the insured, both by the “classic” GLM approach and by data science methods, and to compare the efficiency of the different models obtained. 
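Modelling the number and the cost of claims separately corresponds to the usual frequency-severity decomposition, in which the pure premium is the expected number of claims multiplied by the expected cost per claim. The sketch below only illustrates that decomposition on a single rating factor and is not the model used in the study: the field names (`segment`, `exposure`, `claims`, `cost`) and the toy figures are assumptions made for the example. With a single categorical factor, a Poisson GLM with a log link and an exposure offset fits the same per-segment frequency as the empirical ratio computed here.

```python
from collections import defaultdict

# Toy policy records (hypothetical values, for illustration only).
# exposure: years insured; claims: number of claims; cost: total claim cost.
policies = [
    {"segment": "urban", "exposure": 1.0, "claims": 1, "cost": 1200.0},
    {"segment": "urban", "exposure": 0.5, "claims": 0, "cost": 0.0},
    {"segment": "urban", "exposure": 1.0, "claims": 2, "cost": 1800.0},
    {"segment": "rural", "exposure": 1.0, "claims": 0, "cost": 0.0},
    {"segment": "rural", "exposure": 1.0, "claims": 1, "cost": 2000.0},
]


def pure_premium_by_segment(records):
    """Return claim frequency, average cost and pure premium per segment."""
    totals = defaultdict(lambda: {"exposure": 0.0, "claims": 0, "cost": 0.0})
    for record in records:
        total = totals[record["segment"]]
        total["exposure"] += record["exposure"]
        total["claims"] += record["claims"]
        total["cost"] += record["cost"]

    result = {}
    for segment, total in totals.items():
        # Frequency: claims per year of exposure.
        frequency = total["claims"] / total["exposure"]
        # Severity: average cost per claim (0 if the segment has no claims).
        severity = total["cost"] / total["claims"] if total["claims"] else 0.0
        result[segment] = {
            "frequency": frequency,
            "severity": severity,
            "pure_premium": frequency * severity,
        }
    return result


premiums = pure_premium_by_segment(policies)
for segment, values in premiums.items():
    print(segment, values)
```

On these toy records, the urban segment has a higher frequency and a lower average cost than the rural one, which is why the two components are usually modelled separately before being combined.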

The quality of the models was measured by Root-Mean-Square Error (RMSE). The lower the RMSE, the better the approach. The table below shows the results obtained. The best approach is indicated in red. 
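As a reminder, the RMSE is the square root of the average squared difference between observed and predicted values on the test set. A minimal sketch of the calculation follows; the function name `rmse` and the toy values are assumptions made for illustration and are not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed claim counts and two sets of predictions.
observed = [0, 1, 0, 2]
model_a = [0.1, 0.8, 0.2, 1.5]      # predictions that vary by policy
model_b = [0.75, 0.75, 0.75, 0.75]  # constant prediction (mean of observed)

print("model A:", rmse(observed, model_a))
print("model B:", rmse(observed, model_b))
```

The model with the lower RMSE on the same test set is considered the better one, which is how the approaches in the table are ranked.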


Table 1: Summary of errors on the test set (RMSE)

For predicting the number of claims, the GLM proved to be the best model. The data science approaches nevertheless show a similar level of quality, the best of them being Random Forest.

Concerning the cost of claims, the CART approach allows finer modelling than the GLM.

The analysis shows that, on these two examples, data science methods perform comparably to generalised linear models. In most organisations, motor rates are today based exclusively on GLMs; it is worth challenging them with different approaches to determine, case by case, which is the most relevant.

The fact remains, however, that GLMs are better understood by many practitioners and, for some, easier to integrate into their management systems and sales support tools (OAV, outils d'aide à la vente).

The main data science approaches:


The relevance of the pricing approaches studied can be evaluated according to several criteria: learning speed, the ease of explaining the algorithm and interpreting its results, the ease of parameterising the model, and predictive power. The table below summarises how each approach performs on these criteria.

Reading the table: the more “+” signs a model has for a given criterion, the better it performs on that criterion.

This shows that generalised linear models have many advantages, and that the results of methods derived from data science must be much better to supplant them. This is perhaps one of the reasons for the slow take-off of these methods at present. 

Use of external data from telematics:


In a second step, as mentioned in the introduction, the models were strengthened by integrating external telematics data, again provided by our partner Ellis-Car. These data describe, for each geographical area, the type of road network (percentage of motorways, population density, number of traffic lights or stops, etc.) and the type of driving (average speed, number of accelerations or stops, etc.).
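Because the external data are described per geographical area, one plausible way to use them is to attach the area-level variables to each policy before fitting the models. The sketch below assumes a join on a hypothetical `area_code` field; all field names and values are invented for illustration and do not reflect the actual Ellis-Car data format.

```python
# Hypothetical area-level external features, keyed by an area code.
area_features = {
    "A01": {"motorway_share": 0.35, "avg_speed_kmh": 62.0},
    "B07": {"motorway_share": 0.05, "avg_speed_kmh": 38.0},
}

# Hypothetical policy records from the insurer's own portfolio.
policies = [
    {"policy_id": 1, "area_code": "A01", "driver_age": 42},
    {"policy_id": 2, "area_code": "B07", "driver_age": 23},
    {"policy_id": 3, "area_code": "Z99", "driver_age": 57},  # no external data
]


def enrich(policy_records, features_by_area, feature_names):
    """Return copies of the policies with area-level features attached.

    Policies whose area has no external data get None for each feature,
    so that missing values stay visible to the modelling step.
    """
    enriched_records = []
    for policy in policy_records:
        features = features_by_area.get(policy["area_code"], {})
        row = dict(policy)  # copy, so the input records are not modified
        for name in feature_names:
            row[name] = features.get(name)
        enriched_records.append(row)
    return enriched_records


enriched = enrich(policies, area_features, ["motorway_share", "avg_speed_kmh"])
for row in enriched:
    print(row)
```

Keeping unmatched areas as explicit missing values, rather than dropping the policies, leaves the choice of how to handle them (imputation, a separate category, exclusion) to the modelling step.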

The aim was to determine to what extent adding these public data (thus potentially usable by any insurer) would improve the quality of the pricing models. The figure below compares the predictive capacity of the different models before and after taking the external data into account. In all cases, adding these data significantly improves the models:


Comparison of the quality of prediction before / after integration of telematics data 

In conclusion, most insurers today base their rates on GLM analyses, and machine learning models are not yet widely deployed. However, these methods often prove relevant, and sometimes even more effective than traditional approaches. In future, it will be worth testing both families of approaches during tariff reviews and determining case by case which is the most relevant, weighing the expected technical gains against the costs of applying the new methods.


Concerning the contribution of telematics data, this study shows unequivocally that adding external data significantly improves the relevance of a tariff and, in particular, the prediction of the number of claims, a considerable challenge in motor insurance.

GALEA & associés and Ellis-Car can help you enrich your databases and improve your algorithms through feature engineering. Actuarial experts and data scientists will assist you in carrying out predictive studies, from the use of algorithms to their interpretation, in all technical fields: creation of innovative products, pricing, reserving, reinsurance optimisation.


Source: Tarification automobile à l’aide de modèles de machine learning et apport des données télématiques, Galea Associés



Ellis-Car. All rights reserved, 2014 - 2022