Cred Protocol: Building a decentralized credit score
Summary
We built a credit score based on the activity of accounts borrowing via the Aave v2 liquidity protocol. The score comes from a machine learning model that takes as input the behaviour and attributes of Aave v2 users over time and predicts their propensity to become eligible for liquidation on a loan in the near future (liquidation being a good proxy for default). The results have been very promising: the model reliably predicts whether a randomly selected position will become eligible for liquidation in the next 90 days. In this post, we describe our early work building one of the first DeFi credit scores and explain the features of the current model.
This model is powering our Cred Score API. Apply for Beta access and view your own Cred Score here.
As a service to the community, we’ve open-sourced a historical Aave Health Factor dataset.
Note: This version is based on the same data as our high-fidelity dataset, but sampled at one-week intervals. The repository includes download instructions, as well as further information on data provenance, contents and quality.
Context
Cred Protocol’s mission is to bring DeFi lending to 1 billion people. Many individuals around the world are unable to access basic financial resources. We believe DeFi will one day provide access to these resources to anyone with an internet connection. Key to its future success, however, is the quantification of risk at scale. To help achieve this, Cred Protocol has created one of the first credit scores based on open, fair and transparent blockchain data.
In this post we describe how we built our first credit scoring model for users of the Aave v2 liquidity protocol. For further details, our corresponding research paper is published here.
Early data ingestion and credit modelling
The first step on this journey was to ingest and model Aave liquidity protocol data. We focused on Aave v2 because it was the protocol with the highest total value locked (TVL).
At its core, the Aave liquidity protocol can be thought of as a set of asset pools. Users participate by depositing assets into these pools or borrowing assets from them. Each asset has its own pool, and the interest rate model incentivizes depositors to provide enough liquidity to meet borrower demand. To develop a risk model, we therefore started by ingesting the obvious events: borrows, deposits and liquidations (defaults). We combined these events with account-level attributes to build the first few iterations of a DeFi credit score.
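A minimal sketch of this kind of early, event-based feature building is shown below. It assumes the raw events have already been decoded and their sizes converted to ETH; the `Event` record, the `account_event_features` function and the feature names are hypothetical illustrations, not Cred Protocol's actual schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event record (real Aave v2 events carry more fields)
    account: str
    kind: str          # "deposit", "borrow" or "liquidation" in this sketch
    amount_eth: float  # event size, assumed already converted to ETH


def account_event_features(events):
    """Aggregate raw protocol events into static per-account count/volume features."""
    counts = defaultdict(Counter)
    volumes = defaultdict(Counter)
    for e in events:
        counts[e.account][e.kind] += 1
        volumes[e.account][e.kind] += e.amount_eth
    features = {}
    for account, c in counts.items():
        v = volumes[account]
        features[account] = {
            "n_deposits": c["deposit"],
            "n_borrows": c["borrow"],
            "n_liquidations": c["liquidation"],
            "deposit_volume_eth": v["deposit"],
            "borrow_volume_eth": v["borrow"],
            "ever_liquidated": c["liquidation"] > 0,
        }
    return features
```

Features like these are pure aggregates: they ignore when events happened relative to each other, which is exactly the kind of limitation discussed next.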
Early results were promising. However, those models did not i) use the time-series nature of the data to model account behaviour, ii) include debt and collateral asset types or iii) account for accrued interest. By simply stacking static event features into a model, we were missing important account behaviour, such as: How does this account respond when its position gets close to liquidation? Does this account routinely take out stablecoin debt, or non-stablecoin debt whose value will fluctuate? Does the account give itself an adequate collateral buffer against sharp market downturns?
Refined credit model using historical health factors
With the learnings from these early attempts and these important new questions to answer, we realized it was necessary to model how an account manages its debt over time. To do this, we modelled the collateral/debt composition and health factor of every Aave v2 account at 15-minute intervals, from its first Aave interaction to the present, resulting in a dataset of 360 million observations. The health factor is the statistic central to maintaining Aave protocol solvency and removing bad debt (i.e. undercollateralized debt) from the system. Following the Aave v2 definition, with all values denominated in ETH, it is defined as follows:

$$H_f = \frac{\sum_i \text{Collateral}_i \text{ in ETH} \times \text{LiquidationThreshold}_i}{\text{Total Borrows in ETH}}$$
When a user’s health factor drops below 1, their position becomes eligible for liquidation. This happens when the threshold-weighted value of their supplied collateral (numerator) no longer covers their outstanding borrowed balance (denominator), usually because their collateral decreases in value (e.g. in a market downturn) or their debt increases in value. Another user (a liquidator) can then repay part of the debt and in return receive a corresponding amount of the borrower’s collateral plus a liquidation bonus. Liquidation events are crucial to maintaining protocol solvency. More details on this statistic can be found in our research paper or the Aave docs.
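To make the health factor concrete, here is a minimal sketch of the Aave v2-style calculation. It assumes every balance is already priced in ETH, and the liquidation thresholds used in the example are illustrative values, not live protocol parameters; the `Reserve` record and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Reserve:
    # Hypothetical per-asset slice of a position; values assumed priced in ETH
    symbol: str
    collateral_eth: float         # value of supplied collateral, in ETH
    debt_eth: float               # outstanding debt incl. accrued interest, in ETH
    liquidation_threshold: float  # e.g. 0.8 means 80% (illustrative values only)


def health_factor(reserves):
    """Threshold-weighted collateral divided by total debt (both in ETH)."""
    weighted_collateral = sum(r.collateral_eth * r.liquidation_threshold for r in reserves)
    total_debt = sum(r.debt_eth for r in reserves)
    if total_debt == 0:
        return math.inf  # no debt, so the position cannot be liquidated
    return weighted_collateral / total_debt


def eligible_for_liquidation(reserves):
    """A position becomes eligible for liquidation once HF < 1."""
    return health_factor(reserves) < 1.0
```

For example, 10 ETH of collateral with an 80% threshold against USDC debt worth 7 ETH gives HF = 8 / 7 ≈ 1.14. If ETH then falls 20% against USD, the same USDC debt is worth 7 / 0.8 = 8.75 ETH and HF drops to 8 / 8.75 ≈ 0.91, so the position becomes eligible for liquidation even though the borrower did nothing: the debt increased in value relative to the collateral.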
Developing this granular dataset for 35K Aave accounts since Aave v2’s inception was a complex exercise (which we will leave for another post!). The image below shows a high-level view of the data pipeline that powers this dataset.
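The pipeline itself is out of scope here, but one core step, putting irregular on-chain changes onto a fixed 15-minute grid, can be sketched as follows. This is an assumption about one possible approach, not a description of Cred Protocol’s pipeline: it treats a value (for example, a position’s token balance, which only changes when an event occurs) as a step function and carries the last known value forward. A realistic pipeline would also revalue balances with asset prices and accrued interest at each grid point, since health factors move between events.

```python
from bisect import bisect_right

INTERVAL_S = 15 * 60  # 15-minute snapshot interval


def snapshot_series(change_points, start, end, interval=INTERVAL_S):
    """Sample a step function onto a fixed time grid by carrying values forward.

    change_points: [(unix_ts, value), ...] sorted by timestamp; each value holds
    until the next change. Returns [(grid_ts, value), ...] for grid timestamps
    from start to end inclusive; grid points before the first change are skipped.
    """
    times = [t for t, _ in change_points]
    out = []
    t = start
    while t <= end:
        i = bisect_right(times, t) - 1  # index of the last change at or before t
        if i >= 0:
            out.append((t, change_points[i][1]))
        t += interval
    return out
```

Using `bisect_right` keeps each lookup logarithmic in the number of change points, which matters when an account has a long event history and many grid points.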
The resulting health-factor dataset enabled a more sophisticated classifier that estimates the probability that an account’s position will become eligible for liquidation (HF < 1) within 90 days of being opened. Note: many of the most active accounts interacting with Aave v2 were smart contracts that held positions for less than one day. We removed these from our dataset to focus on accounts whose behaviour more closely resembled ‘genuine’, human borrowing.
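The labelling and filtering described above could look roughly like the sketch below. The `Position` record and its fields are hypothetical. One simplification is worth flagging: the sketch filters on holding duration alone; identifying whether an account is actually a smart contract is not shown.

```python
from dataclasses import dataclass

DAY_S = 24 * 60 * 60    # seconds in a day
HORIZON_S = 90 * DAY_S  # 90-day prediction window


@dataclass
class Position:
    # Hypothetical record; field names are illustrative only
    account: str
    opened_at: int    # unix seconds when the position was opened
    last_seen: int    # unix seconds of the last observation (close time, or now if open)
    hf_series: tuple  # ((unix_ts, health_factor), ...) snapshots


def label_position(position: Position) -> int:
    """Return 1 if any snapshot in the 90 days after opening has HF < 1, else 0."""
    window_end = position.opened_at + HORIZON_S
    return int(any(
        hf < 1.0
        for ts, hf in position.hf_series
        if position.opened_at <= ts <= window_end
    ))


def is_long_lived(position: Position, min_hold_s: int = DAY_S) -> bool:
    """Keep positions held for at least one day; shorter ones are filtered out."""
    return position.last_seen - position.opened_at >= min_hold_s


def build_training_set(positions):
    """Drop short-lived positions, then attach a 0/1 liquidation-eligibility label."""
    return [(p, label_position(p)) for p in positions if is_long_lived(p)]
```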
Some of the features included in this classifier were:
- Account age
- Aggregations of the time series of its historical health factors
- Interactions with the Aave protocol
- The types of assets it borrows and keeps as collateral
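As an illustration of how the questions raised earlier (collateral buffer, stablecoin vs non-stablecoin debt) might translate into features, here is a minimal sketch. The 1.5 buffer level and the `STABLECOINS` set are assumptions chosen for the example, and the function names are hypothetical; the post does not specify the exact aggregations used.

```python
from statistics import mean

STABLECOINS = {"USDC", "USDT", "DAI"}  # illustrative subset, an assumption


def hf_features(hf_values, buffer=1.5):
    """Summaries of a health-factor time series sampled at fixed intervals.

    Assumes snapshots where the account had outstanding debt (finite HF).
    """
    return {
        "hf_min": min(hf_values),
        "hf_mean": mean(hf_values),
        "n_hf_below_1": sum(hf < 1.0 for hf in hf_values),
        # share of snapshots with less than a (hypothetical) 1.5x collateral buffer
        "frac_below_buffer": sum(hf < buffer for hf in hf_values) / len(hf_values),
    }


def debt_mix_features(debt_eth_by_symbol):
    """Share of outstanding debt (valued in ETH) held in stablecoins."""
    total = sum(debt_eth_by_symbol.values())
    stable = sum(v for s, v in debt_eth_by_symbol.items() if s in STABLECOINS)
    return {"stable_debt_share": stable / total if total else 0.0}


def account_age_days(first_interaction_ts, as_of_ts):
    """Account age in days at the scoring time."""
    return (as_of_ts - first_interaction_ts) / 86_400
```

Because every feature is a summary over a sampled series, the same functions work whether the snapshots are 15 minutes or one week apart, although the resulting values will differ with the sampling rate.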
Results
The results showed that, of the models we evaluated, our “tree-based” classifier (purple line in the ROC chart below) was the strongest predictor of eligibility for liquidation (and, by proxy, creditworthiness). We evaluated our models using the area under the receiver operating characteristic (ROC) curve, or AUC.
Independently of any classification threshold, this statistic quantifies the model’s ability to score positive examples of eligibility for liquidation (HF < 1) more highly than negative examples. A positive example is a position that we know reaches HF < 1 within the following 90 days, while a negative example is one that does not. These results give us confidence that our system predicts future eligibility for liquidation well and, by proxy, the creditworthiness of accounts interacting with Aave v2.
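The threshold-free interpretation of AUC can be computed directly from its definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below is for illustration only; it is quadratic in the number of examples, and a rank-based implementation such as scikit-learn’s `roc_auc_score` is the practical choice for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for positions that reach HF < 1 in the window, 0 otherwise.
    scores: model outputs, where higher means more likely to reach HF < 1.
    Ties count as half a win. O(P * N), which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75. A perfect ranking scores 1.0, while a random ordering scores about 0.5.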
By comparison, our single-feature baseline model, based on counts of HF < 1, performed similarly well. More information can be found in our research paper.
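A baseline of this kind might be sketched as follows. The assumption, which is ours and is not stated above, is that the count is taken only over health-factor snapshots observed before the scoring time, so the score never sees the 90-day window it is meant to predict. The function names are hypothetical.

```python
def baseline_score(hf_series, as_of_ts):
    """Single-feature baseline: count of HF < 1 snapshots observed before as_of_ts.

    hf_series: [(unix_ts, health_factor), ...]. Only history strictly before the
    scoring time is used, to avoid looking into the prediction window.
    """
    return sum(1 for ts, hf in hf_series if ts < as_of_ts and hf < 1.0)


def rank_accounts(series_by_account, as_of_ts):
    """Order accounts from highest to lowest baseline risk score."""
    return sorted(
        series_by_account,
        key=lambda acct: baseline_score(series_by_account[acct], as_of_ts),
        reverse=True,
    )
```

That such a simple count performs well suggests that past near-liquidation behaviour is highly predictive of future behaviour; a richer model then has to earn its extra complexity against this baseline.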
Future work and upcoming posts
This is only the beginning, and our work is far from complete. Over the next few months we plan to improve our credit scoring model and increase the number of data points that we ingest. On the data engineering side, we will build out datasets for other liquidity/lending protocols, including Compound and MakerDAO. These will strengthen our existing credit scoring model and power our other data products. On the modelling side, we are working to include historical and future (simulated) asset prices to determine how an account adjusts its positions to maintain HF > 1 in response to market conditions. Further, we are adapting our model to predict short-term eligibility for liquidation of positions that are opened by smart contracts and held for very short periods of time.
Future posts will cover how we developed our high-fidelity dataset (that informs our models) and dive deeper into the architecture of Aave, Compound and MakerDAO. Tweet us @cred_protocol if you have specific article requests.
If you’d like to try our beta product and view your own Cred Score, you can do so here.