Summary of ‘Facies_classification.ipynb’

2021-01-05 Progress

The first task of my internship at CSRD is following up on the ‘2016-ml-contest’. The objective of this contest is to make the best lithology prediction from well log data. A model is trained using one of many ML techniques; because the predicted label is a discrete class rather than a continuous value, this is a classification problem. The trained model is then applied to new data to predict the proper lithology labels.

The contest code on GitHub uses many ML methods, such as SVM, random forest, and deep neural networks, but most of the high-ranking entries use gradient boosting.

So far, I have analyzed the notebook ‘Facies_classification.ipynb’ written by Brendon. It is openly available for anyone getting started with ML in Python. This article is a summary of that code.

To do in future

Study the ‘Gradient Boosting Tree’ method.
Analyze several versions of team ISPL’s code and compare each with the previous version.
Analyze the code of LA_Team and PA Team in the same way.

Exploring the dataset

Data : facies_vector.csv

Columns

Facies	Formation	Well Name	Depth	GR	ILD_log10	DeltaPHI	PHIND	PE	NM_M	RELPOS
1 ~ 9		Well name	Depth	Gamma ray	resistivity logging	neutron-density porosity difference	average neutron-density porosity	photoelectric effect	nonmarine-marine indicator	relative position

These facies aren’t discrete, and gradually blend into one another. Some have neighboring facies that are rather close. Mislabeling within these neighboring facies can be expected to occur. The following table lists the facies, their abbreviated labels and their approximate neighbors.

Facies	classes of rocks	Label	Adjacent Facies
1	Nonmarine sandstone	SS	2
2	Nonmarine coarse siltstone	CSiS	1,3
3	Nonmarine fine siltstone	FSiS	2
4	Marine siltstone and shale	SiSh	5
5	Mudstone (limestone)	MS	4,6
6	Wackestone (limestone)	WS	5,7
7	Dolomite	D	6,8
8	Packstone-grainstone (limestone)	PS	6,7,9
9	Phylloid-algal bafflestone (limestone)	BS	7,8

Training method and model training

Usage : SVM Classifier

Basic code for using SVM

from sklearn import svm

clf = svm.SVC()
clf.fit(X_train,y_train)
predicted_labels = clf.predict(X_test)

from sklearn.metrics import confusion_matrix
from classification_utilities import display_cm, display_adj_cm

Confusion matrix

conf = confusion_matrix(y_test, predicted_labels)
display_cm(conf, facies_labels, hide_zeros=True)  # 'display_cm' is provided by classification_utilities

Define accuracy and adjacent accuracy

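import numpy as np

def accuracy(conf):
    # Correct predictions lie on the diagonal of the confusion matrix.
    total_correct = 0.
    nb_classes = conf.shape[0]
    for i in np.arange(0, nb_classes):
        total_correct += conf[i][i]
    # Divide by the total number of samples.
    acc = total_correct / conf.sum()
    return acc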
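
The heading mentions adjacent accuracy, but only plain accuracy is shown above. A minimal sketch of the adjacent version follows, assuming the neighbors listed in the facies table (shifted to zero-based indices) and a confusion matrix with true labels on the rows. The name accuracy_adjacent and the toy matrix are illustrative and may differ from what the notebook actually uses.

```python
import numpy as np

# Neighbors from the facies table, shifted to zero-based indices
# (facies 1 -> index 0, ..., facies 9 -> index 8).
adjacent_facies = [[1], [0, 2], [1], [4], [3, 5], [4, 6], [5, 7], [5, 6, 8], [6, 7]]

def accuracy_adjacent(conf, adjacent_facies):
    """Fraction of samples predicted as the true facies or one of its neighbors."""
    conf = np.asarray(conf, dtype=float)
    total_correct = 0.0
    for i in range(conf.shape[0]):
        total_correct += conf[i, i]        # exact hits
        for j in adjacent_facies[i]:
            total_correct += conf[i, j]    # near misses in a neighboring facies
    return total_correct / conf.sum()

# Toy 9x9 confusion matrix (made-up counts, not contest results).
toy_conf = np.eye(9) * 5
toy_conf[0, 1] = 5   # SS predicted as CSiS: counted, the two are adjacent
toy_conf[0, 8] = 5   # SS predicted as BS: not counted
adj_acc = accuracy_adjacent(toy_conf, adjacent_facies)
print(adj_acc)
```

Counting neighbors as correct reflects the remark above that mislabeling between neighboring facies is expected.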

Model parameter selection

An SVM with an RBF kernel has two parameters, C and gamma. These parameters are tuned using cross validation.

Cross validation

Two nested loops are used to train a classifier for every possible combination of values in the ranges specified.

In this code, C_range and gamma_range are as follows.

C_range = np.array([.01, 1, 5, 10, 20, 50, 100, 1000, 5000, 10000])
gamma_range = np.array([0.0001, 0.001, 0.01, 0.1, 1, 10])

The proper parameters are those that give the highest accuracy. In this code, they are gamma = 1 and C = 10.
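
The nested-loop search described above can be sketched as follows. This is only an illustration under stated assumptions: the data is a synthetic stand-in generated with make_classification (not the contest data), the hold-out split and the variable names X_cv, y_cv, best_C and best_gamma are hypothetical, and the result on this toy data says nothing about the values found in the notebook.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the well-log features (assumption, not the contest data).
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

C_range = np.array([.01, 1, 5, 10, 20, 50, 100, 1000, 5000, 10000])
gamma_range = np.array([0.0001, 0.001, 0.01, 0.1, 1, 10])

best_acc, best_C, best_gamma = -1.0, None, None
for gamma in gamma_range:
    for C in C_range:
        # Train one RBF-kernel classifier per (C, gamma) combination.
        clf = svm.SVC(C=C, gamma=gamma)
        clf.fit(X_train, y_train)
        # Score on the held-out part and keep the best combination so far.
        acc = clf.score(X_cv, y_cv)
        if acc > best_acc:
            best_acc, best_C, best_gamma = acc, C, gamma

print(f"best C={best_C}, gamma={best_gamma}, accuracy={best_acc:.3f}")
```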

Precision, Recall and F1 score

Precision is the probability that given a classification result for a sample, the sample actually belongs to that class.
Recall is the probability that a sample will be correctly classified for a given class.

F1 score combines both to give a single measure of relevancy of the classifier results.

Precision and recall can be displayed by calling the functions below with display_metrics=True.

display_cm(cv_conf, facies_labels, display_metrics=True, hide_zeros=True)

display_adj_cm(cv_conf, facies_labels, adjacent_facies, display_metrics=True, hide_zeros=True)
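
For reference, the three metrics defined above can also be computed directly from a confusion matrix. A minimal sketch, assuming the scikit-learn convention of true labels on the rows and predicted labels on the columns; the function name precision_recall_f1 and the toy matrix are hypothetical and not part of classification_utilities.

```python
import numpy as np

def precision_recall_f1(conf):
    """Per-class precision, recall and F1 (rows = true, columns = predicted)."""
    conf = np.asarray(conf, dtype=float)
    true_pos = np.diag(conf)
    predicted_totals = conf.sum(axis=0)   # samples assigned to each class
    actual_totals = conf.sum(axis=1)      # samples truly in each class
    # Classes with no samples get 0 instead of a division-by-zero error.
    precision = np.divide(true_pos, predicted_totals,
                          out=np.zeros_like(true_pos), where=predicted_totals > 0)
    recall = np.divide(true_pos, actual_totals,
                       out=np.zeros_like(true_pos), where=actual_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(true_pos), where=denom > 0)
    return precision, recall, f1

# Toy two-class confusion matrix (made-up counts).
toy_conf = np.array([[8, 2],
                     [1, 9]])
precision, recall, f1 = precision_recall_f1(toy_conf)
print(precision, recall, f1)
```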

Applying the model to new data

Save the results

well_data.to_csv('well_data_with_facies.csv')
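
A rough sketch of how the prediction and saving steps might fit together. Everything except the column names, the parameters C = 10 and gamma = 1, and the output file name is an assumption: the training and well data here are random numbers standing in for the real logs, so the predicted labels are meaningless and only the workflow is illustrated.

```python
import numpy as np
import pandas as pd
from sklearn import svm

feature_cols = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS']
rng = np.random.default_rng(0)

# Made-up training set standing in for the labeled wells (facies 1 to 9).
X_train = rng.normal(size=(90, len(feature_cols)))
y_train = np.repeat(np.arange(1, 10), 10)

clf = svm.SVC(C=10, gamma=1)
clf.fit(X_train, y_train)

# Made-up unlabeled well with the same feature columns.
well_data = pd.DataFrame(rng.normal(size=(20, len(feature_cols))),
                         columns=feature_cols)

# Predict a facies label for every depth sample and store it as a new column.
well_data['Facies'] = clf.predict(well_data[feature_cols].values)
well_data.to_csv('well_data_with_facies.csv')
```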

Lim Kwangsun