Summary of ‘Facies_classification.ipynb’

2021-01-05 Progress

The first task of my internship at CSRD is following up on the '2016-ml-contest'. The objective of the contest is to make the best lithology prediction from well log data: train a model using ML techniques, then apply it to new data to predict the proper lithology labels. Classification methods are used because the predicted label is a discrete facies class rather than a continuous value.

Many ML methods appear in the GitHub code, such as SVM, random forest, and deep neural networks, but most of the high-ranking entries are built on gradient boosting.

So far, I have analyzed the notebook 'Facies_classification.ipynb' by Brendon Hall. It is openly accessible and intended for anyone getting started with ML in Python. This article is a summary of that code.

To do in future

  • Study the gradient boosting tree method.
  • Analyze several versions of team ISPL's code and compare what changed from the previous version.
  • Analyze the LA_Team and PA Team code in the same way.



Exploring the dataset

Data: facies_vectors.csv

Columns

Column      Description
Facies      facies class (1 ~ 9)
Formation   formation name
Well Name   well name
Depth       depth
GR          gamma ray
ILD_log10   resistivity log
DeltaPHI    neutron-density porosity difference
PHIND       average neutron-density porosity
PE          photoelectric effect
NM_M        nonmarine-marine indicator
RELPOS      relative position
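
To start exploring, the CSV can be read into a pandas DataFrame; a minimal sketch, assuming the file sits in the working directory:

import pandas as pd

# Load the training data and inspect the first rows and summary statistics.
training_data = pd.read_csv('facies_vectors.csv')
print(training_data.head())
print(training_data.describe())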


These facies aren’t discrete, and gradually blend into one another. Some have neighboring facies that are rather close. Mislabeling within these neighboring facies can be expected to occur. The following table lists the facies, their abbreviated labels and their approximate neighbors.

Facies  Description                             Label  Adjacent facies
1       Nonmarine sandstone                     SS     2
2       Nonmarine coarse siltstone              CSiS   1, 3
3       Nonmarine fine siltstone                FSiS   2
4       Marine siltstone and shale              SiSh   5
5       Mudstone (limestone)                    MS     4, 6
6       Wackestone (limestone)                  WS     5, 7
7       Dolomite                                D      6, 8
8       Packstone-grainstone (limestone)        PS     6, 7, 9
9       Phylloid-algal bafflestone (limestone)  BS     7, 8
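
The code below refers to these classes through two Python objects; a sketch matching the table, with indices zero-based so that facies 1 corresponds to index 0:

import numpy as np

# Abbreviated labels in order of facies number 1..9.
facies_labels = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS', 'WS', 'D', 'PS', 'BS']

# Entry i lists the zero-based indices of the facies adjacent to facies i+1.
adjacent_facies = np.array([[1], [0, 2], [1], [4], [3, 5], [4, 6],
                            [5, 7], [5, 6, 8], [6, 7]], dtype=object)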

Training the model

Method: SVM classifier
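
Before fitting, the data needs conditioning: SVMs are sensitive to the scale of each feature, and a held-out test set is needed for evaluation. A minimal sketch, where feature_vectors (the log columns) and correct_facies_labels (the facies column) are illustrative names:

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Standardize every feature to zero mean and unit variance.
scaler = preprocessing.StandardScaler().fit(feature_vectors)
scaled_features = scaler.transform(feature_vectors)

# Hold out part of the data for testing (the split fraction is illustrative).
X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, correct_facies_labels, test_size=0.2, random_state=42)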

Basic code for using an SVM:

from sklearn import svm

clf = svm.SVC()                          # default SVC uses the RBF kernel
clf.fit(X_train, y_train)                # train on the training split
predicted_labels = clf.predict(X_test)   # predict facies for the test split

Confusion matrix

from sklearn.metrics import confusion_matrix
from classification_utilities import display_cm, display_adj_cm

conf = confusion_matrix(y_test, predicted_labels)
display_cm(conf, facies_labels, hide_zeros=True)  # display_cm is provided by classification_utilities

Define accuracy and adjacent accuracy

import numpy as np

def accuracy(conf):
    # Fraction of samples on the diagonal of the confusion matrix,
    # i.e. those whose predicted facies equals the true facies.
    total_correct = 0.
    nb_classes = conf.shape[0]
    for i in np.arange(0, nb_classes):
        total_correct += conf[i][i]
    acc = total_correct / sum(sum(conf))
    return acc
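
The notebook pairs this with an adjacent accuracy that also counts predictions landing on a geologically adjacent facies as correct; a sketch along those lines, assuming adjacent_facies is the zero-indexed adjacency array defined earlier:

def accuracy_adjacent(conf, adjacent_facies):
    # Exact hits on the diagonal plus predictions in an adjacent facies.
    nb_classes = conf.shape[0]
    total_correct = 0.
    for i in np.arange(0, nb_classes):
        total_correct += conf[i][i]
        for j in adjacent_facies[i]:
            total_correct += conf[i][j]
    return total_correct / sum(sum(conf))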

Model parameter selection

An SVM with the RBF kernel has two hyperparameters, C and gamma. These are tuned by cross-validation.

Cross validation

Two nested loops are used to train a classifier for every possible combination of values in the ranges specified.

In this code, C_range and gamma_range are as follows.

C_range = np.array([.01, 1, 5, 10, 20, 50, 100, 1000, 5000, 10000])
gamma_range = np.array([0.0001, 0.001, 0.01, 0.1, 1, 10])
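
How the nested loops might look; a minimal sketch that scores every (C, gamma) pair on a single held-out split (the notebook repeats this over cross-validation splits):

from sklearn import svm
from sklearn.metrics import confusion_matrix

# Train a classifier for every combination and keep the best-scoring pair.
best_acc, best_params = 0.0, None
for C in C_range:
    for gamma in gamma_range:
        clf = svm.SVC(C=C, gamma=gamma)
        clf.fit(X_train, y_train)
        acc = accuracy(confusion_matrix(y_test, clf.predict(X_test)))
        if acc > best_acc:
            best_acc, best_params = acc, (C, gamma)

print('best (C, gamma):', best_params, 'accuracy:', best_acc)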

The proper parameters are those giving the highest accuracy; in this code, the best values are gamma = 1 and C = 10.

Precision, Recall and F1 score

Precision is the probability that, given a classification result for a sample, the sample actually belongs to that class.
Recall is the probability that a sample of a given class will be correctly classified.

F1 score combines both to give a single measure of relevancy of the classifier results.
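
In standard terms, for a given class with true positives TP, false positives FP, and false negatives FN:

precision = TP / (TP + FP)
recall    = TP / (TP + FN)
F1        = 2 * precision * recall / (precision + recall)

The F1 score is thus the harmonic mean of precision and recall.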

Precision and recall are printed by the display functions below when display_metrics=True is set.

display_cm(cv_conf, facies_labels, display_metrics=True, hide_zeros=True)
display_adj_cm(cv_conf, facies_labels, adjacent_facies, display_metrics=True, hide_zeros=True)

Applying the model to the blind data

The trained classifier is first evaluated on blind data, i.e. a well that was held out from training, to estimate how it performs on a well it has never seen.

Applying the model to new data
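
A minimal sketch of the prediction step, assuming the new wells are already loaded into a DataFrame named well_data with the same log columns as the training set, and that the scaler and clf fitted above are reused (the dropped column names are assumptions):

# Drop the non-log columns, scale with the training scaler, and predict.
X_unknown = scaler.transform(well_data.drop(['Formation', 'Well Name', 'Depth'], axis=1))
y_unknown = clf.predict(X_unknown)
well_data['Facies'] = y_unknown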

Save the results

well_data.to_csv('well_data_with_facies.csv')
