[Kaggle] Titanic: Machine Learning from Disaster - (4) Using NN/CNN to improve the performance


Arc Lab. 2017. 11. 28. 15:30

[Updated 2017.11.28 14:49]

To improve on the logistic regression used previously, I tried a simple Neural Network (NN) and a Convolutional Neural Network (CNN) and compared the resulting Kaggle submissions. With hyperparameter tuning the NN reached about 78% accuracy, and using the CNN raised accuracy to about 80%.

<Improved result after using the CNN>

The NN uses 3 layers with 16 activation units per layer, ReLU activations, the Xavier initializer, and dropout (0.5). Simply adding more activation units or layers did not improve performance.

* Note: ReLU and Xavier initialization address the vanishing gradient problem, one of the known issues with neural networks.

Dropout addresses overfitting by randomly cutting some of the connections between layers during training.
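As a rough illustration of the two ideas in the note above, here is a minimal NumPy sketch. It assumes the uniform variant of Xavier initialization (weights drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))) and "inverted" dropout, where surviving activations are rescaled by 1 / keep_prob. The function names `xavier_uniform` and `dropout` are hypothetical helpers for this sketch, not part of the scripts in this post.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)

# One hidden layer sized like the NN in this post: 7 features -> 16 units.
W = xavier_uniform(7, 16, rng)
inputs = rng.normal(size=(4, 7))
hidden = np.maximum(0.0, inputs @ W)            # ReLU
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
```

With keep_prob = 1.0 the mask keeps every unit, which is why the prediction step feeds keep_prob: 1.0.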

Although this is a binary classification problem, I used softmax for the final classification (two classes, with one-hot encoded labels).
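The two-class softmax setup can be sketched in plain NumPy as follows. This is only an illustration with made-up logits and labels, not the TensorFlow graph used in this post; `one_hot`, `softmax`, and `softmax_cross_entropy` are hypothetical helper names.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (n,) into one-hot rows of shape (n, num_classes)."""
    return np.eye(num_classes)[labels]

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy(logits, labels_one_hot):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    probs = softmax(logits)
    return -np.mean(np.sum(labels_one_hot * np.log(probs), axis=1))

# Made-up example: 3 passengers, 2 classes (0 = did not survive, 1 = survived).
labels = np.array([0, 1, 1])
logits = np.array([[2.0, 0.5], [0.1, 1.2], [1.0, 1.0]])

y_one_hot = one_hot(labels, 2)
probs = softmax(logits)
loss = softmax_cross_entropy(logits, y_one_hot)
predicted = probs.argmax(axis=1)
```

With exactly two classes, the softmax probability of class 1 equals a sigmoid applied to the difference of the two logits, so this formulation is equivalent to the usual sigmoid-based binary classifier.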

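For the CNN, the input has to fit a 3x3 grid for conv2d, so I added features by squaring some of the existing ones (Sex and Age in the code below), giving 9 features in total. I used the simple CNN structure shown below. For the explanation, I split it into two blocks, conv1 and conv2 (each including max pooling and so on).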
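The feature expansion and the 3x3 layout can be sketched with NumPy as below. The column order follows the CNN script in this post (Pclass, Name, Sex, Sex*Sex, Age, Age*Age, SibSp, Parch, Embarked); the helper name `to_grid` and the sample rows are assumptions made for illustration.

```python
import numpy as np

def to_grid(rows):
    """Expand (n, 7) rows to 9 features and reshape them to (n, 3, 3, 1).

    Expected column order: Pclass, Name, Sex, Age, SibSp, Parch, Embarked
    (already encoded as numbers).
    """
    pclass, name, sex, age, sibsp, parch, embarked = rows.T
    nine = np.stack(
        [pclass, name, sex, sex * sex, age, age * age, sibsp, parch, embarked],
        axis=1,
    )
    # Each passenger becomes a 3x3 single-channel "image".
    return nine.reshape(-1, 3, 3, 1)

# Two made-up, already encoded passengers.
sample = np.array(
    [[3, 0, 0, 3, 1, 0, 0],
     [1, 1, 1, 4, 1, 0, 1]],
    dtype=np.float32,
)
grid = to_grid(sample)
print(grid.shape)
```

Note that neighbouring cells in this grid have no spatial meaning the way image pixels do; the layout simply follows the feature order.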

conv1: Conv -> ReLU -> max_pool -> dropout

conv2: Conv -> ReLU -> max_pool -> dropout

conv1 -> conv2 -> Fully Connected Layer

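For conv1, the filter size is 2x2 and the stride is 1, with padding set to "SAME" to preserve the input size.

The max_pool uses no padding ("VALID"). The number of output channels after the conv filter and max pooling is 32.

Dropout is set to 0.7 (passed as keep_prob in the code).

conv2 is similar to conv1, except that the max pooling filter size is 1x1 with padding "SAME", and dropout is set to 0.5. (In the code below, the conv2 filter itself is 1x1 and produces 64 output channels.)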

Finally, I build the Fully Connected Layer and perform the final classification with softmax. Since the output of conv2 has shape [?, 2, 2, 64], it is flattened with reshape before the Fully Connected Layer.
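The [?, 2, 2, 64] shape can be checked by hand with TensorFlow's padding rules: "SAME" gives ceil(input / stride), and "VALID" gives ceil((input - kernel + 1) / stride). The sketch below traces the spatial size through the layers using the kernel sizes from the CNN script in this post; `output_size` and `trace_cnn_shapes` are hypothetical helper names.

```python
import math

def output_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool op under TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def trace_cnn_shapes(side=3):
    """Return (stage, (height, width, channels)) pairs for the CNN described above."""
    stages = [("input", (side, side, 1))]
    side = output_size(side, kernel=2, stride=1, padding="SAME")
    stages.append(("conv1", (side, side, 32)))
    side = output_size(side, kernel=2, stride=1, padding="VALID")
    stages.append(("max_pool1", (side, side, 32)))
    side = output_size(side, kernel=1, stride=1, padding="SAME")
    stages.append(("conv2", (side, side, 64)))
    side = output_size(side, kernel=1, stride=1, padding="SAME")
    stages.append(("max_pool2", (side, side, 64)))
    return stages

stages = trace_cnn_shapes()
height, width, channels = stages[-1][1]
flat_size = height * width * channels  # input width of the fully connected layer

for name, shape in stages:
    print(name, shape)
print("flattened size:", flat_size)
```

The only step that shrinks the grid is the first max pooling (2x2 window, stride 1, "VALID"), which takes 3x3 down to 2x2; the flattened size 2 * 2 * 64 matches the shape of W_Out in the script.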

With these simple setups, the performance so far is as follows. In this experiment, the CNN gave the best classification performance.

logistic regression (about 0.76) < NN (about 0.78) < CNN (about 0.80)

Below are the accuracy and cost as viewed in TensorBoard.

The code for the NN and the CNN follows.

 
import asyncml as ml
import tensorflow as tf
import numpy as np
import csv

feature_size = 7
max_epochs = 500
train_batch = 10000
test_batch = 10000
shuffle = False
shuffle_size = 10000
learning_rate =0.01
skip_size = 1
num_of_neuron = 16
train_dropout = 0.5
classes = 2
trials = 39

training_cv_dataset_record_defaults = [
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32)
]

test_dataset_record_defaults = [
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32)
]

def decode_csv_for_training_cv_set(line):
    PassengerId, labels, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked = tf.decode_csv(line,
                                                                                                             training_cv_dataset_record_defaults)

    features = Pclass, Name, Sex, Age, SibSp, Parch, Embarked, labels
    #features = Pclass, Name, Sex, Age, SibSp, Parch, labels
    #features = Pclass, Sex, Age, SibSp, Parch, labels

    #print(features)

    features = tf.reshape(features, [-1])

    return features

def decode_csv_for_test_set(line):
    PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked = tf.decode_csv(line,
                                                                                                             test_dataset_record_defaults)

    features =  Pclass, Name, Sex, Age, SibSp, Parch, Embarked
    #features = Pclass, Name, Sex, Age, SibSp, Parch
    #features = Pclass, Sex, Age, SibSp, Parch

    #print(features)

    features = tf.reshape(features, [-1])

    return features


def line_pre_process_train(line):
    # Name
    Name = line[3]

    if "Mr." in Name:
        line[3] = 0
    elif "Mrs." in Name:
        line[3] = 1
    elif "Miss." in Name:
        line[3] = 2
    elif "Master." in Name:
        line[3] = 3
    elif "Rev." in Name:
        line[3] = 4
    elif "Dr." in Name:
        line[3] = 5
    elif "Mlle." in Name:
        line[3] = 6
    elif "Col." in Name:
        line[3] = 7
    elif "Lady." in Name:
        line[3] = 8
    elif "Don." in Name:
        line[3] = 9
    elif "Mme." in Name:
        line[3] = 10
    elif "Ms." in Name:
        line[3] = 11
    elif "Sir." in Name:
        line[3] = 12
    elif "Capt." in Name:
        line[3] = 13
    elif "the Countess." in Name:
        line[3] = 14
    elif "Jonkheer." in Name:
        line[3] = 15
    elif "Major." in Name:
        line[3] = 16
    else:
        line[3] = 17

    # Sex
    Sex = line[4]

    if Sex == "male":
        line[4] = 0
    elif Sex == "female":
        line[4] = 1

    # Age
    Age = 0

    try:
        Age = float(line[5])
    except ValueError:
        pass

    # Bucket by decade: 0 for age < 1 (a missing age defaults to 0), 1 for [1, 10), ...,
    # 9 for [80, 90), and 10 otherwise. Half-open ranges avoid gaps for fractional ages such as 9.5.
    if Age < 1:
        line[5] = 0
    else:
        line[5] = min(int(Age // 10) + 1, 10)


    # Embarked
    Embarked = line[11]

    if Embarked == "S":
        line[11] = 0
    elif Embarked == "C":
        line[11] = 1
    elif Embarked == "Q":
        line[11] = 2
    else:
        line[11] = 3


    return line

def line_pre_process_test(line):
    # Name
    Name = line[2]

    if "Mr." in Name:
        line[2] = 0
    elif "Mrs." in Name:
        line[2] = 1
    elif "Miss." in Name:
        line[2] = 2
    elif "Master." in Name:
        line[2] = 3
    elif "Rev." in Name:
        line[2] = 4
    elif "Dr." in Name:
        line[2] = 5
    elif "Mlle." in Name:
        line[2] = 6
    elif "Col." in Name:
        line[2] = 7
    elif "Lady." in Name:
        line[2] = 8
    elif "Don." in Name:
        line[2] = 9
    elif "Mme." in Name:
        line[2] = 10
    elif "Ms." in Name:
        line[2] = 11
    elif "Sir." in Name:
        line[2] = 12
    elif "Capt." in Name:
        line[2] = 13
    elif "the Countess." in Name:
        line[2] = 14
    elif "Jonkheer." in Name:
        line[2] = 15
    elif "Major." in Name:
        line[2] = 16
    else:
        line[2] = 17

    # Sex
    Sex = line[3]

    if Sex == "male":
        line[3] = 0
    elif Sex == "female":
        line[3] = 1

    # Age
    Age = 0

    try:
        Age = float(line[4])
    except ValueError:
        pass

    # Bucket by decade: 0 for age < 1 (a missing age defaults to 0), 1 for [1, 10), ...,
    # 9 for [80, 90), and 10 otherwise. Half-open ranges avoid gaps for fractional ages such as 9.5.
    if Age < 1:
        line[4] = 0
    else:
        line[4] = min(int(Age // 10) + 1, 10)

    # Embarked
    Embarked = line[10]

    if Embarked == "S":
        line[10] = 0
    elif Embarked == "C":
        line[10] = 1
    elif Embarked == "Q":
        line[10] = 2
    else:
        line[10] = 3

    return line

ml.csv_pre_process('train.csv', 'train_pre_processed.csv', _func_line=line_pre_process_train)
ml.csv_pre_process('test.csv', 'test_pre_processed.csv', _func_line=line_pre_process_test)

filenames = tf.constant(["train_pre_processed.csv","test_pre_processed.csv"])

training_cv_dataset = tf.contrib.data.TextLineDataset(filenames[0]).skip(skip_size).map(decode_csv_for_training_cv_set).batch(train_batch).repeat(max_epochs)

if shuffle:
    # Dataset transformations return a new dataset, so the result must be reassigned.
    training_cv_dataset = training_cv_dataset.shuffle(shuffle_size)

training_cv_iterator = training_cv_dataset.make_initializable_iterator()
training_batch_features = training_cv_iterator.get_next()

test_dataset = tf.contrib.data.TextLineDataset(filenames[1]).skip(skip_size).map(decode_csv_for_test_set).batch(test_batch)

test_iterator = test_dataset.make_initializable_iterator()
test_batch_features = test_iterator.get_next()

with tf.name_scope("Dropout") as scope:
    keep_prob = tf.placeholder(tf.float32)

# Define a model
with tf.name_scope("Input") as scope:
    X = tf.placeholder(tf.float32, [None, feature_size])
    y = tf.placeholder(tf.int32, [None, 1])
    y_one_hot = tf.one_hot(y, classes)  # one hot
    y_one_hot = tf.reshape(y_one_hot, [-1, classes])

with tf.name_scope("Layer1") as scope:
    W1 = tf.get_variable("W1", shape=[feature_size, num_of_neuron], initializer=tf.contrib.layers.xavier_initializer())
    b1 = tf.Variable(tf.random_normal([num_of_neuron]))
    L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
    L1 = tf.nn.dropout(L1, keep_prob=keep_prob)

with tf.name_scope("Layer2") as scope:
    W2 = tf.get_variable("W2", shape=[num_of_neuron, num_of_neuron], initializer=tf.contrib.layers.xavier_initializer())
    b2 = tf.Variable(tf.random_normal([num_of_neuron]))
    L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)
    L2 = tf.nn.dropout(L2, keep_prob=keep_prob)

with tf.name_scope("Layer3") as scope:
    W3 = tf.get_variable("W3", shape=[num_of_neuron, num_of_neuron], initializer=tf.contrib.layers.xavier_initializer())
    b3 = tf.Variable(tf.random_normal([num_of_neuron]))
    L3 = tf.nn.relu(tf.matmul(L2, W3) + b3)
    L3 = tf.nn.dropout(L3, keep_prob=keep_prob)
'''
with tf.name_scope("Layer4") as scope:
    W4 = tf.get_variable("W4", shape=[num_of_neuron, num_of_neuron], initializer=tf.contrib.layers.xavier_initializer())
    b4 = tf.Variable(tf.random_normal([num_of_neuron]))
    L4 = tf.nn.relu(tf.matmul(L3, W4) + b4)
    L4 = tf.nn.dropout(L4, keep_prob=keep_prob)

with tf.name_scope("Layer5") as scope:
    W5 = tf.get_variable("W5", shape=[num_of_neuron, num_of_neuron], initializer=tf.contrib.layers.xavier_initializer())
    b5 = tf.Variable(tf.random_normal([num_of_neuron]))
    L5 = tf.nn.relu(tf.matmul(L4, W5) + b5)
    L5 = tf.nn.dropout(L5, keep_prob=keep_prob)

with tf.name_scope("Layer6") as scope:
    W6 = tf.get_variable("W6", shape=[num_of_neuron, num_of_neuron], initializer=tf.contrib.layers.xavier_initializer())
    b6 = tf.Variable(tf.random_normal([num_of_neuron]))
    L6 = tf.nn.relu(tf.matmul(L5, W6) + b6)
    L6 = tf.nn.dropout(L6, keep_prob=keep_prob)

with tf.name_scope("Layer7") as scope:
    W7 = tf.get_variable("W7", shape=[num_of_neuron, num_of_neuron],
                         initializer=tf.contrib.layers.xavier_initializer())
    b7 = tf.Variable(tf.random_normal([num_of_neuron]))
    L7 = tf.nn.relu(tf.matmul(L6, W7) + b7)
    L7 = tf.nn.dropout(L7, keep_prob=keep_prob)


with tf.name_scope("Layer8") as scope:
    W8 = tf.get_variable("W8", shape=[num_of_neuron, num_of_neuron],
                         initializer=tf.contrib.layers.xavier_initializer())
    b8 = tf.Variable(tf.random_normal([num_of_neuron]))
    L8 = tf.nn.relu(tf.matmul(L7, W8) + b8)
    L8 = tf.nn.dropout(L8, keep_prob=keep_prob)

with tf.name_scope("Layer9") as scope:
    W9 = tf.get_variable("W9", shape=[num_of_neuron, num_of_neuron],
                         initializer=tf.contrib.layers.xavier_initializer())
    b9 = tf.Variable(tf.random_normal([num_of_neuron]))
    L9 = tf.nn.relu(tf.matmul(L8, W9) + b9)
    L9 = tf.nn.dropout(L9, keep_prob=keep_prob)
'''
with tf.name_scope("Output_Layer") as scope:
    W_Out = tf.get_variable("W_Out", shape=[num_of_neuron, classes], initializer=tf.contrib.layers.xavier_initializer())
    b_Out = tf.Variable(tf.random_normal([classes]))
    hypothesis = tf.matmul(L3, W_Out) + b_Out

with tf.name_scope("Cost") as scope:
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=y_one_hot))

with tf.name_scope("Optimizer") as scope:
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(y_one_hot, 1))
prediction = tf.argmax(hypothesis, 1)

with tf.name_scope("Accuracy") as scope:
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:

    #tf.summary.scalar("Epoch", max_epochs)
    #tf.summary.scalar("Train Batch", train_batch)
    #tf.summary.scalar("Learning Rate", learning_rate)
    #tf.summary.scalar("Number of Neurons", num_of_neuron)

    tb_cost = tf.summary.scalar("Cost", cost)
    tb_acc = tf.summary.scalar("Accuracy", accuracy)

    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("./log/kaggle_log_{}".format(trials))
    writer.add_graph(sess.graph)

    sess.run(tf.global_variables_initializer())

    iteration = 0

    '''
    TRAINING SESSION
    '''
    sess.run(training_cv_iterator.initializer)

    while True:
        try:
            train_data = sess.run(training_batch_features)

            X_data = train_data[:, 0:-1]
            y_data = train_data[:, [-1]]

            #X_data, mu, sigma = ml.feature_scaling(X_data)

            tb_summary, _cost, _accuracy, _= sess.run([merged_summary, cost, accuracy, optimizer], feed_dict={X: X_data, y: y_data, keep_prob:train_dropout})

            iteration += 1
            writer.add_summary(tb_summary, global_step=iteration)
            print("{} Training >> Loss: {:.3f}\tAccuracy: {:.2%}".format(iteration, _cost, _accuracy))

        except  tf.errors.OutOfRangeError:
            print("Training Session is done!\n")
            break

    '''
       TEST SESSION
    '''
    sess.run(test_iterator.initializer)

    while True:
        try:
            test_features = sess.run(test_batch_features)
            X_data = test_features
            # X_data, mu, sigma = ml.feature_scaling(test_features)

            # keep_prob is 1.0 so that dropout is disabled for prediction
            _predicted = sess.run(prediction, feed_dict={X: X_data, keep_prob: 1.0})

            num_of_rows = np.size(_predicted, 0)

            print("Test Session is done!\n")

            # Assumes the whole test set fits in one batch (test_batch >= number of rows),
            # because the file is rewritten and PassengerId restarts at 892 for every batch.
            with open('my_submission.csv', 'w', encoding='utf-8', newline='') as f:
                wr = csv.writer(f)
                wr.writerow(["PassengerId", "Survived"])

                for i in range(num_of_rows):
                    wr.writerow([i + 892, _predicted[i]])

        except tf.errors.OutOfRangeError:
            # Raised once the test dataset is exhausted
            print("Kaggle submission data generation is done!\n")
            break

    writer.close()

 
import asyncml as ml
import tensorflow as tf
import numpy as np
import csv

feature_size = 9
max_epochs = 500
train_batch = 10000
test_batch = 10000
shuffle = False
shuffle_size = 10000
learning_rate =0.01
skip_size = 1
num_of_neuron = 8
train_dropout = 0.7
classes = 2
trials = 40

training_cv_dataset_record_defaults = [
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32)
]

test_dataset_record_defaults = [
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32),
tf.constant([""], dtype=tf.string),
tf.constant([0], dtype=tf.float32)
]

def decode_csv_for_training_cv_set(line):
    PassengerId, labels, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked = tf.decode_csv(line,
                                                                                                             training_cv_dataset_record_defaults)

    features = Pclass, Name, Sex, Sex*Sex, Age, Age*Age, SibSp, Parch, Embarked, labels
    #features = Pclass, Name, Sex, Age, SibSp, Parch, labels
    #features = Pclass, Sex, Age, SibSp, Parch, labels
    #features = Pclass, Sex, Age, SibSp, labels

    #print(features)

    features = tf.reshape(features, [-1])

    return features

def decode_csv_for_test_set(line):
    PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked = tf.decode_csv(line,
                                                                                                             test_dataset_record_defaults)

    features =  Pclass, Name, Sex, Sex*Sex, Age, Age*Age, SibSp, Parch, Embarked
    #features = Pclass, Name, Sex, Age, SibSp, Parch
    #features = Pclass, Sex, Age, SibSp, Parch
    #features = Pclass, Sex, Age, SibSp

    #print(features)

    features = tf.reshape(features, [-1])

    return features


def line_pre_process_train(line):
    # Name
    Name = line[3]

    if "Mr." in Name:
        line[3] = 0
    elif "Mrs." in Name:
        line[3] = 1
    elif "Miss." in Name:
        line[3] = 2
    elif "Master." in Name:
        line[3] = 3
    elif "Rev." in Name:
        line[3] = 4
    elif "Dr." in Name:
        line[3] = 5
    elif "Mlle." in Name:
        line[3] = 6
    elif "Col." in Name:
        line[3] = 7
    elif "Lady." in Name:
        line[3] = 8
    elif "Don." in Name:
        line[3] = 9
    elif "Mme." in Name:
        line[3] = 10
    elif "Ms." in Name:
        line[3] = 11
    elif "Sir." in Name:
        line[3] = 12
    elif "Capt." in Name:
        line[3] = 13
    elif "the Countess." in Name:
        line[3] = 14
    elif "Jonkheer." in Name:
        line[3] = 15
    elif "Major." in Name:
        line[3] = 16
    else:
        line[3] = 17

    # Sex
    Sex = line[4]

    if Sex == "male":
        line[4] = 0
    elif Sex == "female":
        line[4] = 1

    # Age
    Age = 0

    try:
        Age = float(line[5])
    except ValueError:
        pass

    # Bucket by decade: 0 for age < 1 (a missing age defaults to 0), 1 for [1, 10), ...,
    # 9 for [80, 90), and 10 otherwise. Half-open ranges avoid gaps for fractional ages such as 9.5.
    if Age < 1:
        line[5] = 0
    else:
        line[5] = min(int(Age // 10) + 1, 10)


    # Embarked
    Embarked = line[11]

    if Embarked == "S":
        line[11] = 0
    elif Embarked == "C":
        line[11] = 1
    elif Embarked == "Q":
        line[11] = 2
    else:
        line[11] = 3


    return line

def line_pre_process_test(line):
    # Name
    Name = line[2]

    if "Mr." in Name:
        line[2] = 0
    elif "Mrs." in Name:
        line[2] = 1
    elif "Miss." in Name:
        line[2] = 2
    elif "Master." in Name:
        line[2] = 3
    elif "Rev." in Name:
        line[2] = 4
    elif "Dr." in Name:
        line[2] = 5
    elif "Mlle." in Name:
        line[2] = 6
    elif "Col." in Name:
        line[2] = 7
    elif "Lady." in Name:
        line[2] = 8
    elif "Don." in Name:
        line[2] = 9
    elif "Mme." in Name:
        line[2] = 10
    elif "Ms." in Name:
        line[2] = 11
    elif "Sir." in Name:
        line[2] = 12
    elif "Capt." in Name:
        line[2] = 13
    elif "the Countess." in Name:
        line[2] = 14
    elif "Jonkheer." in Name:
        line[2] = 15
    elif "Major." in Name:
        line[2] = 16
    else:
        line[2] = 17

    # Sex
    Sex = line[3]

    if Sex == "male":
        line[3] = 0
    elif Sex == "female":
        line[3] = 1

    # Age
    Age = 0

    try:
        Age = float(line[4])
    except ValueError:
        pass

    # Bucket by decade: 0 for age < 1 (a missing age defaults to 0), 1 for [1, 10), ...,
    # 9 for [80, 90), and 10 otherwise. Half-open ranges avoid gaps for fractional ages such as 9.5.
    if Age < 1:
        line[4] = 0
    else:
        line[4] = min(int(Age // 10) + 1, 10)

    # Embarked
    Embarked = line[10]

    if Embarked == "S":
        line[10] = 0
    elif Embarked == "C":
        line[10] = 1
    elif Embarked == "Q":
        line[10] = 2
    else:
        line[10] = 3

    return line

ml.csv_pre_process('train.csv', 'train_pre_processed.csv', _func_line=line_pre_process_train)
ml.csv_pre_process('test.csv', 'test_pre_processed.csv', _func_line=line_pre_process_test)

filenames = tf.constant(["train_pre_processed.csv","test_pre_processed.csv"])

training_cv_dataset = tf.contrib.data.TextLineDataset(filenames[0]).skip(skip_size).map(decode_csv_for_training_cv_set).batch(train_batch).repeat(max_epochs)

if shuffle:
    # Dataset transformations return a new dataset, so the result must be reassigned.
    training_cv_dataset = training_cv_dataset.shuffle(shuffle_size)

training_cv_iterator = training_cv_dataset.make_initializable_iterator()
training_batch_features = training_cv_iterator.get_next()

test_dataset = tf.contrib.data.TextLineDataset(filenames[1]).skip(skip_size).map(decode_csv_for_test_set).batch(test_batch)

test_iterator = test_dataset.make_initializable_iterator()
test_batch_features = test_iterator.get_next()

with tf.name_scope("Dropout") as scope:
    keep_prob = tf.placeholder(tf.float32)

# Define a model
with tf.name_scope("Input") as scope:
    X = tf.placeholder(tf.float32, [None, feature_size])
    X_2d = tf.reshape(X, [-1, 3, 3, 1])
    y = tf.placeholder(tf.int32, [None, 1])
    y_one_hot = tf.one_hot(y, classes)  # one hot
    y_one_hot = tf.reshape(y_one_hot, [-1, classes])

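with tf.name_scope("Conv1") as scope:
    # 2x2 filter, 1 input channel -> 32 output channels
    W1 = tf.get_variable("W1", shape=[2,2,1,32], initializer=tf.contrib.layers.xavier_initializer())
    L1 = tf.nn.conv2d(X_2d, W1, strides=[1,1,1,1], padding="SAME")  # [?, 3, 3, 32]
    L1 = tf.nn.relu(L1)
    L1 = tf.nn.max_pool(L1, ksize=[1,2,2,1], strides=[1,1,1,1], padding="VALID")  # [?, 2, 2, 32]
    # NOTE: keep_prob is hard-coded here, so the keep_prob placeholder fed through feed_dict
    # has no effect and this dropout stays active when predicting on the test set as well.
    L1 = tf.nn.dropout(L1, keep_prob=0.7)

with tf.name_scope("Conv2") as scope:
    # 1x1 filter, 32 input channels -> 64 output channels
    W2 = tf.get_variable("W2", shape=[1,1,32,64], initializer=tf.contrib.layers.xavier_initializer())
    L2 = tf.nn.conv2d(L1, W2, strides=[1,1,1,1], padding="SAME")  # [?, 2, 2, 64]
    L2 = tf.nn.relu(L2)
    # 1x1 pooling with stride 1 leaves the tensor unchanged
    L2 = tf.nn.max_pool(L2, ksize=[1,1,1,1], strides=[1,1,1,1], padding="SAME")
    L2 = tf.nn.dropout(L2, keep_prob=0.5)  # hard-coded as well (see the note in Conv1)
    L2 = tf.reshape(L2, [-1, 2 * 2 * 64])  # flatten for the fully connected layer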

with tf.name_scope("FC") as scope:
    W_Out = tf.get_variable("W_Out", shape=[2 * 2 * 64, classes], initializer=tf.contrib.layers.xavier_initializer())
    b_Out = tf.Variable(tf.random_normal([classes]))
    hypothesis = tf.matmul(L2, W_Out) + b_Out

with tf.name_scope("Cost") as scope:
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=y_one_hot))

with tf.name_scope("Optimizer") as scope:
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(y_one_hot, 1))
prediction = tf.argmax(hypothesis, 1)

with tf.name_scope("Accuracy") as scope:
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:

    #tf.summary.scalar("Epoch", max_epochs)
    #tf.summary.scalar("Train Batch", train_batch)
    #tf.summary.scalar("Learning Rate", learning_rate)
    #tf.summary.scalar("Number of Neurons", num_of_neuron)

    tb_cost = tf.summary.scalar("Cost", cost)
    tb_acc = tf.summary.scalar("Accuracy", accuracy)

    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("./log/kaggle_log_{}".format(trials))
    writer.add_graph(sess.graph)

    sess.run(tf.global_variables_initializer())

    iteration = 0

    '''
    TRAINING SESSION
    '''
    sess.run(training_cv_iterator.initializer)

    while True:
        try:
            train_data = sess.run(training_batch_features)

            X_data = train_data[:, 0:-1]
            y_data = train_data[:, [-1]]

            #X_data, mu, sigma = ml.feature_scaling(X_data)

            tb_summary, _cost, _accuracy, _= sess.run([merged_summary, cost, accuracy, optimizer], feed_dict={X: X_data, y: y_data, keep_prob:train_dropout})

            iteration += 1
            writer.add_summary(tb_summary, global_step=iteration)
            print("{} Training >> Loss: {:.3f}\tAccuracy: {:.2%}".format(iteration, _cost, _accuracy))

        except  tf.errors.OutOfRangeError:
            print("Training Session is done!\n")
            break

    '''
       TEST SESSION
    '''
    sess.run(test_iterator.initializer)

    while True:
        try:
            test_features = sess.run(test_batch_features)
            X_data = test_features
            # X_data, mu, sigma = ml.feature_scaling(test_features)

            _predicted = sess.run(prediction, feed_dict={X: X_data, keep_prob: 1.0})

            num_of_rows = np.size(_predicted, 0)

            print("Test Session is done!\n")

            # Assumes the whole test set fits in one batch (test_batch >= number of rows),
            # because the file is rewritten and PassengerId restarts at 892 for every batch.
            with open('my_submission.csv', 'w', encoding='utf-8', newline='') as f:
                wr = csv.writer(f)
                wr.writerow(["PassengerId", "Survived"])

                for i in range(num_of_rows):
                    wr.writerow([i + 892, _predicted[i]])

        except tf.errors.OutOfRangeError:
            # Raised once the test dataset is exhausted
            print("Kaggle submission data generation is done!\n")
            break

    writer.close()

asyncml.zip

* Reference: https://hunkim.github.io/ml/

* Reference: https://www.kaggle.com/c/titanic/

Attribution, NonCommercial, NoDerivatives
