Name: Emad Takla

DAND P2

Dataset chosen: Titanic

Setup

Headers to be used throughout the project:

In [2]:
import unicodecsv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats
from IPython.display import HTML
from IPython.display import display_html

Special Commands

In [3]:
#Plot all figures within the html page
%pylab inline

#Make display better, like printing dataframes as tables
pd.set_option('display.notebook_repr_html', True)
Populating the interactive namespace from numpy and matplotlib

Importing Data

In [4]:
titanic_all_data = pd.read_csv('./titanic_data.csv')

Supporting Functions:

A function to map age into generation buckets: Child (0 to 16 years old), Adult (17 to 49 years old) and elderly (50 years old and over)

In [5]:
def age_to_generation(age):
    #Check if the input is negative, return an error
    if age < 0:
        return 'Invalid age'
    #If younger than 17, then it is classified as a child
    if age < 17:
        return 'Child'
    #else if it is between 18 and 49 (both inclusive), the passed age is for an adult
    if age < 50:
        return 'Adult'
    #Else if, the person is an elderly
    if age >= 50:
        return 'Elderly'
    #Else, the input age was unspecified (Blank cell, NaN)
    return 'Unspecified age'

A function to strip the last name from the column 'Name' The assumption is that the format is as follows: 'last_name, title. first_name (Alias\Maiden Name)'

In [6]:
def get_last_name(full_name):
    return full_name.split(',')[0]

A function to plot a population pyramid, a horizontal histogram bar chart plotted back to back

In [7]:
# Plotting code coming from http://stackoverflow.com/questions/27694221/using-python-libraries-to-plot-two-horizontal-bar-charts-sharing-same-y-axis
#Binning Code coming from http://stackoverflow.com/questions/21441259/pandas-groupby-range-of-values

def plot_population_pyramid(bins, left_title, left_data, right_title, right_data, largest_x_value):
    fig, axes = plt.subplots(ncols=2, sharey=True)
    
    #largest_x_value:
    #this variable will be used to preserve scale. If not, the two graphs will extend to the maximum value of the dataset,
    #and will be visually misleading
    #largest_x_value = max(left_data.max(), right_data.max())
    
    axes[0].barh(bins, left_data, align='center')
    axes[0].set(title=left_title)
    axes[0].set_xlim(0, largest_x_value)
    
    axes[1].barh(bins, right_data, align='center')
    axes[1].set(title=right_title)
    axes[1].set_xlim(0, largest_x_value)
    
    axes[0].invert_xaxis()
    axes[0].set(yticks=bins)
    axes[0].yaxis.tick_right()

Two functions to count the number of survivors\victims within a passed GroupBy object. The Function Accepts A GroupBy Object, Not a DataFrame!

The need for the function came when groupby can get an input that had by chance all victims\survivors, and there was no other group for the missing opposite value. This created index errors when I programmatically looped over the groups, and I had to check if the group was present in the data structure before using it

In [8]:
def get_surviving_count(group_obj):
    #The values in Survived are either 1 or 0, with 1 indicating survivors
    if 1 in group_obj.groups.keys():
        return group_obj.get_group(1)['PassengerId'].count()
    else:
        return 0
In [9]:
def get_victim_count(group_obj):
    #The values in Survived are either 1 or 0, with 0 indicating victims
    if 0 in group_obj.groups.keys():
        return group_obj.get_group(0)['PassengerId'].count()
    else:
        return 0

Graphing Helper Functions and Variables.

In [10]:
gender_colors = ['hotpink','dodgerblue']
survival_colors = ['red', 'limegreen']

The following function accepts a groupby object, that must group its original input only by Survived (ie only at maximum two groups, group 0 (victim) and group 1 (survived)

In [11]:
def plot_survival_pie_chart(group_obj, label):
    graph_label = []
    survival_colors = []
    
    if 0 in group_obj.groups.keys():
        graph_label.append('Victim')
        survival_colors.append('red')
    if 1 in group_obj.groups.keys():
        graph_label.append('Survived')
        survival_colors.append('limegreen')
        
    group_obj['PassengerId'].count().plot.pie(label = label, autopct='%1.1f%%', colors=survival_colors,labels=graph_label)

A function to get the count of passengers in a passed dataframe

In [12]:
def get_count(df):
    return df['PassengerId'].count()

A function to make subplots. It will be overloaded to have a version where we can pass the labels parameter

In [13]:
def draw_pie_subplot(groupby_data, subplot_position, graph_label,  graph_type):
    graph_colors = []
    if(graph_type == 'SURVIVAL_GRAPH'):
        graph_colors = survival_colors
    elif(graph_type == 'GENDER_GRAPH'):
        graph_colors = gender_colors

    fig.add_subplot(subplot_position)
    
    #If it's a survival graph, use the already created function to plot it
    if (graph_type == 'SURVIVAL_GRAPH'):
        plot_survival_pie_chart(groupby_data ,graph_label )
        
    #Default colors. Do not pass the colors parameter to the plotting function
    elif graph_type == 'DEFAULT':
        get_count(groupby_data).plot.pie(label = graph_label, autopct='%1.1f%%')
        
    else:
        get_count(groupby_data).plot.pie(label = graph_label, \
                                     autopct='%1.1f%%',\
                                     colors = graph_colors)
    plt.axis('equal')
    

Questions

Q1: What is the effect of Traveling With First Degree Relatives Over the Survival of a Passenger ?

http://www.durhamcollege.ca/wp-content/uploads/STAT_nullalternate_hypothesis.pdf

  • The question that will be investigated in this analysis is: if a passenger was traveling without a family (ie both SibSp and Parch were equal to zero), did he have a higher\lower chance of survival ? Was having a family on board an advantage, disadvantage or irrelevant for the survival of a passenger ?

Null Hypothesis: There is NO difference in the survival rate of passengers traveling with their immediate family and that of passengers traveling alone.

Alternate Hypothesis: There IS a difference in the survival rate of passengers traveling with their immediate family and that of passengers traveling alone.

**α: 0.05**

Q2: Is It Possible From The Provided Data To Identify Same Family Members?

This is more of a data investigation question, not a statistical inference one. The data provides the last names, ages, traveling class and companionship (Spouses, Sibling - Parents, Children). Are these enough to make (partial) educated guesses about the family members? And if there is a success to do that, can we infer if having a bigger family improved the chances of survival ? (Possibly, since the rest of the family would pressure the crew to allow their left-behind family member on board of the life boats)

Data Wrangling

New Fields to the Data

An interesting feature to show, is the generation to which the passenger belongs. There shall be three categories for that parameter (Child, Adult and Elderly) based on their age range. The breakdown is as follows

  • Child: 16 years old >= Age >= 0 years old
  • Adult: 49 years old >= Age >= 17 years old
  • Elderly: Age >= 50 years old

Any other values like NaN, negative numbers, blank fields, non-numerical values..etc will be noted as "Unspecified age"

The 'isSolo' field will be used to see if a passenger is traveling with his\her family or not. Here are the criterias used:

  • A solo traveler is a traveler whose SibSp\Parch fields are equal to zero.
  • A child cannot be a solo traveler, even if the SibSp\Parch fields are equal to zero.
  • Just a word of caution when interpreting the isSolo field: isSolo does not mean that the traveler was traveling totally alone, they can be traveling with friends for example.

The total number of companions is the sum of the fields Parch and SibSp. It will be used for creating some descriptive statitistics about companionship

The 'LastName' field simply extracts the last name of the passenger from the full name provided. This will be useful in answering the second question.

In [14]:
#Adding the generation data field to the table. The generation can have three values: Child, Adult or Elderly
#The buckets ranges were arbitrarily chosen 
titanic_all_data['Generation']  = titanic_all_data.loc[:,'Age'].apply(age_to_generation)

"""
If the traveler is a child (Under 17 years old), then automatically they are not a solo traveler (Can be traveling with a 
nanny or a close family-friend..etc). Or, if the traveler has a non-zero value in any of SibSp and Parch, then they are not 
solo Otherwise, they are a solo passenger, traveling alone
"""
#This one was tough, using the 'and' operator raised a ValueError, and it took me a while to find out that
#I should substitute it with bitwise &
titanic_all_data['isSolo'] = (titanic_all_data['SibSp'] + titanic_all_data['Parch'] == 0) & \
                             (titanic_all_data['Generation'] != 'Child')

titanic_all_data['TotalCompanions'] = titanic_all_data['Parch'] + titanic_all_data['SibSp']


#Stripping the last name of the passengers. This will be used to determine which passengers probably belong to the same family
titanic_all_data['LastName'] = titanic_all_data.loc[:,'Name'].apply(get_last_name)

Data Slicing

Extracting General Data About the Passengers

In [15]:
#Passengers by Gender
male_passengers = titanic_all_data[titanic_all_data['Sex'] == 'male'] 
female_passengers = titanic_all_data[titanic_all_data['Sex'] == 'female'] 

#Passengers by Generation
children_passengers = titanic_all_data[ titanic_all_data['Generation'] == 'Child' ]
adult_passengers = titanic_all_data[ titanic_all_data['Generation'] == 'Adult' ]
elderly_passengers = titanic_all_data[ titanic_all_data['Generation'] == 'Elderly' ]

#Passengers by Survival
surviving_passengers = titanic_all_data[ titanic_all_data['Survived'] == 1 ]
victim_passengers = titanic_all_data[ titanic_all_data['Survived'] == 0 ]

Data Groups:

In my opinion, groups are very straight forward when it comes to plot; but slicing dataframes is better for calculations. Just easier for me.

In [16]:
passengers_by_gender = titanic_all_data.groupby('Sex')
passengers_by_generation = titanic_all_data.groupby('Generation')
passengers_by_class = titanic_all_data.groupby('Pclass')
passengers_by_survival = titanic_all_data.groupby('Survived')
passengers_by_companionship = titanic_all_data.groupby('isSolo')

passengers_by_class_and_gender = titanic_all_data.groupby(['Pclass', 'Sex'])
passengers_by_class_and_generation = titanic_all_data.groupby(['Pclass', 'Generation'])
passengers_by_class_and_survival = titanic_all_data.groupby(['Pclass', 'Survived'])
passengers_by_generation_and_gender = titanic_all_data.groupby(['Generation', 'Sex'])

Slicing The Passengers' Parameters According to their Traveling Class

I have followed the instructions in the previous review by using groupby instead of slicing. However, I still cannot see the advantage in that. In fact, I think that the first way was more readable. For example: first_class_passengers[ first_class_passengers['Survived'] == 0 ] vs passengers_by_class_and_survival.get_group( (1,0) )

In [17]:
#Passengers by class
########################################################################################################################
#First Class Data Splitting:
first_class_passengers = passengers_by_class.get_group(1)

#Gender                  
first_class_male_passengers = passengers_by_class_and_gender.get_group( (1,'male') )
first_class_female_passengers = passengers_by_class_and_gender.get_group( (1,'female') ) 
#Children
first_class_children_passengers = passengers_by_class_and_generation.get_group( (1, 'Child') )
first_class_children_male_passengers = first_class_children_passengers.groupby('Sex').get_group('male')
first_class_children_female_passengers = first_class_children_passengers.groupby('Sex').get_group('female')
#Adults
first_class_adult_passengers = passengers_by_class_and_generation.get_group( (1, 'Adult') )
first_class_adult_male_passengers = first_class_adult_passengers.groupby('Sex').get_group('male')
first_class_adult_female_passengers = first_class_adult_passengers.groupby('Sex').get_group('female')
#Elderly
first_class_elderly_passengers = passengers_by_class_and_generation.get_group( (1, 'Elderly') )
first_class_elderly_male_passengers = first_class_elderly_passengers.groupby('Sex').get_group('male')
first_class_elderly_female_passengers = first_class_elderly_passengers.groupby('Sex').get_group('female')

#First Class Survival
first_class_survivors = passengers_by_class_and_survival.get_group( (1,1) )
first_class_victims = passengers_by_class_and_survival.get_group( (1,0) )

########################################################################################################################
#Second Class Data Splitting:
second_class_passengers = passengers_by_class.get_group(2)

#Gender                  
second_class_male_passengers = passengers_by_class_and_gender.get_group( (2,'male') )
second_class_female_passengers = passengers_by_class_and_gender.get_group( (2,'female') ) 
#Children
second_class_children_passengers = passengers_by_class_and_generation.get_group( (2, 'Child') )
second_class_children_male_passengers = second_class_children_passengers.groupby('Sex').get_group('male')
second_class_children_female_passengers = second_class_children_passengers.groupby('Sex').get_group('female')
#Adults
second_class_adult_passengers = passengers_by_class_and_generation.get_group( (2, 'Adult') )
second_class_adult_male_passengers = second_class_adult_passengers.groupby('Sex').get_group('male')
second_class_adult_female_passengers = second_class_adult_passengers.groupby('Sex').get_group('female')
#Elderly
second_class_elderly_passengers = passengers_by_class_and_generation.get_group( (2, 'Elderly') )
second_class_elderly_male_passengers = second_class_elderly_passengers.groupby('Sex').get_group('male')
second_class_elderly_female_passengers = second_class_elderly_passengers.groupby('Sex').get_group('female')

#second Class Survival
second_class_survivors = passengers_by_class_and_survival.get_group( (2,1) )
second_class_victims = passengers_by_class_and_survival.get_group( (2,0) )

########################################################################################################################
#Third Class Data Splitting:
third_class_passengers = passengers_by_class.get_group(3)

#Gender                  
third_class_male_passengers = passengers_by_class_and_gender.get_group( (3,'male') )
third_class_female_passengers = passengers_by_class_and_gender.get_group( (3,'female') ) 
#Children
third_class_children_passengers = passengers_by_class_and_generation.get_group( (3, 'Child') )
third_class_children_male_passengers = third_class_children_passengers.groupby('Sex').get_group('male')
third_class_children_female_passengers = third_class_children_passengers.groupby('Sex').get_group('female')
#Adults
third_class_adult_passengers = passengers_by_class_and_generation.get_group( (3, 'Adult') )
third_class_adult_male_passengers = third_class_adult_passengers.groupby('Sex').get_group('male')
third_class_adult_female_passengers = third_class_adult_passengers.groupby('Sex').get_group('female')
#Elderly
third_class_elderly_passengers = passengers_by_class_and_generation.get_group( (1, 'Elderly') )
third_class_elderly_male_passengers = third_class_elderly_passengers.groupby('Sex').get_group('male')
third_class_elderly_female_passengers = third_class_elderly_passengers.groupby('Sex').get_group('female')

#third Class Survival
third_class_survivors = passengers_by_class_and_survival.get_group( (3,1) )
third_class_victims = passengers_by_class_and_survival.get_group( (3,0) )

Data Exploration and Analysis

Miscellaneous Data Count Computation

In [18]:
sample_size = get_count(titanic_all_data)
male_count = get_count(passengers_by_gender.get_group('male'))
female_count = get_count(passengers_by_gender.get_group('female'))

children_count = get_count(children_passengers)
children_male_count = get_count(passengers_by_generation_and_gender.get_group(('Child', 'male')) )
children_female_count = get_count(passengers_by_generation_and_gender.get_group(('Child', 'female')) )

adult_count = get_count(adult_passengers)
adult_male_count = get_count( passengers_by_generation_and_gender.get_group(('Adult', 'male')) )
adult_female_count = get_count( passengers_by_generation_and_gender.get_group(('Child', 'female')) )

elderly_count = get_count(elderly_passengers)
elderly_male_count = get_count( passengers_by_generation_and_gender.get_group(('Elderly', 'male')) )
elderly_female_count = get_count( passengers_by_generation_and_gender.get_group(('Elderly', 'female')) )

first_class_count = get_count(first_class_passengers)
first_class_male_count = get_count(first_class_male_passengers)
first_class_female_count = get_count(first_class_female_passengers)
first_class_children_count = get_count(first_class_children_passengers)
first_class_children_male_count = get_count(first_class_children_male_passengers)
first_class_children_female_count = get_count(first_class_children_female_passengers)
first_class_adult_count = get_count(first_class_adult_passengers)
first_class_adult_male_count = get_count(first_class_adult_male_passengers)
first_class_adult_female_count = get_count(first_class_adult_female_passengers)
first_class_elderly_count = get_count(first_class_elderly_passengers)
first_class_elderly_male_count = get_count(first_class_elderly_male_passengers)
first_class_elderly_female_count = get_count(first_class_elderly_female_passengers)

second_class_count = get_count(second_class_passengers)
second_class_male_count = get_count(second_class_male_passengers)
second_class_female_count = get_count(second_class_female_passengers)
second_class_children_count = get_count(second_class_children_passengers)
second_class_children_male_count = get_count(second_class_children_male_passengers)
second_class_children_female_count = get_count(second_class_children_female_passengers)
second_class_adult_count = get_count(second_class_adult_passengers)
second_class_adult_male_count = get_count(second_class_adult_male_passengers)
second_class_adult_female_count = get_count(second_class_adult_female_passengers)
second_class_elderly_count = get_count(second_class_elderly_passengers)
second_class_elderly_male_count = get_count(second_class_elderly_male_passengers)
second_class_elderly_female_count = get_count(second_class_elderly_female_passengers)

third_class_count = get_count(third_class_passengers)
third_class_male_count = get_count(third_class_male_passengers)
third_class_female_count = get_count(third_class_female_passengers)
third_class_children_count = get_count(third_class_children_passengers)
third_class_children_male_count = get_count(third_class_children_male_passengers)
third_class_children_female_count = get_count(third_class_children_female_passengers)
third_class_adult_count = get_count(third_class_adult_passengers)
third_class_adult_male_count = get_count(third_class_adult_male_passengers)
third_class_adult_female_count = get_count(third_class_adult_female_passengers)
third_class_elderly_count = get_count(third_class_elderly_passengers)
third_class_elderly_male_count = get_count(third_class_elderly_male_passengers)
third_class_elderly_female_count = get_count(third_class_elderly_female_passengers)

Finding Out Missing Fields

In [19]:
#The PassengerId is present for all the passengers, but there are some columns that has some missing values. Which columns 
#are they?
for column_id in titanic_all_data.columns.values:
    column_non_NaN_count = titanic_all_data[column_id].count()
    if column_non_NaN_count != sample_size:
        print str(column_id) + ": missing " + str(sample_size - column_non_NaN_count) + " values"
Age: missing 177 values
Cabin: missing 687 values
Embarked: missing 2 values

I am going to leave the blank cells empty, until I need to fill them with a value - if needed. For the time being, I am not sure for example if I should fill the missing ages with the mean age or just treat them as zeros. Let us wait and see what need would arise.

A word of caution about the ages in this data set

Not all passengers have known ages. An assumption: that the passengers whose age is not know are equally distributed along all age range, so the effect of missing this piece of information is minimal. For curiosity, below is provided the percentages of survivors and victims whose ages are unknown:

In [20]:
#Get the count of both survivors and victims whose age is null
survivors_without_age_count = surviving_passengers['Age'].isnull().sum()
victims_without_age_count = victim_passengers['Age'].isnull().sum()

#Count the total of survivors and victims, so we can calculate the percentage of passengers with missing ages
total_survivors_count = get_count(surviving_passengers)
total_victims_count = get_count(victim_passengers)

print "Percentage of survivors with missing age: ", float(survivors_without_age_count)*100/total_survivors_count,"%"
print "Percentage of victims with missing age:   ", float(victims_without_age_count)*100/total_victims_count,"%"
Percentage of survivors with missing age:  15.2046783626 %
Percentage of victims with missing age:    22.7686703097 %

Demographics

Some default columns, will be used as a dataframe column. These dataframes will be used to display the data in an HTML table format. Although there are alternatives to this, using the dataframe as a way to display tables was the easiest one.

Quick Statistics, All The Sample

Sample's Gender Make Up

In [21]:
#Rows to be displayed
rows = [{male_count, female_count, sample_size}]

#create the data frame, in preparation for HTML table display
df = pd.DataFrame(rows, columns=['Male', 'Female',"Total"], index=["count"])

#Display the dataframe as an HTML table. I had to use this function since just writing "df" on a single line
#will not display the table when there is a plot coming afterwards in the same cell.
display_html(df)

#Plot the pie chart, all passengers by gender
passengers_by_gender.size().plot.pie(label="Gender Ratio of All Passengers", autopct='%1.1f%%', colors=gender_colors)
plt.axis('equal')

#This command removes some floating numbers that appears
plt.tight_layout()
Male Female Total
count 577 314 891

Sample's Generation Make Up:

In [22]:
#I had a problem creating the dataframe in the same manner as the previous one, since the order was not preserved 
#(ie Children count was below column elderly for example)
unspecified_age_count = get_count(passengers_by_generation.get_group("Unspecified age"))

#Create the dataframe in preparation for HTML table display
df = pd.DataFrame({'Children':children_count, 'Adult':adult_count,'Elderly':elderly_count,'Unspecified Age':unspecified_age_count},\
                   index=["count"])

#Display the table
display_html(df)

#Plot the pie chart of all the passengers, grouped by their generation
passengers_by_generation.size().plot.pie(label="Generation Ratio of All Passengers", autopct='%1.1f%%', startangle=90)
plt.axis('equal')

#This command removes some floating numbers that appears
plt.tight_layout()
Adult Children Elderly Unspecified Age
count 540 100 74 177

Population Pyramid

The population pyramid of the passengers. Naturally, this excludes the passengers whose age was missing, so the total count here will not be equal to that of the whole sample.

In [23]:
#Age bins, from zero to 75 years old with 5 years increment
age_bins = range(0,75,5)

#Group both genders according to the age bin
grouped_female_ages = get_count(female_passengers.groupby( pd.cut( female_passengers["Age"], np.arange(0, 80, 5) ) ))
grouped_male_ages =   get_count(male_passengers.groupby( pd.cut( male_passengers["Age"], np.arange(0, 80, 5) ) ))

#Get the highest count. Required so that both plots would keep the same scale when displayed
largest_x_value = max(grouped_female_ages.max(), grouped_male_ages.max())

#A helper function defined in the beginning of the file. Plots the age pyramid
plot_population_pyramid(age_bins, "Male Population", grouped_male_ages, "Female Population", grouped_female_ages, largest_x_value)

A noticeable mode is seen on the 20 years old bucket, in both genders.

Sample's Class Make Up

In [24]:
#Create the dataframe in preparation for HTML table display
df = pd.DataFrame({'First Class':first_class_count,\
                   'Second Class':second_class_count,\
                   'Third Class':third_class_count}, index=["count"])

#Display the table
display_html(df)

#Plot the pie chart of all the passengers, grouped by class
passengers_by_class.size().plot.pie(label="Passengers, by Class", autopct='%1.1f%%', startangle=90)
plt.axis('equal')

#This command removes some floating numbers that appears
plt.tight_layout()
First Class Second Class Third Class
count 216 184 491

Sample's Survival

In [25]:
#Create the dataframe in preparation for HTML table display
df = pd.DataFrame({'Survivors': get_count(surviving_passengers) ,\
                   'Victims':   get_count(victim_passengers) }, index=["count"])

#Display the table
display_html(df)

#Plot the pie chart of all the passengers, grouped by class
#I am not goin to use the custom survival pie chart because I want to make the start angle = 90 degrees
passengers_by_survival.size().plot.pie(label="Passengers, by Survival", \
                                       autopct='%1.1f%%', \
                                       labels=['Victim', 'Survived'], \
                                       startangle=90,\
                                       colors=survival_colors)

plt.axis('equal')

#This command removes some floating numbers that appears
plt.tight_layout()
Survivors Victims
count 342 549

Sample's Companionship (Passengers Traveling Alone vs Passengers Traveling With First Degree Family Member(s) )

In [26]:
#Create the dataframe in preparation for HTML table display
df = pd.DataFrame({'In Group':get_count(surviving_passengers),\
                   'Solo   ': get_count(victim_passengers) }, index=["count"])

#Display the table
display_html(df)

#Plot the pie chart of all the passengers, grouped by class
get_count(titanic_all_data.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'], \
                                                                   label = "Companionship", \
                                                                   autopct='%1.1f%%', \
                                                                   startangle=90)
plt.axis('equal')

#This command removes some floating numbers that appears
plt.tight_layout()
In Group Solo
count 342 549

Exploring The Data From the Generation Point of View

Sample's Generation, Broken Down by Gender

In [27]:
#Compute the unspecified age parameters.
unspecified_age_by_gender = passengers_by_generation.get_group("Unspecified age").groupby('Sex')
unspecified_age_male_count = get_count(unspecified_age_by_gender.get_group('male'))
unspecified_age_female_count = get_count(unspecified_age_by_gender.get_group('female'))

#Create the dataframe of Children, Adult, Elderly and Unspecified age's gender make up.
df = pd.DataFrame({'Children':[children_male_count, children_female_count], \
                   'Adult':[adult_male_count, adult_female_count],\
                   'Elderly':[elderly_male_count, elderly_female_count],\
                   'Unspecified Age':[unspecified_age_male_count, unspecified_age_female_count]}, \
                  index=["Male Count", "Female Count"])

#Display the dataframe as an HTML table
display_html(df)

#STart a multiplot plot
fig = plt.figure(figsize=(8,12))

#Children gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Child").groupby('Sex'), 411, 'Children', 'GENDER_GRAPH')

#Adults gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Adult").groupby('Sex'), 412, 'Adults', 'GENDER_GRAPH')

#Elderly  gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Elderly").groupby('Sex'), 413, 'Elderly', 'GENDER_GRAPH')


#Unspecified age gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Unspecified age").groupby('Sex'), 414, 'Unspecified Age', 'GENDER_GRAPH')

plt.tight_layout()
Adult Children Elderly Unspecified Age
Male Count 350 51 52 124
Female Count 49 49 22 53

Only the children passengers seemed to have a balanced proportion of both gender, all the rest have a higher proportion of men.

Survival, Broken Down by Generation

In [28]:
#Group each generation by survival.
children_by_survival = children_passengers.groupby('Survived')
adult_by_survival = adult_passengers.groupby('Survived')
elderly_by_survival = elderly_passengers.groupby('Survived')
unspecified_age_by_survival = passengers_by_generation.get_group("Unspecified age").groupby('Survived')

#Get the count of each group's survival
children_survived = get_count(children_by_survival.get_group(1))
children_victim = get_count(children_by_survival.get_group(0))

adult_survived = get_count(adult_by_survival.get_group(1))
adult_victim = get_count(adult_by_survival.get_group(0))

elderly_survived = get_count(elderly_by_survival.get_group(1))
elderly_victim = get_count(elderly_by_survival.get_group(0))

unspecified_age_survived = get_count(unspecified_age_by_survival.get_group(1))
unspecified_age_victim = get_count(unspecified_age_by_survival.get_group(0))



#Create the dataframe of Children, Adult, Elderly and Unspecified age's survival make up.
df = pd.DataFrame({'Children':[children_survived, children_victim], \
                   'Adult':[adult_survived, adult_victim],\
                   'Elderly':[elderly_survived, elderly_victim],\
                   'Unspecified Age':[unspecified_age_survived, unspecified_age_victim]}, \
                  index=["Survivors Count", "Victims Count"])

#Display the dataframe as an HTML table
display_html(df)

#Start a multiplot plot
fig = plt.figure(figsize=(8,12))

#Children gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Child").groupby('Survived'), 411, 'Children', 'SURVIVAL_GRAPH')

#Adults gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Adult").groupby('Survived'), 412, 'Adults', 'SURVIVAL_GRAPH')

#Elderly  gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Elderly").groupby('Survived'), 413, 'Elderly', 'SURVIVAL_GRAPH')

#Unspecified age gender ratio pie plot
draw_pie_subplot( passengers_by_generation.get_group("Unspecified age").groupby('Survived'), 414, 'Unspecified', 'SURVIVAL_GRAPH')

plt.tight_layout()
Adult Children Elderly Unspecified Age
Survivors Count 208 55 27 52
Victims Count 332 45 47 125

Generation Survival, Broken Down by Gender

In [29]:
#####################
#Children
children_male_passengers = children_passengers[ children_passengers['Sex'] == 'male' ]
children_female_passengers = children_passengers[ children_passengers['Sex'] == 'female' ]

children_male_by_survival = children_male_passengers.groupby('Survived')
children_male_survived = get_count(children_male_by_survival.get_group(1))
children_male_victim = get_count(children_male_by_survival.get_group(0))

children_female_by_survival = children_female_passengers.groupby('Survived')
children_female_survived = get_count(children_female_by_survival.get_group(1))
children_female_victim = get_count(children_female_by_survival.get_group(0))


#####################
#Adults
adults_male_passengers = adult_passengers[ adult_passengers['Sex'] == 'male' ]
adults_female_passengers = adult_passengers[ adult_passengers['Sex'] == 'female' ]

adults_male_by_survival = adults_male_passengers.groupby('Survived')
adults_male_survived = get_count(adults_male_by_survival.get_group(1))
adults_male_victim = get_count(adults_male_by_survival.get_group(0))

adults_female_by_survival = adults_female_passengers.groupby('Survived')
adults_female_survived = get_count(adults_female_by_survival.get_group(1))
adults_female_victim = get_count(adults_female_by_survival.get_group(0))

#####################
#Elderly
elderly_male_passengers = elderly_passengers[ elderly_passengers['Sex'] == 'male' ]
elderly_female_passengers = elderly_passengers[ elderly_passengers['Sex'] == 'female' ]

elderly_male_by_survival = elderly_male_passengers.groupby('Survived')
elderly_male_survived = get_count(elderly_male_by_survival.get_group(1))
elderly_male_victim = get_count(elderly_male_by_survival.get_group(0))

elderly_female_by_survival = elderly_female_passengers.groupby('Survived')
elderly_female_survived = get_count(elderly_female_by_survival.get_group(1))
elderly_female_victim = get_count(elderly_female_by_survival.get_group(0))



#Create the dataframe of Children, Adult, Elderly and Unspecified age's gender make up.
df = pd.DataFrame({'Male Children':   [children_male_survived, children_male_victim], \
                   'Female Children': [children_female_survived, children_female_victim],\
                   'Male Adults':     [adults_male_survived, adults_male_victim],\
                   'Female Adults':   [adults_female_survived, adults_female_victim],\
                   'Male Elderly':    [elderly_male_survived, elderly_male_victim],\
                   'Female Elderly':  [elderly_female_survived, elderly_female_victim] }, index = ["Survived", "Victim"])


#Display the dataframe as an HTML table
display_html(df)


#Start a multiplot plot
fig = plt.figure(figsize=(15,12))

#Children pie plot
## Male children
draw_pie_subplot( children_male_by_survival, 321, 'Children Male', 'SURVIVAL_GRAPH')
##Female Children
draw_pie_subplot( children_female_by_survival, 322, 'Children Female', 'SURVIVAL_GRAPH')


#Adults pie plot
##Male Adults
draw_pie_subplot( adults_male_by_survival, 323, 'Adults Male', 'SURVIVAL_GRAPH')
##Female Adults
draw_pie_subplot( adults_female_by_survival, 324, 'Adults Female', 'SURVIVAL_GRAPH')


#Elderly  pie plot
##Male Elderly
draw_pie_subplot( elderly_male_by_survival, 325, 'Elderly Males', 'SURVIVAL_GRAPH')
##Female Elderly
draw_pie_subplot( elderly_female_by_survival, 326, 'Elderly Females', 'SURVIVAL_GRAPH')

plt.tight_layout()
Female Adults Female Children Female Elderly Male Adults Male Children Male Elderly
Survived 144 33 20 64 22 7
Victim 46 16 2 286 29 45

I find it interesting that even male children had a noticeably lower survival rate than female children.

Exploring The Sample From The Class Point of View

Population Pyramid of The Classes

In [30]:
#The code here follows the same logic for drawing the population pyramid for all the population above.
#But here, we are going to draw three graphs, one for each class
age_bins = range(0,75,5)

first_class_grouped_female_ages = get_count(first_class_female_passengers.groupby( pd.cut( first_class_female_passengers["Age"], np.arange(0, 80, 5) ) ))
first_class_grouped_male_ages =   get_count(first_class_male_passengers.groupby( pd.cut( first_class_male_passengers["Age"], np.arange(0, 80, 5) ) ))

second_class_grouped_female_ages = get_count(second_class_female_passengers.groupby( pd.cut( second_class_female_passengers["Age"], np.arange(0, 80, 5) ) ))
second_class_grouped_male_ages =   get_count(second_class_male_passengers.groupby( pd.cut( second_class_male_passengers["Age"], np.arange(0, 80, 5) ) ))

third_class_grouped_female_ages = get_count(third_class_female_passengers.groupby( pd.cut( third_class_female_passengers["Age"], np.arange(0, 80, 5) ) ))
third_class_grouped_male_ages =   get_count(third_class_male_passengers.groupby( pd.cut( third_class_male_passengers["Age"], np.arange(0, 80, 5) ) ))



fig = plt.figure(figsize=(7,7))

first_max_x = max(first_class_grouped_female_ages.max(), first_class_grouped_male_ages.max())
second_max_x = max(second_class_grouped_female_ages.max(), second_class_grouped_male_ages.max())
third_max_x = max(third_class_grouped_female_ages.max(), third_class_grouped_male_ages.max())

largest_x_value = max( [first_max_x, second_max_x, third_max_x] )

#Draw the population pyramid for the first class
fig.add_subplot(311)
plot_population_pyramid(age_bins, \
                        "1st classs Male Population", \
                        first_class_grouped_male_ages, \
                        "1st class Female Population", \
                        first_class_grouped_female_ages, \
                        largest_x_value)

#Draw the population pyramid for the second class
fig.add_subplot(312)
plot_population_pyramid(age_bins, \
                        "2nd class Male Population", \
                        second_class_grouped_male_ages, \
                        "2nd class Female Population", \
                        second_class_grouped_female_ages, \
                        largest_x_value)

#Draw the population pyramid for the third class
fig.add_subplot(313)
plot_population_pyramid(age_bins, \
                        "3rd class Male Population", \
                        third_class_grouped_male_ages, \
                        "3rd class Female Population", \
                        third_class_grouped_female_ages, \
                        largest_x_value)
  • I was not able to remove the first 3 emtpy plots, I think it is maybe a bug on PyPlot. More inverstigations will be needed.

The third class' population pyramid is very skewed towards the male, with the largest spikes occur in between 15 to 30 years old.

Classes, By Gender Ratio

In [31]:
#Create the dataframe of each class' gender make up.
df = pd.DataFrame({'First Class':  [first_class_male_count, first_class_female_count], \
                   'Second Class': [second_class_male_count, second_class_female_count],\
                   'Third Class':  [third_class_male_count, third_class_female_count]},\
                   index = ["Male", "Female"])


#Display the dataframe as an HTML table
display_html(df)


fig = plt.figure(figsize=(10,7))
#Plot the first class
draw_pie_subplot( first_class_passengers.groupby('Sex'), 131, 'First Class', 'GENDER_GRAPH')

#Plot the second class
draw_pie_subplot( second_class_passengers.groupby('Sex'), 132, 'Second Class', 'GENDER_GRAPH')

#Plot the third class
draw_pie_subplot( third_class_passengers.groupby('Sex'), 133, 'Third Class', 'GENDER_GRAPH')

plt.tight_layout()
First Class Second Class Third Class
Male 122 108 347
Female 94 76 144

Classes, By Generation Make Up

In [32]:
first_class_unspecified_age = first_class_count - (first_class_children_count + first_class_adult_count + first_class_elderly_count)
second_class_unspecified_age = second_class_count - (second_class_children_count + second_class_adult_count + second_class_elderly_count)
third_class_unspecified_age = third_class_count - (third_class_children_count + third_class_adult_count + third_class_elderly_count)

#Create the dataframe of Children, Adult, Elderly and Unspecified age's gender make up, by class.
df = pd.DataFrame({'Children':       [first_class_children_count, second_class_children_count, third_class_children_count], \
                   'Adults':         [first_class_adult_count, second_class_adult_count, third_class_adult_count],\
                   'Elderly':        [first_class_elderly_count, second_class_elderly_count, third_class_elderly_count],\
                   'Unspecified Age':[first_class_unspecified_age, second_class_unspecified_age, third_class_unspecified_age]},\
                   index = ["First Class", "Second Class", "Third Class"])


#Display the dataframe as an HTML table
display_html(df)


fig = plt.figure(figsize=(9,12))

#First Class
draw_pie_subplot(first_class_passengers.groupby('Generation'), 311, 'First Class by Generation', 'DEFAULT')

#Second Class
draw_pie_subplot(second_class_passengers.groupby('Generation'), 312, 'Second Class by Generation', 'DEFAULT')

#Third Class
draw_pie_subplot(third_class_passengers.groupby('Generation'), 313, 'Third Class by Generation', 'DEFAULT')


plt.tight_layout()
Adults Children Elderly Unspecified Age
First Class 133 9 44 30
Second Class 133 21 19 11
Third Class 274 70 44 103

It might be worth noting that the majority of the passengers of unspecified age come from the third class.

Classes, By Companionship

Class Survival

In [33]:
#Separate each class passengers by survival
first_class_by_survival = first_class_passengers.groupby("Survived")
second_class_by_survival = second_class_passengers.groupby("Survived")
third_class_by_survival = third_class_passengers.groupby("Survived")

#Get the count of both survivors and victims, by each class
first_class_surviving_count = get_count(first_class_by_survival.get_group(1))
first_class_victims_count = get_count(first_class_by_survival.get_group(0))

second_class_surviving_count = get_count(second_class_by_survival.get_group(1))
second_class_victims_count = get_count(second_class_by_survival.get_group(0))

third_class_surviving_count = get_count(third_class_by_survival.get_group(1))
third_class_victims_count = get_count(third_class_by_survival.get_group(0))


#Create the dataframe of Children, Adult, Elderly and Unspecified age's gender make up, by class.
df = pd.DataFrame({'Survived': [first_class_surviving_count, second_class_surviving_count, third_class_surviving_count], \
                   'Died':     [first_class_victims_count, second_class_victims_count, third_class_victims_count]},\
                   index = ["First Class", "Second Class", "Third Class"])

#Display the dataframe as an HTML table
display_html(df)


fig = plt.figure(figsize=(9,12))

#First Class
draw_pie_subplot( first_class_passengers.groupby('Survived'), 311, 'First Class by Survival', 'SURVIVAL_GRAPH')

#Second Class
draw_pie_subplot( second_class_passengers.groupby('Survived') , 312, 'Second Class by Survival', 'SURVIVAL_GRAPH')

#Third Class
draw_pie_subplot( third_class_passengers.groupby('Survived') , 313, 'Third Class by Survival', 'SURVIVAL_GRAPH')

plt.tight_layout()
Died Survived
First Class 80 136
Second Class 97 87
Third Class 372 119

Just visually, one can see that the better the class was, the higher the chances of survival were. But this is just superficially, from the pie charts. Other factors may be involved as well, like the total number of passengers and the gender makeup of each class (The percentage of males in the third class was much higher than the other two classes).

Survival: From The Point of View of The Generation And Gender

Class' Children Survival, Broken down By Both The Total And The Gender

In [34]:
#First, group each class by generation survival, AND by (generation and gender)'s survival
#####First Class
first_class_children_by_survival = first_class_children_passengers.groupby("Survived")
first_class_children_male_by_survival = first_class_children_male_passengers.groupby("Survived")
first_class_children_female_by_survival = first_class_children_female_passengers.groupby("Survived")

#####Second Class
second_class_children_by_survival = second_class_children_passengers.groupby("Survived")
second_class_children_male_by_survival = second_class_children_male_passengers.groupby("Survived")
second_class_children_female_by_survival = second_class_children_female_passengers.groupby("Survived")

#####Third Class
third_class_children_by_survival = third_class_children_passengers.groupby("Survived")
third_class_children_male_by_survival = third_class_children_male_passengers.groupby("Survived")
third_class_children_female_by_survival = third_class_children_female_passengers.groupby("Survived")

#Get the count of each category created in the cell above
first_class_children_surviving_count = get_surviving_count( first_class_children_by_survival)
first_class_children_victims_count = get_victim_count(first_class_children_by_survival)
first_class_children_male_surviving_count = get_surviving_count( first_class_children_male_by_survival)
first_class_children_male_victims_count = get_victim_count(first_class_children_male_by_survival)
first_class_children_female_surviving_count = get_surviving_count(first_class_children_female_by_survival)
first_class_children_female_victims_count = get_victim_count(first_class_children_female_by_survival)

second_class_children_surviving_count = get_surviving_count( second_class_children_by_survival)
second_class_children_victims_count = get_victim_count(second_class_children_by_survival)
second_class_children_male_surviving_count = get_surviving_count( second_class_children_male_by_survival)
second_class_children_male_victims_count = get_victim_count(second_class_children_male_by_survival)
second_class_children_female_surviving_count = get_surviving_count(second_class_children_female_by_survival)
second_class_children_female_victims_count = get_victim_count(second_class_children_female_by_survival)

third_class_children_surviving_count = get_surviving_count( third_class_children_by_survival)
third_class_children_victims_count = get_victim_count(third_class_children_by_survival)
third_class_children_male_surviving_count = get_surviving_count( third_class_children_male_by_survival)
third_class_children_male_victims_count = get_victim_count(third_class_children_male_by_survival)
third_class_children_female_surviving_count = get_surviving_count(third_class_children_female_by_survival)
third_class_children_female_victims_count = get_victim_count(third_class_children_female_by_survival)

#Create the dataframe of the classes' children by survival
df = pd.DataFrame({'Survived': [first_class_children_surviving_count, second_class_children_surviving_count, third_class_children_surviving_count], \
                   'Died':     [first_class_children_victims_count, second_class_children_victims_count, third_class_children_victims_count]},\
                   index = ["First Class", "Second Class", "Third Class"])

#Display the dataframe as an HTML table
display_html(df)

#Create the dataframe of the classes' children by survival
df = pd.DataFrame({'First Class':  [first_class_children_male_surviving_count, \
                                    first_class_children_male_victims_count, \
                                    first_class_children_female_surviving_count, \
                                    first_class_children_female_victims_count], \
                   
                   'Second Class': [second_class_children_male_surviving_count, \
                                    second_class_children_male_victims_count, \
                                    second_class_children_female_surviving_count, \
                                    second_class_children_female_victims_count],\
                   
                   'Third Class':  [third_class_children_male_surviving_count, \
                                    third_class_children_male_victims_count, \
                                    third_class_children_female_surviving_count, \
                                    third_class_children_female_victims_count]},\
                  
                   index = ["Male Survivor", "Male Victim", "Female Survivor", "Female Victim"] )

#Display the dataframe as an HTML table
display_html(df)


fig = plt.figure(figsize=(15,12))

#First Class
draw_pie_subplot( first_class_children_by_survival , 331, 'First Class Children, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( first_class_children_male_by_survival , 332, 'First Class Male Children', 'SURVIVAL_GRAPH')
draw_pie_subplot( first_class_children_female_by_survival , 333, 'First Class Female Children', 'SURVIVAL_GRAPH')

#Second Class
draw_pie_subplot( second_class_children_by_survival , 334, 'Second Class Children, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( second_class_children_male_by_survival , 335, 'Second Class Male Children', 'SURVIVAL_GRAPH')
draw_pie_subplot( second_class_children_female_by_survival , 336, 'Second Class Female Children', 'SURVIVAL_GRAPH')

#Third Class
draw_pie_subplot( third_class_children_by_survival , 337, 'Third Class Children, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( third_class_children_male_by_survival , 338, 'Third Class Male Children', 'SURVIVAL_GRAPH')
draw_pie_subplot( third_class_children_female_by_survival , 339, 'Third Class Female Children', 'SURVIVAL_GRAPH')
plt.tight_layout()
Died Survived
First Class 1 8
Second Class 2 19
Third Class 42 28
First Class Second Class Third Class
Male Survivor 3 9 10
Male Victim 0 2 27
Female Survivor 5 10 18
Female Victim 1 0 15

Total Children Survival, By Class

In [35]:
children_by_class = children_passengers.groupby('Pclass')

children_survived_by_class = children_by_class.apply(lambda t : t[t['Survived'] == 1]).groupby('Pclass')
children_drowned_by_class = children_by_class.apply(lambda t : t[t['Survived'] == 0]).groupby('Pclass')

fig = plt.figure(figsize=(9,5))

draw_pie_subplot( children_survived_by_class, 121, 'Surviving Children By Class', 'DEFAULT')

draw_pie_subplot( children_drowned_by_class, 122, 'Victim Children By Class', 'DEFAULT')

plt.tight_layout()

Class' Children Survival, Broken down By Both The Total And The Gender

In [36]:
#First, group each class by generation survival, AND by (generation and gender)'s survival
#####First Class
first_class_adult_by_survival = first_class_adult_passengers.groupby("Survived")
first_class_adult_male_by_survival = first_class_adult_male_passengers.groupby("Survived")
first_class_adult_female_by_survival = first_class_adult_female_passengers.groupby("Survived")

#####Second Class
second_class_adult_by_survival = second_class_adult_passengers.groupby("Survived")
second_class_adult_male_by_survival = second_class_adult_male_passengers.groupby("Survived")
second_class_adult_female_by_survival = second_class_adult_female_passengers.groupby("Survived")

#####Third Class
third_class_adult_by_survival = third_class_adult_passengers.groupby("Survived")
third_class_adult_male_by_survival = third_class_adult_male_passengers.groupby("Survived")
third_class_adult_female_by_survival = third_class_adult_female_passengers.groupby("Survived")

#Get the count of each category created in the cell above
first_class_adult_surviving_count = get_surviving_count( first_class_adult_by_survival)
first_class_adult_victims_count = get_victim_count(first_class_adult_by_survival)
first_class_adult_male_surviving_count = get_surviving_count( first_class_adult_male_by_survival)
first_class_adult_male_victims_count = get_victim_count(first_class_adult_male_by_survival)
first_class_adult_female_surviving_count = get_surviving_count(first_class_adult_female_by_survival)
first_class_adult_female_victims_count = get_victim_count(first_class_adult_female_by_survival)

second_class_adult_surviving_count = get_surviving_count( second_class_adult_by_survival)
second_class_adult_victims_count = get_victim_count(second_class_adult_by_survival)
second_class_adult_male_surviving_count = get_surviving_count( second_class_adult_male_by_survival)
second_class_adult_male_victims_count = get_victim_count(second_class_adult_male_by_survival)
second_class_adult_female_surviving_count = get_surviving_count(second_class_adult_female_by_survival)
second_class_adult_female_victims_count = get_victim_count(second_class_adult_female_by_survival)

third_class_adult_surviving_count = get_surviving_count( third_class_adult_by_survival)
third_class_adult_victims_count = get_victim_count(third_class_adult_by_survival)
third_class_adult_male_surviving_count = get_surviving_count( third_class_adult_male_by_survival)
third_class_adult_male_victims_count = get_victim_count(third_class_adult_male_by_survival)
third_class_adult_female_surviving_count = get_surviving_count(third_class_adult_female_by_survival)
third_class_adult_female_victims_count = get_victim_count(third_class_adult_female_by_survival)

#Create the dataframe of the classes' adult by survival
df = pd.DataFrame({'Survived': [first_class_adult_surviving_count, second_class_adult_surviving_count, third_class_adult_surviving_count], \
                   'Died':     [first_class_adult_victims_count, second_class_adult_victims_count, third_class_adult_victims_count]},\
                   index = ["First Class", "Second Class", "Third Class"])

#Display the dataframe as an HTML table
display_html(df)

#Create the dataframe of the classes' adult by survival
df = pd.DataFrame({'First Class':  [first_class_adult_male_surviving_count, \
                                    first_class_adult_male_victims_count, \
                                    first_class_adult_female_surviving_count, \
                                    first_class_adult_female_victims_count], \
                   
                   'Second Class': [second_class_adult_male_surviving_count, \
                                    second_class_adult_male_victims_count, \
                                    second_class_adult_female_surviving_count, \
                                    second_class_adult_female_victims_count],\
                   
                   'Third Class':  [third_class_adult_male_surviving_count, \
                                    third_class_adult_male_victims_count, \
                                    third_class_adult_female_surviving_count, \
                                    third_class_adult_female_victims_count]},\
                   index = ["Male Survivor", "Male Victim", "Female Survivor", "Female Victim"] )

#Display the dataframe as an HTML table
display_html(df)


fig = plt.figure(figsize=(15,12))

#First Class
draw_pie_subplot( first_class_adult_by_survival , 331, 'First Class Adult, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( first_class_adult_male_by_survival , 332, 'First Class Male Adult', 'SURVIVAL_GRAPH')
draw_pie_subplot( first_class_adult_female_by_survival , 333, 'First Class Female Adult', 'SURVIVAL_GRAPH')

#Second Class
draw_pie_subplot( second_class_adult_by_survival , 334, 'Second Class Adult, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( second_class_adult_male_by_survival , 335, 'Second Class Male Adult', 'SURVIVAL_GRAPH')
draw_pie_subplot( second_class_adult_female_by_survival , 336, 'Second Class Female Adult', 'SURVIVAL_GRAPH')

#Third Class
draw_pie_subplot( third_class_adult_by_survival , 337, 'Third Class Adult, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( third_class_adult_male_by_survival , 338, 'Third Class Male Adult', 'SURVIVAL_GRAPH')
draw_pie_subplot( third_class_adult_female_by_survival , 339, 'Third Class Female Adult', 'SURVIVAL_GRAPH')

plt.tight_layout()
Died Survived
First Class 39 94
Second Class 75 58
Third Class 218 56
First Class Second Class Third Class
Male Survivor 31 5 28
Male Victim 38 70 178
Female Survivor 63 53 28
Female Victim 1 5 40

Second Class males, to my surprise, had the toughest luck. I was expecting the third class males to have the toughest one.

Total Adults Survival, By Class

In [37]:
adult_by_class = adult_passengers.groupby('Pclass')

adult_survived_by_class = adult_by_class.apply(lambda t : t[t['Survived'] == 1]).groupby('Pclass')
adult_drowned_by_class = adult_by_class.apply(lambda t : t[t['Survived'] == 0]).groupby('Pclass')

fig = plt.figure(figsize=(9,5))

draw_pie_subplot( adult_survived_by_class, 121, 'Surviving Adult By Class', 'DEFAULT')

draw_pie_subplot( adult_drowned_by_class, 122, 'Victim Adult By Class', 'DEFAULT')

plt.tight_layout()

Class' Elderly Survival, Broken Down By Both The Total And The Gender

In [38]:
#First, group each class by generation survival, AND by (generation and gender)'s survival
#####First Class
first_class_elderly_by_survival = first_class_elderly_passengers.groupby("Survived")
first_class_elderly_male_by_survival = first_class_elderly_male_passengers.groupby("Survived")
first_class_elderly_female_by_survival = first_class_elderly_female_passengers.groupby("Survived")

#####Second Class
second_class_elderly_by_survival = second_class_elderly_passengers.groupby("Survived")
second_class_elderly_male_by_survival = second_class_elderly_male_passengers.groupby("Survived")
second_class_elderly_female_by_survival = second_class_elderly_female_passengers.groupby("Survived")

#####Third Class
third_class_elderly_by_survival = third_class_elderly_passengers.groupby("Survived")
third_class_elderly_male_by_survival = third_class_elderly_male_passengers.groupby("Survived")
third_class_elderly_female_by_survival = third_class_elderly_female_passengers.groupby("Survived")

#Get the count of each category created in the cell above
first_class_elderly_surviving_count = get_surviving_count( first_class_elderly_by_survival)
first_class_elderly_victims_count = get_victim_count(first_class_elderly_by_survival)
first_class_elderly_male_surviving_count = get_surviving_count( first_class_elderly_male_by_survival)
first_class_elderly_male_victims_count = get_victim_count(first_class_elderly_male_by_survival)
first_class_elderly_female_surviving_count = get_surviving_count(first_class_elderly_female_by_survival)
first_class_elderly_female_victims_count = get_victim_count(first_class_elderly_female_by_survival)

second_class_elderly_surviving_count = get_surviving_count( second_class_elderly_by_survival)
second_class_elderly_victims_count = get_victim_count(second_class_elderly_by_survival)
second_class_elderly_male_surviving_count = get_surviving_count( second_class_elderly_male_by_survival)
second_class_elderly_male_victims_count = get_victim_count(second_class_elderly_male_by_survival)
second_class_elderly_female_surviving_count = get_surviving_count(second_class_elderly_female_by_survival)
second_class_elderly_female_victims_count = get_victim_count(second_class_elderly_female_by_survival)

third_class_elderly_surviving_count = get_surviving_count( third_class_elderly_by_survival)
third_class_elderly_victims_count = get_victim_count(third_class_elderly_by_survival)
third_class_elderly_male_surviving_count = get_surviving_count( third_class_elderly_male_by_survival)
third_class_elderly_male_victims_count = get_victim_count(third_class_elderly_male_by_survival)
third_class_elderly_female_surviving_count = get_surviving_count(third_class_elderly_female_by_survival)
third_class_elderly_female_victims_count = get_victim_count(third_class_elderly_female_by_survival)

#Create the dataframe of the classes' elderly by survival
df = pd.DataFrame({'Survived': [first_class_elderly_surviving_count, second_class_elderly_surviving_count, third_class_elderly_surviving_count], \
                   'Died':     [first_class_elderly_victims_count, second_class_elderly_victims_count, third_class_elderly_victims_count]},\
                   index = ["First Class", "Second Class", "Third Class"])

#Display the dataframe as an HTML table
display_html(df)

#Create the dataframe of the classes' elderly by survival
df = pd.DataFrame({'First Class':  [first_class_elderly_male_surviving_count, \
                                    first_class_elderly_male_victims_count, \
                                    first_class_elderly_female_surviving_count, \
                                    first_class_elderly_female_victims_count], \
                   
                   'Second Class': [second_class_elderly_male_surviving_count, \
                                    second_class_elderly_male_victims_count, \
                                    second_class_elderly_female_surviving_count, \
                                    second_class_elderly_female_victims_count],\
                   
                   'Third Class':  [third_class_elderly_male_surviving_count, \
                                    third_class_elderly_male_victims_count, \
                                    third_class_elderly_female_surviving_count, \
                                    third_class_elderly_female_victims_count]},\
                   index = ["Male Survivor", "Male Victim", "Female Survivor", "Female Victim"] )

#Display the dataframe as an HTML table
display_html(df)


fig = plt.figure(figsize=(15,12))
#First Class
draw_pie_subplot( first_class_elderly_by_survival , 331, 'First Class Elderly, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( first_class_elderly_male_by_survival , 332, 'First Class Male Elderly', 'SURVIVAL_GRAPH')
draw_pie_subplot( first_class_elderly_female_by_survival , 333, 'First Class Female Elderly', 'SURVIVAL_GRAPH')

#Second Class
draw_pie_subplot( second_class_elderly_by_survival , 334, 'Second Class Elderly, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( second_class_elderly_male_by_survival , 335, 'Second Class Male Elderly', 'SURVIVAL_GRAPH')
draw_pie_subplot( second_class_elderly_female_by_survival , 336, 'Second Class Female Elderly', 'SURVIVAL_GRAPH')

#Third Class
draw_pie_subplot( third_class_elderly_by_survival , 337, 'Third Class Elderly, Total', 'SURVIVAL_GRAPH')
draw_pie_subplot( third_class_elderly_male_by_survival , 338, 'Third Class Male Elderly', 'SURVIVAL_GRAPH')
draw_pie_subplot( third_class_elderly_female_by_survival , 339, 'Third Class Female Elderly', 'SURVIVAL_GRAPH')

plt.tight_layout()
Died Survived
First Class 24 20
Second Class 13 6
Third Class 24 20
First Class Second Class Third Class
Male Survivor 6 1 6
Male Victim 23 12 23
Female Survivor 14 5 14
Female Victim 1 1 1

Total Elderly Survival, By Class

In [39]:
elderly_by_class = elderly_passengers.groupby('Pclass')

elderly_survived_by_class = elderly_by_class.apply(lambda t : t[t['Survived'] == 1]).groupby('Pclass')
elderly_drowned_by_class = elderly_by_class.apply(lambda t : t[t['Survived'] == 0]).groupby('Pclass')

fig = plt.figure(figsize=(9,5))

draw_pie_subplot( elderly_survived_by_class, 121, 'Victim Elderly By Class', 'DEFAULT')

draw_pie_subplot( elderly_drowned_by_class, 122, 'Victim Elderly By Class', 'DEFAULT')

plt.tight_layout()

Survival

Because of the tragedy, I feel that we should look again in the survival of the individual. Some of the data that will be presented here can be redundunt with what have been presented above, already. But I think we should still re-examine the survival from angles.

Survivors vs Victims age Pyramid

In [40]:
age_bins = range(0,75,5)

survivors_ages = get_count(surviving_passengers.groupby( pd.cut( surviving_passengers["Age"], np.arange(0, 80, 5) ) ))
victims_ages = get_count(victim_passengers.groupby( pd.cut( victim_passengers["Age"], np.arange(0, 80, 5) ) ))

largest_x_value = max(survivors_ages.max(), victims_ages.max() )

plot_population_pyramid(age_bins, "Survivors", survivors_ages, "Victims", victims_ages, largest_x_value)

Male Survivors vs Male Victims Age Pyramid

In [41]:
surviving_males = surviving_passengers[ surviving_passengers['Sex'] == 'male']
victim_males = victim_passengers[ victim_passengers['Sex'] == 'male']

age_bins = range(0,75,5)

surviving_males_ages = get_count(surviving_males.groupby( pd.cut( surviving_males["Age"], np.arange(0, 80, 5) ) ))
victim_males_ages = get_count(victim_males.groupby( pd.cut( victim_males["Age"], np.arange(0, 80, 5) ) ))

largest_x_value = max(surviving_males_ages.max(), victim_males_ages.max() )

plot_population_pyramid(age_bins, "Surviving Males", surviving_males_ages, "Victim Males", victim_males_ages, largest_x_value)

Female Survivors vs Female Victims Age Pyramid

In [42]:
surviving_females = surviving_passengers[ surviving_passengers['Sex'] == 'female']
victim_females = victim_passengers[ victim_passengers['Sex'] == 'female']

age_bins = range(0,75,5)

surviving_females_ages = get_count(surviving_females.groupby( pd.cut( surviving_females["Age"], np.arange(0, 80, 5) ) ))
victim_females_ages = get_count(victim_females.groupby( pd.cut( victim_females["Age"], np.arange(0, 80, 5) ) ))

largest_x_value = max(surviving_females_ages.max(), victim_females_ages.max() )

plot_population_pyramid(age_bins, \
                        "Surviving Females", \
                        surviving_females_ages, \
                        "Victim Females", \
                        victim_females_ages, \
                        largest_x_value)

How Were The Boats Divided Among Different Classes ?

In [43]:
fig = plt.figure()


fig.add_subplot(121)
titanic_all_data['Survived'].groupby(titanic_all_data['Pclass']).sum().plot.pie(label = 'Total Passengers Saved, by Class', \
                                                                                autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(122)
titanic_all_data['Survived'].groupby(titanic_all_data['Pclass']).sum().plot.bar(label = 'Total Passengers Saved, by Class')

plt.tight_layout()

It looks much more fair than what I have expected. Of course one can argue here that, the majority of women and children of the upper classes were already saved, and this is why the pie chart looks evenly distributed; but I would still say that it is not as bad as I thought before I have plot these graphs.

Survivors, By Generation, General

In [44]:
fig = plt.figure(figsize=(9,5))


fig.add_subplot(121)
titanic_all_data['Survived'].groupby(titanic_all_data['Generation']).sum().plot.pie(label = 'Total Passengers Saved, by Generation',\
                                                                                    autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(122)
titanic_all_data['Survived'].groupby(titanic_all_data['Generation']).sum().plot.bar(label = 'Total Passengers Saved, by Generation')

plt.tight_layout()

Victims, By Generation, General

In [45]:
fig = plt.figure(figsize=(9,5))

fig.add_subplot(121)
victim_passengers['Survived'].groupby(victim_passengers['Generation']).count().plot.pie(label = 'Victims, by Generation', \
                                                                                        autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(122)
victim_passengers['Survived'].groupby(victim_passengers['Generation']).count().plot.bar(label = 'Victims, by Generation')

plt.tight_layout()

Survivors, By Generation and Class

In [46]:
fig = plt.figure(figsize=(15,6))

fig.add_subplot(131)
#First class plot
first_class_survivors['Survived'].groupby(first_class_survivors['Generation']).count().plot.pie(label = 'First Class', \
                                                                                                autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(132)
#Second class plot
second_class_survivors['Survived'].groupby(second_class_survivors['Generation']).count().plot.pie(label = 'Second Class', \
                                                                                                  autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(133)
#Third class plot
third_class_survivors['Survived'].groupby(third_class_survivors['Generation']).count().plot.pie(label = 'Third Class', \
                                                                                                autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()

Victims, By Generation and Class

In [47]:
fig = plt.figure(figsize=(15,6))

fig.add_subplot(131)
#First class plot
first_class_victims['Survived'].groupby(first_class_victims['Generation']).count().plot.pie(label = 'First Class', \
                                                                                            autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(132)
#Second class plot
second_class_victims['Survived'].groupby(second_class_victims['Generation']).count().plot.pie(label = 'Second Class', \
                                                                                              autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(133)
#Third class plot
third_class_victims['Survived'].groupby(third_class_victims['Generation']).count().plot.pie(label = 'Third Class', \
                                                                                            autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()

Travel Companionship

Now, after gaining some intuition about the data, one final parameter is left to explore so we can start answering our first question: companionship.

Solo Travelers vs Group Travelers, Total

In [48]:
solo_passengers_count =  get_count(passengers_by_companionship.get_group(True))
group_passengers_count = get_count(passengers_by_companionship.get_group(False))


df = pd.DataFrame({'Count': [solo_passengers_count, group_passengers_count]}, index = ['Solo', 'Group'])

display_html(df)

get_count(titanic_all_data.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'], \
                                                                   label = "Companionship", \
                                                                   autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()
Count
Solo 521
Group 370

Travel Companionship, by Class Make Up

In [49]:
group_travelers_by_class = passengers_by_class.apply(lambda t : t[t['isSolo'] == False]).groupby('Pclass')
solo_travelers_by_class = passengers_by_class.apply(lambda t : t[t['isSolo'] == True]).groupby('Pclass')


fig = plt.figure(figsize=(9,5))

draw_pie_subplot( group_travelers_by_class, 121, 'Group Travelers By Class', 'DEFAULT')
draw_pie_subplot( solo_travelers_by_class, 122, 'Solo Travelers By Class', 'DEFAULT')

plt.tight_layout()

Solo Travelers vs Group Travelers, By Class

In [50]:
solo_passengers_count =  get_count(passengers_by_companionship.get_group(True))
group_passengers_count = get_count(passengers_by_companionship.get_group(False))

first_class_solo_passengers_count =  get_count(first_class_passengers[ first_class_passengers['isSolo'] == True])
first_class_group_passengers_count =  get_count(first_class_passengers[ first_class_passengers['isSolo'] == False])

second_class_solo_passengers_count =  get_count(second_class_passengers[ second_class_passengers['isSolo'] == True])
second_class_group_passengers_count =  get_count(second_class_passengers[ second_class_passengers['isSolo'] == False])

third_class_solo_passengers_count =  get_count(third_class_passengers[ third_class_passengers['isSolo'] == True])
third_class_group_passengers_count =  get_count(third_class_passengers[ third_class_passengers['isSolo'] == False])

df = pd.DataFrame({'First Class':  [first_class_solo_passengers_count, first_class_group_passengers_count], \
                   'Second Class': [second_class_solo_passengers_count, second_class_group_passengers_count], \
                   'Third Class':  [third_class_solo_passengers_count, third_class_group_passengers_count]}, \
                  index = ['Group', 'Solo'])

display_html(df)

fig = plt.figure(figsize=(12,5))

fig.add_subplot(131)
get_count(first_class_passengers.groupby('isSolo')).plot.pie(label = 'First Class', \
                                                                         labels=['Group', 'Solo'],\
                                                                         autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(132)
get_count(second_class_passengers.groupby('isSolo')).plot.pie(label = 'Second Class', \
                                                                         labels=['Group', 'Solo'],\
                                                                          autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(133)
get_count(third_class_passengers.groupby('isSolo')).plot.pie(label = 'Third Class', \
                                                                         labels=['Group', 'Solo'],\
                                                                         autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()
First Class Second Class Third Class
Group 108 102 311
Solo 108 82 180

Solo Travelers Age Pyramid

In [51]:
age_bins = range(16,76,4)

solo_female_passengers = female_passengers[ female_passengers['isSolo'] == True ]
solo_male_passengers = male_passengers[ male_passengers['isSolo'] == True ]

grouped_female_ages = get_count(solo_female_passengers.groupby( pd.cut( solo_female_passengers["Age"], np.arange(16, 80, 4) ) ))
grouped_male_ages =   get_count(solo_male_passengers.groupby( pd.cut( solo_male_passengers["Age"], np.arange(16, 80, 4) ) ))

largest_x_value = max(grouped_female_ages.max(), grouped_male_ages.max())

plot_population_pyramid(age_bins, \
                        "Solo Male Population", \
                        grouped_male_ages, \
                        "Solo Female Population", \
                        grouped_female_ages, \
                        largest_x_value)

Women Traveling Companionship Pattern

I got curious about solo women traveling on the ship, since the accident happened during a time when the culture was more conservative than nowadays.

In [52]:
get_count(passengers_by_gender.get_group('female').groupby('isSolo')).plot.pie(labels=['Group', 'Solo'], \
                                                                                           label = " Women Companionship", \
                                                                                           autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()

group_women_count = get_count(passengers_by_gender.get_group('female').groupby('isSolo'))[0]
solo_women_count = get_count(passengers_by_gender.get_group('female').groupby('isSolo'))[1]

print "Women traveling alone:      " + str(solo_women_count) + "  Percentage: " + str ( format((float(solo_women_count) / female_count )*100.0, '.2f')) + "%"
print "Women traveling in a group: "+ str(group_women_count) + "  Percentage: " + str( format((float(group_women_count) / female_count )*100.0, '.2f')) + "%"
Women traveling alone:      118  Percentage: 37.58%
Women traveling in a group: 196  Percentage: 62.42%

Women Traveling Companionship, by Class

In [53]:
fig = plt.figure(figsize=(12,5))

fig.add_subplot(131)
get_count(first_class_female_passengers.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'],\
                                                                            label = 'First Class', \
                                                                            autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(132)
get_count(second_class_female_passengers.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'],\
                                                                             label = 'Second Class', \
                                                                             autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(133)
get_count(third_class_female_passengers.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'],\
                                                                            label = 'Third Class', \
                                                                            autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()

Solo Women Traveling, by Class

In [54]:
first_class_solo_women_passengers =  solo_female_passengers[ solo_female_passengers['Pclass'] == 1 ]
second_class_solo_women_passengers =  solo_female_passengers[ solo_female_passengers['Pclass'] == 2 ]
third_class_solo_women_passengers =  solo_female_passengers[ solo_female_passengers['Pclass'] == 3 ]

fig = plt.figure(figsize=(7,5))
fig.add_subplot(121)
get_count(solo_female_passengers.groupby('Pclass')).plot.pie(label = 'Solo Women by Class',\
                                                                         labels = ['1st Class', '2nd Class', '3rd Class'],\
                                                                                autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(122)
get_count(solo_female_passengers.groupby('Pclass')).plot.bar()

plt.tight_layout()

The class, visually, does not look to like it has an effect over the women's traveling companionship pattern.

Men Traveling Companionship Pattern

Now, let us have a closer look at the male's traveling companionship; just to make the picture complete:

In [55]:
get_count(passengers_by_gender.get_group('male').groupby('isSolo')).plot.pie(labels=['Group', 'Solo'], \
                                                                                         label = " Men Companionship", \
                                                                                         autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()

group_men_count = get_count(passengers_by_gender.get_group('male').groupby('isSolo'))[0]
solo_men_count = get_count(passengers_by_gender.get_group('male').groupby('isSolo'))[1]

print "Men traveling alone:      " + str(solo_men_count) + "  Percentage: " + str ( format((float(solo_men_count) / male_count )*100.0, '.2f')) + "%"
print "Men traveling in a group: "+ str(group_men_count) + "  Percentage: " + str( format((float(group_men_count) / male_count )*100.0, '.2f')) + "%"
Men traveling alone:      403  Percentage: 69.84%
Men traveling in a group: 174  Percentage: 30.16%

Men Traveling Companionship, by Class

In [56]:
fig = plt.figure(figsize=(12,5))

fig.add_subplot(131)
get_count(first_class_male_passengers.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'],\
                                                                              label = 'First Class', \
                                                                              autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(132)
get_count(second_class_male_passengers.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'],\
                                                                               label = 'Second Class', \
                                                                               autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(133)
get_count(third_class_male_passengers.groupby('isSolo')).plot.pie(labels=['Group', 'Solo'],\
                                                                              label = 'Third Class', \
                                                                              autopct='%1.1f%%')
plt.axis('equal')

plt.tight_layout()

Solo Men Traveling, by Class

In [57]:
first_class_solo_men_passengers =  solo_male_passengers[ solo_male_passengers['Pclass'] == 1 ]
second_class_solo_men_passengers =  solo_male_passengers[ solo_male_passengers['Pclass'] == 2 ]
third_class_solo_men_passengers =  solo_male_passengers[ solo_male_passengers['Pclass'] == 3 ]

fig = plt.figure()
fig.add_subplot(121)
get_count(solo_male_passengers.groupby('Pclass')).plot.pie(label = 'Solo Women by Class', \
                                                                       autopct='%1.1f%%')
plt.axis('equal')

fig.add_subplot(122)
get_count(solo_male_passengers.groupby('Pclass')).plot.bar()

plt.tight_layout()

Survival By Companionship

Now, down to the visualization of our first question: how does survival chance look like when seen through the lense of companionship?

In [58]:
fig = plt.figure(figsize=(9,6))

draw_pie_subplot( surviving_passengers.groupby(surviving_passengers['isSolo']) , 121, 'Solo Survival', 'SURVIVAL_GRAPH')

draw_pie_subplot( victim_passengers.groupby(victim_passengers['isSolo']) , 122, 'Group Survival', 'SURVIVAL_GRAPH')

plt.tight_layout()

Analysis

Q1: What is the effect of Traveling Companionship Over the Survival of a Grown Up Passengers ?

Descriptive Statistics

Total Companions, For All Passengers

In [59]:
titanic_all_data['TotalCompanions'].describe()
Out[59]:
count    891.000000
mean       0.904602
std        1.613459
min        0.000000
25%        0.000000
50%        0.000000
75%        1.000000
max       10.000000
Name: TotalCompanions, dtype: float64

Total Companions, For Group Travelers

In [60]:
group_passengers = passengers_by_companionship.get_group(False);
solo_passengers = passengers_by_companionship.get_group(True)

group_passengers['TotalCompanions'].describe()
Out[60]:
count    370.000000
mean       2.178378
std        1.869906
min        0.000000
25%        1.000000
50%        2.000000
75%        2.000000
max       10.000000
Name: TotalCompanions, dtype: float64

I think it is noteworthy to remind here that, the data has 0 as its minimum for the total number of companions (although we are examining the group passengers) is due to the choice of including children with the group travelers.

The statistics for solo travelers will not be performed, since all the data (Except to the count) is equal to zero

In [61]:
group_passengers['Survived'].describe()
Out[61]:
count    370.000000
mean       0.505405
std        0.500648
min        0.000000
25%        0.000000
50%        1.000000
75%        1.000000
max        1.000000
Name: Survived, dtype: float64
In [62]:
solo_passengers['Survived'].describe()
Out[62]:
count    521.000000
mean       0.297505
std        0.457600
min        0.000000
25%        0.000000
50%        0.000000
75%        1.000000
max        1.000000
Name: Survived, dtype: float64

Hypothesis Testing:

The contingency table will be built so we can examine the relationship and perform the hypothesis testing

In [63]:
#Expected frequency: Total survival, total drowning
#Solo
group_passengers_survived_count = get_count(group_passengers.groupby('Survived').get_group(1))
group_passengers_victims_count = get_count(group_passengers.groupby('Survived').get_group(0))

#Group
solo_passengers_survived_count = get_count(solo_passengers.groupby('Survived').get_group(1))
solo_passengers_victims_count = get_count(solo_passengers.groupby('Survived').get_group(0))

# passengers_by_survival
survivors_count_list = [group_passengers_survived_count, solo_passengers_survived_count,  total_survivors_count]
victims_count_list = [group_passengers_victims_count, solo_passengers_victims_count, total_victims_count]
total_count_list = [get_count(group_passengers), get_count(solo_passengers), sample_size]

#Had to pass the - redundunt - column argument to skip Pandas default ordering
contingency_table = pd.DataFrame({'Survived': survivors_count_list , \
                                  'Victims': victims_count_list , \
                                  'Total, Group': total_count_list},\
                                   index = ['Group', 'Solo', 'Total, State'], columns = [ 'Survived', 'Victims', 'Total, Group'])

display_html(contingency_table)
Survived Victims Total, Group
Group 187 183 370
Solo 155 366 521
Total, State 342 549 891

Now, all is set to calculate the Χ2

In [64]:
chi_squared, p, degrees_of_freedom, expected_frequency = scipy.stats.chi2_contingency( contingency_table )

print "Chi Squared: ", chi_squared
print "p value: ", p
print "Degrees of Freedom", degrees_of_freedom
print "Expected Frequency for The Group Passengers:", expected_frequency[0]
print "Expected Frequency for The Solo Passengers:", expected_frequency[1]
Chi Squared:  39.539412799
p value:  5.38959286299e-08
Degrees of Freedom 4
Expected Frequency for The Group Passengers: [ 142.02020202  227.97979798  370.        ]
Expected Frequency for The Solo Passengers: [ 199.97979798  321.02020202  521.        ]

The statistical significance here is very high, but I this can be affected by children: Probably there are a good portion of children traveling with their families, and that can lead to a higher than usual women and children presence in the group passengers, and lower than usual presence of women in the solo group (By our definition of a solo group, a child cannot be solo even if Parch and SibSp are equal to zero). Let us examine how much women and children are present in the groups.

In [65]:
women_and_children_group_passengers = titanic_all_data[ (titanic_all_data['Sex'] == 'female') | \
                                                       (titanic_all_data['Generation'] == 'Child')]
women_and_children_group_passengers_count = get_count(women_and_children_group_passengers)

print "Number of women and children in the group travelers: ", women_and_children_group_passengers_count
print "Percentage of women and children in the group traveler: ",\
(float(women_and_children_group_passengers_count)/ get_count(group_passengers))*100.0 , "%"
Number of women and children in the group travelers:  365
Percentage of women and children in the group traveler:  98.6486486486 %

I was a bit shocked at first about such a high percentage. The percentage of grown up men(Males above 17 years old) constitute less than 2% of the total passengers that are traveling within their immediate family. But after a second thought, that can make sense because there could be only one grown up man within a family made of both parents and their children.

There could be other instances, of course, than a family made of both parents and children that can bring more grown up men to the count, like adult male brothers traveling together or a grown up son traveling with his elderly father

I will try to refine the hypothesis, by excluding children from the group data, and just compare grown up passengers (adults and elderly) from both groups.

Note: The two grownup variables do not have the members whose age is unknown, they contain only members that we know for sure that they are adults

In [66]:
passengers_by_generation = titanic_all_data.groupby('Generation')

#The output will be a dictionary, with keys Adult and Elderly
passengers_by_grown_up = {key: value for key, value in passengers_by_generation if key in ['Adult', 'Elderly']}

#Now, expand the values of the dictionary into a dataframe
grown_up_passengers = pd.concat([grown_up_passengers_values for keys, grown_up_passengers_values in passengers_by_grown_up.items()])

#Get the count of survival. This will be used as our expected count for the Chi square computation
grown_up_survival_count = get_count(grown_up_passengers.groupby('Survived'))

#The append function did not behave as I expected, ie it did not perform its action in place nor did it have an inplace parameter. 
#I had to take the whole thing and make it equal to the Series in question
#Also, the index = [2] is done to prevent index duplicates, otherwise the total will be added with index 0. It would not really 
#affect the calculation, but it affected when I wanted to rename the indices for readability
grown_up_survival_count = grown_up_survival_count.append( pd.Series(len(grown_up_passengers), index = [2] )) 

#Separate passengers by companionship
group_grown_up_passengers = grown_up_passengers.groupby('isSolo').get_group(0)
solo_grown_up_passengers = grown_up_passengers.groupby('isSolo').get_group(1)

#Get the survival for each group and solo grown up passengers
group_grown_up_passengers_survival_count = get_count(group_grown_up_passengers.groupby('Survived'))
solo_grown_up_passengers_survival_count = get_count(solo_grown_up_passengers.groupby('Survived'))

#Append the total to each series
group_grown_up_passengers_survival_count = group_grown_up_passengers_survival_count.append( pd.Series(len(group_grown_up_passengers), index = [2] )) 
solo_grown_up_passengers_survival_count = solo_grown_up_passengers_survival_count.append( pd.Series(len(solo_grown_up_passengers), index = [2] ))


#Build the contingency table
contingency_table = pd.concat([ group_grown_up_passengers_survival_count, solo_grown_up_passengers_survival_count, grown_up_survival_count],\
                              axis=1, \
                              keys ={'Group','Solo','Total'})

contingency_table.rename(index= { 0 : 'Victims', 1 : 'Survivors', 2 : 'Total'},\
                         inplace = True)

contingency_table
Out[66]:
Solo Total Group
Survived
Victims 113 266 379
Survivors 113 122 235
Total 226 388 614

Next, it is the Chi square for the adult passengers, by companionship:

In [67]:
chi_squared, p, degrees_of_freedom, expected_frequency = scipy.stats.chi2_contingency( contingency_table )

print "Chi Squared: ", chi_squared
print "p value: ", p
print "Degrees of Freedom", degrees_of_freedom
print "Expected Frequency for The Group Passengers:", expected_frequency[0]
print "Expected Frequency for The Solo Passengers:", expected_frequency[1]
Chi Squared:  20.8162744573
p value:  0.000344364339067
Degrees of Freedom 4
Expected Frequency for The Group Passengers: [ 139.50162866  239.49837134  379.        ]
Expected Frequency for The Solo Passengers: [  86.49837134  148.50162866  235.        ]

Well, the results were shifted towards the mean by orders of magnitude after removing the children, but still they are very statistically significant even under the most conservative standards. The chances to get such a sample is about 1 in 3000, very low indeed.

Calculate The Effect Size (Cohen d)

In [68]:
avergae_std = (group_passengers['Survived'].std() + solo_passengers['Survived'].std()) / 2
cohens_d = abs(solo_passengers['Survived'].mean() - group_passengers['Survived'].mean() )/avergae_std
print "Cohen d: ", cohens_d
Cohen d:  0.43391833554

Conclusion For Q1:

**p < α**
**Χ2 < Χ2Critical**

Χ2 = 20.82, p < 0.001, two tailed

Effect Size Measures:

  • d = 0.43

NB: I have based this way of writing up the conclusion from the book "Statistics in plain english"

A chi-square analysis was performed to determine whether traveling companionship affected the chances of survival for a passenger.The analysis produced a significant χ2 value (39.54, df = 4, p < .001), indicating that traveling with first degree family affected the chances of survival. The question was then refined, by removing the children passengers since they had the best survival chances, and were absent from the Solo passengers group because of how we defined who is a solo traveler. The new χ2 value remained significant (20.82, df = 4, p < .001) although less than the original question by orders of magnitude. Therefore, we must reject the null hypothesis.

The limitation I see in this analysis is the gender bias between the two groups. The majority of solo travelers are men (And, mostly from the third class), and that can be an alternative explanation for the significance. It might have been appropriate to explore the question more, by separating gender from each group and then compare each gender from each group to each other (ie Solo women vs Group women AND Solo men vs Group men), but unfortunately the count of adult men within the group traveler was too low to perform such a test. If the dataset was complete, such a test may have been feasible.

The limitation I have found is that, since the question under investigation involved categorizing passengers based over their age, missing ages meant that the test was not done all over the sample, but only a fraction of it. I have opted to omit passengers with missing ages rather than making any other assumptions (Like assuming their age is mean\median, or even assume that their age distribution is the same as the that of the bigger sample) since I had a good amount of data already to run the test; so there was no need to take any risks by making extra assumptions.

The data field that I wanted to have, although it can be hard to get - especially for adult passengers -, is who are they traveling with other than the immediate family. Cousins, friends..etc could have proved useful for such a question. The reason I have picked this question from the first place is that, I imagined what might have been the situation on deck during such a hard time, and I though that a group of people who care for each other can be useful in such an extreme situation: they can push, fight, beg or even something illegal like bribe to save all the rest of the group, a priviliged support that a solo traveler would not have. Of course the immediate family might be the most aggressive\protective, but still cousins and close friends can provide a comparable support.

Q2: Is It Possible From The Provided Data To Identify Family Members?

The problem with this question is that this is only a partial data, so numbers will not add up: there could be other family members in the other set. But I will proceed anyway. So here are my assumptions:

  • Same family travels within the same class
  • Same family bears the same last name
  • Same family share the same number of TotalCompanions
  • Children with TotalCompanions == 0 will be excluded. Of course common sense tells that they are not traveling alone, but the thing is that, according to the provided data, they are not traveling with their immediate family.
In [69]:
group_family = group_passengers[ group_passengers['TotalCompanions'] > 0 ]
group_by_family_size = group_family.groupby(['TotalCompanions','Pclass', 'LastName'])#, sort=False)



#We are going to create three lists, one for all the families whose members have NOT survived
#Another for the families which had some members who survived but not all
#The third is for the families whose all members were able to make it
families_totally_perished = []
families_partially_saved = []
families_totally_saved = []

#Now, let's loop over the groupby of family by size created above, and check which family belongs 
#to which of the three lists created above

#Loop over the group
#tc = TotalCompanions, pcls = pclass, lname = LastName. There was no need to unpack the keys, but I am going to leave it
#that way.
for (tc, pcls, lname), data in group_by_family_size:
    n = 0

    #A variable to count how many members survived within the same family
    saved_count = 0

    #Loop over each member within the same family
    for current_family_member in range(len(data)):    
        #If the current member in the loop has survived, increment the saved count
        if data.iloc[current_family_member]['Survived'] == 1:
            saved_count += 1
    #After looping over all family members, now let us check the count to decide in which of the 3 lists
    #are we going to add the family to
    if saved_count == 0:
        families_totally_perished.append(data)
    #If the total count is not equal to the total companions + 1. We add one here because the family size
    # is always greater than the total companions by one. This is because the count always tells the number
    #of other family members within the family of the passenger, but does not count the passenger himself\herself
    elif (saved_count != (data.iloc[current_family_member]['TotalCompanions'] + 1) ):
        families_partially_saved.append(data)
    else:
        families_totally_saved.append(data)

        
#A function that will display each list as a table. The three lists to display are:
#families_totally_perished
#families_partially_saved
#families_totally_saved
def display_family_list(family_list):
    #Loop over each family within the passed list
    for family in family_list:
        #Create the container dataframe. THis data frame will hold temporarily all members that belong to the same family
        #Later on, this dataframe will be used to display the family in a table
        df = pd.DataFrame( columns=("Generation", 'Sex', 'Age', 'Name', 'Class','PassengerId'))

        #Print the name of the family, and then print on a new line how many members the family has
        print "The " + str(family.iloc[0]['LastName']) + " Family: "
        print "The Family had " + str(family.iloc[0]['TotalCompanions'] + 1) + " members on board"


        #Write how many family members from the current family were present in the data set
        #If the number of family members within the set is equal to the total family size (Companions + 1),
        #the print that all family members were present in the data set
        if( (len(family)) == (family.iloc[0]['TotalCompanions'] + 1) ):
            print "All of the family members were available in the dataset"
        #Else, write how many family members were present in the data set
        else:
            #Just some nice formatting, divide the singular and plural
            if(len(family) == 1):
                print ' --> Only one member was available in the dataset'
            else:
                print " --> Only "+ str(len(family)) + " members were available in the dataset"

        #Loop over each family member within the current family
        for i in range(len(family)):
            #Append the current family member to the dataframe, so that member would be displayed inside the same table.
            df = df.append( {"Generation":family.iloc[i]['Generation'],\
                             'Sex':family.iloc[i]['Sex'], \
                             'Age':family.iloc[i]['Age'], \
                             'Name': family.iloc[i]['Name'],\
                             'Class': family.iloc[i]['Pclass'],\
                             'PassengerId': family.iloc[i]['PassengerId'] } , ignore_index=True )
            #df = df.append([1,2,3,4],ignore_index=True)
        display_html(df)
        print '_________________________________________________________________'

Families Whose Members Within The Dataset That Never Made It

In [70]:
display_family_list(families_totally_perished)   
The Cavendish Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 36 Cavendish, Mr. Tyrell William 1 742
_________________________________________________________________
The Chaffee Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 46 Chaffee, Mr. Herbert Fuller 1 93
_________________________________________________________________
The Davidson Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 31 Davidson, Mr. Thornton 1 672
_________________________________________________________________
The Douglas Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 50 Douglas, Mr. Walter Donald 1 545
_________________________________________________________________
The Marvin Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 19 Marvin, Mr. Daniel Warner 1 749
_________________________________________________________________
The Natsch Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 37 Natsch, Mr. Charles H 1 274
_________________________________________________________________
The Ostby Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 65 Ostby, Mr. Engelhart Cornelius 1 55
_________________________________________________________________
The White Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 21 White, Mr. Richard Frasar 1 103
1 Elderly male 54 White, Mr. Percival Wayland 1 125
_________________________________________________________________
The Williams Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 51 Williams, Mr. Charles Duane 1 156
_________________________________________________________________
The Bryhl Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 25 Bryhl, Mr. Kurt Arnold Gottfrid 2 729
_________________________________________________________________
The Carter Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 54 Carter, Rev. Ernest Courtenay 2 250
1 Adult female 44 Carter, Mrs. Ernest Courtenay (Lilian Hughes) 2 855
_________________________________________________________________
The Chapman Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 37 Chapman, Mr. John Henry 2 595
_________________________________________________________________
The Gale Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 34 Gale, Mr. Shadrach 2 406
_________________________________________________________________
The Giles Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 21 Giles, Mr. Frederick Edward 2 862
_________________________________________________________________
The Hold Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 44 Hold, Mr. Stephen 2 237
_________________________________________________________________
The Jacobsohn Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 42 Jacobsohn, Mr. Sidney Samuel 2 218
_________________________________________________________________
The Renouf Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 34 Renouf, Mr. Peter Henry 2 477
_________________________________________________________________
The Turpin Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 27 Turpin, Mrs. William John Robert (Dorothy Ann ... 2 42
1 Adult male 29 Turpin, Mr. William John Robert 2 118
_________________________________________________________________
The del Carlo Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 29 del Carlo, Mr. Sebastiano 2 362
_________________________________________________________________
The Ahlin Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 40 Ahlin, Mrs. Johan (Johanna Persdotter Larsson) 3 41
_________________________________________________________________
The Arnold-Franchi Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 18 Arnold-Franchi, Mrs. Josef (Josefine Franchi) 3 50
1 Adult male 25 Arnold-Franchi, Mr. Josef 3 354
_________________________________________________________________
The Backstrom Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 32 Backstrom, Mr. Karl Alfred 3 207
_________________________________________________________________
The Barbara Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 45 Barbara, Mrs. (Catherine David) 3 363
1 Adult female 18 Barbara, Miss. Saiide 3 703
_________________________________________________________________
The Braund Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 22 Braund, Mr. Owen Harris 3 1
1 Adult male 29 Braund, Mr. Lewis Richard 3 478
_________________________________________________________________
The Caram Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Caram, Mrs. Joseph (Maria Elias) 3 579
_________________________________________________________________
The Chronopoulos Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 26 Chronopoulos, Mr. Apostolos 3 74
_________________________________________________________________
The Cribb Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 44 Cribb, Mr. John Hatfield 3 161
_________________________________________________________________
The Hagland Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Hagland, Mr. Ingvald Olai Olsen 3 452
1 Unspecified age male NaN Hagland, Mr. Konrad Mathias Reiersen 3 491
_________________________________________________________________
The Hansen Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 26 Hansen, Mr. Henrik Juul 3 705
_________________________________________________________________
The Ilmakangas Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 25 Ilmakangas, Miss. Pieta Sofia 3 730
_________________________________________________________________
The Jensen Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 17 Jensen, Mr. Svend Lauritz 3 722
_________________________________________________________________
The Jussila Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 20 Jussila, Miss. Katriina 3 114
1 Adult female 21 Jussila, Miss. Mari Aina 3 403
_________________________________________________________________
The Kiernan Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Kiernan, Mr. Philip 3 215
_________________________________________________________________
The Lennon Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Lennon, Mr. Denis 3 47
_________________________________________________________________
The Lindell Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 36 Lindell, Mr. Edvard Bengtsson 3 606
_________________________________________________________________
The Lobb Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 30 Lobb, Mr. William Arthur 3 254
1 Adult female 26 Lobb, Mrs. William Arthur (Cordelia K Stanlick) 3 618
_________________________________________________________________
The McNamee Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 24 McNamee, Mr. Neal 3 744
_________________________________________________________________
The Olsen Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 42 Olsen, Mr. Karl Siegwart Andreas 3 198
_________________________________________________________________
The Petterson Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 25 Petterson, Mr. Johan Emil 3 443
_________________________________________________________________
The Robins Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 47 Robins, Mrs. Alexander A (Grace Charity Laury) 3 133
_________________________________________________________________
The Strom Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 2 Strom, Miss. Telma Matilda 3 206
_________________________________________________________________
The Vander Planke Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 31 Vander Planke, Mrs. Julius (Emelia Maria Vande... 3 19
_________________________________________________________________
The Wiklund Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 18 Wiklund, Mr. Jakob Alfred 3 372
_________________________________________________________________
The Zabour Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 14.5 Zabour, Miss. Hileni 3 112
1 Unspecified age female NaN Zabour, Miss. Thamine 3 241
_________________________________________________________________
The Minahan Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 44 Minahan, Dr. William Edward 1 246
_________________________________________________________________
The Newell Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 58 Newell, Mr. Arthur Webster 1 660
_________________________________________________________________
The Widener Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 27 Widener, Mr. Harry Elkins 1 378
_________________________________________________________________
The Hickman Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 21 Hickman, Mr. Stanley George 2 121
1 Adult male 24 Hickman, Mr. Leonard Mark 2 656
2 Adult male 32 Hickman, Mr. Lewis 2 666
_________________________________________________________________
The Lahtinen Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 26 Lahtinen, Mrs. William (Anna Sylfven) 2 313
_________________________________________________________________
The Nicholls Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 19 Nicholls, Mr. Joseph Charles 2 146
_________________________________________________________________
The Boulos Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Boulos, Mrs. Joseph (Sultana) 3 141
1 Child female 9 Boulos, Miss. Nourelain 3 853
_________________________________________________________________
The Bourke Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 40 Bourke, Mr. John 3 189
1 Unspecified age female NaN Bourke, Miss. Mary 3 594
2 Adult female 32 Bourke, Mrs. John (Catherine) 3 658
_________________________________________________________________
The Danbom Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 28 Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria ... 3 424
1 Adult male 34 Danbom, Mr. Ernst Gilbert 3 617
_________________________________________________________________
The Davies Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 24 Davies, Mr. Alfred J 3 566
_________________________________________________________________
The Elias Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 15 Elias, Mr. Tannous 3 353
1 Adult male 17 Elias, Mr. Joseph Jr 3 533
_________________________________________________________________
The Gustafsson Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 37 Gustafsson, Mr. Anders Vilhelm 3 105
1 Adult male 28 Gustafsson, Mr. Johan Birger 3 393
_________________________________________________________________
The Hansen Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 41 Hansen, Mr. Claus Peter 3 861
_________________________________________________________________
The Kink Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 26 Kink, Mr. Vincenz 3 70
_________________________________________________________________
The Klasen Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 18 Klasen, Mr. Klas Albin 3 176
_________________________________________________________________
The Rosblom Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 41 Rosblom, Mrs. Viktor (Helena Wilhelmina) 3 255
1 Adult male 18 Rosblom, Mr. Viktor Richard 3 425
_________________________________________________________________
The Samaan Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Samaan, Mr. Youssef 3 49
_________________________________________________________________
The Strom Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 29 Strom, Mrs. Wilhelm (Elna Matilda Persson) 3 252
_________________________________________________________________
The Van Impe Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 10 Van Impe, Miss. Catharina 3 420
1 Adult male 36 Van Impe, Mr. Jean Baptiste 3 596
2 Adult female 30 Van Impe, Mrs. Jean Baptiste (Rosalie Paula Go... 3 800
_________________________________________________________________
The Vander Planke Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 18 Vander Planke, Miss. Augusta Maria 3 39
1 Child male 16 Vander Planke, Mr. Leo Edmondus 3 334
_________________________________________________________________
The van Billiard Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 40.5 van Billiard, Mr. Austin Blyler 3 154
_________________________________________________________________
The Hocking Family: 
The Family had 4 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 23 Hocking, Mr. Richard George 2 530
_________________________________________________________________
The Johnston Family: 
The Family had 4 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Johnston, Mr. Andrew G 3 784
1 Unspecified age female NaN Johnston, Miss. Catherine Helen "Carrie" 3 889
_________________________________________________________________
The Ford Family: 
The Family had 5 members on board
 --> Only 4 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 16 Ford, Mr. William Neal 3 87
1 Child female 9 Ford, Miss. Robina Maggie "Ruby" 3 148
2 Adult female 21 Ford, Miss. Doolina Margaret "Daisy" 3 437
3 Adult female 48 Ford, Mrs. Edward (Margaret Ann Watson) 3 737
_________________________________________________________________
The Lefebre Family: 
The Family had 5 members on board
 --> Only 4 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Lefebre, Master. Henry Forbes 3 177
1 Unspecified age female NaN Lefebre, Miss. Mathilde 3 230
2 Unspecified age female NaN Lefebre, Miss. Ida 3 410
3 Unspecified age female NaN Lefebre, Miss. Jeannie 3 486
_________________________________________________________________
The Palsson Family: 
The Family had 5 members on board
 --> Only 4 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 2 Palsson, Master. Gosta Leonard 3 8
1 Child female 8 Palsson, Miss. Torborg Danira 3 25
2 Child female 3 Palsson, Miss. Stina Viola 3 375
3 Adult female 29 Palsson, Mrs. Nils (Alma Cornelia Berglund) 3 568
_________________________________________________________________
The Panula Family: 
The Family had 6 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 7 Panula, Master. Juha Niilo 3 51
1 Child male 1 Panula, Master. Eino Viljami 3 165
2 Child male 16 Panula, Mr. Ernesti Arvid 3 267
3 Adult female 41 Panula, Mrs. Juha (Maria Emilia Ojala) 3 639
4 Child male 14 Panula, Mr. Jaako Arnold 3 687
5 Child male 2 Panula, Master. Urho Abraham 3 825
_________________________________________________________________
The Rice Family: 
The Family had 6 members on board
 --> Only 5 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 2 Rice, Master. Eugene 3 17
1 Child male 4 Rice, Master. Arthur 3 172
2 Child male 7 Rice, Master. Eric 3 279
3 Child male 8 Rice, Master. George Hugh 3 788
4 Adult female 39 Rice, Mrs. William (Margaret Norton) 3 886
_________________________________________________________________
The Skoog Family: 
The Family had 6 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 4 Skoog, Master. Harald 3 64
1 Adult female 45 Skoog, Mrs. William (Anna Bernhardina Karlsson) 3 168
2 Adult male 40 Skoog, Mr. Wilhelm 3 361
3 Child female 9 Skoog, Miss. Mabel 3 635
4 Child female 2 Skoog, Miss. Margit Elizabeth 3 643
5 Child male 10 Skoog, Master. Karl Thorsten 3 820
_________________________________________________________________
The Goodwin Family: 
The Family had 8 members on board
 --> Only 6 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 11 Goodwin, Master. William Frederick 3 60
1 Child female 16 Goodwin, Miss. Lillian Amy 3 72
2 Child male 1 Goodwin, Master. Sidney Leonard 3 387
3 Child male 9 Goodwin, Master. Harold Victor 3 481
4 Adult female 43 Goodwin, Mrs. Frederick (Augusta Tyler) 3 679
5 Child male 14 Goodwin, Mr. Charles Edward 3 684
_________________________________________________________________
The Sage Family: 
The Family had 11 members on board
 --> Only 7 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Sage, Master. Thomas Henry 3 160
1 Unspecified age female NaN Sage, Miss. Constance Gladys 3 181
2 Unspecified age male NaN Sage, Mr. Frederick 3 202
3 Unspecified age male NaN Sage, Mr. George John Jr 3 325
4 Unspecified age female NaN Sage, Miss. Stella Anna 3 793
5 Unspecified age male NaN Sage, Mr. Douglas Bullen 3 847
6 Unspecified age female NaN Sage, Miss. Dorothy Edith "Dolly" 3 864
_________________________________________________________________

Families Whose Members Within The Dataset Partially Survived

In [71]:
display_family_list(families_partially_saved)   
The Andrews Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 63 Andrews, Miss. Kornelia Theodosia 1 276
_________________________________________________________________
The Astor Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 18 Astor, Mrs. John Jacob (Madeleine Talmadge Force) 1 701
_________________________________________________________________
The Baxter Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 24 Baxter, Mr. Quigg Edmond 1 119
1 Elderly female 50 Baxter, Mrs. James (Helene DeLaudeniere Chaput) 1 300
_________________________________________________________________
The Bowerman Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 22 Bowerman, Miss. Elsie Edith 1 357
_________________________________________________________________
The Cardeza Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 36 Cardeza, Mr. Thomas Drake Martinez 1 680
_________________________________________________________________
The Chibnall Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Chibnall, Mrs. (Edith Martha Bowerman) 1 167
_________________________________________________________________
The Cumings Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 38 Cumings, Mrs. John Bradley (Florence Briggs Th... 1 2
_________________________________________________________________
The Eustis Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 54 Eustis, Miss. Elizabeth Mussey 1 497
_________________________________________________________________
The Frauenthal Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Frauenthal, Mrs. Henry William (Clara Heinshei... 1 335
_________________________________________________________________
The Futrelle Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 35 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 4
1 Adult male 37 Futrelle, Mr. Jacques Heath 1 138
_________________________________________________________________
The Graham Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 58 Graham, Mrs. William Thompson (Edith Junkins) 1 269
1 Adult male 38 Graham, Mr. George Edward 1 333
_________________________________________________________________
The Greenfield Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 23 Greenfield, Mr. William Bertram 1 98
_________________________________________________________________
The Harder Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 25 Harder, Mr. George Achilles 1 371
_________________________________________________________________
The Harris Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 45 Harris, Mr. Henry Birkhardt 1 63
1 Adult female 35 Harris, Mrs. Henry Birkhardt (Irene Wallach) 1 231
_________________________________________________________________
The Hogeboom Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 51 Hogeboom, Mrs. John C (Anna Andrews) 1 766
_________________________________________________________________
The Holverson Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 42 Holverson, Mr. Alexander Oskar 1 36
1 Adult female 35 Holverson, Mrs. Alexander Oskar (Mary Aline To... 1 384
_________________________________________________________________
The Kenyon Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Kenyon, Mrs. Frederick R (Marion) 1 458
_________________________________________________________________
The Kimball Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 42 Kimball, Mr. Edwin Nelson Jr 1 622
_________________________________________________________________
The Lines Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 16 Lines, Miss. Mary Conover 1 854
_________________________________________________________________
The Madill Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 15 Madill, Miss. Georgette Alexandra 1 690
_________________________________________________________________
The Meyer Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 28 Meyer, Mr. Edgar Joseph 1 35
1 Unspecified age female NaN Meyer, Mrs. Edgar Joseph (Leila Saks) 1 376
_________________________________________________________________
The Minahan Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 33 Minahan, Miss. Daisy E 1 413
_________________________________________________________________
The Pears Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 22 Pears, Mrs. Thomas (Edith Wearne) 1 152
1 Adult male 29 Pears, Mr. Thomas Clinton 1 337
_________________________________________________________________
The Penasco y Castellana Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 17 Penasco y Castellana, Mrs. Victor de Satode (M... 1 308
1 Adult male 18 Penasco y Castellana, Mr. Victor de Satode 1 506
_________________________________________________________________
The Potter Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 56 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1 880
_________________________________________________________________
The Robert Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 43 Robert, Mrs. Edward Scott (Elisabeth Walton Mc... 1 780
_________________________________________________________________
The Rothschild Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 54 Rothschild, Mrs. Martin (Elizabeth L. Barrett) 1 514
_________________________________________________________________
The Silvey Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 50 Silvey, Mr. William Baird 1 435
1 Adult female 39 Silvey, Mrs. William Baird (Alice Munger) 1 578
_________________________________________________________________
The Spencer Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Spencer, Mrs. William Augustus (Marie Eugenie) 1 32
_________________________________________________________________
The Stephenson Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 52 Stephenson, Mrs. Walter Bertram (Martha Eustis) 1 592
_________________________________________________________________
The Warren Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 60 Warren, Mrs. Frank Manley (Anna Sophia Atkinson) 1 367
_________________________________________________________________
The Abelson Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 30 Abelson, Mr. Samuel 2 309
1 Adult female 28 Abelson, Mrs. Samuel (Hannah Wizosky) 2 875
_________________________________________________________________
The Angle Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 36 Angle, Mrs. William A (Florence "Mary" Agnes H... 2 519
_________________________________________________________________
The Clarke Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 28 Clarke, Mrs. Charles V (Ada Maria Winfield) 2 427
_________________________________________________________________
The Duran y More Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 27 Duran y More, Miss. Asuncion 2 867
_________________________________________________________________
The Faunthorpe Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 29 Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkin... 2 54
_________________________________________________________________
The Harper Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 6 Harper, Miss. Annie Jessie "Nina" 2 721
1 Adult male 28 Harper, Rev. John 2 849
_________________________________________________________________
The Kantor Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 34 Kantor, Mr. Sinai 2 100
1 Adult female 24 Kantor, Mrs. Sinai (Miriam Sternin) 2 317
_________________________________________________________________
The Louch Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 42 Louch, Mrs. Charles Alexander (Alice Adelaide ... 2 433
_________________________________________________________________
The Nasser Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 14.0 Nasser, Mrs. Nicholas (Adele Achem) 2 10
1 Adult male 32.5 Nasser, Mr. Nicholas 2 123
_________________________________________________________________
The Parrish Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 50 Parrish, Mrs. (Lutie Davis) 2 260
_________________________________________________________________
The Shelley Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 25 Shelley, Mrs. William (Imanita Parrish Hall) 2 881
_________________________________________________________________
The Weisz Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 29 Weisz, Mrs. Leopold (Mathilde Francoise Pede) 2 134
_________________________________________________________________
The Aks Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 18 Aks, Mrs. Sam (Leah Rosen) 3 856
_________________________________________________________________
The Andersen-Jensen Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 19 Andersen-Jensen, Miss. Carla Christine Nielsine 3 193
_________________________________________________________________
The Davison Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Davison, Mrs. Thomas Henry (Mary E Finck) 3 348
_________________________________________________________________
The Hakkarainen Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 24 Hakkarainen, Mrs. Pekka Pietari (Elin Matilda ... 3 143
1 Adult male 28 Hakkarainen, Mr. Pekka Pietari 3 404
_________________________________________________________________
The Hirvonen Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 2 Hirvonen, Miss. Hildur E 3 480
_________________________________________________________________
The Karun Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 4 Karun, Miss. Manca 3 692
_________________________________________________________________
The Lindqvist Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 20 Lindqvist, Mr. Eino William 3 665
_________________________________________________________________
The Moran Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Moran, Miss. Bertha 3 110
1 Unspecified age male NaN Moran, Mr. Daniel J 3 769
_________________________________________________________________
The O'Brien Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey) 3 187
1 Unspecified age male NaN O'Brien, Mr. Thomas 3 365
_________________________________________________________________
The Persson Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 25 Persson, Mr. Ernst Ulrik 3 268
_________________________________________________________________
The Thomas Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 0.42 Thomas, Master. Assad Alexander 3 804
_________________________________________________________________
The Thorneycroft Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Thorneycroft, Mrs. Percival (Florence Kate White) 3 432
1 Unspecified age male NaN Thorneycroft, Mr. Percival 3 640
_________________________________________________________________
The Yasbeck Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 27 Yasbeck, Mr. Antoni 3 621
1 Child female 15 Yasbeck, Mrs. Antoni (Selini Alexander) 3 831
_________________________________________________________________
The de Messemaeker Family: 
The Family had 2 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 36 de Messemaeker, Mrs. Guillaume Joseph (Emma) 3 560
_________________________________________________________________
The Appleton Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 53 Appleton, Mrs. Edward Dale (Charlotte Lamson) 1 572
_________________________________________________________________
The Beckwith Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 37 Beckwith, Mr. Richard Leonard 1 249
1 Adult female 47 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 1 872
_________________________________________________________________
The Compton Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 39 Compton, Miss. Sara Rebecca 1 836
_________________________________________________________________
The Crosby Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 36 Crosby, Miss. Harriet R 1 541
1 Elderly male 70 Crosby, Capt. Edward Gifford 1 746
_________________________________________________________________
The Dodge Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 4 Dodge, Master. Washington 1 446
_________________________________________________________________
The Frauenthal Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 50 Frauenthal, Dr. Henry William 1 661
_________________________________________________________________
The Frolicher Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 22 Frolicher, Miss. Hedwig Margaritha 1 540
_________________________________________________________________
The Frolicher-Stehli Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 60 Frolicher-Stehli, Mr. Maxmillian 1 588
_________________________________________________________________
The Hays Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 52 Hays, Mrs. Charles Melville (Clara Jennings Gr... 1 821
_________________________________________________________________
The Newsom Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 19 Newsom, Miss. Helen Monypeny 1 137
_________________________________________________________________
The Spedden Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 40 Spedden, Mrs. Frederic Oakley (Margaretta Corn... 1 320
_________________________________________________________________
The Taussig Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly male 52 Taussig, Mr. Emil 1 263
1 Adult female 39 Taussig, Mrs. Emil (Tillie Mandelbaum) 1 559
2 Adult female 18 Taussig, Miss. Ruth 1 586
_________________________________________________________________
The Thayer Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 17 Thayer, Mr. John Borland Jr 1 551
1 Adult female 39 Thayer, Mrs. John Borland (Marian Longstreth M... 1 582
2 Adult male 49 Thayer, Mr. John Borland 1 699
_________________________________________________________________
The Wick Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 31 Wick, Miss. Mary Natalie 1 319
1 Adult female 45 Wick, Mrs. George Dennick (Mary Hitchcock) 1 857
_________________________________________________________________
The Brown Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 40 Brown, Mrs. Thomas William Solomon (Elizabeth ... 2 671
1 Elderly male 60 Brown, Mr. Thomas William Solomon 2 685
_________________________________________________________________
The Caldwell Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 0.83 Caldwell, Master. Alden Gates 2 79
1 Adult female 22.00 Caldwell, Mrs. Albert Francis (Sylvia Mae Harb... 2 324
_________________________________________________________________
The Christy Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 25 Christy, Miss. Julie Rachel 2 581
_________________________________________________________________
The Collyer Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 8 Collyer, Miss. Marjorie "Lottie" 2 238
1 Adult male 31 Collyer, Mr. Harvey 2 638
2 Adult female 31 Collyer, Mrs. Harvey (Charlotte Annie Tate) 2 802
_________________________________________________________________
The Davies Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 8 Davies, Master. John Morgan Jr 2 550
_________________________________________________________________
The Drew Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 34 Drew, Mrs. James Vivian (Lulu Thorne Christian) 2 417
_________________________________________________________________
The Hamalainen Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 24.00 Hamalainen, Mrs. William (Anna) 2 248
1 Child male 0.67 Hamalainen, Master. Viljo 2 756
_________________________________________________________________
The Hart Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 43 Hart, Mr. Benjamin 2 315
1 Adult female 45 Hart, Mrs. Benjamin (Esther Ada Bloomfield) 2 441
2 Child female 7 Hart, Miss. Eva Miriam 2 536
_________________________________________________________________
The Mallet Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 31 Mallet, Mr. Albert 2 818
1 Child male 1 Mallet, Master. Andre 2 828
_________________________________________________________________
The Navratil Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 36.5 Navratil, Mr. Michel ("Louis M Hoffman") 2 149
1 Child male 3.0 Navratil, Master. Michel M 2 194
2 Child male 2.0 Navratil, Master. Edmond Roger 2 341
_________________________________________________________________
The Quick Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 33 Quick, Mrs. Frederick Charles (Jane Richards) 2 507
1 Child female 2 Quick, Miss. Phyllis May 2 531
_________________________________________________________________
The Richards Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 3.00 Richards, Master. William Rowe 2 408
1 Child male 0.83 Richards, Master. George Sibley 2 832
_________________________________________________________________
The Silven Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 18 Silven, Miss. Lyyli Karoliina 2 418
_________________________________________________________________
The Wells Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 4 Wells, Miss. Joan 2 751
_________________________________________________________________
The Abbott Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 35 Abbott, Mrs. Stanton (Rosa Hunt) 3 280
1 Child male 16 Abbott, Mr. Rossmore Edward 3 747
_________________________________________________________________
The Coutts Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 3 Coutts, Master. William Loch "William" 3 349
1 Child male 9 Coutts, Master. Eden Leslie "Neville" 3 490
_________________________________________________________________
The Goldsmith Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 9 Goldsmith, Master. Frank John William "Frankie" 3 166
1 Adult female 31 Goldsmith, Mrs. Frank John (Emily Alice Brown) 3 329
2 Adult male 33 Goldsmith, Mr. Frank John 3 549
_________________________________________________________________
The Kink-Heilmann Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 4 Kink-Heilmann, Miss. Luise Gretchen 3 185
_________________________________________________________________
The McCoy Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN McCoy, Mr. Bernard 3 302
1 Unspecified age female NaN McCoy, Miss. Agnes 3 331
_________________________________________________________________
The Moubarek Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age male NaN Moubarek, Master. Gerios 3 66
1 Unspecified age male NaN Moubarek, Master. Halim Gonios ("William George") 3 710
_________________________________________________________________
The Nakid Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 1 Nakid, Miss. Maria ("Mary") 3 382
1 Adult male 20 Nakid, Mr. Sahid 3 623
_________________________________________________________________
The Peter Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Peter, Miss. Anna 3 129
1 Unspecified age female NaN Peter, Mrs. Catherine (Catherine Rizk) 3 534
_________________________________________________________________
The Sandstrom Family: 
The Family had 3 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 4 Sandstrom, Miss. Marguerite Rut 3 11
1 Adult female 24 Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengt... 3 395
_________________________________________________________________
The Touma Family: 
The Family had 3 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 29 Touma, Mrs. Darwis (Hanne Youssef Razi) 3 256
_________________________________________________________________
The Allison Family: 
The Family had 4 members on board
 --> Only 3 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 2.00 Allison, Miss. Helen Loraine 1 298
1 Child male 0.92 Allison, Master. Hudson Trevor 1 306
2 Adult female 25.00 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) 1 499
_________________________________________________________________
The Becker Family: 
The Family had 4 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 1 Becker, Master. Richard F 2 184
1 Child female 4 Becker, Miss. Marion Louise 2 619
_________________________________________________________________
The Herman Family: 
The Family had 4 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 24 Herman, Miss. Alice 2 616
1 Adult female 48 Herman, Mrs. Samuel (Jane Laver) 2 755
_________________________________________________________________
The Jacobsohn Family: 
The Family had 4 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 24 Jacobsohn, Mrs. Sidney Samuel (Amy Frances Chr... 2 601
_________________________________________________________________
The Laroche Family: 
The Family had 4 members on board
 --> Only 3 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 3 Laroche, Miss. Simonne Marie Anne Andree 2 44
1 Adult female 22 Laroche, Mrs. Joseph (Juliette Marie Louise La... 2 609
2 Adult male 25 Laroche, Mr. Joseph Philippe Lemercier 2 686
_________________________________________________________________
The Renouf Family: 
The Family had 4 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 30 Renouf, Mrs. Peter Henry (Lillian Jefferys) 2 727
_________________________________________________________________
The West Family: 
The Family had 4 members on board
 --> Only 3 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 5 West, Miss. Constance Mirium 2 59
1 Adult male 36 West, Mr. Edwy Arthur 2 451
2 Adult female 33 West, Mrs. Edwy Arthur (Ada Mary Worth) 2 473
_________________________________________________________________
The Backstrom Family: 
The Family had 4 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 33 Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu... 3 86
_________________________________________________________________
The Dean Family: 
The Family had 4 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 26 Dean, Mr. Bertram Frank 3 94
1 Child male 1 Dean, Master. Bertram Vere 3 789
_________________________________________________________________
The Ryerson Family: 
The Family had 5 members on board
 --> Only 2 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 18 Ryerson, Miss. Emily Borie 1 312
1 Adult female 21 Ryerson, Miss. Susan Parker "Suzette" 1 743
_________________________________________________________________
The Hocking Family: 
The Family had 5 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Elderly female 54 Hocking, Mrs. Elizabeth (Eliza Needs) 2 775
_________________________________________________________________
The Fortune Family: 
The Family had 6 members on board
 --> Only 4 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 19 Fortune, Mr. Charles Alexander 1 28
1 Adult female 23 Fortune, Miss. Mabel Helen 1 89
2 Adult female 24 Fortune, Miss. Alice Elizabeth 1 342
3 Elderly male 64 Fortune, Mr. Mark 1 439
_________________________________________________________________
The Richards Family: 
The Family had 6 members on board
 --> Only one member was available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 24 Richards, Mrs. Sidney (Emily Hocking) 2 438
_________________________________________________________________
The Andersson Family: 
The Family had 7 members on board
 --> Only 8 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 39 Andersson, Mr. Anders Johan 3 14
1 Adult female 17 Andersson, Miss. Erna Alexandra 3 69
2 Child female 2 Andersson, Miss. Ellis Anna Maria 3 120
3 Child female 9 Andersson, Miss. Ingeborg Constanzia 3 542
4 Child female 11 Andersson, Miss. Sigrid Elisabeth 3 543
5 Adult female 39 Andersson, Mrs. Anders Johan (Alfrida Konstant... 3 611
6 Child female 6 Andersson, Miss. Ebba Iris Alfrida 3 814
7 Child male 4 Andersson, Master. Sigvard Harald Elias 3 851
_________________________________________________________________
The Asplund Family: 
The Family had 7 members on board
 --> Only 4 members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 38 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... 3 26
1 Child male 9 Asplund, Master. Clarence Gustaf Hugo 3 183
2 Child female 5 Asplund, Miss. Lillian Gertrud 3 234
3 Child male 3 Asplund, Master. Edvin Rojj Felix 3 262
_________________________________________________________________

Families Whose All Members Within The Dataset Survived

In [72]:
display_family_list(families_totally_saved)
The Bishop Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 19 Bishop, Mrs. Dickinson H (Helen Walton) 1 292
1 Adult male 25 Bishop, Mr. Dickinson H 1 485
_________________________________________________________________
The Chambers Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 27 Chambers, Mr. Norman Campbell 1 725
1 Adult female 33 Chambers, Mrs. Norman Campbell (Bertha Griggs) 1 810
_________________________________________________________________
The Dick Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 31 Dick, Mr. Albert Adrian 1 691
1 Adult female 17 Dick, Mrs. Albert Adrian (Vera Gillespie) 1 782
_________________________________________________________________
The Duff Gordon Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 48 Duff Gordon, Lady. (Lucille Christiana Sutherl... 1 557
1 Adult male 49 Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan") 1 600
_________________________________________________________________
The Goldenberg Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 49 Goldenberg, Mr. Samuel L 1 454
1 Unspecified age female NaN Goldenberg, Mrs. Samuel L (Edwiga Grabowska) 1 850
_________________________________________________________________
The Harper Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 49 Harper, Mrs. Henry Sleeper (Myna Haxtun) 1 53
1 Adult male 48 Harper, Mr. Henry Sleeper 1 646
_________________________________________________________________
The Hippach Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 16 Hippach, Miss. Jean Gertrude 1 330
1 Adult female 44 Hippach, Mrs. Louis Albert (Ida Sophia Fischer) 1 524
_________________________________________________________________
The Hoyt Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 38 Hoyt, Mr. Frederick Maxfield 1 225
1 Adult female 35 Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby) 1 487
_________________________________________________________________
The Newell Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 31 Newell, Miss. Madeleine 1 216
1 Adult female 23 Newell, Miss. Marjorie 1 394
_________________________________________________________________
The Taylor Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright) 1 670
1 Adult male 48 Taylor, Mr. Elmer Zebley 1 713
_________________________________________________________________
The Beane Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 32 Beane, Mr. Edward 2 544
1 Adult female 19 Beane, Mrs. Edward (Ethel Clarke) 2 547
_________________________________________________________________
The Doling Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 34 Doling, Mrs. John T (Ada Julia Bone) 2 99
1 Adult female 18 Doling, Miss. Elsie 2 652
_________________________________________________________________
The Mellinger Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 41 Mellinger, Mrs. (Elizabeth Anne Maidment) 2 273
1 Child female 13 Mellinger, Miss. Madeleine Violet 2 447
_________________________________________________________________
The Moor Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child male 6 Moor, Master. Meier 3 752
1 Adult female 27 Moor, Mrs. (Beila) 3 824
_________________________________________________________________
The Murphy Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Unspecified age female NaN Murphy, Miss. Katherine "Kate" 3 242
1 Unspecified age female NaN Murphy, Miss. Margaret Jane 3 613
_________________________________________________________________
The Nicola-Yarred Family: 
The Family had 2 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 14 Nicola-Yarred, Miss. Jamila 3 40
1 Child male 12 Nicola-Yarred, Master. Elias 3 126
_________________________________________________________________
The Johnson Family: 
The Family had 3 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult female 27 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) 3 9
1 Child female 1 Johnson, Miss. Eleanor Ileen 3 173
2 Child male 4 Johnson, Master. Harold Theodor 3 870
_________________________________________________________________
The Carter Family: 
The Family had 4 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Adult male 36 Carter, Mr. William Ernest 1 391
1 Child female 14 Carter, Miss. Lucile Polk 1 436
2 Adult female 36 Carter, Mrs. William Ernest (Lucile Polk) 1 764
3 Child male 11 Carter, Master. William Thornton II 1 803
_________________________________________________________________
The Baclini Family: 
The Family had 4 members on board
All of the family members were available in the dataset
Generation Sex Age Name Class PassengerId
0 Child female 5.00 Baclini, Miss. Marie Catherine 3 449
1 Child female 0.75 Baclini, Miss. Helene Barbara 3 470
2 Child female 0.75 Baclini, Miss. Eugenie 3 645
3 Adult female 24.00 Baclini, Mrs. Solomon (Latifa Qurban) 3 859
_________________________________________________________________

The dataset