Star Temperature Prediction¶
Description¶
This project (educational purpose) aims to develop a neural network model for predicting the surface temperature of stars. Traditional methods such as Wien’s displacement law, Stefan-Boltzmann law, and spectral analysis are typically used for temperature calculation. However, this project explores the potential of machine learning to improve accuracy and efficiency.
Project Steps¶
Data Preparation
The dataset includes information on 240 stars with features like relative luminosity, radius, color, absolute magnitude, temperature, and star type. Initial data analysis and visualization will provide insights into the data structure and help make necessary adjustments during preparation.
Data Processing
Required transformations, scaling, and categorization of quantitative and categorical data will be applied, and the data will then be split into training and test sets.
Basic Neural Network Model
A simple neural network model will be created, with experiments on hidden layer configurations and activation functions. Model evaluation will be based on prediction accuracy, and a 'Actual vs Predicted' temperature chart will allow visual comparison of the results.
Model Fine-Tuning
By adjusting hyperparameters such as dropout and batch size, an attempt will be made to improve model performance while maintaining the same network architecture as the basic model for valid comparison. Model performance will be assessed by the RMSE metric, with a target RMSE of no higher than 4500.
Results and Analysis
The basic and enhanced models will be compared based on RMSE values and the 'Actual vs Predicted' temperature visualization to assess improvements. The final summary will present comparative results on model accuracy and efficiency, allowing evaluation of neural network’s applicability for predicting star temperatures.
Objective¶
In the end, the project demonstrates the application of machine learning in astrophysics, using neural networks to enhance traditional methods for predicting star temperatures.
Project Configuration¶
Dependencies¶
# !pip install python-dotenv==1.0.1 > /dev/null 2>&1
# !pip install phik==0.12.4 > /dev/null 2>&1
# Python ≈ v3.10
# ==============
import copy
import os
from copy import deepcopy
from dataclasses import dataclass, field
from enum import Enum
from functools import wraps
from platform import python_version
from typing import Callable
# Libriaries
# ==========
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import torch
import torch.nn as nn
import torch.optim as optim
from dotenv import load_dotenv
from matplotlib.axes import Axes
from phik.report import plot_correlation_matrix
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import mean_squared_error
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
python_version()
'3.11.1'
Constants¶
RANDOM_STATE = 42
# Aliases
# =======
BR = '\n'
RS = RANDOM_STATE
Initialization of Environment Parameters¶
load_dotenv()
torch.manual_seed(RANDOM_STATE)
class Env(str, Enum):
PATH = os.environ.get('ROOT', 'datasets')
CSV = '6_class.csv'
@staticmethod
def csv() -> str:
return os.path.join(Env.PATH, Env.CSV)
Project Constants¶
class Star(Enum):
Y_KELVIN = 'Temperature (K)' # Target Feature
NUM_LUMINOSITY = 'Luminosity(L/Lo)'
NUM_RADIUS = 'Radius(R/Ro)'
NUM_MAGNITUDE = 'Absolute magnitude(Mv)'
CAT_STAR_TYPE = 'Star type'
CAT_SPECTRAL_CLASS = 'Star color'
@property
def key_(self) -> str:
return self.name.lower()
@property
def key(self) -> str:
return self.key_.removeprefix('y_').removeprefix('num_').removeprefix('cat_')
@staticmethod
def columns(origin: bool = False) -> list[str]:
return [c.value if origin else c.key for c in Star]
@staticmethod
def features(num: bool) -> list[str]:
return [c.key for c in Star if c.key_.startswith('num_' if num else 'cat_')]
@staticmethod
def column_mapping(reverse: bool = False) -> dict[str, str]:
origin, alias = Star.columns(origin=True), Star.columns()
return dict(zip(*((alias, origin) if reverse else (origin, alias))))
Utils (EDA)¶
@dataclass
class DiagramMixin:
df: pd.DataFrame
bins: int = 100
figsize: tuple[int, int] = (12, 3)
y_label: str = 'Amount'
title: tuple[str, str] = ('Histogram', 'Boxplot')
theme: list[str] = field(
default_factory=lambda: ['#eb4034', '#2b2f75', '#2b7275', '#752b58']
)
def _create_subplots(self, rows: int = 1, cols: int = 2) -> tuple[Axes, ...]:
_, axes = plt.subplots(rows, cols, figsize=self.figsize)
return tuple(axes[ax_i] for ax_i in range(cols))
def configure(func: Callable) -> Callable:
@wraps(func)
def wrapper(
self, ax: Axes, title: str, x_label: str, y_label: str, **kw
) -> None:
ax.set_title(title)
ax.set_xlabel(x_label)
ax.set_ylabel(y_label)
func(self, ax=ax, **kw)
plt.tight_layout()
return wrapper
@configure
def _plot_histogram(self, ax: Axes, column: str, **_) -> None:
sns.histplot(self.df[column], bins=self.bins, ax=ax)
@configure
def _plot_boxplot(self, ax: Axes, column: str, **_) -> None:
sns.boxplot(x=self.df[column], ax=ax)
class StringProccessing:
@staticmethod
def slug(string: str) -> str:
return string.strip().lower().replace(' ', '-')
Loading Data¶
origin_data = pd.read_csv(Env.csv())
c = Star
data = origin_data.rename(columns=Star.column_mapping())[c.columns()]
data.head()
kelvin | luminosity | radius | magnitude | star_type | spectral_class | |
---|---|---|---|---|---|---|
0 | 3068 | 0.002400 | 0.1700 | 16.12 | 0 | Red |
1 | 3042 | 0.000500 | 0.1542 | 16.60 | 0 | Red |
2 | 2600 | 0.000300 | 0.1020 | 18.70 | 0 | Red |
3 | 2800 | 0.000200 | 0.1600 | 16.65 | 0 | Red |
4 | 1939 | 0.000138 | 0.1030 | 20.06 | 0 | Red |
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 240 entries, 0 to 239 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 kelvin 240 non-null int64 1 luminosity 240 non-null float64 2 radius 240 non-null float64 3 magnitude 240 non-null float64 4 star_type 240 non-null int64 5 spectral_class 240 non-null object dtypes: float64(3), int64(2), object(1) memory usage: 11.4+ KB
Data Preprocessing and Analysis¶
Auxiliary Classes and Functions for EDA¶
class Diagram(DiagramMixin):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def histogram_and_boxplot(self, x_label: str, column: str = '') -> None:
column = x_label.lower() if not column else column
ax1, ax2 = self._create_subplots()
plot_kwargs = {
'column': column,
'x_label': x_label,
'y_label': self.y_label,
}
self._plot_histogram(ax1, title=self.title[0], **plot_kwargs)
self._plot_boxplot(ax2, title=self.title[1], **plot_kwargs)
plt.show()
def category_count(
self, column: str, title: str = '', rotation: int = 0, sort_vc: bool = False
) -> None:
val_counts = self.df[column].sort_values().value_counts(sort=sort_vc)
plt.figure(figsize=self.figsize)
val_counts.plot(kind='bar', color=self.theme)
x_label_title = column.replace('_', ' ').title()
plot_title = title if title else f'Distribution of {x_label_title}'
plt.title(plot_title)
plt.ylabel(self.y_label)
plt.xlabel(x_label_title)
plt.xticks(rotation=rotation)
plt.tight_layout()
plt.show()
class HarvardClass(StringProccessing, Enum):
'''Harvard Classification: https://en.wikipedia.org/wiki/Stellar_classification'''
O = 'blue'
B = 'blue-white'
A = 'white'
F = 'yellowish-white'
G = 'yellow'
K = 'orange'
M = 'red'
@staticmethod
def get_class(color: str, str_processor: Callable = lambda _: None) -> str:
'''Returns the spectral class for a given standardized color.'''
color_processed = HarvardClass.slug(
color if str_processor(color) is None else str_processor(color)
)
for classification in HarvardClass:
if classification.value == color_processed:
return classification.name
raise KeyError(f'Color "{color}" not found')
def correlation_mtx(df_: pd.DataFrame, *columns: list[str], h: int = 9) -> None:
phik_data = pd.get_dummies(
df_[[*columns]],
drop_first=True
)
phik_matrix = phik_data[phik_data.columns].phik_matrix(
interval_cols=phik_data.columns
)
plot_correlation_matrix(
phik_matrix.values,
x_labels=phik_matrix.columns,
y_labels=phik_matrix.index,
title=r"Correlation $\phi_K$",
fontsize_factor=0.8,
figsize=(int(h/4+h), h),
top=50,
)
Analysis¶
diagram = Diagram(df=data)
Stellar temperature (target variable)¶
data[c.Y_KELVIN.key].describe()
count 240.000000 mean 10497.462500 std 9552.425037 min 1939.000000 25% 3344.250000 50% 5776.000000 75% 15055.500000 max 40000.000000 Name: kelvin, dtype: float64
diagram.histogram_and_boxplot(x_label=c.Y_KELVIN.key.title())
The temperature distribution of stars in the dataset is right-skewed: most stars are concentrated around lower temperatures (below 10,000
$K$), while some stars reach up to 40,000
$K$, creating a long tail towards high temperatures. The median temperature is 5,776
$K$, with significant variability (standard deviation = 9,552
$K$) and a few high-temperature outliers. This skewed distribution indicates that most stars are relatively cool, with only a few possessing exceptionally high temperatures.
Quantitative Features¶
data[c.features(num=True)].describe()
luminosity | radius | magnitude | |
---|---|---|---|
count | 240.000000 | 240.000000 | 240.000000 |
mean | 107188.361635 | 237.157781 | 4.382396 |
std | 179432.244940 | 517.155763 | 10.532512 |
min | 0.000080 | 0.008400 | -11.920000 |
25% | 0.000865 | 0.102750 | -6.232500 |
50% | 0.070500 | 0.762500 | 8.313000 |
75% | 198050.000000 | 42.750000 | 13.697500 |
max | 849420.000000 | 1948.500000 | 20.060000 |
for col in c.features(num=True):
diagram.histogram_and_boxplot(x_label=col)
Luminosity:
The luminosity distribution is heavily right-skewed, with most stars having low luminosity (close to zero), while some stars reach very high luminosity values, up to 849,420
. The median is 0.0705
, and the data contains significant outliers beyond the upper quartile, as shown in the boxplot. The standard deviation of 179,432
also indicates high variability in luminosity.
Radius:
Similar to luminosity, the radius distribution is also heavily right-skewed, with most stars having a small radius close to zero, while only a few stars have very large radii (up to 1,948
). The median radius is 0.7625
, and there are many high-radius outliers. The large standard deviation (517.16
) reflects the variability in star sizes.
Magnitude:
The magnitude distribution is more balanced compared to luminosity and radius, with values ranging from -11.92
to 20.06
. The median magnitude is 8.313
, and the standard deviation is 10.53
. The boxplot shows a relatively symmetrical distribution without significant outliers, indicating a more uniform distribution of star brightness.
Categorical Features¶
data[c.CAT_STAR_TYPE.key] = data[c.CAT_STAR_TYPE.key].astype(str) # int64 -> str
data[c.features(num=False)].info()
data[c.features(num=False)].describe()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 240 entries, 0 to 239 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 star_type 240 non-null object 1 spectral_class 240 non-null object dtypes: object(2) memory usage: 3.9+ KB
star_type | spectral_class | |
---|---|---|
count | 240 | 240 |
unique | 6 | 19 |
top | 0 | Red |
freq | 40 | 112 |
Star Type¶
diagram.category_count(column='star_type', title='Distribution of Star Types')
The distribution of stellar types is balanced, with each type represented by about 40 stars.
Spectral Classes¶
len_color = len(list(data[c.CAT_SPECTRAL_CLASS.key].unique()))
data[c.CAT_SPECTRAL_CLASS.key] = data[c.CAT_SPECTRAL_CLASS.key].apply(
StringProccessing.slug
)
print('Amount of unproceessed distinct values:', len_color)
display(list(data[c.CAT_SPECTRAL_CLASS.key].unique()))
data[c.CAT_SPECTRAL_CLASS.key].describe()
Amount of unproceessed distinct values: 19
['red', 'blue-white', 'white', 'yellowish-white', 'pale-yellow-orange', 'blue', 'whitish', 'yellow-white', 'orange', 'white-yellow', 'yellowish', 'orange-red']
count 240 unique 12 top red freq 112 Name: spectral_class, dtype: object
cm = {
'pale-yellow-orange': 'orange',
'white-yellow': 'yellowish-white',
'yellow-white': 'yellowish-white',
'whitish': 'white',
'yellowish': 'yellow',
'orange-red': 'red',
}
data[c.CAT_SPECTRAL_CLASS.key] = data[c.CAT_SPECTRAL_CLASS.key].apply(
HarvardClass.get_class,
str_processor=lambda x: cm.get(x)
)
list(data[c.CAT_SPECTRAL_CLASS.key].unique())
['M', 'B', 'A', 'F', 'K', 'O', 'G']
diagram.category_count(
column='spectral_class', title='Harvard Spectral Classification', sort_vc=True
)
query_afgk = data[c.CAT_SPECTRAL_CLASS.key].isin(['A', 'F', 'G', 'K'])
data.loc[query_afgk, c.CAT_SPECTRAL_CLASS.key] = 'AFGK'
diagram.category_count(
column='spectral_class', title='Harvard Spectral Classification', sort_vc=True
)
Harvard System
Initially, the dataset contained 19 different spectral color labels, which were standardized to 12 unique categories after processing with the slug
function. Using a mapping dictionary, similar colors were further grouped (e.g., ‘pale-yellow-orange’ was transformed into ‘orange’) and then classified into broader spectral classes using the HarvardClass.get_class
function. This classification reduced the spectral classes to 7 main categories in accordance with the Harvard classification system.
Combinations of Star Type and Spectral Class¶
plt.figure(figsize=(10, 4))
sns.scatterplot(data=data, x=c.CAT_STAR_TYPE.key, y=c.CAT_SPECTRAL_CLASS.key)
plt.title('Star Color and Star Type Scatterplot')
plt.show()
This scatter plot, shows the relationship between star type on the x-axis and spectral class on the y-axis.
Here’s a breakdown of its elements:
- X-Axis (
star_type
): The star type values range from0
to5
, though the plot does not provide specific labels for each type. - Y-Axis (
spectral_class
): The spectral classes are labeled according to the Harvard spectral classification system, with classes from top to bottom asM
,B
,A
,F
,K
,O
, andG
.
Correlation Analysis¶
correlation_mtx(data, c.Y_KELVIN.key, *c.features(num=True), h=5)
Quantitative Features
The correlation heatmap shows the relationships between four variables: stellar magnitude, radius, luminosity, and temperature (kelvin
) of stars.
- Stellar Magnitude has a strong positive correlation with luminosity (
0.71
) and temperature (0.71
), as well as a moderate correlation with radius (0.51
). - Radius has a moderate positive correlation with luminosity (
0.57
) but only a weak correlation with temperature (0.24
), indicating that radius depends less on temperature compared to other variables. - Luminosity has a moderate correlation with temperature (
0.56
), suggesting that as luminosity increases, star temperature also tends to increase. - Temperature ($Kelvin$) has the strongest correlation with stellar magnitude (
0.71
), indicating a close relationship between temperature and magnitude.
Overall, stellar magnitude and luminosity are the variables most associated with temperature, while radius shows a weaker connection, especially with temperature.
correlation_mtx(data, *c.columns(), h=8)
Quantitative and Categorical Features
The correlation map shows the relationships between different spectral classes, star types, and quantitative characteristics, such as stellar magnitude
, radius
, luminosity
, and temperature (kelvin
).
Neural Network Design (Baseline Model)¶
Auxiliary Classes¶
@dataclass
class Df:
X_train: pd.DataFrame
X_test: pd.DataFrame
y_train: pd.DataFrame
y_test: pd.DataFrame
X_val: pd.DataFrame = None
y_val: pd.DataFrame = None
@dataclass
class Scaled:
X_train: np.ndarray
X_test: np.ndarray
X_val: pd.DataFrame = None
@dataclass
class Xy:
X: torch.Tensor = None
y: torch.Tensor = None
@dataclass
class Tensors:
train: Xy = field(default_factory=Xy)
test: Xy = field(default_factory=Xy)
val: Xy = field(default_factory=Xy)
Definition of classes for structuring the split of data into training and test sets (Df
), scaled data (Scaled
), and data in PyTorch tensor format (Xy
and Tensors
). These classes are used to store and manage training and test datasets.
@dataclass
class SyntheticData:
rs: int = 42
n_samples: int = 100
random_state: np.random.RandomState = field(init=False)
def __post_init__(self):
self.random_state = np.random.RandomState(self.rs)
def concat(self, df_1: pd.DataFrame, df_2: pd.DataFrame) -> pd.DataFrame:
df1_df2 = pd.concat([df_1, df_2], ignore_index=True)
return df1_df2.sample(frac=1, random_state=self.rs)
def interpolate(
self, X: pd.DataFrame, y: pd.Series
) -> tuple[pd.DataFrame, pd.Series]:
X_new, y_new = [], []
for _ in range(self.n_samples):
idx1, idx2 = self.random_state.choice(len(X), 2, replace=False)
alpha = self.random_state.rand()
X_synthetic = X.iloc[idx1] * alpha + X.iloc[idx2] * (1 - alpha)
y_synthetic = y.iloc[idx1] * alpha + y.iloc[idx2] * (1 - alpha)
X_new.append(X_synthetic)
y_new.append(y_synthetic)
return pd.DataFrame(X_new, columns=X.columns.astype(str)), pd.Series(y_new)
def g_mix(
self, X: pd.DataFrame, y: pd.Series
) -> tuple[pd.DataFrame, pd.Series]:
Xy_ = pd.concat([X, y], axis=1)
Xy_.columns = Xy_.columns.astype(str)
gmm = GaussianMixture(n_components=5, random_state=self.rs).fit(Xy_)
Xy_synthetic = gmm.sample(n_samples=self.n_samples)[0]
X_synth = pd.DataFrame(Xy_synthetic[:, :-1], columns=X.columns.astype(str))
return X_synth, pd.Series(Xy_synthetic[:, -1], name=y.name)
Definition of the SyntheticData
class for increasing the size of the training dataset. It includes methods for interpolation between data points and generating synthetic data using a Gaussian Mixture Model (GMM).
Splitting training and test data¶
X = data.drop(columns=[c.Y_KELVIN.key])
y = data[c.Y_KELVIN.key]
# Train-Test Split
# ================
df = Df(*train_test_split(X, y, test_size=0.3, random_state=RS))
# Encoding on Training Data Only
# ===============================
ppr = ColumnTransformer(
transformers=[
(
'cat', OneHotEncoder(handle_unknown='ignore'),
[c.CAT_SPECTRAL_CLASS.key, c.CAT_STAR_TYPE.key]
)
],
remainder='passthrough',
)
df.X_train = pd.DataFrame(ppr.fit_transform(df.X_train), columns=ppr.get_feature_names_out())
df.X_test = pd.DataFrame(ppr.transform(df.X_test), columns=ppr.get_feature_names_out())
# Validation Split on Test Data
# =============================
df.X_val, df.X_test, df.y_val, df.y_test = train_test_split(
df.X_test, df.y_test, test_size=0.35, random_state=RS # Validation 35%
)
display((df.X_val.shape, df.y_val.shape))
df.X_train.shape, df.X_test.shape, df.y_train.shape, df.y_test.shape
((46, 13), (46,))
((168, 13), (26, 13), (168,), (26,))
Feature engineering using categorical variable encoding and splitting the data into features (X
) and the target variable (y
). It divides the data into training and test sets and displays their sizes for confirmation.
Application of Data Augmentation Techniques¶
synth = SyntheticData(rs=RS, n_samples=150)
augmentation_1 = True
augmentation_2 = False # GMM is deactivated
if augmentation_1:
X_synthetic, y_synthetic = synth.interpolate(df.X_train, df.y_train)
df.X_train = synth.concat(df.X_train, X_synthetic)
df.y_train = synth.concat(df.y_train, y_synthetic)
display((df.X_train.shape, df.X_test.shape, df.y_train.shape, df.y_test.shape))
if augmentation_2:
X_synthetic, y_synthetic = synth.g_mix(df.X_train, df.y_train)
df.X_train = synth.concat(df.X_train, X_synthetic)
df.y_train = synth.concat(df.y_train, y_synthetic)
df.X_train.shape, df.X_test.shape, df.y_train.shape, df.y_test.shape
((318, 13), (26, 13), (318,), (26,))
((318, 13), (26, 13), (318,), (26,))
Scaling¶
scaler = StandardScaler()
scaled = Scaled(
X_train = scaler.fit_transform(df.X_train),
X_test = scaler.transform(df.X_test),
X_val = scaler.transform(df.X_val), # Validation
)
scaled.X_train[:2], scaled.X_test[:2], scaled.X_val[:2]
(array([[-0.50408012, -0.46535582, 1.31276004, -0.65594101, -0.43679212, -0.47190343, -0.49561238, -0.51314232, 2.17493583, -0.46169569, 0.97338033, -0.42421398, -1.04498518], [-0.50408012, -0.46535582, -0.64377783, 1.56184479, -0.43679212, -0.08924704, -0.49561238, -0.51314232, 1.82671638, -0.46169569, 1.66333423, -0.3185267 , -0.7272568 ]]), array([[-0.50408012, -0.46535582, -0.92933899, 1.88553568, -0.43679212, -0.47190343, -0.49561238, -0.51314232, 2.17493583, -0.46169569, -0.03955726, -0.45312564, -1.03537126], [-0.50408012, -0.46535582, 1.31276004, -0.65594101, 2.75390647, -0.47190343, -0.49561238, -0.51314232, -0.55912824, -0.46169569, -0.71484898, -0.4796105 , 1.4289969 ]]), array([[-0.50408012, -0.46535582, -0.92933899, 1.88553568, -0.43679212, -0.47190343, 2.41888311, -0.51314232, -0.55912824, -0.46169569, -0.71484898, -0.47979114, 0.83720671], [-0.50408012, -0.46535582, -0.92933899, 1.88553568, -0.43679212, -0.47190343, -0.49561238, -0.51314232, 2.17493583, -0.46169569, 1.29004392, -0.31079132, -1.15394294]]))
Standardized scaling applied to training and test data, normalizing the features. Displays a sample of the scaled data for verification.
Conversion of Data into Tensors¶
tensors = Tensors()
# Train Validation
# ================
tensors.train.X = torch.tensor(scaled.X_train, dtype=torch.float32)
tensors.train.y = torch.tensor(df.y_train.values, dtype=torch.float32).view(-1, 1)
tensors.val.X = torch.tensor(scaled.X_val, dtype=torch.float32)
tensors.val.y = torch.tensor(df.y_val.values, dtype=torch.float32).view(-1, 1)
# Test
# ====
tensors.test.X = torch.tensor(scaled.X_test, dtype=torch.float32)
tensors.test.y = torch.tensor(df.y_test.values, dtype=torch.float32).view(-1, 1)
Transforming scaled training and test data into PyTorch tensors and preparing the data for transmission to the neural network.
Loss Function (RMSE)¶
class RMSELoss(nn.Module):
def __init__(self, eps=1e-6):
super().__init__()
self.mse = nn.MSELoss()
self.eps = eps
def forward(self,yhat,y):
loss = torch.sqrt(self.mse(yhat,y) + self.eps)
return loss
Defines a loss function for computing the root mean square error (RMSE), which is suitable for regression problems such as temperature prediction.
Definition of Neural Network Architecture¶
@dataclass
class NnConfig:
input_size: int
device: torch.device
dropout_rate: float = None
size: int = 36
decrease_factor: float = 0.8
min_hidden_layer_size: int = 4
learning_rate: float = 1e-4
weight_decay: float = 2e-3
epochs: int = 3_000
batch_size: int = None
use_batch_norm: bool = False
relu_is_leaky: bool = True
slope: float = 0.01
hidden_layer_sizes: list[int] = field(default_factory=list)
x_patience: int = None
x_print_every: int = None
x_dropout_rates: list[float] = field(default_factory=list)
x_batch_sizes_and_lrs: list[tuple[int, float]] = field(default_factory=list)
@staticmethod
def append_relu(layers: list, conf: 'NnConfig') -> None:
layers.append(nn.LeakyReLU(conf.slope) if conf.relu_is_leaky else nn.ReLU())
def initialize_weights(self, model: nn.Linear) -> None:
for layer in model:
if isinstance(layer, nn.Linear):
nonlinear = 'leaky_relu' if self.relu_is_leaky is True else 'relu'
nn.init.kaiming_normal_(layer.weight, nonlinearity=nonlinear)
if layer.bias is not None:
nn.init.normal_(layer.bias, mean=0.0, std=0.01)
def hyperparams(self) -> None:
display(pd.DataFrame({'hyperparameters': self.__dict__}))
Creation of NnConfig
configuration class to store neural network hyperparameters and initialize the model. Contains methods for adding activation layers and initializing weights.
class StarTemperatureNN(nn.Module):
def __init__(self, conf: NnConfig):
super(StarTemperatureNN, self).__init__()
self.conf = conf
layers = []
current_size = conf.size
layers.append(nn.Linear(conf.input_size, current_size))
if conf.use_batch_norm:
layers.append(nn.BatchNorm1d(current_size))
NnConfig.append_relu(layers, conf)
if conf.dropout_rate is not None:
layers.append(nn.Dropout(conf.dropout_rate))
while int(current_size * conf.decrease_factor) > conf.min_hidden_layer_size:
next_size = max(
conf.min_hidden_layer_size, int(current_size * conf.decrease_factor)
)
layers.append(nn.Linear(current_size, next_size))
self.conf.hidden_layer_sizes.append(current_size)
if conf.use_batch_norm:
layers.append(nn.BatchNorm1d(next_size))
NnConfig.append_relu(layers, conf)
if conf.dropout_rate is not None:
layers.append(nn.Dropout(conf.dropout_rate))
current_size = next_size
layers.append(nn.Linear(current_size, 1))
self.conf.hidden_layer_sizes.append(current_size)
self.model = nn.Sequential(*layers)
self.conf.initialize_weights(model=self.model)
def forward(self, x):
return self.model(x)
Defining the neural network architecture (StarTemperatureNN
) based on the configuration specified in NnConfig
. The model dynamically adjusts the number of layers and blocks according to the parameters size
, decrease_factor
, and min_hidden_layer_size
.
Initialization of the Model and Optimizer¶
nn_config = NnConfig(
input_size=tensors.train.X.shape[1],
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')
)
model = StarTemperatureNN(conf=nn_config).to(nn_config.device)
criterion = RMSELoss()
optimizer = optim.Adam(
model.parameters(), lr=nn_config.learning_rate, weight_decay=nn_config.weight_decay
)
model.conf.hyperparams()
hyperparameters | |
---|---|
batch_size | None |
decrease_factor | 0.8 |
device | cpu |
dropout_rate | None |
epochs | 3000 |
hidden_layer_sizes | [36, 28, 22, 17, 13, 10, 8, 6] |
input_size | 13 |
learning_rate | 0.0001 |
min_hidden_layer_size | 4 |
relu_is_leaky | True |
size | 36 |
slope | 0.01 |
use_batch_norm | False |
weight_decay | 0.002 |
x_batch_sizes_and_lrs | [] |
x_dropout_rates | [] |
x_patience | None |
x_print_every | None |
The neural network in this project uses a layered structure with a decreasing number of neurons:
model.conf.hidden_layer_sizes = [36, 28, 22, 17, 13, 10, 8, 6]
This architecture starts with a relatively large number of neurons (36
) in the first hidden layer, allowing the network to capture a wide range of patterns and relationships in the data. As the layers progress, the number of neurons gradually decreases, enabling the network to focus on refining and consolidating the information extracted in the previous layers. This 'funnel' structure helps reduce data dimensionality while retaining relevant features, enhancing the model’s ability to generalize by minimizing unnecessary complexity. The final layers, containing only 6
neurons in the last layer before the output, ensure that only the most important patterns are preserved, contributing to a more efficient and targeted prediction process.
Logger of Learning Progress¶
@dataclass
class TrainProgress:
val_loss_1st: int = -1
is_first_call: bool = False
train_losses: list = field(default_factory=list)
val_losses: list = field(default_factory=list)
def __pprint_15(self, *args, suffix: str = '') -> None:
return print(''.join([f'{arg:^15}' for arg in args]) + suffix)
def __progress_25(self, test_loss_: float):
progress_bar = '_' * int(
((self.test_loss_1st - test_loss_) / self.test_loss_1st) * 25
)
return f'.{progress_bar:<22}.'
def log(self, epoch: int, epochs: int, train_loss: float, test_loss: float) -> None:
if self.is_first_call is False:
self.is_first_call = True
self.__pprint_15(f'Epoch (of {epochs})', 'Training Loss', 'Validation Loss')
print('=' * 50)
self.test_loss_1st = test_loss
p_bar = self.__progress_25(test_loss)
self.__pprint_15(
str(epoch+1), round(train_loss, 1), round(test_loss, 1), p_bar
)
Training Cycle¶
progress = TrainProgress()
for epoch in range(nn_config.epochs):
model.train()
optimizer.zero_grad()
loss: torch.Tensor = criterion(model(tensors.train.X), tensors.train.y)
loss.backward()
optimizer.step()
progress.train_losses.append(loss.item())
model.eval()
with torch.no_grad():
test_predictions = model(tensors.val.X)
test_loss: torch.Tensor = criterion(model(tensors.val.X), tensors.val.y)
progress.val_losses.append(test_loss.item())
if (epoch+1) % (nn_config.epochs // 20) == 0:
progress.log(epoch, nn_config.epochs, loss.item(), test_loss.item())
Epoch (of 3000) Training Loss Validation Loss ================================================== 150 13930.2 15446.7 . . 300 13930.0 15446.5 . . 450 13928.4 15444.8 . . 600 13889.7 15403.3 . . 750 13636.6 15129.4 . . 900 12503.3 13895.1 .__ . 1050 8862.5 9952.9 .________ . 1200 6053.9 6834.0 ._____________ . 1350 5725.7 6389.8 .______________ . 1500 5491.9 6117.6 ._______________ . 1650 5297.6 5919.3 ._______________ . 1800 5129.7 5772.4 ._______________ . 1950 4983.9 5665.8 ._______________ . 2100 4855.5 5591.3 ._______________ . 2250 4739.9 5538.2 .________________ . 2400 4639.1 5508.9 .________________ . 2550 4549.4 5501.5 .________________ . 2700 4471.5 5508.4 .________________ . 2850 4401.9 5526.6 .________________ . 3000 4339.6 5552.5 .________________ .
Implementation of the basic training loop, including loss calculation and logging. Records losses on training and test data at regular intervals to monitor model performance across different epochs.
plt.figure(figsize=(12, 6))
plt.plot(progress.train_losses, label='Training Loss')
plt.plot(progress.val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss (RMSE)')
plt.legend()
plt.title('Training and Validation Loss over Epochs')
plt.show()
Plotting losses on training and test data over epochs to visually represent the model’s learning progress and its performance on test data.
model.eval()
with torch.no_grad():
predictions = model(tensors.test.X).cpu().numpy().flatten() # Test
actuals = tensors.test.y.cpu().numpy().flatten() # Test
num_samples = min(30, len(df.y_test))
star_ids = np.arange(1, num_samples + 1)
true_label, pred_label = 'Actual Temperature', 'Predicted Temperature'
plt.figure(figsize=(12, 6))
plt.bar(star_ids, actuals[:num_samples], width=0.3, label=true_label, color='#333')
plt.bar(star_ids, predictions[:num_samples], label=pred_label, color='#222', alpha=0.5)
plt.xlabel('Star ID')
plt.ylabel('Temperature (K)')
plt.title('Actual vs. Predicted Temperatures (Test Data)')
plt.xticks(star_ids)
plt.legend()
plt.show()
This chart compares actual and predicted temperatures (in $Kelvin$) for a subset of 26
stars from the test dataset, identified by Star ID
. Each pair of bars represents one star, with the dark bar showing the actual temperature and the light bar showing the predicted temperature.
The chart shows that while the model’s predictions match the actual temperature for some stars, there are significant discrepancies for others, especially for stars with very high temperatures. This suggests that, overall, the model is capable of capturing temperature patterns but may struggle with extreme or outlier temperatures, indicating potential areas for further model tuning.
Conclusion (Baseline Model)¶
During training, there is a significant decrease in both training and testing losses over the epochs, indicating effective model training. However, around epoch 3,000
, the validation loss reaches a minimum of approximately 5,500
and then begins to slightly increase, while the training loss continues to decrease. This divergence suggests that if training is not stopped at epoch 3,000
, the model may start overfitting, capturing noise in the training data rather than generalized patterns. Implementing early stopping at epoch 3,000
will likely help the model generalize better on the test data.
Enhancement of the Neural Network¶
Advanced Learning Progress Logger¶
@dataclass
class TrainProgressAdvanced(TrainProgress):
results: dict = field(default_factory=dict)
def save_result(
self, model_key: str, train: float, val: float, config: NnConfig,
) -> None:
self.results[model_key] = {
'model_state_dict': copy.deepcopy(model.state_dict()),
'train_loss': train,
'val_loss': val,
'train_losses': self.train_losses,
'val_losses': self.val_losses,
'batch_size': config.batch_size,
'dropout_rate': config.dropout_rate,
'learning_rate': config.learning_rate
}
print(f'Finished training for {model_key} with Validation Loss: {val:.1f}')
self.train_losses = []
self.val_losses = []
def early_stopping(self, epoch: int, config: NnConfig):
print(
f'Early stopping at epoch {epoch+1}/{config.epochs}.',
f'No improvement for {config.x_patience} epochs.'
)
def print_config(self, config: NnConfig):
print(
BR + BR + f'Training with batch_size={config.batch_size}, ',
f'dropout_rate={config.dropout_rate}, ',
f'learning_rate={config.learning_rate}' + BR,
)
Model Training Configuration¶
@dataclass
class ModelTrainerConfig:
model: torch.nn.Module
criterion: torch.nn.Module
optimizer: torch.optim.Optimizer
data: 'Tensors'
config: 'NnConfig'
progressor: 'TrainProgressAdvanced'
best_val_loss: float = field(init=False, default=float('inf'))
epochs_no_improve: int = field(init=False, default=0)
best_model_wts: dict = field(init=False)
train_loader: torch.utils.data.DataLoader = field(init=False)
def __post_init__(self):
self.model.to(self.config.device)
self.best_model_wts = copy.deepcopy(self.model.state_dict())
self.train_loader = torch.utils.data.DataLoader(
dataset=torch.utils.data.TensorDataset(
self.data.train.X, self.data.train.y
),
batch_size=self.config.batch_size,
shuffle=True
)
Model Trainer¶
@dataclass
class ModelTrainer:
mdc: ModelTrainerConfig
def train(self):
for epoch in range(self.mdc.config.epochs):
self.mdc.model.train()
running_train_loss = 0.0
for X_batch, y_batch in self.mdc.train_loader:
X_batch = X_batch.to(self.mdc.config.device)
y_batch = y_batch.to(self.mdc.config.device)
self.mdc.optimizer.zero_grad()
predictions = self.mdc.model(X_batch)
loss: torch.Tensor = self.mdc.criterion(predictions, y_batch)
loss.backward()
self.mdc.optimizer.step()
running_train_loss += loss.item() * X_batch.size(0)
avg_train_loss = running_train_loss / len(self.mdc.data.train.X)
self.mdc.progressor.train_losses.append(avg_train_loss)
self.mdc.model.eval()
with torch.no_grad():
val_predictions = self.mdc.model(self.mdc.data.val.X)
val_loss: torch.Tensor = self.mdc.criterion(
val_predictions, self.mdc.data.val.y
)
self.mdc.progressor.val_losses.append(val_loss.item())
if val_loss.item() < self.mdc.best_val_loss:
self.mdc.best_val_loss = val_loss.item()
self.mdc.best_model_wts = copy.deepcopy(self.mdc.model.state_dict())
self.mdc.epochs_no_improve = 0
else:
self.mdc.epochs_no_improve += 1
if self.mdc.epochs_no_improve >= self.mdc.config.x_patience:
self.mdc.progressor.early_stopping(epoch=epoch, config=self.mdc.config)
break
if (epoch + 1) % self.mdc.config.x_print_every == 0 or epoch == 0:
self.mdc.progressor.log(
epoch, self.mdc.config.epochs, avg_train_loss, val_loss.item()
)
self.mdc.model.load_state_dict(self.mdc.best_model_wts)
return avg_train_loss, self.mdc.best_val_loss, self.mdc.progressor
Definition of the ModelTrainer
class, which implements the model training process, including mini-batches (batch_size
), loss calculation, and model evaluation on test data. This class also monitors the training progress and stops early if the validation loss does not improve over a specified number of epochs.
Model Training¶
Setting Parameters¶
progressor = TrainProgressAdvanced()
nn_config_advanced = deepcopy(nn_config)
criterion = RMSELoss()
# Batch Size
# ==========
batch_size = 2048
# Update Confuguration I
# ======================
nn_config_advanced.epochs = 50_000
nn_config_advanced.use_batch_norm = True
nn_config_advanced.x_print_every = 300
nn_config_advanced.x_patience = 500
nn_config_advanced.x_dropout_rates = [0.02, 0.05]
nn_config_advanced.x_batch_sizes_and_lrs = [
(batch_size, nn_config.learning_rate * 500),
(batch_size, nn_config.learning_rate * 600),
]
nn_config_advanced.hyperparams()
hyperparameters | |
---|---|
batch_size | None |
decrease_factor | 0.8 |
device | cpu |
dropout_rate | None |
epochs | 50000 |
hidden_layer_sizes | [36, 28, 22, 17, 13, 10, 8, 6] |
input_size | 13 |
learning_rate | 0.0001 |
min_hidden_layer_size | 4 |
relu_is_leaky | True |
size | 36 |
slope | 0.01 |
use_batch_norm | True |
weight_decay | 0.002 |
x_batch_sizes_and_lrs | [(2048, 0.05), (2048, 0.060000000000000005)] |
x_dropout_rates | [0.02, 0.05] |
x_patience | 500 |
x_print_every | 300 |
The TrainProgressAdvanced
logger and updated neural network configuration nn_config_advanced
are initialized, incorporating batch normalization and enabling early stopping. It configures training parameters such as patience
, batch size, and learning_rate (x_batch_sizes_and_lrs
) to find the optimal combination of hyperparameters.
Training with Different Combinations of Parameters¶
for batch_size, learning_rate in nn_config_advanced.x_batch_sizes_and_lrs:
for dropout_rate in nn_config_advanced.x_dropout_rates:
# Update Confuguration II
# =======================
nn_config_advanced.batch_size = batch_size
nn_config_advanced.dropout_rate = dropout_rate
nn_config_advanced.learning_rate = learning_rate
progressor.print_config(nn_config_advanced)
# Iteration Setup
# ===============
model = StarTemperatureNN(conf=nn_config_advanced)
optimizer = optim.Adam(
model.parameters(),
lr=learning_rate,
weight_decay=nn_config_advanced.weight_decay
)
tr_model_config = ModelTrainerConfig(
model=model,
criterion=criterion,
optimizer=optimizer,
data=tensors,
config=nn_config_advanced,
progressor=progressor
)
tr_model = ModelTrainer(mdc=tr_model_config)
# Train Model
# ===========
train_loss, val_loss, progressor = tr_model.train()
progressor.save_result(
model_key=f'bs{batch_size}_dr{dropout_rate}_lr{learning_rate}',
train=train_loss,
val=val_loss,
config=nn_config_advanced,
)
progressor.is_first_call, progressor.val_loss_1st = False, -1
Training with batch_size=2048, dropout_rate=0.02, learning_rate=0.05 Epoch (of 50000) Training Loss Validation Loss ================================================== 1 13930.4 15445.9 . . 300 10500.9 12184.0 ._____ . 600 2897.1 5758.9 ._______________ . 900 2403.6 5253.6 .________________ . 1200 2341.2 5269.4 .________________ . Early stopping at epoch 1216/50000. No improvement for 500 epochs. Finished training for bs2048_dr0.02_lr0.05 with Validation Loss: 4795.1 Training with batch_size=2048, dropout_rate=0.05, learning_rate=0.05 Epoch (of 50000) Training Loss Validation Loss ================================================== 1 13929.5 15445.4 . . 300 9617.1 10791.8 ._______ . 600 3384.6 6034.1 ._______________ . 900 3411.3 6098.7 ._______________ . 1200 3334.1 6085.7 ._______________ . 1500 2881.7 6124.9 ._______________ . 1800 3286.6 5313.9 .________________ . 2100 2569.4 5292.5 .________________ . 2400 2675.0 5470.4 .________________ . Early stopping at epoch 2554/50000. No improvement for 500 epochs. Finished training for bs2048_dr0.05_lr0.05 with Validation Loss: 4930.4 Training with batch_size=2048, dropout_rate=0.02, learning_rate=0.060000000000000005 Epoch (of 50000) Training Loss Validation Loss ================================================== 1 13930.4 15446.7 . . 300 7663.9 8409.7 .___________ . 600 2712.6 6007.6 ._______________ . 900 2821.9 6098.5 ._______________ . Early stopping at epoch 916/50000. No improvement for 500 epochs. Finished training for bs2048_dr0.02_lr0.060000000000000005 with Validation Loss: 5601.9 Training with batch_size=2048, dropout_rate=0.05, learning_rate=0.060000000000000005 Epoch (of 50000) Training Loss Validation Loss ================================================== 1 13930.8 15447.3 . . 300 9757.2 11246.9 .______ . 600 3568.2 6797.2 ._____________ . 900 3384.5 6314.0 .______________ . 1200 3171.4 6293.6 .______________ . 1500 3129.7 6261.5 .______________ . 1800 3128.5 6960.2 ._____________ . Early stopping at epoch 1984/50000. No improvement for 500 epochs. Finished training for bs2048_dr0.05_lr0.060000000000000005 with Validation Loss: 5479.5
Analysis of Training Results¶
Hyperparameter Summary
Multiple models were trained using different combinations of batch_size (batch size), dropout_rate (dropout probability), and learning_rate (learning rate).
Early Stopping
- Early stopping was used for each training run, which was triggered if there were no improvements in Validation Loss (loss on the validation set) for
200
epochs. As a result, all models finished training significantly earlier than the50,000
epoch limit. - Early stopping helped prevent overfitting by halting the training process when additional epochs did not yield improvements in the model’s generalization.
Model Performance
- The model with batch_size=
2048
and dropout_rate=0.05
achieved the lowest Validation Loss (4795
). - Models with higher dropout or a smaller batch size generally stabilized at slightly higher Validation Loss values.
Selection of the best model¶
results_df = pd.DataFrame([
{
'model': key,
'batch_size': config['batch_size'],
'dropout_rate': config['dropout_rate'],
'learning_rate': config['learning_rate'],
'final_train_loss': config['train_loss'],
'best_val_loss': config['val_loss'],
}
for key, config in progressor.results.items()
])
results_df
model | batch_size | dropout_rate | learning_rate | final_train_loss | best_val_loss | |
---|---|---|---|---|---|---|
0 | bs2048_dr0.02_lr0.05 | 2048 | 0.02 | 0.05 | 2073.602783 | 4795.062012 |
1 | bs2048_dr0.05_lr0.05 | 2048 | 0.05 | 0.05 | 3009.123047 | 4930.350098 |
2 | bs2048_dr0.02_lr0.060000000000000005 | 2048 | 0.02 | 0.06 | 2634.253418 | 5601.937500 |
3 | bs2048_dr0.05_lr0.060000000000000005 | 2048 | 0.05 | 0.06 | 2695.391113 | 5479.462402 |
best_model_record = results_df.loc[results_df['best_val_loss'].idxmin()]
best_model_record.to_frame('Best Model')
Best Model | |
---|---|
model | bs2048_dr0.02_lr0.05 |
batch_size | 2048 |
dropout_rate | 0.02 |
learning_rate | 0.05 |
final_train_loss | 2073.602783 |
best_val_loss | 4795.062012 |
Configuration parameters for the best model.
best_config = progressor.results[best_model_record['model']]
best_model = StarTemperatureNN(conf=nn_config_advanced)
best_model.load_state_dict(best_config['model_state_dict'])
best_model.eval()
with torch.no_grad():
best_test_predictions = best_model(
tensors.test.X.to(nn_config_advanced.device), # Test
)
best_test_loss: torch.Tensor = criterion(
best_test_predictions,
tensors.test.y.to(nn_config_advanced.device), # Test
)
f'Best model test loss after reloading: {best_test_loss.item():.4f}'
'Best model test loss after reloading: 3076.2283'
Loads the weights of the best model stored after training and evaluates it on the test data. Outputs the value of the test error (RMSE
).
Bootstrap Test¶
actual_values = tensors.test.y.cpu().numpy().flatten() # Test
predicted_values = best_test_predictions.cpu().numpy().flatten() # Test
original_rmse = np.sqrt(mean_squared_error(actual_values, predicted_values))
# Bootstrap
# =========
n_iterations = 20_000
rmse_values = []
np.random.seed(RS)
for i in range(n_iterations):
indices = np.random.choice(len(actual_values), len(actual_values), replace=True)
sample_actual = actual_values[indices]
sample_predicted = predicted_values[indices]
rmse_values.append(np.sqrt(mean_squared_error(sample_actual, sample_predicted)))
# Visualization
# =============
rmse_upper_bound = np.percentile(rmse_values, 98)
lbl_rmse = f'Observed RMSE: {original_rmse:.2f}'
lbl_uper = f'98% CI Upper Bound: {rmse_upper_bound:.2f}'
plt.figure(figsize=(10, 6))
plt.hist(rmse_values, bins=50, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(original_rmse, color='blue', linestyle='--', linewidth=2, label=lbl_rmse)
plt.axvline(
rmse_upper_bound, color='green', linestyle='--', linewidth=2, label=lbl_uper
)
plt.axvline(4500, color='red', linestyle='-', linewidth=2, label='Threshold RMSE: 4500')
plt.xlabel('RMSE')
plt.ylabel('Frequency')
plt.title('Bootstrap Distribution of RMSE with 98% Confidence Interval (Test Data)')
plt.legend()
plt.show()
Loss Graph of the Best Bodel¶
train_losses = best_config['train_losses']
val_losses = best_config['val_losses'] # Validation
epochs = range(1, len(train_losses) + 1)
plt.figure(figsize=(12, 6))
plt.plot(epochs, train_losses, label='Training Loss')
plt.plot(epochs, val_losses, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss Curves (Best Model)')
plt.show()
Predicted and Actual Temperatures¶
best_model.eval()
with torch.no_grad(): # Test
test_predictions = best_model(tensors.test.X.to(nn_config.device)).cpu().numpy()
test_actual = tensors.test.y.cpu().numpy()
with torch.no_grad(): # Validation
val_predictions = best_model(tensors.val.X.to(nn_config.device)).cpu().numpy()
val_actual = tensors.val.y.cpu().numpy()
# Plotting
# ========
plt.figure(figsize=(8, 8))
plt.scatter(test_actual, test_predictions, alpha=0.3, color='orange', label='Test')
plt.scatter(val_actual, val_predictions, alpha=0.5, color='blue', label='Validation')
plt.plot(
[min(val_actual.min(), test_actual.min()), max(val_actual.max(), test_actual.max())],
[min(val_actual.min(), test_actual.min()), max(val_actual.max(), test_actual.max())],
'--',
color='red'
)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Predicted vs Actual Values (Validation and Test Sets)')
plt.legend()
plt.show()
The chart compares the predicted temperatures of stars with the actual temperatures from the test dataset. The diagonal dashed line represents an ideal prediction, where predicted values equal actual values. Points close to this line indicate accurate predictions, while points further away highlight discrepancies, showing that the baseline model performs reasonably well but has some areas of inaccuracy. This is understandable due to the small dataset size.
Summary¶
The objective of this project was to develop a neural network model to predict the surface temperature of stars based on several characteristics, including relative luminosity, relative radius, absolute magnitude, color, and star type. The project aimed to leverage machine learning methods to improve the accuracy and efficiency of temperature prediction, offering an alternative to traditional methods such as Wien’s displacement law and the Stefan-Boltzmann law.
Key Stages and Results¶
Data Preparation and Exploratory Analysis¶
- The dataset included information on 240 stars with features such as relative luminosity, relative radius, absolute magnitude, star color, absolute temperature, and star type.
- Data preprocessing involved encoding categorical features, scaling quantitative features, and splitting data into training and test sets.
- To expand the data volume, synthetic data was generated using interpolation and Gaussian mixtures, which improved the model’s learning capability.
Baseline Neural Network Model¶
- A baseline neural network model was built with several hidden layers, utilizing ReLU activation functions and dropout layers for regularization.
- The baseline model demonstrated satisfactory initial results, with Test Loss (root mean square error in Kelvin) stabilizing around 4500, meeting the target level.
- However, during training, signs of overfitting appeared around epoch
4500
, as Test Loss began to increase slightly despite Training Loss continuing to decrease. This suggested that early stopping could help avoid overfitting and improve the model’s generalization ability.
Hyperparameter Tuning and Model Improvement¶
- Various configurations of batch size, dropout level, and learning rate were tested to enhance model performance.
- Early stopping with a
patience
threshold of500
epochs was implemented to automatically stop training in the absence of improvements, preventing overfitting. - The optimal configuration was determined as
batch_size=2048
,dropout_rate=0.02
, andlearning_rate=0.05
, achieving a minimal Test Loss of4795
, the best result among all configurations.
Final Model Evaluation¶
- The best model showed good alignment between predicted and actual temperatures for many stars, although some high-temperature stars remained challenging to predict accurately, suggesting potential for further improvements to handle extreme values.
- Visual evaluations using scatter and bar charts helped identify areas where the model performed accurately as well as cases of deviations, confirming the model’s overall effectiveness with room for minor refinements.