Model Class¶
- class sitesyncro.Model.Model(**kwargs)¶
A class representing a Bayesian model of dated samples interconnected by stratigraphic relationships.
- Parameters:
directory (str) – Working directory for model data (default is “model”).
samples (list) – List of samples as instances of the class [Sample](#sample_class)
curve_name (str) – The name of the calibration curve to use (default is “intcal20.14c”).
phase_model (str) – OxCal phase model type. Can be ‘sequence’, ‘contiguous’, ‘overlapping’, or ‘none’ (default is “sequence”).
cluster_n (int) – Number of clusters to form (-1 = automatic; default is -1).
cluster_selection (str) – The method used to select the optimal number of clusters. Can be ‘silhouette’ or ‘mcst’ (default is ‘silhouette’).
use_wasserstein (bool) – Use Wasserstein distance to calculate the distance matrix for clustering.
uniform (bool) – Flag indicating whether to use uniform randomization (default is False).
p_value (float) – The p-value threshold for statistical tests (default is 0.05).
uncertainty_base (float) – The base uncertainty for randomization (default is 15).
npass (int) – Minimum number of passes for the randomization tests (default is 100).
convergence (float) – Convergence threshold for the randomization tests (default is 0.99).
oxcal_url (str) – URL from which to download the OxCal program (default is “https://c14.arch.ox.ac.uk/OxCalDistribution.zip”).
overwrite (bool) – Flag indicating whether to overwrite existing data in the model directory (default is False).
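As a rough illustration of how the constructor parameters fit together, the sketch below collects the documented defaults in a dictionary and checks the two enumerated options before the keyword arguments would be handed to Model(**kwargs). DEFAULTS and check_params are hypothetical helpers, not part of sitesyncro. The default of use_wasserstein is not stated above, so it is left out of the dictionary.

```python
# Documented defaults of the Model constructor, copied from the parameter list.
# (use_wasserstein and samples are omitted: no default is stated for them.)
DEFAULTS = {
    "directory": "model",
    "curve_name": "intcal20.14c",
    "phase_model": "sequence",
    "cluster_n": -1,
    "cluster_selection": "silhouette",
    "uniform": False,
    "p_value": 0.05,
    "uncertainty_base": 15,
    "npass": 100,
    "convergence": 0.99,
    "oxcal_url": "https://c14.arch.ox.ac.uk/OxCalDistribution.zip",
    "overwrite": False,
}

PHASE_MODELS = {"sequence", "contiguous", "overlapping", "none"}
CLUSTER_SELECTIONS = {"silhouette", "mcst"}


def check_params(**overrides):
    """Hypothetical helper: merge overrides into the defaults and validate them."""
    unknown = set(overrides) - set(DEFAULTS) - {"samples", "use_wasserstein"}
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["phase_model"] not in PHASE_MODELS:
        raise ValueError(f"Invalid phase_model: {params['phase_model']!r}")
    if params["cluster_selection"] not in CLUSTER_SELECTIONS:
        raise ValueError(f"Invalid cluster_selection: {params['cluster_selection']!r}")
    return params


params = check_params(directory="my_site", phase_model="contiguous")
# The checked dictionary could then be passed on, e.g. Model(**params).
```

Validating the options up front is only a convenience here; whether the class itself rejects invalid values is not stated in this reference.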
- add_sample(*args, **kwargs) None ¶
Adds a sample to the model.
Accepts either a single argument of type Sample or a set of arguments and keyword arguments to create a new Sample instance.
If a single argument of type Sample is provided, it is added to the model’s samples directly. If multiple arguments are provided, they are used to create a new Sample instance which is then added to the model’s samples.
- Parameters:
args – Either a single argument of type Sample or multiple arguments to create a new Sample instance.
kwargs – Keyword arguments to create a new Sample instance.
- Returns:
None
- property areas: List[str]¶
Unique area names extracted from the samples.
- Returns:
A sorted list of unique area names.
- Return type:
List[str]
- property cluster_means: Dict[int, Dict[int, float]]¶
Mean date of the samples in each cluster in calendar years BP.
- Returns:
{clusters_n: {cluster: year, …}, …}; clusters_n = number of clusters
- Return type:
Dict[int, Dict[int, float]]
- property cluster_n: int¶
Number of clusters to form (-1 = automatic).
- Returns:
The number of clusters to form.
- Return type:
int
- property cluster_opt_n: int¶
Optimal number of clusters based on the silhouette scores.
The optimal number of clusters is the one that maximizes the average silhouette score for clustering solutions for which the p-value is lower than Model.p_value.
- Returns:
The optimal number of clusters if the clustering has been performed, None otherwise.
- Return type:
int or None
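The selection rule described for cluster_opt_n can be sketched in plain Python. The function below takes dictionaries shaped like cluster_sils and cluster_ps and is an illustration of the stated rule only, not the library's code. The name pick_optimal_n, the sample values, and the choice to return None when no solution is significant are assumptions.

```python
def pick_optimal_n(cluster_sils, cluster_ps, p_value=0.05):
    """Return the cluster count with the highest silhouette among significant solutions."""
    # Keep only clustering solutions whose p-value is below the threshold.
    significant = [n for n in cluster_sils if cluster_ps.get(n, 1.0) < p_value]
    if not significant:
        return None  # assumption: no significant solution means no optimum
    return max(significant, key=lambda n: cluster_sils[n])


# Hypothetical values in the documented {clusters_n: value} layout.
sils = {2: 0.41, 3: 0.58, 4: 0.63}
ps = {2: 0.01, 3: 0.03, 4: 0.20}
best = pick_optimal_n(sils, ps)
```

In this made-up data the four-cluster solution has the highest silhouette score but is excluded because its p-value exceeds the threshold.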
- property cluster_ps: Dict[int, float]¶
P-values of the clustering solutions.
- Returns:
{clusters_n: p, …}; clusters_n = number of clusters
- Return type:
Dict[int, float]
- property cluster_selection: str¶
The method used to select the optimal number of clusters.
- Returns:
The method used to select the optimal number of clusters. Can be ‘silhouette’ or ‘mcst’.
- Return type:
str
- property cluster_sils: Dict[int, float]¶
Silhouette scores of each clustering solution.
The silhouette score is a measure of how similar an object is to its own cluster compared to other clusters.
- Returns:
{clusters_n: silhouette, …}; clusters_n = number of clusters
- Return type:
Dict[int, float]
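For readers unfamiliar with the measure, here is a minimal sketch of the standard mean silhouette score computed from a distance matrix: for each item, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). The function name and the one-dimensional sample points are hypothetical. How sitesyncro builds its distance matrix between probability distributions is not covered here.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette score for a square distance matrix and one label per item."""
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("At least two clusters are required.")
    n = len(labels)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == other) / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Hypothetical points on a line, forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = mean_silhouette(dist, [0, 0, 1, 1])  # natural grouping
bad = mean_silhouette(dist, [0, 1, 0, 1])   # mixed-up grouping
```

Scores near 1 indicate compact, well-separated clusters, while negative scores indicate items that sit closer to another cluster than to their own.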
- property clusters: Dict[int, Dict[int, List[str]]]¶
Clusters of samples based on the similarity of their probability distributions.
- Returns:
{clusters_n: {cluster: [sample name, …], …}, …}; clusters_n = number of clusters
- Return type:
Dict[int, Dict[int, List[str]]]
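The nested layout can be easier to follow with a small example. The sketch below uses hypothetical dictionaries shaped like clusters and cluster_means and inverts one clustering solution into a sample-to-cluster lookup. The helper name sample_to_cluster and all values are made up for illustration.

```python
def sample_to_cluster(clusters, n):
    """Map each sample name to its cluster label in the solution with n clusters."""
    return {name: label for label, names in clusters[n].items() for name in names}


# Hypothetical data in the documented layouts.
clusters = {
    2: {0: ["S1", "S2"], 1: ["S3"]},
    3: {0: ["S1"], 1: ["S2"], 2: ["S3"]},
}
cluster_means = {
    2: {0: 3900.0, 1: 3400.0},
    3: {0: 3950.0, 1: 3850.0, 2: 3400.0},
}

labels = sample_to_cluster(clusters, 2)
for name, label in sorted(labels.items()):
    # Print each sample with its cluster and that cluster's mean date (cal BP).
    print(name, label, cluster_means[2][label])
```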
- property contexts: List[str]¶
Unique context names extracted from the samples.
- Returns:
A sorted list of unique context names.
- Return type:
List[str]
- property convergence: float¶
The convergence threshold for the randomization test.
- Returns:
The convergence threshold for the randomization tests.
- Return type:
float
- copy(directory: str) object ¶
Creates a copy of the current model with a new directory.
Creates a new instance of the Model class with the same parameters and data as the current model, but with a different directory. The new directory is provided as an argument.
- Parameters:
directory (str) – The directory for the new model.
- Returns:
A new instance of the Model class with the same data as the current model but a different directory.
- Return type:
Model
- property curve: ndarray¶
Radiocarbon calibration curve.
2D array containing the calibration curve data. Each row holds three values: calendar year BP, C-14 year, and uncertainty.
- Returns:
An array of the calibration curve.
- Return type:
np.ndarray
- property curve_name: str¶
File name of the radiocarbon age calibration curve (see OxCal/bin directory).
- Returns:
The name of the calibration curve.
- Return type:
str
- del_sample(name: str) None ¶
Deletes a sample from the model.
Removes a sample from the model’s samples based on the provided sample name.
- Parameters:
name (str) – The name of the sample to be deleted.
- Returns:
None
- property directory: str¶
The directory where the model data is stored.
- Returns:
The directory where the model data is stored.
- Return type:
str
- property groups: Dict[str, List[str]]¶
Groups that the samples belong to based on stratigraphic interconnection with other samples. The groups are represented as a dictionary where the keys are the group names and the values are lists of sample names.
- Returns:
A dictionary where the keys are the group names and the values are lists of sample names.
- Return type:
Dict[str, List[str]]
- property has_data: bool¶
Checks if the model has any associated sample data.
- Returns:
True if the model has sample data, False otherwise.
- Return type:
bool
- import_csv(fname: str) None ¶
Loads sample data from a CSV file.
The file should be in the following format:
- Each line represents a data record.
- Data fields are separated by semicolons.
- The first line is a header and is skipped.
- Each line should have 11 fields: Sample, Context, Area, C14 Age, Uncertainty, Excavation Area Phase, Earlier-Than, Site Phase, Long-Lived, Redeposited, Outlier.
- Parameters:
fname (str) – The file path of the CSV file to be imported.
- Raises:
ValueError – If the input file does not exist or is not formatted correctly.
- Returns:
None
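A file in the layout described for import_csv could be produced with the standard csv module, as in the sketch below. The helper write_samples_csv and the sample records are hypothetical. How empty fields and the Long-Lived, Redeposited, and Outlier flags should be encoded is an assumption (empty strings and 0/1 here), since the reference does not specify it.

```python
import csv
import os
import tempfile

# The 11 documented fields, in the documented order.
HEADER = [
    "Sample", "Context", "Area", "C14 Age", "Uncertainty",
    "Excavation Area Phase", "Earlier-Than", "Site Phase",
    "Long-Lived", "Redeposited", "Outlier",
]

# Hypothetical records; the encoding of empty and flag fields is an assumption.
ROWS = [
    ["S1", "C1", "A1", 3500, 30, "", "", "", 0, 0, 0],
    ["S2", "C2", "A1", 3420, 25, "", "", "", 0, 0, 0],
]


def write_samples_csv(fname, rows):
    """Write a header line plus one semicolon-separated line per record."""
    with open(fname, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(HEADER)
        for row in rows:
            if len(row) != len(HEADER):
                raise ValueError(f"Expected {len(HEADER)} fields, got {len(row)}")
            writer.writerow(row)


path = os.path.join(tempfile.mkdtemp(), "samples.csv")
write_samples_csv(path, ROWS)
# The resulting path could then be passed to import_csv(path).
```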
- property is_clustered: bool¶
Checks if the model has been clustered.
- Returns:
True if the model has been clustered, False otherwise.
- Return type:
bool
- property is_modeled: bool¶
Checks if Bayesian modeling of sample dates has been performed for all samples.
- Returns:
True if Bayesian modeling has been performed for all samples, False otherwise.
- Return type:
bool
- property is_randomized: bool¶
Checks if the randomization test has been performed.
- Returns:
True if the randomization test has been performed, False otherwise.
- Return type:
bool
- load(directory: str | None = None) bool ¶
Loads the model from a JSON file.
Attempts to load the model data from a JSON file located in the provided directory. If no directory is provided, it uses the model’s current directory. Supports both regular and zipped JSON files.
- Parameters:
directory (str, optional) – The directory from where the model data should be loaded. If None, the model’s current directory is used.
- Returns:
True if the model data was successfully loaded, False otherwise.
- Return type:
bool
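Because load() reports success as a boolean, it can serve as a guard before importing raw data. The following sketch takes any object exposing the documented load, import_csv, and save methods. The function name load_or_import is hypothetical, and the assumption that load() returns False for a directory without saved data is not confirmed by this reference.

```python
def load_or_import(model, csv_path):
    """Reuse saved model data if present, otherwise import samples from CSV."""
    if model.load():
        # Saved data was found in the model's current directory.
        return "loaded"
    # Nothing could be loaded: read the samples and store the model.
    model.import_csv(csv_path)
    model.save()
    return "imported"
```

Passing the model in as an argument keeps the sketch independent of how the instance was constructed.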
- load_oxcal_data() None ¶
Loads the OxCal data associated with the model.
Attempts to load the OxCal data from a file named ‘model.js’ located in the model’s directory. The loaded data is then stored in the model’s ‘oxcal_data’ attribute.
- Returns:
None
- property npass: int¶
The minimum number of passes for the randomization test.
- Returns:
The minimum number of passes.
- Return type:
int
- property outlier_candidates: List[str]¶
List of outlier candidates from which the final outliers to be eliminated are picked. These samples have conflicts between their dating ranges and their stratigraphic relationships with other samples.
- Returns:
A list of sample names that are considered candidates for outliers.
- Return type:
List[str]
- property outliers: List[str]¶
List of outliers among samples which need to be removed for the model to be valid.
- Returns:
A list of sample names that are considered outliers.
- Return type:
List[str]
- property oxcal_data: dict¶
The OxCal data associated with the model.
- Returns:
A dictionary containing the OxCal data if it exists, otherwise None.
- Return type:
dict or None
- property oxcal_url: str¶
The URL from where the OxCal software can be downloaded.
- Returns:
The URL of the OxCal software.
- Return type:
str
- property p_value: float¶
The p-value used for the randomization test.
- Returns:
The p-value threshold for statistical tests.
- Return type:
float
- property phase_model: str¶
OxCal phase model type.
- Returns:
The type of the phase model. Can be ‘sequence’, ‘contiguous’, ‘overlapping’, or ‘none’.
- Return type:
str
- property phases: List[Phase]¶
- Returns:
A dictionary where the keys are (group, phase) and the values are Phase objects.
- Return type:
Dict[tuple, Phase]
- plot_clusters(fname: str | None = None, show: bool = False) str ¶
Plots the clustering results.
The plot can either be saved to a file or displayed.
- Parameters:
fname (str, optional) – The file name to save the plot to. If None, a default file name is used. Defaults to None.
show (bool, optional) – If True, the plot is displayed. Defaults to False.
- Returns:
The file name the plot was saved to.
- Return type:
str
- plot_randomized(fname: str | None = None, show: bool = False) str ¶
Plots the results of the randomization test.
The plot can either be saved to a file or displayed.
- Parameters:
fname (str, optional) – The file name to save the plot to. If None, a default file name is used. Defaults to None.
show (bool, optional) – If True, the plot is displayed. Defaults to False.
- Returns:
The file name the plot was saved to.
- Return type:
str
- process(by_clusters: bool = False, by_dates: bool = False, max_cpus: int = -1, max_queue_size: int = -1, max_clusters: int = -1, save: bool = False) None ¶
Processes the complete model.
Performs the following steps:
1. Modeling stratigraphy to determine phasing
2. Finding outliers
3. Bayesian modeling of C-14 dates
4. Testing the distribution of dates for randomness
5. Clustering temporal distributions
- Parameters:
by_clusters (bool, optional) – If True, update the phasing by clustering sample dates. Defaults to False.
by_dates (bool, optional) – If True, update the phasing by comparing sample dates. Defaults to False.
max_cpus (int, optional) – Maximum number of CPUs to use for parallel processing. If -1, all available CPUs are used. Defaults to -1.
max_queue_size (int, optional) – Maximum queue size for parallel processing. If -1, the queue size is unlimited. Defaults to -1.
max_clusters (int, optional) – Maximum number of clusters to create. Defaults to -1.
- Returns:
None
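The five steps listed for process() correspond to individual methods documented in this reference. The sketch below calls them one after another on any object that exposes those methods. It is only a reading of the step list: the real process() may repeat or reorder steps, and the function name run_steps is hypothetical.

```python
def run_steps(model, by_clusters=False, by_dates=False,
              max_cpus=-1, max_queue_size=-1, max_clusters=-1):
    """Call the documented processing methods in the order listed for process()."""
    # 1. Model stratigraphy to determine phasing.
    model.process_phasing(by_clusters=by_clusters, by_dates=by_dates)
    # 2. Find outliers.
    model.process_outliers()
    # 3. Bayesian modeling of C-14 dates (runs OxCal).
    model.process_dates()
    # 4. Test the distribution of dates for randomness.
    model.process_randomization(max_cpus=max_cpus, max_queue_size=max_queue_size)
    # 5. Cluster the temporal distributions.
    model.process_clustering(max_cpus=max_cpus, max_queue_size=max_queue_size,
                             max_clusters=max_clusters)
```

Running the steps separately like this can be useful when intermediate results such as the outlier list need to be inspected before the slower steps start.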
- process_clustering(max_cpus=-1, max_queue_size=-1, max_clusters=-1) None ¶
Performs clustering on the sample dates.
Clusters the sample dates and uses randomization testing to find the optimal clustering solution.
- Parameters:
max_cpus (int, optional) – Maximum number of CPUs to use for parallel processing. If -1, all available CPUs are used. Defaults to -1.
max_queue_size (int, optional) – Maximum queue size for parallel processing. If -1, the queue size is unlimited. Defaults to -1.
- Returns:
None
- process_dates() None ¶
Calculates the posterior probabilities of sample dates based on phasing using Bayesian modeling in OxCal.
Generates an OxCal file from the current model and runs the OxCal software on it. The results of the OxCal modeling are then loaded back into the model. The calculated posterior probabilities can be retrieved via the samples’ attributes.
Note: This method resets all calculated attributes that depend on the dating posteriors.
- Returns:
None
- process_outliers() None ¶
Identifies and marks dating outliers among the samples in the model.
Finds dating outliers among the samples which need to be removed for the model to be valid. The outliers are identified based on conflicts between their dating ranges and stratigraphic relationships with other samples. The identified outliers can be retrieved via the attributes Model.outliers and Model.outlier_candidates.
- Returns:
None
- process_phasing(by_clusters: bool = False, by_dates: bool = False) bool ¶
Updates the phasing of samples based on stratigraphic relations.
Updates the groups and phases of samples based on their stratigraphic relations.
- Parameters:
by_clusters (bool, optional) – If True, update the phasing by clustering sample dates. Defaults to False.
by_dates (bool, optional) – If True, update the phasing by comparing sample dates. Defaults to False.
- Returns:
True if phasing has changed, False otherwise.
- Return type:
bool
- process_randomization(max_cpus: int = -1, max_queue_size: int = -1) None ¶
Performs a randomization test on the sample dates.
Tests if the sample dates represent a uniform or normal distribution in time, depending on the Model.uniform parameter.
- Parameters:
max_cpus (int, optional) – Maximum number of CPUs to use for parallel processing. If -1, all available CPUs are used. Defaults to -1.
max_queue_size (int, optional) – Maximum queue size for parallel processing. If -1, the queue size is unlimited. Defaults to -1.
- Returns:
None
- property random_lower: ndarray¶
Lower bound of the randomization test. The lower bound is represented as an array where each element is the probability of the calendar year.
- Returns:
An array of the lower bound of the randomization test if it exists, otherwise None.
- Return type:
np.ndarray or None
- property random_p: float¶
The calculated p-value from the randomization test.
- Returns:
The p-value if the randomization test has been performed, otherwise None.
- Return type:
float or None
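One way to read random_p is to compare it with the p_value threshold. The sketch below assumes the usual convention that a p-value below the threshold means the null model (uniform or normal, depending on the uniform flag) is rejected. That direction is an assumption, and interpret_randomization is a hypothetical helper.

```python
def interpret_randomization(random_p, p_value=0.05, uniform=False):
    """Describe the randomization test result in words."""
    if random_p is None:
        return "randomization test not performed"
    null_model = "uniform" if uniform else "normal"
    if random_p < p_value:
        # Assumed convention: a low p-value rejects the null model.
        return f"dates deviate from a {null_model} distribution (p = {random_p:.3f})"
    return f"dates are consistent with a {null_model} distribution (p = {random_p:.3f})"
```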
- property random_upper: ndarray¶
Upper bound of the randomization test. The upper bound is represented as an array where each element is the probability of the calendar year.
- Returns:
An array of the upper bound of the randomization test if it exists, otherwise None.
- Return type:
np.ndarray or None
- reset_model() None ¶
Resets the model.
Resets all calculated properties of the model to their initial state. It also sets the posterior distribution of each sample in the model to None.
- Returns:
None
- property samples: Dict[str, Sample]¶
A dictionary of samples associated with the model.
- Returns:
A dictionary where the keys are the sample names and the values are Sample objects.
- Return type:
Dict[str, Sample]
- save(zipped: bool = False) None ¶
Saves the model to a JSON file.
The file is saved in the model’s directory.
- Parameters:
zipped (bool) – If True, the model is saved as a zipped JSON file. Defaults to False.
- Returns:
None
- save_csv_phases(fcsv: str | None = None) str ¶
Saves the results for phases to a CSV file.
- Parameters:
fcsv (str, optional) – The file path for the CSV file. If None, a default file name and path are used. Defaults to None.
- Returns:
The file path the results were saved to.
- Return type:
str
- save_csv_samples(fcsv: str | None = None) str ¶
Saves the results for samples to a CSV file.
- Parameters:
fcsv (str, optional) – The file path for the CSV file. If None, a default file name and path are used. Defaults to None.
- Returns:
The file path the results were saved to.
- Return type:
str
- save_outliers(fname: str | None = None) str ¶
Saves a list of outliers to a text file.
- Parameters:
fname (str, optional) – The file name to save the outliers to. If None, a default file name is used. Defaults to None.
- Returns:
The file name the outliers were saved to.
- Return type:
str
- property summed: ndarray¶
Summed probability distribution of the dating of all samples. The summed probability is represented as an array where each element is the probability of the calendar year.
- Returns:
An array of the summed probability distribution if it exists, otherwise None.
- Return type:
np.ndarray or None
- to_oxcal(fname: str | None = None) str ¶
Exports the model to an OxCal file.
Generates an OxCal file from the current model. The OxCal file can be used for further analysis in the OxCal software. If a file name is provided, the OxCal file is saved to that file. If no file name is provided, a default file name is used.
- Parameters:
fname (str, optional) – The file name to save the OxCal file to. If None, a default file name is used. Defaults to None.
- Returns:
The file name the OxCal file was saved to.
- Return type:
str
- property uncertainties: List[float]¶
List of uncertainties from C-14 dates of samples.
- Returns:
A list of uncertainties from C-14 dates of samples.
- Return type:
List[float]
- property uncertainty_base: float¶
The base uncertainty for the radiocarbon dates.
- Returns:
The base uncertainty for the radiocarbon dates.
- Return type:
float
- property uniform: bool¶
Flag indicating whether the model tests for a uniform distribution of the calendar ages.
- Returns:
True if a uniform distribution is used, False otherwise.
- Return type:
bool
- update_params(**kwargs) (Dict[str, List], set) ¶
Updates the model parameters and resets calculated attributes if necessary.
Accepts keyword arguments that correspond to the model parameters. If a parameter is provided that differs from the current value, the model parameter is updated and all related calculated attributes are reset.
- Parameters:
kwargs (dict) – Keyword arguments corresponding to the model parameters.
- Returns:
(reset_assigned, reset_calculated); reset_assigned: {parameter: [old value, new value], …}; parameters and their values that have been updated; reset_calculated: {attribute, …}; calculated attributes that have been reset
- Return type:
(Dict[str, List], set)
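The returned tuple can be unpacked to report what changed. The sketch below formats a value shaped like the documented return value. The helper summarize_update is hypothetical, and the attribute names in the example set are made up for illustration.

```python
def summarize_update(result):
    """Turn (reset_assigned, reset_calculated) into readable lines."""
    reset_assigned, reset_calculated = result
    lines = [
        f"{name}: {old!r} -> {new!r}"
        for name, (old, new) in sorted(reset_assigned.items())
    ]
    if reset_calculated:
        lines.append("reset: " + ", ".join(sorted(reset_calculated)))
    return lines


# Hypothetical return value in the documented layout.
example = ({"p_value": [0.05, 0.01]}, {"random_p", "clusters"})
summary = summarize_update(example)
```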
- property use_wasserstein: bool¶
Flag indicating whether the model uses Wasserstein distance for the clustering distance matrix.
- Returns:
True if Wasserstein distance is used, False otherwise.
- Return type:
bool
- property years: ndarray¶
Calendar years BP corresponding to the probability distributions.
- Returns:
An array of calendar years.
- Return type:
np.ndarray