Model Class

class sitesyncro.Model.Model(**kwargs)

A class representing a Bayesian model of dated samples interconnected by stratigraphic relationships.

Parameters:
  • directory (str) – Working directory for model data (default is “model”).

  • samples (list) – List of samples as instances of the class [Sample](#sample_class)

  • curve_name (str) – The name of the calibration curve to use (default is “intcal20.14c”).

  • phase_model (str) – OxCal phase model type. Can be ‘sequence’, ‘contiguous’, ‘overlapping’, or ‘none’ (default is “sequence”).

  • cluster_n (int) – Number of clusters to form (-1 = automatic; default is -1).

  • cluster_selection (str) – The method used to select the optimal number of clusters. Can be ‘silhouette’ or ‘mcst’ (default is ‘silhouette’).

  • use_wasserstein (bool) – Use Wasserstein distance to calculate the distance matrix for clustering.

  • uniform (bool) – Flag indicating whether to use uniform randomization (default is False).

  • p_value (float) – The p-value threshold for statistical tests (default is 0.05).

  • uncertainty_base (float) – The base uncertainty for randomization (default is 15).

  • npass (int) – Minimum number of passes for the randomization tests (default is 100).

  • convergence (float) – Convergence threshold for the randomization tests (default is 0.99).

  • oxcal_url (str) – URL from which to download the OxCal program (default is “https://c14.arch.ox.ac.uk/OxCalDistribution.zip”).

  • overwrite (bool) – Flag indicating whether to overwrite existing data in the model directory (default is False).

add_sample(*args, **kwargs) -> None

Adds a sample to the model.

Accepts either a single argument of type Sample, which is added to the model’s samples directly, or a set of positional and keyword arguments, which are used to create a new Sample instance that is then added to the model’s samples.

Parameters:
  • args – Either a single argument of type Sample or multiple arguments to create a new Sample instance.

  • kwargs – Keyword arguments to create a new Sample instance.

Returns:

None
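
The dispatch described above can be illustrated with a minimal sketch. The `Sample` dataclass and the `SampleStore` container below are stand-ins invented for this illustration; the real Sample class takes its own arguments, and this is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sample:
    # Stand-in for the library's Sample class; these fields are illustrative only.
    name: str
    age: float
    uncertainty: float


class SampleStore:
    """Hypothetical container that mimics the documented add_sample dispatch."""

    def __init__(self) -> None:
        self.samples: Dict[str, Sample] = {}

    def add_sample(self, *args, **kwargs) -> None:
        if len(args) == 1 and not kwargs and isinstance(args[0], Sample):
            # A single Sample instance is stored directly.
            sample = args[0]
        else:
            # Anything else is forwarded to the Sample constructor.
            sample = Sample(*args, **kwargs)
        self.samples[sample.name] = sample


store = SampleStore()
store.add_sample(Sample("S1", 3000.0, 30.0))      # existing instance
store.add_sample("S2", 2950.0, uncertainty=25.0)  # constructor arguments
```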

property areas: List[str]

Unique area names extracted from the samples.

Returns:

A sorted list of unique area names.

Return type:

List[str]

property cluster_means: Dict[int, Dict[int, float]]

Mean date of the samples in each cluster in calendar years BP.

Returns:

{clusters_n: {cluster: year, …}, …}; clusters_n = number of clusters

Return type:

Dict[int, Dict[int, float]]

property cluster_n: int

Number of clusters to form (-1 = automatic).

Returns:

The number of clusters to form.

Return type:

int

property cluster_opt_n: int

Optimal number of clusters based on the silhouette scores.

The optimal number of clusters is the one that maximizes the average silhouette score for clustering solutions for which the p-value is lower than Model.p_value.

Returns:

The optimal number of clusters if the clustering has been performed, None otherwise.

Return type:

int or None
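
The selection rule stated above (highest silhouette score among solutions whose p-value is below Model.p_value) can be sketched in a few lines. The function name `select_opt_n` and the example values are assumptions made for illustration; the inputs follow the shapes documented for cluster_sils and cluster_ps.

```python
from typing import Dict, Optional


def select_opt_n(
    sils: Dict[int, float], ps: Dict[int, float], p_value: float = 0.05
) -> Optional[int]:
    """Return the number of clusters with the best silhouette score
    among the solutions that are significant at the given p-value."""
    # Keep only the solutions whose p-value is below the threshold.
    significant = [n for n in sils if ps.get(n, 1.0) < p_value]
    if not significant:
        return None
    # Among those, pick the one with the highest silhouette score.
    return max(significant, key=lambda n: sils[n])


# Invented example values: 4 clusters score best but are not significant.
sils = {2: 0.40, 3: 0.60, 4: 0.70}
ps = {2: 0.01, 3: 0.03, 4: 0.20}
best = select_opt_n(sils, ps, p_value=0.05)
```

In this example the four-cluster solution is skipped because its p-value exceeds the threshold, so the three-cluster solution is chosen.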

property cluster_ps: Dict[int, float]

P-values of the clustering solutions.

Returns:

{clusters_n: p, …}; clusters_n = number of clusters

Return type:

Dict[int, float]

property cluster_selection: str

The method used to select the optimal number of clusters.

Returns:

The method used to select the optimal number of clusters. Can be ‘silhouette’ or ‘mcst’.

Return type:

str

property cluster_sils: Dict[int, float]

Silhouette scores of each clustering solution.

The silhouette score is a measure of how similar an object is to its own cluster compared to other clusters.

Returns:

{clusters_n: silhouette, …}; clusters_n = number of clusters

Return type:

Dict[int, float]
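
As a reminder of what a silhouette score measures, the sketch below computes the mean silhouette for points on a line, using the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other members of the same cluster and b is the mean distance to the nearest other cluster. It assumes plain numeric dates and absolute differences as the distance; the library clusters probability distributions, so its distance measure differs.

```python
from statistics import mean
from typing import Dict, List


def mean_silhouette(clusters: Dict[int, List[float]]) -> float:
    """Mean silhouette score for 1D points grouped by cluster label.

    Requires at least two clusters and not all points identical.
    """
    scores = []
    for label, points in clusters.items():
        for i, x in enumerate(points):
            if len(points) == 1:
                # Common convention: a singleton cluster scores 0.
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = mean([abs(x - y) for j, y in enumerate(points) if j != i])
            # b: mean distance to the members of the nearest other cluster.
            b = min(
                mean([abs(x - y) for y in other])
                for other_label, other in clusters.items()
                if other_label != label
            )
            scores.append((b - a) / max(a, b))
    return mean(scores)


well_separated = mean_silhouette({0: [0.0, 1.0], 1: [10.0, 11.0]})
poorly_separated = mean_silhouette({0: [0.0, 10.0], 1: [1.0, 11.0]})
```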

property clusters: Dict[int, Dict[int, List[str]]]

Clusters of samples based on the similarity of their probability distributions.

Returns:

{clusters_n: {cluster: [sample name, …], …}, …}; clusters_n = number of clusters

Return type:

Dict[int, Dict[int, List[str]]]
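
The nested structure can be easier to read with a concrete value. The sketch below builds a dictionary of the documented shape by hand (the sample names are invented) and shows two typical lookups; the helper names are assumptions for illustration, not part of the class.

```python
from typing import Dict, List, Optional

# {clusters_n: {cluster: [sample name, ...], ...}, ...}
clusters: Dict[int, Dict[int, List[str]]] = {
    2: {0: ["S1", "S2", "S3"], 1: ["S4", "S5"]},
    3: {0: ["S1", "S2"], 1: ["S3"], 2: ["S4", "S5"]},
}


def cluster_sizes(clusters: Dict[int, Dict[int, List[str]]]) -> Dict[int, Dict[int, int]]:
    """Number of samples per cluster for every clustering solution."""
    return {
        n: {label: len(names) for label, names in solution.items()}
        for n, solution in clusters.items()
    }


def cluster_of(
    clusters: Dict[int, Dict[int, List[str]]], clusters_n: int, sample: str
) -> Optional[int]:
    """Label of the cluster containing the sample in the given solution."""
    for label, names in clusters[clusters_n].items():
        if sample in names:
            return label
    return None


sizes = cluster_sizes(clusters)
label = cluster_of(clusters, 3, "S3")
```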

property contexts: List[str]

Unique context names extracted from the samples.

Returns:

A sorted list of unique context names.

Return type:

List[str]

property convergence: float

The convergence threshold for the randomization test.

Returns:

The convergence threshold for the randomization tests.

Return type:

float

copy(directory: str) -> object

Creates a copy of the current model with a new directory.

Creates a new instance of the Model class with the same parameters and data as the current model, but with a different directory. The new directory is provided as an argument.

Parameters:

directory (str) – The directory for the new model.

Returns:

A new instance of the Model class with the same data as the current model but a different directory.

Return type:

Model

property curve: ndarray

Radiocarbon calibration curve.

2D array containing the calibration curve data. Each row represents a calendar year BP, C-14 year, and uncertainty.

Returns:

An array of the calibration curve.

Return type:

np.ndarray

property curve_name: str

File name of the radiocarbon age calibration curve (see OxCal/bin directory).

Returns:

The name of the calibration curve.

Return type:

str

del_sample(name: str) -> None

Deletes a sample from the model.

Removes a sample from the model’s samples based on the provided sample name.

Parameters:

name (str) – The name of the sample to be deleted.

Returns:

None

property directory: str

The directory where the model data is stored.

Returns:

The directory where the model data is stored.

Return type:

str

property groups: Dict[str, List[str]]

Groups that the samples belong to based on stratigraphic interconnection with other samples. The groups are represented as a dictionary where the keys are the group names and the values are lists of sample names.

Returns:

A dictionary where the keys are the group names and the values are lists of sample names.

Return type:

Dict[str, List[str]]
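
Grouping by stratigraphic interconnection amounts to finding connected components in the graph of stratigraphic relations. The sketch below is one way to do that, assuming the relations are given as pairs of sample names; the function name, the input format, and the lack of group names are simplifications for illustration, and the library may derive groups differently.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_groups(
    samples: Iterable[str], relations: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Split samples into groups of directly or indirectly related samples."""
    # Build an undirected adjacency map; the direction of a relation
    # does not matter for deciding which samples are interconnected.
    neighbours: Dict[str, Set[str]] = {name: set() for name in samples}
    for a, b in relations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    groups: List[List[str]] = []
    seen: Set[str] = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


groups = find_groups(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

Here samples A, B and C are linked through B and form one group, while D has no relations and forms a group of its own.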

property has_data: bool

Checks if the model has any associated sample data.

Returns:

True if the model has sample data, False otherwise.

Return type:

bool

import_csv(fname: str) -> None

Loads sample data from a CSV file.

The file should be in the following format:
  • Each line represents a data record.

  • Data fields are separated by semicolons.

  • The first line is a header and is skipped.

  • Each line should have 11 fields: Sample, Context, Area, C14 Age, Uncertainty, Excavation Area Phase, Earlier-Than, Site Phase, Long-Lived, Redeposited, Outlier.

Parameters:

fname (str) – The file path of the CSV file to be imported.

Raises:

ValueError – If the input file does not exist or is not formatted correctly.

Returns:

None
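
To make the expected file layout concrete, the sketch below parses text in the format described above using only the standard library. It is not the library's parser: the function name `read_samples`, the example rows, and the way flags and empty fields are written (0/1 and empty strings) are assumptions, and all values are kept as strings.

```python
import csv
import io
from typing import Dict, List

FIELDS = [
    "Sample", "Context", "Area", "C14 Age", "Uncertainty",
    "Excavation Area Phase", "Earlier-Than", "Site Phase",
    "Long-Lived", "Redeposited", "Outlier",
]


def read_samples(text: str) -> List[Dict[str, str]]:
    """Parse semicolon-separated records, skipping the header line."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader, None)  # the first line is a header and is skipped
    records = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(FIELDS):
            raise ValueError(
                f"line {line_no}: expected {len(FIELDS)} fields, got {len(row)}"
            )
        records.append(dict(zip(FIELDS, row)))
    return records


# Invented example content with the 11 documented columns.
example = "\n".join([
    ";".join(FIELDS),
    "S1;C1;A;3000;30;1;;1;0;0;0",
    "S2;C2;A;2950;25;2;;1;0;0;0",
])
records = read_samples(example)
```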

property is_clustered: bool

Checks if the model has been clustered.

Returns:

True if the model has been clustered, False otherwise.

Return type:

bool

property is_modeled: bool

Checks if Bayesian modeling of sample dates has been performed for all samples.

Returns:

True if Bayesian modeling has been performed for all samples, False otherwise.

Return type:

bool

property is_randomized: bool

Checks if the randomization test has been performed.

Returns:

True if the randomization test has been performed, False otherwise.

Return type:

bool

load(directory: str | None = None) -> bool

Loads the model from a JSON file.

Attempts to load the model data from a JSON file located in the provided directory. If no directory is provided, it uses the model’s current directory. Supports both regular and zipped JSON files.

Parameters:

directory (str, optional) – The directory from where the model data should be loaded. If None, the model’s current directory is used.

Returns:

True if the model data was successfully loaded, False otherwise.

Return type:

bool

load_oxcal_data() -> None

Loads the OxCal data associated with the model.

Attempts to load the OxCal data from a file named ‘model.js’ located in the model’s directory. The loaded data is then stored in the model’s ‘oxcal_data’ attribute.

Returns:

None

property npass: int

The minimum number of passes for the randomization test.

Returns:

The minimum number of passes.

Return type:

int

property outlier_candidates: List[str]

List of candidates for outliers, from which the final outliers to be eliminated were picked. These samples have conflicts between dating ranges and stratigraphic relationships with other samples.

Returns:

A list of sample names that are considered candidates for outliers.

Return type:

List[str]

property outliers: List[str]

List of outliers among samples which need to be removed for the model to be valid.

Returns:

A list of sample names that are considered outliers.

Return type:

List[str]

property oxcal_data: dict

The OxCal data associated with the model.

Returns:

A dictionary containing the OxCal data if it exists, otherwise None.

Return type:

dict or None

property oxcal_url: str

The URL from where the OxCal software can be downloaded.

Returns:

The URL of the OxCal software.

Return type:

str

property p_value: float

The p-value used for the randomization test.

Returns:

The p-value threshold for statistical tests.

Return type:

float

property phase_model: str

OxCal phase model type.

Returns:

The type of the phase model. Can be ‘sequence’, ‘contiguous’, ‘overlapping’, or ‘none’.

Return type:

str

property phases: List[Phase]

Returns:

A dictionary where the keys are (group, phase) and the values are Phase objects.

Return type:

Dict[tuple, Phase]

plot_clusters(fname: str | None = None, show: bool = False) -> str

Plots the clustering results.

The plot can either be saved to a file or displayed.

Parameters:
  • fname (str, optional) – The file name to save the plot to. If None, a default file name is used. Defaults to None.

  • show (bool, optional) – If True, the plot is displayed. Defaults to False.

Returns:

The file name the plot was saved to.

Return type:

str

plot_randomized(fname: str | None = None, show: bool = False) -> str

Plots the results of the randomization test.

The plot can either be saved to a file or displayed.

Parameters:
  • fname (str, optional) – The file name to save the plot to. If None, a default file name is used. Defaults to None.

  • show (bool, optional) – If True, the plot is displayed. Defaults to False.

Returns:

The file name the plot was saved to.

Return type:

str

process(by_clusters: bool = False, by_dates: bool = False, max_cpus: int = -1, max_queue_size: int = -1, max_clusters: int = -1, save: bool = False) -> None

Processes the complete model.

Performs the following steps:
  1. Modeling stratigraphy to determine phasing

  2. Finding outliers

  3. Bayesian modeling of C-14 dates

  4. Testing the distribution of dates for randomness

  5. Clustering temporal distributions

Parameters:
  • by_clusters (bool, optional) – If True, update the phasing by clustering sample dates. Defaults to False.

  • by_dates (bool, optional) – If True, update the phasing by comparing sample dates. Defaults to False.

  • max_cpus (int, optional) – Maximum number of CPUs to use for parallel processing. If -1, all available CPUs are used. Defaults to -1.

  • max_queue_size (int, optional) – Maximum queue size for parallel processing. If -1, the queue size is unlimited. Defaults to -1.

  • max_clusters (int, optional) – Maximum number of clusters to create. Defaults to -1.

Returns:

None
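
The five steps map naturally onto the individual process_* methods documented in this reference. The sketch below chains them in the listed order, under the assumption that process() behaves roughly like this sequence; the actual method may repeat or reorder steps (for example when by_clusters is set), and the `RecordingModel` stand-in exists only to make the example runnable without OxCal.

```python
def run_pipeline(model, by_clusters=False, by_dates=False,
                 max_cpus=-1, max_queue_size=-1, max_clusters=-1):
    """Call the documented processing steps one after another."""
    # 1. Modeling stratigraphy to determine phasing
    model.process_phasing(by_clusters=by_clusters, by_dates=by_dates)
    # 2. Finding outliers
    model.process_outliers()
    # 3. Bayesian modeling of C-14 dates
    model.process_dates()
    # 4. Testing the distribution of dates for randomness
    model.process_randomization(max_cpus=max_cpus, max_queue_size=max_queue_size)
    # 5. Clustering temporal distributions
    model.process_clustering(max_cpus=max_cpus, max_queue_size=max_queue_size,
                             max_clusters=max_clusters)


class RecordingModel:
    """Stand-in that records which step was called instead of doing work."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any unknown attribute becomes a no-op step that logs its name.
        def step(*args, **kwargs):
            self.calls.append(name)
        return step


demo = RecordingModel()
run_pipeline(demo, max_cpus=2)
```

With a real Model instance in place of the stand-in, the same function would run each step separately, which can help when only part of the pipeline needs to be repeated.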

process_clustering(max_cpus=-1, max_queue_size=-1, max_clusters=-1) -> None

Performs clustering on the sample dates.

Clusters the sample dates and uses randomization testing to find the optimal clustering solution.

Parameters:
  • max_cpus (int, optional) – Maximum number of CPUs to use for parallel processing. If -1, all available CPUs are used. Defaults to -1.

  • max_queue_size (int, optional) – Maximum queue size for parallel processing. If -1, the queue size is unlimited. Defaults to -1.

  • max_clusters (int, optional) – Maximum number of clusters to create. Defaults to -1.

Returns:

None

process_dates() -> None

Calculates the posterior probabilities of sample dates based on phasing using Bayesian modeling in OxCal.

Generates an OxCal file from the current model and runs the OxCal software on it. The results of the OxCal modeling are then loaded back into the model. The calculated posterior probabilities can be retrieved via the samples’ attributes.

Note: This method resets all calculated attributes that depend on the dating posteriors.

Returns:

None

process_outliers() -> None

Identifies and marks dating outliers among the samples in the model.

Finds dating outliers among the samples which need to be removed for the model to be valid. The outliers are identified based on conflicts between their dating ranges and stratigraphic relationships with other samples. The identified outliers can be retrieved via the attributes Model.outliers and Model.outlier_candidates.

Returns:

None
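
The kind of conflict described above can be illustrated with a simplified check. The sketch assumes each sample has a single dating range in calendar years BP, given as (oldest, youngest), and that relations are (earlier, later) pairs; it flags both samples of a pair when the supposedly earlier sample is entirely younger than the later one. The function name and input format are hypothetical, and the library's own procedure for choosing the final outliers among the candidates is not reproduced here.

```python
from typing import Dict, List, Tuple


def find_conflicts(
    ranges: Dict[str, Tuple[float, float]],
    earlier_than: List[Tuple[str, str]],
) -> List[str]:
    """Return sample names involved in a range/stratigraphy conflict.

    ranges: name -> (oldest, youngest) limit in calendar years BP.
    earlier_than: (earlier, later) pairs of sample names.
    """
    candidates = set()
    for earlier, later in earlier_than:
        earlier_oldest, _ = ranges[earlier]
        _, later_youngest = ranges[later]
        # In years BP a larger number is older. The pair conflicts when even
        # the oldest limit of the "earlier" sample is younger than the
        # youngest limit of the "later" sample.
        if earlier_oldest < later_youngest:
            candidates.update((earlier, later))
    return sorted(candidates)


# Invented ranges: A should be earlier than B but dates entirely younger.
ranges = {"A": (3000, 2900), "B": (3200, 3100), "C": (2800, 2700)}
relations = [("A", "B"), ("B", "C")]
conflicting = find_conflicts(ranges, relations)
```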

process_phasing(by_clusters: bool = False, by_dates: bool = False) -> bool

Updates the phasing of samples based on stratigraphic relations.

Updates the groups and phases of samples based on their stratigraphic relations.

Parameters:
  • by_clusters (bool, optional) – If True, update the phasing by clustering sample dates. Defaults to False.

  • by_dates (bool, optional) – If True, update the phasing by comparing sample dates. Defaults to False.

Returns:

True if phasing has changed, False otherwise.

Return type:

bool

process_randomization(max_cpus: int = -1, max_queue_size: int = -1) -> None

Performs a randomization test on the sample dates.

Tests if the sample dates represent a uniform or normal distribution in time, depending on the Model.uniform parameter.

Parameters:
  • max_cpus (int, optional) – Maximum number of CPUs to use for parallel processing. If -1, all available CPUs are used. Defaults to -1.

  • max_queue_size (int, optional) – Maximum queue size for parallel processing. If -1, the queue size is unlimited. Defaults to -1.

Returns:

None
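
The general idea of a randomization (Monte Carlo) test can be sketched as follows: compare a statistic of the observed dates with the same statistic computed on many randomly generated date sets, and report the share of random sets that look at least as extreme. This is a generic illustration under strong simplifications, not the library's procedure: it uses point dates instead of calibrated probability distributions, an invented statistic (`gap_spread`), a uniform null model only, and a fixed number of passes rather than a convergence criterion.

```python
import random
from statistics import pstdev
from typing import List


def gap_spread(dates: List[float]) -> float:
    """Spread of the gaps between consecutive dates; large when dates clump."""
    ordered = sorted(dates)
    return pstdev([b - a for a, b in zip(ordered, ordered[1:])])


def monte_carlo_p(dates: List[float], t_min: float, t_max: float,
                  npass: int = 1000, seed: int = 0) -> float:
    """Share of uniformly random date sets at least as clumped as the data."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = gap_spread(dates)
    hits = 0
    for _ in range(npass):
        simulated = [rng.uniform(t_min, t_max) for _ in dates]
        if gap_spread(simulated) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (npass + 1)


# Two tight clumps of invented dates, about 1000 years apart.
clumped = [1000, 1001, 1002, 1003, 2000, 2001, 2002, 2003]
p = monte_carlo_p(clumped, 1000, 2003)
```

A small p indicates that dates this clumped would rarely arise from a uniform distribution over the same interval.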

property random_lower: ndarray

Lower bound of the randomization test. The lower bound is represented as an array where each element is the probability of the calendar year.

Returns:

An array of the lower bound of the randomization test if it exists, otherwise None.

Return type:

np.ndarray or None

property random_p: float

The calculated p-value from the randomization test.

Returns:

The p-value if the randomization test has been performed, otherwise None.

Return type:

float or None

property random_upper: ndarray

Upper bound of the randomization test. The upper bound is represented as an array where each element is the probability of the calendar year.

Returns:

An array of the upper bound of the randomization test if it exists, otherwise None.

Return type:

np.ndarray or None

reset_model() -> None

Resets the model.

Resets all calculated properties of the model to their initial state. It also sets the posterior distribution of each sample in the model to None.

Returns:

None

property samples: Dict[str, Sample]

A dictionary of samples associated with the model.

Returns:

A dictionary where the keys are the sample names and the values are Sample objects.

Return type:

Dict[str, Sample]

save(zipped: bool = False) -> None

Saves the model to a JSON file.

The file is saved in the model’s directory.

Parameters:

zipped (bool) – If True, the model is saved as a zipped JSON file. Defaults to False.

Returns:

None
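
Saving to a plain or zipped JSON file and reading it back can be sketched with the standard library alone. The file names ("model.json", "model.json.zip"), the helper names, and the stored content below are assumptions for illustration; the names and layout that the Model class actually uses are not specified here.

```python
import json
import os
import tempfile
import zipfile
from typing import Optional


def save_json(data: dict, directory: str, zipped: bool = False,
              name: str = "model.json") -> str:
    """Write data as JSON, optionally inside a zip archive."""
    os.makedirs(directory, exist_ok=True)
    if zipped:
        path = os.path.join(directory, name + ".zip")  # hypothetical naming
        with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(name, json.dumps(data))
    else:
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)
    return path


def load_json(directory: str, name: str = "model.json") -> Optional[dict]:
    """Read the plain JSON file if present, else the zipped one, else None."""
    plain = os.path.join(directory, name)
    if os.path.isfile(plain):
        with open(plain, encoding="utf-8") as f:
            return json.load(f)
    archive = plain + ".zip"
    if os.path.isfile(archive):
        with zipfile.ZipFile(archive) as zf:
            return json.loads(zf.read(name))
    return None


with tempfile.TemporaryDirectory() as tmp:
    save_json({"curve_name": "intcal20.14c", "p_value": 0.05}, tmp, zipped=True)
    restored = load_json(tmp)
    missing = load_json(os.path.join(tmp, "empty"))
```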

save_csv_phases(fcsv: str | None = None) -> str

Saves the results for phases to a CSV file.

Parameters:

fcsv (str, optional) – The file path for the CSV file. If None, a default file name and path are used. Defaults to None.

Returns:

The file path the results were saved to.

Return type:

str

save_csv_samples(fcsv: str | None = None) -> str

Saves the results for samples to a CSV file.

Parameters:

fcsv (str, optional) – The file path for the CSV file. If None, a default file name and path are used. Defaults to None.

Returns:

The file path the results were saved to.

Return type:

str

save_outliers(fname: str | None = None) -> str

Saves a list of outliers to a text file.

Parameters:

fname (str, optional) – The file name to save the outliers to. If None, a default file name is used. Defaults to None.

Returns:

The file name the outliers were saved to.

Return type:

str

property summed: ndarray

Summed probability distribution of the dating of all samples. The summed probability is represented as an array where each element is the probability of the calendar year.

Returns:

An array of the summed probability distribution if it exists, otherwise None.

Return type:

np.ndarray or None

to_oxcal(fname: str | None = None) -> str

Exports the model to an OxCal file.

Generates an OxCal file from the current model. The OxCal file can be used for further analysis in the OxCal software. If a file name is provided, the OxCal file is saved to that file. If no file name is provided, a default file name is used.

Parameters:

fname (str, optional) – The file name to save the OxCal file to. If None, a default file name is used. Defaults to None.

Returns:

The file name the OxCal file was saved to.

Return type:

str

property uncertainties: List[float]

List of uncertainties from C-14 dates of samples.

Returns:

A list of uncertainties from C-14 dates of samples.

Return type:

List[float]

property uncertainty_base: float

The base uncertainty for the radiocarbon dates.

Returns:

The base uncertainty for the radiocarbon dates.

Return type:

float

property uniform: bool

Flag indicating whether the model tests for a uniform distribution of the calendar ages.

Returns:

True if a uniform distribution is used, False otherwise.

Return type:

bool

update_params(**kwargs) -> (Dict[str, List], set)

Updates the model parameters and resets calculated attributes if necessary.

Accepts keyword arguments that correspond to the model parameters. If a parameter is provided that differs from the current value, the model parameter is updated and all related calculated attributes are reset.

Parameters:

kwargs (dict) – Keyword arguments corresponding to the model parameters.

Returns:

(reset_assigned, reset_calculated); reset_assigned: {parameter: [old value, new value], …}; parameters and their values that have been updated; reset_calculated: {attribute, …}; calculated attributes that have been reset

Return type:

(Dict[str, List], set)
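
The shape of the returned pair can be illustrated with a standalone sketch. The `DEPENDENTS` mapping (which calculated attributes depend on which parameter) is invented for this example, as is the choice to ignore unknown parameter names; the real dependencies are defined by the library.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical mapping from parameter to the calculated attributes it affects.
DEPENDENTS: Dict[str, Set[str]] = {
    "curve_name": {"summed", "random_p", "clusters"},
    "p_value": {"random_p", "clusters"},
    "npass": {"random_p", "clusters"},
}


def update_params(
    params: Dict[str, Any], **kwargs: Any
) -> Tuple[Dict[str, List[Any]], Set[str]]:
    """Apply changed parameters and report what was updated and reset."""
    reset_assigned: Dict[str, List[Any]] = {}
    reset_calculated: Set[str] = set()
    for name, new_value in kwargs.items():
        # Unknown names and unchanged values are skipped in this sketch.
        if name not in params or params[name] == new_value:
            continue
        reset_assigned[name] = [params[name], new_value]
        params[name] = new_value
        reset_calculated |= DEPENDENTS.get(name, set())
    return reset_assigned, reset_calculated


params = {"curve_name": "intcal20.14c", "p_value": 0.05, "npass": 100}
assigned, calculated = update_params(params, p_value=0.01, npass=100)
```

Only p_value differs from its current value here, so it is the only entry in the first element of the result, and npass triggers no reset.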

property use_wasserstein: bool

Flag indicating whether the model uses Wasserstein distance for the clustering distance matrix.

Returns:

True if Wasserstein distance is used, False otherwise.

Return type:

bool

property years: ndarray

Calendar years BP corresponding to the probability distributions.

Returns:

An array of calendar years.

Return type:

np.ndarray