# Testing Documentation

## Running Tests

- Run all tests: `make test`
- Run a specific test file: `pytest tests/unit/test_mlflow_analytics.py -v`
- Run with coverage: `pytest --cov=src --cov-report=html`
## Test Logging

The project writes separate log files for the application and the tests:

- Application logs: `logs/fraud_detection.log`
- Test logs: `logs/pytest.log`

This is achieved by a special configuration in `src/utils/logger.py` that detects when pytest is running and modifies the logging configuration accordingly.

When pytest is running, the `setup_logging` function in `src/utils/logger.py` does the following:

- It preserves any existing logging handlers that pytest has configured.
- It adds a dedicated handler for the `logs/pytest.log` file.
- It forces all loggers to propagate to the root logger, which ensures that all logs (including those from third-party libraries) are captured in `logs/pytest.log`.

This setup ensures that the test logs are captured in a separate file without interfering with the application's logging.
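For reference, a minimal sketch of how this pytest detection can work. The real `setup_logging` in `src/utils/logger.py` may differ; the log format, log levels, and handler details below are assumptions.

```python
import logging
import sys
from pathlib import Path

LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"  # assumed format


def setup_logging() -> None:
    """Sketch of pytest-aware logging setup (not the project's actual code)."""
    Path("logs").mkdir(exist_ok=True)
    root = logging.getLogger()

    # pytest appears in sys.modules once the test session has started.
    if "pytest" in sys.modules:
        # Keep any handlers pytest has already configured; only add our own.
        handler = logging.FileHandler("logs/pytest.log")
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        root.addHandler(handler)

        # Force propagation so third-party loggers also reach the root handler.
        for logger in logging.Logger.manager.loggerDict.values():
            if isinstance(logger, logging.Logger):
                logger.propagate = True
    else:
        logging.basicConfig(
            filename="logs/fraud_detection.log",
            level=logging.INFO,
            format=LOG_FORMAT,
        )
```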
## Files

- `./tests/`
- `./tests/conftest.py`
- `./tests/integration/`
- `./tests/integration/test_feast_basic.py`
- `./tests/unit/`
- `./tests/unit/test_data_loader.py`
- `./tests/unit/test_data_profiler.py`
- `./tests/unit/test_data_splitter.py`
- `./tests/unit/test_data_validator.py`
- `./tests/unit/test_mlflow_analytics.py`
- `./tests/unit/test_mlflow_duckdb_setup.py`
### `tests/`

- Purpose: This directory contains all the tests for the project, including unit and integration tests.
- Usage: It is used to ensure the quality and correctness of the code.
- Key Information:
  - It is organized into `unit/` and `integration/` subdirectories.
  - `conftest.py` provides shared fixtures for all tests.
### `tests/conftest.py`

- Purpose: This file contains shared fixtures for the pytest tests.
- Usage: It is used to define fixtures that can be used by multiple tests, such as temporary directories, mock objects, and sample data.
- Key Information:
  - `temp_dir()`: A fixture that creates a temporary directory for test artifacts.
  - `mock_mlflow_config()`: A fixture that creates a mock MLflow configuration for testing.
  - `sample_experiment_data()`: A fixture that creates a sample DataFrame with MLflow experiment data for analytics testing.
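A minimal sketch of what these fixtures could look like. The fixture names come from this document, but the bodies, column names, and the `MagicMock`-based config are assumptions, not the project's actual implementation.

```python
from unittest.mock import MagicMock

import pandas as pd
import pytest


@pytest.fixture
def temp_dir(tmp_path):
    """Temporary directory for test artifacts, backed by pytest's tmp_path."""
    return tmp_path


@pytest.fixture
def mock_mlflow_config():
    """Stand-in MLflow configuration object; the attributes here are assumed."""
    config = MagicMock()
    config.tracking_uri = "sqlite:///mlflow_test.db"
    config.experiment_name = "test-experiment"
    return config


@pytest.fixture
def sample_experiment_data():
    """Small DataFrame resembling MLflow run data for analytics tests."""
    return pd.DataFrame(
        {
            "run_id": ["a1", "b2", "c3"],
            "metrics.accuracy": [0.91, 0.87, 0.93],
            "params.model_type": ["xgboost", "logreg", "xgboost"],
        }
    )
```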
### `tests/integration/`

- Purpose: This directory contains all the integration tests for the project.
- Usage: It is used to test the interaction between different components of the application.
- Key Information:
  - Tests are organized by the workflow they are testing.
### `tests/integration/test_feast_basic.py`

- Purpose: This file contains integration tests for the Feast feature store.
- Usage: It verifies that the feature store is correctly configured and that features can be retrieved from both the online and offline stores.
- Key Information:
  - It tests the loading of feature definitions.
  - It tests the retrieval of historical features from the offline store.
  - It tests the retrieval of online features from the online store.
  - It tests for consistency between the online and offline stores.
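As an illustration, a hedged sketch of an online-feature retrieval test against a Feast `FeatureStore`. The repo path, feature view name, feature name, and entity key below are hypothetical and not taken from the project.

```python
import pytest
from feast import FeatureStore


@pytest.fixture(scope="module")
def store():
    # Hypothetical repo path; point this at wherever feature_store.yaml lives.
    return FeatureStore(repo_path="feature_repo")


def test_online_features_are_returned(store):
    """Retrieve online features for one entity and check the values come back."""
    response = store.get_online_features(
        features=["transaction_stats:amount"],   # hypothetical feature view/field
        entity_rows=[{"transaction_id": 1}],     # hypothetical entity key
    ).to_dict()

    assert "amount" in response
    assert len(response["amount"]) == 1
```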
### `tests/unit/`

- Purpose: This directory contains all the unit tests for the project.
- Usage: It is used to test individual components of the application in isolation.
- Key Information:
  - Tests are organized by the module they are testing.
### `tests/unit/test_data_loader.py`

- Purpose: This file contains unit tests for the data loader.
- Usage: It is used to test the `load_data()` function in `src/data/data_loader.py` to ensure that it is working correctly.
- Key Information:
  - It tests for successful loading of a valid CSV file.
  - It tests that a `FileNotFoundError` is raised for a non-existent file.
  - It tests that a `ValueError` is raised for an empty CSV file.
  - It tests that invalid rows are skipped and valid ones are loaded.
  - It tests that an error is raised if no rows are valid.
  - It directly tests the `TransactionSchema` Pydantic schema.
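A minimal sketch of the error-path tests, assuming `load_data()` accepts a file path; the exact signature is an assumption.

```python
import pytest

from src.data.data_loader import load_data


def test_load_data_missing_file_raises(tmp_path):
    """A path that does not exist should raise FileNotFoundError."""
    with pytest.raises(FileNotFoundError):
        load_data(tmp_path / "does_not_exist.csv")


def test_load_data_empty_csv_raises(tmp_path):
    """An empty CSV should raise ValueError."""
    empty_csv = tmp_path / "empty.csv"
    empty_csv.write_text("")
    with pytest.raises(ValueError):
        load_data(empty_csv)
```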
### `tests/unit/test_data_processing.py`

- Purpose: This file contains unit tests for the `DataProcessor` class from `src/data/data_processing.py`.
- Usage: It verifies the correctness of each data transformation step.
- Key Information:
  - It uses pytest fixtures to create a `DataProcessor` instance and a sample DataFrame.
  - `test_standardize_converts_type_and_adds_timestamp`: Verifies that the `standardize` method correctly converts the `type` column to a categorical dtype and creates the `event_timestamp` column with the correct values.
  - `test_filter_transaction_types`: Ensures that the `filter_transaction_types` method correctly filters the DataFrame to only include `CASH_OUT` and `TRANSFER` types.
  - `test_encode_transaction_type`: Checks that the `encode_transaction_type` method correctly encodes the `type` column into a binary (0/1) format.
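For example, the filtering test could look roughly like this; the `DataProcessor` constructor arguments and the sample columns are assumptions.

```python
import pandas as pd
import pytest

from src.data.data_processing import DataProcessor


@pytest.fixture
def sample_df():
    # Minimal frame covering types the filter should keep and types it should drop.
    return pd.DataFrame(
        {
            "type": ["CASH_OUT", "TRANSFER", "PAYMENT", "DEBIT"],
            "amount": [100.0, 250.0, 30.0, 10.0],
        }
    )


def test_filter_transaction_types_keeps_only_cash_out_and_transfer(sample_df):
    processor = DataProcessor()  # assumed no-argument constructor
    filtered = processor.filter_transaction_types(sample_df)
    assert set(filtered["type"].unique()) == {"CASH_OUT", "TRANSFER"}
```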
### `tests/unit/test_data_profiler.py`

- Purpose: This file contains a comprehensive suite of unit tests for the `DataProfiler` class from `src/data/data_profiler.py`.
- Usage: It verifies that the data profiling and report generation logic is correct under a wide variety of conditions.
- Key Information:
  - It uses pytest fixtures to create a sample DataFrame and a `DataProfiler` instance for testing.
  - It tests the successful generation of the main profile, ensuring all expected statistical sections (`numerical_stats`, `categorical_stats`, etc.) are present.
  - It verifies that the profile can be correctly exported to a JSON file.
  - It includes numerous tests for specific data quality checks, such as the detection of high missing values, constant columns, and transaction balance inconsistencies.
  - It tests edge cases for the temporal and class distribution analyses, such as when the relevant columns (`step`, `isFraud`) are missing or contain no positive cases.
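A sketch of the profile-generation check, assuming `DataProfiler` is constructed from a DataFrame and exposes a `generate_profile()` method that returns a dict; both are assumptions based on the description above.

```python
import pandas as pd

from src.data.data_profiler import DataProfiler


def test_profile_contains_expected_sections():
    df = pd.DataFrame(
        {
            "amount": [10.0, 20.0, 30.0],
            "type": ["CASH_OUT", "TRANSFER", "CASH_OUT"],
        }
    )
    profiler = DataProfiler(df)            # assumed constructor
    profile = profiler.generate_profile()  # assumed method name

    # The document lists these sections as part of the main profile.
    for section in ("numerical_stats", "categorical_stats"):
        assert section in profile
```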
### `tests/unit/test_data_validator.py`

- Purpose: This file contains unit tests for the data validator.
- Usage: It is used to test the `DataValidator` class in `src/data/data_validator.py` to ensure that it is working correctly.
- Key Information:
  - It tests for successful validation of a valid DataFrame.
  - It tests that validation fails for incorrect data types, out-of-range values, disallowed categories, missing columns, and extra columns.
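One possible shape for a failure-path test, assuming `DataValidator.validate()` returns a result object with an `is_valid` flag; that interface is an assumption.

```python
import pandas as pd

from src.data.data_validator import DataValidator


def test_validation_fails_for_disallowed_category():
    df = pd.DataFrame(
        {
            "type": ["NOT_A_REAL_TYPE"],  # category outside the allowed set
            "amount": [100.0],
        }
    )
    validator = DataValidator()      # assumed no-argument constructor
    result = validator.validate(df)  # assumed method and return shape
    assert not result.is_valid
```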
### `tests/unit/test_data_splitter.py`

- Purpose: This file contains unit tests for the `DataSplitter` class.
- Usage: It verifies the core logic of the chronological data split without depending on the actual file system.
- Key Information:
  - It uses a mock configuration and an in-memory sample DataFrame to ensure the test is isolated.
  - It patches `pandas.read_parquet` and `pandas.DataFrame.to_parquet` to prevent any actual file I/O.
  - It verifies that the data is split according to the configured ratio (e.g., 80/20).
  - Most importantly, it asserts that all data in the training set is chronologically earlier than any data in the test set, confirming the integrity of the time-based split.
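A hedged sketch of the chronological-split assertion, with the patching shown via `unittest.mock.patch`; the `DataSplitter` constructor, config shape, and `split()` return value are assumptions.

```python
from unittest.mock import patch

import pandas as pd

from src.data.data_splitter import DataSplitter


def test_split_is_chronological():
    sample = pd.DataFrame({"step": range(10), "amount": [float(i) for i in range(10)]})
    config = {"train_ratio": 0.8}  # assumed config shape

    # Patch parquet I/O so the splitter works entirely in memory.
    with patch("pandas.read_parquet", return_value=sample), \
         patch("pandas.DataFrame.to_parquet"):
        splitter = DataSplitter(config)       # assumed constructor
        train_df, test_df = splitter.split()  # assumed return value

    # 80/20 split of 10 rows.
    assert len(train_df) == 8 and len(test_df) == 2
    # Every training row must be chronologically earlier than every test row.
    assert train_df["step"].max() < test_df["step"].min()
```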
### `tests/unit/test_mlflow_analytics.py`

- Purpose: This file contains unit tests for the MLflow analytics.
- Usage: It is used to test the `MLflowAnalytics` class in `src/utils/mlflow_analytics.py` to ensure that it is working correctly.
- Key Information:
  - It tests the initialization of the `MLflowAnalytics` class.
  - It tests the `get_model_comparison()` method.
  - It tests the `get_experiment_timeline()` method.
  - It tests the `find_best_hyperparameters()` method.
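An illustrative test using the fixtures described under `tests/conftest.py` above; the `MLflowAnalytics` constructor arguments, the signature of `get_model_comparison()`, and its return type are assumptions.

```python
import pandas as pd

from src.utils.mlflow_analytics import MLflowAnalytics


def test_get_model_comparison_returns_dataframe(mock_mlflow_config, sample_experiment_data):
    analytics = MLflowAnalytics(mock_mlflow_config)  # assumed constructor

    # Assumed: the method can be driven from in-memory run data for unit testing.
    comparison = analytics.get_model_comparison(sample_experiment_data)

    assert isinstance(comparison, pd.DataFrame)
    assert not comparison.empty
```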
### `tests/unit/test_mlflow_duckdb_setup.py`

- Purpose: This file contains unit tests for the MLflow DuckDB setup.
- Usage: It is used to test the `MLflowConfig` and `MLflowDuckDBManager` classes in `src/utils/mlflow_duckdb_setup.py` to ensure that they are working correctly.
- Key Information:
  - It tests the initialization of the `MLflowConfig` class with default values and from environment variables.
  - It tests the initialization of the `MLflowDuckDBManager` class.
  - It tests the `setup_mlflow()` method with existing and new experiments.
  - It tests the `get_connection()` method for both read-write and read-only connections.
  - It tests the `query_experiments()` method.
  - It tests the handling of connection failures.
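For instance, the environment-variable initialization could be exercised roughly like this, using pytest's built-in `monkeypatch` fixture; the environment variable name and the `MLflowConfig` attribute are hypothetical.

```python
from src.utils.mlflow_duckdb_setup import MLflowConfig


def test_mlflow_config_reads_environment(monkeypatch):
    # Hypothetical variable name; the real config may use a different one.
    monkeypatch.setenv("MLFLOW_TRACKING_URI", "sqlite:///custom_mlflow.db")

    config = MLflowConfig()  # assumed to pick up environment variables on init

    assert config.tracking_uri == "sqlite:///custom_mlflow.db"  # assumed attribute
```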