# Library Analysis
The Library Analysis module provides tools for analyzing Thompson Sampling results, identifying top building blocks, and creating visualizations for benchmarking studies.
## Module Architecture

## Quick Start
Analyze results and find top building blocks:
```python
import polars as pl

from TACTICS.library_analysis import LibraryAnalysis, LibraryVisualization

# Load results
results_df = pl.read_csv("results.csv")

# Create analysis with SMILES files for building block lookup
analysis = LibraryAnalysis(
    df=results_df,
    smiles_files=["acids.smi", "amines.smi"],
    product_code_column="Product_Code",
    score_column="Scores",
)

# Find top building blocks in the top 100 compounds
counters, total = analysis.find_top_building_blocks(cutoff=100)

# Visualize top building blocks
viz = LibraryVisualization(analysis)
viz.visualize_top_building_blocks(top_n=20)
```
Benchmark Thompson Sampling methods:
```python
from TACTICS.library_analysis.visualization import TS_Benchmarks

# TS_Benchmarks auto-generates all data during initialization
benchmarks = TS_Benchmarks(
    no_of_cycles=10,
    methods_list=["roulette_wheel", "bayes_ucb", "greedy"],
    TS_runs_data={
        "roulette_wheel": [cycle1_df, cycle2_df, ...],
        "bayes_ucb": [cycle1_df, cycle2_df, ...],
        "greedy": [cycle1_df, cycle2_df, ...],
    },
    reference_data=reference_df,
    top_n=100,
    sort_type="minimize",
)

# All data is pre-calculated - ready to plot immediately
benchmarks.stripplot_TS_results()
benchmarks.plot_barplot_TS_results()
benchmarks.plot_line_performance_with_error_bars()
```
## LibraryAnalysis
Analyzes chemical libraries to identify top building blocks and their overlap.
**Dependencies**

- Polars or Pandas DataFrame with product codes and scores
- SMILES files (`.smi`) containing building block structures
- Used by `LibraryVisualization` for visualizations
### Constructor
| Parameter | Type | Required | Description |
|---|---|---|---|
| `df` | | Yes | Polars or Pandas DataFrame with product codes and scores. |
| `smiles_files` | | Yes | Path(s) to `.smi` file(s) containing SMILES and building block codes. |
| `product_code_column` | | No | Name of product code column. Default: |
| `score_column` | | No | Name of score column. Default: |
### Methods

#### find_top_building_blocks
Identify and count building blocks in top products.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `cutoff` | | Yes | Number of top products to analyze. |
| | | No | |
| | | No | Number of top building blocks per position. Default: 100. |
**Returns**

| Type | Description |
|---|---|
| tuple | `(position_counters, total_molecules)` |
**Example**

```python
from TACTICS.library_analysis import LibraryAnalysis

analysis = LibraryAnalysis(
    df=results_df,
    smiles_files=["acids.smi", "amines.smi"],
    product_code_column="Product_Code",
    score_column="Scores",
)
counters, total = analysis.find_top_building_blocks(cutoff=100)
```
#### check_overlap
Check overlap between current cutoff and a new cutoff value.
| Parameter | Type | Required | Description |
|---|---|---|---|
| | | Yes | New cutoff value to compare against. |
**Returns**

| Type | Description |
|---|---|
| | Overlap information for each position. |
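A minimal sketch, continuing from the `analysis` object built above. The parameter's keyword name is not shown in the table, so the new cutoff is passed positionally, and the per-position mapping used below is an assumption:

```python
# Compare building blocks selected at the current cutoff (100) with those
# selected at a wider cutoff of 200. The argument is passed positionally
# because its keyword name is not documented above; iterating the result
# as a dict keyed by position is an assumption.
overlap = analysis.check_overlap(200)
for position, info in overlap.items():
    print(position, info)
```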
#### compare_analysis_overlap
Compare building block overlap with another LibraryAnalysis instance.
| Parameter | Type | Required | Description |
|---|---|---|---|
| | | Yes | Another analysis to compare with. |
| | | No | Number of top building blocks. Default: 100. |
**Returns**

| Type | Description |
|---|---|
| | Overlap results for each position. |
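A sketch comparing two independently built analyses. The file name `results_run2.csv` is hypothetical, and the arguments are passed positionally because their keyword names are not documented above:

```python
import polars as pl

from TACTICS.library_analysis import LibraryAnalysis

# A second analysis built from another results file (hypothetical file name)
other_analysis = LibraryAnalysis(
    df=pl.read_csv("results_run2.csv"),
    smiles_files=["acids.smi", "amines.smi"],
    product_code_column="Product_Code",
    score_column="Scores",
)

# Overlap of the top 100 building blocks per position between this analysis
# and the `analysis` object from the earlier examples
overlap_by_position = analysis.compare_analysis_overlap(other_analysis, 100)
```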
## LibraryVisualization
Creates visualizations from LibraryAnalysis instances.
**Dependencies**

- Requires a `LibraryAnalysis` instance (the source of all data)
- RDKit for molecular structure rendering
### Constructor
| Parameter | Type | Required | Description |
|---|---|---|---|
| | | Yes | Analysis instance containing data to visualize. |
### Methods

#### visualize_top_building_blocks
Display top building blocks using RDKit molecular drawings.
| Parameter | Type | Required | Description |
|---|---|---|---|
| | | No | Show overlapping building blocks. Default: False. |
| | | No | Molecules per row in grid. Default: 5. |
| | | No | Size of each molecule image. Default: (300, 300). |
| | | No | Another analysis for overlap comparison. |
| `top_n` | | No | Number of top building blocks. Default: 20. |
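Only `top_n` appears by name in the Quick Start above, so a minimal call sticks to that keyword and leaves the remaining options at their defaults:

```python
from TACTICS.library_analysis import LibraryVisualization

# RDKit grid of the 20 most frequent building blocks per position,
# using the `analysis` object built in the earlier examples
viz = LibraryVisualization(analysis)
viz.visualize_top_building_blocks(top_n=20)
```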
#### plot_top_products_comparison
Generate subplots comparing overlap between analysis instances and a reference.
| Parameter | Type | Required | Description |
|---|---|---|---|
| | | Yes | List of LibraryAnalysis instances to compare. |
| | | Yes | Reference analysis instance. |
| | | No | Top products to consider. Default: 100. |
| | | No | Plot title. |
| | | No | Figure size (width, height). Default: (15, 10). |
| | | No | Labels for each analysis group. |
| | | No | Path to save the plot. |
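A sketch under the assumption that the list of analyses and the reference analysis are the first two (positional) arguments. `reference_scores.csv` is a hypothetical file, and `analysis`/`other_analysis` are the objects built in the earlier examples:

```python
# Reference analysis built from ground-truth scores (hypothetical file name)
reference_analysis = LibraryAnalysis(
    df=pl.read_csv("reference_scores.csv"),
    smiles_files=["acids.smi", "amines.smi"],
    product_code_column="Product_Code",
    score_column="Scores",
)

# Compare two analyses against the reference, using the default top 100 products
viz.plot_top_products_comparison([analysis, other_analysis], reference_analysis)
```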
## TS_Benchmarks
Comprehensive benchmarking and visualization for Thompson Sampling results across multiple cycles and search strategies.
**Key Features**

- Automatic data generation during initialization
- Consistent color schemes across all plots
- Multi-cycle analysis with statistics
- Reference comparison with performance metrics
- Multiple visualization types (strip, bar, line plots)
### Constructor
All required data is automatically generated during initialization.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `no_of_cycles` | | Yes | Number of cycles to analyze. |
| `methods_list` | | Yes | List of method names (search strategies). |
| `TS_runs_data` | | Yes | Maps method names to lists of DataFrames (one per cycle). |
| `reference_data` | | No | Ground truth reference data for comparison. |
| `top_n` | | No | Top products for bar plot. Default: 100. |
| `sort_type` | | No | |
| `top_ns` | | No | Top-N values for line plot. Default: [50, 100, 200, 300, 400, 500]. |
**Automatic Data Storage**

| Attribute | Description |
|---|---|
| | Top N compounds from each method/cycle (for stripplot). |
| | All compounds from each method/cycle. |
| | Hit recovery data for bar plots. |
| | Raw performance data across cycles. |
| | Statistical summaries with mean, std, error bounds. |
| `actual_methods` | Methods found in data (for validation). |
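Most attribute names are not listed in the table above; `actual_methods` (also used in the Complete Example below) can be inspected right after construction as a quick sanity check:

```python
# Methods detected in the supplied run data - useful to verify against
# methods_list before generating any plots
print(benchmarks.actual_methods)
```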
### Visualization Methods

#### stripplot_TS_results
Generate strip plot showing score distributions across cycles and methods.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `width` | | No | Plot width in pixels (auto-calculated if None). |
| `height` | | No | Plot height in pixels (auto-calculated if None). |
| `save_path` | | No | Path to save (.html, .png, .svg). |
| | | No | Display in Jupyter. Default: True. |
| `legend_position` | | No | Position of legend, e.g. `"right"` or `"bottom"`. |
**Returns**

| Type | Description |
|---|---|
| | Altair chart object (or None if saved). |
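A minimal call, continuing from the `benchmarks` object above; omitting `width` and `height` lets the plot size itself automatically, and leaving out `save_path` means the Altair chart is returned for display:

```python
# Strip plot of score distributions per method and cycle; the returned
# Altair chart renders directly in a Jupyter cell
chart = benchmarks.stripplot_TS_results(legend_position="right")
```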
#### plot_barplot_TS_results
Create grouped bar plot showing reference hit recovery by method and cycle.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `width` | | No | Plot width in pixels (auto-calculated if None). |
| `height` | | No | Plot height in pixels. Default: 400. |
| `save_path` | | No | Path to save (.html, .png, .svg). |
| | | No | Display in Jupyter. Default: True. |
| `legend_position` | | No | Position of legend, e.g. `"right"` or `"bottom"`. |
| `dark_mode` | | No | Use white text for bar labels (for dark backgrounds). Default: False. |
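A sketch using the keyword names shown in the Complete Example below:

```python
# Grouped bar plot of reference hit recovery, with a horizontal legend
# below the plot and standard (dark) text for the bar labels
benchmarks.plot_barplot_TS_results(
    width=700,
    save_path="bar_plot.html",
    legend_position="bottom",
    dark_mode=False,
)
```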
#### plot_line_performance_with_error_bars
Create line plot with error bars showing mean performance across top-N cutoffs.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `width` | | No | Plot width in pixels. Default: 800. |
| `height` | | No | Plot height in pixels. Default: 500. |
| `save_path` | | No | Path to save (.html, .png, .svg). |
| | | No | Display in Jupyter. Default: True. |
| `legend_position` | | No | Position of legend, e.g. `"right"` or `"bottom"`. |
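The same keywords apply here; a sketch continuing from the `benchmarks` object above:

```python
# Mean performance (with error bars) across the configured top-N cutoffs
benchmarks.plot_line_performance_with_error_bars(
    width=900,
    height=600,
    save_path="line_plot.html",
    legend_position="right",
)
```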
### Other Methods
| Method | Description |
|---|---|
| `get_performance_summary` | Get dict containing all stored data and charts. |
| | Internal: Generate combined datasets. |
| | Internal: Generate bar plot data. |
| | Internal: Generate line plot data. |
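For example, the combined summary can be retrieved after plotting; its exact keys are not documented here, so this sketch only prints them:

```python
# Dict of all data and charts stored during initialization and plotting
summary = benchmarks.get_performance_summary()
print(list(summary))
```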
## Complete Example
```python
import polars as pl

from TACTICS.library_analysis.visualization import TS_Benchmarks

# Load results from multiple runs
rw_runs = [pl.read_csv(f"rw_cycle_{i}.csv") for i in range(10)]
ucb_runs = [pl.read_csv(f"ucb_cycle_{i}.csv") for i in range(10)]
greedy_runs = [pl.read_csv(f"greedy_cycle_{i}.csv") for i in range(10)]

# Load reference data
reference = pl.read_csv("reference_scores.csv")

# Create benchmarks (all data generated automatically)
benchmarks = TS_Benchmarks(
    no_of_cycles=10,
    methods_list=["RouletteWheel", "BayesUCB", "Greedy"],
    TS_runs_data={
        "RouletteWheel": rw_runs,
        "BayesUCB": ucb_runs,
        "Greedy": greedy_runs,
    },
    reference_data=reference,
    top_n=100,
    sort_type="minimize",
    top_ns=[25, 50, 100, 200, 300],
)

# Generate all visualizations
strip_chart = benchmarks.stripplot_TS_results(
    width=800, height=500, save_path="strip_plot.html",
    legend_position="right",  # or "bottom" for a horizontal legend
)
bar_chart = benchmarks.plot_barplot_TS_results(
    width=700, height=400, save_path="bar_plot.html",
    legend_position="bottom",  # horizontal legend below the plot
    dark_mode=False,  # set True for white text on dark backgrounds
)
line_chart = benchmarks.plot_line_performance_with_error_bars(
    width=900, height=600, save_path="line_plot.html",
    legend_position="right",
)

# Access computed statistics
print(f"Methods analyzed: {benchmarks.actual_methods}")
summary = benchmarks.get_performance_summary()
```
## Workflow Overview
A typical analysis workflow:

1. **Run Thompson Sampling** - generate results DataFrames
2. **LibraryAnalysis** - identify top building blocks and their frequencies
3. **LibraryVisualization** - create molecular structure grids
4. **TS_Benchmarks** - compare multiple methods with statistical analysis