scikit-bio is back in active development! Check out our announcement of revitalization.

A community-driven Python library for bioinformatics, providing versatile data structures, algorithms and educational resources.

For Researchers

Robust, performant and scalable algorithms tailored for the vast landscape of biological data analysis spanning genomics, microbiomics, ecology, evolutionary biology and more. Built to unveil the insights hidden in complex, multi-omic data.

For Educators

Fundamental bioinformatics algorithms enriched by comprehensive documentation, examples and references, offering a rich resource for classroom and laboratory education (with proven success). Designed to spark curiosity and foster innovation.

For Developers

Industry-standard, production-ready Python codebase featuring a stable, unit-tested API that streamlines development and integration. Licensed under the 3-Clause BSD License, it provides an expansive platform for both academic research and commercial ventures.


Install

conda install -c conda-forge scikit-bio
pip install scikit-bio
pip install git+https://github.com/scikit-bio/scikit-bio.git

See detailed instructions on installing scikit-bio on various platforms.

Feature Highlights

Biological sequences: Efficient data structure with a flexible grammar for easy manipulation, annotation, alignment, and conversion into motifs or k-mers for in-depth analysis.
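
To make the k-mer idea concrete, here is a minimal standard-library sketch of k-mer counting over a plain string. It is an illustration only and does not use scikit-bio's sequence classes; the helper name `count_kmers` and its `overlap` parameter are assumptions made for this example.

```python
from collections import Counter


def count_kmers(seq, k, overlap=True):
    """Count the k-mers in a sequence string (hypothetical helper, not scikit-bio code)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Overlapping windows advance one position at a time; otherwise jump by k.
    step = 1 if overlap else k
    # Stop at len(seq) - k so that every window is exactly k characters long.
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))


overlapping = count_kmers("ACGTACGT", 3)
non_overlapping = count_kmers("ACGTACGT", 3, overlap=False)
```

With overlapping windows the eight-base toy sequence yields six 3-mers; without overlap it yields two, and the trailing two bases are dropped because they cannot fill a full window.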

Phylogenetic trees: Scalable tree structure tailored for evolutionary biology, supporting diverse operations in navigation, manipulation, comparison, and construction.

Community diversity analysis for ecological studies, with an extensive suite of metrics such as UniFrac and PD, optimized to handle large-scale community datasets.
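
As a rough illustration of what a phylogenetic diversity (PD) metric measures, the sketch below computes Faith's PD on a toy tree using plain dictionaries. It is not scikit-bio's implementation: the tree encoding (`parent` and `length` maps), the function name, and the convention of summing every branch on the path from each observed tip to the root are assumptions for this example.

```python
def faith_pd_sketch(observed_tips, parent, length):
    """Sum the branch lengths spanned by the observed tips (hypothetical helper)."""
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk towards the root; stop at the root (which has no parent entry)
        # or at a branch that an earlier tip has already counted.
        while node in parent and node not in visited:
            visited.add(node)
            total += length[node]
            node = parent[node]
    return total


# Toy tree in Newick notation: ((A:1,B:2)X:0.5,(C:3,D:4)Y:1.5)root;
parent = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}
length = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "X": 0.5, "Y": 1.5}

pd_close = faith_pd_sketch(["A", "B"], parent, length)    # A, B and shared branch X
pd_distant = faith_pd_sketch(["A", "C"], parent, length)  # A, X, C and Y
```

A community containing the sister taxa A and B spans less branch length than one containing the more distantly related A and C, which is the intuition behind the metric.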

Ordination methods, such as PCoA, CA, and RDA, to uncover patterns underlying high-dimensional data, facilitating insightful visualization.

Multivariate statistical tests, such as PERMANOVA, BIOENV, and Mantel, to decode complex relationships across data matrices and sample properties.
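
The permutation logic behind a Mantel test can be sketched in a few lines of standard-library Python. This is a simplified, two-sided variant using Pearson correlation, written for illustration; it is not scikit-bio's implementation, and the function names, the default of 999 permutations, and the fixed seed are assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_sketch(d1, d2, permutations=999, seed=0):
    """Return (correlation, two-sided permutation p-value) for two distance matrices."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the sample labels of the second matrix
        if abs(pearson(x, upper_triangle(d2, order))) >= abs(observed):
            hits += 1
    # Add one to both counts so the observed arrangement is included.
    return observed, (hits + 1) / (permutations + 1)


d1 = [[0, 1, 4, 6], [1, 0, 3, 5], [4, 3, 0, 2], [6, 5, 2, 0]]
d2 = [[2 * v for v in row] for row in d1]  # same structure, different scale
r, p = mantel_sketch(d1, d2)
```

Because `d2` is a rescaled copy of `d1`, the observed correlation is 1; the p-value reports how often a random relabelling of the samples matches or exceeds that correlation in absolute value.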

Compositional data processing and analysis, such as CLR transform and ANCOM, built for various omic data types from high-throughput experiments.
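
For readers unfamiliar with the CLR (centered log-ratio) transform, the following standard-library sketch shows the arithmetic for a single composition: normalize to proportions, take logarithms, and subtract the mean log. It is an illustration rather than scikit-bio's implementation; the function names are assumptions, and zeros are rejected here instead of being replaced.

```python
from math import log


def closure(values):
    """Rescale a vector of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def clr_sketch(composition):
    """Centered log-ratio transform of one composition (hypothetical helper)."""
    if any(v <= 0 for v in composition):
        raise ValueError("CLR is undefined for zero or negative components")
    logs = [log(v) for v in closure(composition)]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


transformed = clr_sketch([1, 2, 4])
rescaled = clr_sketch([10, 20, 40])
```

The transformed values sum to zero, and multiplying every count by the same factor leaves the result unchanged, which is why the transform suits relative-abundance data.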