Dask (software)
Dask is an open-source Python library for parallel computing.[2][3] Originally developed by Matthew Rocklin, Dask is a community project maintained and sponsored by developers and organizations.
| Original author(s) | Matthew Rocklin |
|---|---|
| Developer(s) | Dask |
| Initial release | January 8, 2015 |
| Stable release | 2021.09.01 / September 1, 2021 |
| Repository | Dask Repository |
| Written in | Python[1] |
| Operating system | Linux, Microsoft Windows, macOS |
| Available in | Python |
| Type | Data analytics |
| License | New BSD |
| Website | dask |
Overview
Dask is a library composed of two parts. First, it includes a task scheduling component for building dependency graphs and scheduling tasks. Second, it includes distributed data structures with APIs similar to pandas DataFrames and NumPy arrays. Dask has a variety of use cases; it can run on a single node and scale to thousand-node clusters.[4]
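The dependency graphs mentioned above are encoded with ordinary Python dictionaries, tuples, and callables. The following is a minimal sketch of that idea: a toy recursive evaluator written in plain Python, not Dask's own scheduler. The names `evaluate`, `_resolve`, and the example graph are hypothetical and chosen only for illustration; the sketch assumes string keys and does not detect cycles.

```python
from operator import add, mul


def evaluate(graph, key, cache=None):
    """Compute `key` from a dict-based task graph (toy sketch, no parallelism)."""
    if cache is None:
        cache = {}
    if key not in cache:
        # Store each result so shared dependencies are computed only once.
        cache[key] = _resolve(graph, graph[key], cache)
    return cache[key]


def _resolve(graph, term, cache):
    # A task is a tuple whose first element is callable: (func, arg1, arg2, ...).
    if isinstance(term, tuple) and term and callable(term[0]):
        func, *args = term
        return func(*(_resolve(graph, arg, cache) for arg in args))
    # A string that names another key stands for that key's result.
    if isinstance(term, str) and term in graph:
        return evaluate(graph, term, cache)
    # Anything else is treated as a literal value.
    return term


# Hypothetical example graph: keys map to literal values or to tasks.
graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),      # depends on "x" and "y"
    "total": (mul, "sum", 10),   # depends on "sum"
}

result = evaluate(graph, "total")
print(result)
```

A real scheduler would inspect the same dictionary to find tasks with no unmet dependencies and run them concurrently; in Dask itself, a graph in this form can be passed to a scheduler function such as `dask.threaded.get`.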
References
- "Dask: Parallel Computation with Blocked algorithms and Task Scheduling" (PDF).
This paper introduces dask, a specification to encode parallel algorithms, using primitive Python dictionaries, tuples, and callables.
- Daniel, Jesse C. (2019). Data Science at Scale with Python and Dask. Manning Publications. ISBN 9781617295607.
- Rocklin, Matthew (2015). "Dask: Parallel Computation with Blocked algorithms and Task Scheduling". Proceedings of the 14th Python in Science Conference: 126–132. doi:10.25080/Majora-7b98e3ed-013.
- "Dask — Dask documentation".
