Petastorm is an open-source data access library developed at Uber ATG. It enables single-machine or distributed training and evaluation of deep learning models directly from datasets in Apache Parquet format. Petastorm supports popular Python-based machine learning (ML) frameworks such as TensorFlow, PyTorch, and PySpark, and it can also be used from pure Python code. A dataset created using Petastorm is stored in Apache Parquet format. On top of the Parquet schema, Petastorm stores higher-level schema information that makes multidimensional arrays a native part of a Petastorm dataset. Petastorm also supports extensible data codecs, which let a user apply one of the standard compression formats (JPEG, PNG) or implement their own.
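
To make the codec idea more concrete, here is a minimal pure-Python sketch of how a field description layered over a binary column might turn a multidimensional array into bytes and back. It is an illustration of the concept only and is not Petastorm's API: FieldSpec, RawArrayCodec, and ZlibArrayCodec are hypothetical names, and plain nested lists stand in for real array types.

```python
import zlib
from array import array
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical higher-level field description stored next to the column."""
    name: str
    typecode: str            # array-module typecode, e.g. 'B' for unsigned 8-bit
    shape: Tuple[int, ...]   # non-empty tuple of positive dimensions


def _reshape(flat: list, shape: Tuple[int, ...]):
    """Rebuild nested lists of the given shape from a flat list."""
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [_reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


class RawArrayCodec:
    """Hypothetical codec: stores a fixed-shape array as uncompressed bytes."""

    def encode(self, field: FieldSpec, flat_values: List[int]) -> bytes:
        expected = 1
        for dim in field.shape:
            expected *= dim
        if len(flat_values) != expected:
            raise ValueError(f"{field.name}: expected {expected} values")
        return array(field.typecode, flat_values).tobytes()

    def decode(self, field: FieldSpec, blob: bytes):
        values = array(field.typecode)
        values.frombytes(blob)
        return _reshape(values.tolist(), field.shape)


class ZlibArrayCodec(RawArrayCodec):
    """A user-defined codec: same layout, but compressed with zlib."""

    def encode(self, field: FieldSpec, flat_values: List[int]) -> bytes:
        return zlib.compress(super().encode(field, flat_values))

    def decode(self, field: FieldSpec, blob: bytes):
        return super().decode(field, zlib.decompress(blob))


image = FieldSpec(name="image", typecode="B", shape=(2, 3))
stored = ZlibArrayCodec().encode(image, [1, 2, 3, 4, 5, 6])
restored = ZlibArrayCodec().decode(image, stored)
```

Because the shape lives in the field description rather than in the stored bytes, the reader can hand back a correctly shaped array without the caller tracking dimensions separately.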

Features

  • Selective column readout
  • Open-source data access library
  • Multiple parallelism strategies: thread pool, process pool, or single-threaded (for debugging)
  • Plain Python API
  • Row filtering (row predicates)
  • Partitioning for multi-GPU training
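
The features above map onto arguments of Petastorm's make_reader function. The following is a rough sketch rather than a reference: the field names id and image1, the helper names build_reader_kwargs and read_even_ids, and the even-id filter are assumptions made for illustration, and the argument names should be checked against the Petastorm version in use. Petastorm is imported lazily so that the argument-building helper can be tried without the library installed.

```python
from typing import Any, Dict, Iterator, List, Optional

# Pool names accepted by make_reader; 'dummy' runs single-threaded for debugging.
POOL_TYPES = ("thread", "process", "dummy")


def build_reader_kwargs(columns: List[str],
                        pool: str = "thread",
                        workers: int = 4,
                        cur_shard: Optional[int] = None,
                        shard_count: Optional[int] = None,
                        predicate: Any = None) -> Dict[str, Any]:
    """Collect keyword arguments for petastorm.make_reader (hypothetical helper)."""
    if pool not in POOL_TYPES:
        raise ValueError(f"pool must be one of {POOL_TYPES}")
    if (cur_shard is None) != (shard_count is None):
        raise ValueError("cur_shard and shard_count must be given together")

    kwargs: Dict[str, Any] = {
        "schema_fields": list(columns),   # selective column readout
        "reader_pool_type": pool,         # parallelism strategy
        "workers_count": workers,
    }
    if shard_count is not None:
        # Partitioning: each trainer (for example, one per GPU) reads one shard.
        kwargs["cur_shard"] = cur_shard
        kwargs["shard_count"] = shard_count
    if predicate is not None:
        kwargs["predicate"] = predicate   # row filtering
    return kwargs


def read_even_ids(dataset_url: str, cur_shard: int, shard_count: int) -> Iterator[int]:
    """Yield the ids of rows with an even id from one shard of the dataset."""
    # Imported here so the helper above works without Petastorm installed.
    from petastorm import make_reader
    from petastorm.predicates import in_lambda

    kwargs = build_reader_kwargs(
        columns=["id", "image1"],         # assumed field names
        cur_shard=cur_shard,
        shard_count=shard_count,
        predicate=in_lambda(["id"], lambda id_value: id_value % 2 == 0),
    )
    with make_reader(dataset_url, **kwargs) as reader:
        for row in reader:
            yield row.id
```

A caller training on two GPUs might run read_even_ids(url, 0, 2) in one process and read_even_ids(url, 1, 2) in the other, where url is a dataset location such as a file:// or hdfs:// path.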

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Libraries, Python Machine Learning Software, Python Deep Learning Frameworks

Registered

2022-08-15