read
This module provides functions for reading data from S3 into Polars DataFrames.
It includes functions for reading various file formats (CSV, JSON, NDJSON, Parquet, …) from S3, with support for decompression and reading multiple files.
- aws_sdk_polars.s3.read.read_csv(s3path: S3Path, s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None) → DataFrame
Read a CSV file from S3 into a Polars DataFrame.
- Parameters:
s3path – S3Path object representing the CSV file to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_csv function.
decompress – Decompression algorithm to use, if the file is compressed.
decompress_kwargs – Optional keyword arguments for decompression.
- Returns:
Polars DataFrame containing the data from the CSV file.
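A call to read_csv might look like the minimal sketch below. The module path and keyword names come from the signature above; everything else is an assumption: that S3Path is the class from the s3pathlib package and accepts an s3:// URI, that a default boto3 client has valid credentials, and the wrapper name load_csv_from_s3, which is hypothetical.

```python
from typing import Any, Dict, Optional


def load_csv_from_s3(uri: str, pl_kwargs: Optional[Dict[str, Any]] = None):
    """Read one CSV object from S3 into a Polars DataFrame (hypothetical helper)."""
    # Third-party imports are deferred so that defining this function
    # does not require the packages or AWS credentials to be present.
    import boto3
    from s3pathlib import S3Path  # assumption: source of the S3Path type
    from aws_sdk_polars.s3.read import read_csv

    s3_client = boto3.client("s3")
    return read_csv(
        s3path=S3Path(uri),  # assumption: S3Path accepts an s3:// URI
        s3_client=s3_client,
        pl_kwargs=pl_kwargs,  # forwarded to Polars, e.g. {"separator": ";"}
    )
```

Calling load_csv_from_s3("s3://example-bucket/data/orders.csv", {"separator": ";"}) would need real AWS credentials and an existing object; the bucket and key here are placeholders. For a compressed object, the decompress argument would be added to the read_csv call; the accepted string values are not listed in this section.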
- aws_sdk_polars.s3.read.read_json(s3path: S3Path, s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None) → DataFrame
Read a JSON file from S3 into a Polars DataFrame.
- Parameters:
s3path – S3Path object representing the JSON file to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_json function.
decompress – Decompression algorithm to use, if the file is compressed.
decompress_kwargs – Optional keyword arguments for decompression.
- Returns:
Polars DataFrame containing the data from the JSON file.
- aws_sdk_polars.s3.read.read_ndjson(s3path: S3Path, s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None) → DataFrame
Read an NDJSON (Newline Delimited JSON) file from S3 into a Polars DataFrame.
- Parameters:
s3path – S3Path object representing the NDJSON file to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_ndjson function.
decompress – Decompression algorithm to use, if the file is compressed.
decompress_kwargs – Optional keyword arguments for decompression.
- Returns:
Polars DataFrame containing the data from the NDJSON file.
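To make the decompress step and the NDJSON layout concrete, here is a standard-library sketch of the order of operations the parameters suggest: take the raw object bytes, decompress them, then parse one JSON document per line. It is not the library's implementation; the function name parse_gzipped_ndjson is hypothetical and gzip is chosen only as an example algorithm.

```python
import gzip
import json
from typing import Any, Dict, List


def parse_gzipped_ndjson(payload: bytes) -> List[Dict[str, Any]]:
    """Decompress gzip bytes, then parse one JSON object per non-empty line."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Build a small gzip-compressed NDJSON payload in memory, standing in for
# the bytes that would be downloaded from S3.
records = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
payload = gzip.compress(
    "\n".join(json.dumps(record) for record in records).encode("utf-8")
)

rows = parse_gzipped_ndjson(payload)
```

Unlike a regular JSON file, which holds a single document, an NDJSON file can be parsed line by line, which is why the two formats have separate reader functions.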
- aws_sdk_polars.s3.read.read_parquet(s3path: S3Path, s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None) → DataFrame
Read a Parquet file from S3 into a Polars DataFrame.
- Parameters:
s3path – S3Path object representing the Parquet file to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_parquet function.
decompress – Decompression algorithm to use, if the file is compressed.
decompress_kwargs – Optional keyword arguments for decompression.
- Returns:
Polars DataFrame containing the data from the Parquet file.
- aws_sdk_polars.s3.read.read_many_csv(s3path_list: Iterable[S3Path], s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None, merge_col: bool = False) → DataFrame
Read multiple CSV files from S3 into a single Polars DataFrame.
- Parameters:
s3path_list – Iterable of S3Path objects representing the CSV files to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_csv function for each file.
decompress – Decompression algorithm to use, if the files are compressed.
decompress_kwargs – Optional keyword arguments for decompression.
merge_col – If True, merge columns of different schemas; if False, use simple concatenation.
- Returns:
Polars DataFrame containing the combined data from all CSV files.
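The difference between the two merge_col modes can be illustrated with plain Python rows. This is a conceptual sketch, not the library's code: tables are modelled as lists of dicts, the function names are hypothetical, and the strict schema check in concat_simple is an assumption about what "simple concatenation" implies. In Polars itself, pl.concat offers how="vertical" and how="diagonal" for these two behaviours; whether the library relies on them is not stated here.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def concat_simple(tables: List[List[Row]]) -> List[Row]:
    """Stack rows as-is; every row must have the same columns in the same order."""
    rows = [row for table in tables for row in table]
    if rows:
        expected = list(rows[0])
        for row in rows:
            if list(row) != expected:
                raise ValueError(f"schema mismatch: {list(row)} != {expected}")
    return rows


def concat_merge_columns(tables: List[List[Row]]) -> List[Row]:
    """Stack rows over the union of all columns, filling gaps with None."""
    columns: List[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Two "files" whose columns only partly overlap.
file_a = [{"id": 1, "name": "alice"}]
file_b = [{"id": 2, "email": "bob@example.com"}]

merged = concat_merge_columns([file_a, file_b])
```

Here merged contains both rows with the columns id, name, and email, and None wherever a file lacked a column, while concat_simple([file_a, file_b]) raises ValueError because the schemas differ.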
- aws_sdk_polars.s3.read.read_many_json(s3path_list: Iterable[S3Path], s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None, merge_col: bool = False) → DataFrame
Read multiple JSON files from S3 into a single Polars DataFrame.
- Parameters:
s3path_list – Iterable of S3Path objects representing the JSON files to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_json function for each file.
decompress – Decompression algorithm to use, if the files are compressed.
decompress_kwargs – Optional keyword arguments for decompression.
merge_col – If True, merge columns of different schemas; if False, use simple concatenation.
- Returns:
Polars DataFrame containing the combined data from all JSON files.
- aws_sdk_polars.s3.read.read_many_ndjson(s3path_list: Iterable[S3Path], s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None, merge_col: bool = False) → DataFrame
Read multiple NDJSON files from S3 into a single Polars DataFrame.
- Parameters:
s3path_list – Iterable of S3Path objects representing the NDJSON files to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_ndjson function for each file.
decompress – Decompression algorithm to use, if the files are compressed.
decompress_kwargs – Optional keyword arguments for decompression.
merge_col – If True, merge columns of different schemas; if False, use simple concatenation.
- Returns:
Polars DataFrame containing the combined data from all NDJSON files.
- aws_sdk_polars.s3.read.read_many_parquet(s3path_list: Iterable[S3Path], s3_client: S3Client, pl_kwargs: Optional[Dict[str, Any]] = None, decompress: Union[str, Algorithm] = Algorithm.uncompressed, decompress_kwargs: Optional[Dict[str, Any]] = None, merge_col: bool = False) → DataFrame
Read multiple Parquet files from S3 into a single Polars DataFrame.
- Parameters:
s3path_list – Iterable of S3Path objects representing the Parquet files to be read.
s3_client – Boto3 S3 client.
pl_kwargs – Optional keyword arguments passed to the Polars read_parquet function for each file.
decompress – Decompression algorithm to use, if the files are compressed.
decompress_kwargs – Optional keyword arguments for decompression.
merge_col – If True, merge columns of different schemas; if False, use simple concatenation.
- Returns:
Polars DataFrame containing the combined data from all Parquet files.