write

todo: docstring

aws_sdk_polars.s3.write.configure_s3_write_options(df: DataFrame, polars_writer: Writer, compress: Algorithm, s3pathlib_write_bytes_kwargs: Dict[str, Any]) → str

Configure S3 write options based on the polars writer.

This function sets up the necessary metadata and content-related parameters for writing a Polars DataFrame to S3. It determines the appropriate file extension and configures compression settings based on the writer format and user preferences.

Parameters:

df – The Polars DataFrame to be written.
polars_writer – The Polars writer object specifying the output format.
compress – The compression algorithm to apply (where applicable).
s3pathlib_write_bytes_kwargs – Dictionary of keyword arguments for the S3 write operation. It is modified in place.

Returns:

The appropriate file extension for the configured write operation.
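The behavior described above can be pictured with a minimal, standard-library-only sketch. It is not the library's implementation: the function name, the format table, and the "content_type" and "content_encoding" keys are all assumptions made for illustration, and plain strings stand in for the Writer and Algorithm objects.

```python
from typing import Any, Dict

# Hypothetical lookup table: (base extension, content type) per output format.
FORMAT_INFO = {
    "csv": (".csv", "text/csv"),
    "json": (".json", "application/json"),
    "parquet": (".parquet", "application/octet-stream"),
}


def sketch_configure_write_options(
    fmt: str,
    compress: str,
    write_kwargs: Dict[str, Any],
) -> str:
    """Return a file extension and update ``write_kwargs`` in place."""
    ext, content_type = FORMAT_INFO[fmt]
    # Assumed key name; the real function may set different parameters.
    write_kwargs["content_type"] = content_type
    # Parquet carries its own compression, so only text formats are wrapped.
    if compress == "gzip" and fmt != "parquet":
        write_kwargs["content_encoding"] = "gzip"
        ext += ".gz"
    return ext


kwargs: Dict[str, Any] = {}
extension = sketch_configure_write_options("csv", "gzip", kwargs)
```

The caller's dictionary is mutated rather than copied, which mirrors the "modified in place" note on s3pathlib_write_bytes_kwargs.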

aws_sdk_polars.s3.write.configure_s3path(s3dir: Optional[S3Path] = None, fname: Optional[str] = None, ext: Optional[str] = None, s3path: Optional[S3Path] = None)

Configure and return an S3Path object for file operations.

This function allows flexible specification of an S3 path. It can either construct a path from individual components (directory, filename, and extension) or use a pre-configured S3Path object.

Parameters:

s3dir – The S3 directory path. Required if s3path is not provided.
fname – The filename without extension. Required if s3path is not provided. For example, if the full file name is “data.csv”, then fname is “data”.
ext – The file extension, including the dot (e.g., ‘.csv’). Required if s3path is not provided.
s3path – A pre-configured S3Path object. If provided, other arguments are ignored.

Returns:

The configured S3Path object representing the full file path in S3.
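The resolution rule described here (a pre-configured path wins, otherwise the path is assembled from its components) can be sketched as follows. This is an illustration only: plain strings stand in for S3Path objects, and the function name and error message are hypothetical.

```python
from typing import Optional


def sketch_configure_path(
    s3dir: Optional[str] = None,
    fname: Optional[str] = None,
    ext: Optional[str] = None,
    s3path: Optional[str] = None,
) -> str:
    """Resolve a full object path from either a ready path or its parts."""
    # A pre-configured path takes precedence; the components are ignored.
    if s3path is not None:
        return s3path
    # Otherwise all three components are required.
    if s3dir is None or fname is None or ext is None:
        raise ValueError("provide s3path, or all of s3dir, fname, and ext")
    return f"{s3dir.rstrip('/')}/{fname}{ext}"


full_path = sketch_configure_path(
    s3dir="s3://my-bucket/data/", fname="data", ext=".csv"
)
```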

aws_sdk_polars.s3.write.partition_df_for_s3(df: DataFrame, s3dir: S3Path, part_keys: List[str]) → Iterator[Tuple[DataFrame, S3Path]]

Group dataframe by partition keys and locate the S3 location for each partition.

Parameters:

df – polars.DataFrame object.
s3dir – s3pathlib.S3Path object, the root directory of the S3 location of the datalake table.
part_keys – List of partition keys, for example [“year”, “month”].
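To make the grouping step concrete, here is a minimal sketch that uses a list of dictionaries in place of a DataFrame and strings in place of S3Path objects. The Hive-style "key=value" directory naming is an assumption; the documentation above does not state how partition locations are named, nor whether the partition columns are kept in each sub-frame.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def sketch_partition_rows(
    rows: List[Row],
    s3dir: str,
    part_keys: List[str],
) -> Iterator[Tuple[List[Row], str]]:
    """Yield (rows of one partition, partition directory) pairs."""
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        # The tuple of partition values identifies the group.
        groups[tuple(row[k] for k in part_keys)].append(row)
    for values, group in groups.items():
        # Assumed Hive-style layout, e.g. year=2024/month=1/
        prefix = "/".join(f"{k}={v}" for k, v in zip(part_keys, values))
        yield group, f"{s3dir.rstrip('/')}/{prefix}/"


rows = [
    {"year": 2024, "month": 1, "value": 10},
    {"year": 2024, "month": 1, "value": 20},
    {"year": 2024, "month": 2, "value": 30},
]
partitions = list(sketch_partition_rows(rows, "s3://my-bucket/table", ["year", "month"]))
```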

aws_sdk_polars.s3.write.write(df: DataFrame, s3_client: S3Client, polars_writer: Writer, compression: Union[str, Algorithm] = Algorithm.uncompressed, compression_kwargs: Optional[Dict[str, Any]] = None, s3pathlib_write_bytes_kwargs: Optional[Dict[str, Any]] = None, s3dir: Optional[S3Path] = None, fname: Optional[str] = None, s3path: Optional[S3Path] = None) → S3Path

Write the DataFrame to the given S3 location and attach additional information about the DataFrame.

The original polars.write_parquet method doesn’t work with moto, so the file is first serialized into an in-memory buffer and the buffer's bytes are then written to S3.

Parameters:

df – polars.DataFrame object.
s3_client – boto3.client("s3") object.
polars_writer – polars_writer.api.Writer object.
compression – Compression method for CSV and JSON. This option is ignored for the parquet and deltalake formats because their compression is already defined in polars_writer.
compression_kwargs – Keyword arguments for the compression method. For example, for gzip <https://docs.python.org/3/library/gzip.html>, you can provide {“compresslevel”: 9}.
s3pathlib_write_bytes_kwargs – Keyword arguments for s3path.write_bytes method. See https://s3pathlib.readthedocs.io/en/latest/s3pathlib/core/rw.html#s3pathlib.core.rw.ReadAndWriteAPIMixin.write_bytes
s3dir – The S3 directory path. Required if s3path is not provided.
fname – The filename without extension. Required if s3path is not provided. For example, if the full file name is “data.csv”, then fname is “data”.
s3path – A pre-configured S3Path object. If provided, s3dir and fname are ignored.

Returns:

The S3Path object representing the created file on S3. You can access attributes such as ‘size’, ‘etag’, and ‘last_modified_at’.
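The buffer-then-upload approach and the role of compression_kwargs can be illustrated with a standard-library sketch. It is not the library's code: the function name is hypothetical, Python's csv module stands in for the Polars writer, and the final upload step (handing the bytes to S3) is left out.

```python
import csv
import gzip
import io
from typing import Any, Dict, List, Optional, Sequence


def sketch_serialize_csv_gzip(
    header: Sequence[str],
    rows: List[Sequence[Any]],
    compression_kwargs: Optional[Dict[str, Any]] = None,
) -> bytes:
    """Serialize rows to CSV in memory, then gzip the resulting bytes."""
    # Step 1: write into an in-memory buffer instead of a file or S3 URI.
    text_buffer = io.StringIO()
    writer = csv.writer(text_buffer)
    writer.writerow(header)
    writer.writerows(rows)
    raw = text_buffer.getvalue().encode("utf-8")
    # Step 2: compress, forwarding options such as {"compresslevel": 9}.
    return gzip.compress(raw, **(compression_kwargs or {}))


payload = sketch_serialize_csv_gzip(
    header=["id", "name"],
    rows=[[1, "a"], [2, "b"]],
    compression_kwargs={"compresslevel": 9},
)
# ``payload`` is what would be handed to the S3 write call.
```

Because the whole file is held in memory before upload, this pattern suits files that fit comfortably in RAM.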