CSV

    This topic explains how to use the COPY TO statement to export data from a database to Amazon S3 or Google Cloud Storage (GCS) in CSV format.

    COPY TO CSV introduces the ability to write results of queries or entire collections to external stores (such as AWS S3 and Google Cloud Storage) in CSV format.

    To read or write data to or from external cloud storage, you must have the required permissions. For more information, see Cloud Read/Write Permissions.

    Syntax

    CopyTo EBNF

    CopyTo ::= "COPY" SourceDefinition
               "TO" ContainerIdentifier
               "AT" LinkQualifiedName
               OutputClause
               WithClause

    SourceDefinition ::= ( QualifiedName | "(" Query ")" ) ("AS"? AliasIdentifier)?

    OutputClause ::= "PATH" "(" OutputPathExpr ("," OutputPathExpr)* ")" OverClause? SchemaClause?

    OutputPathExpr ::= Expr

    OverClause ::= "OVER" "(" PartitionClause? OrderClause? ")"

    PartitionClause ::= "PARTITION" "BY" PartitionExpr ("," PartitionExpr)*

    PartitionExpr ::= Expression ("AS" AliasIdentifier)?

    OrderClause ::= "ORDER" "BY" OrderExpr ("," OrderExpr)*

    OrderExpr ::= Expression ("ASC" | "DESC")? ("NULLS" ("FIRST" | "LAST"))?

    SchemaClause ::= "AS" CSVTypeExpr

    CSVTypeExpr ::= "(" Type_Expr ("," Type_Expr)* ")"

    Type_Expr ::= Field_Name Flat_Type ("NOT" "UNKNOWN")?

    FlatType ::= "STRING" | "BIGINT" | "DOUBLE" | "BOOLEAN" | "DATE" | "TIME" | "DATETIME"

    WithClause ::= "WITH" "{" NameValuePair ("," NameValuePair)* "}"

    NameValuePair ::= NameStringLiteral ":" ValueLiteral

    Example

      COPY (
        SELECT o.custid, o.order_date, o.orderno, o.gender, o.zip
        FROM Orders o
      ) AS t
      TO myS3Bucket
      AT myS3Link
      PATH ("commerce/Orders/zip-" || zip || "/")
      OVER (PARTITION BY t.zip AS zip)
      AS (
          custid string NOT UNKNOWN,
          orderno int,
          order_date string,
          gender string,
          zip string
      )
      WITH {
          "format": "csv",
          "header": true
      }

    Arguments

    SourceDefinition

    As the source, you either specify the fully qualified name of a collection or provide a query.

    • If you specify a collection name, then the whole collection, view, or synonym is the source of the data to copy.

    • If you specify a query, then the result of that query is the source of the data.
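
    For example, to copy an entire collection rather than a query result, you can name the collection directly. This is a minimal sketch that reuses the bucket and link names from the example above; the collection, path, and field names are illustrative:

      COPY Orders AS o
      TO myS3Bucket
      AT myS3Link
      PATH ("commerce/all-orders")
      AS (custid STRING, orderno BIGINT)
      WITH { "format": "csv" }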

    TO

    The TO clause identifies the bucket name on the external data store, such as an Amazon S3 or GCS bucket.

    AT

    The AT clause specifies the name of the link that contains credentials for the bucket. The link type must match the external store; for example, for an Amazon S3 bucket, the specified link must have a type of S3.

    OutputClause

    The OutputClause defines the destination path for the output objects. You supply one or more OutputPathExpr expressions to identify the path prefixes. You can include aliases. If you supply more than one expression, Capella Columnar concatenates the values of all OutputPathExpr and supplies / characters as the path separators. As a result, you do not need to include slash characters between OutputPathExpr expressions.

    The target directory that you specify in the destination path must be empty. The operation fails if the target directory is not empty.
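
    As a sketch of the concatenation behavior described above, the following two PATH clauses (with illustrative prefixes) produce the same destination prefix:

      PATH ("commerce", "orders", "2024")
      PATH ("commerce/orders/2024")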
    OverClause

    You can supply an optional OverClause to specify output partitioning with a PartitionClause and ordering with an OrderClause. This is similar to the OVER clause of a window function.

    • If you specify a PartitionClause, Capella Columnar evaluates the OutputPathExpr once per logical data partition, and the expression can refer to aliases defined by an AS sub-clause.

    • If you do not specify a PartitionClause, Capella Columnar evaluates the OutputPathExpr once for the whole COPY TO output dataset. That is, all of the files end up in the same directory.

    You use the OrderClause to define the order of the output objects, either within each partition or for the whole dataset.
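
    For example, the following sketch orders the output rows by order date within a single destination directory. The names reuse the earlier example, and the ORDER BY usage is illustrative:

      COPY (
        SELECT o.custid, o.orderno, o.order_date
        FROM Orders o
      ) AS t
      TO myS3Bucket
      AT myS3Link
      PATH ("commerce/orders-by-date")
      OVER (ORDER BY t.order_date ASC NULLS LAST)
      AS (custid STRING, orderno BIGINT, order_date STRING)
      WITH { "format": "csv" }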

    AS (CSV Type Expression)

    The schema clause defines the names and types of the CSV columns. NOT UNKNOWN specifies that a field cannot be UNKNOWN, that is, null or missing.

      AS (
          custid STRING NOT UNKNOWN,
          orderno BIGINT,
          order_date DATE,
          gender STRING
      )
    WITH

    The WITH clause is optional. You use it to specify the following additional parameters.

    max-objects-per-file (Optional)

    Maximum number of objects per file.
    Default: 10000. Minimum allowed value: 1000.
    Schema: String

    compression (Optional)

    Compression mechanism.
    Default: none.
    Schema: String enum (gz, gzip)

    header (Optional)

    Specifies whether a header row is written.
    Default: false.
    Schema: Boolean

    escape

    Specifies the character used to escape the quote character and carriage returns.
    Default: same as the quote value (a quote character that appears in the data is written twice).
    Schema: String

    quote

    The character used to enclose strings.
    Default: "
    Allowed values: NONE, ', "
    Schema: String

    null (Optional)

    The text emitted in the CSV output for NULL or MISSING values.
    Default: unquoted empty string.
    Schema: String

    record-delimiter

    The character that separates each row (record).
    Schema: String

    delimiter

    The character that separates columns within each row (line) of the file.
    Default: comma.
    Schema: String

    empty-field-as-null

    Specifies whether empty field values are written as the null string.
    Default: false.
    Schema: Boolean

    force-quote

    Enables or disables quoting of non-null string values.
    Default: special characters within a string value are always enclosed in quotes.
    Schema: Boolean
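
    As an illustrative sketch combining several of these parameters (the values are examples, not recommendations):

      WITH {
          "format": "csv",
          "header": true,
          "compression": "gzip",
          "delimiter": "|",
          "null": "N/A",
          "max-objects-per-file": "5000"
      }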

    Supported Data Types

    The following data types are supported in a CSV schema:

    • Boolean

    • String

    • Bigint

    • Double

    • Null

    • Temporal Data (Date / Time / DateTime)
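
    An illustrative schema clause using the supported types (the field names are hypothetical, and the temporal type spellings follow the list above):

      AS (
          active BOOLEAN,
          name STRING,
          qty BIGINT,
          price DOUBLE,
          created_date DATE,
          created_at DATETIME
      )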

    Unsupported Data Types

    The following data types are not supported in a CSV schema:

    • Object

    • Array

    • Multiset

    Schema Mismatches and Warnings

    If a record’s schema does not match the provided schema, the system issues a warning, which is returned as part of the final result. The system skips writing records with mismatches and continues to the next record.

    Examples of schema mismatches:

    • The provided schema has 4 fields, but the actual record has 5 fields.

    • The type of a field in the provided schema differs from that field’s type in the record’s schema.

    • The actual record has fields that are not present in the provided schema.