Access and Organize Data in Enterprise Analytics

    This topic introduces the database objects that you use to view and organize data in Enterprise Analytics.

    Sources of Data

    Data sources include:

    • Remote Couchbase operational databases, which are typically subject to rapid, ongoing modification. Enterprise Analytics connects to Couchbase Server and Couchbase Capella operational clusters.

    • Kafka, a distributed streaming platform that streams data from other data sources, such as databases.

    • External data stores, logs, and archives to support analyses of historical data. Enterprise Analytics can query data residing in Amazon S3 and S3-compatible storage. This data remains at the source and is not copied into Enterprise Analytics collections. Supported formats are JSON, CSV, TSV, Avro, and Parquet.

    • Data lakes, such as Delta Lake, residing in S3 buckets.

    Database Objects

    To support your analytical queries and manage access to data sources, you add the following objects:

    Diagram

    Indexes

    Indexes are used by certain services, such as Query, Analytics, and Search, as targets for search routines. Each index makes a predefined subset of data available for the search. Well-designed indexes significantly improve the performance of search operations.

    Views

    Views are functions written in JavaScript that can serve several purposes in your application. You can use them to:

    • find all the documents in your database

    • create a copy of data in a document and present it in a specific order

    • create an index to efficiently find documents by a particular value or by a particular structure in the document

    • represent relationships between documents

    • perform calculations on data contained in documents.

    User Defined Functions (UDFs)

    User-defined functions have the same syntax as built-in functions, with parentheses () to contain any arguments. After you create a user-defined function, you can call it in any expression, just like a built-in function.

    The name of the function is usually an unqualified identifier, such as func1 or func-1. In this case, the path to the function is determined by the current query context.

    The name of a user-defined function is case-sensitive, unlike that of a built-in function. You must call the user-defined function using the same case that you used when you created it.
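
    As a minimal sketch of the case-sensitivity rule, the following SQL++ defines and calls a function. The function name fullName and its parameters are hypothetical, and the exact CREATE FUNCTION syntax is an assumption to check against the SQL++ reference for your release.

    ```sql
    -- Hypothetical function: joins two strings with a space.
    CREATE FUNCTION fullName(firstName, lastName) {
      firstName || " " || lastName
    };

    -- Call it with the same case used at creation time.
    SELECT VALUE fullName("Ada", "Lovelace");

    -- Calling fullname(...) or FULLNAME(...) refers to a different name,
    -- so it would not resolve to the function created above.
    ```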

    Scopes

    Scopes are intermediary containers that exist within a database to group related objects like collections, indexes, and functions.

    When you create a cluster, Enterprise Analytics automatically creates a scope named Default in the database named Default. You can add more scopes as needed using the UI or a SQL++ CREATE SCOPE statement.

    You must make scope names unique within a database, but you can use the same scope name across different databases.
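
    The naming rule can be sketched with a few statements. The database and scope names below (salesDb, archiveDb, retail, wholesale) are hypothetical, and the statement forms are assumed to follow the usual SQL++ CREATE DATABASE and CREATE SCOPE syntax.

    ```sql
    -- Hypothetical databases used only for illustration.
    CREATE DATABASE salesDb;
    CREATE DATABASE archiveDb;

    -- Scope names must be unique within one database ...
    CREATE SCOPE salesDb.retail;
    CREATE SCOPE salesDb.wholesale;

    -- ... but the same scope name can be reused in a different database.
    CREATE SCOPE archiveDb.retail;
    ```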

    Collections

    Collections are containers that hold metadata and data that you can query and manipulate. You can add a standalone collection using the UI or a CREATE COLLECTION SQL++ statement.

    You must make collection names unique within a scope, but you can use the same collection name across different scopes, either in the same database or across different databases.
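
    For illustration, the sketch below reuses one collection name in two scopes and then queries each collection by its fully qualified name. It assumes a database salesDb with scopes retail and wholesale already exists. All names and the orderId key field are hypothetical.

    ```sql
    -- The same collection name in two different scopes of one database.
    CREATE COLLECTION salesDb.retail.orders PRIMARY KEY (orderId: string);
    CREATE COLLECTION salesDb.wholesale.orders PRIMARY KEY (orderId: string);

    -- The fully qualified name (database.scope.collection) tells them apart.
    SELECT COUNT(*) AS retailOrders FROM salesDb.retail.orders;
    SELECT COUNT(*) AS wholesaleOrders FROM salesDb.wholesale.orders;
    ```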

    Enterprise Analytics has 3 types of collections:

    • Remote collections contain a shadow or mirror copy of data streamed from a remote data source. The remote data source can be a Kafka pipeline or a Couchbase database. A remote collection is associated with a link that provides authentication and connection information for the remote data source. When the link is connected to the remote source, Enterprise Analytics streams data from the remote source into the collection. This streaming means that the remote collection has a local replica of the data in the data source. When the link is disconnected, the collection retains the data as it was when the link disconnected. Queries on remote collections are efficient because of the local shadow copy of the streamed data.

      The remote collection also contains metadata about the data format of the remote source as well as optional data filters.

      You can use the Enterprise Analytics UI or the SQL++ CREATE COLLECTION statement to add a remote collection.

    • External collections let you query data stored in an S3 bucket. Like remote collections, they’re associated with a link. Unlike remote collections, Enterprise Analytics does not copy data from the external data source into the external collection. Instead, every query reads data from the external storage location. The external collection contains just the metadata necessary to read data from the S3 bucket. As a result, Enterprise Analytics cannot index external collections.

      You can use the Enterprise Analytics UI or a CREATE EXTERNAL COLLECTION SQL++ statement to add an external collection.

    • Standalone collections allow you to assemble and manipulate groups of documents on an as-needed basis. These are stored, manipulated, and managed locally. Standalone collections do not use links.

      You populate these collections with data by importing data files or by using SQL++ statements to INSERT, COPY INTO, and otherwise add and update documents in a purpose-built collection.

      You can use the Enterprise Analytics UI or a CREATE COLLECTION SQL++ statement to add a standalone collection.
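
    The three collection types described above can be sketched side by side. Everything named here is hypothetical: the links myRemoteLink and myS3Link, the bucket names, the path, and the target collection names. The clause order (ON, AT, PATH, WITH) is an assumption based on common SQL++ usage, so confirm it against the statement reference before relying on it.

    ```sql
    -- Remote collection: keeps a local shadow copy of a collection that
    -- lives on an operational cluster, streamed through a remote link.
    CREATE COLLECTION salesDb.retail.airlines
      ON `travel-sample`.inventory.airline
      AT myRemoteLink;

    -- External collection: stores only metadata; each query reads the
    -- files from S3 through the external link.
    CREATE EXTERNAL COLLECTION salesDb.retail.reviews
      ON `my-bucket`
      AT myS3Link
      PATH "reviews/"
      WITH { "format": "json" };

    -- Standalone collection: no link; data is stored and managed locally.
    CREATE COLLECTION salesDb.retail.notes PRIMARY KEY (noteId: string);

    -- Populate the standalone collection directly with SQL++.
    INSERT INTO salesDb.retail.notes ([
      { "noteId": "n1", "text": "first note" }
    ]);
    ```

    Only the remote and standalone collections in this sketch could later be indexed, because the external collection holds no local copy of the data.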

    Links

    A link is a metadata store for the authorization and authentication credentials that Enterprise Analytics uses when connecting to a remote or external data source. You can associate a single link with multiple collections in different scopes and across different databases.

    Links are categorized into 2 types:

    • Remote links have connected and disconnected states. When connected, the link provides continuous, real-time updates to the data shadowed in its associated Enterprise Analytics remote collections.

      You incur charges when you connect a remote link.

    • External links contain the credentials Enterprise Analytics needs to view an external storage location. These links do not have connected or disconnected states. Instead, each time you query an associated external collection, Enterprise Analytics connects to the external data storage to read its data.

    You use the Enterprise Analytics UI to add links. See Stream Data from Remote Sources or Set Up an External Data Source.
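
    After a remote link exists, its connected and disconnected states can also be switched with SQL++. This is a sketch that assumes the CONNECT LINK and DISCONNECT LINK statements are available in your release. The link name myRemoteLink is hypothetical.

    ```sql
    -- Start streaming into every remote collection associated with the link.
    CONNECT LINK myRemoteLink;

    -- Stop streaming; the remote collections keep the data they already hold.
    DISCONNECT LINK myRemoteLink;
    ```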

    Other Objects

    At the same hierarchical level as collections, within a database and scope, you create views and tabular views, synonyms, and user-defined indexes and functions.

    • To create views and tabular views, you can use the Enterprise Analytics CREATE VIEW SQL++ statement.

    • You use SQL++ statements to create synonyms and user-defined functions.

    • You also create indexes on individual remote and standalone collections with SQL++ statements. See Indexes.
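
    As a rough sketch of these statements, the example below creates a view, a synonym, and an index over one standalone collection. It assumes a collection salesDb.retail.orders with a numeric total field. The object names are hypothetical, and the exact clauses may differ from the statement reference for your release.

    ```sql
    -- View: a named query that lives in the same scope as the collection.
    CREATE VIEW salesDb.retail.bigOrders AS
      SELECT o.orderId, o.total
      FROM salesDb.retail.orders AS o
      WHERE o.total > 1000;

    -- Synonym: an alternative name for an existing collection.
    CREATE SYNONYM salesDb.retail.ord FOR salesDb.retail.orders;

    -- Index: allowed on remote and standalone collections only.
    CREATE INDEX ordersTotalIdx ON salesDb.retail.orders (total: double);
    ```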