Connecting to Data Sources
You can import datasets from multiple sources to work with sample data in Enterprise Analytics.
This section provides a quick guide to getting started with data streaming from diverse sources in Enterprise Analytics.
The tutorial covers:
- how to import sample datasets
- how to set up standalone collections
- how to create remote collections
It details procedures for working with sample datasets, including a Commerce example dataset and the travel-sample and beer-sample datasets.
Following these steps provides practice in setting up database objects within Enterprise Analytics for different data sources.
For comprehensive information about database objects and advanced data management, refer to the Access and Organize Data in Enterprise Analytics section.
In this tutorial, you:
- Directly import the travel-sample data into a new database object within your Enterprise Analytics cluster.
- Set up standalone collections and populate them by inserting the Commerce dataset.
- Set up a remote link in Enterprise Analytics to Couchbase Capella, and define remote collections that shadow data from a Capella collection containing the beer-sample dataset.
Import the travel-sample Collections
The travel-sample dataset is available to import directly from the workbench and consists of 5 collections of JSON documents: airline, airport, landmark, hotel, and route.
To import the travel-sample dataset into your Enterprise Analytics cluster:
- In the UI, select the Workbench tab, and click the Samples tab.
- Select the travel-sample checkbox.
- Click Load Sample Data.
Enterprise Analytics creates a new database, travel-sample, with the inventory scope and all 5 collections.
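To confirm the import, you might run one count query per collection. The Python sketch below only builds those SQL++ statements. The database, scope, and collection names come from this page, but the `COUNT(*)` check and the helper name `count_queries` are suggestions, not steps from this tutorial. Every name is wrapped in backticks because travel-sample contains a hyphen.

```python
# Collection names in the travel-sample dataset, as listed above.
TRAVEL_SAMPLE_COLLECTIONS = ["airline", "airport", "landmark", "hotel", "route"]


def count_queries(database: str = "travel-sample", scope: str = "inventory") -> list[str]:
    """Build one SQL++ COUNT query per travel-sample collection (hypothetical helper).

    Each identifier is wrapped in backticks so that a name containing
    a hyphen, such as travel-sample, is parsed as a single identifier.
    """
    return [
        f"SELECT COUNT(*) AS doc_count FROM `{database}`.`{scope}`.`{name}`;"
        for name in TRAVEL_SAMPLE_COLLECTIONS
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into the query editor.
    for statement in count_queries():
        print(statement)
```

Each collection should report a non-zero count once the load has finished.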
Install the Commerce Dataset in Standalone Collections
The Commerce dataset consists of two collections:
- customers, with the primary key custid, which has string values
- orders, with the primary key orderno, which has integer values
To work with this dataset in Enterprise Analytics, you create a standalone collection for each one and then populate each collection with an INSERT INTO statement.
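Each document must carry its primary key field with the declared type, and primary key values are expected to be unique. Before pasting data into an INSERT INTO statement, you could check it locally. The helper `check_primary_key` below is hypothetical and is not part of Enterprise Analytics. It simply checks a list of documents against the key names and types given above: `custid` as a string and `orderno` as an integer. The sample documents in it are made up.

```python
from typing import Any


def check_primary_key(docs: list[dict[str, Any]], field: str, expected_type: type) -> list[str]:
    """Return the problems found for the primary key `field` (hypothetical pre-flight check).

    Every document must contain `field`, its value must be an instance
    of `expected_type`, and no value may appear more than once.
    """
    problems: list[str] = []
    seen: set = set()
    for i, doc in enumerate(docs):
        if field not in doc:
            problems.append(f"document {i}: missing {field}")
            continue
        value = doc[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"document {i}: {field}={value!r} is not {expected_type.__name__}")
            continue
        if value in seen:
            problems.append(f"document {i}: duplicate {field}={value!r}")
        seen.add(value)
    return problems


if __name__ == "__main__":
    # Made-up documents, not taken from the Commerce dataset.
    customers = [{"custid": "C1"}, {"custid": "C2"}]
    orders = [{"orderno": 1}, {"orderno": "2"}]
    print(check_primary_key(customers, "custid", str))  # []
    print(check_primary_key(orders, "orderno", int))
```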
Create a Standalone Collection
To create a standalone collection:
- In the Enterprise Analytics UI, select the Workbench tab.
- Under Databases, click + database. The Create Database dialog box opens.
- In the Database Name field, enter sampleAnalytics.
- In the Scope Name field, enter Commerce.
- Click Create.
- In the Explorer, click + standalone collection. The Add Collection dialog box opens.
- In the Collection Name field, enter customers.
- In database.scope, choose the newly created database sampleAnalytics and scope Commerce.
- For the Collection Primary Key, in the Field Name field, enter custid.
- In the Field Type list, select string.
- Click Save.
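You can probably create the same objects with SQL++ DDL instead of the UI. The Python sketch below only assembles those statements. The `CREATE DATABASE`, `CREATE SCOPE`, and `CREATE COLLECTION ... PRIMARY KEY (field: type)` forms are assumptions based on the SQL++ used by Couchbase's columnar analytics services, so check them against the SQL++ Reference before running them. The helper name `standalone_ddl` is hypothetical.

```python
def standalone_ddl(database: str, scope: str, collection: str,
                   key_field: str, key_type: str) -> list[str]:
    """Assemble SQL++ DDL for a standalone collection (hypothetical helper).

    Assumed syntax, to be verified against the SQL++ Reference:
    CREATE DATABASE, CREATE SCOPE, and
    CREATE COLLECTION ... PRIMARY KEY (field: type).
    """
    qualified_scope = f"{database}.{scope}"
    return [
        f"CREATE DATABASE {database};",
        f"CREATE SCOPE {qualified_scope};",
        f"CREATE COLLECTION {qualified_scope}.{collection} "
        f"PRIMARY KEY ({key_field}: {key_type});",
    ]


if __name__ == "__main__":
    # Mirrors the UI steps: database sampleAnalytics, scope Commerce,
    # collection customers keyed on custid (string).
    for statement in standalone_ddl("sampleAnalytics", "Commerce", "customers", "custid", "string"):
        print(statement)
```

When you later create orders, the database and scope already exist, so you would need only the last statement, with orderno and int as the key field and type.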
Populate a Standalone Collection
- Use the query editor’s Query Context lists to select the sampleAnalytics database and the Commerce scope.
- In the query editor, begin an INSERT INTO statement as follows:
  INSERT INTO customers (
- Open the customers data, then select and copy the contents of the page.
- To complete the statement, return to the query editor and paste the JSON data between the parentheses. If you copied and pasted the INSERT INTO statement above, you’ll need to add the closing parenthesis ) yourself.
- Run the query to populate the customers collection.
- To verify that the collection now contains data, run the following query:
  SELECT * FROM customers LIMIT 1;
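If you would rather not paste by hand, a small script can wrap the data in the statement for you. This sketch assumes you saved the customers data locally as JSON. The file name `customers.json` and the helper `build_insert` are hypothetical. Parsing the text before re-serializing it means malformed data fails locally rather than in the query editor.

```python
import json


def build_insert(collection: str, json_text: str) -> str:
    """Wrap JSON data in an SQL++ INSERT INTO statement (hypothetical helper).

    json.loads validates the text first, so malformed JSON raises
    json.JSONDecodeError here instead of failing in the query editor.
    """
    documents = json.loads(json_text)
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \u escapes.
    return f"INSERT INTO {collection} ({json.dumps(documents, ensure_ascii=False)});"
```

For example, `print(build_insert("customers", open("customers.json", encoding="utf-8").read()))` prints a statement you can paste into the query editor. Use the same call with the orders data to populate orders.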
Create another standalone collection for orders
, which uses the Field Name orderno
and a Field Type of int as a primary key.
Use the orders data to select and copy the data for this collection, then populate it using another INSERT INTO statement.
Create Remote Collections for beer-sample
You can import the Couchbase beer-sample dataset into a Capella operational cluster or a self-managed Couchbase Server cluster.
This dataset consists of a single collection, which contains data on beers and breweries.
Set up remote collections to hold shadow copies of the beer-sample data in Enterprise Analytics.
Instead of creating a single collection that matches the remote data source in your Capella operational cluster or Couchbase Server, you can use WHERE clauses to create multiple collections in Enterprise Analytics, each holding a filtered subset of the source data.
An Enterprise Analytics collection that uses a WHERE clause applies that clause continuously to filter the incoming data event stream. Only documents that meet the WHERE clause criteria are upserted into the collection.
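Conceptually, the WHERE clause acts as a predicate applied to each document arriving from the source, and matching documents are upserted. The Python sketch below models only that behavior. The event stream, document keys, and the `ingest` and `is_belgian` names are illustrative stand-ins, not Enterprise Analytics internals. The sketch does not model what happens to a document that stops matching after an update.

```python
from typing import Any, Callable, Iterable

Document = dict[str, Any]


def is_belgian(doc: Document) -> bool:
    """Illustrative counterpart of the filter: WHERE country = "Belgium"."""
    return doc.get("country") == "Belgium"


def ingest(stream: Iterable[tuple[str, Document]],
           predicate: Callable[[Document], bool],
           collection: dict[str, Document]) -> None:
    """Upsert each (key, document) pair from the stream that satisfies predicate."""
    for key, doc in stream:
        if predicate(doc):
            # Upsert: insert a new key or replace the document already stored under it.
            collection[key] = doc


if __name__ == "__main__":
    brew_belgium: dict[str, Document] = {}
    events = [
        ("brewery_a", {"country": "Belgium", "name": "A"}),
        ("brewery_b", {"country": "Germany", "name": "B"}),
        ("brewery_a", {"country": "Belgium", "name": "A (renamed)"}),
    ]
    ingest(events, is_belgian, brew_belgium)
    print(brew_belgium)  # {'brewery_a': {'country': 'Belgium', 'name': 'A (renamed)'}}
```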
Prepare to Ingest Data from the Remote System
To prepare your Capella operational cluster for creating a remote collection in Enterprise Analytics, first import the beer-sample sample data into the Capella cluster. After you have imported the sample data, you can set up your Capella operational cluster as a remote data source in Enterprise Analytics.
To establish a remote link from Enterprise Analytics to a Capella operational cluster, you need some details from that cluster. For more information, refer to Requirements for Couchbase Capella Links.
Create a Data Source for Remote Data
- In the Enterprise Analytics UI, select the Workbench tab.
- Under Databases, click + database. The Create Database dialog box opens.
- In the Database Name field, enter remoteCapella.
- In the Scope Name field, enter remoteBeer.
- Click Create.
- In the Explorer, select + new link. The Add Link dialog opens.
- In the Link Name field, enter capellaLink.
- In the Link Type field, select Couchbase.
- In the Remote Connection String field, specify the hostname or IP address of a node in the remote cluster you want to link.
- In the Encryption Type field, select Full. Then enter the Remote Username, Remote Password, and Remote Cluster Certificate.
- Select the Prevents Redirects checkbox to prevent HTTP connections for the remote link.
- Click Save. The remote link to Capella, capellaLink, is created.
- In the Explorer, select + collection next to the newly created capellaLink. The Add Remote Collection dialog opens.
- In the Collection Name field, enter brewBelgium.
- In the Database list, select remoteCapella, and in the Scope list, select remoteBeer to specify the destination for your remote collection.
- In the Source bucket.scope.collection field, select the beer-sample bucket, and the _default scope and collection.
- In the provided field, enter a WHERE clause that filters the documents in the collection so that only the required data is ingested:
  country = "Belgium"
- Click Save. Your collection appears under remoteCapella.remoteBeer in the Explorer.
- Connect the link capellaLink by clicking the connect link icon or running:
  CONNECT LINK capellaLink;
- Verify that your brewBelgium collection now contains a shadow copy of the data sourced from Capella by running the following query, with your query context set to remoteCapella and remoteBeer:
  SELECT * FROM brewBelgium LIMIT 1;
Next Steps
You can continue to expand your data landscape by creating more collections that use the same remote link to your Capella operational cluster. If you prefer a programmatic setup, you can also create remote Couchbase collections with DDL statements. Refer to the examples in Create a Remote Couchbase Collection.
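As a hedged illustration of the programmatic route, the sketch below assembles a filtered remote-collection statement, followed by the CONNECT LINK statement from the UI steps. The `CREATE COLLECTION ... ON ... AT ... WHERE ...` form is an assumption modeled on Couchbase Analytics SQL++ and is not confirmed by this page. Treat the examples in Create a Remote Couchbase Collection as the authoritative syntax. The helpers `quote_path` and `remote_collection_ddl` are hypothetical.

```python
from typing import Optional


def quote_path(path: str) -> str:
    """Backtick-quote each dot-separated part, e.g. beer-sample._default._default."""
    return ".".join(f"`{part}`" for part in path.split("."))


def remote_collection_ddl(target: str, source: str, link: str,
                          where: Optional[str] = None) -> list[str]:
    """Assemble assumed SQL++ for a remote collection, then connect the link.

    target: database.scope.collection in Enterprise Analytics
    source: bucket.scope.collection in the remote cluster
    """
    create = f"CREATE COLLECTION {target} ON {quote_path(source)} AT {link}"
    if where:
        # Optional filter applied to the incoming data stream.
        create += f" WHERE {where}"
    return [create + ";", f"CONNECT LINK {link};"]


if __name__ == "__main__":
    for statement in remote_collection_ddl(
            "remoteCapella.remoteBeer.brewBelgium",
            "beer-sample._default._default",
            "capellaLink",
            'country = "Belgium"'):
        print(statement)
```

To add another filtered collection on the same link, call the helper again with a different target name and WHERE clause.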
With your data established, the next step is analysis. Consult the SQL++ Reference to learn how to query and explore your data.