Configure Hive Metadata Auto Sync
Warning
Hive Metadata Auto Sync is an experimental feature in the current version. Do not use it in production environments.
SynxDB Cloud can synchronize Hive metadata in real time through Kafka. The feature listens for Hive Metastore change events and updates the matching external table definitions in SynxDB Cloud without operator action. It complements the manual synchronization functions.
Hive Metadata Auto Sync runs as an independent component, managed separately from the database cluster. You configure it through the Hive Meta Sync tab on the Database Config page of the DBaaS Admin Console, but the feature also needs preparation on the Hive cluster and inside the target SynxDB Cloud database. This document covers the full setup end to end.
How it works
The synchronization pipeline has four components:
- Hive Metastore with the SynxDB Cloud listener plugin installed. The plugin intercepts metadata change events such as `CREATE TABLE`, `ALTER TABLE`, and `DROP TABLE`, and publishes them to a Kafka topic.
- Kafka broker that transports the metadata change events.
- Meta Sync component running in the SynxDB Cloud cluster. It consumes events from Kafka and translates each event into a matching `CREATE FOREIGN TABLE` or `DROP FOREIGN TABLE` statement.
- Target database in SynxDB Cloud (typically named `hivedb`), where the foreign tables are created against a pre-provisioned foreign server named `__hive_auto_sync_server`.
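To illustrate the translation step, the sketch below maps a change event to the corresponding DDL. This is not the actual Meta Sync implementation: the event field names (`eventType`, `dbName`, `tableName`), the JSON wire format, and the column-list handling are all assumptions made for demonstration.

```python
# Illustrative sketch only. The real Meta Sync wire format is internal to the
# listener plugin; eventType/dbName/tableName are hypothetical field names.
import json

SERVER = "__hive_auto_sync_server"  # fixed server name required by Meta Sync


def event_to_ddl(raw_event: str) -> str:
    """Translate one metadata change event into a matching DDL statement."""
    event = json.loads(raw_event)
    table = f'public."{event["tableName"]}"'
    if event["eventType"] == "CREATE_TABLE":
        # Column list elided; the real component derives it from the event.
        return f"CREATE FOREIGN TABLE {table} (...) SERVER {SERVER};"
    if event["eventType"] == "DROP_TABLE":
        return f"DROP FOREIGN TABLE IF EXISTS {table};"
    raise ValueError(f"unhandled event type: {event['eventType']}")


print(event_to_ddl(
    '{"eventType": "DROP_TABLE", "dbName": "default", "tableName": "t1"}'
))  # DROP FOREIGN TABLE IF EXISTS public."t1";
```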
The Kafka topic name uses the format `<hdw.catalog.name>_fdb-<catalog>_hms`, where:

- `hdw.catalog.name` is the value of `cn.cbdb.apiary.kafka.hdw.catalog.name` configured in `hive-site.xml` on the Hive side. The same value must appear as `hdw.catalog.name` in the Meta Sync configuration.
- `<catalog>` is each entry in `cn.cbdb.apiary.kafka.autosync.catalogs`. The same value must appear as `hive.catalog.name` in the Meta Sync configuration so that the consumer subscribes to the correct topic.
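The naming rule can be expressed as a small helper for predicting topic names from your own configuration values. This is a convenience sketch, not part of the product:

```python
def sync_topic_name(hdw_catalog_name: str, catalog: str) -> str:
    """Build the Kafka topic name <hdw.catalog.name>_fdb-<catalog>_hms."""
    return f"{hdw_catalog_name}_fdb-{catalog}_hms"


# With hdw.catalog.name = hdw_catalog and autosync.catalogs = hive
# (the values used in the examples below), the listener publishes to:
print(sync_topic_name("hdw_catalog", "hive"))  # hdw_catalog_fdb-hive_hms
```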
Prerequisites
Before configuring Hive Metadata Auto Sync, confirm the following:
- The listener plugin is installed on the Hive Metastore. Contact SynxDB Cloud technical support to obtain the `kafka-metastore-listener-<version>-all.jar` file, then install it as described in Step 1.
- A Kafka cluster is reachable from both the Hive Metastore and the SynxDB Cloud cluster. Both `PLAINTEXT` and `SASL_PLAINTEXT` security protocols are supported. When `SASL_PLAINTEXT` is used, the supported SASL mechanism is `SCRAM-SHA-256`.
- The HDFS connection and Hive Connector are already configured in the DBaaS Admin Console. Complete Configure an HDFS connection and Configure a Hive connection first.
- A Hive Meta Sync profile is available on the Profile page. If none exists, create one before proceeding.
Step 1. Install the listener on the Hive Metastore
You perform this step on the Hive cluster, not in the SynxDB Cloud console.
1. Place the listener jar in the Hive Metastore classpath (typically `$HIVE_HOME/lib/`). If clients connect through HiveServer2, install the jar there as well.

2. Add the following properties to `hive-site.xml` on every Hive Metastore node:

```xml
<property>
  <name>hive.metastore.event.listeners</name>
  <value>cn.cbdb.apiary.kafka.listener.HiveMetaStoreEventListener</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.bootstrap.servers</name>
  <value>kafka-host:9092</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.hdw.catalog.name</name>
  <value>hdw_catalog</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.hive.cluster.name</name>
  <value>cluster-1</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.autosync.catalogs</name>
  <value>hive</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.autosync.databases.hive</name>
  <value>*</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.sync.catalog.wise</name>
  <value>true</value>
</property>
```
For Kafka brokers that require SASL authentication, also add:
```xml
<property>
  <name>cn.cbdb.apiary.kafka.security.protocol</name>
  <value>SASL_PLAINTEXT</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.sasl.mechanism</name>
  <value>SCRAM-SHA-256</value>
</property>
<property>
  <name>cn.cbdb.apiary.kafka.sasl.jaas.config</name>
  <value>org.apache.kafka.common.security.scram.ScramLoginModule required username="kafka-user" password="kafka-password";</value>
</property>
```
Property reference:
| Property | Description |
|---|---|
| `hive.metastore.event.listeners` | Fully qualified class name of the SynxDB Cloud listener. Must be `cn.cbdb.apiary.kafka.listener.HiveMetaStoreEventListener`. |
| `cn.cbdb.apiary.kafka.bootstrap.servers` | Kafka broker address or addresses, comma-separated. |
| `cn.cbdb.apiary.kafka.hdw.catalog.name` | Catalog identifier used as the topic prefix. Must match `hdw.catalog.name` in the Meta Sync configuration. |
| `cn.cbdb.apiary.kafka.hive.cluster.name` | Logical name of this Hive cluster. |
| `cn.cbdb.apiary.kafka.autosync.catalogs` | Comma-separated list of Hive catalogs to publish. Each value in this list must also appear as `hive.catalog.name` in the Meta Sync configuration that consumes from it. |
| `cn.cbdb.apiary.kafka.autosync.databases.<catalog>` | Comma-separated list of Hive databases to publish for the given catalog. Use `*` to publish all databases. |
| `cn.cbdb.apiary.kafka.sync.catalog.wise` | Whether a separate topic is used per catalog. Must match `sync.catalog.wise` in the Meta Sync configuration. |
| `cn.cbdb.apiary.kafka.security.protocol` | `PLAINTEXT` or `SASL_PLAINTEXT`. |
| `cn.cbdb.apiary.kafka.sasl.mechanism` | `SCRAM-SHA-256` when SASL is used. |
| `cn.cbdb.apiary.kafka.sasl.jaas.config` | JAAS login string for the Kafka client used by the listener. |
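If you template `hive-site.xml` with automation, the listener properties can be generated from a plain mapping. A minimal sketch using only the Python standard library; the property names are the ones listed above, and the helper itself is not part of the product:

```python
from xml.sax.saxutils import escape


def render_properties(props: dict) -> str:
    """Render a mapping as Hive-style <property> blocks for hive-site.xml."""
    blocks = []
    for name, value in props.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


xml = render_properties({
    "hive.metastore.event.listeners":
        "cn.cbdb.apiary.kafka.listener.HiveMetaStoreEventListener",
    "cn.cbdb.apiary.kafka.bootstrap.servers": "kafka-host:9092",
})
print(xml)
```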
3. Restart the Hive Metastore process to load the listener. To confirm the listener is publishing events, run `CREATE TABLE` in Hive and check that an event appears in the matching Kafka topic:

```shell
kafka-console-consumer.sh \
  --bootstrap-server <broker-address> \
  --topic <hdw.catalog.name>_fdb-<catalog>_hms \
  --from-beginning --max-messages 1
```
Step 2. Prepare the target database
The Meta Sync component writes foreign tables into a designated database in SynxDB Cloud. You need to create this database and provision a foreign server inside it before any sync events arrive.
1. Create the target database. Run the following statement from the DBaaS User Console worksheet or through `psql`. The database name must match the value you set as `hdw.database` in the Meta Sync configuration.

```sql
CREATE DATABASE hivedb;
```
2. Create the foreign server. Switch to the target database and create the foreign server that Meta Sync uses:

```sql
\c hivedb
SELECT public.create_foreign_server(
    '__hive_auto_sync_server',  -- exact name required by Meta Sync; do not change
    'gpadmin',                  -- existing role for the initial user mapping
    'datalake_fdw',             -- foreign data wrapper
    'hdfs-cluster-1'            -- must match hdfs.gp.name in the Meta Sync configuration
);
```
Note
The Meta Sync component requires the server name `__hive_auto_sync_server`. Use this exact name.

3. Grant privileges to the Meta Sync database user. When the Meta Sync pod connects to SynxDB Cloud, it logs in as a database user that belongs to the account you will select in Step 4. Look up that user in the DBaaS Admin Console under Organizations > your organization > your account > Users; the username is shown in the Name column (for example, `123123`).

In the target database, grant privileges to that user and create the user mapping. The example uses `<sync_user>` as a placeholder; substitute the real username:

```sql
GRANT USAGE ON FOREIGN SERVER __hive_auto_sync_server TO "<sync_user>";
GRANT ALL ON SCHEMA public TO "<sync_user>";
CREATE USER MAPPING FOR "<sync_user>" SERVER __hive_auto_sync_server;
```
Warning
Double-quote the username in SQL. Usernames such as `123123` are numeric and are not valid unquoted identifiers.
Step 3. Access the Hive Meta Sync tab
Log in to the DBaaS Admin Console.
In the left navigation pane, click Database Config.
Click the Hive Meta Sync tab at the top of the page. This page lists all current Hive Metadata Auto Sync configurations.
Step 4. Create a configuration (basic information)
Click + Create in the upper-right corner of the list.
In the Basic Information step, provide the following details:
Organization: Select the organization for this configuration.
Account: Select the account for this configuration.
Service Configuration Template: Select the appropriate template (for example, Hive Meta Sync Template).
Click Next.
Step 5. Configure sync parameters
Select a Profile for the Hive Meta Sync component. The profile determines the resource allocation (CPU, memory, and storage) for the sync service. This field is required.
In the Hive Meta Sync Content input area, choose Manual Input and provide the YAML body. Top-level keys must start at column 0; see the YAML formatting caveat below.
Template for a PLAINTEXT Kafka broker:
```yaml
bootstrap.servers:
- kafka-host:9092
hdw.catalog.name: hdw_catalog
security.protocol: PLAINTEXT
prometheus.port: 15888
sync.catalog.wise: true
hive.clusters:
- hive.gp.name: hive-cluster-1
  hive.cluster.name: hive
  hive.catalog.list:
  - hive.catalog.name: hive
    hdw.database: hivedb
    hive.partition.prov_id: 001
    hdw.auth.user:
    - <sync_user>
    hdfs.gp.name: hdfs-cluster-1
```
Template for a SASL_PLAINTEXT Kafka broker:
```yaml
bootstrap.servers:
- kafka-host:9092
hdw.catalog.name: hdw_catalog
security.protocol: SASL_PLAINTEXT
sasl.mechanism: SCRAM-SHA-256
sasl.jaas.config: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="kafka-user" password="kafka-password";'
prometheus.port: 15888
sync.catalog.wise: true
hive.clusters:
- hive.gp.name: hive-cluster-1
  hive.cluster.name: hive
  hive.catalog.list:
  - hive.catalog.name: hive
    hdw.database: hivedb
    hive.partition.prov_id: 001
    hdw.auth.user:
    - <sync_user>
    hdfs.gp.name: hdfs-cluster-1
```
Key parameter descriptions:
- `bootstrap.servers`: Kafka broker address or addresses. Must be reachable from inside the SynxDB Cloud cluster network.
- `hdw.catalog.name`: Must match `cn.cbdb.apiary.kafka.hdw.catalog.name` set on the Hive side. Otherwise the consumer subscribes to a topic that no producer writes to.
- `security.protocol`, `sasl.mechanism`, `sasl.jaas.config`: Authentication settings. Must match the Kafka broker's configuration.
- `sync.catalog.wise`: Must match `cn.cbdb.apiary.kafka.sync.catalog.wise` on the Hive side.
- `hive.gp.name`: Must match the connection name created on the Hive Connector tab (for example, `hive-cluster-1`).
- `hive.cluster.name`: Logical cluster name for display purposes.
- `hive.catalog.name` (inner): Must appear in `cn.cbdb.apiary.kafka.autosync.catalogs` on the Hive side. The topic name is derived as `<hdw.catalog.name>_fdb-<hive.catalog.name>_hms`.
- `hdw.database`: Target database name in SynxDB Cloud. Must already exist (see Step 2).
- `hdw.auth.user`: List of SynxDB Cloud users that automatically receive `SELECT` permission on synchronized schemas.
- `hdfs.gp.name`: Must match both the connection name created on the HDFS tab and the `hdfsClusterName` argument passed to `create_foreign_server` in Step 2.
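Several of these values must agree pairwise between the Hive-side `hive-site.xml` and the Meta Sync YAML, and a mismatch fails silently: the consumer simply reads a topic that no producer writes to. A pre-flight check can catch this before you submit the form. The sketch below represents both configurations as plain dictionaries; it is an illustration, not a supplied tool:

```python
def check_pairs(hive_site: dict, meta_sync: dict) -> list:
    """Return descriptions of mismatches between the two configurations."""
    # (hive-site.xml property, Meta Sync YAML key) pairs that must agree.
    pairs = [
        ("cn.cbdb.apiary.kafka.hdw.catalog.name", "hdw.catalog.name"),
        ("cn.cbdb.apiary.kafka.security.protocol", "security.protocol"),
        ("cn.cbdb.apiary.kafka.sync.catalog.wise", "sync.catalog.wise"),
    ]
    problems = []
    for hive_key, sync_key in pairs:
        if hive_site.get(hive_key) != meta_sync.get(sync_key):
            problems.append(f"{hive_key} != {sync_key}")
    return problems


hive_site = {
    "cn.cbdb.apiary.kafka.hdw.catalog.name": "hdw_catalog",
    "cn.cbdb.apiary.kafka.security.protocol": "PLAINTEXT",
    "cn.cbdb.apiary.kafka.sync.catalog.wise": "true",
}
meta_sync = {
    "hdw.catalog.name": "hdw_catalog",
    "security.protocol": "PLAINTEXT",
    "sync.catalog.wise": "true",
}
print(check_pairs(hive_site, meta_sync))  # [] means no mismatches
```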
Optionally, select an Environment Spec to specify the Kubernetes runtime environment for the sync component.
Click Next.
YAML formatting caveat
Follow these two rules when filling in the Hive Meta Sync Content field:
- No leading whitespace before top-level keys. Top-level keys, including `bootstrap.servers:` and `hive.clusters:`, must start at column 0.
- Single-quote the `sasl.jaas.config` value. It contains double quotes, semicolons, and equals signs that can be misparsed without single quotes.
Breaking either rule causes the Meta Sync pod to fail to start with a YAML parse error. If this happens, edit the configuration to satisfy both rules and resubmit.
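Both rules can be checked mechanically before you submit the form. A minimal lint sketch (not a supplied tool) that flags indented top-level keys and an unquoted `sasl.jaas.config` value:

```python
def lint_sync_yaml(text: str) -> list:
    """Flag violations of the two formatting rules above."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        if not stripped:
            continue
        # Rule 1: top-level keys must start at column 0. Nested entries under
        # hive.clusters are necessarily indented, so only known top-level
        # keys are checked here.
        if line != stripped and stripped.startswith(
            ("bootstrap.servers:", "hive.clusters:")
        ):
            problems.append(f"line {n}: top-level key is indented")
        # Rule 2: the sasl.jaas.config value must be single-quoted.
        if stripped.startswith("sasl.jaas.config:") and "'" not in stripped:
            problems.append(f"line {n}: sasl.jaas.config value is not single-quoted")
    return problems


print(lint_sync_yaml("  bootstrap.servers:\n- kafka-host:9092\n"))
# → ["line 1: top-level key is indented"]
```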
Step 6. Preview and submit
In the Configuration Preview step, review the following sections:
Basic Information: Confirms the account and service configuration template.
Hive Meta Sync Content: Shows the selected profile, environment spec, and a full preview of the sync configuration content.
If everything is correct, click Submit to create the synchronization configuration. SynxDB Cloud provisions the Meta Sync pod automatically, and the pod begins consuming from the configured Kafka topic.
Verify the synchronization
1. From a Hive client, create a test table in a database listed in `cn.cbdb.apiary.kafka.autosync.databases.<catalog>`:

```sql
-- in beeline
CREATE TABLE default.sync_test (id INT, name STRING) STORED AS PARQUET;
```

2. After a few seconds, list foreign tables in the target SynxDB Cloud database. The new table appears in the `public` schema:

```sql
-- in psql, connected to hivedb
\det+
```

Expected output:

```
 Schema |   Table   |         Server          | FDW options
--------+-----------+-------------------------+-------------
 public | sync_test | __hive_auto_sync_server | ...
```

3. Verify drop synchronization as well:

```sql
-- in beeline
DROP TABLE default.sync_test;
```

After a few seconds, `\det` in `hivedb` no longer lists the table.
If the expected foreign table does not appear, check the Meta Sync pod log for errors.
Manage sync tasks
After the task is created, you can manage the synchronization task from the list view on the Hive Meta Sync tab. The list displays the following columns: ID, Account Name, Status, Created, Active/Deactivate, and Action.
Status
Each Hive Meta Sync task has one of the following statuses:
Pending: The task has been created but is not yet running.
Running: The task is actively synchronizing Hive metadata.
Suspended: The task has been deactivated and is not processing metadata changes.
Available operations
Active/Deactivate toggle: Activates or deactivates the sync task. When you deactivate the task, SynxDB Cloud stops synchronizing metadata changes and the status changes to Suspended.
Edit: Opens the edit form, where you can modify the profile, sync configuration content, and environment spec. Click Submit to save your changes.
Delete: Permanently removes the sync task.