SynxML SQL
SynxML SQL is a powerful SQL extension that enables machine learning and natural language processing operations directly within your database using SQL commands.
Features
Natural language processing: Text summarization, translation, classification, question answering, and text generation.
Chat operations: Interactive chat conversations with LLM services.
Model management: Register and manage ML service endpoints.
XGBoost integration: Train and deploy XGBoost models directly in PostgreSQL.
Security: Row-level security and proper privilege management.
Help system: Built-in help functions for all operations.
Prerequisites
Before using SynxML SQL in SynxDB Cloud, ensure the following resources are set up:
Create organization, account, and user/role via the DBaaS Admin Console.
Create a warehouse.
Create an ML cluster.
For details, see Use DBaaS Admin Console to create resources.
Prepare database and extension
Create a database (for example,
testdb) and connect to it:CREATE DATABASE IF NOT EXISTS testdb; \c testdb;
Create and configure the extension (for example, using the
synxml_authrole):CREATE EXTENSION IF NOT EXISTS synxml CASCADE; SELECT synxml.configure_auth_role('synxml_auth');
Manage LLM services
Register LLM services
You can register locally deployed LLM services or third-party services.
-- Register a chat service
SELECT synxml.register_customer_service(
'LLM',
'http://10.14.10.1:8800/vllm/v1',
'zhipu/glm4-9b-chat',
'123'
);
-- Register a multimodal embedding service
SELECT synxml.register_customer_service(
'MM_EMBED',
'http://10.14.10.1:8000/vl_embedding/v1/embeddings',
'bge-vl-base'
);
-- Register an Embedding service
SELECT synxml.register_customer_service(
'EMBED',
'http://10.14.10.1:8000/embedding/v1/embeddings',
'jina-embeddings-v2-base-zh'
);
-- Register a rerank service
SELECT synxml.register_customer_service(
'RERANK',
'https://api.siliconflow.cn/v1/rerank',
'BAAI/bge-reranker-v2-m3',
'sk-xxx' -- Replace with your actual API key
);
Use registered LLM services
Text transformation:
SELECT synxml.transform(
task => 'summarization',
params => '{"text":"Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. AI research has been defined as the field of study of intelligent agents, which refers to any system that perceives its environment and takes actions that maximize its chance of achieving its goals.", "max_length": 50}'::JSONB
);
Text summarization:
SELECT
synxml.summarize(
text => 'Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. AI research has been defined as the field of study of intelligent agents, which refers to any system that perceives its environment and takes actions that maximize its chance of achieving its goals.',
max_length => 20
);
Translation:
SELECT synxml.translate(
text => 'Hello',
source_language => 'en',
target_language => 'fr'
);
Load data
Example of creating a table and loading data:
CREATE TABLE iris (
id SERIAL,
sepal_length DECIMAL,
sepal_width DECIMAL,
petal_length DECIMAL,
petal_width DECIMAL,
species VARCHAR(50)
);
\copy iris from 'iris.csv' delimiter ',' csv header;
Manage user and permission
SynxML applies row-level security (RLS) for model, service, and job management.
Create a test user and grant permissions. For example:
CREATE ROLE testuser NOLOGIN;
GRANT testuser TO synxml_auth;
GRANT USAGE ON MLCLUSTER ray4synxml TO testuser;
GRANT SELECT ON iris TO testuser;
SET ROLE testuser;
Train model
Example of training an XGBoost model:
SELECT
synxml.xgboost_train(
train_tblname => 'iris'::TEXT,
model_name => 'xgb_iris_classifier'::TEXT,
train_config => '{"objective":"multi:softmax", "num_class":3, "y":"target", "num_boost_round":10, "num_workers":4}'::JSONB
);
Note
This UDF call is asynchronous. It will return a job ID. You can check the job status later.
Manage jobs
List jobs
List all submitted jobs:
SELECT * FROM synxml.jobs;
Check job status
-- Replace 'JOB_ID' with the actual job ID you received from the training call
SELECT synxml.job_status('JOB_ID');
Check job logs
-- Replace 'JOB_ID' with the actual job ID you received from the training call
SELECT synxml.job_logs('JOB_ID', 100);
Manage models
Check the models table once the job is completed:
SELECT * FROM synxml.models;
SELECT * FROM synxml.dirtable_models;
Make predictions
Make predictions using the trained model:
SELECT
synxml.xgboost_predict(
test_tblname => 'iris'::TEXT,
model_name => 'xgb_iris_classifier'::TEXT,
predict_config => '{"keep_columns":["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"], "drop_columns":["target"], "max_num_workers":4}'::JSONB,
output_tblname => 'prediction_results'::TEXT
);
Check the prediction results once the job is completed:
SELECT * FROM prediction_results LIMIT 10;
Available UDFs
The following lists all available user defined functions (UDFs) in the synxml schema:
synxml._get_db_uriResult data type:
textType:
func
synxml._has_read_privilegeResult data type:
booleanArgument data types:
table_name textType:
func
synxml.answerResult data type:
textArgument data types:
question text, context text DEFAULT ''::text, temperature double precision DEFAULT 0.3, max_new_tokens integer DEFAULT 1024, llm_service_name text DEFAULT 'LLM'::textType:
func
synxml.chatResult data type:
textArgument data types:
input text, history jsonb DEFAULT '[]'::jsonb, temperature double precision DEFAULT 0.3, max_new_tokens integer DEFAULT 256, llm_service_name text DEFAULT 'LLM'::textType:
func
synxml.chunkResult data type:
textArgument data types:
text text, chunk_size integer, chunk_overlap integerType:
func
synxml.classifyResult data type:
textArgument data types:
text text, categories text[], temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::textType:
func
synxml.configure_auth_roleResult data type:
voidArgument data types:
auth_role textType:
func
synxml.cross_validateResult data type:
textArgument data types:
model_type text, train_config jsonb, train_tblname text, output_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.embedResult data type:
double precision[]Argument data types:
input text, embedding_service_name text DEFAULT 'EMBED'::textType:
func
synxml.embed_sizeResult data type:
integerArgument data types:
embedding_service_name text DEFAULT 'EMBED'::textType:
func
synxml.generateResult data type:
textArgument data types:
text text, style text DEFAULT 'default'::text, temperature double precision DEFAULT 0.3, max_new_tokens integer DEFAULT 256, llm_service_name text DEFAULT 'LLM'::textType:
func
synxml.get_algorithmsResult data type:
text[]Type:
func
synxml.get_model_by_nameResult data type:
SETOF synxml.model_infoArgument data types:
name textType:
func
synxml.get_modelsResult data type:
SETOF synxml.model_infoType:
func
synxml.get_models_by_descriptionResult data type:
SETOF synxml.model_infoArgument data types:
description textType:
func
synxml.get_service_settingResult data type:
SETOF synxml.service_settingsType:
func
synxml.get_service_settingResult data type:
SETOF synxml.service_settingsArgument data types:
service_name textType:
func
synxml.helpResult data type:
textType:
func
synxml.helpResult data type:
textArgument data types:
function_name textType:
func
synxml.hybrid_searchResult data type:
SETOF synxml.search_resultArgument data types:
query text, tablename text, text_column text DEFAULT 'chunk'::text, embedding_column text DEFAULT 'embedding'::text, top_k integer DEFAULT 5, embedding_service_name text DEFAULT 'EMBED'::text, rerank_service_name text DEFAULT 'RERANK'::textType:
func
synxml.job_infoResult data type:
TABLE(...)Argument data types:
jobid text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.job_listResult data type:
SETOF synxml.job_statusArgument data types:
jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.job_logsResult data type:
SETOF textArgument data types:
jobid text, limits integer DEFAULT 100, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.job_statusResult data type:
textArgument data types:
jobid text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.job_stopResult data type:
textArgument data types:
jobid text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.mlp_predictResult data type:
textArgument data types:
test_tblname text, model_name text, predict_config jsonb, output_tblname text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.mlp_trainResult data type:
textArgument data types:
train_config jsonb, train_tblname text, model_name text DEFAULT NULL::text, valid_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.multimodal_embedResult data type:
double precision[]Argument data types:
input_text text DEFAULT ''::text, input_image text DEFAULT ''::text, multimodal_embedding_service_name text DEFAULT 'MM_EMBED'::textType:
func
synxml.multimodal_embed_sizeResult data type:
integerArgument data types:
multimodal_embedding_service_name text DEFAULT 'MM_EMBED'::textType:
func
synxml.param_tuneResult data type:
textArgument data types:
model_type text, train_config jsonb, train_tblname text, output_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.predictResult data type:
textArgument data types:
model_type text, test_tblname text, model_name text, predict_config jsonb, output_tblname text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.register_customer_serviceResult data type:
textArgument data types:
name text, endpoint text, model_name text DEFAULT ''::text, api_key text DEFAULT ''::text, service_provider text DEFAULT 'openai'::text, db_role text DEFAULT CURRENT_USERType:
func
synxml.rerankResult data type:
SETOF synxml.search_resultArgument data types:
query text, docs text[], top_n integer DEFAULT 3, rerank_service_name text DEFAULT 'RERANK'::textType:
func
synxml.semantic_searchResult data type:
SETOF synxml.search_resultArgument data types:
query text, tablename text, text_column text DEFAULT 'chunk'::text, embedding_column text DEFAULT 'embedding'::text, top_k integer DEFAULT 5, embedding_service_name text DEFAULT 'EMBEDD'::textType:
func
synxml.summarizeResult data type:
textArgument data types:
text text, max_length integer DEFAULT 100, temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::textType:
func
synxml.trainResult data type:
textArgument data types:
model_type text, train_config jsonb, train_tblname text, model_name text DEFAULT NULL::text, valid_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.transformResult data type:
textArgument data types:
task text, params jsonb, max_new_tokens integer DEFAULT 512, temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::textType:
func
synxml.translateResult data type:
textArgument data types:
text text, source_language text, target_language text, temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::textType:
func
synxml.unregister_customer_serviceResult data type:
voidArgument data types:
service_name textType:
func
synxml.xgboost_predictResult data type:
textArgument data types:
test_tblname text, model_name text, predict_config jsonb, output_tblname text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
synxml.xgboost_trainResult data type:
textArgument data types:
train_config jsonb, train_tblname text, model_name text DEFAULT NULL::text, valid_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::textType:
func
Supported algorithms
You can list all supported algorithms using the following query:
SELECT * FROM unnest(synxml.get_algorithms()) AS algorithm;
Supported algorithms include:
Regression:
ARDRegressionAdaBoostRegressorBaggingRegressorBayesianRidgeDLinearRegressorDecisionTreeRegressorElasticNetExtraTreesRegressorGaussianProcessRegressorGradientBoostingRegressorHuberRegressorKNeighborsRegressorKernelRidgeLarsLassoLassoLarsLinearRegressionMLPMeanShiftPassiveAggressiveRegressorQuantileRegressorRANSACRegressorRNNBlockRegressorRandomForestRegressorRidgeSGDRegressorSVRTheilSenRegressorXGBoost
Classification:
AdaBoostClassifierBaggingClassifierCatBoostDecisionTreeClassifierExtraTreesClassifierGradientBoostingClassifierKNeighborsClassifierLightGBMLogisticRegressionMLPClassifierRandomForestClassifierSVC
Clustering:
AffinityPropagationAgglomerativeClusteringBirchDBSCANKMeansMiniBatchKMeansOPTICSSpectralClustering
Others:
BertCausalLMOrthogonalMatchingPursuitResNetTextEmbedder
Get help
To get help for a specific UDF:
-- Replace 'UDF' with the actual UDF name you want to get help for
SELECT synxml.help('UDF');