OpenTox API 1.2 Algorithm

From ToxBank API Wiki

REST operations

Get URIs of all available algorithms

  • Method: GET
  • URI: /algorithm
  • Parameters: [subjectid], [?sameas=URI-of-the-owl:sameAs-entry]
  • Result: List of all algorithm URIs or RDF representation, or algorithms of specific types if the query parameter is present. If sameas is given, returns all algorithms whose owl:sameAs is given by the query.
  • Status codes: 200, 404, 503

Get the ontology representation of an algorithm

  • Method: GET
  • URI: /algorithm/{id}
  • Parameters: [subjectid]
  • Result: Algorithm representation in one of the supported MIME types
  • Status codes: 200, 404, 503

Apply the algorithm

  • Method: POST
  • URI: /algorithm/{id}
  • Parameters: dataset_uri, prediction_feature, parameter (specified by the algorithm provider), dataset_service=datasetservice_uri, result_dataset, [subjectid]
  • Result: model URI (prefer to create algorithm services that return model URIs instead of datasets or features), dataset URI, or feature URI. Redirect to a task URI for time-consuming computations (see AsyncTask#creating-a-task-post).
  • Status codes: 200, 404, 503
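
As a rough illustration of the GET operations, the following Python sketch builds (but does not send) a request for the algorithm list. The function name and the service root http://algorithm.example.org are hypothetical. Sending subjectid as an HTTP header and asking for application/rdf+xml by default are assumptions based on the parameter and MIME type descriptions on this page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def algorithm_list_request(service_root, sameas=None, subjectid=None,
                           accept="application/rdf+xml"):
    """Build (but do not send) a GET request for /algorithm."""
    url = service_root.rstrip("/") + "/algorithm"
    if sameas is not None:
        # The optional owl:sameAs filter goes into the `sameas` query parameter.
        url += "?" + urlencode({"sameas": sameas})
    headers = {"Accept": accept}
    if subjectid is not None:
        # Optional OpenSSO token (assumed here to travel as a header).
        headers["subjectid"] = subjectid
    return Request(url, headers=headers, method="GET")


req = algorithm_list_request(
    "http://algorithm.example.org",  # hypothetical service root
    sameas="http://www.opentox.org/algorithmTypes.owl#Regression",
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without a live service.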

Notes

  • dataset_service=datasetservice_uri points to a dataset service. It is relevant if the output of the algorithm is a dataset (e.g. with calculated descriptors). If the dataset_service parameter is not specified, the model service uses a pre-configured dataset service.

Background

This is a generic interface for OpenTox algorithms. As algorithms can be used for a wide variety of purposes (e.g. model building, feature calculation, feature selection, similarity calculation, substructure matching), required and optional input parameters and algorithm results (e.g. model or dataset URIs, literal values) have to be specified in the algorithm representation together with a definition of the algorithm.

Algorithm representation

  • RDF representation defined in the OpenTox API ontology (OpenToxOntology); see also the Algorithm (examples) RDF files.
  • All algorithms are subclasses of http://www.opentox.org/api/1.1#Algorithm
  • Algorithm type in the RDF representation is set by direct subclassing (rdf:type) of a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl), e.g. <myalgorithm> rdf:type ota:Classification.
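
To show how a client might read the declared parameters out of an RDF/XML algorithm representation, here is a minimal Python sketch using only the standard library. It assumes parameters appear as ot:Parameter elements carrying dc:title, ot:paramScope and ot:paramValue children, and that ot: is bound to http://www.opentox.org/api/1.1#. The embedded sample is a trimmed, hypothetical representation, and a full RDF parser would be more robust than plain XML traversal.

```python
import xml.etree.ElementTree as ET

# Namespace bindings assumed for this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ot="http://www.opentox.org/api/1.1#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <ot:Algorithm rdf:about="http://algorithm.service.org/algorithm/FTM1">
    <ot:parameters>
      <ot:Parameter>
        <dc:title>minSup</dc:title>
        <ot:paramScope>optional</ot:paramScope>
        <ot:paramValue>0.8</ot:paramValue>
      </ot:Parameter>
    </ot:parameters>
    <ot:parameters>
      <ot:Parameter>
        <dc:title>dataset_uri</dc:title>
        <ot:paramScope>mandatory</ot:paramScope>
        <ot:paramValue>http://dataset.service.eu/dataset/6</ot:paramValue>
      </ot:Parameter>
    </ot:parameters>
  </ot:Algorithm>
</rdf:RDF>"""


def read_parameters(rdf_xml):
    """Return {title: {"scope": ..., "value": ...}} for each ot:Parameter."""
    root = ET.fromstring(rdf_xml)
    params = {}
    for node in root.iterfind(".//ot:Parameter", NS):
        title = node.findtext("dc:title", default="", namespaces=NS).strip()
        params[title] = {
            "scope": node.findtext("ot:paramScope", default="", namespaces=NS).strip(),
            "value": node.findtext("ot:paramValue", default="", namespaces=NS).strip(),
        }
    return params


params = read_parameters(SAMPLE)
mandatory = [name for name, p in params.items() if p["scope"] == "mandatory"]
print(mandatory)
```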

Parameters

Input parameters:

  • dataset_uri is mandatory for all kinds of prediction algorithms (machine learning or otherwise), as well as for data processing algorithms.
  • prediction_feature is mandatory for prediction (classification/regression) and other supervised learning algorithms. The URI of the feature with the endpoint to predict is expected as value.
  • result_dataset is an optional parameter that specifies the dataset URI where the results should be stored. If not present, the result URI is generated by the dataset service.
  • subjectid (optional) is a header parameter that contains the OpenSSO A&A token needed to access protected services.
  • parameter contains all the algorithm-specific parameters.
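
The input parameters above could be assembled into a POST request along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, algorithm-specific parameters are taken as plain keyword arguments, the body is sent as application/x-www-form-urlencoded, and subjectid is sent as a header. The request is built but not sent.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def apply_algorithm_request(algorithm_uri, dataset_uri, prediction_feature=None,
                            dataset_service=None, result_dataset=None,
                            subjectid=None, **algorithm_params):
    """Build (but do not send) a POST request that applies an algorithm."""
    if not dataset_uri:
        raise ValueError("dataset_uri is mandatory")
    form = {"dataset_uri": dataset_uri}
    optional = {
        "prediction_feature": prediction_feature,
        "dataset_service": dataset_service,
        "result_dataset": result_dataset,
    }
    # Only include optional parameters that were actually supplied.
    form.update({k: v for k, v in optional.items() if v is not None})
    # Algorithm-specific parameters (e.g. minSup) are passed through unchanged.
    form.update(algorithm_params)
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if subjectid is not None:
        headers["subjectid"] = subjectid
    return Request(algorithm_uri, data=urlencode(form).encode("ascii"),
                   headers=headers, method="POST")


req = apply_algorithm_request(
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTM",
    dataset_uri="http://apps.ideaconsult.net:8080/ambit2/dataset/662",
    dataset_service="http://apps.ideaconsult.net:8080/ambit2/dataset",
    minSup="0.8",
)
print(parse_qs(req.data.decode("ascii")))
```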

Algorithm types

Algorithm types are defined in the algorithm types ontology at http://www.opentox.org/algorithms.owl

Data cleanup algorithms

Subclass of http://www.opentox.org/algorithms.owl#DataCleanup

  • input parameters: dataset_uri , parameter
  • output parameters: model_uri

Note: Data cleanup algorithms as a subcategory of filtering algorithms should create a filtering resource which is a model. This proves especially useful and necessary for filtering algorithms like PCA and scaling.

Descriptor calculation algorithms

Subclass of http://www.opentox.org/algorithmTypes.owl#DescriptorCalculation

  • input parameters: dataset_uri , parameter
  • output parameters: dataset_uri

Sample curl calls for calculating descriptors:

Calculate all CDK features:

curl -X POST -d 'dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/662?feature_uris[]=http://apps.ideaconsult.net:8080/ambit2/feature/26701' \
  -d 'dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset' \
  http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem

Calculate WienerNumbers using CDK:

curl -X POST -d 'dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/662?feature_uris[]=http://apps.ideaconsult.net:8080/ambit2/feature/26701' \
  -d 'WienerNumbersDescriptor=true' -d 'dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset' \
  http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem

Available descriptors can be found here: http://opentox.informatik.tu-muenchen.de/trac/TUMOpenTox/wiki/CDKPhysChem

Calculate structural descriptors with FreeTreeMiner with a minimum support of 80%:

curl -X POST -d 'dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/662?feature_uris[]=http://apps.ideaconsult.net:8080/ambit2/feature/26701' \
  -d 'dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset' -d 'minSup=0.8' \
  http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTM

An algorithm service shall provide separate URLs for algorithms with default (or no) parameters and for algorithms with specific parameter values. URLs of the second type are created on the fly when an algorithm is invoked with specific parameters or a dataset. For example, when calculating descriptors for the dataset http://dataset.service.eu/dataset/6 with a given set of parameters, the calculation service creates the following feature:

<ot:feature>
<ot:NumericFeature rdf:about="http://dataset.service.eu/feature/1">
<dc:creator>Name of creator</dc:creator>
<ot:hasSource rdf:resource="http://algorithm.service.org/algorithm/FTM1/C"/>
<owl:sameAs rdf:resource="http://www.opentox.org/api/1.2#TUM_FTM_C"/>
<ot:units>count</ot:units>

<dc:title>TUM_FTM_C</dc:title>
<rdf:type rdf:resource="http://www.opentox.org/api/1.2#Feature"/>
</ot:NumericFeature>
</ot:feature>

and internally the algorithm service creates a new algorithm entry:

http://algorithm.service.org/algorithm/FTM1/C

with a representation like below:

<ot:Algorithm rdf:about="http://algorithm.service.org/algorithm/FTM1">
    <ot:parameters>
      <ot:Parameter>
        <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">minSup</dc:title>
        <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> Specifies the min support for mining (fraction). Is to be between 0 and 1</dc:description>

        <ot:paramScope rdf:datatype="http://www.w3.org/2001/XMLSchema#string">optional</ot:paramScope>
        <ot:paramValue rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0.8</ot:paramValue>
      </ot:Parameter>
    </ot:parameters>
    <owl:sameAs>http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#subtree</owl:sameAs>

    <dc:contributor>contributor.name@domain.org</dc:contributor>
    <ot:parameters>
      <ot:Parameter>
        <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">dataset_service</dc:title>
        <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">URI to the dataset service to be used</dc:description>

        <ot:paramScope rdf:datatype="http://www.w3.org/2001/XMLSchema#string">optional</ot:paramScope>
        <ot:paramValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></ot:paramValue>
      </ot:Parameter>
    </ot:parameters>
    <rdf:type>http://www.opentox.org/algorithms.owl#DescriptorCalculation</rdf:type>

    <ot:parameters>
      <ot:Parameter>
        <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">dataset_uri</dc:title>
        <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">URI to the dataset to be used</dc:description>
        <ot:paramScope rdf:datatype="http://www.w3.org/2001/XMLSchema#string">mandatory</ot:paramScope>

        <ot:paramValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string">http://dataset.service.eu/dataset/6</ot:paramValue>
      </ot:Parameter>
    </ot:parameters>
    <ot:parameters>
      <ot:Parameter>

        <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">hydrogen</dc:title>
        <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Include hydrogen atoms. </dc:description>
        <ot:paramScope rdf:datatype="http://www.w3.org/2001/XMLSchema#string">optional</ot:paramScope>
        <ot:paramValue rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</ot:paramValue>

      </ot:Parameter>
    </ot:parameters>
    <rdf:type>http://www.opentox.org/algorithms.owl#PatternMining</rdf:type>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">OpenTox REST interface to the FTM algorithm implementation of TUM.</dc:description>
    <dc:contributor>contributor.name@domain.org</dc:contributor>

    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FreeTreeMiner </dc:title>
    <dc:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://algorithm.service.org/algorithm/FTM1</dc:identifier>
    <dc:creator>creator.name@domain.org</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-06-22T16:14:24+02:00</dc:date>

  </ot:Algorithm>
  • All the complexity is hidden within the algorithm service.
  • If a calculation with the generic http://algorithm.service.org/algorithm/FTM1 algorithm is initiated with a specific set of parameters, the service may check internally whether such a set already exists and reuse it; otherwise a new algorithm URL is created along with the calculation.
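
The lookup-or-create behaviour described in the last point can be sketched as a small in-memory registry. This is only an illustration, not how any existing service is implemented: the class name is hypothetical, and the numeric suffix used for derived URLs is an assumption, since the page does not prescribe a naming scheme.

```python
class AlgorithmRegistry:
    """Map (generic algorithm URI, parameter set) to a derived algorithm URL."""

    def __init__(self):
        self._derived = {}   # (generic_uri, frozen parameters) -> derived URL
        self._counters = {}  # generic_uri -> number of derived entries so far

    def resolve(self, generic_uri, params):
        # A frozenset makes the key independent of parameter order.
        key = (generic_uri, frozenset(params.items()))
        if key in self._derived:
            # The same parameter set was seen before: reuse its URL.
            return self._derived[key]
        count = self._counters.get(generic_uri, 0) + 1
        self._counters[generic_uri] = count
        url = f"{generic_uri}/{count}"  # hypothetical naming scheme
        self._derived[key] = url
        return url


registry = AlgorithmRegistry()
generic = "http://algorithm.service.org/algorithm/FTM1"
first = registry.resolve(generic, {"minSup": "0.8", "hydrogen": "false"})
again = registry.resolve(generic, {"hydrogen": "false", "minSup": "0.8"})
other = registry.resolve(generic, {"minSup": "0.5", "hydrogen": "false"})
print(first, again, other)
```

A persistent service would keep this mapping in its own storage rather than in memory; the dictionary only stands in for that lookup.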

Feature selection algorithms

Subclass of http://www.opentox.org/algorithmTypes.owl#FeatureSelection

  • input parameters: dataset_uri , parameter
  • output parameters: feature_uri

Sample curl calls for descriptor selection:

Select the 40 most informative descriptors (according to Information Gain) from dataset http://apps.ideaconsult.net:8080/ambit2/dataset/1037:

curl -X POST -d 'dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037' \
  -d 'prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701' \
  -d 'numToSelect=40' http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval

Supervised learning algorithms

Subclass of http://www.opentox.org/algorithmTypes.owl#Supervised

  • input parameters: dataset_uri, parameter, prediction_feature
  • output parameters: dataset_uri

Sample curl calls for learning models:

Learn a decision tree model:

curl -X POST -d 'dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037' \
  -d 'prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701' \
  -d 'dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset' http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48

Available options of J48 can be found at: http://opentox.informatik.tu-muenchen.de/trac/TUMOpenTox/wiki/j48.

Learn a model with the k nearest neighbor algorithm (k=5):

curl -X POST -d 'dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037' \
  -d 'prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701' \
  -d 'dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset' -d 'KNN=5' \
  http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification

Superalgorithms

A Superalgorithm is a specific instance of an algorithm that uses other algorithms to create a (super)model or a dataset. Such a superalgorithm could use e.g. a descriptor calculation service, a feature selection service and a modelling algorithm service to create prediction models.

REST operations

Get the ontology representation of the algorithm

  • Method: GET
  • URI: /algorithm/{id}
  • Parameters: as in Algorithm
  • Result: as in Algorithm
  • Status codes: as in Algorithm

Launch superservice algorithm

  • Method: POST
  • URI: /algorithm/{id}
  • URL parameters: dataset_uri, feature_uris[], feature_calculation, feature_selection, model_learning, applicability_domain, validation_service, dataset_service, parameter, prediction_feature
  • HTTP header: subjectid
  • Result: Task URI for time-consuming applications, or URI of the new OpenTox object
  • Status codes: as in Algorithm

Input parameters:

  • dataset_uri is a mandatory parameter.
  • feature_uris[] is an optional parameter specifying which features should be used for model building.
  • feature_calculation (one or more) are the URIs of the descriptor calculation algorithms (ot:Algorithm service) to be applied (optional).
  • feature_selection (one or more) are the URIs of the feature selection algorithms (ot:Algorithm service) to be applied (optional).
  • model_learning (one or more) are the URIs of the learning algorithms (ot:Algorithm service) to be applied (mandatory).
  • applicability_domain (one or more) are the URIs of the applicability domain algorithms (ot:Algorithm service) to be applied (optional).
  • validation_service (one or more) are the URIs of the validation algorithms (ot:Algorithm service) to be applied (optional).
  • dataset_service points to a dataset service. It is relevant if the output of the algorithm is a dataset (e.g. with calculated descriptors). If the dataset_service parameter is not specified, the model service uses a pre-configured dataset service.
  • parameter is any parameter that needs to be passed to subservices. It contains the URL-encoded URI of the subservice, followed by the parameter name and its value (e.g. http://opentox.ntua.gr:3000/algorithm/svm:kernel=RBF assigns kernel=RBF to http://opentox.ntua.gr:3000/algorithm/svm). For convenience, the wrapper service may adopt aliases that stand in for these lengthy parameter names.
  • prediction_feature is mandatory for prediction (classification/regression) and other supervised learning algorithms. The URI of the feature with the endpoint to predict is expected as value.

Header parameters:

  • subjectid (optional) parameter that contains the OpenSSO A&A token needed to access protected services.
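
One possible way to assemble the form body of a superalgorithm call is sketched below in Python. Several points are assumptions rather than part of this specification: the helper names are hypothetical, parameters marked "one or more" are sent as repeated form fields, and the subservice parameter is written as the subservice URI, a colon, then name=value, with URL encoding applied to the whole value when the form is encoded.

```python
from urllib.parse import parse_qs, urlencode


def subservice_parameter(subservice_uri, name, value):
    """Format one `parameter` value as '<subservice URI>:<name>=<value>'."""
    return f"{subservice_uri}:{name}={value}"


def superalgorithm_form(dataset_uri, model_learning, prediction_feature=None,
                        feature_calculation=(), feature_selection=(),
                        parameters=()):
    """Return the URL-encoded POST body for a superalgorithm call."""
    if not dataset_uri:
        raise ValueError("dataset_uri is mandatory")
    if not model_learning:
        raise ValueError("at least one model_learning URI is mandatory")
    fields = [("dataset_uri", dataset_uri)]
    # Assumption: "one or more" parameters become repeated form fields.
    for name, uris in (("feature_calculation", feature_calculation),
                       ("feature_selection", feature_selection),
                       ("model_learning", model_learning)):
        fields += [(name, uri) for uri in uris]
    fields += [("parameter", p) for p in parameters]
    if prediction_feature is not None:
        fields.append(("prediction_feature", prediction_feature))
    # urlencode percent-encodes every value, including the subservice URI.
    return urlencode(fields)


body = superalgorithm_form(
    dataset_uri="http://apps.ideaconsult.net:8080/ambit2/dataset/1037",
    model_learning=["http://opentox.ntua.gr:3000/algorithm/svm"],
    prediction_feature="http://apps.ideaconsult.net:8080/ambit2/feature/26701",
    parameters=[subservice_parameter(
        "http://opentox.ntua.gr:3000/algorithm/svm", "kernel", "RBF")],
)
print(parse_qs(body)["parameter"])
```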

Example: http://opentox.org/partners/meetings/meetingsept2010/sept2010meetwp2

Applicability domain

  • An applicability domain procedure is an OpenTox Algorithm.
  • An applicability domain "model" is created by posting a dataset URI to an applicability domain algorithm URI. This creates an ot:Model with type ota:ApplicabilityDomain and returns an "AD-model" URI.
  • Alternatively, for an AD embedded in a predictive model, just declare an additional rdf:type of the model to be ota:ApplicabilityDomain.
  • An applicability domain estimation is done by POSTing a dataset to the "AD-model" URI. This generates another dataset with an extra feature telling whether the corresponding compound belongs to the applicability domain (or, in fuzzy terms, to what degree it belongs to that set).
  • For models with an embedded AD, a POST of a dataset to the model generates both prediction results and AD estimates.
  • All models provide the estimation results as specified below.
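
The two-step workflow for an external applicability domain can be sketched as follows. The sketch takes the HTTP call as a caller-supplied function so that it runs without a network. The function names, the example.org URIs, and the use of dataset_uri as the form parameter when posting to the AD model are assumptions. Task redirects for long-running jobs are not handled.

```python
def estimate_applicability_domain(post, ad_algorithm_uri, training_dataset_uri,
                                  query_dataset_uri):
    """Run both AD steps and return (ad_model_uri, result_dataset_uri).

    `post(url, form)` must perform the HTTP POST and return the URI
    contained in the response.
    """
    # Step 1: POST a dataset URI to the AD algorithm. The service creates an
    # ot:Model of type ota:ApplicabilityDomain and returns its URI.
    ad_model_uri = post(ad_algorithm_uri, {"dataset_uri": training_dataset_uri})
    # Step 2: POST the dataset to be assessed to the AD model. The result is
    # a dataset carrying an extra feature with the AD estimate.
    result_dataset_uri = post(ad_model_uri, {"dataset_uri": query_dataset_uri})
    return ad_model_uri, result_dataset_uri


# Stand-in for a real HTTP client (all URIs below are hypothetical).
RESPONSES = {
    "http://example.org/algorithm/leverage": "http://example.org/model/leverage-ad-model",
    "http://example.org/model/leverage-ad-model": "http://example.org/dataset/42",
}
calls = []


def fake_post(url, form):
    calls.append((url, form))
    return RESPONSES[url]


ad_model, result = estimate_applicability_domain(
    fake_post,
    "http://example.org/algorithm/leverage",
    "http://example.org/dataset/training",
    "http://example.org/dataset/query",
)
print(ad_model, result)
```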

Background information

Applicability domain RDF representation:

A predictive model can be assigned an external or an embedded applicability domain.

  • In case of AD external to the model:
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ot:      <http://www.opentox.org/api/1.1#> .
@prefix ota:     <http://www.opentox.org/algorithmTypes.owl#> .

</model/mlr-model> ot:hasDomain </model/leverage-ad-model>.


</model/mlr-model> rdf:type ot:Model.
</model/mlr-model> ot:algorithm </algorithm/mlr>.
</algorithm/mlr> rdf:type ot:Algorithm.
</algorithm/mlr> rdf:type ota:Regression.

</model/leverage-ad-model> rdf:type ot:Model.

</model/leverage-ad-model> ot:algorithm </algorithm/leverage>.
</algorithm/leverage> rdf:type ot:Algorithm.
</algorithm/leverage> rdf:type ota:ApplicabilityDomain.
  • In case of AD embedded in the model:
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ot:      <http://www.opentox.org/api/1.1#> .
@prefix ota:     <http://www.opentox.org/algorithmTypes.owl#> .


<lazar-model> ot:hasDomain <lazar-model>.

<lazar-model> rdf:type ot:Model.
<lazar-model> ot:algorithm </algorithm/lazar>.

</algorithm/lazar> rdf:type ot:Algorithm.

</algorithm/lazar> rdf:type ota:ApplicabilityDomain.

</algorithm/lazar> rdf:type ota:LazyLearning.

Results from applicability domain estimation

  • By analogy with ot:predictedVariables, which specifies the features where prediction results are stored, one can specify which features hold the result of AD estimation (suggestions for better property names than ot:adMembership and ot:adMetric are welcome!):
@prefix ot:      <http://www.opentox.org/api/1.1#> .

# the estimated value, e.g. leverage
ot:Model ot:adMetric ot:Feature.

# the decision for AD membership, based on the estimated value - e.g. "in-domain" if leverage > threshold
# have to agree on the value type - boolean, numeric, string, nominal ?
ot:Model ot:adMembership ot:Feature.


The AD results are then expressed with the same ot:dataEntry and ot:FeatureValue RDF constructions that are used elsewhere to specify property values:

@prefix ot:      <http://www.opentox.org/api/1.1#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix :        <http://ambit.uni-plovdiv.bg:8080/ambit2/> .
@prefix ota:     <http://www.opentox.org/algorithmTypes.owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ac:      <http://ambit.uni-plovdiv.bg:8080/ambit2/compound/> .
@prefix ad:      <http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af:      <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/> .


ad:1  a       ot:Dataset ;
      ot:dataEntry
              [ a       ot:DataEntry ;
                ot:compound ac:1 ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature af:1 ;
                          ot:value "3.14"^^xsd:double
                        ] ;

                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature af:9999 ;
                          ot:value "0.0"^^xsd:double
                        ]
              ] .

af:1
      a       ot:Feature , ot:NumericFeature ;
      dc:title "MLR-prediction" ;
      ot:hasSource <http://opentox.ntua.gr/model/mlr> ;
      ot:units "" .


af:9999
      a       ot:Feature , ot:NumericFeature ;
      dc:title "AD-leverage" ;
      ot:hasSource <http://opentox.ntua.gr/model/leverage-ad> ;
      ot:units "" .


ac:1
      a       ot:Compound .

ot:NumericFeature
      a       owl:Class ;
      rdfs:subClassOf ot:Feature .

ot:DataEntry
      a       owl:Class .

ot:hasSource
      a       owl:ObjectProperty .

ot:units
      a       owl:DatatypeProperty .

ot:values
      a       owl:ObjectProperty .

ot:compound
      a       owl:ObjectProperty .

dc:title
      a       owl:AnnotationProperty .

ot:feature
      a       owl:ObjectProperty .

ot:Dataset
      a       owl:Class .

dc:description
      a       owl:AnnotationProperty .

ot:dataEntry
      a       owl:ObjectProperty .

ot:Compound
      a       owl:Class .

dc:identifier
      a       owl:AnnotationProperty .

ot:FeatureValue
      a       owl:Class .

ot:Feature
      a       owl:Class .

dc:type
      a       owl:AnnotationProperty .

ot:value
      a       owl:DatatypeProperty .




If the AD is embedded in the model itself, the representation of AD results is the same, except that ot:hasSource for the features representing the predicted values and the AD estimation points to the same ot:Model object:

ad:1  a       ot:Dataset ;
      ot:dataEntry
              [ a       ot:DataEntry ;
                ot:compound ac:1 ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature af:lazar_prediction ;
                          ot:value "1.0"^^xsd:double
                        ] ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature af:10000 ;
                          ot:value "0.666"^^xsd:double
                        ]
              ] .

af:10000
      a       ot:Feature , ot:NumericFeature ;
      dc:title "AD-lazar" ;
      ot:hasSource <http://in-silico.ch/model/lazar> ;
      ot:units "" .


af:lazar_prediction
      a       ot:Feature , ot:NumericFeature ;
      dc:title "prediction-lazar" ;
      ot:hasSource <http://in-silico.ch/model/lazar> ;
      ot:units "".

ac:1
      a       ot:Compound .

Supported MIME types

Mandatory

  • application/rdf+xml (default)

Optional

  • application/xml (PMML)
  • text/xml (PMML)
  • text/x-yaml
  • text/x-json
  • application/json
  • ...
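
A service could pick the representation from the Accept header roughly as in the sketch below. This is a deliberately simplified, hypothetical helper: it ignores q-values and wildcards such as text/*, treats the listed optional types as all being supported, and returns None when nothing matches, since this page does not say how a service should respond in that case.

```python
DEFAULT_MIME = "application/rdf+xml"
SUPPORTED_MIMES = [
    "application/rdf+xml",  # mandatory, default
    "application/xml",      # PMML
    "text/xml",             # PMML
    "text/x-yaml",
    "text/x-json",
    "application/json",
]


def choose_representation(accept_header, supported=SUPPORTED_MIMES):
    """Return the first supported MIME type named in the Accept header."""
    if not accept_header:
        return DEFAULT_MIME
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.8"; ordering alone decides here.
        mime = item.split(";", 1)[0].strip().lower()
        if not mime:
            continue
        if mime == "*/*":
            return DEFAULT_MIME
        if mime in supported:
            return mime
    return None


print(choose_representation("text/html, application/json;q=0.9"))
```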

HTTP status codes

  • 200 OK: Success
  • 404 Not Found: No algorithm in the respective category found, or specific algorithm not found
  • 400 Bad Request: Incorrect dataset URI, or incorrect parameters
  • 500 Internal Server Error: Model building error
  • 503 Service Unavailable: Service not available
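
For client code, the interpretations listed above can be kept in a small lookup table. The sketch below is only a convenience wrapper with a hypothetical function name. Codes that this page does not list fall through to a generic message.

```python
STATUS_INTERPRETATION = {
    200: "Success",
    404: "No algorithm in the respective category found, or specific algorithm not found",
    400: "Incorrect dataset URI, or incorrect parameters",
    500: "Model building error",
    503: "Service not available",
}


def interpret_status(code):
    """Return the interpretation of a status code as listed on this page."""
    return STATUS_INTERPRETATION.get(code, f"Unlisted status code {code}")


print(interpret_status(400))
```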