Configuration

The default memory settings assume that Data Studio runs on a dedicated (or largely dedicated) machine, which isn't always the case. To make sure that Data Studio isn't allocated all of the available memory, you can adjust the maximum memory settings.

You can do this in the Aperture Data Studio Service 64bit.ini file, which is in the root of the installation directory (by default, C:\Program Files\Experian\Aperture Data Studio <version>\Aperture Data Studio Service 64bit.ini).

The following line controls memory allocation:

Virtual Machine Parameters=-Xms66:1000:16000P -Xmx66:1000:16000

These default settings mean that Data Studio will use 66% of the total system memory that's available, up to a maximum of 16 GB.
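To see what the default rule works out to on a particular machine, the shell sketch below applies the two limits described above: 66% of total memory, capped at 16000 MB. The function name default_heap_mb is hypothetical, values are in MB, and the middle value in the parameter (1000) is ignored because its meaning isn't described here.

```shell
# Hypothetical helper: approximate the default heap rule described above
# (a percentage of total memory, capped at a maximum). Values are in MB.
default_heap_mb() {
    total_mb=$1
    percent=${2:-66}     # share of total memory, default 66%
    max_mb=${3:-16000}   # upper limit, default 16000 MB
    heap_mb=$(( total_mb * percent / 100 ))
    if [ "$heap_mb" -gt "$max_mb" ]; then
        heap_mb=$max_mb
    fi
    echo "$heap_mb"
}

default_heap_mb 8192    # 8 GB machine: prints 5406
default_heap_mb 32768   # 32 GB machine: prints 16000 (capped)
```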

-Xmx<value> This is the maximum amount of memory to be used. Example: to cap the application at 24 GB, enter: -Xmx24g

-Xms<value> This is the initial amount of memory to use at startup. For best performance, this should be the same as the Xmx value, to avoid fragmentation of memory. Example: to specify that Data Studio uses only 8 GB at a maximum, enter: Virtual Machine Parameters=-Xms8G -Xmx8G
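If you manage the file from a Bash shell such as Git Bash or WSL (an assumption; the steps above describe editing the file by hand), a sketch like the following replaces the Virtual Machine Parameters line with a fixed size, keeps a .bak copy of the original, and preserves Windows line endings. The function name set_fixed_heap is hypothetical, and the shell needs permission to write to the installation directory.

```shell
# Hypothetical helper: set a fixed heap size (same -Xms and -Xmx) in the
# service .ini file. A backup of the original is kept as <file>.bak.
set_fixed_heap() {
    ini_file=$1
    size=$2    # for example 8g
    cp "$ini_file" "$ini_file.bak" || return 1
    awk -v size="$size" '
        /^Virtual Machine Parameters=/ {
            cr = sub(/\r$/, "")          # 1 if the line had a Windows line ending
            eol = cr ? "\r" : ""
            print "Virtual Machine Parameters=-Xms" size " -Xmx" size eol
            next
        }
        { print }                        # all other lines are copied unchanged
    ' "$ini_file.bak" > "$ini_file"
}

# Example (elevated Git Bash; replace <version> with your installed version):
# set_fixed_heap "/c/Program Files/Experian/Aperture Data Studio <version>/Aperture Data Studio Service 64bit.ini" 8g
```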

After installing the ODBC drivers, you should be able to create a Data Source Name (DSN) for both 32-bit and 64-bit clients.

Set up the DSN

The 32-bit ODBC Administrator is found at %systemdrive%\Windows\SysWOW64\odbcad32.exe.

The 64-bit ODBC Administrator is found at %systemdrive%\Windows\System32\odbcad32.exe.

  1. Select the System DSN tab.
  2. Click Add.
  3. Choose Experian Aperture ODBC Driver.
  4. Fill in the Data Source Name field (e.g. Aperture ODBC server 64).
  5. Fill in the Description, ODBC host (e.g. localhost) and the ODBC port (e.g. 7801).

Test the connection using Excel (Office 365 version)

  1. In Data Studio, go to the Workflow Designer and create a new workflow with a Snapshot step.
  2. In the step dialog, click Additional options and ensure that Publish to ODBC is enabled.
  3. Execute the workflow. You should get a confirmation message.
  4. Open Excel.
  5. Select the Data tab.
  6. Click Get Data and select From Other Sources, then From ODBC. If you can't see this option, customize the ribbon to make Get External Data visible.
  7. From the Data source name (DSN) list, select Aperture ODBC server 64 (or whatever you called your DSN) and click OK. If you used the 64-bit client and can't see the ODBC source, try steps 4-7 again with the 32-bit client.
  8. In the dialog that appears, enter the Data Studio administrator username and password and click Connect. If you entered the wrong credentials, go to Get Data > Data Source Settings and click Clear Permissions.
  9. Select a table from the list and click Load. Make sure that the list only contains workflow/snapshot names to which this user has read access.
  10. A Getting Data message will appear, followed by the snapshot row/columns.
  11. Check that the following are correct: the column names, row count, and the cell data.
  12. Check that an audit message has been created for the ODBC session (if login audits have been enabled).
To upgrade the ODBC drivers:

  1. Uninstall the old version of the ODBC drivers.
  2. Install the new version of the ODBC drivers.
  3. Remove the old System DSN in odbcad32.exe:
    • The 32-bit one is found at %systemdrive%\Windows\SysWOW64\odbcad32.exe
    • The 64-bit one is found at %systemdrive%\Windows\System32\odbcad32.exe
  4. Create a new System DSN in odbcad32.exe.
To add a custom JDBC driver:

  1. Copy the driver .jar file you'd like to use to the drivers\jdbc folder within the database directory (by default, C:\ApertureDataStudio\drivers\jdbc). You don't have to stop the Data Studio service for your new driver to be picked up.
  2. In Data Studio, go to Data Explorer and Click here to create a new data source.
  3. Select JDBC as Data source type.
  4. Your newly added driver will appear in the DBMS list. All new drivers appear in the list prefixed with Custom, for example 'Custom MySQL 5.1'.
  5. Select your driver and specify the connection details as appropriate.
  6. Click Test connection to check connectivity and Create to save changes.
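The first step can also be scripted. This minimal sketch assumes a Bash shell such as Git Bash, where the default database directory C:\ApertureDataStudio appears as /c/ApertureDataStudio; the function name add_jdbc_driver and the example .jar name are hypothetical.

```shell
# Hypothetical helper: copy a JDBC driver .jar into <database dir>/drivers/jdbc.
add_jdbc_driver() {
    jar=$1
    db_dir=$2    # the Data Studio database directory
    case $jar in
        *.jar) ;;
        *) echo "not a .jar file: $jar" >&2; return 1 ;;
    esac
    mkdir -p "$db_dir/drivers/jdbc" || return 1
    cp "$jar" "$db_dir/drivers/jdbc/"
}

# Example (Git Bash; the .jar name is hypothetical):
# add_jdbc_driver ~/Downloads/example-jdbc-driver.jar /c/ApertureDataStudio
```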
To connect to Apache Hadoop HDFS:

  1. In Data Studio, go to Data Explorer and Click here to create a new data source.

  2. Select Apache Hadoop HDFS as Data source type.

  3. Specify the following:

    • Hostname: the IP address or DNS name of the Primary NameNode of your Hadoop cluster.

    • Port: the port given in the value of the fs.default.name property (fs.defaultFS in newer Hadoop versions). You will typically find this in the Hadoop configuration file core-site.xml.

    • Username/Password: the Linux username/password on the Hadoop host machine.

      Note that your file access permissions will determine which files are visible to Data Studio.

    • Root directory: the starting directory for file discovery. If not specified, it will default to the root of the HDFS file system ("/").

  4. Click Test connection to check connectivity and Create to save changes.
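To find the port without reading core-site.xml by eye, a sketch like the one below prints the port from fs.defaultFS (or the older fs.default.name). It assumes the usual layout, with the URI in the hdfs://host:port form; the function name hdfs_port_from_core_site and the example path are hypothetical. If the Hadoop client is installed on the host, hdfs getconf -confKey fs.defaultFS prints the configured URI directly.

```shell
# Hypothetical helper: print the port from fs.defaultFS (or the older
# fs.default.name) in core-site.xml. Assumes the usual layout, where the
# property's <value> follows its <name> on the same or a later line.
hdfs_port_from_core_site() {
    awk '
        /<name>(fs\.defaultFS|fs\.default\.name)<\/name>/ { want = 1 }
        want && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            gsub(/^[ \t]+|[ \t]+$/, "")   # trim surrounding whitespace
            print
            exit
        }
    ' "$1" | sed -n 's|^[a-z]*://[^:/]*:\([0-9][0-9]*\).*|\1|p'
}

# Example (the path is hypothetical; use your cluster's core-site.xml):
# hdfs_port_from_core_site /etc/hadoop/conf/core-site.xml
```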

If you get this error while connecting from Excel or Power BI, run the application as an administrator.

To start the service:

  1. Search for Services from the Start menu, or go to Control Panel > System and Security > Administrative Tools > Services.
  2. Locate the Experian GDQ Standardize Server service, right-click it and select Properties.
  3. Open the Log On tab.
  4. Select This account and enter NT AUTHORITY\NETWORK SERVICE as the account.
  5. Click Apply to save changes.
  6. Right-click on the Experian GDQ Standardize Server service and select Start.

View technical recommendations before installing. Note that your setup requirements will depend on the size of your data and Aperture Data Studio usage.

Installation prerequisites

  • A 64-bit version of a compatible Linux operating system is used.
  • The latest version of the 64-bit Java 8 JDK is installed.
  • The server and the required port (default is 7701) are accessible from all client machines through all intermediate networks and firewalls. Other default ports that may need opening include:
  • User limits for the dedicated user (default is ApertureDataStudio) are set to unlimited for each resource:
    • You can view your system's limits by entering ulimit -a in the command shell.
    • It's critical that the application can open enough files simultaneously, so make sure the open-files limit is set close to the operating system's limit. For example, for RHEL/CentOS, set the hard limit to around 64000: when logged in as the dedicated user, ulimit -Hn should return 64000.
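The open-files check can be scripted as below. This is a sketch: the function name nofile_limit_ok is hypothetical, and 64000 is the example value given above for RHEL/CentOS. Run it in a shell belonging to the dedicated user, because limits are set per user.

```shell
# Hypothetical helper: succeed if an open-files limit, as printed by
# "ulimit -Hn", is "unlimited" or at least the required value.
nofile_limit_ok() {
    actual=$1
    required=${2:-64000}
    [ "$actual" = "unlimited" ] && return 0
    [ "$actual" -ge "$required" ] 2>/dev/null   # non-numeric input counts as a failure
}

# Check the hard limit of the current shell (run this as the dedicated user)
if nofile_limit_ok "$(ulimit -Hn)" 64000; then
    echo "open-files hard limit is OK"
else
    echo "open-files hard limit is below 64000" >&2
fi
```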

Install

The distribution is provided as an .rpm file, so you can use yum or rpm to install Data Studio. The installation directory created by the rpm is /home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0.

$ sudo yum install ApertureDataStudio-1.6.0-1.el7.x86_64.rpm

The hierarchy of directories is created at installation and is owned by ApertureDataStudio:experian. The user (ApertureDataStudio) and group (experian) are created, if absent.

The directory and its children are created with permissions of 770, i.e. owner: rwx, group: rwx, and all others: no access. Two service files are also created: the first controls the Aperture Data Studio server (ApertureDataStudio_1.6.0.service); the second controls the Standardize service (Standardize_4.6.14.service).

The following are created:

  • A username ApertureDataStudio (marked as no-login).
  • A group experian.
  • A directory /home/experian/ApertureDataStudio.
  • All files with permissions 770.
  • All files owned by ApertureDataStudio:experian.
  • A service file /etc/systemd/system/ApertureDataStudio_1.6.0.service.
  • A service file /etc/systemd/system/Standardize_4.6.14.service.
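To confirm the permissions and ownership listed above after installation, a read-only check such as the following prints every file or directory under the installation directory whose mode isn't exactly 770 or whose owner isn't ApertureDataStudio:experian. The function name check_install_tree is hypothetical; run it from a root shell (for example, one opened with sudo -s) so that find can read every directory. No output means nothing was flagged.

```shell
# Hypothetical read-only check: list entries under a directory whose mode is
# not exactly 770 or whose owner/group differ from the expected ones.
check_install_tree() {
    root=$1
    user=${2:-ApertureDataStudio}
    group=${3:-experian}
    find "$root" \( ! -perm 770 -o ! -user "$user" -o ! -group "$group" \) -print
}

# Example (from a root shell):
# check_install_tree /home/experian/ApertureDataStudio
```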

To re-install Data Studio:

$ sudo yum reinstall ApertureDataStudio-1.6.0-1.el7.x86_64.rpm

Starting the server

Before you can use Data Studio you have to either start it as a service or run the executable directly in a terminal window.

Starting Aperture Data Studio directly

$ cd /home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0
$ sudo java -Xms16g -Xmx16g -cp .:./pserver.jar:"./lib/*" com.experian.ServerMain STARTUP
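To keep -Xms and -Xmx in step when you change the heap size, you could wrap the command above in a function. This is a sketch: the function name start_data_studio and the ADS_HOME and DRY_RUN variables are hypothetical, while the default directory and the java arguments are the ones shown above. Set DRY_RUN=1 to print the command instead of running it.

```shell
# Hypothetical wrapper: start Data Studio with the same value for -Xms and
# -Xmx. The function body runs in a subshell, so the cd doesn't leak out.
start_data_studio() (
    heap=${1:-16g}
    ads_home=${ADS_HOME:-/home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0}
    cd "$ads_home" || exit 1
    # Options need plain ASCII hyphens (-Xms), not typographic dashes.
    # The classpath is quoted so that java, not the shell, expands lib/*.
    set -- sudo java "-Xms$heap" "-Xmx$heap" -cp '.:./pserver.jar:./lib/*' \
        com.experian.ServerMain STARTUP
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # print the command instead of running it
    else
        "$@"
    fi
)

# Example: preview, then start with a 16 GB heap
# DRY_RUN=1 start_data_studio 16g
# start_data_studio 16g
```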

Starting Aperture Data Studio as a service

# make the services known to the system
$ sudo systemctl daemon-reload

# create the symlinks
$ sudo systemctl enable ApertureDataStudio_1.6.0

# start the service
$ sudo systemctl start ApertureDataStudio_1.6.0

# Check the service status
$ sudo systemctl status -l ApertureDataStudio_1.6.0

Note that these services will restart on failure and at system boot time.

Tailoring the server properties files

The server will run out of the box. However, you may prefer to relocate the server and data directories to another partition or directory. Both of these locations can be changed by editing the file server.properties in /home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0

The two properties are:

  • DirName.ROOT
  • DirName.DATA

You may create these directories yourself with the appropriate owner/permissions.
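If you'd rather script the change, the sketch below sets a key in server.properties, replacing an existing line or appending a new one. It assumes the file uses plain key=value lines; the function name set_property and the /srv/ads paths are hypothetical.

```shell
# Hypothetical helper: set key=value in a .properties file, replacing an
# existing "key=" line or appending one if the key isn't present yet.
set_property() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        index($0, key "=") == 1 { print key "=" value; found = 1; next }
        { print }
        END { if (!found) print key "=" value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's owner/mode
    rm -f "$tmp"
}

# Example (paths are hypothetical); run as a user that can write both locations:
# install -d -m 770 -o ApertureDataStudio -g experian /srv/ads/data
# set_property /home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0/server.properties DirName.DATA /srv/ads/data
```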

Additional file data sources may be specified in the file filedatastores.properties in /home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0

Configuring data for Validate addresses step

  1. Place the Experian Batch data for the Validate addresses step in a directory of your choice.
  2. Edit /home/experian/ApertureDataStudio/addressValidate/runtime/qawserve.ini:
    • Add InstalledData=GBR,<your Batch data directory>, for example InstalledData=GBR,/home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0/Batch
    • Add DataMappings=GBR,United Kingdom,GBR
  3. Add a valid Experian Batch license using Data Studio: click on your username and select Update license.
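Step 2 can be made safe to re-run with a sketch like the following, which appends each setting only if that exact line isn't already in the file. The function names add_line_once and configure_gbr_batch are hypothetical, and the sketch assumes the settings can go at the end of qawserve.ini; if your copy groups settings under section headers, move the lines into the appropriate section.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# isn't already present, so the step can be re-run safely.
add_line_once() {
    file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null && return 0
    # make sure the file ends with a newline before appending
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Hypothetical wrapper for the two settings in step 2
configure_gbr_batch() {
    ini=$1 batch_dir=$2
    add_line_once "$ini" "InstalledData=GBR,$batch_dir"
    add_line_once "$ini" "DataMappings=GBR,United Kingdom,GBR"
}

# Example (use the directory you chose in step 1):
# configure_gbr_batch /home/experian/ApertureDataStudio/addressValidate/runtime/qawserve.ini \
#     /home/experian/ApertureDataStudio/ApertureDataStudio_1.6.0/Batch
```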

Installing Standardize

# Install the prerequisites
$ sudo yum -y install lttng-ust libcurl openssl-libs krb5-libs libicu zlib

# make the services known to the system
$ sudo systemctl daemon-reload

# create the symlinks
$ sudo systemctl enable Standardize_4.6.14

# start the service
$ sudo systemctl start Standardize_4.6.14

# Check the service status
$ sudo systemctl status -l Standardize_4.6.14

This error is most likely to occur when the JVM in which Data Studio runs is attempting to allocate more memory than is available on the system.

The memory settings can be configured in the Aperture Data Studio Service 64bit.ini file which is in the root of the installation directory (by default, C:\Program Files\Experian\Aperture Data Studio <version>\Aperture Data Studio Service 64bit.ini).

This line in the file controls the memory allocation:

Virtual Machine Parameters=-Xms66:1000:16000P -Xmx66:1000:16000

By default, the Java Virtual Machine settings will use 66% of the total system memory that's available, up to a maximum of 16 GB (on a machine with 24 GB, for example, 66% is roughly 16 GB). This assumes that Data Studio runs on a dedicated, or largely dedicated, machine.

If your environment isn't dedicated to Data Studio and runs other applications, it's possible that this much memory can't be allocated. In this case, change the VM parameters to specify an exact value for the maximum memory used. For example, to allocate 8 GB of RAM: Virtual Machine Parameters=-Xms8g -Xmx8g

If the service isn't starting after changing the memory setting, other possible causes are:

  • Port 7701 is already in use
  • Permissions issue when creating the Data Studio repository (by default, C:\ApertureDataStudio)