Retrieval-Augmented Generation (RAG) pairs a Large Language Model (LLM) with a retrieval step: before answering, the system looks up relevant passages in an external store and passes them to the model as context. This reduces two well-known LLM limitations, the knowledge cut-off and hallucinated facts, because the answer can be grounded in retrieved, up-to-date text. RAG combines the strengths of retrieval (precise, current data) and generation (contextual, fluent language), which makes it particularly useful for complex queries, factual correctness, and specialized or rapidly evolving fields.
In this tutorial, I cover the key topics and recent advancements in RAG using executable code. Each topic is explained in a video tutorial with a working demonstration and the corresponding code. Some code segments are sourced from the relevant libraries for demonstration purposes. The tutorial covers the following topics:
Basics of RAG (Retrieval Augmented Generation) with LLM
How to use LLM + RAG to Construct Knowledge Graph.
How to construct Flow-Diagram by Using LLM + RAG.
Graph Based RAG (Retrieval Augmented Generation) Techniques.
Note: This tutorial also links to a tutorial on a related pressing topic, "Use of Long Text Sequences with LLMs Trained on Shorter Text Sequences." In the future, I will introduce new research advancements in the field of Large Language Models.
1. Basics of RAG (Retrieval Augmented Generation) with LLM.
Video Tutorial.
Basic RAG Code.
import ollama
import chromadb

documents = [
    "Quantum mechanics is a fundamental theory in physics that describes the behavior of nature at and below the scale of atoms.",
    "It is the foundation of all quantum physics, which includes quantum chemistry, quantum field theory, quantum technology, and quantum information science.",
    "Quantum mechanics can describe many systems that classical physics cannot.",
    "Classical physics can describe many aspects of nature at an ordinary (macroscopic and (optical) microscopic) scale, but is not sufficient for describing them at very small submicroscopic (atomic and subatomic) scales.",
    "Most theories in classical physics can be derived from quantum mechanics as an approximation valid at large (macroscopic/microscopic) scale.",
    "Quantum systems have bound states that are quantized to discrete values of energy, momentum, angular momentum, and other quantities, in contrast to classical systems where these quantities can be measured continuously.",
    "Measurements of quantum systems show characteristics of both particles and waves (wave–particle duality), and there are limits to how accurately the value of a physical quantity can be predicted prior to its measurement, given a complete set of initial conditions (the uncertainty principle)."
]

# Step 1: create the database
client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=d)
    embedding = response["embedding"]
    collection.add(
        ids=[str(i)],
        embeddings=[embedding],
        documents=[d]
    )

# Step 2: an example prompt
prompt = "What are the key benefits of using quantum mechanics over classical physics?"

# generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
    prompt=prompt,
    model="mxbai-embed-large"
)
results = collection.query(
    query_embeddings=[response["embedding"]],
    n_results=1
)
data = results['documents'][0][0]

# Step 3: generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
    model="llama3",
    prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output['response'])
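The retrieve-then-generate flow above can also be sketched without any external service. The following is a minimal illustration, not the tutorial's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model such as `mxbai-embed-large`, and `retrieve` and `build_prompt` are hypothetical helper names. It only shows how the most similar document is selected by cosine similarity and placed into the prompt.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, n_results=1):
    """Return the n_results documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:n_results]


def build_prompt(query, data):
    """Combine the retrieved text and the question, as in the code above."""
    return f"Using this data: {data}. Respond to this prompt: {query}"


docs = [
    "cats purr softly",
    "quantum mechanics describes atoms",
    "bread needs yeast",
]
query = "what describes atoms"
data = retrieve(query, docs)[0]
print(build_prompt(query, data))
```

A real embedding model captures meaning rather than exact word overlap, so the ranking from this toy version should not be expected to match what ChromaDB returns.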
2. How to use LLM + RAG to Construct Knowledge Graph.
Video Tutorial.
Code to Generate Knowledge Graph Triplets.
import ollama
import chromadb

documents = [
    "Quantum mechanics is a fundamental theory in physics that describes the behavior of nature at and below the scale of atoms.",
    "It is the foundation of all quantum physics, which includes quantum chemistry, quantum field theory, quantum technology, and quantum information science.",
    "Quantum mechanics can describe many systems that classical physics cannot.",
    "Classical physics can describe many aspects of nature at an ordinary (macroscopic and (optical) microscopic) scale, but is not sufficient for describing them at very small submicroscopic (atomic and subatomic) scales.",
    "Most theories in classical physics can be derived from quantum mechanics as an approximation valid at large (macroscopic/microscopic) scale.",
    "Quantum systems have bound states that are quantized to discrete values of energy, momentum, angular momentum, and other quantities, in contrast to classical systems where these quantities can be measured continuously.",
    "Measurements of quantum systems show characteristics of both particles and waves (wave–particle duality), and there are limits to how accurately the value of a physical quantity can be predicted prior to its measurement, given a complete set of initial conditions (the uncertainty principle)."
]

# Create database (required before documents can be added to the collection)
client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=d)
    embedding = response["embedding"]
    collection.add(ids=[str(i)], embeddings=[embedding], documents=[d])

# example prompts
prompt1 = "What are the key benefits of using quantum mechanics over classical physics?"
prompt2 = "List all entities, and generate the knowledge graph triplets by using all entities."

# Generate Answers - for Prompt-1:
# generate an embedding for the prompt and retrieve the most relevant doc
response1 = ollama.embeddings(
    prompt=prompt1,
    model="mxbai-embed-large"
)
results1 = collection.query(
    query_embeddings=[response1["embedding"]],
    n_results=1
)
data1 = results1['documents'][0][0]

# generate a response combining the prompt and the retrieved data
output1 = ollama.generate(
    model="llama3",
    prompt=f"Using this data: {data1}. Respond to this prompt: {prompt1}"
)
print("Response for the question -1", output1['response'])

# Generate Answers - for Prompt-2:
# generate an embedding for the prompt and retrieve the most relevant doc
response2 = ollama.embeddings(
    prompt=prompt2,
    model="mxbai-embed-large"
)
results2 = collection.query(
    query_embeddings=[response2["embedding"]],
    n_results=1
)
data2 = results2['documents'][0][0]

# generate a response combining the prompt and the retrieved data
output2 = ollama.generate(
    model="llama3",
    prompt=f"Using this data: {data2}. Respond to this prompt: {prompt2}"
)
print("Response for the question -2", output2['response'])
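The model returns the triplets as free text, so they have to be extracted before they can be drawn. The sketch below assumes the response lists each triplet as `(subject, predicate, object)` on its own line. That format is an assumption and is not guaranteed by the model, and `parse_triplets` is a hypothetical helper, so the pattern may need adjusting to the actual output.

```python
import re

# Matches "(subject, predicate, object)" with optional spaces around each part.
TRIPLET_PATTERN = re.compile(
    r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)"
)


def parse_triplets(llm_text):
    """Extract (subject, predicate, object) tuples from free-form LLM text."""
    return [tuple(match) for match in TRIPLET_PATTERN.findall(llm_text)]


# Hypothetical model response, written by hand for illustration.
sample_response = """Here are the triplets:
(Quantum mechanics, is a, Theory)
( Theory , belongs to , Physics )
This line is not a triplet."""

for triplet in parse_triplets(sample_response):
    print(triplet)
```

The resulting list of tuples has the same shape as the `triplets` list used in the visualization code.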
Code to Visualize the Knowledge Graph (by using above triplets).
import networkx as nx
import matplotlib.pyplot as plt

# Step 1: Define your triplets
triplets = [
    ("Quantum mechanics", "a fundamental theory", "Theory"),
    ("Theory", "in physics", "Physics"),
    ("Physics", "describes the behavior of", "Nature"),
    ("Nature", "at and below the scale of", "Atoms"),
    ("Atoms", "is related to the scale of", "Scale"),
]

# Step 2: Create a directed graph
G = nx.DiGraph()

# Step 3: Add edges from triplets
for subject, predicate, obj in triplets:
    G.add_edge(subject, obj, label=predicate)

# Step 4: Draw the graph
# Position nodes using the Fruchterman-Reingold force-directed algorithm
pos = nx.spring_layout(G, seed=42)

# Draw nodes and edges
nx.draw(G, pos, with_labels=True, node_size=3000, node_color="lightblue",
        font_size=10, font_weight="bold", arrowsize=20)

# Draw the predicate stored on each edge; without this the "label" attribute is never shown
edge_labels = nx.get_edge_attributes(G, "label")
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=8)

# Display the graph
plt.title("Knowledge Graph Visualization")
plt.show()
3. How to construct Flow-Diagram by Using LLM + RAG.
Video Tutorial.
Code.
import ollama
import chromadb

documents = [
    "Quantum mechanics is a fundamental theory in physics that describes the behavior of nature at and below the scale of atoms.",
    "It is the foundation of all quantum physics, which includes quantum chemistry, quantum field theory, quantum technology, and quantum information science.",
    "Quantum mechanics can describe many systems that classical physics cannot.",
    "Classical physics can describe many aspects of nature at an ordinary (macroscopic and (optical) microscopic) scale, but is not sufficient for describing them at very small submicroscopic (atomic and subatomic) scales.",
    "Most theories in classical physics can be derived from quantum mechanics as an approximation valid at large (macroscopic/microscopic) scale.",
    "Quantum systems have bound states that are quantized to discrete values of energy, momentum, angular momentum, and other quantities, in contrast to classical systems where these quantities can be measured continuously.",
    "Measurements of quantum systems show characteristics of both particles and waves (wave–particle duality), and there are limits to how accurately the value of a physical quantity can be predicted prior to its measurement, given a complete set of initial conditions (the uncertainty principle)."
]

# Create database
client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=d)
    embedding = response["embedding"]
    collection.add(ids=[str(i)], embeddings=[embedding], documents=[d])

# an example prompt
prompt1 = "Generate a Mermaid diagram."

# Generate Answers - for Prompt-1:
# generate an embedding for the prompt and retrieve the most relevant doc
response1 = ollama.embeddings(
    prompt=prompt1,
    model="mxbai-embed-large"
)
results1 = collection.query(
    query_embeddings=[response1["embedding"]],
    n_results=1
)
data1 = results1['documents'][0][0]

# generate a response combining the prompt and the retrieved data
output1 = ollama.generate(
    model="llama3",
    prompt=f"Using this data: {data1}. Respond to this prompt: {prompt1}"
)
print("Response for the question -1", output1['response'])
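For comparison with the model-generated diagram, Mermaid flowchart text can also be produced deterministically from a list of triplets. This is a minimal sketch rather than part of the tutorial's code: `triplets_to_mermaid` is a hypothetical helper, and the two triplets are taken from the knowledge-graph example in this tutorial.

```python
def triplets_to_mermaid(triplets, direction="TD"):
    """Render (subject, predicate, object) triplets as Mermaid flowchart text."""
    node_ids = {}

    def node_id(name):
        # Mermaid node IDs should be simple tokens, so map each label to n0, n1, ...
        if name not in node_ids:
            node_ids[name] = f"n{len(node_ids)}"
        return node_ids[name]

    lines = [f"graph {direction}"]
    for subject, predicate, obj in triplets:
        source = node_id(subject)
        target = node_id(obj)
        lines.append(f'    {source}["{subject}"] -->|{predicate}| {target}["{obj}"]')
    return "\n".join(lines)


example_triplets = [
    ("Quantum mechanics", "a fundamental theory", "Theory"),
    ("Theory", "in physics", "Physics"),
]
print(triplets_to_mermaid(example_triplets))
```

The printed text can be pasted into any Mermaid renderer. Labels that contain double quotes or pipe characters would need escaping, which this sketch does not handle.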
4. Graph Based RAG (Retrieval Augmented Generation) Techniques.
Video Tutorial.
Code.
import ollama
import chromadb

documents = [
    "The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections.",
    "However, RAG fails on global questions directed at an entire text corpus, such as “What are the main themes in the dataset?”, since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task.",
    "Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems.",
    "To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed.",
    "Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities.",
    "Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user.",
    "For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers."
]

# Create database
client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=d)
    embedding = response["embedding"]
    collection.add(
        ids=[str(i)],
        embeddings=[embedding],
        documents=[d]
    )

# an example prompt
prompt = "How Graph RAG (Retrieval Augmented Generation, used with Large Language Model) generates a Global Summarization for the given context?"
Entity_1 = "Graph RAG"
Entity_2 = "Global Summarization"
Community_Triplets = [
    ("Graph RAG", "Uses", "Leiden Community Detection Algorithm"),
    ("LLM", "Extracts", "Entity Knowledge Graph"),
    ("Graph Index", "Partitioned By", "Community Detection Algorithms"),
    ("Community Summaries", "Used For", "Global Summarization"),
]
summary_text = "Graph RAG - Uses - Leiden Community Detection Algorithm; LLM - Extracts - Entity Knowledge Graph; Graph Index - Partitioned By - Community Detection Algorithms; Community Summaries - Used For - Global Summarization"

# generate an embedding for the prompt (not used for retrieval below)
response = ollama.embeddings(
    prompt=prompt,
    model="mxbai-embed-large"
)
# generate an embedding for the community summary text
summary_text_embedding = ollama.embeddings(
    prompt=summary_text,
    model="mxbai-embed-large"
)
# retrieve the most relevant doc using the community summary embedding
results = collection.query(
    query_embeddings=[summary_text_embedding["embedding"]],
    n_results=1
)
data = results['documents'][0][0]

# generate a response combining the prompt and the retrieved data
output = ollama.generate(
    model="llama3",
    prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
)
print(output['response'])
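In the code above, `summary_text` is written by hand and repeats the content of `Community_Triplets`. The sketch below derives the same string from the triplets and groups triplets that share an entity. The grouping uses plain connected components as a crude stand-in for the Leiden community detection that Graph RAG uses, and both helper names are hypothetical.

```python
def triplets_to_summary(triplets):
    """Join triplets into the 'A - rel - B; C - rel - D' summary format."""
    return "; ".join(" - ".join(triplet) for triplet in triplets)


def connected_communities(triplets):
    """Group triplets whose entities are linked (stand-in for Leiden, not Leiden itself)."""
    communities = []  # list of (entity_set, triplet_list) pairs
    for triplet in triplets:
        subject, _, obj = triplet
        entities, members = {subject, obj}, [triplet]
        remaining = []
        for other_entities, other_members in communities:
            if entities & other_entities:
                # Shared entity: merge the existing community into the current one.
                entities |= other_entities
                members = other_members + members
            else:
                remaining.append((other_entities, other_members))
        communities = remaining + [(entities, members)]
    return [members for _, members in communities]


community_triplets = [
    ("Graph RAG", "Uses", "Leiden Community Detection Algorithm"),
    ("LLM", "Extracts", "Entity Knowledge Graph"),
    ("Graph Index", "Partitioned By", "Community Detection Algorithms"),
    ("Community Summaries", "Used For", "Global Summarization"),
]

print(triplets_to_summary(community_triplets))
for community in connected_communities(community_triplets):
    print("community summary:", triplets_to_summary(community))
```

With these four triplets no entity is shared, so each triplet forms its own community. A larger graph would yield fewer, larger groups, each of which could be summarized and embedded separately.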
Ding, Yujuan, Wenqi Fan, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. "A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models." arXiv preprint arXiv:2405.06211 (2024).
Wu, Kevin, Eric Wu, and James Zou. "How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior." arXiv preprint arXiv:2404.10198 (2024).
Li, Jiarui, Ye Yuan, and Zehua Zhang. "Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases." arXiv preprint arXiv:2403.10446 (2024).
Edge, Darren, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv preprint arXiv:2404.16130 (2024).
The appeal of Wasserstein Generative Adversarial Networks (WGANs) comes from their improvements over the original GAN formulation. They address critical challenges such as mode collapse and unstable training, and help generate high-quality, diverse samples. The following are the key technical advancements in the WGAN architecture that motivated me to create tutorials on WGAN:
Wasserstein Distance: Replaces the Jensen-Shannon divergence implicit in the original GAN loss with the Wasserstein (earth mover's) distance, which gives more meaningful training gradients, reducing mode collapse and stabilizing network convergence.
Weight Clipping and Lipschitz Constraint: Initially, WGANs used weight clipping to meet the Lipschitz constraint for the Wasserstein distance, but this approach had drawbacks like capacity underuse and gradient problems. The WGAN-GP variant introduced a gradient penalty to overcome these issues, leading to better training stability and sample quality.
Gradient Penalty (WGAN-GP): Adds a penalty term to the critic loss that pushes the norm of the critic's gradient toward 1, promoting stable training and high-quality output without clipping weights.
Critic Role: Unlike traditional GANs' discriminators, WGAN critics assess generated sample quality on a continuous scale, enabling finer quality evaluation and aiding in model training dynamics.
Training Protocol: WGANs employ a distinct training method, often involving more frequent training of the critic than the generator to provide effective gradients, ensuring balanced learning and model stability.
These advancements make WGANs superior for generating realistic samples and ensuring smoother model training, maintaining their unique position in AI research and development.
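The quantities named in the points above can be illustrated with a few lines of plain Python. This is a sketch under simplifying assumptions and not the tutorial's Keras code: the Wasserstein distance is computed only for two equally sized 1-D samples (where it reduces to the mean absolute difference of the sorted values), the critic scores are given as plain lists, and all function names are hypothetical.

```python
def wasserstein_1d(samples_a, samples_b):
    """Wasserstein-1 distance between two equally sized 1-D empirical samples."""
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch only handles samples of equal size")
    pairs = zip(sorted(samples_a), sorted(samples_b))
    return sum(abs(a - b) for a, b in pairs) / len(samples_a)


def critic_loss(real_scores, fake_scores):
    """Critic minimizes mean(fake) - mean(real), i.e. it widens the score gap."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def generator_loss(fake_scores):
    """Generator minimizes -mean(fake), i.e. it raises the critic's score on fakes."""
    return -sum(fake_scores) / len(fake_scores)


def clip_weights(weights, clip_value=0.01):
    """Weight clipping from the original WGAN: keep every weight in [-c, c]."""
    return [max(-clip_value, min(clip_value, w)) for w in weights]


print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # shifting by 1 costs 1.0
print(critic_loss(real_scores=[2.0, 4.0], fake_scores=[1.0, 1.0]))
print(clip_weights([0.5, -0.5, 0.005]))
```

Unlike a discriminator's probability, the critic scores here are unbounded real numbers, which is why the losses are simple means rather than cross-entropies.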
Video Tutorials.
Part-1
Part-2
Part-3
Code - Training WGAN
# example of training a WGAN on Fashion-MNIST
from numpy import expand_dims
import keras
import keras.backend as K
import tensorflow as tf
import numpy as np
from keras import Model
from keras.optimizers import Adam
from keras.layers import Input, Reshape, Flatten
from keras.layers import Dense, BatchNormalization, Conv2D, Conv2DTranspose, LeakyReLU, Dropout

batch_size = 32
input_shape = (28, 28, 1)
latent_dim = 100
img_shape = (28, 28, 1)


class WGAN_1:
    def __init__(self):
        print("welcome to WGAN coding")

    # Wasserstein loss: the mean of (label * critic score)
    def wasserstein_loss(self, y_true, y_pred):
        return K.mean(y_true * y_pred)

    def preprocess_real_part_training_dataset(self):
        # load the Fashion-MNIST dataset
        (dataX, dataY), (testDX, testDY) = keras.datasets.fashion_mnist.load_data()
        # Select the first 1000 rows of training data and labels
        dataX = dataX[:1000]
        dataY = dataY[:1000]
        # Add an additional dimension for the grayscale channel by using expand_dims() from NumPy
        dataX = expand_dims(dataX, axis=-1)
        # convert from unsigned ints to floats and scale from [0,255] to [0,1]
        dataX = dataX.astype(np.float32) / 255.0
        return dataX
# example of loading the generator model and generating images
import numpy as np
from keras.models import load_model
import matplotlib.pyplot as plt

# load model
model = load_model('g_model.h5')

# Generate synthetic images
num_images = 10
latent_dim = 100
noise = np.random.normal(0, 1, (num_images, latent_dim))
generated_images = model.predict(noise)

# Plot the generated images
plt.figure(figsize=(10, 10))
for i in range(num_images):
    plt.subplot(1, num_images, i + 1)
    plt.imshow(generated_images[i, :, :, 0], cmap='gray')
    plt.axis('off')
plt.show()
Reference:
1. Wasserstein GAN; Martin Arjovsky (Courant Institute of Mathematical Sciences), Soumith Chintala, and Léon Bottou (Facebook AI Research)
3. Kwon, Dohyun, Yeoneung Kim, Guido Montúfar, and Insoon Yang. "Training Wasserstein GANs without gradient penalties." arXiv preprint arXiv:2110.14150 (2021).
4. Guo, Xin, Johnny Hong, Tianyi Lin, and Nan Yang. "Relaxed Wasserstein with applications to GANs." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3325-3329. IEEE, 2021.
Training large language models (LLMs) on longer sequences poses challenges in computational resources, model complexity, gradient propagation, and overfitting. These include increased memory requirements due to self-attention, longer training times, difficulty in scaling Transformers to very long sequences, challenges in capturing long-term dependencies, the risk of vanishing or exploding gradients, and potential overfitting to the training data. Position-encoding methods such as attention with linear biases (ALiBi) and rotary position embedding (RoPE, introduced with RoFormer) improve the handling of long-range dependencies, enhance model generalization, and incorporate positional information for better performance in NLP tasks.
For Example:
Attention with Linear Biases (ALiBi)
Improved Handling of Long-Range Dependencies: The cost of self-attention grows quadratically with sequence length, so models are usually trained on short sequences and then struggle to keep context on longer inputs. ALiBi mitigates this by encoding position directly in the attention scores: it adds a penalty that grows linearly with the distance between the query and the key. This helps the model maintain context over long distances and extrapolate to inputs longer than those seen in training.
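The linear bias can be written out directly. The sketch below follows the ALiBi paper listed in the references, where each attention head gets a fixed slope from a geometric sequence and the bias for a query at position i attending to a key at position j is slope × (j − i). The function names are hypothetical, the slope formula is the one given for head counts that are powers of two, and future positions are masked with negative infinity as in a causal decoder.

```python
def alibi_slopes(num_heads):
    """Head-specific slopes: geometric sequence starting at 2^(-8/num_heads)."""
    return [2.0 ** (-8.0 * (head + 1) / num_heads) for head in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores: 0 on the diagonal, more negative with distance."""
    neg_inf = float("-inf")
    return [
        [slope * (j - i) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
print(slopes[0], slopes[-1])  # first and last head slopes

for row in alibi_bias(seq_len=4, slope=slopes[0]):
    print(row)
```

Because the bias depends only on the distance j − i and has no learned parameters, the same rule applies unchanged to sequences longer than those used in training.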
RoFormer
Improved Model Generalization: By encoding positional information more effectively, RoFormer helps LLMs generalize better across different tasks and datasets. This results in enhanced performance on a wide range of NLP tasks, including text classification, machine translation, and semantic analysis.
Enhanced Positional Encoding: RoFormer's rotary position embedding (RoPE) rotates each query and key vector by an angle proportional to the token's position, so the attention score between two tokens depends on their relative distance. This enables the model to better understand and use the order of words or tokens, which is crucial for many language understanding and generation tasks.
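The relative-distance property of RoPE can be checked numerically. The following is a minimal sketch based on the RoFormer paper listed in the references: consecutive pairs of vector components are rotated by position × θ, with θ = base^(−2k/d) for pair k and the commonly used base of 10000. The function names and example vectors are chosen only for illustration.

```python
import math


def rope(vector, position, base=10000.0):
    """Rotate each consecutive pair of components by an angle that grows with position."""
    dim = len(vector)
    if dim % 2:
        raise ValueError("vector length must be even")
    rotated = []
    for i in range(0, dim, 2):
        angle = position * base ** (-i / dim)  # theta for pair k = i / 2
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vector[i], vector[i + 1]
        rotated += [x * cos_a - y * sin_a, x * sin_a + y * cos_a]
    return rotated


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.0]

# Same relative distance (2) at different absolute positions gives the same score.
score_near = dot(rope(query, 3), rope(key, 1))
score_far = dot(rope(query, 5), rope(key, 3))
print(abs(score_near - score_far) < 1e-9)
```

The rotation changes neither vector's length, so positional information enters only through the angle between the query and the key.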
Video Tutorial -1
Video Tutorial -2
Video Tutorial -3
References.
Su, Jianlin, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. "Roformer: Enhanced transformer with rotary position embedding." Neurocomputing 568 (2024): 127063.
Press, Ofir, Noah A. Smith, and Mike Lewis. "Train short, test long: Attention with linear biases enables input length extrapolation." arXiv preprint arXiv:2108.12409 (2021).
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017).
Generative Adversarial Networks (GANs) are deep learning models with two main parts: a generator and a discriminator. The generator creates fake data, and the discriminator judges whether a given sample looks real or generated. By training against each other, both improve, and the generator gets better at producing realistic data. This has changed how we create new images, expand datasets, and learn without supervision. The following points show the significance of GANs in the area of AI.
Creative Applications: GANs create realistic images, music, text, and videos, enabling creativity in art, content, and virtual environments.
Data Augmentation: GANs generate synthetic data to enhance small datasets, improving the performance of machine learning models.
Defense Against Deepfakes: GANs are used to develop defenses and detect manipulated media content amidst the growth of deepfake technology.
Drug Discovery and Molecular Design: GANs play an increasing role in drug discovery, producing novel molecular structures with desired properties, potentially transforming the pharmaceutical industry.
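The adversarial training described above, where the discriminator judges samples and the generator tries to fool it, comes down to two loss functions. The sketch below writes them in plain Python using binary cross-entropy over the discriminator's output probabilities. It uses the common non-saturating generator loss, which is an assumption and not necessarily what the accompanying tutorial code uses, and the function names are hypothetical.

```python
import math


def binary_cross_entropy(probabilities, target, eps=1e-7):
    """Mean binary cross-entropy of predicted probabilities against one target label."""
    total = 0.0
    for p in probabilities:
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(target * math.log(p) + (1 - target) * math.log(1.0 - p))
    return total / len(probabilities)


def discriminator_loss(real_probs, fake_probs):
    """Discriminator wants real samples scored as 1 and generated samples as 0."""
    return binary_cross_entropy(real_probs, 1) + binary_cross_entropy(fake_probs, 0)


def generator_loss(fake_probs):
    """Non-saturating generator loss: generated samples should be scored as 1."""
    return binary_cross_entropy(fake_probs, 1)


# A discriminator that cannot tell the two apart outputs 0.5 everywhere.
print(discriminator_loss(real_probs=[0.5, 0.5], fake_probs=[0.5, 0.5]))

# The generator's loss falls as the discriminator is fooled more often.
print(generator_loss([0.1]), generator_loss([0.9]))
```

When the discriminator outputs 0.5 for every sample, its loss equals 2 ln 2, the value reached when it can no longer distinguish real from generated data.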
Scope.
This article combines video tutorials and code demonstrations to explain the GAN architecture. The discussion covers several topics and ends with a simple, readable implementation.
GAN Part-1
GAN Part-2
GAN Part-3.
Keras implementation of GAN
The following is a Keras implementation of a Deep Convolutional GAN (DCGAN). Please go through the video tutorials above to properly understand and use the code.
The system is built using Python 3.10 and relies on several essential library dependencies:
Tensorflow (version 2.15)
tqdm (version 4.66.2)
h5py (version 3.10)
Keras (version 2.15)
Train DCGAN.
# example of training a GAN on Fashion-MNIST
from numpy import expand_dims
from tqdm import tqdm
import keras
import tensorflow as tf
import numpy as np
from keras import Model
from keras.optimizers import Adam
from keras.layers import Input, Reshape, Flatten
from keras.layers import Dense, BatchNormalization, Conv2D, Conv2DTranspose, LeakyReLU, Dropout

batch_size = 32
input_shape = (28, 28, 1)
latent_dim = 100
img_shape = (28, 28, 1)


class GAN_1:
    def __init__(self):
        print("welcome to GAN coding")

    # This code prepares a TensorFlow dataset for training by shuffling the data, batching it into
    # consistent batch sizes, and prefetching batches to optimize data loading during training.
    def preprocess_real_part_training_dataset(self, batch_size):
        # load the Fashion-MNIST dataset
        (dataX, dataY), (testDX, testDY) = keras.datasets.fashion_mnist.load_data()
        # Add an additional dimension for the grayscale channel by using expand_dims() from NumPy
        dataX = expand_dims(dataX, axis=-1)
        # convert from unsigned ints to floats and scale from [0,255] to [0,1]
        dataX = dataX.astype(np.float32) / 255.0
        # testDX = testDX.astype(np.float32) / 255.0
        trainX = tf.data.Dataset.from_tensor_slices(dataX).shuffle(1000)
        # Combines consecutive elements of this dataset into batches.
        trainX = trainX.batch(batch_size, drop_remainder=True).prefetch(1)
        return trainX

# example of loading the generator model and generating images
import numpy as np
from keras.models import load_model
import matplotlib.pyplot as plt

# load model
model = load_model('g_model.h5')

# Generate synthetic images
num_images = 10
latent_dim = 100
noise = np.random.normal(0, 1, (num_images, latent_dim))
generated_images = model.predict(noise)

# Plot the generated images
plt.figure(figsize=(10, 10))
for i in range(num_images):
    plt.subplot(1, num_images, i + 1)
    plt.imshow(generated_images[i, :, :, 0], cmap='gray')
    plt.axis('off')
plt.show()
NOTE: This code is not intended for any commercial use. It is created solely for simple educational purposes.
Multi-step time series forecasting involves predicting multiple future time steps in a time series sequence. Mathematically, let $y_t$ represent the value of the time series at time $t$. Multi-step forecasting aims to predict the future values of the time series over a horizon of $h$ time steps. Therefore, the forecasted values can be represented as $\hat{y}_{t+1}, \hat{y}_{t+2}, \ldots, \hat{y}_{t+h}$, where $\hat{y}_{t+i}$ denotes the predicted value at time $t+i$ for $i = 1, 2, \ldots, h$.
Multi-variate time series forecasting involves predicting the future values of a time series using multiple input variables, where each variable can influence the target time series. Mathematically, let $x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(m)}$ represent $m$ input variables at time $t$, and let $y_t$ represent the target time series. The goal of multi-variate time series forecasting is to predict the future values of the target time series, $\hat{y}_{t+1}, \hat{y}_{t+2}, \ldots, \hat{y}_{t+h}$, based on the input variables. This can be represented as a function $f$ such that $\hat{y}_{t+i} = f\left(x_{t+i}^{(1)}, x_{t+i}^{(2)}, \ldots, x_{t+i}^{(m)}\right)$ for $i = 1, 2, \ldots, h$.
This article covers the following topics, with supporting video tutorials and code.
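To make the setup concrete, the sketch below turns a multivariate series into supervised pairs: each input window holds a fixed number of past time steps and each target window holds the following h steps. It is an illustration under stated assumptions rather than the article's data preparation code. The rows are assumed to be in chronological order, and `make_windows` and the toy series are hypothetical.

```python
def make_windows(series, input_steps, output_steps):
    """Split a chronological series into (past window, future window) pairs."""
    inputs, targets = [], []
    last_start = len(series) - input_steps - output_steps
    for start in range(last_start + 1):
        split = start + input_steps
        inputs.append(series[start:split])                  # steps seen by the model
        targets.append(series[split:split + output_steps])  # steps to be forecast
    return inputs, targets


# Toy series with two variables per time step: [t, 10 * t]
series = [[t, 10 * t] for t in range(6)]

X, y = make_windows(series, input_steps=3, output_steps=2)
print(len(X), "windows")
print("first input :", X[0])
print("first target:", y[0])
```

Setting `output_steps=1` gives the single-step case and larger values give the multi-step case. Because every row keeps all of its variables, the targets are multi-output.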
Multivariate Multi-Step Multi-Output Time series Forecasting
Strategy to prepare dataset.
Multivariate Single-Step Multi-Output Time series Forecasting
Strategy to prepare dataset.
Strategy for the Future Enhancements.
Part-1.
Part-2.
Code-1. [Single-Step Multi-Output]
from keras import Model
from keras.layers import Input, Dense, Bidirectional, LSTM, RepeatVector, TimeDistributed
from sklearn.preprocessing import MinMaxScaler
from numpy import array, hstack
import numpy as np

# NOTE: this listing uses data_X1, dataset_stacked and data_scalar without defining them.
# They must be prepared beforehand (judging by the names and imports, presumably a raw input
# series, the scaled series stacked column-wise, and the fitted MinMaxScaler).

# prepare the dataset
input_timesteps = 3
input_features = 3
output_timesteps = 1
output_features = 3

data_Y = []
data_X = []
# As written, each target window (rows i .. i+output_timesteps-1) comes before its input window
# in row order. If dataset_stacked is in chronological order, swap the two ranges so that the
# inputs precede the targets.
for i in range(0, len(data_X1) - (input_timesteps + output_timesteps)):
    # target window
    tmpY2d = []
    for i_row in range(i, i + output_timesteps):
        tmpY2d.append([dataset_stacked[i_row][i_col] for i_col in range(output_features)])
    data_Y.append(tmpY2d)
    # input window
    tmpX2d = []
    for j_row in range(i + output_timesteps, i + output_timesteps + input_timesteps):
        tmpX2d.append([dataset_stacked[j_row][j_col] for j_col in range(input_features)])
    data_X.append(tmpX2d)

data_X = np.array(data_X)
# The model below outputs shape (batch, output_features), so drop the single time-step axis;
# otherwise the loss would broadcast (batch, 1, features) against (batch, features).
data_Y = np.array(data_Y).reshape(-1, output_features)
print("shape of input data ", data_X.shape)
print("input data => ", data_X)
print("shape of output data => ", data_Y.shape)
print("Output data => ", data_Y)


def define_model():
    # Define the Input data shape
    encoder_inputs = Input(shape=(input_timesteps, input_features))
    # Use single BiLSTM as Encoder
    # Here we can use bigger network also like one BiLSTM with return_sequences=True and
    # other BiLSTM with return_sequences=False
    # OR CNN, CNN+LSTM and so many
    encoder = Bidirectional(LSTM(units=16, return_sequences=True))(encoder_inputs)
    # Single-step output: RepeatVector is not needed, so the decoder reads the encoder
    # sequence directly and returns only its final state
    # repeat_output = RepeatVector(output_timesteps)(encoder)
    decoder = Bidirectional(LSTM(units=16, return_sequences=False))(encoder)
    # Use a Dense layer to get multiple output features for the single time step
    out = Dense(output_features)(decoder)
    # out = TimeDistributed(Dense(output_features))(decoder)
    model = Model(encoder_inputs, out)
    # Compile the model
    model.compile(loss='mae', optimizer='adam', metrics=['mae'])
    model.summary()
    return model


# Call the model
model = define_model()
# Fit the model
model.fit(data_X, data_Y, epochs=4, batch_size=2, verbose=1)

# Take a test data to test the working of the model
test_dataX = []
test_dataX.append(dataset_stacked[0])
test_dataX.append(dataset_stacked[1])
test_dataX.append(dataset_stacked[2])
test_dataX = np.array(test_dataX)
# Reshape the data into 3-D numpy array
test_dataX = np.reshape(test_dataX, (1, input_timesteps, input_features))
print("test dataset => ", test_dataX)

# Run for the prediction output
pred_output = model.predict(test_dataX)
print("prediction output => ", pred_output)
# invert the scaling to get forecast values
inverted_output = data_scalar.inverse_transform(pred_output)
print("obtained prediction => ", inverted_output)
Code-2. [Multi-Step and Multi-Output]
from keras import Model
from keras.layers import Input, Dense, Bidirectional, LSTM, RepeatVector, TimeDistributed
from sklearn.preprocessing import MinMaxScaler
from numpy import array, hstack
import numpy as np

# NOTE: this listing uses data_X1, dataset_stacked and data_scalar without defining them.
# They must be prepared beforehand (judging by the names and imports, presumably a raw input
# series, the scaled series stacked column-wise, and the fitted MinMaxScaler).

# prepare the dataset
input_timesteps = 3
input_features = 3
output_timesteps = 2
output_features = 3

data_Y = []
data_X = []
# As written, each target window (rows i .. i+output_timesteps-1) comes before its input window
# in row order. If dataset_stacked is in chronological order, swap the two ranges so that the
# inputs precede the targets.
for i in range(0, len(data_X1) - (input_timesteps + output_timesteps)):
    # target window
    tmpY2d = []
    for i_row in range(i, i + output_timesteps):
        tmpY2d.append([dataset_stacked[i_row][i_col] for i_col in range(output_features)])
    data_Y.append(tmpY2d)
    # input window
    tmpX2d = []
    for j_row in range(i + output_timesteps, i + output_timesteps + input_timesteps):
        tmpX2d.append([dataset_stacked[j_row][j_col] for j_col in range(input_features)])
    data_X.append(tmpX2d)

data_X = np.array(data_X)
data_Y = np.array(data_Y)
print("shape of input data ", data_X.shape)
print("input data => ", data_X)
print("shape of output data => ", data_Y.shape)
print("Output data => ", data_Y)


def define_model():
    # Define the Input data shape
    encoder_inputs = Input(shape=(input_timesteps, input_features))
    # Use single BiLSTM as Encoder
    # Here we can use bigger network also like one BiLSTM with return_sequences=True and
    # other BiLSTM with return_sequences=False
    # OR CNN, CNN+LSTM and so many
    encoder = Bidirectional(LSTM(units=16, return_sequences=False))(encoder_inputs)
    # Apply RepeatVector to get the result for multiple time steps (here our output_timesteps = 2)
    # From this step the Decoder operation starts
    repeat_output = RepeatVector(output_timesteps)(encoder)
    decoder = Bidirectional(LSTM(units=16, return_sequences=True))(repeat_output)
    # Use TimeDistributed layer to get multiple Output features
    out = TimeDistributed(Dense(output_features))(decoder)
    model = Model(encoder_inputs, out)
    # Compile the model
    model.compile(loss='mae', optimizer='adam', metrics=['mae'])
    model.summary()
    return model


# Call the model
model = define_model()
# Fit the model
model.fit(data_X, data_Y, epochs=4, batch_size=2, verbose=1)

# Take a test data to test the working of the model
test_dataX = []
test_dataX.append(dataset_stacked[0])
test_dataX.append(dataset_stacked[1])
test_dataX.append(dataset_stacked[2])
test_dataX = np.array(test_dataX)
# Reshape the data into 3-D numpy array
test_dataX = np.reshape(test_dataX, (1, input_timesteps, input_features))
print("test dataset => ", test_dataX)

# Run for the prediction output
pred_output = model.predict(test_dataX)
print("prediction output => ", pred_output)
# invert the scaling to get forecast values (pred_output[0] has shape (output_timesteps, output_features))
inverted_output = data_scalar.inverse_transform(pred_output[0])
print("obtained prediction => ", inverted_output)
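The final step of both listings inverts the scaling so that the forecasts are in the original units. The sketch below shows what that inversion does for min-max scaling to the range [0, 1], which is the default behaviour of `MinMaxScaler`. It is a plain-Python stand-in with hypothetical helper names and toy data, not the scaler object used in the listings.

```python
def fit_min_max(rows):
    """Per-column minimum and maximum of a 2-D list (rows = time steps)."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(rows, mins, maxs):
    """Map each column to [0, 1]: (value - min) / (max - min)."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def inverse_scale(rows, mins, maxs):
    """Undo the scaling: value * (max - min) + min."""
    return [
        [v * (hi - lo) + lo for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


raw = [[10.0, 100.0], [20.0, 300.0], [30.0, 200.0]]
mins, maxs = fit_min_max(raw)
scaled = scale(raw, mins, maxs)
print(scaled)
print(inverse_scale(scaled, mins, maxs))
```

The minimum and maximum must come from the same data that was used to scale the training inputs. Fitting new values on the predictions would map them back to the wrong units.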
Note. This code is not intended for any commercial use. It is created solely for simple educational purposes.